The analysis of overrepresented sequences will spot an increase in any exactly duplicated sequences, but there is a different subset of problems where it will not work.
This module counts the enrichment of every 5-mer within the sequence library. It calculates an expected level at which each k-mer should have been seen, based on the base content of the library as a whole, and then uses the actual count to calculate an observed/expected ratio for that k-mer. In addition to reporting a list of hits, it will draw a graph for the top 6 hits to show the pattern of enrichment of each k-mer across the length of your reads. This will show whether you have a general enrichment or a pattern of bias at particular points along your read length.
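The observed/expected calculation described above can be illustrated with a short Python sketch. This is not the module's own code: the function name kmer_obs_exp is hypothetical, and the sketch assumes that the expected count of a k-mer is the product of its individual base frequencies in the library multiplied by the total number of k-mers counted.

```python
from collections import Counter

def kmer_obs_exp(reads, k=5):
    """Return {kmer: observed/expected ratio} for every k-mer seen in reads.

    Assumption: expected count = product of per-base library frequencies
    multiplied by the total number of k-mers counted.
    """
    valid = set("ACGT")
    base_counts = Counter()
    kmer_counts = Counter()
    for read in reads:
        # Tally base content over the whole library.
        base_counts.update(b for b in read if b in valid)
        # Tally every k-mer that contains only unambiguous bases.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= valid:
                kmer_counts[kmer] += 1

    total_bases = sum(base_counts.values())
    total_kmers = sum(kmer_counts.values())

    ratios = {}
    for kmer, observed in kmer_counts.items():
        probability = 1.0
        for base in kmer:
            probability *= base_counts[base] / total_bases
        expected = probability * total_kmers
        ratios[kmer] = observed / expected
    return ratios
```

For example, kmer_obs_exp(["AAAAAAAAAA"]) gives a ratio of 1.0 for AAAAA, because a library made only of A contains exactly as many AAAAA 5-mers as its base content predicts.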
Any k-mer showing more than a 3-fold overall enrichment, or more than a 5-fold enrichment at any given base position, will be reported by this module.
To allow this module to run in a reasonable time, only 20% of the whole library is analysed and the results are extrapolated to the rest of the library.
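As a rough illustration of analysing a fraction of the library and extrapolating, the Python sketch below counts k-mers in every fifth read and scales the counts by five. How the 20% is actually selected is not stated here, so taking every fifth read is an assumption, and the name sampled_kmer_counts is hypothetical.

```python
from collections import Counter
from itertools import islice

def sampled_kmer_counts(reads, k=5, step=5):
    """Count k-mers in every `step`-th read and scale up to the whole library.

    Assumption: the 20% sample is taken as every fifth read (step=5).
    """
    counts = Counter()
    for read in islice(reads, 0, None, step):
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Extrapolate the sampled counts to an estimate for the full library.
    return {kmer: n * step for kmer, n in counts.items()}
```

The scaled counts are estimates only: a k-mer that is rare or concentrated in a few reads may be over- or under-counted by a sample of this kind.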
This module will issue a warning if any k-mer is enriched more than 3-fold overall, or more than 5-fold at any individual position.
This module will issue an error if any k-mer is enriched more than 10-fold at any individual base position.
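The warning and error thresholds above can be written as a small decision function. This is a minimal Python sketch with hypothetical names (kmer_status, overall_ratio, max_position_ratio). It assumes the two observed/expected ratios have already been computed for a k-mer and treats "more than" as a strict comparison.

```python
def kmer_status(overall_ratio, max_position_ratio):
    """Classify one k-mer from its overall and highest per-position enrichment."""
    # Error: enriched more than 10-fold at any individual base position.
    if max_position_ratio > 10:
        return "error"
    # Warning: more than 3-fold overall, or more than 5-fold at any position.
    if overall_ratio > 3 or max_position_ratio > 5:
        return "warning"
    return "pass"
```

The error test comes first because any k-mer that meets the error condition also meets the positional warning condition.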