Babraham Bioinformatics - FastQC A Quality Control tool for High Throughput Sequence Data

FastQC

Function	A quality control tool for high throughput sequence data.
Language	Java
Requirements	A suitable Java Runtime Environment The Picard BAM/SAM Libraries (included in download)
Code Maturity	Stable. Mature code, but feedback is appreciated.
Code Released	Yes, under GPL v3 or later.
Initial Contact	Simon Andrews
Download Now

View our tutorial video

FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.

The main functions of FastQC are

Import of data from BAM, SAM or FastQ files (any variant)
Providing a quick overview to tell you in which areas there may be problems
Summary graphs and tables to quickly assess your data
Export of results to an HTML based permanent report
Offline operation to allow automated generation of reports without running the interactive application

Documentation

A copy of the FastQC documentation is available for you to try before you buy (well download..).

Example Reports

Changelog

01-03-23: Version 0.12.0 released
- Fix a bug in file type detection on OSX
01-03-23: Version 0.12.0 released
- Add total base count to basic stats
- Add dup_length option to set the level of truncation for duplicate finding
- Make default truncation length always 50bp
- Removed the deduplicated duplication line from the duplicate plot
- Improve memory handling and add a --memory option to the command line
- Move BAM parsing to htsjdk
- Make colours colourblind friendly
- Generate SVG versions of graphs, and add a --svg option to use these in the report
- Add line numbers to parsing errors
- Change the default adapter sequences to search
08-01-19: Version 0.11.9 released
- Fixed a bug when analysing empty files
- Added support for multi-read fast5 files
- Fixed a corner case bug in adapter detection
- Bundled a JRE with the OSX build so you don't have to install it
- Fixed a hang if the program runs out of memory
04-10-18: Version 0.11.8 released
- Fixed a performance bug in highly duplicated sequences
- Changed the behaviour of the sequence length module when run with --nogroup
- Other minor bug fixes
10-01-18: Version 0.11.7 released
- Fixed a crash if the first sequence in a file was shorter than 12bp
21-12-17: Version 0.11.6 released
- Disabled the Kmer plot by default
- Fixed a bug when long custom adapters were being used
- Changed the tile number cutoff to accommodate the novaseq
- Fixed various format changes in nanopore data from ONT
- Added new Clontech sequences to the contaminant list
- Added a --min-length option to remove short sequences
- Added an option to specify the output name of data streamed into the program
08-03-16: Version 0.11.5 released
- Fixed the smallRNA adapter sequence so that abundance isn't under-represented in the adapter content plot
- Fixed a bug in the warn / error code for the per-base sequence content plot
- Fixed a typo in the documentation for the duplication plot
09-10-15: Version 0.11.4 released
- Changed the OSX launcher to not rely on the internal JVM framework, but use any command line java which is found
- Fixed a typo in one of the adapter sequences
- Fixed a bug which meant that some file extensions weren't removed from report names in non-interactive mode
- Made the per-tile module not collect any stats if it's disabled in limits.txt
- Fixed a bug in the calculation of duplication for highly duplicated, ordered files with very small numbers of sequences
- Fixed an incorrect error flag in the per-base quality module where there were less than 100 observations in a read group
25-3-15: Version 0.11.3 released
- Fixed a bug when disabling the per-tile plot from limits.txt
- Fixed a bug which caused the program to continue when processing of multiple files was actually complete
- Fixed a bug which meant format selection in the interactive application didn't work
- Added checks for mis-itentifying tile numbers in confusing sample ids
- Added the SOLID smallRNA adapter to the standard search set
- Fixed a bug when extracting casava names from uncompressed fastq files
- Added support for processing files of Oxford Nanopore reads
6-6-14: Version 0.11.2 released
- Fixed incorrect warn/fail defaults for per-seq quality plot
- Fixed memory leaks in Kmer and per-seq quality modules
- Added an option to use a custom limits file
- Fixed a bug in the naming of the folder inside the zip output file
- Fixed a bug in the --extract option
2-6-14: Version 0.11.1 released
- Added configurable warn/fail thresholds for all modules
- Allow modules to be selectively turned off
- Added a per-tile quality plot for Illumina libraries
- Added an adapter content plot
- Improved the duplication plot
- Improved the Kmer module
- Used embedded graphics in the HTML output so you can distribute a single file
- Added the ability to read data from stdin
- Changed how base grouping works to better accommodate long reads
- Dropped support for Solexa64 format (NB not Phred 64 which is still supported)
3-5-12: Version 0.10.1 released
- Added a workround to allow the analysis of concatenated gzipped files
- Fixed a bug when FastQC was installed in a path containing characters needing to be escaped in a URL
- Added an option to specify the location of the java interpreter on the command line
9-9-11: Version 0.10.0 released
- Added a Casava mode to sanely process the multiple fastq files produced by the latest illumina pipeline
- Fixed a bug in Kmer analysis which missed of the last possible Kmer in each sequence
- Fixed a classpath bug if using the wrapper script under windows
31-8-11: Version 0.9.6 released
- Fixed a crash in libraries where every sequence ended in poly-N
- Fixed the launch wrapper to set the classpath correctly on OSX
16-8-11: Version 0.9.5 released
- Fixed a bug in text output for the per-base sequence content module
- Made progress reporting absolute, and not approximate
- Added a print CSS style so reports are printable again
13-7-11: Version 0.9.4 released
- Improved the error reporting for failed files in the offline application
16-6-11: Version 0.9.3 released
- Added support for bzip2 compressed fastq files
- Added new CSS theme for HTML reports, contributed by Phil Ewels
16-5-11: Version 0.9.2 released
- Fixed a bug where grouped base numbers weren't reported in the per-base quality text report
- Fixed a crash in the Kmer analysis when analysing small files
30-3-11: Version 0.9.1 released
- Added --quiet and --nogroup options to command line
- Added encoding type to the basic stats
- Added detection of Illumina <1.3 1.3 1.5 and 1.9 encodings
10-2-11: Version 0.9.0 released
- Added support for very long reads (esp 454 and PacBio)
- Duplication detection now uses only the first 50bp of each read
21-1-11: Version 0.8.0 released
- Made all graphs easier to interpret
- Added an option to analyse only mapped sequences from a BAM/SAM file
- Added an option to analyse two or more files in parallel
24-11-10: Version 0.7.2 released
- Fixed bug when analysing libraries with no unique sequences
- Added an option to specify a custom contaminant list on the command line
24-11-10: Version 0.7.1 released
- Improved the command line interface with proper options and error handling
- Added an option to force the file format where guessing from the filename doesn't work
27-10-10: Version 0.7.0 released
- Added a Kmer enrichment analysis to find non-aligned enriched sequences
- Cleaned up axis labels on all graphs
27-10-10: Version 0.6.1 released
- Fixed a bug which caused some sequences and qualities from BAM/SAM files to be reversed
18-10-10: Version 0.6.0 released
- Sequences can now be read from SAM/BAM format files
- Added smoother lines to the graphs
29-09-10: Version 0.5.1 released
- Fixed a formatting bug in the text output
- Fixed the %GC plot to work well with reads over 100bp
- Improved the fitting of the modelled curve to the %GC plot
- Added more illumina oligos to the contaminants file
16-09-10: Version 0.5.0 released
- Improved the fitting of the normal distribution to %GC plot
- Calculated the total duplicated sequence % in the duplicate sequence module
- Added pass/fail/warn icons next to each section of the HTML report
- Put Icons and Images into subfolders in the HTML report
30-07-10: Version 0.4.3 released
- Fixed the reporting of sequence counts in the Basic Stats module
- Added a warning before overwriting reports in the interactive application
26-07-10: Version 0.4.2 released
- Fixed y-axis scale on per-base quality plot
- Added fail / warn checks to modules which lacked them and improved existing checks
- Added a modelled distribtion to the per-sequence GC plot
- Scale the width of report graphs for long sequence reads
24-06-10: Version 0.4.1 released
- Changed the duplicate module to reduce memory usage for long sequences
- Changed the way duplicate levels are counted to be more realistic
18-06-10: Version 0.4 released
- Added a sequence duplication level module
- Added a lauch wrapper for easier use from the commandline
- Added full machine parsable output for integration into pipelines
28-05-10: Version 0.3.1 released
- Fixed a bug where invalid template files caused a crash
- Non-interactive use now correctly reports progress for all files, not just the first one
- Added some missing documentation
13-05-10: Version 0.3 released
- Added support for gzip compressed fastq files
- Added identification of overrepresented sequences
- Improved colorspace support
- Added an option to save non-interactive reports to a specific directory
06-05-10: Version 0.2 released
- Added support for colorspace fastq files
- Added templating support to allow customisation of HTML reports
- Unzipped non-interactive reports by default, and added an option to turn this off
- Added easily computer readable summary file to reports
28-04-10: Version 0.1.1 released
- Fixed a bug which prevented non-interactive use on a headless system
26-04-10: Version 0.1 released
- Initial set of 9 modules
- Interactive and offline operation functional

FastQC

Download Now

Documentation

Example Reports

Changelog