FastQC
| Function | A quality control tool for high throughput sequence data. | 
|---|---|
| Language | Java | 
| Requirements | A suitable Java Runtime Environment
	     The Picard BAM/SAM Libraries (included in download)  | 
	  
| Code Maturity | Stable. Mature code, but feedback is appreciated. | 
| Code Released | Yes, under GPL v3 or later. | 
| Initial Contact | Simon Andrews | 
Download Now | 
	|
	
      
FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.
The main functions of FastQC are
- Import of data from BAM, SAM or FastQ files (any variant)
 - Providing a quick overview to tell you in which areas there may be problems
 - Summary graphs and tables to quickly assess your data
 - Export of results to an HTML based permanent report
 - Offline operation to allow automated generation of reports without running the interactive application
 
Documentation
A copy of the FastQC documentation is available for you to try before you buy (well download..).
Example Reports
- Good Illumina Data
 - Bad Illumina Data
 - Adapter dimer contaminated run
 - Small RNA with read-through adapter
 - Reduced Representation BS-Seq
 - PacBio
 - 454
 
Changelog
- 01-03-23: Version 0.12.0 released
 - 
	  
- Fix a bug in file type detection on OSX
 
 - 01-03-23: Version 0.12.0 released
 - 
	  
- Add total base count to basic stats
 - Add dup_length option to set the level of truncation for duplicate finding
 - Make default truncation length always 50bp
 - Removed the deduplicated duplication line from the duplicate plot
 - Improve memory handling and add a --memory option to the command line
 - Move BAM parsing to htsjdk
 - Make colours colourblind friendly
 - Generate SVG versions of graphs, and add a --svg option to use these in the report
 - Add line numbers to parsing errors
 - Change the default adapter sequences to search
 
 - 08-01-19: Version 0.11.9 released
 - 
	  
- Fixed a bug when analysing empty files
 - Added support for multi-read fast5 files
 - Fixed a corner case bug in adapter detection
 - Bundled a JRE with the OSX build so you don't have to install it
 - Fixed a hang if the program runs out of memory
 
 - 04-10-18: Version 0.11.8 released
 - 
	  
- Fixed a performance bug in highly duplicated sequences
 - Changed the behaviour of the sequence length module when run with --nogroup
 - Other minor bug fixes
 
 - 10-01-18: Version 0.11.7 released
 - 
	  
- Fixed a crash if the first sequence in a file was shorter than 12bp
 
 - 21-12-17: Version 0.11.6 released
 - 
	  
- Disabled the Kmer plot by default
 - Fixed a bug when long custom adapters were being used
 - Changed the tile number cutoff to accommodate the novaseq
 - Fixed various format changes in nanopore data from ONT
 - Added new Clontech sequences to the contaminant list
 - Added a --min-length option to remove short sequences
 - Added an option to specify the output name of data streamed into the program
 
 - 08-03-16: Version 0.11.5 released
 - 
	  
- Fixed the smallRNA adapter sequence so that abundance isn't under-represented in the adapter content plot
 - Fixed a bug in the warn / error code for the per-base sequence content plot
 - Fixed a typo in the documentation for the duplication plot
 
 - 09-10-15: Version 0.11.4 released
 - 
	  
- Changed the OSX launcher to not rely on the internal JVM framework, but use any command line java which is found
 - Fixed a typo in one of the adapter sequences
 - Fixed a bug which meant that some file extensions weren't removed from report names in non-interactive mode
 - Made the per-tile module not collect any stats if it's disabled in limits.txt
 - Fixed a bug in the calculation of duplication for highly duplicated, ordered files with very small numbers of sequences
 - Fixed an incorrect error flag in the per-base quality module where there were less than 100 observations in a read group
 
 - 25-3-15: Version 0.11.3 released
 - 
	  
- Fixed a bug when disabling the per-tile plot from limits.txt
 - Fixed a bug which caused the program to continue when processing of multiple files was actually complete
 - Fixed a bug which meant format selection in the interactive application didn't work
 - Added checks for mis-itentifying tile numbers in confusing sample ids
 - Added the SOLID smallRNA adapter to the standard search set
 - Fixed a bug when extracting casava names from uncompressed fastq files
 - Added support for processing files of Oxford Nanopore reads
 
 - 6-6-14: Version 0.11.2 released
 - 
	  
- Fixed incorrect warn/fail defaults for per-seq quality plot
 - Fixed memory leaks in Kmer and per-seq quality modules
 - Added an option to use a custom limits file
 - Fixed a bug in the naming of the folder inside the zip output file
 - Fixed a bug in the --extract option
 
 - 2-6-14: Version 0.11.1 released
 - 
	  
- Added configurable warn/fail thresholds for all modules
 - Allow modules to be selectively turned off
 - Added a per-tile quality plot for Illumina libraries
 - Added an adapter content plot
 - Improved the duplication plot
 - Improved the Kmer module
 - Used embedded graphics in the HTML output so you can distribute a single file
 - Added the ability to read data from stdin
 - Changed how base grouping works to better accommodate long reads
 - Dropped support for Solexa64 format (NB not Phred 64 which is still supported)
 
 - 3-5-12: Version 0.10.1 released
 - 
	  
- Added a workround to allow the analysis of concatenated gzipped files
 - Fixed a bug when FastQC was installed in a path containing characters needing to be escaped in a URL
 - Added an option to specify the location of the java interpreter on the command line
 
 - 9-9-11: Version 0.10.0 released
 - 
	  
- Added a Casava mode to sanely process the multiple fastq files produced by the latest illumina pipeline
 - Fixed a bug in Kmer analysis which missed of the last possible Kmer in each sequence
 - Fixed a classpath bug if using the wrapper script under windows
 
 - 31-8-11: Version 0.9.6 released
 - 
	  
- Fixed a crash in libraries where every sequence ended in poly-N
 - Fixed the launch wrapper to set the classpath correctly on OSX
 
 - 16-8-11: Version 0.9.5 released
 - 
	  
- Fixed a bug in text output for the per-base sequence content module
 - Made progress reporting absolute, and not approximate
 - Added a print CSS style so reports are printable again
 
 - 13-7-11: Version 0.9.4 released
 - 
	  
- Improved the error reporting for failed files in the offline application
 
 - 16-6-11: Version 0.9.3 released
 - 
	  
- Added support for bzip2 compressed fastq files
 - Added new CSS theme for HTML reports, contributed by Phil Ewels
 
 - 16-5-11: Version 0.9.2 released
 - 
	  
- Fixed a bug where grouped base numbers weren't reported in the per-base quality text report
 - Fixed a crash in the Kmer analysis when analysing small files
 
 - 30-3-11: Version 0.9.1 released
 - 
	  
- Added --quiet and --nogroup options to command line
 - Added encoding type to the basic stats
 - Added detection of Illumina <1.3 1.3 1.5 and 1.9 encodings
 
 - 10-2-11: Version 0.9.0 released
 - 
	  
- Added support for very long reads (esp 454 and PacBio)
 - Duplication detection now uses only the first 50bp of each read
 
 - 21-1-11: Version 0.8.0 released
 - 
	  
- Made all graphs easier to interpret
 - Added an option to analyse only mapped sequences from a BAM/SAM file
 - Added an option to analyse two or more files in parallel
 
 - 24-11-10: Version 0.7.2 released
 - 
	  
- Fixed bug when analysing libraries with no unique sequences
 - Added an option to specify a custom contaminant list on the command line
 
 - 24-11-10: Version 0.7.1 released
 - 
	  
- Improved the command line interface with proper options and error handling
 - Added an option to force the file format where guessing from the filename doesn't work
 
 - 27-10-10: Version 0.7.0 released
 - 
	  
- Added a Kmer enrichment analysis to find non-aligned enriched sequences
 - Cleaned up axis labels on all graphs
 
 - 27-10-10: Version 0.6.1 released
 - 
	  
- Fixed a bug which caused some sequences and qualities from BAM/SAM files to be reversed
 
 - 18-10-10: Version 0.6.0 released
 - 
	  
- Sequences can now be read from SAM/BAM format files
 - Added smoother lines to the graphs
 
 - 29-09-10: Version 0.5.1 released
 - 
	  
- Fixed a formatting bug in the text output
 - Fixed the %GC plot to work well with reads over 100bp
 - Improved the fitting of the modelled curve to the %GC plot
 - Added more illumina oligos to the contaminants file
 
 - 16-09-10: Version 0.5.0 released
 - 
	  
- Improved the fitting of the normal distribution to %GC plot
 - Calculated the total duplicated sequence % in the duplicate sequence module
 - Added pass/fail/warn icons next to each section of the HTML report
 - Put Icons and Images into subfolders in the HTML report
 
 - 30-07-10: Version 0.4.3 released
 - 
	  
- Fixed the reporting of sequence counts in the Basic Stats module
 - Added a warning before overwriting reports in the interactive application
 
 - 26-07-10: Version 0.4.2 released
 - 
	  
- Fixed y-axis scale on per-base quality plot
 - Added fail / warn checks to modules which lacked them and improved existing checks
 - Added a modelled distribtion to the per-sequence GC plot
 - Scale the width of report graphs for long sequence reads
 
 - 24-06-10: Version 0.4.1 released
 - 
	  
- Changed the duplicate module to reduce memory usage for long sequences
 - Changed the way duplicate levels are counted to be more realistic
 
 - 18-06-10: Version 0.4 released
 - 
	  
- Added a sequence duplication level module
 - Added a lauch wrapper for easier use from the commandline
 - Added full machine parsable output for integration into pipelines
 
 - 28-05-10: Version 0.3.1 released
 - 
	  
- Fixed a bug where invalid template files caused a crash
 - Non-interactive use now correctly reports progress for all files, not just the first one
 - Added some missing documentation
 
 - 13-05-10: Version 0.3 released
 - 
	  
- Added support for gzip compressed fastq files
 - Added identification of overrepresented sequences
 - Improved colorspace support
 - Added an option to save non-interactive reports to a specific directory
 
 - 06-05-10: Version 0.2 released
 - 
	  
- Added support for colorspace fastq files
 - Added templating support to allow customisation of HTML reports
 - Unzipped non-interactive reports by default, and added an option to turn this off
 - Added easily computer readable summary file to reports
 
 - 28-04-10: Version 0.1.1 released
 - 
	  
- Fixed a bug which prevented non-interactive use on a headless system
 
 - 26-04-10: Version 0.1 released
 - 
	  
- Initial set of 9 modules
 - Interactive and offline operation functional