HiCUP (Hi-C User Pipeline)
| Function | A tool for mapping and performing quality control on Hi-C data |
|---|---|
| Language | Perl |
| Requirements | A functional version of Bowtie; a functional version of Perl; a Unix-based operating system; Gzip |
| Code Maturity | Beta - HiCUP is routinely used to process real data, but it is still under active development. |
| Code Released | Yes, under GNU GPL v3 or later. |
| Initial Contact | Steven Wingett |
Hi-C, developed from 3C, identifies long-range genomic interactions. The Hi-C protocol involves formaldehyde-fixing cells to create DNA-protein bonds that cross-link interacting DNA loci. The DNA is then digested and ligated to generate a library of products that were spatially close to each other in the nucleus.
HiCUP is designed to take the raw sequence output from a Hi-C experiment and produce a filtered set of mapped interaction pairs suitable for subsequent analysis. It also produces a set of metrics that can be used to assess the quality of the data and to help improve the construction of future libraries.
Main features:
- Identifies putative Hi-C junctions in sequence reads and truncates accordingly with a view to improving mapping efficiency
- Maps each read end independently using parameters suitable for Hi-C datasets
- Pairs forward and reverse reads for each di-tag, producing output in SAM or BAM format
- Filters data to remove common artefacts, e.g. di-tags where both reads map to the same restriction fragment
- Compatible with restriction enzyme/sonication or restriction enzyme-only protocols
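Two of the features above, junction truncation and same-fragment filtering, can be illustrated with a minimal Python sketch. This is not HiCUP's Perl implementation; the function names, the coordinate convention and the example data are hypothetical. The junction sequence `AAGCTAGCTT` assumes a HindIII digest (`A^AGCTT`) followed by fill-in and blunt-end ligation, and keeping the first five junction bases is likewise an assumption made for illustration.

```python
from bisect import bisect_right

# Assumed example: HindIII (A^AGCTT) with fill-in and blunt-end ligation
# gives the junction AAGCT + AGCTT. Other enzymes need other values.
HINDIII_JUNCTION = "AAGCTAGCTT"
HINDIII_KEEP = 5  # junction bases retained after truncation (assumption)


def truncate_at_junction(read, junction, keep):
    """Cut a read at the first occurrence of a putative Hi-C junction.

    Reads that do not contain the junction are returned unchanged.
    """
    pos = read.find(junction)
    if pos == -1:
        return read
    return read[: pos + keep]


def fragment_index(fragment_starts, position):
    """Return the index of the restriction fragment containing `position`.

    `fragment_starts` is a sorted list of fragment start coordinates.
    """
    return bisect_right(fragment_starts, position) - 1


def same_fragment(fragments_by_chrom, read1, read2):
    """True if both reads of a di-tag map to the same restriction fragment.

    Each read is a (chromosome, position) tuple.
    """
    chrom1, pos1 = read1
    chrom2, pos2 = read2
    if chrom1 != chrom2:
        return False
    starts = fragments_by_chrom[chrom1]
    return fragment_index(starts, pos1) == fragment_index(starts, pos2)


# Hypothetical data for illustration only.
example_read = "CCGGTT" + HINDIII_JUNCTION + "GGCC"
truncated = truncate_at_junction(example_read, HINDIII_JUNCTION, HINDIII_KEEP)

fragments = {"chr1": [0, 1000, 2500, 4000]}
rejected = same_fragment(fragments, ("chr1", 1200), ("chr1", 2400))
kept = not same_fragment(fragments, ("chr1", 1200), ("chr1", 2600))
```

Truncating before mapping means the aligner only sees sequence from one side of the ligation junction, which is the motivation given above for improving mapping efficiency. The binary search over sorted fragment starts is one simple way to assign reads to fragments; a real pipeline would also need to handle strand, fragment ends and unmapped reads.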
Changelog
- 02-11-12: Version 0.3.0 released
  - hicup_sorter removed from the pipeline
  - The pipeline automatically determines the FASTQ format if it is not specified
  - The pipeline determines the path to Bowtie if it is not specified by the user. Also fixed a bug affecting how HiCUP identifies the location of SAMtools
  - Improved how the pipeline checks that Bowtie indices have been specified correctly
- 03-08-12: Version 0.2.2 released
  - The mapping process is now less memory intensive
  - HiCUP can process files in a separate folder from the hicup.conf configuration file
  - The hicup master script terminates immediately if another pipeline script dies
- 19-07-12: Version 0.2.1 released
  - hicup_filter reports the number of read-pairs generated by circularized restriction fragments
  - hicup_filter reports the absolute number of read-pairs by category
- 26-06-12: Version 0.2.0 released
  - Added new script 'hicup_deduplicator' for removing duplicate di-tags
  - hicup_mapper and hicup_pairer combined into a single script
  - hicup_filter, when processing Hi-C data generated using the Hi-C sonication protocol, now rejects di-tags on the basis of size AFTER all other filters have been passed
  - hicup_filter and hicup_deduplicator produce pie charts summarising the results
  - hicup_filter modified so that, when following the sonication protocol, it identifies and rejects di-tags containing re-ligated fragments, not simply those on adjacent fragments
  - hicup_truncater now reports the average length of truncated sequences
  - Fixed a bug causing hicup_digester to process only the last chromosome in a file containing multiple chromosomes
- 22-03-12: Version 0.1.1 released
  - Initial release
  - All basic functions working