RELEASE NOTES FOR SNPsplit v0.3.1_dev (03 Nov 2016) --------------------------------------------------- SNPsplit ======== Minor fixed for path to Samtools when checking files for BS. SNPsplit_genome_preparation =========================== Option --genome_build [NAME] should now work as intended (used to be --build only). Added a new version of the genome preparation script that can deal with the latest version of the VCF file for the old NCBIM37 genome build ("mgp.v2.snps.annot.reformat.vcf.gz"). The script is called "SNPsplit_genome_preparation_v2VCF" and may be found in the folder "outdated_VCF_versions" on Github. RELEASE NOTES FOR SNPsplit v0.3.1 (18 Jul 2016) ----------------------------------------------- Added a new section to the SNPsplit manual explaining in more detail how the SNP filtering step works and which lines and fields are relevant. RELEASE NOTES FOR SNPsplit v0.3.0 (18 May 2016) ----------------------------------------------- SNPsplit ======== Added the # of SNPs used for the allele-discrimination to the report file to make it easier to spot errors. Now removing CR and LF line endings when reading in the SNP file. For SNP annotation files copied from a Windows machine we saw problems with no allele-specific reads for genome 2 at all which was due to the invisible \r character for the SNP call. Changed sorting command for BAM files to also work with Samtools versions 1.3+. The sorting report for single-end files is now also written to the report files. SNPsplit_genome_preparation =========================== Added whole new functionality to construct single- or dual-hybrid genomes starting from VCF files which are obtainable from the Mouse Genomes Project (http://www.sanger.ac.uk/science/data/mouse-genomes-project), here is a brief description of what it does: SNPsplit_genome_preparation is designed to read in a variant call files from the Mouse Genomes Project (e.g. this latest file: ftp://ftp-mouse.sanger.ac.uk/current_snps/mgp.v5.merged.snps_all.dbSNP142.vcf.gz) and generate new genome versions where the strain SNPs are either incorporated into the new genome (full sequence) or masked by the ambiguity nucleobase 'N' (N-masking). SNPsplit_genome_preparation may be run in two different modes: Single strain mode: - The VCF file is read and filtered for high-confidence SNPs in the strain specified with strain - The reference genome (given with --reference_genome ) is read into memory, and the filtered high-confidence SNP positions are incorporated either as N-masking (default) or full sequence (option --full_sequence) Dual strain mode: - The VCF file is read and filtered for high-confidence SNPs in the strain specified with --strain - The reference genome (given with --reference_genome ) is read into memory, and the filtered high-confidence SNP positions are incorporated as full sequence and optionally as N-masking - The VCF file is read one more time and filtered for high-confidence SNPs in strain 2 specified with --strain2 - The filtered high-confidence SNP positions of strain 2 are incorporated as full sequence and optionally as N-masking - The SNP information of strain and strain 2 relative to the reference genome build are compared, and a new Ref/SNP annotation is constructed whereby the new Ref/SNP information will be Strain/Strain2 (and no longer the standard reference genome strain Black6 (C57BL/6J)) 6.The full genome sequence given with --strain is read into memory, and the high-confidence SNP positions between Strain and Strain2 are incorporated as full sequence and optionally as N-masking - The resulting .fa files are ready to be indexed with your favourite aligner. Proved and tested aligners include Bowtie2, Tophat, STAR, Hisat2, HiCUP and Bismark. Please note that STAR and Hisat2 may require you to disable soft-clipping, please see the SNPsplit manual more details Both the SNP filtering and the genome preparation write out little report files for record keeping RELEASE NOTES FOR SNPsplit v0.2.0 (19 08 2015) ---------------------------------------------- The name of the SNP annotation file is now displayed on screen and written to the report files. Added new option --bisulfite. This assumes Bisulfite-Seq data processed with Bismark (www.bioinformatics.babraham.ac.uk/projects/bismark/) as input. In paired-end mode (--paired), Read 1 and Read 2 of a pair are expected to follow each other in consecutive lines. SNPsplit will run a quick check at the start of a run to see if the file provided file appears to be a Bismark file, and set the flags --bisulfite and/or --paired automatically. In addition it will perform a quick check to see if a paired-end file appears to have been positionally sorted, and if not will set the --no_sort flag. Reads having the unmapped FLAG set in the BAM/SAM file (0x4 bit) are now skipped and excluded from the tagging and sorting process. Improved file renaming settings when input file was in SAM format (no longer deletes the input files..). Also changed renaming settings to only change .bam at the end of reads. RELEASE NOTES FOR SNPsplit v0.1.3 (04 Nov 2014) ---------------------------------------------- SNPsplit v0.1.3 is an initial beta release and as such is still a work in progress. All basic functions are working. Feedback or bugs can be asked to: felix.krueger@babraham.ac.uk