RELEASE NOTES FOR SNPsplit v0.3.1_dev (03 Nov 2016)
---------------------------------------------------

SNPsplit
========

Minor fixed for path to Samtools when checking files for BS.


SNPsplit_genome_preparation
===========================

Option --genome_build [NAME] should now work as intended (used to be --build only).

Added a new version of the genome preparation script that can deal with the latest version
of the VCF file for the old NCBIM37 genome build ("mgp.v2.snps.annot.reformat.vcf.gz").
The script is called "SNPsplit_genome_preparation_v2VCF" and may be found in the folder
"outdated_VCF_versions" on Github.


RELEASE NOTES FOR SNPsplit v0.3.1 (18 Jul 2016)
-----------------------------------------------

Added a new section to the SNPsplit manual explaining in more detail how the SNP filtering
step works and which lines and fields are relevant.


RELEASE NOTES FOR SNPsplit v0.3.0 (18 May 2016)
-----------------------------------------------

SNPsplit
========

Added the # of SNPs used for the allele-discrimination to the report file to
make it easier to spot errors.

Now removing CR and LF line endings when reading in the SNP file. For SNP annotation
files copied from a Windows machine we saw problems with no allele-specific reads for 
genome 2 at all which was due to the invisible \r character for the SNP call.

Changed sorting command for BAM files to also work with Samtools versions 1.3+.

The sorting report for single-end files is now also written to the report files.


SNPsplit_genome_preparation
===========================

Added whole new functionality to construct single- or dual-hybrid genomes starting from VCF
files which are obtainable from the Mouse Genomes Project
(http://www.sanger.ac.uk/science/data/mouse-genomes-project), here is a brief description of
what it does:

SNPsplit_genome_preparation is designed to read in a variant call files from the Mouse Genomes
Project (e.g. this latest file: ftp://ftp-mouse.sanger.ac.uk/current_snps/mgp.v5.merged.snps_all.dbSNP142.vcf.gz)
and generate new genome versions where the strain SNPs are either incorporated into the new
genome (full sequence) or masked by the ambiguity nucleobase 'N' (N-masking).

SNPsplit_genome_preparation may be run in two different modes:

Single strain mode:
- The VCF file is read and filtered for high-confidence SNPs in the strain specified with strain
- The reference genome (given with --reference_genome <genome>) is read into memory, and the
  filtered high-confidence SNP positions are incorporated either as N-masking (default) or full
  sequence (option --full_sequence)

Dual strain mode:
- The VCF file is read and filtered for high-confidence SNPs in the strain specified with --strain <name>
- The reference genome (given with --reference_genome <genome>) is read into memory, and the filtered
  high-confidence SNP positions are incorporated as full sequence and optionally as N-masking
- The VCF file is read one more time and filtered for high-confidence SNPs in strain 2 specified with
   --strain2 <name>
- The filtered high-confidence SNP positions of strain 2 are incorporated as full sequence and
  optionally as N-masking
- The SNP information of strain and strain 2 relative to the reference genome build are compared,
  and a new Ref/SNP annotation is constructed whereby the new Ref/SNP information will be Strain/Strain2
  (and no longer the standard reference genome strain Black6 (C57BL/6J)) 6.The full genome sequence given
  with --strain <name> is read into memory, and the high-confidence SNP positions between Strain and Strain2
  are incorporated as full sequence and optionally as N-masking
- The resulting .fa files are ready to be indexed with your favourite aligner. Proved and tested aligners
  include Bowtie2, Tophat, STAR, Hisat2, HiCUP and Bismark. Please note that STAR and Hisat2 may require
  you to disable soft-clipping, please see the SNPsplit manual more details

Both the SNP filtering and the genome preparation write out little report files for record keeping


RELEASE NOTES FOR SNPsplit v0.2.0 (19 08 2015)
----------------------------------------------

The name of the SNP annotation file is now displayed on screen and written to
the report files.

Added new option --bisulfite. This assumes Bisulfite-Seq data processed with 
Bismark (www.bioinformatics.babraham.ac.uk/projects/bismark/) as input. In
paired-end mode (--paired), Read 1 and Read 2 of a pair are expected to follow
each other in consecutive lines. SNPsplit will run a quick check at the start
of a run to see if the file provided file appears to be a Bismark file, and
set the flags --bisulfite and/or --paired automatically. In addition it will
perform a quick check to see if a paired-end file appears to have been positionally
sorted, and if not will set the --no_sort flag.

Reads having the unmapped FLAG set in the BAM/SAM file (0x4 bit) are now
skipped and excluded from the tagging and sorting process.

Improved file renaming settings when input file was in SAM format (no longer
deletes the input files..). Also changed renaming settings to only change .bam
at the end of reads.


RELEASE NOTES FOR SNPsplit v0.1.3 (04 Nov 2014)
----------------------------------------------

SNPsplit v0.1.3 is an initial beta release and as such is still a
work in progress. All basic functions are working.


Feedback or bugs can be asked to: felix.krueger@babraham.ac.uk