The Active transcription quantitation pipeline

The active transcription quantitation pipeline is very similar to the RNA-Seq quantitation pipeline except that it tries to specifically measure active transcription rather than the standing pool of mRNA. To do this it bases its counts on reads falling into introns rather than exons. All of the options for this pipeline mirror the corresponding options in the RNA-Seq pipeline.

RPKM Pipeline Options

The pipeline generates a set of probes covering every transcript in the genome, but then quantitates each of these based on the number of reads falling within the introns of those transcripts - ignoring any reads found in exons. This method of quantitation can only be achieved using this pipeline and cannot be performed using the standard probe generation and quantitation tools. Transcripts with no introns will be ignored by the pipeline.

The counts produced are by default corrected for the total number of sequences in the dataset. Also by default they are log transformed to form an easier distribution to work with. You can optionally correct for the length of the introns in the transcripts to produce RPKM values.

Options

The options you can set for this pipeline are:

  1. The feature type to use for this analysis. This will default to mRNA if an mRNA track exists in your genome, but can be changed to whichever transcript feature track is appropriate
  2. The type of library you are quantitating. Some RNA-Seq libraries are strand specific and in these cases the pipeline can ignore reads coming from the wrong strand. You can also choose between strand specific libraries which produce reads on the same strand as the feature or the opposing strand.
  3. Whether to merge the isoforms for each gene into a single measure. Using this option relies on the transcripts using the standard Ensembl notation of gene-xxx (where xxx is a number) to denote transcripts. The exons from all transcripts will be merged and a single probe will be generated over the full extent of the gene.
  4. Whether to generate raw counts. If you are using this option to generate data to put forward for statistical analysis by a count based method (eg DESeq, EdgeR etc), then you will need completely raw uncorrected counts. Selecting this option disables all correction and normalisation and produces raw read counts.
  5. Whether the results should be log2 transformed. Data analysis and visualisation of RNA-Seq data is often easier when performed on a log scale. If this option is selected then empty transcripts will be given a count of 0.9 bases (or 0.9 reads if read length correction is applied). This count is applied before read length or total read count correction is applied.
  6. Whether to correct for the length of each transcript. If this option is selected then the quantitated values are expressed per kilobase of introns. This option is only useful if you need to compare expression levels between multiple transcripts in the same sample. If you want to compare expression between different datasets then you should generally not select this option since it will cause the error profile for your data, which is generally correlated with the level of observation, to become confounded with the length of the introns - making it harder to accurately identify differentially expressed transcripts.