Quality control plots

After importing and normalising your data, it can be useful to make a quality control assessment to see how consistent your data is and how well it has been normalised. ChIPMonk can draw two kinds of plot: a scatter plot and an MA plot. Each can be accessed by right-clicking (or Apple-clicking on a Mac) on either a data set or a data group in the data viewer and selecting the appropriate option from the popup menu.

In addition to the plots, ChIPMonk can construct a data set correlation tree so you can see how closely related your data sets are to each other. This is useful for checking how consistent your replicates are and for judging the relatedness of different sample groups. To create the tree, select View > DataSet Tree.

If you want to assess the quality of your data using these plots, do so before you apply any per-probe normalisation. Per-probe normalisation adjusts only the experiment channel, so it changes the relationship between the two channels and skews the plots. You should, however, apply per-array normalisation before looking at the plots.
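To illustrate why per-array normalisation is safe to apply before plotting, here is a minimal sketch of one common per-array approach, median-centring the log ratios. This is an assumption for illustration, not necessarily the method ChIPMonk uses, and the function name `median_centre_array` is hypothetical. Because a single factor is applied to every probe on the array, the spread of the points is unchanged and only their offset from the 45 degree line moves.

```python
import math
import statistics

def median_centre_array(experiment, control):
    """Hypothetical per-array normalisation (not necessarily ChIPMonk's).

    Scales every experiment-channel intensity on one array by the same
    factor, chosen so that the median log2(experiment / control) ratio
    becomes 0. Probes with a non-positive intensity in either channel are
    ignored when choosing the factor (their log ratio is undefined).
    Raises statistics.StatisticsError if no probe is usable.
    """
    ratios = [math.log2(e / c) for e, c in zip(experiment, control) if e > 0 and c > 0]
    offset = statistics.median(ratios)
    factor = 2 ** -offset  # one factor for the whole array
    return [e * factor for e in experiment], list(control)
```

For example, `median_centre_array([200, 400, 800], [100, 100, 100])` has log ratios 1, 2 and 3; it divides each experiment value by 4, giving log ratios of -1, 0 and 1. The spacing between probes is preserved, which is what a per-probe adjustment would not do.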

Scatter Plot

A scatter plot shows, on a log scale, the intensity in the experiment channel of an array against the intensity in the control channel. It should show a broad swathe of points running at a 45 degree angle from the bottom left of the plot to the top right.
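The coordinates behind such a plot can be sketched as follows. This assumes log10 axes with the control channel on x and the experiment channel on y; the documentation does not specify the log base or axis orientation, and `scatter_points` is a hypothetical helper rather than part of ChIPMonk.

```python
import math

def scatter_points(experiment, control):
    """Return (log10 control, log10 experiment) pairs for one array.

    Probes with a non-positive intensity in either channel are skipped,
    because their logarithm is undefined. Probes with equal intensity in
    both channels fall exactly on the 45 degree line x == y.
    """
    return [
        (math.log10(c), math.log10(e))
        for e, c in zip(experiment, control)
        if e > 0 and c > 0
    ]
```

The resulting pairs could then be drawn with any plotting library, for example with `matplotlib.pyplot.scatter(*zip(*points))`.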

A properly normalised array should show the main body of points sitting over the 45 degree line drawn on the plot. The body of points shouldn't be too wide and should usually taper in at higher intensities (though scattered points at higher intensity are OK). There should be only one main body of points, and it should be straight (possibly with a slight curve at the low end of the intensity scale).
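The visual criteria above can be approximated numerically. The sketch below is a hypothetical helper, not a ChIPMonk feature: it measures each probe's vertical distance from the 45 degree line in log2 units, splits the probes into intensity bins by control-channel intensity, and reports the median and spread of that distance in each bin. No thresholds are given in the documentation, so the numbers are for you to judge rather than pass or fail.

```python
import math
import statistics

def diagonal_summary(experiment, control, n_bins=2):
    """Summarise how the cloud of points sits relative to the 45 degree line.

    Each probe's vertical distance from the diagonal (log2 units) is
    log2(experiment) - log2(control). Probes are sorted by control intensity
    and split into n_bins groups of roughly equal size. For each group this
    returns (median distance, population standard deviation): a median near 0
    means the body sits over the diagonal, and a standard deviation that
    shrinks in the higher bins means the body tapers in.
    """
    points = sorted(
        (math.log2(c), math.log2(e) - math.log2(c))
        for e, c in zip(experiment, control)
        if e > 0 and c > 0  # log of a zero or negative intensity is undefined
    )
    if len(points) < n_bins:
        raise ValueError("need at least one usable probe per bin")
    size = len(points) // n_bins
    summary = []
    for i in range(n_bins):
        # The last bin absorbs any leftover probes.
        end = (i + 1) * size if i < n_bins - 1 else len(points)
        distances = [d for _, d in points[i * size:end]]
        summary.append((statistics.median(distances), statistics.pstdev(distances)))
    return summary
```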

MA Plot

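The MA plot is quite similar to the scatter plot, except that the data is transformed so that the 45 degree line of the scatter plot becomes the horizontal axis. In a standard MA plot, each probe's log ratio between the two channels (M) is plotted against its average log intensity (A). Again, the main body of points should be roughly symmetrical around the x-axis and should not spread too far from it.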
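A minimal sketch of the standard MA transformation, using log2 as is conventional for MA plots; whether ChIPMonk uses exactly this base and formula is an assumption. The helper names `ma_values` and `fraction_above_axis` are hypothetical. The second function gives a rough numerical check of symmetry: for a well-normalised array, about half the probes should lie above the axis.

```python
import math

def ma_values(experiment, control):
    """Return (A, M) pairs for one array, skipping non-positive intensities.

    M = log2(experiment) - log2(control)        (log ratio, vertical axis)
    A = (log2(experiment) + log2(control)) / 2  (mean log intensity, horizontal axis)
    A probe with equal intensity in both channels has M = 0, i.e. it lies
    on the horizontal axis.
    """
    pairs = []
    for e, c in zip(experiment, control):
        if e > 0 and c > 0:
            le, lc = math.log2(e), math.log2(c)
            pairs.append(((le + lc) / 2, le - lc))
    return pairs

def fraction_above_axis(pairs):
    """Fraction of probes with M > 0; roughly 0.5 for a symmetrical body."""
    return sum(1 for _, m in pairs if m > 0) / len(pairs)
```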

DataSet Tree

The data set tree is a graphical representation of how closely related all of your data sets are. It is built from Pearson's correlation calculated for every pair of data sets. The correlation is calculated using the currently active probe set, so you might want to do some basic filtering before calculating the tree. You shouldn't apply any statistical filters, however: these select probes because they differ between your samples, so they will skew the results of the tree.
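The pairwise calculation can be sketched as below. This assumes each data set is a mapping from probe ID to a single per-probe value (for example a normalised log ratio) and that the active probe set is a list of probe IDs; the data layout and the names `pearson` and `correlation_matrix` are hypothetical. Python 3.10+ also offers `statistics.correlation`, but the formula is written out here so it runs on older versions.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(datasets, active_probes):
    """Correlate every pair of data sets over the active probes only.

    datasets:      dict of data set name -> dict of probe id -> value
    active_probes: list of probe ids in the currently active probe set
    Returns a dict keyed by both orderings (a, b) and (b, a).
    """
    matrix = {}
    for a, b in combinations(sorted(datasets), 2):
        xs = [datasets[a][p] for p in active_probes]
        ys = [datasets[b][p] for p in active_probes]
        matrix[(a, b)] = matrix[(b, a)] = pearson(xs, ys)
    return matrix
```

Changing `active_probes` changes every correlation, which is why the choice of filter matters.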

The distance between two data sets on the tree reflects the total length of the horizontal path you have to travel to move from one to the other (going both up and down the branches).
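The documentation does not say which clustering method ChIPMonk uses to turn the correlations into a tree. As an illustration only, the sketch below uses average-linkage (UPGMA) clustering on the distance 1 - r, a common choice for this kind of tree; the function name `build_tree` and the nested-tuple output are hypothetical. The correlations are assumed to be a dict keyed by both orderings of each pair of data set names.

```python
def build_tree(names, corr):
    """Cluster data sets by average linkage (UPGMA) on distance 1 - r.

    names: list of data set names
    corr:  dict mapping (a, b) to Pearson r, for both orderings of each pair
    Returns nested tuples (left, right, height), where height is half the
    average 1 - r distance between the two clusters being joined. Walking
    from a leaf up to a join and back down covers 2 * height, so strongly
    correlated sets end up separated by short horizontal paths.
    """
    # Each cluster id maps to (subtree, list of member data set names).
    clusters = {i: (name, [name]) for i, name in enumerate(names)}
    next_id = len(names)

    def average_distance(members_a, members_b):
        total = sum(1 - corr[(a, b)] for a in members_a for b in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        i, j = min(
            ((i, j) for i in clusters for j in clusters if i < j),
            key=lambda pair: average_distance(clusters[pair[0]][1], clusters[pair[1]][1]),
        )
        tree_i, members_i = clusters.pop(i)
        tree_j, members_j = clusters.pop(j)
        height = average_distance(members_i, members_j) / 2
        clusters[next_id] = ((tree_i, tree_j, height), members_i + members_j)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

With two replicates `A1` and `A2` correlated at 0.9 and a third set `B1` correlated at 0.2 and 0.3 with them, the replicates are joined first, at height 0.05, and `B1` joins that pair at height 0.375.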

Normally you would expect that replicates of the same sample would cluster closely together. If this is not the case you should examine the plots of any outliers to assess whether the data may be of low quality, or incorrectly annotated.
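Candidates for closer inspection can also be picked out directly from the correlations. The sketch below is a simple heuristic of our own, not a ChIPMonk feature: it flags any data set whose most highly correlated partner is not one of its own replicates. The function name `flag_outliers` and the `sample_of` annotation are hypothetical, and the check is only meaningful when every sample has at least two replicates.

```python
def flag_outliers(corr, sample_of):
    """Return data sets whose best-correlated partner is not a replicate.

    corr:      dict mapping (a, b) to Pearson r, for both orderings of each pair
    sample_of: dict of data set name -> sample label (hypothetical annotation)
    """
    names = sorted(sample_of)
    flagged = []
    for name in names:
        others = [o for o in names if o != name]
        best = max(others, key=lambda o: corr[(name, o)])
        if sample_of[best] != sample_of[name]:
            flagged.append(name)
    return flagged
```

A flagged data set is not necessarily bad; it is simply the first place to look at the scatter and MA plots, or to double-check the annotation.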
