Return to index

Data Normalisation and Annotation

Normalisation

After importing your raw data, the first step is to apply a normalisation scheme. Normalisation is simply a mathematical transformation of your data which ensures that you can accurately compare either channels on the same array or different arrays in the same experiment.

To view and change the current normalisation settings select Data > Normalisation from the main menu. You will be given a default set of normalisation options when your data is first loaded but you should check that these are appropriate for your data.

There are two different types of normalisation, per array and per probe, and there may be different options under each of these headings.

Per Array Normalisation

The purpose of per array normalisation is to adjust each channel on your arrays so that the channels are comparable both to each other and to channels on other arrays.

Per array normalisation should be the first type of normalisation applied to your array data.

Global Normalisation

The simplest and most commonly used form of per array normalisation is Global Normalisation. Measurements on arrays are quoted in intensity units. These are arbitrary units, and although the relative intensities between spots in a single channel are accurate, the overall intensity of the whole channel can be affected by a number of factors.

Global normalisation applies a constant scaling factor to every measurement on an array channel, with the factor chosen per channel so that all channels end up with the same median intensity. Because a single scaling factor is applied across a channel, the relative intensity between spots in the same channel is unchanged.
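The scaling described above can be sketched in a few lines of Python. This is only an illustration, not the program's own implementation: the function name, the channel names and the choice of target median are all assumptions made for the example.

```python
from statistics import median

def global_normalise(channels, target=None):
    """Scale each channel by one constant so all channels share a median.

    channels: dict mapping a channel name to a list of spot intensities.
    target:   the median every channel is scaled to. If omitted, the median
              of the per-channel medians is used (an assumption of this sketch).
    """
    medians = {name: median(values) for name, values in channels.items()}
    if any(m <= 0 for m in medians.values()):
        raise ValueError("every channel needs a positive median intensity")
    if target is None:
        target = median(medians.values())
    return {
        name: [v * (target / medians[name]) for v in values]
        for name, values in channels.items()
    }

# Hypothetical intensities for two channels of one array.
channels = {
    "array1_cy3": [100.0, 200.0, 400.0],
    "array1_cy5": [50.0, 100.0, 300.0],
}
normalised = global_normalise(channels, target=100.0)
```

After scaling, both channels have a median of 100, and the ratio between any two spots within a channel is the same as before.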

Global normalisation makes an assumption about your data: that for the majority of spots on your array you expect to see an experiment:control ratio of about 1. This assumption is somewhat dubious for ChIP on Chip data since, in theory, the experimental channel should contain just a small fraction of the material in the control channel. In practice, however, it seems that the experimental channel shows some signal for most spots, with only a small fraction being significantly enriched. Global normalisation therefore seems to work well with ChIP on Chip data.

Lowess Normalisation

A more advanced form of per array normalisation is Lowess normalisation. The difference between global normalisation and Lowess normalisation is that in a global normalisation the whole array is adjusted by a constant factor. In Lowess normalisation the correction factor changes depending on the intensity of the spot being corrected.

The reason a Lowess normalisation is useful has to do with the physical properties of the fluorescent dyes used on arrays. These do not have a completely linear relationship between concentration and intensity. This wouldn't matter too much, except that the dyes have slightly different intensity curves, particularly at low intensities. The upshot of all of this is that array data shows a dye bias at low intensities, which can't be corrected by global normalisation. Lowess normalisation can correct this dye bias to provide more accurate measurements at low intensities.
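To make the idea of an intensity-dependent correction concrete, here is a minimal pure-Python sketch. It assumes the common convention of fitting the log ratio M = log2(red/green) against the average log intensity A = 0.5 * log2(red * green) and subtracting the fitted trend; this convention, the function names, the `frac` parameter and the example data are assumptions of the sketch, and the program's own fitting procedure may differ. The smoother is a simplified local linear fit with tricube weights and no robustness iterations.

```python
import math

def lowess_fit(x, y, frac=0.5):
    """Locally weighted linear fit of y against x, evaluated at each x.

    frac is the fraction of points used for each local fit.
    Expects at least two points.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))
    fitted = []
    for xi in x:
        # Bandwidth: distance to the k-th nearest point (including xi itself).
        dists = sorted(abs(xj - xi) for xj in x)
        h = dists[k - 1] or 1e-12
        # Tricube weights: 1 at xi, falling to 0 at distance h and beyond.
        w = [(1 - min(abs(xj - xi) / h, 1.0) ** 3) ** 3 for xj in x]
        sw = sum(w)
        mx = sum(wi * xj for wi, xj in zip(w, x)) / sw
        my = sum(wi * yj for wi, yj in zip(w, y)) / sw
        sxx = sum(wi * (xj - mx) ** 2 for wi, xj in zip(w, x))
        sxy = sum(wi * (xj - mx) * (yj - my) for wi, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted

def lowess_normalise(red, green, frac=0.5):
    """Return log2 ratios with the intensity-dependent trend removed."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]

# Hypothetical data: no real differences, but a dye bias that grows
# as the average log intensity falls.
a_values = [6.0 + i for i in range(9)]
m_values = [0.1 * (14.0 - a) for a in a_values]
red = [2 ** (a + m / 2) for a, m in zip(a_values, m_values)]
green = [2 ** (a - m / 2) for a, m in zip(a_values, m_values)]
corrected = lowess_normalise(red, green)
```

In this artificial example the uncorrected log ratio is 0.8 at the lowest intensity and 0 at the highest, so no single scaling factor could remove the bias. After the fitted trend is subtracted, the corrected log ratios are all close to zero.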

Per Probe Normalisation

Whilst per array normalisation is probably a good idea on all data, per probe normalisation is an optional step which is useful when you want to compare two different experimental conditions more easily.

In per probe normalisation the intensity of each probe is adjusted so that the median intensity for that probe over all arrays is 1. This makes it much easier to see the relative changes between conditions rather than looking at the absolute ratio of any single condition.
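As a rough illustration of the calculation, the sketch below divides each probe's values by that probe's median across all arrays. The function name, probe names and values are hypothetical, and the sketch is not taken from the program itself.

```python
from statistics import median

def per_probe_normalise(values):
    """Rescale each probe so its median across all arrays is 1.

    values: dict mapping a probe name to a list of values, one per array,
            with the arrays in the same order for every probe.
    """
    result = {}
    for probe, row in values.items():
        m = median(row)
        if m == 0:
            raise ValueError(f"probe {probe!r} has a median of zero")
        result[probe] = [v / m for v in row]
    return result

# Hypothetical values for two probes measured on three arrays.
values = {
    "probe_a": [2.0, 4.0, 8.0],
    "probe_b": [0.5, 0.25, 1.0],
}
normalised = per_probe_normalise(values)
```

Here probe_a becomes 0.5, 1.0, 2.0 and probe_b becomes 1.0, 0.5, 2.0, so both probes are centred on 1 and their changes across arrays can be compared directly even though their original values were on different scales.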

Per probe normalisation won't make any difference to any statistical analysis you perform; it's simply a different way of representing the same data.

Grouping

After normalising your data you should then create data groups to combine replicate arrays. You can create and edit groups by going to Data > Edit groups from the main menu.

You can create as many groups as you like from your data and a group can contain as many arrays as you like. A single array can be put into more than one group if this makes sense for your data.
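One way to picture groups is as named lists of arrays, where the same array may appear in more than one list. The sketch below is purely illustrative: the array names, group names and the use of a mean to combine replicates are assumptions, and it does not describe how the program stores or combines groups internally.

```python
from statistics import mean

# Hypothetical normalised values: array name -> probe name -> value.
arrays = {
    "wt_rep1": {"probe_a": 1.0, "probe_b": 2.0},
    "wt_rep2": {"probe_a": 3.0, "probe_b": 4.0},
    "mutant_rep1": {"probe_a": 5.0, "probe_b": 6.0},
}

# Each group is a list of array names; "wt_rep1" and "wt_rep2"
# belong to both groups.
groups = {
    "wild_type": ["wt_rep1", "wt_rep2"],
    "all_arrays": ["wt_rep1", "wt_rep2", "mutant_rep1"],
}

def group_means(arrays, groups):
    """Average each probe over the arrays in each group."""
    summary = {}
    for group, members in groups.items():
        probes = arrays[members[0]].keys()
        summary[group] = {
            probe: mean(arrays[name][probe] for name in members)
            for probe in probes
        }
    return summary

summary = group_means(arrays, groups)
```

In this example the wild_type group gives 2.0 for probe_a and 3.0 for probe_b, while all_arrays gives 3.0 and 4.0.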

Many of the statistical analyses you can apply to your data can only be applied to data groups so it is important that you spend the time to group your data after importing it.
