
Filters

Filters are the means by which you can reduce your full list of probes to just the subset in which you are interested. They create probe lists which appear in your data view.

All filters start by using your currently selected probe list and will create a new list which is a subset of the original list.

Simple Filters

Filter on values

The values filter allows you to set limits on the minimum and maximum values for the normalised ratio (expt/control), raw signal, raw control, normalised signal or normalised control. You can select any number of data sets or groups and choose how many of those sets/groups the probe needs to pass the test in to be included in the output.

You can run the values filter against either each probe individually, or take a windowed approach where probes are grouped together over a distance you specify and the average value is used for the filter.
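The per-probe form of this logic can be sketched as below. This is only an illustration, not ChipMonk's own code: the function name `values_filter`, its parameters and the dictionary layout (probe name mapped to one value per selected data store) are all assumptions made for the example.

```python
def values_filter(probe_values, lower=None, upper=None, min_stores=1):
    """Keep probes whose value is within [lower, upper] in at least min_stores stores.

    probe_values is a hypothetical layout: probe name -> list of values,
    one value per selected data set or group.
    """
    kept = []
    for probe, values in probe_values.items():
        # Count the stores in which this probe passes both limits.
        passes = sum(
            1
            for v in values
            if (lower is None or v >= lower) and (upper is None or v <= upper)
        )
        if passes >= min_stores:
            kept.append(probe)
    return kept


probes = {
    "probe_a": [2.5, 3.1, 0.9],
    "probe_b": [0.8, 1.1, 1.0],
    "probe_c": [2.2, 0.7, 2.9],
}

# Require a value of at least 2.0 in at least 2 of the 3 stores.
selected = values_filter(probes, lower=2.0, min_stores=2)
```

Here `probe_a` and `probe_c` each pass in two stores and are kept, while `probe_b` passes in none.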

Filter on differences

The differences filter is very similar to the values filter except that it uses the magnitude of the difference between data stores. You can select more than 2 data stores to work with, in which case every pairwise combination of stores is considered. You can choose to use the maximum, minimum or average difference from all the comparisons made.

Differences are always expressed as magnitudes (i.e. they are always positive) since the selected stores have no intrinsic order, so differences of +2 and -2 would both be treated as +2. When setting limits on the values you want to keep you must therefore always use positive values.

When filtering using the ratio value, the differences filter insists that you use the log2 ratio rather than the raw ratio. This is because unless you log transform a ratio you do not get even treatment of ratios above and below 1: a 2-fold increase has a raw ratio of 2 and a 2-fold decrease has a raw ratio of 0.5, whereas in log2 these values are +1 and -1.
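As a rough sketch of the calculation for a single probe, the pairwise magnitudes and the effect of the log2 transform might look like this. The function name `pairwise_difference` and the `mode` strings are hypothetical and chosen only for the example.

```python
import math
from itertools import combinations


def pairwise_difference(values, mode="max"):
    """Summarise the absolute differences between every pair of stores."""
    diffs = [abs(a - b) for a, b in combinations(values, 2)]
    if mode == "max":
        return max(diffs)
    if mode == "min":
        return min(diffs)
    if mode == "average":
        return sum(diffs) / len(diffs)
    raise ValueError(f"unknown mode: {mode}")


# A 2-fold increase and a 2-fold decrease are symmetric only after log2.
log_up = math.log2(2.0)
log_down = math.log2(0.5)

# One probe measured in three stores, as raw ratios and then as log2 ratios.
raw_ratios = [1.0, 2.0, 4.0]
log_ratios = [math.log2(r) for r in raw_ratios]

largest = pairwise_difference(log_ratios, "max")
smallest = pairwise_difference(log_ratios, "min")
mean_diff = pairwise_difference(log_ratios, "average")
```

Because only magnitudes are kept, the order in which the stores are listed makes no difference to the result.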

Filter by Position

The positional filter allows you to pick a region of a chromosome and to pull out only probes which lie in that region.
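A minimal sketch of such a filter is shown below. The `Probe` tuple and the `position_filter` function are hypothetical, and the sketch assumes a probe must lie entirely inside the region to be kept; whether partially overlapping probes are included is not stated here.

```python
from typing import NamedTuple


class Probe(NamedTuple):
    name: str
    chromosome: str
    start: int
    end: int


def position_filter(probes, chromosome, region_start, region_end):
    """Keep probes on the given chromosome that lie wholly inside the region."""
    return [
        p
        for p in probes
        if p.chromosome == chromosome
        and p.start >= region_start
        and p.end <= region_end
    ]


probes = [
    Probe("p1", "chr1", 1200, 1250),
    Probe("p2", "chr1", 4900, 5020),  # runs past the end of the region
    Probe("p3", "chr2", 1300, 1350),  # wrong chromosome
]

in_region = position_filter(probes, "chr1", 1000, 5000)
names = [p.name for p in in_region]
```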

Filter by Features

The features filter allows you to select probes which have a defined position relative to a class of features, for example selecting probes which are within the first 1000bp upstream of an mRNA feature, or selecting probes which are within a CDS feature.
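The upstream case can be sketched as follows. Everything named here (`Feature`, `upstream_window`, `probes_upstream`, and the representation of a probe as a name, chromosome and single position) is an assumption made for illustration. The sketch also assumes that "upstream" depends on the strand of the feature.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    chromosome: str
    start: int
    end: int
    strand: str  # "+" or "-"


def upstream_window(feature, distance=1000):
    """Return the (low, high) coordinates of the region upstream of a feature."""
    if feature.strand == "+":
        return max(1, feature.start - distance), feature.start - 1
    # On the reverse strand, upstream lies beyond the feature's end coordinate.
    return feature.end + 1, feature.end + distance


def probes_upstream(probes, features, distance=1000):
    """Keep probes whose position falls in the upstream window of any feature."""
    kept = []
    for name, chromosome, position in probes:
        for feature in features:
            low, high = upstream_window(feature, distance)
            if chromosome == feature.chromosome and low <= position <= high:
                kept.append(name)
                break  # one matching feature is enough
    return kept


features = [
    Feature("chr1", 5000, 8000, "+"),
    Feature("chr1", 20000, 22000, "-"),
]
probes = [
    ("p1", "chr1", 4500),   # upstream of the forward-strand feature
    ("p2", "chr1", 6000),   # inside a feature, not upstream of it
    ("p3", "chr1", 22500),  # upstream of the reverse-strand feature
    ("p4", "chr2", 4500),   # different chromosome
]

upstream = probes_upstream(probes, features, distance=1000)
```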

Combine Existing Lists

The combine filter allows you to select two probe lists from your existing set and choose what sort of relationship (AND / OR / BUTNOT) to enforce between them. This filter allows you to increase the power of the other filters by combining their results.
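The three relationships correspond to set intersection, union and difference. A short sketch, using a hypothetical `combine_lists` function and probe lists represented as lists of names:

```python
def combine_lists(first, second, relationship):
    """Combine two probe lists, preserving the order of first appearance."""
    first_set = set(first)
    second_set = set(second)
    if relationship == "AND":
        # Probes present in both lists.
        return [p for p in first if p in second_set]
    if relationship == "OR":
        # Probes present in either list, without duplicates.
        return list(first) + [p for p in second if p not in first_set]
    if relationship == "BUTNOT":
        # Probes in the first list which are absent from the second.
        return [p for p in first if p not in second_set]
    raise ValueError(f"unknown relationship: {relationship}")


high_ratio = ["p1", "p2", "p3"]
in_promoter = ["p2", "p3", "p4"]

both = combine_lists(high_ratio, in_promoter, "AND")
either = combine_lists(high_ratio, in_promoter, "OR")
only_first = combine_lists(high_ratio, in_promoter, "BUTNOT")
```

Note that BUTNOT is the only relationship for which the order of the two lists matters.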

Statistical Tests

Multiple Testing Correction

Before discussing the various statistical tests available within ChipMonk it is important to raise the issue of multiple testing correction.

The result of most statistical tests is a p-value. This is a probability, in the range 0-1, that a given observation could have occurred by chance. A p-value of 0.2 would mean that there was a 1 in 5 chance of the observation occurring by chance.

However, p-values are only really accurate when only a single test is performed. A p-value of 0.05 is generally considered significant when only a single test is performed. Were you to perform 100 tests though then you would expect to see 5 observations with p<=0.05 just on random data.

In array analysis you are often performing a large number of tests (one for each probe), so a simple p-value cutoff is a poor indicator of significance.
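The scale of the problem can be shown with some simple arithmetic. In this sketch the function names are hypothetical, the figure of 40,000 probes is an arbitrary illustration, and the second function assumes the tests are independent of each other.

```python
def expected_false_positives(n_tests, cutoff):
    """Number of tests expected to pass the cutoff on purely random data."""
    return n_tests * cutoff


def chance_of_any_false_positive(n_tests, cutoff):
    """Probability that at least one test passes by chance (independent tests)."""
    return 1 - (1 - cutoff) ** n_tests


# The 100-test case described in the text.
per_hundred = expected_false_positives(100, 0.05)

# An illustrative array with 40,000 probes, one test per probe.
per_array = expected_false_positives(40000, 0.05)

# With 100 tests, seeing at least one chance "hit" is almost certain.
any_hit = chance_of_any_false_positive(100, 0.05)
```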

To make p-values more relevant when performing large numbers of tests it is possible to correct the p-value to take into account the number of tests performed. There are a number of different algorithms which can do this, but the one implemented in ChipMonk is the Benjamini and Hochberg False Discovery Rate correction. This corrects the p-value so that it has a slightly different meaning.

A standard p-value is the likelihood of a single observation occurring by chance. After a false discovery rate correction, the p-value cutoff you set is instead the expected proportion of the probes identified as changing which are false positives. So a cutoff of 0.05 after correction means that about 5% of the identified probes are expected to be false positives, and about 95% are expected to be really changing.
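The Benjamini and Hochberg procedure itself is short enough to sketch. The version below is a generic implementation of the standard method, not ChipMonk's own code, and the function name `benjamini_hochberg` is chosen only for this example. Each p-value is multiplied by the number of tests divided by its rank, and the results are then forced to be monotonic.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg corrected p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that each corrected value
    # is never larger than the one ranked above it.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
corrected = benjamini_hochberg(raw)

# Probes passing a corrected cutoff of 0.05.
significant = [i for i, p in enumerate(corrected) if p <= 0.05]
```

The corrected values are never smaller than the raw ones, so a probe which fails a cutoff before correction will also fail it afterwards.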

Individual probes

The basic statistical filter can only be used on data groups containing at least 3 data sets. It treats each probe independently and calculates a p-value to assess whether it is changing. If only one group is selected then a 1-factor (one-sample) t-test tests whether each probe's ratio deviates significantly from 1. If two groups are selected then a 2-factor (two-sample) t-test tests whether a probe is significantly different between the groups. If more than two groups are selected then an ANOVA test is used to determine whether there is any significant variation between the individual group means.
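The choice of test can be sketched as a simple dispatch on the number of groups. The function names below are hypothetical. Only the single-group case is worked through, and only as far as the t statistic; turning that into a p-value needs the t distribution, which is left out to keep the sketch within the standard library.

```python
import math
import statistics

MIN_SETS_PER_GROUP = 3  # requirement stated in the text


def choose_test(groups):
    """Pick the test for one probe from the number of selected groups."""
    if not groups:
        raise ValueError("at least one group is required")
    for group in groups:
        if len(group) < MIN_SETS_PER_GROUP:
            raise ValueError("each group needs at least 3 data sets")
    if len(groups) == 1:
        return "t-test against a ratio of 1"
    if len(groups) == 2:
        return "t-test between two groups"
    return "ANOVA across all groups"


def one_sample_t(values, expected=1.0):
    """t statistic for whether the mean of values differs from expected."""
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - expected) / standard_error


# One probe's ratios in a group of three data sets.
ratios = [1.8, 2.1, 2.4]
test_name = choose_test([ratios])
t_value = one_sample_t(ratios)
```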

Windowed Mean

The windowed mean statistical filter is similar to the individual probes filter in that it operates only on data groups containing at least 3 data sets. However, in this test, instead of considering each probe independently you define a window of a fixed size. This window is then slid over the genome and at each point the probes found within it are averaged, and it is this average which is compared.

By combining the probes in this way a more consistent effect over the size of the window is required to achieve significance. It is therefore a more stringent and relevant test.

A sensible size for the window value would correspond to the minimum size of the binding effect you expect to see in your data plus about half of the average size of the fragments you're generating in your experiment.
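The window size rule and the averaging step might be sketched as below for a single data set on one chromosome. The function names are hypothetical, and the sketch assumes a window is started at each probe position in turn; the exact way the window is stepped along the genome is not specified here.

```python
def suggested_window_size(min_binding_size, mean_fragment_size):
    """Minimum expected binding size plus half the average fragment size."""
    return min_binding_size + mean_fragment_size // 2


def windowed_means(positions, values, window_size):
    """Average the probes in a window starting at each probe position.

    positions must be sorted in ascending order. Returns a list of
    (window_start, mean_value, probe_count) tuples.
    """
    results = []
    for i, start in enumerate(positions):
        in_window = [
            v
            for p, v in zip(positions[i:], values[i:])
            if p < start + window_size
        ]
        results.append((start, sum(in_window) / len(in_window), len(in_window)))
    return results


# Illustrative numbers: 300bp binding effect, 500bp average fragments.
window_size = suggested_window_size(300, 500)

positions = [100, 300, 450, 1200]
values = [1.0, 2.0, 3.0, 4.0]
averages = windowed_means(positions, values, 500)
```

In the full filter the same averaging would be done in every data set of a group, and the per-set averages for each window would then be passed to the statistical test.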

Windowed Replicate

The windowed replicate filter takes a slightly different approach to the analysis of probes in a window. Instead of treating the individual probes as a single measure and averaging them, it treats them as technical replicates of the same measurement and uses the individual probe measures in the subsequent statistical test. This means that this test can be used on single data sets as well as data groups.

In summary the windowed mean filter averages the probes within the window of each individual data set, and uses the multiple data sets in a group to perform a statistical test. The replicate filter averages the intensities for each individual probe across the sets in a group (or just uses a single set) and uses the individual measures within a window in the statistical test.
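The difference between the two filters comes down to which direction the data in a window is averaged. In this sketch the window is a small table with one row per data set and one column per probe; the function names and the numbers are made up for illustration.

```python
import statistics

# Values for the probes in one window: one row per data set,
# one column per probe.
window = [
    [1.9, 2.3, 2.1, 2.5],  # data set 1
    [2.0, 2.2, 2.4, 2.6],  # data set 2
    [1.8, 2.1, 2.0, 2.3],  # data set 3
]


def windowed_mean_samples(window):
    """Average across probes: one value per data set goes into the test."""
    return [statistics.fmean(row) for row in window]


def windowed_replicate_samples(window):
    """Average across data sets: one value per probe goes into the test."""
    return [statistics.fmean(column) for column in zip(*window)]


mean_samples = windowed_mean_samples(window)
replicate_samples = windowed_replicate_samples(window)

# With a single data set the replicate approach still yields one
# value per probe, which is why it does not need a data group.
single_set_samples = windowed_replicate_samples([window[0]])
```

With three data sets and four probes, the windowed mean approach tests three values while the replicate approach tests four.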

Our experience has been that combining the lists of probes seen as significant in both the windowed mean and windowed replicate t-tests produces a robust list of probes which are actually changing. If this approach is taken then the same window settings should be used for the two tests, and no other filters should be applied before running the windowed tests, although others could be applied afterwards.
