ACT Comparison Help


Background

With the completion of the human and mouse genomes, scientists are looking to make large-scale comparisons of syntenic regions of these two, or other, genomes.

Usually two sequences are compared by aligning them and then reading the alignment to see the level of conservation. Comparisons of large stretches of genomic sequence are not amenable to this kind of analysis for several reasons.

  1. The sequences coming out of the genome sequencing projects (often over 1 Mbase) are too large for traditional pairwise comparison programs to cope with.
  2. The increased size of sequences also means that a sequence-level alignment is usually too big to be read by eye. A more compact way of summarising the conservation between sequences is required.
  3. Genomic sequences are often littered with repetitive sequences, which are found in a highly conserved form all through the genome. These repeats confuse alignment algorithms, which will usually produce sub-optimal (or just plain wrong) alignments for sequences which contain them.
  4. Although corresponding chromosomal regions in different species may contain the same genes, they are not necessarily in the same linear order. Local rearrangements and inversions mean that a traditional sequence alignment view is not flexible enough to illustrate all of the relationships between genomic sequences.

ACT

To get around the problems described above, the most common form of comparison for large genomic sequences is a synteny plot. This shows the two sequences, one above the other, and draws blocks between the regions which are identified as being equivalent (syntenic) between the two.

Several programs have been developed to display synteny plots. One of the most common is the Artemis Comparison Tool (ACT). This is a development of Artemis, one of the most popular genome browsers.

ACT allows you to view a synteny plot between two sequences. It also displays features within the sequences, such as genes and repeats. The program allows you to zoom in and out of the sequences to whatever degree you wish. It also allows you to set a significance cutoff for the syntenic regions it displays, so you can choose how divergent the matches you are seeing are allowed to be.

ACT does not perform a comparison of your sequences; it merely displays a comparison you have previously performed. A synteny comparison can take a long time to perform, so the advantage of displaying a pre-computed comparison is that you only need to compare any two sequences once. ACT can then display the synteny plot very quickly, as often as you like.

Information about how to get ACT can be found here.

If you're interested in making this sort of comparison, but aren't sure where to start, then you could consider going on the Babraham ACT course.

Performing the ACT comparison

This program generates the comparison file which is used by ACT. It creates this file in four steps.

  1. It reads in the files you wish to compare. Since ACT only understands EMBL or GenBank format, the files are usually in one of these formats. The program converts them to FASTA format for further processing.
  2. Both sequences are masked for repeats commonly found in the species from which they originated. The species must be selected from a list when specifying the sequences to be used.
  3. The two masked sequences are compared to each other using the BLAST program.
  4. The BLAST output is processed into the format which ACT expects.
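
To make the last step more concrete, here is a minimal Python sketch of reading BLAST hits into simple records. It assumes the hits are in the standard 12-column, tab-separated BLAST tabular layout; this page does not say which intermediate or final format the program really uses, so the layout, the Hit record and the function names are illustrative assumptions only.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One match between the two sequences (hypothetical record type)."""
    query_id: str
    subject_id: str
    percent_identity: float
    length: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    bit_score: float


def parse_tabular_hits(text: str) -> List[Hit]:
    """Parse 12-column tab-separated BLAST tabular text into Hit records.

    Assumed column order: query id, subject id, % identity, alignment
    length, mismatches, gap opens, query start, query end, subject start,
    subject end, e-value, bit score.
    """
    hits = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete hit line
        hits.append(
            Hit(
                query_id=fields[0],
                subject_id=fields[1],
                percent_identity=float(fields[2]),
                length=int(fields[3]),
                q_start=int(fields[6]),
                q_end=int(fields[7]),
                s_start=int(fields[8]),
                s_end=int(fields[9]),
                bit_score=float(fields[11]),
            )
        )
    return hits


def is_inverted(hit: Hit) -> bool:
    """A hit on the reverse strand is reported with subject start > end."""
    return hit.s_start > hit.s_end
```

Each record holds the coordinates of one matching block in both sequences, which is the information a synteny plot needs. The is_inverted helper shows how an inversion between the two sequences can be recognised from the coordinates alone.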

Options

Sequences

You are required to supply two sequences for the comparison. Due to the likely sizes of such sequences there is no option to copy and paste them, but instead you must select them from a local file.

In theory the sequences can be in pretty much any format, but since these should be the same sequences you eventually use in ACT, they really should be in either EMBL or GenBank format. Please note that ACT will not read sequences in GCG format.

Organism

Because part of the analysis is masking the sequences for repeats, it is necessary to know the origin of the sequences. Select the closest match from the options available (which are the default options from the RepeatMasker program).

Output

Whilst the program is running you will be regularly updated on the progress of your job. You don't usually need to preserve this information. However, if your job dies then the log file should tell you why, and you can report any bugs to the bioinformatics group as necessary.

When the program has finished running you will be presented with three different files you can download. You also have some filtering options which are only applicable when you are downloading the comparison file.

Comparison File

The comparison file is the list of regions of synteny between your two sequences and is the only file you absolutely require to get ACT to work.

Optionally you can also apply a filter to the list of hits to restrict them on the basis of the length of match, or their percentage identity. You can interactively alter the stringency of matches displayed from within ACT, but the filter option may be useful if you know in advance that you have hard cutoff limits below which you aren't interested in any hits.
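
As an illustration of what such a filter does, here is a minimal Python sketch that keeps only hits meeting a minimum length and a minimum percentage identity. It assumes tab-separated hit lines with the percentage identity in the third column and the match length in the fourth (the BLAST tabular layout); the real comparison file may be laid out differently, and the function and parameter names are hypothetical.

```python
from typing import Iterable, List


def filter_hits(
    lines: Iterable[str],
    min_length: int = 0,
    min_identity: float = 0.0,
) -> List[str]:
    """Return the hit lines that pass both cutoffs.

    Assumes column 3 is percentage identity and column 4 is match length.
    """
    kept = []
    for line in lines:
        # Leave out blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # too few columns to be a hit line
        identity = float(fields[2])
        length = int(fields[3])
        if length >= min_length and identity >= min_identity:
            kept.append(line)
    return kept


# Example: keep only matches of at least 100 bases with 90% identity or more.
example = [
    "seqA\tseqB\t95.50\t200\t5\t1\t100\t299\t5000\t4801\t1e-50\t350",
    "seqA\tseqB\t88.00\t50\t6\t0\t10\t59\t20\t69\t1e-5\t60",
]
strong_hits = filter_hits(example, min_length=100, min_identity=90.0)
```

A hard filter like this permanently removes hits from the file, whereas the cutoff inside ACT only hides them, so it is best kept for limits you are sure about.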

GFF files

In addition to generating the comparison file, the program also allows you to download GFF (General Feature Format) files for both of your sequences. These contain information about the position and type of repeats found in the sequences. The files can be imported into ACT, where they annotate the sequence view with the positions of the repeats which were masked before the comparison.
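
For readers unfamiliar with GFF, the following Python sketch shows what a single repeat annotation line might look like. A GFF line has nine tab-separated columns (sequence name, source, feature type, start, end, score, strand, frame, attributes). The feature type "repeat_region", the source label and the note attribute used here are assumptions for illustration; the files produced by the program may use different values.

```python
def repeat_to_gff(
    seq_name: str,
    start: int,
    end: int,
    repeat_name: str,
    strand: str = "+",
    source: str = "RepeatMasker",      # assumed source label
    feature: str = "repeat_region",    # assumed feature type
) -> str:
    """Format one repeat as a nine-column, tab-separated GFF line.

    Coordinates are 1-based and inclusive, as is usual for GFF.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    fields = [
        seq_name,
        source,
        feature,
        str(start),
        str(end),
        ".",                           # no score recorded
        strand,
        ".",                           # frame does not apply to repeats
        f'note "{repeat_name}"',       # hypothetical attribute
    ]
    return "\t".join(fields)


# Example: a repeat covering bases 120 to 431 of a sequence called seqA.
gff_line = repeat_to_gff("seqA", 120, 431, "AluSx")
```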