The TSNE Plot

The TSNE plot uses t-SNE (t-distributed stochastic neighbour embedding), a dimensionality reduction technique that makes very large datasets simple enough to view graphically. Within SeqMonk it can be used to show how data stores cluster together on the basis of their current quantitation across a large number of probes.

Conceptually the TSNE plot is similar to a PCA, but with some important differences:

  1. TSNE always produces a 2D separation, in contrast to PCA, which can produce many different components
  2. TSNE is non-deterministic, meaning you won't get exactly the same output each time you run it (though the results are likely to be similar)
  3. TSNE tends to cope better with non-linear signals in your data, so odd outliers tend to have less of an effect and the visible separation between relevant groups is often improved
  4. TSNE offers no way to reverse engineer the groups it identifies, so unlike PCA you can't make a probe list based on the separation you see
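To make the differences above more concrete, here is a minimal pure-Python sketch of the standard t-SNE algorithm, assuming each data store is one row of quantitated values across the selected probes. It is not SeqMonk's own implementation: the function name `tsne`, the toy `EXAMPLE_ROWS` data and the parameter defaults (perplexity, learning rate, iteration count, early exaggeration, momentum) are all illustrative assumptions. It shows why the output is always 2D and why, without a fixed random seed, each run gives a slightly different layout.

```python
import math
import random


def _squared_distances(rows):
    """Pairwise squared Euclidean distances between rows (one row per data store)."""
    n = len(rows)
    return [[sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])) for j in range(n)]
            for i in range(n)]


def _input_affinities(dist, perplexity):
    """Symmetrised affinities P; each row's Gaussian width is found by binary
    search so that the row's entropy matches log(perplexity)."""
    n = len(dist)
    target = math.log(perplexity)
    cond = []
    for i in range(n):
        d_min = min(dist[i][j] for j in range(n) if j != i)  # for numerical stability
        lo, hi, beta = 0.0, math.inf, 1.0  # beta = 1 / (2 * sigma^2)
        for _ in range(60):
            w = [0.0 if j == i else math.exp(-beta * (dist[i][j] - d_min)) for j in range(n)]
            total = sum(w)  # >= 1, because the nearest neighbour has weight exp(0)
            row = [x / total for x in w]
            entropy = -sum(x * math.log(x) for x in row if x > 0)
            if entropy > target:  # too flat: narrow the Gaussian
                lo = beta
                beta = beta * 2 if hi == math.inf else (lo + hi) / 2
            else:  # too peaked: widen the Gaussian
                hi = beta
                beta = (lo + hi) / 2
        cond.append(row)
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def tsne(rows, perplexity=2.0, iterations=500, learning_rate=10.0, seed=None):
    """Embed rows (data stores) in 2D. seed=None gives a different layout per run."""
    n = len(rows)
    if not 1 < perplexity < n - 1:
        raise ValueError("perplexity must lie between 1 and (number of data stores - 1)")
    p = _input_affinities(_squared_distances(rows), perplexity)
    rng = random.Random(seed)
    y = [[rng.gauss(0, 1e-4), rng.gauss(0, 1e-4)] for _ in range(n)]  # random start
    velocity = [[0.0, 0.0] for _ in range(n)]
    for it in range(iterations):
        exaggeration = 4.0 if it < 100 else 1.0  # "early exaggeration" phase
        momentum = 0.5 if it < 250 else 0.8
        # Student-t (Cauchy) kernel between points in the 2D map
        num = [[0.0 if i == j else
                1.0 / (1.0 + (y[i][0] - y[j][0]) ** 2 + (y[i][1] - y[j][1]) ** 2)
                for j in range(n)] for i in range(n)]
        z = sum(map(sum, num))
        for i in range(n):
            grad = [0.0, 0.0]
            for j in range(n):
                if i != j:
                    # KL gradient: 4 * (p_ij - q_ij) * (y_i - y_j) / (1 + |y_i - y_j|^2)
                    c = 4.0 * (exaggeration * p[i][j] - num[i][j] / z) * num[i][j]
                    grad[0] += c * (y[i][0] - y[j][0])
                    grad[1] += c * (y[i][1] - y[j][1])
            for k in (0, 1):
                velocity[i][k] = momentum * velocity[i][k] - learning_rate * grad[k]
        # Move all points only after every gradient has been computed
        for i in range(n):
            y[i][0] += velocity[i][0]
            y[i][1] += velocity[i][1]
    return [(pt[0], pt[1]) for pt in y]


# Toy input: six data stores x four probes (made-up quantitation values).
EXAMPLE_ROWS = [
    [5.0, 2.0, 7.0, 0.5], [5.2, 2.1, 7.1, 0.4], [4.9, 1.9, 7.2, 0.6],  # group A
    [1.0, 6.0, 2.0, 4.0], [1.1, 6.2, 2.1, 4.1], [0.9, 5.9, 1.8, 3.9],  # group B
]

if __name__ == "__main__":
    for run in range(2):  # unseeded: the two layouts will differ
        print([f"({x:.1f}, {y:.1f})" for x, y in tsne(EXAMPLE_ROWS)])
```

Fixing `seed` makes a run repeatable; leaving it as `None` reproduces the run-to-run variation described in point 2. The result is also only a set of coordinates with nothing that maps back to individual probes, which is the limitation described in point 4.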


The TSNE plot works on whichever data stores are currently displayed in the chromosome view and uses the currently selected probe list. TSNE becomes very resource hungry (both CPU and memory) as the number of probes increases, so we'd recommend only using the plot when your probe list contains no more than a couple of thousand probes.
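As a rough sketch of what the plot's input looks like, assuming the quantitation is held as a nested mapping of data store name to probe name to value: each displayed data store becomes one row and each probe in the selected list becomes one column. The function name `build_tsne_input`, this data layout and the `max_probes=2000` threshold (taken from the "couple of thousand" guidance above) are hypothetical and are not part of SeqMonk.

```python
import warnings


def build_tsne_input(quantitation, visible_stores, probe_list, max_probes=2000):
    """Hypothetical helper: one row per displayed data store, one column per
    probe in the selected probe list, in the order given."""
    if len(probe_list) > max_probes:
        # Mirrors the "no more than a couple of thousand probes" advice; the
        # exact threshold is an assumption, not a SeqMonk setting.
        warnings.warn(f"{len(probe_list)} probes selected; t-SNE may be slow and memory hungry")
    matrix = [[quantitation[store][probe] for probe in probe_list]
              for store in visible_stores]
    return list(visible_stores), matrix


# Example: two displayed data stores quantitated over three probes.
quant = {"WT_1": {"p1": 1.0, "p2": 2.0, "p3": 3.0},
         "KO_1": {"p1": 4.0, "p2": 5.0, "p3": 6.0}}
names, matrix = build_tsne_input(quant, ["WT_1", "KO_1"], ["p1", "p2", "p3"])
print(names, matrix)  # ['WT_1', 'KO_1'] [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

Warning rather than refusing keeps the decision with the user, matching the text's wording of a recommendation rather than a hard limit.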

With the TSNE plot you can hover your mouse over any individual point to draw its name underneath it, so you can tell which point is which. You can also tick the labels box to show all of the labels at once (though this might get a bit messy). There is also an option to highlight any replicate sets you've made in the project, so you can check whether groups of data stores which you would expect to cluster together actually do so in your data.