We’ve recently been using the R tsne package to implement the TSNE plotting in SeqMonk, but with somewhat unexpectedly poor results, even from quite different datasets. I’ve now been made aware that there is also an Rtsne package, and that several people have reported problems with the tsne package which weren’t present in Rtnse so I wanted to test both performance and output from the two.

I have a smallish dataset (available from here) with big differneces in it so it should be really easy to separate.

setwd("C:/Users/andrewss/Desktop/Tsne_test/")
read.delim("tsne_data.txt") -> tsne.data
rownames(tsne.data) <- tsne.data$Probe
tsne.data[14:31] -> tsne.data
tsne.data
dim(tsne.data)
[1] 1510   18

PCA

As a point of comparison we’ll start with a simple PCA to see how well that does.

start.time <- proc.time()
prcomp(t(tsne.data)) -> pca.result
proc.time()-start.time
   user  system elapsed 
   0.01    0.00    0.03 

We can plot this to see how well it separates.

plot(
  pca.result$x[,1], 
  pca.result$x[,2], 
  pch=19, xlab="PCA PC1", 
  ylab="PCA PC2", 
  main="Separation with PCA",
  col=as.factor(gsub("_.*","",colnames(tsne.data)))
  )
text(pca.result$x[,1], pca.result$x[,2],gsub("_.*","",colnames(tsne.data)),pos=1, cex=0.7)

So this data separates ridiculously well (it was selected to), and works very quickly through PCA.

tsne

Now let’s try the same thing with TSNE.

We’ll try the tsne package first

library(tsne)
start.time <- proc.time()
tsne(t(tsne.data), perplexity = 5) -> tsne.result
proc.time()-start.time
   user  system elapsed 
   9.67    0.00    9.69 

So for a pretty small dataset (18 samples and 1510 measures), it’s pretty slow (~300x slower than PCA).

Let’s plot the results to see if it looks sensible

plot(
  tsne.result, 
  pch=19,
  col=as.factor(gsub("_.*","",colnames(tsne.data))),
  xlab="TSNE dim1", 
  ylab="TSNE dim2", 
  main="Separation with tsne package")
text(tsne.result[,1],tsne.result[,2],gsub("_.*","",colnames(tsne.data)),pos=1, cex=0.7)

Not exactly earth shattering separation.

Rtsne

Now lets try the Rtsne package

library(Rtsne)

start.time <- proc.time()
Rtsne(t(tsne.data), perplexity = 5) -> rtsne.result
proc.time()-start.time

Well that’s a whole lot faster (about 3X slower than PCA) - which seems a lot more reasonable. Let’s look at the separation

plot(
  rtsne.result$Y,
  col=as.factor(gsub("_.*","",colnames(tsne.data))),
  pch=19, 
  xlab="TSNE dim1", 
  ylab="TSNE dim2", 
  main="Separation with the Rtsne package"
  )
text(rtsne.result$Y[,1],rtsne.result$Y[,2],gsub("_.*","",colnames(tsne.data)),pos=1, cex=0.5)

Well that looks much more reasonable, so the slowness and poor separation in the tnse package isn’t inherent to the procedure, but are something to do with the implementation.

We’re not the only ones to spot this - there’s even a patch which fixed a very similar looking error which was submitted back in 2016 and is supposedly in v1.3 (the one I used here), but doesn’t seem to have actually fixed it (or wasn’t included).

Regardless of any bugs, the speed difference alone would mean that you should go for Rtsne over tsne, but it would be good if the tsne package could at least be fixed.

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RColorBrewer_1.1-2 Rtsne_0.13         tsne_0.1-3        

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10    digest_0.6.12   rprojroot_1.2   jsonlite_1.3    backports_1.0.5 magrittr_1.5    evaluate_0.10  
 [8] stringi_1.1.3   rmarkdown_1.4   tools_3.3.1     stringr_1.2.0   yaml_2.1.14     base64enc_0.1-3 htmltools_0.3.5
[15] knitr_1.16     
LS0tDQp0aXRsZTogIlRTTkUgdGVzdCINCm91dHB1dDoNCiAgaHRtbF9kb2N1bWVudDogZGVmYXVsdA0KICBodG1sX25vdGVib29rOiBkZWZhdWx0DQotLS0NCg0KV2UndmUgcmVjZW50bHkgYmVlbiB1c2luZyB0aGUgUiB0c25lIHBhY2thZ2UgdG8gaW1wbGVtZW50IHRoZSBUU05FIHBsb3R0aW5nIGluIFtTZXFNb25rXShodHRwOi8vd3d3LmJpb2luZm9ybWF0aWNzLmJhYnJhaGFtLmFjLnVrL3Byb2plY3RzL3NlcW1vbmsvKSwgYnV0IHdpdGggc29tZXdoYXQgdW5leHBlY3RlZGx5IHBvb3IgcmVzdWx0cywgZXZlbiBmcm9tIHF1aXRlIGRpZmZlcmVudCBkYXRhc2V0cy4gSSd2ZSBub3cgYmVlbiBtYWRlIGF3YXJlIHRoYXQgdGhlcmUgaXMgYWxzbyBhbiBSdHNuZSBwYWNrYWdlLCBhbmQgdGhhdCBzZXZlcmFsIHBlb3BsZSBoYXZlIHJlcG9ydGVkIHByb2JsZW1zIHdpdGggdGhlIHRzbmUgcGFja2FnZSB3aGljaCB3ZXJlbid0IHByZXNlbnQgaW4gUnRuc2Ugc28gSSB3YW50ZWQgdG8gdGVzdCBib3RoIHBlcmZvcm1hbmNlIGFuZCBvdXRwdXQgZnJvbSB0aGUgdHdvLg0KDQpJIGhhdmUgYSBzbWFsbGlzaCBkYXRhc2V0IChhdmFpbGFibGUgZnJvbSBbaGVyZV0oaHR0cDovL3d3dy5iaW9pbmZvcm1hdGljcy5iYWJyYWhhbS5hYy51ay90c25lL3RzbmVfZGF0YS50eHQpKSB3aXRoIGJpZyBkaWZmZXJuZWNlcyBpbiBpdCBzbyBpdCBzaG91bGQgYmUgcmVhbGx5IGVhc3kgdG8gc2VwYXJhdGUuDQoNCmBgYHtyfQ0Kc2V0d2QoIkM6L1VzZXJzL2FuZHJld3NzL0Rlc2t0b3AvVHNuZV90ZXN0LyIpDQpyZWFkLmRlbGltKCJ0c25lX2RhdGEudHh0IikgLT4gdHNuZS5kYXRhDQoNCnJvd25hbWVzKHRzbmUuZGF0YSkgPC0gdHNuZS5kYXRhJFByb2JlDQp0c25lLmRhdGFbMTQ6MzFdIC0+IHRzbmUuZGF0YQ0KDQp0c25lLmRhdGENCmBgYA0KDQpgYGB7cn0NCmRpbSh0c25lLmRhdGEpDQpgYGANCg0KYGBge3IgZWNobz1GQUxTRX0NCmxpYnJhcnkoUkNvbG9yQnJld2VyKQ0KcGFsZXR0ZShicmV3ZXIucGFsKDksIlNldDEiKSkNCg0KYGBgDQoNCg0KUENBDQotLS0NCg0KQXMgYSBwb2ludCBvZiBjb21wYXJpc29uIHdlJ2xsIHN0YXJ0IHdpdGggYSBzaW1wbGUgUENBIHRvIHNlZSBob3cgd2VsbCB0aGF0IGRvZXMuDQoNCg0KYGBge3J9DQpzdGFydC50aW1lIDwtIHByb2MudGltZSgpDQpwcmNvbXAodCh0c25lLmRhdGEpKSAtPiBwY2EucmVzdWx0DQpwcm9jLnRpbWUoKS1zdGFydC50aW1lDQoNCmBgYA0KDQpXZSBjYW4gcGxvdCB0aGlzIHRvIHNlZSBob3cgd2VsbCBpdCBzZXBhcmF0ZXMuDQoNCmBgYHtyIGZpZy5oZWlnaHQ9NiwgZmlnLndpZHRoPTZ9DQoNCnBsb3QoDQogIHBjYS5yZXN1bHQkeFssMV0sIA0KICBwY2EucmVzdWx0JHhbLDJdLCANCiAgcGNoPTE5LCB4bGFiPSJQQ0EgUEMxIiwgDQogIHlsYWI9IlBDQSBQQzIiLCANCiAgbWFpbj0iU2VwYXJhdGlvbiB3aXRoIFBDQSIsDQogIGNvbD1hcy5mYWN0b3IoZ3N1YigiXy4qIiwiIixjb2xuYW1lcyh0c25lLmRhdGEpKSkNCiAgKQ0KdGV4dChwY2EucmVzdWx0JHhbLDFdLCBwY2EucmVzdWx0JHhbLDJdLGdzdWIoIl8uKiIsIiIsY29sbmFtZXModHNuZS5kYXRhKSkscG9zPTEsIGNleD0wLjcpDQoNCmBgYA0KDQpTbyB0aGlzIGRhdGEgc2VwYXJhdGVzIHJpZGljdWxvdXNseSB3ZWxsIChpdCB3YXMgc2VsZWN0ZWQgdG8pLCBhbmQgd29ya3MgdmVyeSBxdWlja2x5IHRocm91Z2ggUENBLg0KDQp0c25lDQotLS0tDQoNCk5vdyBsZXQncyB0cnkgdGhlIHNhbWUgdGhpbmcgd2l0aCBUU05FLg0KDQpXZSdsbCB0cnkgdGhlIHRzbmUgcGFja2FnZSBmaXJzdA0KDQpgYGB7ciB3YXJuaW5ncz1GQUxTRSwgbWVzc2FnZT1GQUxTRSwgZXJyb3I9RkFMU0V9DQpsaWJyYXJ5KHRzbmUpDQoNCnN0YXJ0LnRpbWUgPC0gcHJvYy50aW1lKCkNCnRzbmUodCh0c25lLmRhdGEpLCBwZXJwbGV4aXR5ID0gNSkgLT4gdHNuZS5yZXN1bHQNCnByb2MudGltZSgpLXN0YXJ0LnRpbWUNCg0KYGBgDQoNClNvIGZvciBhIHByZXR0eSBzbWFsbCBkYXRhc2V0ICgxOCBzYW1wbGVzIGFuZCAxNTEwIG1lYXN1cmVzKSwgaXQncyBwcmV0dHkgc2xvdyAofjMwMHggc2xvd2VyIHRoYW4gUENBKS4NCg0KTGV0J3MgcGxvdCB0aGUgcmVzdWx0cyB0byBzZWUgaWYgaXQgbG9va3Mgc2Vuc2libGUNCg0KYGBge3IgZmlnLmhlaWdodD02LCBmaWcud2lkdGg9Nn0NCg0KcGxvdCgNCiAgdHNuZS5yZXN1bHQsIA0KICBwY2g9MTksDQogIGNvbD1hcy5mYWN0b3IoZ3N1YigiXy4qIiwiIixjb2xuYW1lcyh0c25lLmRhdGEpKSksDQogIHhsYWI9IlRTTkUgZGltMSIsIA0KICB5bGFiPSJUU05FIGRpbTIiLCANCiAgbWFpbj0iU2VwYXJhdGlvbiB3aXRoIHRzbmUgcGFja2FnZSIpDQp0ZXh0KHRzbmUucmVzdWx0WywxXSx0c25lLnJlc3VsdFssMl0sZ3N1YigiXy4qIiwiIixjb2xuYW1lcyh0c25lLmRhdGEpKSxwb3M9MSwgY2V4PTAuNykNCg0KYGBgDQpOb3QgZXhhY3RseSBlYXJ0aCBzaGF0dGVyaW5nIHNlcGFyYXRpb24uDQoNCg0KUnRzbmUNCi0tLS0tDQoNCk5vdyBsZXRzIHRyeSB0aGUgUnRzbmUgcGFja2FnZQ0KDQpgYGB7cn0NCmxpYnJhcnkoUnRzbmUpDQoNCnN0YXJ0LnRpbWUgPC0gcHJvYy50aW1lKCkNClJ0c25lKHQodHNuZS5kYXRhKSwgcGVycGxleGl0eSA9IDUpIC0+IHJ0c25lLnJlc3VsdA0KcHJvYy50aW1lKCktc3RhcnQudGltZQ0KDQpgYGANCg0KV2VsbCB0aGF0J3MgYSB3aG9sZSBsb3QgZmFzdGVyIChhYm91dCAzWCBzbG93ZXIgdGhhbiBQQ0EpIC0gd2hpY2ggc2VlbXMgYSBsb3QgbW9yZSByZWFzb25hYmxlLiAgTGV0J3MgbG9vayBhdCB0aGUgc2VwYXJhdGlvbg0KDQpgYGB7ciBmaWcuaGVpZ2h0PTYsIGZpZy53aWR0aD02fQ0KDQpwbG90KA0KICBydHNuZS5yZXN1bHQkWSwNCiAgY29sPWFzLmZhY3Rvcihnc3ViKCJfLioiLCIiLGNvbG5hbWVzKHRzbmUuZGF0YSkpKSwNCiAgcGNoPTE5LCANCiAgeGxhYj0iVFNORSBkaW0xIiwgDQogIHlsYWI9IlRTTkUgZGltMiIsIA0KICBtYWluPSJTZXBhcmF0aW9uIHdpdGggdGhlIFJ0c25lIHBhY2thZ2UiDQogICkNCnRleHQocnRzbmUucmVzdWx0JFlbLDFdLHJ0c25lLnJlc3VsdCRZWywyXSxnc3ViKCJfLioiLCIiLGNvbG5hbWVzKHRzbmUuZGF0YSkpLHBvcz0xLCBjZXg9MC41KQ0KDQpgYGANCg0KV2VsbCB0aGF0IGxvb2tzIG11Y2ggbW9yZSByZWFzb25hYmxlLCBzbyB0aGUgc2xvd25lc3MgYW5kIHBvb3Igc2VwYXJhdGlvbiBpbiB0aGUgdG5zZSBwYWNrYWdlIGlzbid0IGluaGVyZW50IHRvIHRoZSBwcm9jZWR1cmUsIGJ1dCBhcmUgc29tZXRoaW5nIHRvIGRvIHdpdGggdGhlIGltcGxlbWVudGF0aW9uLg0KDQpXZSdyZSBub3QgdGhlIG9ubHkgb25lcyB0byBbc3BvdCB0aGlzXShodHRwczovL3d3dy5iaW9zdGFycy5vcmcvcC8yMTQ1OTEvKSAtIHRoZXJlJ3MgZXZlbiBhIFtwYXRjaF0oaHR0cHM6Ly9naXN0LmdpdGh1Yi5jb20vbWlrZWxvdmUvNzRiYmY1YzQxMDEwYWUxZGM5NDI4MWNmYWNlOTBkMzIpIHdoaWNoIGZpeGVkIGEgdmVyeSBzaW1pbGFyIGxvb2tpbmcgZXJyb3Igd2hpY2ggd2FzIHN1Ym1pdHRlZCBiYWNrIGluIDIwMTYgYW5kIGlzIHN1cHBvc2VkbHkgaW4gdjEuMyAodGhlIG9uZSBJIHVzZWQgaGVyZSksIGJ1dCBkb2Vzbid0IHNlZW0gdG8gaGF2ZSBhY3R1YWxseSBmaXhlZCBpdCAob3Igd2Fzbid0IGluY2x1ZGVkKS4NCg0KUmVnYXJkbGVzcyBvZiBhbnkgYnVncywgdGhlIHNwZWVkIGRpZmZlcmVuY2UgYWxvbmUgd291bGQgbWVhbiB0aGF0IHlvdSBzaG91bGQgZ28gZm9yIFJ0c25lIG92ZXIgdHNuZSwgYnV0IGl0IHdvdWxkIGJlIGdvb2QgaWYgdGhlIHRzbmUgcGFja2FnZSBjb3VsZCBhdCBsZWFzdCBiZSBmaXhlZC4NCg0KDQpgYGB7cn0NCnNlc3Npb25JbmZvKCkNCmBgYA0KDQo=