We’ve recently been using the R tsne package to implement the TSNE plotting in SeqMonk, but with somewhat unexpectedly poor results, even from quite different datasets. I’ve now been made aware that there is also an Rtsne package, and that several people have reported problems with the tsne package which weren’t present in Rtnse so I wanted to test both performance and output from the two.
I have a smallish dataset (available from here) with big differneces in it so it should be really easy to separate.
setwd("C:/Users/andrewss/Desktop/Tsne_test/")
read.delim("tsne_data.txt") -> tsne.data
rownames(tsne.data) <- tsne.data$Probe
tsne.data[14:31] -> tsne.data
tsne.data
dim(tsne.data)
[1] 1510 18
PCA
As a point of comparison we’ll start with a simple PCA to see how well that does.
start.time <- proc.time()
prcomp(t(tsne.data)) -> pca.result
proc.time()-start.time
user system elapsed
0.01 0.00 0.03
We can plot this to see how well it separates.
plot(
pca.result$x[,1],
pca.result$x[,2],
pch=19, xlab="PCA PC1",
ylab="PCA PC2",
main="Separation with PCA",
col=as.factor(gsub("_.*","",colnames(tsne.data)))
)
text(pca.result$x[,1], pca.result$x[,2],gsub("_.*","",colnames(tsne.data)),pos=1, cex=0.7)

So this data separates ridiculously well (it was selected to), and works very quickly through PCA.
tsne
Now let’s try the same thing with TSNE.
We’ll try the tsne package first
library(tsne)
start.time <- proc.time()
tsne(t(tsne.data), perplexity = 5) -> tsne.result
proc.time()-start.time
user system elapsed
9.67 0.00 9.69
So for a pretty small dataset (18 samples and 1510 measures), it’s pretty slow (~300x slower than PCA).
Let’s plot the results to see if it looks sensible
plot(
tsne.result,
pch=19,
col=as.factor(gsub("_.*","",colnames(tsne.data))),
xlab="TSNE dim1",
ylab="TSNE dim2",
main="Separation with tsne package")
text(tsne.result[,1],tsne.result[,2],gsub("_.*","",colnames(tsne.data)),pos=1, cex=0.7)

Not exactly earth shattering separation.
Rtsne
Now lets try the Rtsne package
library(Rtsne)
start.time <- proc.time()
Rtsne(t(tsne.data), perplexity = 5) -> rtsne.result
proc.time()-start.time
Well that’s a whole lot faster (about 3X slower than PCA) - which seems a lot more reasonable. Let’s look at the separation
plot(
rtsne.result$Y,
col=as.factor(gsub("_.*","",colnames(tsne.data))),
pch=19,
xlab="TSNE dim1",
ylab="TSNE dim2",
main="Separation with the Rtsne package"
)
text(rtsne.result$Y[,1],rtsne.result$Y[,2],gsub("_.*","",colnames(tsne.data)),pos=1, cex=0.5)

Well that looks much more reasonable, so the slowness and poor separation in the tnse package isn’t inherent to the procedure, but are something to do with the implementation.
We’re not the only ones to spot this - there’s even a patch which fixed a very similar looking error which was submitted back in 2016 and is supposedly in v1.3 (the one I used here), but doesn’t seem to have actually fixed it (or wasn’t included).
Regardless of any bugs, the speed difference alone would mean that you should go for Rtsne over tsne, but it would be good if the tsne package could at least be fixed.
sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RColorBrewer_1.1-2 Rtsne_0.13 tsne_0.1-3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 digest_0.6.12 rprojroot_1.2 jsonlite_1.3 backports_1.0.5 magrittr_1.5 evaluate_0.10
[8] stringi_1.1.3 rmarkdown_1.4 tools_3.3.1 stringr_1.2.0 yaml_2.1.14 base64enc_0.1-3 htmltools_0.3.5
[15] knitr_1.16
LS0tDQp0aXRsZTogIlRTTkUgdGVzdCINCm91dHB1dDoNCiAgaHRtbF9kb2N1bWVudDogZGVmYXVsdA0KICBodG1sX25vdGVib29rOiBkZWZhdWx0DQotLS0NCg0KV2UndmUgcmVjZW50bHkgYmVlbiB1c2luZyB0aGUgUiB0c25lIHBhY2thZ2UgdG8gaW1wbGVtZW50IHRoZSBUU05FIHBsb3R0aW5nIGluIFtTZXFNb25rXShodHRwOi8vd3d3LmJpb2luZm9ybWF0aWNzLmJhYnJhaGFtLmFjLnVrL3Byb2plY3RzL3NlcW1vbmsvKSwgYnV0IHdpdGggc29tZXdoYXQgdW5leHBlY3RlZGx5IHBvb3IgcmVzdWx0cywgZXZlbiBmcm9tIHF1aXRlIGRpZmZlcmVudCBkYXRhc2V0cy4gSSd2ZSBub3cgYmVlbiBtYWRlIGF3YXJlIHRoYXQgdGhlcmUgaXMgYWxzbyBhbiBSdHNuZSBwYWNrYWdlLCBhbmQgdGhhdCBzZXZlcmFsIHBlb3BsZSBoYXZlIHJlcG9ydGVkIHByb2JsZW1zIHdpdGggdGhlIHRzbmUgcGFja2FnZSB3aGljaCB3ZXJlbid0IHByZXNlbnQgaW4gUnRuc2Ugc28gSSB3YW50ZWQgdG8gdGVzdCBib3RoIHBlcmZvcm1hbmNlIGFuZCBvdXRwdXQgZnJvbSB0aGUgdHdvLg0KDQpJIGhhdmUgYSBzbWFsbGlzaCBkYXRhc2V0IChhdmFpbGFibGUgZnJvbSBbaGVyZV0oaHR0cDovL3d3dy5iaW9pbmZvcm1hdGljcy5iYWJyYWhhbS5hYy51ay90c25lL3RzbmVfZGF0YS50eHQpKSB3aXRoIGJpZyBkaWZmZXJuZWNlcyBpbiBpdCBzbyBpdCBzaG91bGQgYmUgcmVhbGx5IGVhc3kgdG8gc2VwYXJhdGUuDQoNCmBgYHtyfQ0Kc2V0d2QoIkM6L1VzZXJzL2FuZHJld3NzL0Rlc2t0b3AvVHNuZV90ZXN0LyIpDQpyZWFkLmRlbGltKCJ0c25lX2RhdGEudHh0IikgLT4gdHNuZS5kYXRhDQoNCnJvd25hbWVzKHRzbmUuZGF0YSkgPC0gdHNuZS5kYXRhJFByb2JlDQp0c25lLmRhdGFbMTQ6MzFdIC0+IHRzbmUuZGF0YQ0KDQp0c25lLmRhdGENCmBgYA0KDQpgYGB7cn0NCmRpbSh0c25lLmRhdGEpDQpgYGANCg0KYGBge3IgZWNobz1GQUxTRX0NCmxpYnJhcnkoUkNvbG9yQnJld2VyKQ0KcGFsZXR0ZShicmV3ZXIucGFsKDksIlNldDEiKSkNCg0KYGBgDQoNCg0KUENBDQotLS0NCg0KQXMgYSBwb2ludCBvZiBjb21wYXJpc29uIHdlJ2xsIHN0YXJ0IHdpdGggYSBzaW1wbGUgUENBIHRvIHNlZSBob3cgd2VsbCB0aGF0IGRvZXMuDQoNCg0KYGBge3J9DQpzdGFydC50aW1lIDwtIHByb2MudGltZSgpDQpwcmNvbXAodCh0c25lLmRhdGEpKSAtPiBwY2EucmVzdWx0DQpwcm9jLnRpbWUoKS1zdGFydC50aW1lDQoNCmBgYA0KDQpXZSBjYW4gcGxvdCB0aGlzIHRvIHNlZSBob3cgd2VsbCBpdCBzZXBhcmF0ZXMuDQoNCmBgYHtyIGZpZy5oZWlnaHQ9NiwgZmlnLndpZHRoPTZ9DQoNCnBsb3QoDQogIHBjYS5yZXN1bHQkeFssMV0sIA0KICBwY2EucmVzdWx0JHhbLDJdLCANCiAgcGNoPTE5LCB4bGFiPSJQQ0EgUEMxIiwgDQogIHlsYWI9IlBDQSBQQzIiLCANCiAgbWFpbj0iU2VwYXJhdGlvbiB3aXRoIFBDQSIsDQogIGNvbD1hcy5mYWN0b3IoZ3N1YigiXy4qIiwiIixjb2xuYW1lcyh0c25lLmRhdGEpKSkNCiAgKQ0KdGV4dChwY2EucmVzdWx0JHhbLDFdLCBwY2EucmVzdWx0JHhbLDJdLGdzdWIoIl8uKiIsIiIsY29sbmFtZXModHNuZS5kYXRhKSkscG9zPTEsIGNleD0wLjcpDQoNCmBgYA0KDQpTbyB0aGlzIGRhdGEgc2VwYXJhdGVzIHJpZGljdWxvdXNseSB3ZWxsIChpdCB3YXMgc2VsZWN0ZWQgdG8pLCBhbmQgd29ya3MgdmVyeSBxdWlja2x5IHRocm91Z2ggUENBLg0KDQp0c25lDQotLS0tDQoNCk5vdyBsZXQncyB0cnkgdGhlIHNhbWUgdGhpbmcgd2l0aCBUU05FLg0KDQpXZSdsbCB0cnkgdGhlIHRzbmUgcGFja2FnZSBmaXJzdA0KDQpgYGB7ciB3YXJuaW5ncz1GQUxTRSwgbWVzc2FnZT1GQUxTRSwgZXJyb3I9RkFMU0V9DQpsaWJyYXJ5KHRzbmUpDQoNCnN0YXJ0LnRpbWUgPC0gcHJvYy50aW1lKCkNCnRzbmUodCh0c25lLmRhdGEpLCBwZXJwbGV4aXR5ID0gNSkgLT4gdHNuZS5yZXN1bHQNCnByb2MudGltZSgpLXN0YXJ0LnRpbWUNCg0KYGBgDQoNClNvIGZvciBhIHByZXR0eSBzbWFsbCBkYXRhc2V0ICgxOCBzYW1wbGVzIGFuZCAxNTEwIG1lYXN1cmVzKSwgaXQncyBwcmV0dHkgc2xvdyAofjMwMHggc2xvd2VyIHRoYW4gUENBKS4NCg0KTGV0J3MgcGxvdCB0aGUgcmVzdWx0cyB0byBzZWUgaWYgaXQgbG9va3Mgc2Vuc2libGUNCg0KYGBge3IgZmlnLmhlaWdodD02LCBmaWcud2lkdGg9Nn0NCg0KcGxvdCgNCiAgdHNuZS5yZXN1bHQsIA0KICBwY2g9MTksDQogIGNvbD1hcy5mYWN0b3IoZ3N1YigiXy4qIiwiIixjb2xuYW1lcyh0c25lLmRhdGEpKSksDQogIHhsYWI9IlRTTkUgZGltMSIsIA0KICB5bGFiPSJUU05FIGRpbTIiLCANCiAgbWFpbj0iU2VwYXJhdGlvbiB3aXRoIHRzbmUgcGFja2FnZSIpDQp0ZXh0KHRzbmUucmVzdWx0WywxXSx0c25lLnJlc3VsdFssMl0sZ3N1YigiXy4qIiwiIixjb2xuYW1lcyh0c25lLmRhdGEpKSxwb3M9MSwgY2V4PTAuNykNCg0KYGBgDQpOb3QgZXhhY3RseSBlYXJ0aCBzaGF0dGVyaW5nIHNlcGFyYXRpb24uDQoNCg0KUnRzbmUNCi0tLS0tDQoNCk5vdyBsZXRzIHRyeSB0aGUgUnRzbmUgcGFja2FnZQ0KDQpgYGB7cn0NCmxpYnJhcnkoUnRzbmUpDQoNCnN0YXJ0LnRpbWUgPC0gcHJvYy50aW1lKCkNClJ0c25lKHQodHNuZS5kYXRhKSwgcGVycGxleGl0eSA9IDUpIC0+IHJ0c25lLnJlc3VsdA0KcHJvYy50aW1lKCktc3RhcnQudGltZQ0KDQpgYGANCg0KV2VsbCB0aGF0J3MgYSB3aG9sZSBsb3QgZmFzdGVyIChhYm91dCAzWCBzbG93ZXIgdGhhbiBQQ0EpIC0gd2hpY2ggc2VlbXMgYSBsb3QgbW9yZSByZWFzb25hYmxlLiAgTGV0J3MgbG9vayBhdCB0aGUgc2VwYXJhdGlvbg0KDQpgYGB7ciBmaWcuaGVpZ2h0PTYsIGZpZy53aWR0aD02fQ0KDQpwbG90KA0KICBydHNuZS5yZXN1bHQkWSwNCiAgY29sPWFzLmZhY3Rvcihnc3ViKCJfLioiLCIiLGNvbG5hbWVzKHRzbmUuZGF0YSkpKSwNCiAgcGNoPTE5LCANCiAgeGxhYj0iVFNORSBkaW0xIiwgDQogIHlsYWI9IlRTTkUgZGltMiIsIA0KICBtYWluPSJTZXBhcmF0aW9uIHdpdGggdGhlIFJ0c25lIHBhY2thZ2UiDQogICkNCnRleHQocnRzbmUucmVzdWx0JFlbLDFdLHJ0c25lLnJlc3VsdCRZWywyXSxnc3ViKCJfLioiLCIiLGNvbG5hbWVzKHRzbmUuZGF0YSkpLHBvcz0xLCBjZXg9MC41KQ0KDQpgYGANCg0KV2VsbCB0aGF0IGxvb2tzIG11Y2ggbW9yZSByZWFzb25hYmxlLCBzbyB0aGUgc2xvd25lc3MgYW5kIHBvb3Igc2VwYXJhdGlvbiBpbiB0aGUgdG5zZSBwYWNrYWdlIGlzbid0IGluaGVyZW50IHRvIHRoZSBwcm9jZWR1cmUsIGJ1dCBhcmUgc29tZXRoaW5nIHRvIGRvIHdpdGggdGhlIGltcGxlbWVudGF0aW9uLg0KDQpXZSdyZSBub3QgdGhlIG9ubHkgb25lcyB0byBbc3BvdCB0aGlzXShodHRwczovL3d3dy5iaW9zdGFycy5vcmcvcC8yMTQ1OTEvKSAtIHRoZXJlJ3MgZXZlbiBhIFtwYXRjaF0oaHR0cHM6Ly9naXN0LmdpdGh1Yi5jb20vbWlrZWxvdmUvNzRiYmY1YzQxMDEwYWUxZGM5NDI4MWNmYWNlOTBkMzIpIHdoaWNoIGZpeGVkIGEgdmVyeSBzaW1pbGFyIGxvb2tpbmcgZXJyb3Igd2hpY2ggd2FzIHN1Ym1pdHRlZCBiYWNrIGluIDIwMTYgYW5kIGlzIHN1cHBvc2VkbHkgaW4gdjEuMyAodGhlIG9uZSBJIHVzZWQgaGVyZSksIGJ1dCBkb2Vzbid0IHNlZW0gdG8gaGF2ZSBhY3R1YWxseSBmaXhlZCBpdCAob3Igd2Fzbid0IGluY2x1ZGVkKS4NCg0KUmVnYXJkbGVzcyBvZiBhbnkgYnVncywgdGhlIHNwZWVkIGRpZmZlcmVuY2UgYWxvbmUgd291bGQgbWVhbiB0aGF0IHlvdSBzaG91bGQgZ28gZm9yIFJ0c25lIG92ZXIgdHNuZSwgYnV0IGl0IHdvdWxkIGJlIGdvb2QgaWYgdGhlIHRzbmUgcGFja2FnZSBjb3VsZCBhdCBsZWFzdCBiZSBmaXhlZC4NCg0KDQpgYGB7cn0NCnNlc3Npb25JbmZvKCkNCmBgYA0KDQo=