
DOSNES

Doubly Stochastic Neighbor Embedding on Spheres

DOSNES is a new method to visualize your data.

It is based on t-SNE with two improvements: doubly stochastic normalization of the similarity graph, and embedding on a sphere.

Why Doubly Stochastic Normalization?

t-SNE is the state-of-the-art method for data visualization. However, when the similarity graph in t-SNE has highly imbalanced node degrees, t-SNE can have an undesirable effect: the high-degree nodes are crowded in the center.

For example, in a co-author graph (NIPS), some authors have many papers (Michael Jordan has 93 papers) while most have only a few. As a result, although productive authors such as Hinton and Scholkopf have never co-authored a paper, they are close together in the plot, as shown below.

t-SNE 2D
t-SNE 3D

As another example, in a world trade dataset, some countries such as the USA and Germany have far larger total imports and exports than most others. In the t-SNE visualizations, these countries are crowded in the center, as shown below.

t-SNE 2D
t-SNE 3D

A solution here is to apply doubly stochastic normalization to the similarity graph such that all the nodes have the same degree. Please see our paper below for more details.
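
One common way to obtain such a normalization is Sinkhorn-Knopp iteration, which alternately rescales rows and columns until all of them sum to one. The sketch below is only an illustration under that assumption; it is not necessarily the procedure used in the paper or in the DOSNES code, and the function name `sinkhorn_knopp` and the toy matrix are made up for this example.

```python
def sinkhorn_knopp(sim, n_iter=1000, tol=1e-9):
    """Scale a nonnegative square matrix so every row and column sums to 1.

    Assumes no row or column of `sim` is all zeros (hypothetical helper,
    not part of the DOSNES code).
    """
    n = len(sim)
    p = [row[:] for row in sim]
    for _ in range(n_iter):
        # Row step: divide each row by its sum.
        for i in range(n):
            s = sum(p[i])
            p[i] = [v / s for v in p[i]]
        # Column step: divide each column by its sum.
        col = [sum(p[i][j] for i in range(n)) for j in range(n)]
        p = [[p[i][j] / col[j] for j in range(n)] for i in range(n)]
        # Stop once the row sums are also close to 1.
        if max(abs(sum(row) - 1.0) for row in p) < tol:
            break
    # Average with the transpose to remove any residual asymmetry.
    return [[0.5 * (p[i][j] + p[j][i]) for j in range(n)] for i in range(n)]


# Toy similarity graph: node 0 is a hub with a much larger degree.
sim = [
    [0.0, 9.0, 8.0, 7.0],
    [9.0, 0.0, 1.0, 1.0],
    [8.0, 1.0, 0.0, 1.0],
    [7.0, 1.0, 1.0, 0.0],
]
p = sinkhorn_knopp(sim)
row_sums = [sum(row) for row in p]
col_sums = [sum(p[i][j] for i in range(4)) for j in range(4)]
```

In the toy graph, node 0 starts with a degree of 24 while the other nodes have degrees of 11, 10, and 9; after normalization every node has the same degree.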

Why Spherical Embedding?

Sometimes we like to view the world map on a globe. A globe can be projected onto a plane, but any such projection introduces distortion.
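
To make the distortion concrete, the following sketch compares distances on a unit sphere with distances after a simple equirectangular projection (longitude as x, latitude as y). The choice of projection, the helper names, and the sample coordinates are assumptions made for illustration; they do not come from the paper.

```python
import math


def great_circle(lat1, lon1, lat2, lon2):
    """Central angle in radians between two points on a unit sphere."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def planar(lat1, lon1, lat2, lon2):
    """Euclidean distance after an equirectangular projection (radians)."""
    return math.hypot(math.radians(lon2 - lon1), math.radians(lat2 - lat1))


# Two pairs of points, each 90 degrees of longitude apart.
equator_sphere = great_circle(0, 0, 0, 90)
equator_plane = planar(0, 0, 0, 90)
north_sphere = great_circle(60, 0, 60, 90)
north_plane = planar(60, 0, 60, 90)
```

Both pairs are equally far apart on the flat map, but on the sphere the pair at 60 degrees latitude is less than half as far apart as the pair on the equator.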

Demo

NIPS
WorldTrade
MIREX
MNIST
COIL 100
CURET

Paper

Doubly Stochastic Neighbor Embedding on Spheres

Yao Lu, Jukka Corander, Zhirong Yang
Pattern Recognition Letters, 2019

Code

https://github.com/yaolubrain/DOSNES

Questions

If you have any questions, feel free to contact us (yaolubrain@gmail.com). We would like to hear from you!