DOSNES (Doubly Stochastic Neighbor Embedding on Spheres) is a new method for visualizing data.

It is based on t-SNE with two improvements:

- Doubly Stochastic Normalization
- Spherical Embedding

## Why Doubly Stochastic Normalization?

t-SNE is a state-of-the-art method for data visualization. However, when the similarity graph in t-SNE has highly imbalanced node degrees (the degree of a node being the sum of its similarities to all other nodes), t-SNE can produce an undesirable effect: high-degree nodes are crowded in the center of the plot.

For example, in the NIPS co-author graph, a few authors have many papers (Michael Jordan has 93) while most have only a few. As a result, prolific authors such as Hinton and Scholkopf are placed close together in the t-SNE plot, as shown below, even though they have never co-authored a paper.

As another example, in a world trade dataset, countries such as the USA and Germany have far larger total imports and exports than most others. In the t-SNE visualizations, these countries are crowded in the center, as shown below.

A solution is to apply doubly stochastic normalization to the similarity matrix, so that every row and every column sums to one and therefore all nodes have the same degree. Please see our paper below for more details.
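The normalization step can be sketched in a few lines. This is a minimal illustration, not the DOSNES implementation: it assumes a symmetric, nonnegative similarity matrix with zero diagonal and positive off-diagonal entries, and it uses the classic Sinkhorn-Knopp iteration (alternately rescaling rows and columns) followed by symmetrization. The helper name `doubly_stochastic` and its parameters `max_iter` and `tol` are hypothetical.

```python
import numpy as np


def doubly_stochastic(W, max_iter=1000, tol=1e-10):
    """Rescale a symmetric nonnegative similarity matrix so that every row
    and column sums to 1 (Sinkhorn-Knopp), then re-symmetrize.

    Hypothetical helper for illustration; not the DOSNES reference code.
    """
    P = np.array(W, dtype=float)  # work on a copy
    for _ in range(max_iter):
        P /= P.sum(axis=1, keepdims=True)  # make every row sum to 1
        P /= P.sum(axis=0, keepdims=True)  # make every column sum to 1
        if np.abs(P.sum(axis=1) - 1.0).max() < tol:
            break
    # The average of a doubly stochastic matrix and its transpose is still
    # doubly stochastic; this removes any small numerical asymmetry.
    return (P + P.T) / 2


# Toy similarity graph: node 0 is a "hub" whose similarities are 10x larger.
W = np.ones((6, 6))
W[0, :] *= 10
W[:, 0] *= 10
np.fill_diagonal(W, 0.0)

print("degrees before:", W.sum(axis=1))  # hub degree 50, others 14
P = doubly_stochastic(W)
print("degrees after: ", P.sum(axis=1).round(6))  # all equal to 1
```

In this toy case the hub's advantage disappears entirely: every off-diagonal entry of `P` converges to 0.2, so node 0 becomes indistinguishable from the others. Because every row of `P` sums to 1, its entries sum to n; dividing by n gives a joint distribution of the kind t-SNE expects.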

## Why Spherical Embedding?

Sometimes we prefer to view the world map on a globe: although a globe can be projected onto a plane, any such projection introduces distortion.

- Roughly speaking, the embedding of a doubly stochastic similarity matrix is very likely to be spherical. Please see our paper below for the proofs.
- The surface of a sphere has no center or boundary, which eliminates center and boundary effects in the visualization.
- Spherical embeddings can be shown directly on spherical screens.
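To illustrate the spherical constraint, here is a minimal sketch of one way to place points on the unit sphere: plain gradient descent on the t-SNE cost (Student-t kernel) in three dimensions, with every point projected back onto the unit sphere after each step. This projected-gradient scheme, the function name `spherical_sne`, and its parameters (`n_iter`, `lr`, `seed`) are illustrative assumptions, not the exact DOSNES optimizer. The input `P` is assumed to be a symmetric, nonnegative matrix with zero diagonal whose entries sum to 1, for example a doubly stochastic matrix divided by the number of points.

```python
import numpy as np


def spherical_sne(P, n_iter=500, lr=None, seed=0):
    """Embed n points on the unit sphere in R^3 by projected gradient descent
    on the t-SNE cost KL(P || Q) with a Student-t kernel.

    Illustrative sketch (hypothetical helper); not the DOSNES reference code.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    if lr is None:
        # Heuristic: rows of P sum to about 1/n, so gradients scale like 1/n.
        lr = 0.5 * n
    rng = np.random.default_rng(seed)

    # Random start, projected onto the unit sphere.
    Y = rng.standard_normal((n, 3))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)

    for _ in range(n_iter):
        # Pairwise squared Euclidean (chord) distances between points.
        sq = np.sum(Y ** 2, axis=1)
        D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)

        # Student-t similarities, excluding self-pairs, normalized to Q.
        Wt = 1.0 / (1.0 + D2)
        np.fill_diagonal(Wt, 0.0)
        Q = Wt / Wt.sum()

        # t-SNE gradient: 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j).
        M = (P - Q) * Wt
        grad = 4.0 * (M.sum(axis=1, keepdims=True) * Y - M @ Y)

        # Gradient step, then project every point back onto the sphere.
        Y -= lr * grad
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    return Y


# Toy input: random symmetric affinities with zero diagonal, summing to 1.
rng = np.random.default_rng(1)
A = rng.random((20, 20))
A = (A + A.T) / 2
np.fill_diagonal(A, 0.0)
P = A / A.sum()

Y = spherical_sne(P, n_iter=200)
print(np.linalg.norm(Y, axis=1).round(6))  # every point has unit norm
```

Projecting after each step is the simplest way to keep the points on the sphere; removing the radial component of the gradient before stepping (a Riemannian gradient) is a common alternative.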

## Paper

Doubly Stochastic Neighbor Embedding on Spheres

Yao Lu\*, Zhirong Yang\*, Jukka Corander

(*equal contribution)

## Code

https://github.com/yaolubrain/DOSNES

## Questions

If you have any questions, feel free to contact us (yaolubrain@gmail.com). We would like to hear from you!