Friday, July 1, 2011

Dimensional Reduction via Random Projection, Part 2: Distributions

There is more than one random distribution, and there are some surprises in store.

Achlioptas (2001) claims that random projection does not need Gaussian-distributed values: +1/-1 chosen uniformly at random suffices. Even better, a uniform draw from [0, 0, 0, 0, sqrt(3), -sqrt(3)] also works. Since two-thirds of the matrix entries are zero, this throws away, on average, 4 out of every 6 input values.
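Here is a minimal NumPy sketch of how these projection matrices could be generated. The function names (`random_matrix`, `project`) and the 1/sqrt(k) scaling are my own assumptions for illustration, not code from the paper or from KNIME. Only the Gaussian and the two Achlioptas distributions are covered.

```python
import numpy as np

def random_matrix(kind, d, k, rng):
    """Return a d x k random projection matrix with unit-variance entries.

    `kind` names are hypothetical labels chosen for this sketch.
    """
    if kind == "gaussian":
        # Dense baseline: standard normal entries.
        return rng.standard_normal((d, k))
    if kind == "sign":
        # +1 / -1, each with probability 1/2.
        return rng.choice([-1.0, 1.0], size=(d, k))
    if kind == "sparse":
        # Uniform draw from [0, 0, 0, 0, sqrt(3), -sqrt(3)]:
        # zero with probability 2/3, +/- sqrt(3) with probability 1/6 each.
        values = np.array([0.0, 0.0, 0.0, 0.0, np.sqrt(3.0), -np.sqrt(3.0)])
        return rng.choice(values, size=(d, k))
    raise ValueError(f"unknown distribution: {kind}")

def project(X, kind, k, seed=0):
    """Project the rows of X (n x d) down to k dimensions."""
    rng = np.random.default_rng(seed)
    R = random_matrix(kind, X.shape[1], k, rng)
    # Scaling by 1/sqrt(k) is assumed here so that squared distances
    # are preserved in expectation.
    return X @ R / np.sqrt(k)
```

Something like `project(X, "sparse", 2)` would then give a 2-d version of a 200-d dataset `X` that can be plotted directly.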

This post explores applying four different distributions (Gaussian, Linear, and the two from Achlioptas) to the same random projection. To recap, here is the best I've got: a full MDS projection from 200 dimensions to 2 dimensions.

Here are four versions of the same dataset, reducing 200 dimensions to 2 dimensions via random projection:



The full MDS version is certainly the most pleasant to look at. The Gaussian and the two distributions from Achlioptas all seem to be different rotations of a cylinder. The Linear distribution is useless in this situation.

Given these results, to do a quick visualization of your data, I would try all four random distributions; you may get lucky like I did with somewhat similar rotations. I also recommend the colorizing trick; it really helps show what's going on here. After that, I would get a good dimensional reduction algorithm and do the 2-stage process:

        hi-dimensional -> RP -> low-d -> formal dimensional reduction -> 2d or 3d.
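The 2-stage process can be sketched roughly like this. This is only an illustration under my own assumptions: the name `two_stage`, the intermediate size of 20, and the use of PCA (via SVD) as a stand-in for the "formal dimensional reduction" step are all hypothetical choices, not what I ran in KNIME.

```python
import numpy as np

def two_stage(X, mid_dim=20, out_dim=2, seed=0):
    """hi-dimensional -> RP -> low-d -> PCA -> out_dim dimensions."""
    rng = np.random.default_rng(seed)

    # Stage 1: cheap random projection with +1/-1 entries.
    R = rng.choice([-1.0, 1.0], size=(X.shape[1], mid_dim))
    low = X @ R / np.sqrt(mid_dim)

    # Stage 2: PCA on the much smaller matrix (stand-in for MDS or
    # whatever formal algorithm you prefer).
    centered = low - low.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:out_dim].T
```

The point of the first stage is just to make the second stage affordable: the expensive algorithm only ever sees `mid_dim` columns instead of the original 200.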

On to part 3 for a discussion of noise.

Achlioptas, 2001
Database-friendly random projections: Johnson-Lindenstrauss with binary coins

PDF available online at various places.

I did these diagrams with the KNIME visual programming app for data mining. All Hail KNIME!
