Friday, July 1, 2011

Dimensional Reduction via Random Projection, Part 3: Noise

Now for a third axis of investigation: distances.

Random Projection approximately preserves pairwise distances; this is its claim to fame. To measure this, I computed the matrix of pairwise distances before and after "full RP": 200-d -> RP -> 2-d. Here are the spreads of distances. Each color is one vector's distances to the other vectors; the X axis is the 200-d distance and the Y axis is the 2-d distance.
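For anyone who wants to reproduce the before/after measurement outside a visual tool, here is a minimal NumPy sketch. It is not the workflow used for the plots: the data is a uniform random stand-in, the sample count `n = 50` is arbitrary, and the +1/-1 projector is scaled by 1/sqrt(k), which is my assumption about normalization.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X, as an (n, n) matrix."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
n, d, k = 50, 200, 2          # n is an arbitrary choice; d and k match the 200-d -> 2-d setup
X = rng.random((n, d))        # stand-in data, not the data behind the plots

# +1/-1 projector; the 1/sqrt(k) scaling is an assumed normalization
R = rng.choice([-1.0, 1.0], size=(d, k)) / np.sqrt(k)
Y = X @ R                     # projected 2-d points

D_high = pairwise_distances(X)   # distances before projection
D_low = pairwise_distances(Y)    # distances after projection

# One (before, after) pair per unordered pair of vectors: the points of the scatter plot
iu = np.triu_indices(n, k=1)
pairs = np.column_stack([D_high[iu], D_low[iu]])
```

Plotting `pairs[:, 0]` against `pairs[:, 1]` gives the same kind of spread as the diagrams, minus the per-vector coloring.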



From tightest to loosest spread, the order is Linear, +1/-1, Sqrt(3), and Gaussian. Linear is so clean because it is overconstrained in this example. +1/-1 and Sqrt(3) are usable, and Gaussian looks like a bomb with smallpox. The +1/-1 and Sqrt(3) projectors look best for this case. If I wanted to work harder, I would compare the distances as matrices, find matrix norms, get standard deviations of the distances, etc. But for visualization, these plots worked well.
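The "work harder" version might look something like the sketch below: for each random projector, take the norm of the difference between the before and after distances and the standard deviation of the after/before ratio. The names `make_projector` and `distortion_stats` are hypothetical, the data is again a random stand-in, and the 1/sqrt(k) scaling and the 1/6, 2/3, 1/6 probabilities for the Sqrt(3) projector are my assumptions about how the projectors are defined. The Linear projector is left out because its construction is not described here.

```python
import numpy as np

def pdist_upper(X):
    """Euclidean distances for each unordered pair of rows of X, as a flat vector."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return D[np.triu_indices(len(X), k=1)]

def make_projector(kind, d, k, rng):
    """Hypothetical helper: build a d x k random projection matrix."""
    if kind == "sign":          # entries are +1 or -1 with equal probability
        R = rng.choice([-1.0, 1.0], size=(d, k))
    elif kind == "sqrt3":       # sparse: +/-sqrt(3) with prob 1/6 each, 0 with prob 2/3 (assumed)
        R = np.sqrt(3.0) * rng.choice([-1.0, 0.0, 1.0], size=(d, k), p=[1/6, 2/3, 1/6])
    elif kind == "gaussian":    # standard normal entries
        R = rng.standard_normal((d, k))
    else:
        raise ValueError(f"unknown projector kind: {kind}")
    return R / np.sqrt(k)       # assumed normalization

def distortion_stats(X, R):
    """Summarize how far projected distances drift from the original ones."""
    before = pdist_upper(X)
    after = pdist_upper(X @ R)
    ratio = after / before      # assumes no two rows of X are identical
    return {
        "diff_norm": float(np.linalg.norm(after - before)),  # 2-norm over distinct pairs
        "ratio_mean": float(ratio.mean()),
        "ratio_std": float(ratio.std()),
    }

rng = np.random.default_rng(0)
X = rng.random((50, 200))       # stand-in data
for kind in ("sign", "sqrt3", "gaussian"):
    print(kind, distortion_stats(X, make_projector(kind, 200, 2, rng)))
```

A single run with k = 2 is noisy, so averaging the statistics over many random draws of each projector would give a fairer ranking than one set of numbers.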

I did these diagrams with the KNIME visual programming app for data mining. All Hail KNIME!
