There is more than one random distribution, and there are some surprises in store.
Achlioptas (2001) shows that random projection does not need Gaussian-distributed values: +1/-1 chosen with equal probability suffices. Even better, drawing uniformly from [0, 0, 0, 0, sqrt(3), -sqrt(3)] also works. Two thirds of the matrix entries are then zero, so this throws away 4 out of 6 input values.
This post explores applying four different distributions (Gaussian, +1/-1, Sqrt(3), and Linear) to the same random projection. To recap, here is the best I've got: a full MDS projection from 200 dimensions to 2 dimensions.
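To make the choices concrete, here is a minimal NumPy sketch of how the projection matrices could be generated. It is not the KNIME workflow used for the diagrams; the function name `projection_matrix`, the `kind` labels, and the placeholder data are all made up for illustration. The post does not define "Linear" precisely, so the sketch assumes it means plain uniform values in [0, 1).

```python
import numpy as np

def projection_matrix(kind, n_in, n_out, rng):
    """Return an (n_in, n_out) random projection matrix (hypothetical helper)."""
    shape = (n_in, n_out)
    if kind == "gaussian":
        r = rng.standard_normal(shape)
    elif kind == "sign":
        # +1 or -1 with equal probability
        r = rng.choice([1.0, -1.0], size=shape)
    elif kind == "sqrt3":
        # uniform draw from six values: zero with probability 2/3,
        # +sqrt(3) or -sqrt(3) with probability 1/6 each
        s = np.sqrt(3.0)
        r = rng.choice([0.0, 0.0, 0.0, 0.0, s, -s], size=shape)
    elif kind == "linear":
        # ASSUMPTION: "Linear" is taken to mean uniform on [0, 1);
        # unlike the other three it has neither zero mean nor unit variance
        r = rng.uniform(0.0, 1.0, size=shape)
    else:
        raise ValueError(f"unknown kind: {kind}")
    # 1/sqrt(n_out) scaling keeps squared distances roughly unchanged on average
    return r / np.sqrt(n_out)

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 200))  # placeholder data, not the post's dataset
projected = {
    kind: data @ projection_matrix(kind, 200, 2, rng)
    for kind in ("gaussian", "sign", "sqrt3", "linear")
}
```

The first three choices all have zero mean and unit variance per entry before scaling, which is what lets them stand in for one another.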
Here are four versions of the same dataset, reducing 200 dimensions to 2 dimensions via random projection:
[Figure: four panels, titled Gaussian, +1/-1, Sqrt(3), and Linear]
The full MDS version is certainly the most pleasant to look at. The Gaussian and the two distributions from Achlioptas all seem to be different rotations of a cylinder. The Linear distribution is useless in this situation.
Given these results, to do a quick visualization of your data, I would try all four random distributions; you may get lucky like I did with somewhat similar rotations. I also recommend the colorizing trick; it really helps show what's going on here. After that, I would get a good dimensional reduction algorithm and do the 2-stage process:
hi-dimensional -> RP -> low-d -> formal dimensional reduction -> 2d or 3d.
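A rough sketch of that 2-stage process in NumPy follows. The function names, the intermediate size of 20 dimensions, and the placeholder data are assumptions for illustration only. PCA via SVD stands in for the "formal dimensional reduction" step; it is not the MDS implementation used for the diagrams in this post.

```python
import numpy as np

def random_project(x, n_out, rng):
    """Stage 1: +1/-1 random projection down to n_out dimensions."""
    r = rng.choice([1.0, -1.0], size=(x.shape[1], n_out))
    return x @ r / np.sqrt(n_out)

def pca(x, n_components):
    """Stage 2 stand-in: PCA via SVD of the mean-centered data."""
    centered = x - x.mean(axis=0)
    # rows of vt are principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
high_d = rng.standard_normal((300, 200))  # placeholder data
low_d = random_project(high_d, 20, rng)   # 200 -> 20 (intermediate size is a guess)
coords_2d = pca(low_d, 2)                 # 20 -> 2, ready to plot
```

The cheap random projection does the bulk of the shrinking, so the more expensive algorithm only has to work on the smaller matrix.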
On to part 3 for a discussion of noise.
Achlioptas, 2001
Database-friendly random projections: Johnson-Lindenstrauss with binary coins
PDF available online at various places.
I did these diagrams with the KNIME visual programming app for data mining.
All Hail KNIME!