Monday, August 15, 2011

Singular vectors for recommendations

This is a project to research:
  1. Reproducing these results: http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/
  2. Correlating the feature vectors and singular values from an SVD-based recommender to the generated vectors for users and items.
This was inspired by a lecture by one of the top-5 finishers in the Netflix contest: the speaker demonstrated axes of interest such as chick flicks vs. Star Trek and Harry Potter vs. Stanley Kubrick. These clusters in the full item space sit at the endpoints of vectors which can be realized from the feature vectors and singular values.

TestOpposites.java

This program and the following chart are my recreation of the raw data from the article above. BTJF are the original users whose item ratings were used to create the projection: Ben, Tom, Jeff, Fred. Bob is the new user from the article; Love and Hate are users who respectively love and hate all six seasons.

"Singular" and "Singular Div" are the first two feature vector columns of the SVD left-hand matrix. They are orthogonal. In the later chart, "Shifted data", we will use them to find "axes of interest" for the different items.

Raw data:
Shifted Data:
And now, the magic. The space of users is centered at 0. The two feature vectors are mirrored across (0,0), and the two orthogonal singular/feature vectors are scaled down by their singular values. Normalized so that the singular values sum to one, the first is 0.6 and the second is 0.25. In this chart we take the original positions of the feature vectors and multiply them by 0.6 (yellow triangles) and 0.25 (red asterisks), and I've drawn lines between the scaled-down versions. Now we have the four original users who established this space, three new users who are projected into this space, and the two major axes of interest.
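
Here is a minimal sketch of that procedure, assuming Apache Commons Math for the SVD. The ratings below are placeholders rather than the article's numbers, and this is my own illustration, not TestOpposites.java:

    import org.apache.commons.math3.linear.MatrixUtils;
    import org.apache.commons.math3.linear.RealMatrix;
    import org.apache.commons.math3.linear.SingularValueDecomposition;

    /** Sketch: users projected into the top-2 singular space, plus scaled axes of interest. */
    public class AxesOfInterest {
        public static void main(String[] args) {
            // Placeholder user-by-item ratings; NOT the numbers from the igvita article.
            double[][] ratings = {
                {5, 5, 3, 0, 5, 5},
                {5, 0, 4, 0, 4, 4},
                {0, 3, 0, 5, 4, 5},
                {5, 4, 3, 3, 5, 5},
            };
            SingularValueDecomposition svd =
                new SingularValueDecomposition(MatrixUtils.createRealMatrix(ratings));

            double[] sigma = svd.getSingularValues();   // sorted largest-first
            double total = 0;
            for (double s : sigma) total += s;
            double w1 = sigma[0] / total;               // ~0.6 in the post's example
            double w2 = sigma[1] / total;               // ~0.25 in the post's example

            // Each original user's 2-D position: coordinates along the top two left singular vectors.
            RealMatrix u = svd.getU();
            for (int i = 0; i < ratings.length; i++) {
                System.out.printf("user %d at (%.3f, %.3f)%n", i, u.getEntry(i, 0), u.getEntry(i, 1));
            }

            // Axes of interest: each axis scaled by its normalized singular value and
            // mirrored through the origin.
            System.out.printf("axis 1 endpoints: (%.2f, 0) and (%.2f, 0)%n", w1, -w1);
            System.out.printf("axis 2 endpoints: (0, %.2f) and (0, %.2f)%n", w2, -w2);

            // Fold a new user (Bob, Love, Hate...) into the space: coords = ratings * V_k * S_k^-1.
            RealMatrix v = svd.getV();
            double[] bob = {5, 0, 0, 0, 0, 5};          // hypothetical new user
            double x = 0, y = 0;
            for (int j = 0; j < bob.length; j++) {
                x += bob[j] * v.getEntry(j, 0);
                y += bob[j] * v.getEntry(j, 1);
            }
            System.out.printf("new user folds in at (%.3f, %.3f)%n", x / sigma[0], y / sigma[1]);
        }
    }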

Observations:
  1. The four original users (Billy-Bob, Trimolchio, Jenga and Ferdinand) all had somewhat orthogonal item ratings, and they come out in an arc. One of them sits far from the others on a large circle, and the feature vectors make sense given the "gravity" of the four users on the circle.
  2. Bob also had an item rating vector with the same style of pluses and minuses as BTJF, and he appears in an expected place.
  3. The singular feature vectors (yellow triangles, red asterisks) do give "axes of interest" that make sense relative to BTJF and Bob.
  4. Love and Hate are the nearest to the two ends of the dominant axis of interest. They are also nearly between the endpoints of the axes of interest.
Conclusions:
  • This technique gives two results:
    • It supplies axes of interest.
    • It allows a new user to describe himself based on the major axes of interest.
  • There is a fine yellow line between Love and Hate.

Sunday, August 14, 2011

Sorted recommender data

These are some images from experiments in sorting a ratings data model. This post shows a few sorted versions of the GroupLens 1-million-rating sample database. Green and red encode sequence values in the sorted output; they help visualize dense versus sparse, with red dense and green sparse. (Apologies to the 5% of men with red-green colorblindness.)

Sorted by user:

Sorted by item:


Sorted by user and item:

Why is this interesting?
The lower-left, red corner has popular items. The upper right has the "long tail". I am a long-tail guy; I don't really care about popular movies. Japanese female-assassin movies, BBC comedy, and Brazilian horror movies are on the list. An acquaintance received recommendations from Amazon for French post-structuralist literature (I don't know either) and pornographic comic books. "Someone finally understands me!"
A recommendation for me should be biased to the upper right. This sorting gives one algorithm to add to the pile.
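
The core of the idea, stripped down (a simplification, not the SortDataModel.java linked below): count ratings per user and per item, then renumber the IDs densest-first so the dense, popular corner lands at low indices.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /** Sketch: renumber users and items by rating count, densest first. */
    public class DensitySort {

        /** prefs holds {userId, itemId} rating events; column selects user (0) or item (1). */
        static Map<Long, Integer> rankByDensity(long[][] prefs, int column) {
            Map<Long, Integer> counts = new HashMap<>();
            for (long[] p : prefs) counts.merge(p[column], 1, Integer::sum);

            List<Long> ids = new ArrayList<>(counts.keySet());
            ids.sort((a, b) -> Integer.compare(counts.get(b), counts.get(a)));  // densest first

            Map<Long, Integer> rank = new HashMap<>();
            for (int i = 0; i < ids.size(); i++) rank.put(ids.get(i), i);
            return rank;
        }

        public static void main(String[] args) {
            long[][] prefs = { {1, 10}, {1, 11}, {2, 10}, {3, 10}, {3, 11}, {3, 12} };
            Map<Long, Integer> userRank = rankByDensity(prefs, 0);
            Map<Long, Integer> itemRank = rankByDensity(prefs, 1);
            for (long[] p : prefs)
                System.out.printf("user %d -> %d, item %d -> %d%n",
                    p[0], userRank.get(p[0]), p[1], itemRank.get(p[1]));
        }
    }

Plotting each rating at (userRank, itemRank) gives the dense-to-sparse layout above; biasing candidate items toward large item ranks pushes recommendations toward the long tail.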

Details
Program: SortDataModel.java, ModelComparator.java and maybe some other things in this repository.
https://github.com/LanceNorskog/LSH-Hadoop/blob/master/extras/mahout/test/java/org/apache/mahout/cf/taste/impl/common/SortDataModel.java

Visuals by KNime.

Koren on recommenders - comments about temporal changes in rating data

http://videolectures.net/kdd09_koren_cftd/

The talk discusses how the timestamps of watching/rating events are important: in general, items change slowly in rating values, while users show sudden changes in rating values.

Slides stolen:





Sunday, July 10, 2011

Dimensional Reduction via Random Projection, Part 4: Better Living Through Rotation!

Thanks to a tip from Ted Dunning, I have rotated the four distributions by factorizing them.



(Four rotated scatter plots, one per distribution: Gaussian, +1/-1, Sqrt(3), Linear)

These look mighty funky. I've taken this as far as I can without being able to browse the item metadata (movie names).

Friday, July 1, 2011

Dimensional Reduction via Random Projection, Part 1: Huh?

This recounts a short research project in using random projection for visualization.

I started with vectors which represent items in a data model. (See my Semantic Vectors posts.) They come from 100k entries in the GroupLens film ratings dataset, representing ratings by 669 users on 3274 items. The 200 vectors are semi-randomly chosen from the item pool. All are 200-dimensional and dense, and Euclidean distance signifies nearness for item-item recommendation. The item-item distances give good rating predictions in the 150-200 dimension range, so it is reasonable to treat 200 dimensions as having rich support. The subset of item vectors has strong user-user bonds, so the recommendation structure is somewhat visible. This was a mistake, but I like it.

The project uses simple 2D visualization to decide whether random projection is as good as other, older methods of dimensional reduction. For comparison I used KNime's MDS implementation. MDS (Multi-Dimensional Scaling) is an older algorithm for projecting a high-dimensional space into a low-dimensional one while retaining pairwise distances, and it is tuneable for the amount of information dropped.
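
KNime's MDS node is presumably an iterative implementation; as a rough picture of what MDS computes, here is a sketch of classical (Torgerson) MDS using Apache Commons Math. This is my own illustration of the idea, not necessarily the algorithm KNime uses.

    import org.apache.commons.math3.linear.Array2DRowRealMatrix;
    import org.apache.commons.math3.linear.EigenDecomposition;

    /** Sketch: classical (Torgerson) MDS from a pairwise-distance matrix down to 2-D. */
    public class ClassicalMds {

        static double[][] mds2d(double[][] dist) {
            int n = dist.length;

            // Double centering: B = -1/2 * J * D^2 * J, where J = I - (1/n) * 1 * 1^T.
            double[] rowMean = new double[n], colMean = new double[n];
            double grandMean = 0;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    double d2 = dist[i][j] * dist[i][j];
                    rowMean[i] += d2 / n;
                    colMean[j] += d2 / n;
                    grandMean += d2 / ((double) n * n);
                }
            double[][] b = new double[n][n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    double d2 = dist[i][j] * dist[i][j];
                    b[i][j] = -0.5 * (d2 - rowMean[i] - colMean[j] + grandMean);
                }

            // The top two eigenpairs of B give the 2-D coordinates.
            EigenDecomposition eig = new EigenDecomposition(new Array2DRowRealMatrix(b));
            double[] vals = eig.getRealEigenvalues();
            int i1 = 0;
            for (int k = 1; k < n; k++) if (vals[k] > vals[i1]) i1 = k;
            int i2 = (i1 == 0) ? 1 : 0;
            for (int k = 0; k < n; k++) if (k != i1 && vals[k] > vals[i2]) i2 = k;

            double s1 = Math.sqrt(Math.max(vals[i1], 0));
            double s2 = Math.sqrt(Math.max(vals[i2], 0));
            double[][] coords = new double[n][2];
            for (int i = 0; i < n; i++) {
                coords[i][0] = s1 * eig.getEigenvector(i1).getEntry(i);
                coords[i][1] = s2 * eig.getEigenvector(i2).getEntry(i);
            }
            return coords;
        }

        public static void main(String[] args) {
            // Toy distance matrix for four points on a line; plug in real item-item distances here.
            double[][] dist = {
                {0, 1, 2, 3},
                {1, 0, 1, 2},
                {2, 1, 0, 1},
                {3, 2, 1, 0},
            };
            for (double[] p : mds2d(dist)) System.out.printf("(%.3f, %.3f)%n", p[0], p[1]);
        }
    }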

Each test run uses the same 200 vectors in the same order. Each vector has the same color each time (from vivid green to black to vivid red).

The main axis of investigation compares three projection methods: complete Random Projection, partial Random Projection, and complete MDS. The scatter charts below show the vectors reduced to 2D (X and Y are the generated dimensions, color identifies the vector); a sketch of the basic random projection step follows the list.
  1. Random projection from 200 dimensions to 2 dimensions.
  2. Random projection from 200 dimensions to 20 dimensions, then use MDS to boil down to 2 dimensions.
  3. Use MDS to convert from 200 dimensions to two dimensions.
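
Here is a minimal sketch of the Gaussian random projection used in methods 1 and 2; this is my own illustration, not the code that produced the charts. The stand-in input data in main() would be replaced by the real item vectors.

    import java.util.Random;

    /** Sketch: Gaussian random projection of a set of vectors from high-d to low-d. */
    public class RandomProjection {

        static double[][] project(double[][] data, int targetDim, long seed) {
            int highDim = data[0].length;
            Random rnd = new Random(seed);

            // Projection matrix with N(0,1) entries, scaled by 1/sqrt(targetDim) so that
            // pairwise distances are preserved in expectation (Johnson-Lindenstrauss).
            double[][] r = new double[highDim][targetDim];
            for (int i = 0; i < highDim; i++)
                for (int j = 0; j < targetDim; j++)
                    r[i][j] = rnd.nextGaussian() / Math.sqrt(targetDim);

            double[][] out = new double[data.length][targetDim];
            for (int v = 0; v < data.length; v++)
                for (int j = 0; j < targetDim; j++)
                    for (int i = 0; i < highDim; i++)
                        out[v][j] += data[v][i] * r[i][j];
            return out;
        }

        public static void main(String[] args) {
            Random rnd = new Random(0);
            double[][] vectors = new double[200][200];      // stand-in for the 200 item vectors
            for (double[] v : vectors)
                for (int i = 0; i < v.length; i++) v[i] = rnd.nextGaussian();

            double[][] twoD = project(vectors, 2, 42);      // method 1: straight to 2-D
            double[][] twentyD = project(vectors, 20, 42);  // method 2: 20-D, then hand off to MDS
            System.out.printf("first vector lands at (%.3f, %.3f); 20-D rows: %d%n",
                twoD[0][0], twoD[0][1], twentyD.length);
        }
    }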
1.
Full Random Projection from 200 dimensions to 2 dimensions (Gaussian)

2.
Random Projection from 200D to 20D, then MDS to 2D (Gaussian, KNime MDS implementation)

3.
Full MDS from 200d to 2d (KNime MDS Implementation)

Conclusions:
  • Using full random projection does not completely trash the data, but it's challenging to interpret.
  • Downsizing the data to 20 dimensions and then using MDS the rest of the way is somewhat cleaner, and good enough to work with. It's also faster.
  • Full MDS shows a strikingly clear picture of the item-item structure.
Later work has made it clear that these are the same projection, rotated. Eventually I would like to reduce to 3D and animate rotations of these vector sets.

Onward to Part 2: there is more than one random projection.

I did these diagrams with the KNime visual programming app for data mining.  All Hail KNime!

Dimensional Reduction via Random Projection, Part 2: Distributions

There is more than one random distribution, and there are some surprises in store.

Achlioptas (2001) claims that random projection does not need a full Gaussian distribution: +1/-1 chosen uniformly at random suffices. Even better, drawing uniformly from [0, 0, 0, 0, sqrt(3), -sqrt(3)] also works; this throws away 4 out of 6 input values.
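
For concreteness, the per-entry generators for the four projection-matrix distributions compared in these posts might look like this. This is a sketch: the two sparse generators follow Achlioptas's recipe, and I am reading the "Linear" distribution as plain uniform values on [-1, 1).

    import java.util.Random;

    /** Sketch: per-entry generators for the four projection-matrix distributions. */
    public class ProjectionDistributions {
        static final Random RND = new Random(42);
        static final double SQRT3 = Math.sqrt(3.0);

        static double gaussian()     { return RND.nextGaussian(); }

        // Achlioptas (2001): +1 or -1, each with probability 1/2.
        static double plusMinusOne() { return RND.nextBoolean() ? 1.0 : -1.0; }

        // Achlioptas (2001): +sqrt(3) with prob 1/6, -sqrt(3) with prob 1/6, 0 with prob 2/3.
        static double sqrt3Sparse() {
            int die = RND.nextInt(6);
            if (die == 0) return SQRT3;
            if (die == 1) return -SQRT3;
            return 0.0;
        }

        // The "Linear" distribution as I read it here: uniform on [-1, 1).
        static double linear()       { return 2.0 * RND.nextDouble() - 1.0; }

        public static void main(String[] args) {
            System.out.println("gaussian  +1/-1  sqrt3  linear");
            for (int i = 0; i < 5; i++)
                System.out.printf("%8.3f %6.1f %8.3f %7.3f%n",
                    gaussian(), plusMinusOne(), sqrt3Sparse(), linear());
        }
    }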

This post explores using each of these four distributions for the same random projection. To recap, here is the best I've got: a full MDS projection from 200 dimensions to 2 dimensions.


Here are four versions of the same dataset, reducing 200 dimensions to 2 dimensions via random projection:



(Four scatter plots, one per distribution: Gaussian, +1/-1, Sqrt(3), Linear)

The full MDS version is certainly the most pleasant to look at. The Gaussian and the two Achlioptas distributions all seem to be different rotations of a cylinder. The Linear distribution is useless in this situation.

Given these results, for a quick visualization of your data I would try all four random distributions; you may get lucky, as I did, with somewhat similar rotations. I also recommend the colorizing trick; it really helps show what's going on here. After that, I would get a good dimensional reduction algorithm and do the two-stage process:

        hi-dimensional -> RP -> low-d -> formal dimensional reduction -> 2d or 3d.

On to part 3 for a discussion of noise.

Achlioptas (2001), "Database-friendly random projections: Johnson-Lindenstrauss with binary coins". PDF available online at various places.
I did these diagrams with the KNime visual programming app for data mining.  All Hail KNime!

Dimensional Reduction via Random Projection, Part 3: Noise

Now for a third axis of investigation: distances.

Random Projection preserves pairwise distances; this is its claim to fame. To measure this, I created matrices of pairwise distances before and after "full RP" (200-d -> RP -> 2-d). The charts below show the spread of distances: each color is one vector's distances, the X axis is the 200-d distance, and the Y axis is the 2-d distance.
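
A sketch of how those before-and-after distance pairs can be built (my illustration, not the exact code behind the charts); the stand-in 2-d data in main() should be replaced with the actual RP output:

    import java.util.Random;

    /** Sketch: pairwise Euclidean distances before and after a projection, paired per vector pair. */
    public class DistanceSpread {

        static double distance(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            return Math.sqrt(sum);
        }

        /** Returns rows of {original distance, projected distance} for every pair i < j. */
        static double[][] distancePairs(double[][] original, double[][] projected) {
            int n = original.length;
            double[][] pairs = new double[n * (n - 1) / 2][2];
            int k = 0;
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++, k++) {
                    pairs[k][0] = distance(original[i], original[j]);   // X axis: 200-d distance
                    pairs[k][1] = distance(projected[i], projected[j]); // Y axis: 2-d distance
                }
            return pairs;
        }

        public static void main(String[] args) {
            Random rnd = new Random(1);
            double[][] hi = new double[50][200];
            double[][] lo = new double[50][2];
            for (int i = 0; i < hi.length; i++) {
                for (int d = 0; d < 200; d++) hi[i][d] = rnd.nextGaussian();
                lo[i][0] = hi[i][0];    // stand-in for the RP output; use the real
                lo[i][1] = hi[i][1];    // projected vectors here
            }
            double[][] pairs = distancePairs(hi, lo);
            System.out.printf("%d pairs, first: %.3f vs %.3f%n", pairs.length, pairs[0][0], pairs[0][1]);
        }
    }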


(Four distance-spread charts, one per distribution: Gaussian, +1/-1, Sqrt(3), Linear)

From tightest to loosest spread, the order is Linear, +1/-1, Sqrt(3), and Gaussian. Linear is so clean because it is overconstrained in this example. +1/-1 and Sqrt(3) are usable, and Gaussian looks like a bomb with smallpox. The +1/-1 and Sqrt(3) projectors look best for this case. If I wanted to work harder, I would compare the distances as matrices, find matrix norms, get standard deviations of the distances, etc.; a small sketch of that follows. But for visualization, these worked well.
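
That "work harder" comparison might look roughly like this (a sketch I have not run against the real matrices): the Frobenius norm of the difference between the two distance matrices, plus the standard deviation of the entrywise differences.

    /** Sketch: summary statistics for how well a projection preserves a distance matrix. */
    public class DistanceMatrixStats {

        /** Returns {Frobenius norm of (hi - lo), stddev of the entrywise differences}. */
        static double[] compare(double[][] hiDist, double[][] loDist) {
            int n = hiDist.length;
            double frobenius = 0, sum = 0, sumSq = 0;
            int count = 0;
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++) {
                    double diff = hiDist[i][j] - loDist[i][j];
                    frobenius += 2 * diff * diff;   // both matrices are symmetric
                    sum += diff;
                    sumSq += diff * diff;
                    count++;
                }
            double mean = sum / count;
            double stddev = Math.sqrt(sumSq / count - mean * mean);
            return new double[] { Math.sqrt(frobenius), stddev };
        }

        public static void main(String[] args) {
            double[][] hi = { {0, 1.0, 2.0}, {1.0, 0, 1.0}, {2.0, 1.0, 0} };
            double[][] lo = { {0, 0.9, 2.2}, {0.9, 0, 1.1}, {2.2, 1.1, 0} };
            double[] stats = compare(hi, lo);
            System.out.printf("Frobenius norm %.3f, stddev of differences %.3f%n", stats[0], stats[1]);
        }
    }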

I did these diagrams with the KNime visual programming app for data mining.  All Hail KNime!