Monday, August 15, 2011

Singular vectors for recommendations

This is a project to research:
  1. Reproducing these results: http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/
  2. Correlating the feature vectors and singular values from an SVD-based recommender to the generated vectors for users and items.
This was inspired by a lecture by one of the top five finishers in the Netflix contest, who demonstrated axes of interest: chick flicks vs. Star Trek, Harry Potter vs. Stanley Kubrick, etc. These clusters in the full item space sit at the endpoints of vectors that can be derived from the feature vectors and singular values.

TestOpposites.java

This program and the following chart are my recreation of the raw data from the article above. BTJF are the four original users whose user/item values were used to create the projection: Ben, Tom, Jeff and Fred. Bob is the new user from the article; Love and Hate are two more new users who love and hate all six seasons, respectively.

"Singular" and "Singular Div" are the first two feature vectors, that is, the first two columns of the SVD left-hand matrix. They are orthogonal. In the second chart, "Shifted data", we will use them to find "axes of interest" for the different items.

Raw data:

Shifted data:

And now, the magic. The space of users is centered on 0. The two orthogonal feature vectors are mirrored across (0,0) and scaled down by their singular values. Normalized to add up to one, the first singular value is 0.6 and the second is 0.25. In this chart, we take the original positions of the feature vectors and multiply them by 0.6 (yellow triangles) and 0.25 (red asterisks), and I've drawn lines between the scaled-down versions. Now we have the four original users who established this space, three new users who are projected into it, and the two major axes of interest.
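
The centering, scaling and mirroring steps can be sketched roughly as follows. This is not the code from TestOpposites.java. The class name `AxesOfInterest`, its helper methods and all of the numbers are hypothetical, and I am assuming the singular values are rescaled so that they sum to one. The singular values in the example were picked only so that the first two weights come out to 0.6 and 0.25.

```java
import java.util.Arrays;

public class AxesOfInterest {

    // Shift a set of points so that their centroid sits at the origin.
    static double[][] center(double[][] points) {
        int dim = points[0].length;
        double[] mean = new double[dim];
        for (double[] p : points) {
            for (int i = 0; i < dim; i++) {
                mean[i] += p[i] / points.length;
            }
        }
        double[][] centered = new double[points.length][dim];
        for (int r = 0; r < points.length; r++) {
            for (int i = 0; i < dim; i++) {
                centered[r][i] = points[r][i] - mean[i];
            }
        }
        return centered;
    }

    // Rescale the singular values so that they sum to one (an assumption).
    static double[] normalize(double[] singularValues) {
        double sum = 0.0;
        for (double s : singularValues) {
            sum += s;
        }
        double[] weights = new double[singularValues.length];
        for (int i = 0; i < weights.length; i++) {
            weights[i] = singularValues[i] / sum;
        }
        return weights;
    }

    // One axis of interest: the feature vector's position scaled by its
    // weight, plus the same point mirrored across the origin.
    static double[][] axisEndpoints(double[] position, double weight) {
        double[] plus = new double[position.length];
        double[] minus = new double[position.length];
        for (int i = 0; i < position.length; i++) {
            plus[i] = position[i] * weight;
            minus[i] = -plus[i];
        }
        return new double[][] { plus, minus };
    }

    public static void main(String[] args) {
        // Hypothetical numbers, not the data behind the charts.
        double[][] users = { { 1.0, 2.0 }, { 3.0, 4.0 }, { 2.0, 0.0 } };
        double[] singularValues = { 12.0, 5.0, 2.0, 1.0 };
        double[][] featurePositions = { { 1.0, 0.5 }, { -0.4, 0.8 } };

        System.out.println("centered users: " + Arrays.deepToString(center(users)));
        double[] weights = normalize(singularValues);
        for (int k = 0; k < featurePositions.length; k++) {
            double[][] ends = axisEndpoints(featurePositions[k], weights[k]);
            System.out.println("axis " + k + ": " + Arrays.toString(ends[0])
                    + " to " + Arrays.toString(ends[1]));
        }
    }
}
```

Drawing a line between the two endpoints of each pair gives one axis of interest. The weaker axis comes out shorter because its weight is smaller.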

Observations:
  1. The four original users (Billy-Bob, Trimolchio, Jenga and Ferdinand) all had somewhat orthogonal item ratings, and come out in an arc. One of them is far from the others on a large circle, and the feature vectors make sense given the "gravity" of the four users on the circle.
  2. Bob also had an item rating vector with the same style of pluses and minuses as BTJF, and appears at an expected place.
  3. The singular feature vectors (yellow triangles, red asterisks) do give "axes of interest" that make sense vs. BTJF and Bob.
  4. Love and Hate are the nearest to the two ends of the dominant axis of interest. They are also nearly between the endpoints of the axes of interest.
Conclusions:
  • This technique gives two results:
    • It supplies axes of interest.
    • It allows a new user to describe himself based on the major axes of interest.
  • There is a fine yellow line between Love and Hate.
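
The second result, placing a new user on the major axes of interest, is usually done with the standard "fold-in" step for SVD recommenders: multiply the user's rating vector by the first k feature vectors and divide each coordinate by the matching singular value. Here is a minimal sketch of that step. Whether TestOpposites.java does exactly this is my assumption, and the class name `FoldInUser`, the matrix `u`, the singular values `s` and the ratings are all made up for illustration.

```java
public class FoldInUser {

    // Project a rating vector into the k-dimensional feature space:
    // coords[j] = (ratings . column j of u) / s[j]
    // u has one row per item and one column per feature vector.
    static double[] foldIn(double[] ratings, double[][] u, double[] s) {
        int k = s.length;
        double[] coords = new double[k];
        for (int j = 0; j < k; j++) {
            double dot = 0.0;
            for (int i = 0; i < ratings.length; i++) {
                dot += ratings[i] * u[i][j];
            }
            coords[j] = dot / s[j];
        }
        return coords;
    }

    public static void main(String[] args) {
        // Hypothetical example: 3 items, 2 orthonormal feature vectors.
        double r = Math.sqrt(0.5);
        double[][] u = { { r, r }, { r, -r }, { 0.0, 0.0 } };
        double[] s = { 4.0, 2.0 };
        double[] newUser = { 5.0, 1.0, 3.0 };

        double[] coords = foldIn(newUser, u, s);
        System.out.println("new user at (" + coords[0] + ", " + coords[1] + ")");
    }
}
```

The resulting coordinates can be plotted next to the original users, which is how a new user such as Bob, Love or Hate would land in the same chart as BTJF.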
