Sunday, August 14, 2011

Sorted recommender data

These are some images from experiments in sorting a ratings data model. This post has a few sorted versions of the GroupLens 1-million-rating sample database. Green and red encode sequence values in the sorted output; they help visualize the gradient from dense to sparse. Red is dense, green is sparse. 5% of men have red-green colorblindness.

Sorted by user:

Sorted by item:

Sorted by user and item:
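
A minimal sketch of one way such a sort could work, not necessarily what produced the images above. It assumes the ratings arrive as (user, item, rating) tuples and that "sorted" means ordering users and items by how many ratings each has, densest first. The function name `sort_by_density` and the tie-breaking rule are made up for illustration.

```python
from collections import Counter

def sort_by_density(ratings):
    """Re-index users and items so that rank 0 is the one with the most ratings.

    ratings: iterable of (user, item, rating) tuples (assumed input shape).
    Returns (user_rank, item_rank, points), where points are
    (user_position, item_position, rating) tuples in sorted order.
    """
    ratings = list(ratings)
    user_counts = Counter(user for user, _, _ in ratings)
    item_counts = Counter(item for _, item, _ in ratings)

    # Most ratings first; ties are broken by id so the output is deterministic.
    users = sorted(user_counts, key=lambda u: (-user_counts[u], u))
    items = sorted(item_counts, key=lambda i: (-item_counts[i], i))

    user_rank = {user: pos for pos, user in enumerate(users)}
    item_rank = {item: pos for pos, item in enumerate(items)}

    # Each rating becomes a point in the re-indexed user/item grid.
    points = sorted((user_rank[u], item_rank[i], r) for u, i, r in ratings)
    return user_rank, item_rank, points


sample = [("a", "x", 5), ("a", "y", 3), ("b", "x", 4), ("c", "x", 1), ("a", "z", 2)]
user_rank, item_rank, points = sort_by_density(sample)
```

Plotting the points with the user position on one axis and the item position on the other puts the heavily rated users and items in one corner and the sparse ones in the opposite corner.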

Why is this interesting?
The lower left, red corner, has popular items. The upper right has the "long tail". I am a long tail guy; I don't really care about popular movies. Japanese female assassin movies, BBC comedy, and Brazilian horror movies are on the list. An acquaintance received recommendations from Amazon for French Post-Structuralist literature (I don't know either) and pornographic comic books. "Someone finally understands me!"
A recommendation for me should be biased to the upper right. This sorting gives one algorithm to add to the pile.
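
One hedged way to picture "biased to the upper right": take scores from any recommender and boost items that sit further out in the tail. This is only a sketch under assumptions. `long_tail_rerank`, the `weight` knob, and the `item_rank` mapping (item to popularity position, 0 = most rated) are hypothetical names, and the linear boost is an arbitrary choice.

```python
def long_tail_rerank(scores, item_rank, weight=0.5):
    """Re-order candidate items, favoring those deeper in the long tail.

    scores:    dict of item -> base recommendation score (from any recommender).
    item_rank: dict of item -> popularity position, 0 = most rated (assumed).
    weight:    how strongly to favor the tail; 0 leaves the base order unchanged.
    """
    deepest = max(len(item_rank) - 1, 1)

    def adjusted(item):
        # 0.0 for the most popular item, 1.0 for the deepest tail item.
        # Items missing from item_rank are treated as deepest tail.
        tail = item_rank.get(item, deepest) / deepest
        return scores[item] * (1.0 + weight * tail)

    # Highest adjusted score first; ties broken by item id.
    return sorted(scores, key=lambda item: (-adjusted(item), item))


scores = {"hit": 0.9, "mid": 0.8, "obscure": 0.7}
popularity = {"hit": 0, "mid": 1, "obscure": 2}
plain = long_tail_rerank(scores, popularity, weight=0.0)
tail_biased = long_tail_rerank(scores, popularity, weight=0.5)
```

With `weight=0.0` the popular item stays on top; with `weight=0.5` the obscure item overtakes it.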

Program:, and maybe some other things in this repository.

Visuals by KNIME.
