These are some images from experiments in sorting a ratings data model. This post has a few sorted versions of the GroupLens 1 million sample database. Green and red encode sequence values in the sorted output; they help visualize dense to sparse. Red is dense, green is sparse. 5% of men have red-green colorblindness.
Sorted by user:
Sorted by item:
Sorted by user and item:
Why is this interesting?
The lower left, red corner, has popular items. The upper right has the "long tail". I am a long tail guy; I don't really care about popular movies. Japanese female assassin movies, BBC comedy, and Brazilian horror movies are on the list. An acquaintance received recommendations from Amazon for French Post-Structuralist literature (I don't know either) and pornographic comic books. "Someone finally understands me!"
A recommendation for me should be biased to the upper right. This sorting gives one algorithm to add to the pile.
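One way to picture "biased to the upper right" is a small re-ranking step. This is only a sketch under my own assumptions: the class name LongTailBias, the alpha knob, and the linear weighting are all made up for illustration and are not in the repository. It takes a base score and the item's sequence value from the popularity sort (0 = most popular) and boosts items further out in the tail.

```java
/** Hypothetical sketch: nudge scores toward long-tail items. Not from the original code. */
final class LongTailBias {

    /**
     * @param score     base recommendation score from any recommender
     * @param position  item's sequence value in the popularity sort (0 = most popular)
     * @param itemCount total number of items in the sorted model
     * @param alpha     assumed tuning knob; 0 means no bias, larger favors rare items
     */
    static double adjust(double score, int position, int itemCount, double alpha) {
        if (itemCount <= 1) {
            return score; // nothing to rank against
        }
        // 0.0 for the most popular item, 1.0 for the rarest.
        double tail = (double) position / (itemCount - 1);
        return score * (1.0 + alpha * tail);
    }
}
```

With alpha at 0.5, the rarest item gets its score multiplied by 1.5 and the most popular item is left alone. A linear weight is the simplest choice; anything monotonic in the sequence value would express the same idea.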
Details
Program: SortDataModel.java, ModelComparator.java and maybe some other things in this repository.
https://github.com/LanceNorskog/LSH-Hadoop/blob/master/extras/mahout/test/java/org/apache/mahout/cf/taste/impl/common/SortDataModel.java
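The linked file is the real thing. For readers who just want the idea, here is a minimal sketch, assuming "dense" means "has the most ratings". The class DensitySort and its methods are hypothetical names of mine, not Mahout or repository APIs. It counts ratings per user or item, sorts IDs from most to least rated, and hands each ID its position as a sequence value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: renumber IDs so the most-rated ones come first. */
final class DensitySort {

    /**
     * @param ratings each row is {userId, itemId}
     * @param column  0 to sort users, 1 to sort items
     * @return map from original ID to sequence value (0 = densest)
     */
    static Map<Long, Integer> sequenceByDensity(long[][] ratings, int column) {
        // Count how many ratings each ID appears in.
        Map<Long, Integer> counts = new HashMap<>();
        for (long[] r : ratings) {
            counts.merge(r[column], 1, Integer::sum);
        }
        // Most ratings first; ties broken by ID so the order is deterministic.
        List<Long> ids = new ArrayList<>(counts.keySet());
        ids.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : Long.compare(a, b);
        });
        Map<Long, Integer> sequence = new HashMap<>();
        for (int i = 0; i < ids.size(); i++) {
            sequence.put(ids.get(i), i);
        }
        return sequence;
    }

    /** Sorts by user and item together: each rating becomes {userSeq, itemSeq}. */
    static int[][] remap(long[][] ratings) {
        Map<Long, Integer> users = sequenceByDensity(ratings, 0);
        Map<Long, Integer> items = sequenceByDensity(ratings, 1);
        int[][] out = new int[ratings.length][2];
        for (int k = 0; k < ratings.length; k++) {
            out[k][0] = users.get(ratings[k][0]);
            out[k][1] = items.get(ratings[k][1]);
        }
        return out;
    }
}
```

Plotting the remapped pairs puts the busiest users and most popular items near the origin, which matches the dense corner in the images. Rating values are left out here because only the counts matter for the ordering.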
Visuals by KNIME.
Yes, I got the columns wrong in the middle.