Getting recommendations out of nothing - Ania Warzecha

Tags: django, djangocon

Ania Warzecha researched recommendation systems. Making recommendations means estimating ratings or preferences for items a user hasn't seen yet: for example, books or movies you might also like, based on your earlier purchases.

There are three kinds of recommendations.

Collaborative recommendations. These are mostly based on the actions of other users: which books are often bought together, for instance. Simple to implement, but they can be slow for big datasets and don't work well for new items and/or new users.

Content-based recommendations. These look for similar items. Fast and accurate, but they tend towards over-specification regarding the data they need.

Hybrid methods, which combine the two.
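To make the collaborative "bought together" idea concrete, here is a minimal Python sketch: it counts how often pairs of items appear in the same order and recommends an item's most frequent partners. The order data and the function names are hypothetical illustrations, not code from the talk.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(orders):
    """For every item, count how often each other item shares an order with it."""
    counts = defaultdict(Counter)
    for order in orders:
        # Deduplicate within one order so a repeated item isn't counted twice.
        for a, b in combinations(sorted(set(order)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def bought_together(counts, item, n=3):
    """Return up to n items most often bought together with `item`."""
    return [other for other, _ in counts[item].most_common(n)]


# Hypothetical order history.
orders = [
    ["dune", "foundation", "hyperion"],
    ["dune", "foundation"],
    ["dune", "neuromancer"],
]
counts = co_purchase_counts(orders)
print(bought_together(counts, "dune"))
```

This also shows the weaknesses mentioned above: the pair counting grows quickly with big datasets, and a brand-new item has no co-purchases yet, so nothing can be recommended for it.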

A case study: a Polish car parts website. You normally don't log in there; you just want a part. So there is no history of earlier purchases to work with. They did have a lot of parts and data, so they started with content-based recommendations.
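A content-based approach needs no purchase history, only item data. As a minimal sketch, assuming each part is described by a set of attribute tags (a hypothetical representation; the talk didn't say how parts were modelled), Jaccard similarity is one simple way to find similar parts:

```python
def jaccard(a, b):
    """Jaccard similarity: shared attributes divided by all attributes."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_parts(catalog, part_id, n=3):
    """Return up to n parts sharing the most attributes with `part_id`."""
    target = catalog[part_id]
    scored = [
        (jaccard(target, attrs), other)
        for other, attrs in catalog.items()
        if other != part_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop parts with nothing in common.
    return [other for score, other in scored[:n] if score > 0]


# Hypothetical catalog: part id -> set of attribute tags.
catalog = {
    "brake-pad-a": {"brakes", "front", "vw-golf", "ceramic"},
    "brake-pad-b": {"brakes", "front", "vw-golf", "organic"},
    "brake-disc": {"brakes", "front", "vw-golf"},
    "oil-filter": {"engine", "vw-golf"},
}
print(similar_parts(catalog, "brake-pad-a"))
```

Jaccard is a swapped-in choice for illustration; any item-to-item similarity over the part data would fit the same slot.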

They mixed in some basic user actions, scored as 0 = didn't buy, 1 = browsed, 2 = bought. Later on the scoring became more elaborate, for example awarding points for items found through searching or placed on wishlists.
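One way to read the 0/1/2 scale is as rating levels per item. The sketch below turns raw (user, item, action) events into a ratings table, keeping the strongest action per item. Keeping the maximum is an assumption, and the weights for "searched" and "wishlisted" are hypothetical, since the talk didn't give them.

```python
from collections import defaultdict

# Points per action. "browsed" and "bought" follow the scale above;
# the other two weights are hypothetical.
ACTION_POINTS = {
    "browsed": 1,
    "bought": 2,
    "searched": 1,    # hypothetical weight
    "wishlisted": 1,  # hypothetical weight
}


def build_ratings(events):
    """Turn (user, item, action) events into {user: {item: score}}.

    Items without any event are implicitly 0. The strongest action
    per item wins, so buying outranks browsing.
    """
    ratings = defaultdict(dict)
    for user, item, action in events:
        score = ACTION_POINTS[action]
        ratings[user][item] = max(score, ratings[user].get(item, 0))
    return dict(ratings)


events = [
    ("session-1", "brake-pad-a", "browsed"),
    ("session-1", "brake-pad-a", "bought"),
    ("session-1", "brake-disc", "browsed"),
]
print(build_ratings(events))
```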

They used Redis because it makes recording user actions quick: each action simply pushes an additional score for an item, which then gets added to that item's total in the database.
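A Redis sorted set maps naturally onto "push an additional score for an item": the ZINCRBY command adds to a member's score, creating the member if needed. The sketch below assumes one sorted set per visitor under a hypothetical `scores:<session_key>` key. So it runs without a server, it uses a tiny in-memory stand-in that implements only ZINCRBY and ZSCORE. With redis-py you would pass a `redis.Redis(decode_responses=True)` client instead, whose `zincrby(name, amount, value)` and `zscore(name, value)` methods have the same shape.

```python
from collections import defaultdict


class InMemoryScores:
    """Stand-in for a Redis client, implementing only ZINCRBY and ZSCORE."""

    def __init__(self):
        self._sets = defaultdict(dict)

    def zincrby(self, name, amount, value):
        # Like Redis: a missing member starts at 0, then gets incremented.
        new = self._sets[name].get(value, 0.0) + amount
        self._sets[name][value] = new
        return new

    def zscore(self, name, value):
        return self._sets[name].get(value)


def record_action(client, session_key, item_id, points):
    """Add `points` to an item's score in this visitor's sorted set."""
    key = f"scores:{session_key}"  # hypothetical key naming scheme
    return client.zincrby(key, points, item_id)


client = InMemoryScores()
record_action(client, "abc123", "brake-pad-a", 1)  # browsed
record_action(client, "abc123", "brake-pad-a", 2)  # bought
print(client.zscore("scores:abc123", "brake-pad-a"))  # 3.0
```

Each increment is a single cheap command, which fits the talk's point that Redis is quick for adding user actions.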

One thing they needed to do was merge session keys after a user logs in, combining the pre-login session with the logged-in user's session, so as not to lose the data collected up to that point.
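The merge itself can be as simple as summing the scores of both sessions. The plain-Python sketch below assumes scores are stored per item and that summing is the desired merge rule; the talk didn't specify either. If the scores live in Redis sorted sets, the ZUNIONSTORE command (whose default aggregate is SUM) can do the same merge server-side.

```python
def merge_scores(anonymous, logged_in):
    """Merge the pre-login session's item scores into the user's scores.

    Scores for the same item are summed, so nothing collected before
    login is lost.
    """
    merged = dict(logged_in)
    for item, score in anonymous.items():
        merged[item] = merged.get(item, 0) + score
    return merged


before_login = {"brake-pad-a": 3, "oil-filter": 1}
user_scores = {"brake-pad-a": 1, "wiper-blade": 2}
print(merge_scores(before_login, user_scores))
```

After merging, the anonymous session's data can be discarded, since everything it held now lives under the user.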

Next up: figuring out which users are similar. Common techniques are Euclidean distance, Pearson correlation and cosine similarity. The problem was that computing these was slow, so they built an intermediary cache table in Redis.
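The three measures named above can be sketched in a few lines each, working on `{item: score}` dictionaries per user. Turning Euclidean distance into a similarity via `1 / (1 + distance)` is a common convention and an assumption here. The cache function only illustrates the idea of precomputing every pair once instead of per request; the talk kept its cache table in Redis, while this sketch uses a plain dict.

```python
from itertools import combinations
from math import sqrt


def euclidean_similarity(a, b):
    """1 / (1 + Euclidean distance) over items both users scored."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dist = sqrt(sum((a[i] - b[i]) ** 2 for i in shared))
    return 1 / (1 + dist)


def pearson(a, b):
    """Pearson correlation of the two users' scores on shared items."""
    shared = list(a.keys() & b.keys())
    n = len(shared)
    if n < 2:
        return 0.0
    xs = [a[i] for i in shared]
    ys = [b[i] for i in shared]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # no variation, so correlation is undefined
    return cov / (sx * sy)


def cosine(a, b):
    """Cosine similarity; items missing from one user count as 0."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def build_similarity_cache(ratings, measure=cosine):
    """Precompute the similarity of every pair of users once."""
    cache = {}
    for u, v in combinations(sorted(ratings), 2):
        cache[(u, v)] = cache[(v, u)] = measure(ratings[u], ratings[v])
    return cache


# Hypothetical ratings, e.g. as built from user actions.
ratings = {
    "ann": {"brake-pad-a": 2, "brake-disc": 1},
    "bob": {"brake-pad-a": 2, "brake-disc": 1, "oil-filter": 1},
    "cy": {"oil-filter": 2},
}
cache = build_similarity_cache(ratings)
print(round(cache[("ann", "bob")], 3))
```

The pairwise loop is quadratic in the number of users, which is exactly why computing it on every request is slow and why a precomputed table pays off.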

Some conclusions:

Redis is good for fast storing and painless calculations.
Content-based recommendations are good for big datasets.
Keep all the data you can keep.

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
