Recommendation?

This entry is an email from my tutor, received at 22:38 on the 16th of December. The presentation for this project was given at 15:00 the same day, after which we discovered that we needed to begin fleshing out the nitty-gritty details of our recommendation system.

It's not difficult to answer the question "what is a recommendation?": a recommendation is an item (of some sort) that is presented to a person for (a kind of) acceptance. If I recommend an action and you accept the recommendation, you perform the action. Yay!

But what makes a good recommendation? This might also be easy: A good recommendation is one with a high probability of success (i.e., likely to be accepted). This is a very deflationary notion of goodness and one subject to some obvious issues, e.g., food recommendations which are likely to be accepted are also likely to be bad for you. "Goodness" obviously has many dimensions, but where all the possible recommendations are likely harmless or otherwise neutral, "likelihood of acceptance" seems reasonable.

Beware! This assumption is very very dangerous! (How?) But we'll accept it for now. It, at least, is very operational, that is, it's definite in terms of a clear metric. We can measure the goodness of our recommendations by counting how often they are accepted. (Though there are trickinesses in practice.) (What are some?)
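(Here's how operational it is: the metric fits in a couple of lines of Python. The representation of outcomes below is just my choice for illustration.)

def acceptance_rate(outcomes):
    # outcomes: one boolean per recommendation made (True = accepted)
    return sum(outcomes) / len(outcomes) if outcomes else 0.0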

Now, how do we generate good recommendations? That's a bit tricky! (I mean, it's a bit tricky from the start. Everything is tricky over the long run.) We need a property, or rather a "feature", of the recommendations that is independent of acceptability and that we can measure directly on the items. (Probability of acceptance is a relational property, not one that inheres in the item.)

Some candidate features

We don't have any. Sorry.

Erk!

Yes, it's a bit off. Features are a very "item type specific" sort of thing. E.g., for food, deliciousness or saltiness are good candidate features, but those are not very good features of candidate friends.

This is a problem if we want to have a general theory of recommendation (i.e., one that can apply to multiple domains).

A New Thought

Let's go back and consider why we didn't use acceptability. The hypothesis is that it's because it's relational, i.e., acceptability is a relation between an item and a set of people. Which is fine! We can ask those people whether they accept the item and determine its acceptability.

Nothing wrong with that! It works great!

The problem is that we want to predict acceptability. So asking a bunch of people whether they would accept an item doesn't really work! We want to know before we make the recommendation!

Moreover, in general, we don't want to know that a particular recommendation is acceptable to 70% of people…we want it to be the case that it is acceptable to the person to whom we gave the recommendation!

(If a friend recommends a movie and you go see it and it's a horror flick and you HATE horror flicks, it's not going to fly with you if your ex-friend then says, "But it's got a 70% approval rating!!".)

We need individual-specific recommendations with good predicted rates of acceptance.

This leads us to a relational approach, but between the target person and the potential recommended item.

The Similarity Theory of Recommendation

The slogan of this theory is:

People tend to like things that are similar to things they already like.

In the movie example, if I like a movie, presumably there are things about the movie that made me like it. We might not know what they are, but we don't need to! If another movie is similar enough to the first then it might well have enough of those features that made us like the first that we'll like it too!

And here is the beauty of the theory: It automatically tailors itself to the domain. (Weeeeeeeeeeeelll sort of. We still need to identify the features which determine similarity.)

We can use past choices to predict future choices! Hurrah!
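To make "similar" concrete, here's a minimal sketch, assuming we represent each item as a dict of feature weights (that representation is my choice, not the only option) and use Euclidean distance, where smaller means more similar:

def euclidean_distance(a, b):
    # a, b: dicts mapping feature name -> weight; missing features count as 0
    features = set(a) | set(b)
    return sum((a.get(f, 0.0) - b.get(f, 0.0)) ** 2 for f in features) ** 0.5

So euclidean_distance({"horror": 0.9, "comedy": 0.1}, {"horror": 0.8, "comedy": 0.3}) is about 0.22, i.e., those two movies are near neighbours.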

The similarity theory applied to friends

BUT WAIT! If we follow this theory, you should be gathering information about existing friends and looking for people similar to them! Yikes!!!!

Of course, this isn't an issue, or rather, you've already done some feature/similarity engineering. I.e., you've determined that "being a student with similar interests" is the feature which predicts acceptance. Note that this is not a straight similarity theory for recommendation! (For that, we'd look at your interests and recommend similar interests.) But it's close enough.

(Although it does suggest that maybe what you should be gathering is my friends' interests :))

Two obvious problems

1) Both the similarity and modified similarity theories are dumb. I mean, really dumb.

They are super naive about psychology. To take an obvious example, if you just ate a donut, you might very much like that donut, but you probably don't want another one (maybe). After 10, certainly not. (Ok, maybe 30.) More to the point, people can get bored with the very same thing (or over-interested as anyone who's had to sit through 10,000 viewings of Frozen in a row while hanging out with a small child can attest). Many people have friendship networks with a range of interests and that diversity can be engaging.

Esp. at uni, when, for many people, the whole idea is to expand their experiences and preferences! Just as I don't want to just make friends from my home town or high school, I might well want to pursue new interests…interests I've never even heard of before.

2) Recommendations are bounded by the population of recommendable items.

Take Netflix (please). Its recommendations stink for me. They rarely turn up anything I want to see. Does their algorithm stink? Well, yes, but also there's not a lot I want to see in the Netflix library (and why am I still subscribed again?). The things closest to my interests in that library are pretty far away from what I want to see. With recommendations, it's not just relative distance that matters but absolute distance. "I want a hamburger." "The closest thing we have to that is steak tartare" "You mean…" "Yes. Raw beef. With raw egg. And capers. Plus it's mushy." "Check please!"

C'mon Bijan…this is just because you had to say nice things to us, right?

I must live my truth…which is to tell the truth :)

So, we're screwed?

Not at all! Just because similarity theories are dumb doesn't mean they aren't useful. They certainly are used. But they are extremely limited. Being aware of their limitations is helpful.

For example, when you look to explore other similarity metrics, you might consider using Euclidean distance in a different way rather than just subbing in Manhattan distance. Or, perhaps, you don't need vector analysis but might try something like, "Find all the people who are like me on my strongest interest, see if they have any other strong interest, and recommend that interest or people who share it."
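A rough sketch of that last heuristic (the 0.7 "strong interest" threshold and the data shapes are invented for illustration):

def recommend_new_interests(me, db, strong=0.7):
    # me: dict of interest -> weight; db: dict of person -> dict of interest -> weight
    strongest = max(me, key=me.get)
    suggestions = {}
    for person, interests in db.items():
        if interests.get(strongest, 0.0) >= strong:  # they're like me on my strongest interest
            for interest, weight in interests.items():
                if interest not in me and weight >= strong:  # a strong interest I don't have yet
                    suggestions[interest] = max(suggestions.get(interest, 0.0), weight)
    return sorted(suggestions, key=suggestions.get, reverse=True)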

Hey! You switched partway through from these things being headers to them being a humor-oriented Q&A format?!?

Yes, yes I did. And then got meta about it.

On to the algorithms!

That's my line!

Let's consider a simple interest-similarity recommendation algorithm:

Inputs:  DB = database of people and their weighted interests
             p = Person to give recommendations to
Output: list of recommended bumps (bumpers? bumpees?)

1: dists = list()
2: for each candidate, c, in DB:
3:     dists.append((c, distance(p, c, DB)))  # distance() retrieves the interests and computes the distance
4: dists.sort(key=lambda pair: pair[1])  # ascending by distance, so the closest candidates come first
5: return dists[0:10]  # i.e., the top 10; alternatively, keep everyone within some distance threshold
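For concreteness, here's a runnable Python version of the same thing, assuming the weighted-interest representation from the earlier sketch (a DB that maps each person to a dict of interest -> weight; the names here are mine, not from our actual codebase):

import heapq

def distance(p, c, db):
    # Euclidean distance over the union of the two people's interests
    a, b = db[p], db[c]
    return sum((a.get(f, 0.0) - b.get(f, 0.0)) ** 2 for f in set(a) | set(b)) ** 0.5

def recommend(db, p, n=10):
    # nsmallest keeps only the best n as it scans, instead of sorting everyone
    return heapq.nsmallest(n, ((distance(p, c, db), c) for c in db if c != p))

It returns (distance, person) pairs, closest first.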

Note that this involves a linear number of distance computations (not quadratic!!!!). That's still pretty brutal for a large DB. Quadraticness comes in if you want to precompute the distance lists or if everyone asks for recommendations at the same time.

The other issue is that lines 2 and 3 require pulling every person in the database into Python. If we do it person by person, it's slow (each DB access is expensive). If we do it all at once, we have a copy of the whole database in memory (for each request!).

Ideally, we'd like to push stuff into the database and even there avoid iterating over all the data. Linear can kill you when the data gets big!

One simple thought is to store pairwise distances in the database. Sure, it blows up the data…but isn't that what databases are good at? And distance is symmetric, so we only need to store one of distance(A,B) and distance(B,A). Our algorithm would change! We could just directly retrieve the top N (using LIMIT) or everyone within a threshold (using a range query). It needs some care to avoid making the DB either iterate over everything or compute large intermediate tables.

In this scenario, we'd do a linear scan each time we add a new person and save those distances. It means a delay after adding someone, but recommendations are (potentially) quick.
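A sketch of that scheme using sqlite3 (the schema and queries are purely illustrative; distance() is the helper from the sketch above):

import sqlite3

conn = sqlite3.connect("bump.db")
conn.execute("""CREATE TABLE IF NOT EXISTS distances
                (a TEXT, b TEXT, dist REAL, PRIMARY KEY (a, b))""")

def add_person(newbie, db):
    # the one-time linear scan: compute and store distances to everyone already in
    rows = [(min(newbie, c), max(newbie, c), distance(newbie, c, db))  # store each pair once
            for c in db if c != newbie]
    conn.executemany("INSERT OR REPLACE INTO distances VALUES (?, ?, ?)", rows)
    conn.commit()

def top_matches(p, n=10):
    # symmetric storage means p may sit in either column
    return conn.execute("""SELECT CASE WHEN a = ? THEN b ELSE a END, dist
                           FROM distances WHERE a = ? OR b = ?
                           ORDER BY dist LIMIT ?""", (p, p, p, n)).fetchall()

The recommendation itself is now a single query; whether the DB can answer it without scanning the whole table depends on indexing, which is exactly the "needs some care" part.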

Wait…I have an idea…it's there…almost…

YOU'RE RIGHT!! If we already have distances maybe we can use that info! We could grab a random person and compute the distance between newbie and rando. If rando is near newbie, get all the people close to rando. If they are close to rando and I'm close to rando, they are probably pretty close to me!

Contrariwise, if rando is far from me, all the people close to rando are also far from me. (mostly)

Thus if I'm willing to sacrifice some accuracy and completeness, I might be able to save on computation.
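A rough sketch of that pruning idea (thresholds invented; it reuses distance() and top_matches() from above). The "far from rando implies far from me" step is a heuristic, not a guarantee, which is exactly where the accuracy/completeness sacrifice comes in:

import random

def approx_matches(newbie, db, near=1.0, far=5.0, k=50):
    rando = random.choice([c for c in db if c != newbie])
    d = distance(newbie, rando, db)
    rando_circle = {c for c, _ in top_matches(rando, n=k)}
    if d <= near:                      # newbie is close to rando...
        pool = rando_circle            # ...so rando's neighbours are promising
    elif d >= far:                     # newbie is far from rando...
        pool = set(db) - rando_circle - {newbie}  # ...so skip rando's cluster
    else:
        pool = set(db) - {newbie}      # ambiguous: fall back to the full scan
    return sorted((distance(newbie, c, db), c) for c in pool)[:10]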

THAT'S IT!!! WE'RE DONE?!

Well, probably not. This was only an initial idea and it's clearly not perfect. It still seems to involve a lot of "database churn" on adding a new person. But it is suggestive.

You're 100% no fun at parties, are you, Bijan?

I bring so much cake.

You're procrastinating on something, aren't you, Bijan?

I'm sure I don't know what you're talking about.

Cheers, Bijan.