Jan 2, 2008

Current Recommender Types

There are a number of types of recommender systems currently available. They vary significantly in their mode of action and ultimate user experience. In terms of results, recommender systems are expected to offer a sufficient number of good-quality recommendations ('New Favorites'). Beyond that, the quality of the results also depends on minimizing false positives ('Trust Busters') and false negatives ('Missed Opportunities'). In other words, users should not be shown inappropriate results, and should not be denied appropriate ones.
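These quality measures correspond to the standard notions of precision and recall. A minimal sketch with invented item names, just to make the mapping concrete:

```python
# Hypothetical data: what the system recommended vs. what the user
# would actually have enjoyed.
recommended = {"item_a", "item_b", "item_c", "item_d"}
relevant = {"item_a", "item_c", "item_e"}

true_positives = recommended & relevant    # 'New Favorites'
false_positives = recommended - relevant   # 'Trust Busters'
false_negatives = relevant - recommended   # 'Missed Opportunities'

# Precision falls with false positives; recall falls with false negatives.
precision = len(true_positives) / len(recommended)
recall = len(true_positives) / len(relevant)

print(precision, recall)  # 0.5 and roughly 0.67
```

A system can score well on one measure while failing the other, which is why both 'Trust Busters' and 'Missed Opportunities' have to be tracked.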

The quality of the user experience is also influenced by the time and effort required to give the recommender system enough information to make minimally reasonable recommendations. Users are sometimes asked to fill out lengthy questionnaires, or applications require that a user's history of choices or ratings be observed and recorded; it takes time and effort before things start working well. These days, users don't like to wait for anything and expect immediate gratification. The challenge of delivering useful results immediately after a quick registration, before much is known about the user, is the 'cold start' problem. Existing applications that permit a cold start lack anything close to the information, explicit or implicit, required to make accurate, high-quality recommendations.

There are a number of strategies that recommender systems employ today. These include:

  1. Non-personalized: "Web 1.0" technology offering the highest rated or most popular items to all users. No intrinsic personalization, poor quality results, but immediate.
  2. Demographic: Requires some knowledge about the user in order to group similar users together (e.g. age, gender, area code, and similar features). Poor quality recommendations and low personalization, though slightly better than the above. May require "private" information, and depending on the length of the questionnaire, registration can take time.
  3. Simple answer or ratings matching: Matches users based on explicit matching of answers, selections, ratings, etc. Makes recommendations with extremely limited scope, many missed opportunities, requires answers or observations.
  4. Heuristics and model-based approaches (Bayesian and Markov probabilistic models, decision trees, neural nets, etc.): An application must collect a large amount of user-item preferences or user/item features before quality recommendations are possible. This approach attempts to identify the underlying logic behind a user's choices (or, in the case of heuristics, applies certain assumptions).
  5. User-based collaborative filtering: Similarity of historical choices or actions allows the application to find highly correlated users. The assumption is that users who agreed in the past will tend to agree in the future. Limited immediate results; most items will not be rated or answered (sparsity). Users with non-typical opinions or taste (the 'long tail') may not get good recommendations.
  6. Item-based collaborative filtering: Finds items that tend to be preferred together. Limited immediate results, and users with non-typical opinions or taste may not get good recommendations.
  7. Content-based: Finds items with similar features (keywords, author, genre, i.e. "DNA") that match the known preferences of a user. Items must be properly and thoroughly represented as a set of features, which generally requires a large staff. Generally limited to a single domain, as there may be few cross-domain features. Limited immediate results.
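To make strategy 5 concrete, here is a toy sketch of user-based collaborative filtering. All user names and ratings are invented; sparsity shows up as missing keys in the ratings dictionaries:

```python
import math

# Hypothetical explicit ratings (1-5). Unrated items are simply absent,
# which is the sparsity problem described above.
ratings = {
    "alice": {"film1": 5, "film2": 4, "film3": 4},
    "bob":   {"film1": 5, "film2": 5},              # has never rated film3
    "carol": {"film1": 1, "film2": 2, "film3": 5},
}

def cosine_similarity(a, b):
    """Similarity over the items both users rated; 0.0 if no overlap."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def recommend(user):
    """Suggest well-rated items from the most similar other user."""
    others = [(cosine_similarity(ratings[user], ratings[u]), u)
              for u in ratings if u != user]
    _, neighbor = max(others)
    return [item for item, r in ratings[neighbor].items()
            if item not in ratings[user] and r >= 4]

print(recommend("bob"))  # alice agrees with bob historically, so: ['film3']
```

Note how little it takes to break this: if bob and carol had only one rated item in common, cosine similarity over a single dimension would report perfect agreement, which is the sparsity and 'long tail' weakness in miniature.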

There are many recommendation engines and recommender applications available on the internet and many more seem to be popping up all the time. Currently they all have severe limitations and offer mediocre to poor quality results when compared to, say, recommendations by a best friend. Examples of current applications include:

  • eHarmony requires a very lengthy questionnaire and uses a proprietary empirical heuristic to match people romantically. Its success depends on the quality of the questions and the heuristic, the person's willingness to answer truthfully, and the person's willingness to spend a few hours to register. Mixed results are reported, but there is certainly an advantage over matchmaking sites that allow daters to make their own bad choices.
  • Pandora and Last.fm both recommend music though they do so in different ways. Pandora's large staff must determine the separable features ("DNA") of a song and observe a user's choices in order to extract common features of a user's preference. Last.fm seems to work by grouping users of similar taste. Both suffer from reduced choice diversity for slightly different reasons. Both are mildly satisfactory, but also suffer from excessive false negatives and false positives, and require recording your existing preferences. Two roommates using the same account will likely see poor results.
  • Amazon.com's recommendations work by observing a user's choices and activity and grouping items (books, CDs, DVDs, etc.) that tend to be chosen or viewed by the same users. After viewing or choosing items, you are presented with: "users who liked X (the currently viewed item) also liked Y (a correlated item)." As a typical failure pattern, users who buy for multiple people, such as children or friends, will likely see poor results.
  • Social DNA sounds like it works similarly to Pandora, but the granularity is significantly greater, and unlike eHarmony, there seems to be no heuristic: matching is all or nothing (i.e. explicit ratings and questions). This is expected to lead to extremely high false negatives and relatively few true positives. Since matches will likely occur on only a tiny fraction of possible DNA (highly limited explicit information yields a sparse matrix), and considering the complexity of human beings, the results will be mostly false positives.
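The Amazon-style item-to-item pattern described above can be sketched in a few lines. The purchase "baskets" are invented, and real systems use similarity scores rather than raw counts, but the co-occurrence idea is the same:

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: each set is one user's chosen items.
baskets = [
    {"book_a", "book_b"},
    {"book_a", "book_b", "cd_x"},
    {"cd_x", "dvd_y"},
    {"book_b", "dvd_y"},
]

# Count how often each pair of items is chosen by the same user.
pair_counts = Counter()
for basket in baskets:
    for x, y in combinations(sorted(basket), 2):
        pair_counts[(x, y)] += 1
        pair_counts[(y, x)] += 1  # co-occurrence is symmetric

def also_liked(item, n=2):
    """'Users who liked `item` also liked...' - top-n co-occurring items."""
    related = Counter({y: c for (x, y), c in pair_counts.items() if x == item})
    return [y for y, _ in related.most_common(n)]

print(also_liked("book_a"))  # ['book_b', 'cd_x']
```

The gift-buying failure mode is visible here too: one shopper buying for two different people merges both tastes into a single basket, polluting the co-occurrence counts for everyone.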

In order to get relatively high quality and accurate recommendations, a large amount of explicit ratings/choices (and/or possibly implicit activity) must be recorded. This is extremely hard to do: users are less likely to maintain interest while the machine learns, and this will be increasingly true in the future. Currently, users must be content with mediocre results, but a trade-off will develop between accuracy/quality and user patience.

Another frequent limitation is that users can act maliciously or inappropriately to skew results. Due to the limitations of current applications, users may feel the need to modify or exaggerate their choices in order to get better results. On the other side, users who want to promote certain items to others may give or encourage false ratings, views or descriptions (called 'shilling') through manual or automated efforts or attacks. Privacy also becomes an issue, as users may explicitly or implicitly reveal private information about themselves: demographics, personal details, taste, ratings, opinions, etc. System administrators (and possibly hackers) will have free access to this data.

Accurate, high quality, robust and broad scope recommendations have been the holy grail for internet futurists for quite some time, though we are still a long way from that goal. The problem is largely technical: recommendations are a really tough problem. Mathematics/statistics, clever algorithms and artificial intelligence are stretching the results to the maximum, given the poor quality data available from users during registration or interaction with the application. The solution is to get high quality data about the user's identity or individuality and match based on that, rather than matching based on a user's history. The problem is that teaching the machine about the core identity of a person is science fiction. Or is it?
