A Great Recommender System
And a Fantastic Application of Collective Intelligence
BibTip is a signal example of the potential inherent in harnessing collective intelligence to serve the needs of the library. BibTip uses Andrew Ehrenberg's "Repeat Buying Theory" as a framework for statistically analyzing user search behavior. Repeat Buying Theory is a highly successful, well-tested statistical framework for describing the regularity of consumers' repeat-buying behavior within a distinct period of time.
The developers of BibTip at Karlsruhe University in Germany very skillfully adapted Ehrenberg's Repeat Buying Theory to the session-based search behavior of library OPAC users. The key is that BibTip records only the inspection of the full details of an individual bib record selected from a larger list of search results. It does not "follow" the user. In this framework, clicking on and reading the full details of a given record is viewed as an economic choice. That is, choosing one record over all of the others in a given list is very similar to an individual's choice to purchase one thing over another during a given trip to the store. There is a real cost in time (i.e. an economic cost) each time a user selects and views a record. It can be assumed that this "search cost" is high enough that a user is willing to view the details of only those records which are truly of interest. Users, in effect, are self-selecting: users with common interests will select the same documents, and, since recommendations are only displayed on the full details view, we can surmise that recommendations are only offered to interested users.
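To make the recording step concrete, here is a minimal sketch (all names and the log format are hypothetical, not BibTip's actual data model): only full-details views are logged, keyed by session, and nothing else about the user's path is followed.

```python
from collections import defaultdict

# Hypothetical event log: a BibTip-style recorder notes only that a
# full-details view of a record occurred in a session. Searches,
# result lists, and the user's other actions are not logged.
views = [
    ("session-1", "record-X"), ("session-1", "record-Y"),
    ("session-2", "record-X"), ("session-2", "record-Z"),
    ("session-3", "record-X"),
]

# Group the viewed records by session -- the raw material for
# the pair analysis.
sessions = defaultdict(set)
for session_id, record_id in views:
    sessions[session_id].add(record_id)
```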
In order to build relationships among documents, BibTip analyzes record pairs. For each record X that has been viewed in the full details view of the OPAC, a "purchase history" is built. This is simply a list of all of the sessions in which record X has been viewed. Record X is then compared with every other record Y that has been viewed in the same session as X. For each pair of records (X,Y) viewed in the same session, a second purchase history is built. The number of users who have viewed record X and another record Y in the same session is statistically analyzed, and the probability of a "co-inspection" of records X and Y in a given session is calculated. A recommendation for record X (that is, "users who liked X also liked…") is created when record Y has been viewed in the same session more often than can be expected from random selections (in statistics, the recommended record would be an "outlier").
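The bookkeeping described above - per-record purchase histories plus co-inspection counts for each record pair - can be sketched in a few lines. This is a simplified illustration with made-up session data, not BibTip's implementation:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical per-session view sets: the record IDs viewed
# in full detail during each OPAC session.
sessions = {
    "s1": {"X", "Y", "Z"},
    "s2": {"X", "Y"},
    "s3": {"X", "Z"},
    "s4": {"X", "Y"},
}

# "Purchase history" of each record: the sessions in which it was viewed.
history = defaultdict(set)
# Co-inspection counts for each unordered record pair (X, Y).
pair_counts = defaultdict(int)

for session_id, records in sessions.items():
    for record in records:
        history[record].add(session_id)
    for a, b in combinations(sorted(records), 2):
        pair_counts[(a, b)] += 1
```

With this toy data, `history["X"]` holds all four sessions, and `pair_counts[("X", "Y")]` is 3 - the raw number that the statistical outlier test is then applied to.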
This repeat-buying theory is remarkably good at automatically determining relevant recommendations for a given item. This is because the theory actually models the "noise" created by random clicks on records in a list of search results - that is, groups of records that are clicked on but which are not actually related. Diving into a record and backing out again quickly falls well within the repeat-buying model. Recommendations are based upon those records that fall outside of regular random co-browsing - the outliers. To quote Dr. Geyer-Schulz (one of the developers of BibTip):
"Ehrenberg's theory faithfully models the noise part of buying processes. That is, repeat-buying theory is capable of predicting random co-purchases of consumer goods. Intentionally bought combinations of consumer goods--a six-pack of beer, spareribs, potatoes, and barbecue sauce for dinner, for example--are outliers. In this sense, Ehrenberg's theory acts as a filter to suppress noise (stochastic regularity) in buying behavior." [From: Andreas Geyer-Schulz, Andreas Neumann and Anke Thede. An Architecture for Behavior-Based Library Recommender Systems. Information Technology and Libraries 22(4), p.169 (2003).]
That is, *most* of the recorded transactions are noise. Search terms and strategies are irrelevant. The co-browsing that lies outside of the usual background noise is what needs to be examined for potential recommendations.
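One way to see the noise filter in action is a simplified sketch in the spirit of repeat-buying theory: Ehrenberg's model describes random co-inspection counts with a logarithmic series distribution (LSD), and recommendations are the pairs too far out in its tail to be noise. The data, names, and the 5% cutoff below are my illustrative choices - BibTip's production procedure differs in its details:

```python
import math

def fit_lsd_q(mean):
    """Find the LSD parameter q whose mean matches the sample mean.
    The LSD mean -q / ((1 - q) * ln(1 - q)) grows monotonically in q,
    so simple bisection works."""
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(100):
        mid = (lo + hi) / 2
        if -mid / ((1 - mid) * math.log(1 - mid)) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def lsd_tail(k, q):
    """P(K >= k) under the LSD, where P(j) = -q**j / (j * ln(1 - q))."""
    return 1.0 - sum(-q**j / (j * math.log(1 - q)) for j in range(1, k))

# Hypothetical co-inspection counts of record X with other records:
# mostly ones and twos (the random-click noise) plus one conspicuous
# partner that users keep viewing in the same session as X.
counts = {"Y": 1, "Z": 2, "W": 1, "V": 1, "U": 20}

# Fit the noise model to the observed counts, then keep only the
# pairs that are implausible under it (tail probability below 5%).
q = fit_lsd_q(sum(counts.values()) / len(counts))
recommended = [r for r, k in counts.items() if lsd_tail(k, q) < 0.05]
print(recommended)  # only the outlier "U" survives the noise filter
```

The point of the design is exactly what the quote above describes: the distribution is fitted to *all* the counts, noise included, so the model itself tells you which co-inspections are too frequent to be accidental.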
It takes some time for enough data to be collected before good recommendations are available for a substantial part of a collection, but what is the hurry? The longer the algorithm runs, the better your recommendations become. The more users you have, the better your recommendations become. Time is on our side in this case ;-)