Thursday, April 2, 2009

The Boston College Libraries, LogiInfo and Contextual Delivery of Library Services


Once again, this is my first post in quite a long time. But I've been busy. I think I will keep a sporadic log here of the ways in which we develop new Web services at the Boston College Libraries, using a combination of LogiInfo, Java Web apps, Web services, Perl CGI, Ajax, perhaps some ASP.NET and, frankly, whatever else works.


We are calling this endeavor the Aerie Project. The purpose of the project is simple and open-ended: to create a framework/(dare I say it?) portal/dashboard/iLibrary to deliver online library services that take into account the student/faculty/staff member's context - and to decouple these services from the Aerie framework and reuse them in other environments to meet the library's overall service goals. Simple, right? Everyone's doing it. Web/Library 2.0 and all.


For me, one of the inspirations for this project was Lorcan Dempsey's 2003 article The Recombinant Library. In it, Dempsey presents an excellent distillation of the development of information "portals" and the problems they are meant to address, as well as some of the problems such portals themselves present. With much foresight, he muses on the possibilities presented by decoupling services from specific library systems and making them available in other contexts - for example, in course management systems or campus portals/intranets.


He says, "The major development issue facing libraries today is how to create a network environment which is rich in services and which meshes with user behavior in useful and convenient ways."


Indeed. That is it in a nutshell, and, despite the fact that Dempsey's article is more than five years old, this contention still holds true.


The difference, perhaps, is that we - people, not just librarians - are much, much more aware of the novel ways that decoupled, granular data services can be combined and reused. Think mashups, etc. When Dempsey was writing in 2003, Web Services were, of course, just emerging.


In any case, when I first read the article, the conceptual framework Dempsey provided really helped me to begin to think coherently (I hope) about possible ways to provide richer, more flexible services that can be woven into user behavior more easily.


If you think about it, we - as a University Library - actually know a hell of a lot about our users. Not only do we know what books they have checked out (now and in the past), we know their current schedule, their major(s)/minor(s), what degree(s) they are pursuing, where they are from, where they currently live, and what events are going on in their department/major area of study. If I took the time to think some more I could probably come up with a lot more stuff we know (or can infer).


So, what do we do with all of this juicy info?


This is just the first set of thoughts and ideas. There is a lot more...

Thursday, August 28, 2008

Web 2.0 Expo, New York


Haven't posted in quite a while. I've been too busy at work and home to get time to write. BUT, I'm going to the Web 2.0 Expo in New York City (Yeeeeee Haa!), from Sept. 16-19 this year. It's a "gathering of technical, design, marketing, and business professionals who are building the next generation web." It should be a very inspiring gathering. The week will be filled with designers, developers, entrepreneurs, marketers, business strategists, venture capitalists, and, hopefully, a few people like me: trying to figure out how best to leverage the enormous potential of Web 2.0 for serious academic and research purposes. We'll see...

Friday, May 16, 2008

BibTip Recommender System: How it works


A Great Recommender System
And a Fantastic Application of Collective Intelligence


BibTip is a signal example of the potential inherent in harnessing collective intelligence to serve the needs of the library. BibTip uses Andrew Ehrenberg's "Repeat Buying Theory" as a framework to statistically analyze user search behavior. Repeat Buying Theory is a highly successful and well-tested statistical framework for describing the regularity of repeat-buying behavior of consumers within a distinct period of time.


The developers of BibTip at Karlsruhe University in Germany very skillfully adapted Andrew Ehrenberg's Repeat Buying Theory to the session-based search behavior of library OPAC users. The key is that BibTip only records the inspection of the full details of an individual bib record selected from a larger list of search results. It does not "follow" the user. In this framework, clicking on and reading the full details of a given record is viewed as an economic choice. That is, the choice of one record over all of the others in a given list is very similar to an individual's choice to purchase one thing over another during a given trip to the store. There is a real cost in time (i.e., an economic cost) for the user each time he/she selects and views a record. It can be assumed that the "search cost" to a user is high enough that he/she is willing to view the details of only those records which are truly of interest. Users, in effect, are self-selecting. That is, users with common interests will select the same documents, and, since recommendations are only provided to users from the full details view, we can surmise that recommendations are only offered to interested users.


In order to build relationships among given documents, BibTip analyzes record pairs. For each record X that has been viewed in the full details view of the OPAC, a "purchase history" is built. This is simply a list of all of the sessions in which record X has been viewed. Record X is then compared with all other records (Y) which have been viewed in the same session as X. For each pair of records (X,Y) that have been viewed in the same session, a second purchase history is built. The number of users who have viewed record X and another record Y in the same session is statistically analyzed and the probability of a “co-inspection” of records X and Y in a given session is calculated. A recommendation for record X (that is, "users who liked X also liked…") is created when record Y has been viewed in the same session as X more often than can be expected from random selections (in statistics, the recommended record would be an "outlier").
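To make the bookkeeping concrete, here is a minimal sketch of how the two kinds of "purchase history" might be built from session logs. This is my own illustration, not BibTip's code: the function name `build_histories`, the shape of the `sessions` input and the sample data are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_histories(sessions):
    """Build per-record and per-pair "purchase histories".

    `sessions` is a hypothetical input: a dict mapping a session id to the
    list of record ids whose full-details view was opened in that session.
    """
    record_history = defaultdict(set)  # record -> sessions in which it was viewed
    pair_history = defaultdict(set)    # (x, y) -> sessions in which both were viewed
    for session_id, records in sessions.items():
        # Count a record once per session, however often it was reopened.
        unique = sorted(set(records))
        for record in unique:
            record_history[record].add(session_id)
        # Every unordered pair viewed in the same session is a co-inspection.
        for x, y in combinations(unique, 2):
            pair_history[(x, y)].add(session_id)
    return record_history, pair_history


# Made-up sample data for illustration only.
sessions = {
    "s1": ["A", "B", "C"],
    "s2": ["A", "B"],
    "s3": ["B", "C"],
    "s4": ["A", "B", "A"],
}
record_history, pair_history = build_histories(sessions)
```

The sizes of these sets are the raw counts that the statistical step then works on; nothing about the user beyond the anonymous session is kept.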


This “repeat buying theory” is remarkably good at automatically determining relevant recommendations for a given item. This is because the theory actually models the "noise" created by random clicks on records in a list of search results - that is, groups of records that are clicked on, but which are not actually related. Diving into a record quickly and backing out quickly falls well within the repeat-buying theory model. Recommendations are based upon those records that fall outside of regular random co-browsing - the outliers. To quote Dr. Geyer-Schulz (one of the developers of BibTip):

"Ehrenberg's theory faithfully models the noise part of buying processes. That is, repeat-buying theory is capable of predicting random co-purchases of consumer goods. Intentionally bought combinations of consumer goods--a six-pack of beer, spareribs, potatoes, and barbecue sauce for dinner, for example--are outliers. In this sense, Ehrenberg's theory acts as a filter to suppress noise (stochastic regularity) in buying behavior." [From: Andreas Geyer-Schulz, Andreas Neumann and Anke Thede. An Architecture for Behavior-Based Library Recommender Systems. Information Technology and Libraries 22(4), p.169 (2003).]

That is, *most* of the given transactions are noise. Search terms and strategies are irrelevant. The co-browsing of records that lies outside of the usual background noise is the browsing that needs to be examined for potential recommendations.
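The "keep only the outliers" idea can be sketched in a few lines. To be clear, this is a deliberately simplified stand-in and not Ehrenberg's model or BibTip's implementation: I assume that random co-browsing behaves like independent views, so the expected number of co-inspections is n_x * n_y / N, and I use a Poisson tail probability as the noise filter. The function names, the `alpha` threshold and the counts are all hypothetical.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def recommend(record_counts, pair_counts, total_sessions, alpha=0.01):
    """Keep only record pairs co-viewed more often than chance would explain.

    Assumption (mine, for illustration): under pure noise, views of x and y
    are independent, so the expected co-inspection count is n_x * n_y / N.
    """
    recommendations = {}
    for (x, y), observed in pair_counts.items():
        expected = record_counts[x] * record_counts[y] / total_sessions
        # A tiny tail probability means noise alone is an unlikely explanation.
        if poisson_tail(observed, expected) < alpha:
            recommendations.setdefault(x, []).append(y)
            recommendations.setdefault(y, []).append(x)
    return recommendations


# Made-up counts: Z is a very popular record, so it turns up next to X a lot
# by chance alone; Y is rare but keeps appearing in the same sessions as X.
record_counts = {"X": 20, "Y": 25, "Z": 500}
pair_counts = {("X", "Y"): 10, ("X", "Z"): 11}
recs = recommend(record_counts, pair_counts, total_sessions=1000)
```

With these numbers the X/Z pair is co-viewed slightly more often than X/Y, yet only X/Y survives the filter, because a popular record is expected to show up alongside almost anything. That is the sense in which raw co-occurrence counts are mostly noise.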

It takes some time for enough data to be collected so that good recommendations are available for a substantial part of a collection, but what is the hurry? Of course, the longer you have the algorithm running, the better your recommendations become. The more users you have, the better your recommendations become. But, time is on our side in this case ;-)


Frustratingly, for all the talk in library land about the features that should be included in next generation catalogs, I rarely find anything that convinces me that librarians understand that collecting/harvesting and re-using user (and usage) data is the key to most (if not all) of the services we want these new catalogs to provide. Without seriously thinking about the implications of harnessing collective intelligence – and taking steps *now* to build systems that do - we are not going to get very far. BibTip as a service is a big step in the right direction.

Wednesday, April 23, 2008

The Next Generation Legacy Library Catalog

I just got around to reading the 2007 Library Technology Report on Next Generation Catalogs. Frankly, it's really discouraging. Do we have any vision at all?!? Is this all we are asking our vendors for? In the report introduction, Marshall Breeding notes helpfully that
Each library or developer of library automation software has its own view of what constitutes a next-generation library catalog...Each of the next-generation catalogs or interfaces follows a unique approach. The common thread among these products involves a desire to go far beyond the capabilities of the legacy catalogs and give users more powerful and appealing tools. (p.14)

I see. As usual, we are discussing "Next-Generation" catalogs as if they should be local inventory systems/catalogs with the following tools: facets, relevancy ranking, federated searching, RSS feeds, "user contributions" (ratings, tags, rankings) and Did You Mean (spell-checking).

Aaargh! Where is the user in all of this stuff? How do we leverage these "next-gen" library applications to help us develop next-gen services? Do we really think that modern-looking interfaces and a bunch of tools lifted from Web 2.0 sites constitute next-gen anything?!?

But wait! Wait just a minute! Here it is, the gem I've been looking for: Mr. Breeding notes - under the heading "Recommendations" (as in Amazon-like book recommender services) - that

A common feature, especially in the e-commerce arena, involves proactively providing information about related materials. Amazon.com, for example, sports a prominent recommendation feature: "Users that bought X also bought Y." ...The challenge in the library environment might involve finding the user-behavior data on which to base these associations for recommendation. (p.13)

Yes! This is exactly what we need! Real user-behavior data. If we can only "find" it. God, but where could we find it? Well, let me page forward in the report and see what everyone is doing to unearth this mysterious kind of data...

[page, page, page ... page ...]

Ummm...

[page ... page, page ...]

It must be here somewhere...

[page, page ...]

[page]

...

[page ...]

Saturday, April 19, 2008

Web 2.0 and Librarians: A Fundamental Misunderstanding

Alarmingly, I have yet to read in a library publication (or journal, or blog or whatever) anything that convinces me that librarians actually understand what exactly is fundamental about the myriad of so-called Web 2.0 technologies. Indeed, I am often unconvinced that the library writer has anything but the most superficial understanding of Web 2.0 and its implications for library services. Almost without exception, the articles and postings point out something like [little air quotes] ...blah, blah, blah Web 2.0 is a model for Library 2.0 and tells us libraries should tailor services to meet the information needs and desires of users (is this news!?!?) and Web 2.0 technologies are good because - wow! check it out - they allow us to tailor services to individuals which is what we should be doing in the library and just look at Amazon and YouTube, people obviously like this kind of stuff. So, we need to add a blog to our web site and maybe some RSS feeds and a Wiki so users can tell us what they want and reviews of stuff would be great as well and presto we are Weblibrary 2.0.

But, the thing we should take away from Web 2.0 is not really a grab-bag of individual services we can use to de-crapify our Web sites. This is what many (if not most) librarians clearly fail to recognize. Web 2.0 is fundamentally about what underpins all of these neat new services and Web sites:


It's the data, stupid.


If there is a new library service model that is informed by Web 2.0 (Library 2.0, or whatever you want to call it this year), it will have to be data driven. Let me say it again, so that the people in the back can hear:

ANY NEW LIBRARY SERVICE MODEL THAT IS TRULY INFORMED BY WEB 2.0 WILL HAVE TO BE DATA DRIVEN!!!


Library 2.0, in all its glorious vagueness, is not about data. That much is clear. Nonsense like the following drives me nuts:

While not required, technology can help libraries create a customer-driven, 2.0 environment. [what?!?] Web 2.0 technologies have played a significant role in our ability to keep up with the changing needs of library users. Technological advances in the past several years have enabled libraries to create new services that before were not possible, such as virtual reference, personalized OPAC interfaces, or downloadable media that library customers can use in the comfort of their own homes. This increase in available technologies gives libraries the ability to offer improved, customer-driven service opportunities. (Library 2.0: Service for the next-generation library)


How can we say with a straight face that technology is not required? Without technology - without data - how will acolytes of Library 2.0 divine the "changing needs of library users"? An online survey? How about Survey Monkey embedded in a Wiki? Oh, please.

If we want to create new library services based upon the Web 2.0 model, we need real user (and usage) data. This is something that libraries and existing library applications are notoriously bad at collecting and maintaining. But, it is the direction in which we must move.

We must really use the power of the web to harness collective intelligence. The best web applications do just this. Amazon, Flickr and Yahoo are obvious examples. These sites actively invite user participation. On virtually every page, users are given many options to contribute. Furthermore, users of these systems are able to create a "personality" through which they can engage other users. In most cases, they can build dynamic and detailed profiles, including nicknames, interests, demographic information and photos. More importantly, such systems track *all* user activity and leverage it to produce better search results and targeted services.
A truly revolutionary library system should involve users both explicitly - through reviews, tags, ratings, messages, etc - and implicitly, by aggregating user data as a side-effect of their use of the application.

If we want to make some exploitable inferences about user needs and user behavior, we need data. Web 2.0 technologies give us numerous examples of how to get user data, and the services a given Web site offers are built upon that data.

The data - not the service - is fundamental.

Friday, April 18, 2008

So, what is this all about? ...

...it's about what I think and believe about technology in libraries in particular and about other things in general (that is, whatever I decide at the moment). There are a few specific things in the air in Libraryland now that motivated me to start this blog:

1.) The suckitude of OPACs (Library Catalogs).
2.) The debate about the suckitude of OPACs.
3.) Buzzwords like Library 2.0.
4.) Google taking over the world (which, it turns out, might not be as bad as everyone once thought).
5.) The real need for libraries to re-conceptualize and build, from the ground up, a service-based technological infrastructure.
6.) The rise of systems like Primo, Endeca and Aquabrowser as a (partial) answer to the problems presented by OPACs and the proliferation of online resources in general.

I have some opinions on these matters, as you might guess...