“People who looked at this thing, also looked at this stuff…”

We’ve had serendipity suggestions on the OPAC for nearly 7 years now, but they’ve been based entirely around the physical collection in the library.
After Friday’s Skype chat to the SPLURGE Hackfest, I got to thinking about how we can hook the e-stuff into the recommendations, so I’ve spent the weekend gathering data from our library management system, our link resolver and our EZProxy logs to see what happens if they all go into the same melting pot.
It’s a very rough & ready “crappy prototype”, but you can have a play around with it here. If you get an empty page, click on the “pick random item” link until something interesting happens.
At the moment, the recommendations are being built from a database of just over 5 million events (approx 70% of those are item loans and the rest are accesses of online journals). If you take the “Midwifery” journal as a starting point, you’ll get a list of the other books and journals that people have looked at. The algorithm behind it is the same one I’ve discussed previously.
If you hover over a title, you’ll see the usage info breakdown, e.g. “42 / 56” means that 56 different users in total have looked at the recommended item, and 42 of those also looked at the item we’re generating the recommendations for.
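The post leaves the algorithm to earlier write-ups. What follows is only a minimal sketch of a plain co-occurrence count that would produce figures like "42 / 56". For a target item, it collects the set of users who looked at it. Then, for every other item, it counts how many of that item's users are in the set. The `(user_id, item_id)` event tuples and the `recommend` function are hypothetical. Ranking by the share of overlapping users is also an assumption, not necessarily how the prototype orders its results.

```python
from collections import defaultdict

def recommend(events, target, limit=10):
    """Hypothetical sketch of 'people who looked at target also looked at...'.

    events: iterable of (user_id, item_id) pairs, e.g. loans or journal accesses.
    Returns (item_id, overlap, total) tuples, where `total` is the number of
    distinct users of item_id and `overlap` is how many of them also used target.
    """
    users_by_item = defaultdict(set)
    for user, item in events:
        users_by_item[item].add(user)

    target_users = users_by_item.get(target, set())
    results = []
    for item, users in users_by_item.items():
        if item == target:
            continue
        overlap = len(users & target_users)
        if overlap:
            results.append((item, overlap, len(users)))

    # Assumed ordering: highest share of overlapping users first, then raw overlap.
    results.sort(key=lambda r: (r[1] / r[2], r[1]), reverse=True)
    return results[:limit]

events = [("u1", "Midwifery"), ("u1", "BookA"), ("u2", "Midwifery"),
          ("u2", "BookA"), ("u3", "BookA"), ("u3", "BookB")]
for item, overlap, total in recommend(events, "Midwifery"):
    print(f"{item}: {overlap} / {total}")  # prints "BookA: 2 / 3"
```

Loans and journal accesses can share the same event list because both reduce to "this user touched this item". Weighting one type against the other would need an extra step that this sketch leaves out.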
I’ve not done any de-duping, so you might get the same journal title being repeated (once for the print ISSN and once for the e-ISSN), and I’ve not included any ebook usage data yet. I’ve also avoided merging the two lists together until I can figure out a suitable way of weighting book loans against online journal usage.
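One possible way to handle the print/e-ISSN duplication is sketched below. The prototype doesn't do this yet. The idea is to map both ISSNs of a title onto a single key before counting, for example the linking ISSN (ISSN-L) that the ISSN network assigns to cover a title's different media versions. The lookup table and its numbers are hypothetical placeholders, and so is the `users_per_title` helper.

```python
from collections import defaultdict

# Hypothetical lookup table: both the print ISSN and the e-ISSN of one title map to
# a single linking ISSN. These numbers are placeholders, not real journals.
ISSN_TO_LINKING = {
    "1234-5678": "1234-5678",  # print ISSN
    "8765-4321": "1234-5678",  # e-ISSN of the same title
}

def users_per_title(events, issn_to_linking):
    """Group distinct users by title, folding print and e-ISSNs together.

    Merging happens on the user sets rather than on counts, so a student who
    used both the print and the electronic version is still counted once.
    Items missing from the table (e.g. books) pass through unchanged.
    """
    users = defaultdict(set)
    for user, item in events:
        users[issn_to_linking.get(item, item)].add(user)
    return users

events = [("u1", "1234-5678"), ("u1", "8765-4321"), ("u2", "8765-4321")]
print({title: len(u) for title, u in users_per_title(events, ISSN_TO_LINKING).items()})
# prints {'1234-5678': 2}
```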
Picking random items, it’s apparent that some courses lean more towards book borrowing (i.e. very few journal recommendations), whilst students studying other subjects are heavy online journal users (i.e. very few book recommendations).
So, what do you think — is it useful to be able to show more than just book recommendations to students?

8 thoughts on ““People who looked at this thing, also looked at this stuff…””

Dave Pattern says:

20 February 2012 at 9:51 pm
I’ve added in some of the databases from the EZProxy logs, so some items now include web site suggestions, e.g.
The research process in nursing. (6th edition, 2010)
Pingback: Reading recommendation algorithms » 编目精灵III (Cataloguing Elf III)
Jonathan Rochkind says:

21 February 2012 at 10:49 pm
Absolutely. Especially if you mean “more than just print books”, with increasing ebook usage.
Where are you getting, or how are you tracking, your e-usage data?
Pingback: More "stuff like this"… – "Self-plagiarism is style"
Dave Pattern says:

22 February 2012 at 9:39 am
Hi Jonathan
We’re pushing most e-things through EZProxy (even for on-campus users), so most of the non-book usage data is coming from the EZProxy logs.
Jonathan Rochkind says:

22 February 2012 at 7:29 pm
Cool. EZProxy keeps sufficient information in its logs for you to know person X who accessed doc A also accessed doc B? Only over a single session, or are you actually logging auth credentials, so you know that a given access was by person X one day, and another access two weeks later was also by person X?
Normally, I’d be a bit reluctant to log electronic accesses tied to credentials, for privacy reasons. I don’t think EZProxy does that by default?
Jonathan Rochkind says:

22 February 2012 at 7:30 pm
For that matter, I’m not sure how you’d easily go from URLs in EZProxy logs to actual journal titles.
Very curious about the technical details here of how you’re getting this information.
Dave Pattern says:

23 February 2012 at 7:37 am
Hi Jonathan
That’s correct: our EZProxy web logs include the REMOTE_USER field. We primarily log this data because the cost of individual databases is split proportionally by usage (i.e. if X% of usage of database Y is by students from academic school Z, then school Z contributes that share of the cost). Also, our use of JANET requires us to be able to account for our IP traffic.
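The proportional split described above works out as school Z's share = cost of database Y × (Z's usage of Y ÷ total usage of Y). Here is a minimal sketch. The `split_cost` name, the school names and the figures are all hypothetical. "Usage" is assumed to mean a count of accesses, e.g. REMOTE_USER entries joined to some user-to-school table.

```python
def split_cost(cost, usage_by_school):
    """Apportion a database subscription cost in proportion to each school's usage.

    usage_by_school: mapping of school -> number of accesses (assumed to come from
    counting REMOTE_USER entries joined to a hypothetical user-to-school table).
    """
    total = sum(usage_by_school.values())
    if total == 0:
        raise ValueError("no recorded usage to apportion the cost against")
    return {school: cost * n / total for school, n in usage_by_school.items()}

print(split_cost(10000, {"Health": 300, "Engineering": 100}))
# prints {'Health': 7500.0, 'Engineering': 2500.0}
```

For real invoices, `decimal.Decimal` with explicit rounding would avoid floating-point pennies going astray.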
To get the journal titles, I’m crudely parsing the URL field to locate ISSNs.
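A rough sketch of that kind of URL parsing follows. It assumes the logs use an NCSA common-log-style layout in which the third field is the authenticated username, i.e. an EZProxy `LogFormat` along the lines of `%h %l %u %t "%r" %s %b`; check this against your own configuration. The regexes, the helper names and the sample log line are illustrative. The ISSN mod-11 check-digit test is an addition beyond "crude" parsing. It throws out most digit runs that merely look like ISSNs, such as `2012-0220` in a date parameter.

```python
import re

# Assumed log layout: host, ident, username, [timestamp], "request", status, bytes.
LOG_LINE = re.compile(r'^(\S+) \S+ (\S+) \[([^\]]+)\] "\S+ (\S+)[^"]*" (\d{3}) \S+')

# Four digits, optional hyphen, three digits and a check character (0-9 or X),
# not embedded in a longer run of digits.
ISSN = re.compile(r'(?<![0-9X])(\d{4})-?(\d{3}[0-9X])(?![0-9X])', re.IGNORECASE)

def issn_check_ok(digits):
    """Validate the ISSN mod-11 check character (weights 8 down to 2)."""
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7].upper() == expected

def parse_line(line):
    """Return (user, [ISSNs found in the requested URL]), or None if unparseable."""
    m = LOG_LINE.match(line)
    if not m:
        return None
    user, url = m.group(2), m.group(4)
    issns = [f"{a}-{b.upper()}" for a, b in ISSN.findall(url) if issn_check_ok(a + b)]
    return user, issns

line = ('10.0.0.1 - u1234567 [20/Feb/2012:21:51:00 +0000] '
        '"GET http://journals.example.com/openurl?issn=0317-8471&date=2012-0220 HTTP/1.1" 200 5120')
print(parse_line(line))  # prints ('u1234567', ['0317-8471'])
```

This only finds titles whose ISSN actually appears in the URL. Platforms that use their own identifiers would need a separate lookup.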
