using “circ_tran” to show borrowing suggestions in HIP

One of the things we’re trying to do this year at Huddersfield is to make better use of our data archives:
…as each student goes through a library turnstile, data is written away…
…as each student borrows a book, more data is quietly written away…
…as each student uses an electronic resource, data is written away…
…as each student logs onto a PC, yet another piece of data is…
…okay, enough already – you get the idea!
We’re not particularly interested in what an individual student has done, but we’d like to see the broader pictures. For example, we open the Library 24/7 at certain times of the year (e.g. Easter) – we’d like to know more about the kinds of students who come in late at night and leave early the next morning:

  • are certain ethnic groups more likely to use the Library outside of the standard opening hours?
  • do we get more male or female students using the Library in the wee small hours?
  • are students coming in to use the computers, to issue/return items, or to sit quietly in a corner and study?

The answers to those kinds of questions tend to be found in several databases. The Sentry database tells us when someone entered the Library, but it doesn’t tell us if they are male or female, Asian or Caucasian – that kind is information is stored in the Student Records System. Also, the Sentry database doesn’t tell us what the student actually did – Circ transactions are in Horizon and PC usage info is stored in other databases.
So, long term we’re looking at ways of trying to combine data from all of those sources into meaningful and enlightening stats.
“What has this got to do with showing borrowing suggestions in HIP?”, I hear you ask!
Well, once I’d had a hunt around in our circ_tran table in Horizon, it seemed like a great use of all that historical Circ data would be to do an Amazon-like “patrons who borrowed this book also borrowed…”.
Before I proceed with the “how to”, I’ve got a hunch that not everyone has got a circ_tran table – it might be something that SirsiDynix needs to set for you, rather than a default table that ships with Horizon (can anyone confirm this?)
The circ_tran table contains (amongst other things) two very useful bits of information – the borrower# and the item# of the item they borrowed. You can use the item# to look up the bib# of that item (using the item table).
Once you’ve got the borrower# and bib#, you can use that to create two lists of data:

  • a list of all the bib#s that a specific borrower has ever borrowed
  • a list of all the borrower#s who have borrowed a specific bib#

To build the list of borrowing suggestions, you start with a bib# and:

  • 1) build the list of all the borrower#s who have borrowed that bib#
  • 2) for each of those borrower#s, compile all the bib#s of all the items they’ve borrowed to a single big list of bib#s
  • 3) take that big list and count how many times each bib# appears in the list
  • 4) sort your list of individual bib#s by the count of how many times they appear in the big list

…those bib#s that appear the most times in the big list are therefore the most appropriate ones to suggest.
Unfortunately those 4 steps can take some serious CPU time, so it’s not possible to do it on the fly as each of your patrons brings up a full bib page in HIP. Therefore, you need to pre-process each of your bib#s to generate a list of other suggested bib#s.
I wrote a Perl script this evening (which I’ll make available soon) that slurps up the entire circ_tran table into your PCs memory and then processes each of the bib#s to create up to 10 other suggested bib#s. Each of those suggestions is then pumped into a MySQL database where it will sit until a patron views that bib#s page in HIP.
A single line of JavaScript added to the fullnonmarcbib.xsl stylesheet then pulls in dynamic content from a Perl CGI script. That CGI script simply fetches the list of suggested bib#s from the MySQL database, quickly runs them via the title table in Horizon, and then displays a random selection of them underneath the copy/holding info:
click to view larger image
The only real drawback is that it’s not working with your circ_tran data in real time – the list of 10 possible suggestions per bib# won’t change until I run the slurping Perl script again to rebuild all of the suggestions. On our database of 2,046,180 circ_tran entries, that took about 3 hours to process. So, in theory, you could schedule it to run once a week or once a month.

15 thoughts on “using “circ_tran” to show borrowing suggestions in HIP”

  1. Okay, that is phenomenally cool, but in the US we have some thorny problems with keeping around any data that links circ transactions with borrowers — the biggest problem being a thing called a National Security Letter. An NSL can be issued at the request of an FBI agent, without the authorization of a judge or the review of a prosecutor. The NSL can oblige a library to give up circulation data like that upon presentation of the letter, and at the same time prohibit the library from informing anyone what has occurred. For this reason, many US libraries purge any records that could tie a circ transaction to a specific borrower pretty much as soon as the book has been returned and the blocks have been cleared. So can you think of any way to aggregate or abstract your data from the original source so you can still get the same valuable results, without maintaining any long-term linkages between bibs and borrowers?

  2. That’s a tough one Luke!
    There are several methods of turning a borrower number into something else (e.g. a MD5 hash), but none really fully erase the trail back to the original borrower.
    If you have a high circulation turnover, then the alternative would be to store each of the bibs of the items that a borrower checks out at the same time – e.g. if someone goes to the issue desk with 5 items, record those 5 bibs together. However, don’t store anything about the borrower – just record the bib numbers:
    1857,47265,901,11367,91375
    As each borrower checks items out, you’ll get strings of bibs to add to your database…
    857,37562,9582,901,1857 (5 items checked out)
    7464,1725 (2 items checked out)
    3874,58948,857 (3 items checked out)
    Then, to generate suggested items for bib# 901, you’d need to:
    1) search though all your strings of bibs and collate all the ones that contain a “901”:
    1857,47265,901,11367,91375
    857,37562,9582,901,1857
    2) count how many times each bib occurs in the collated list
    Then present the ones that occured the most as the suggested items (e.g. 1857).
    Think of it more as a “people who borrowed this item, also borrowed these items at the same time…”
    At the very worse, all you could deduce from the database would be that at some point in the past, someone borrowed these specific books at the same time. There would be nothing to help to work out who they were, when they borrowed those items, or if they ever borrowed again.

  3. I agree. This is very cool.
    I have two ideas on this subject:
    1) You have a trusted 3rd party (preferably in a country with no extradition treaty) “launder” the data — encrypt the borrower#’s with an encryption key that you don’t know, only they know. So borrower #123 will always get encrypted as “7xl3” or something. That way, you can keep circ history indefinitely without having it tied to the borrower at all. If you get visited by the feds, they have no way of decrypting the borrower#’s since you don’t have the key.
    2) Say you are allowed to keep circ history data for a relatively long period of time but not forever (90 days, say). Every 90 days, you encrypt the borrower#’s in the circ history with a one-time pad (basically, a completely random encryption key that is never reused so the only way to crack it is by brute force). So say 1st quarter of 2005, borrower #123 gets encrypted as “7xklj” and 2nd quarter of 2005, borrower #123 gets encrypted as “823w”. You have no way of knowing that “823w” is the same person as “7xklj” but you can still know that borrower 7xklj checked out so and so books in a 90 day period. Basically it would make it so you could figure out “borrowers who checked out x also checked out y within 90 days of each other” but again, there’s no way to decrypt any of the data older than 90 days — or make inferences between what happened 3-6 months ago and what happened 0-3 months ago.
    If you only keep circ history for a few days like we do, the method is basically equivalent to what Davey describes but much messier.
    Finally, you don’t have to collect any data yourself to offer such a service. You could just use Amazon Web Services.

  4. Great suggestions Casey 🙂
    If you’ve not seen what Amazon Web Services have to offer, then have a look at this sample XML output:
    http://www.daveyp.com/files/stuff/032112247X.xml
    …down at the bottom, there’s a section marked “SimilarProducts” – those are the ISBNs Amazon use for their “Customers who bought this book also bought”.
    All you need to do is to cross-reference the Amazon ISBNs with those in your database – if you get a match, display a link.
    Casey’s second suggestion would ideally suit an academic institution, as student borrowing trends at specific times in the year could partly depend upon the modules they were studying.
    If you were to view the student’s entire borrowing history (e.g. over 3 years), then it would contain (in part) an amalgamation of all of their module topics.
    However, sample the data every 90 days and you get a clearer selection of which titles to suggest to another borrower that will (potentially) be the most relevant to them at that moment in time.
    For example, if a student took a three year course in Java then you’d expect them to begin by borrowing books like “Dummies Guide to Java” and “Learn Java in 21 Days”. By the end of the course, they might be on to “Advanced Agile Java Development using Scrum”***
    Their entire borrowing history would be a mix of Java titles, whereas a 90 day sample will group together Java titles at a similar skill level.
    *** – After graduation, they will of course be rounded up by a press gang led by Jack Blount (wearing an eye patch and a suspicious looking parrot on one shoulder squawking “Pieces of 8.0! Pieces of 8.0!”)

  5. I now invenstigate to add a java script in the full bib view .
    Would you mind tell me where I can add the script in the fullnonmarcbib.xsl stylesheet ?
    Thanks

  6. Hi Alex
    I’ve added my code just above the section of the XSL file that starts with:
    <!--
    ************************************************
    Javascript
    ************************************************
    -->

Comments are closed.