"Self-plagiarism is style"

using "circ_tran" to show borrowing suggestions in HIP

17th November 2005


posted in Horizon/HIP

One of the things we're trying to do this year at Huddersfield is to make better use of our data archives:

…as each student goes through a library turnstile, data is written away…
…as each student borrows a book, more data is quietly written away…
…as each student uses an electronic resource, data is written away…
…as each student logs onto a PC, yet another piece of data is…

…okay, enough already - you get the idea!

We're not particularly interested in what an individual student has done, but we'd like to see the broader picture. For example, we open the Library 24/7 at certain times of the year (e.g. Easter) - we'd like to know more about the kinds of students who come in late at night and leave early the next morning:

  • are certain ethnic groups more likely to use the Library outside of the standard opening hours?
  • do we get more male or female students using the Library in the wee small hours?
  • are students coming in to use the computers, to issue/return items, or to sit quietly in a corner and study?

The answers to those kinds of questions tend to be spread across several databases. The Sentry database tells us when someone entered the Library, but it doesn't tell us if they are male or female, Asian or Caucasian - that kind of information is stored in the Student Records System. Also, the Sentry database doesn't tell us what the student actually did - Circ transactions are in Horizon and PC usage info is stored in other databases.

So, in the long term, we're looking at ways of combining data from all of those sources into meaningful and enlightening stats.

"What has this got to do with showing borrowing suggestions in HIP?", I hear you ask!

Well, once I'd had a hunt around in our circ_tran table in Horizon, it seemed like a great use of all that historical Circ data would be to do an Amazon-like "patrons who borrowed this book also borrowed…".

Before I proceed with the "how to", I've got a hunch that not everyone has a circ_tran table - it might be something that SirsiDynix needs to set up for you, rather than a default table that ships with Horizon (can anyone confirm this?)

The circ_tran table contains (amongst other things) two very useful bits of information - the borrower# and the item# of the item they borrowed. You can use the item# to look up the bib# of that item (using the item table).

Once you've got the borrower# and bib#, you can use them to create two lists of data:

  • a list of all the bib#s that a specific borrower has ever borrowed
  • a list of all the borrower#s who have borrowed a specific bib#
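A minimal Python sketch of building those two lists, assuming the circ_tran rows have already been reduced to (borrower#, item#) pairs and the item table to an item# to bib# lookup. The sample numbers are made up, and holding each list as a set (so that repeat loans of the same bib# by one borrower count once) is an assumption rather than something the post specifies.

```python
from collections import defaultdict

# Hypothetical sample data: (borrower#, item#) rows from circ_tran,
# plus an item# -> bib# lookup taken from the item table.
circ_tran = [(1, 501), (1, 502), (2, 501), (2, 503), (3, 502)]
item_to_bib = {501: 9001, 502: 9002, 503: 9003}


def build_lists(circ_rows, item_to_bib):
    """Return (bibs_by_borrower, borrowers_by_bib) as dicts of sets."""
    bibs_by_borrower = defaultdict(set)
    borrowers_by_bib = defaultdict(set)
    for borrower, item in circ_rows:
        bib = item_to_bib.get(item)
        if bib is None:
            continue  # item no longer in the item table, so skip the loan
        bibs_by_borrower[borrower].add(bib)
        borrowers_by_bib[bib].add(borrower)
    return bibs_by_borrower, borrowers_by_bib


bibs_by_borrower, borrowers_by_bib = build_lists(circ_tran, item_to_bib)
print(dict(borrowers_by_bib))  # {9001: {1, 2}, 9002: {1, 3}, 9003: {2}}
```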

To build the list of borrowing suggestions, you start with a bib# and:

  1) build the list of all the borrower#s who have borrowed that bib#
  2) for each of those borrower#s, compile the bib#s of all the items they've borrowed into a single big list of bib#s
  3) take that big list and count how many times each bib# appears in it
  4) sort your list of individual bib#s by how many times they appear in the big list

…those bib#s that appear the most times in the big list are therefore the most appropriate ones to suggest.
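The four steps might be sketched in Python like this. The two dicts stand in for the lists described above and hold made-up numbers; leaving the starting bib# out of its own suggestions is an assumption (otherwise it would always top the list).

```python
from collections import Counter

# Hypothetical indexes: borrower# -> bib#s borrowed, bib# -> borrower#s.
bibs_by_borrower = {1: {9001, 9002}, 2: {9001, 9003}, 3: {9002}, 4: {9001, 9002}}
borrowers_by_bib = {9001: {1, 2, 4}, 9002: {1, 3, 4}, 9003: {2}}


def suggest(bib, bibs_by_borrower, borrowers_by_bib, limit=10):
    """Return up to `limit` bib#s most often borrowed by readers of `bib`."""
    counts = Counter()
    # Step 1: everyone who has borrowed this bib#.
    for borrower in borrowers_by_bib.get(bib, ()):
        # Steps 2 and 3: pool their other bib#s and tally each one.
        counts.update(b for b in bibs_by_borrower[borrower] if b != bib)
    # Step 4: most frequently co-borrowed first.
    return [b for b, _ in counts.most_common(limit)]


print(suggest(9001, bibs_by_borrower, borrowers_by_bib))  # [9002, 9003]
```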

Unfortunately those 4 steps can take some serious CPU time, so it's not possible to do it on the fly as each of your patrons brings up a full bib page in HIP. Therefore, you need to pre-process each of your bib#s to generate a list of other suggested bib#s.

I wrote a Perl script this evening (which I'll make available soon) that slurps up the entire circ_tran table into your PC's memory and then processes each of the bib#s to create up to 10 other suggested bib#s. Each of those suggestions is then pumped into a MySQL database, where it will sit until a patron views that bib#'s page in HIP.
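The Perl script itself isn't shown here, so this is a hedged Python sketch of the same kind of batch job. Rather than running the four steps separately for every bib#, it counts, for each pair of bib#s, how many borrowers borrowed both; with one set of bib#s per borrower this gives the same counts as steps 1 to 3. The standard-library sqlite3 module stands in for MySQL, and the `suggestion` table and its columns are invented for the example.

```python
import sqlite3
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical (borrower#, bib#) pairs: circ_tran already joined to the item table.
loans = [(1, 9001), (1, 9002), (2, 9001), (2, 9003), (3, 9002), (4, 9001), (4, 9002)]


def precompute(loans, limit=10):
    """Map every bib# to its top `limit` co-borrowed bib#s."""
    bibs_by_borrower = defaultdict(set)
    for borrower, bib in loans:
        bibs_by_borrower[borrower].add(bib)
    # co[a][b] = number of borrowers who borrowed both a and b.
    co = defaultdict(Counter)
    for bibs in bibs_by_borrower.values():
        for a, b in permutations(bibs, 2):
            co[a][b] += 1
    return {a: [b for b, _ in c.most_common(limit)] for a, c in co.items()}


# sqlite3 stands in for MySQL; the table and column names are invented.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE suggestion (bib INTEGER, seq INTEGER, suggested_bib INTEGER)")
for bib, suggested in precompute(loans).items():
    db.executemany(
        "INSERT INTO suggestion VALUES (?, ?, ?)",
        [(bib, seq, s) for seq, s in enumerate(suggested, start=1)],
    )
db.commit()
```

Pairing every bib# in a borrower's history with every other one grows quadratically with the length of that history, which fits with the job being run offline rather than on each page view.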

A single line of JavaScript added to the fullnonmarcbib.xsl stylesheet then pulls in dynamic content from a Perl CGI script. That CGI script simply fetches the list of suggested bib#s from the MySQL database, quickly looks them up in Horizon's title table, and then displays a random selection of them underneath the copy/holding info.
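The CGI side could be as small as the following Python function, which is a sketch rather than the actual Perl script: it takes the stored suggestions for one bib#, drops any whose title record has gone, and returns a random selection. The `stored` and `titles` dicts are hypothetical stand-ins for the MySQL lookup and Horizon's title table.

```python
import random

# Hypothetical stand-ins: precomputed suggestions per bib#, and the bib#s
# (with titles) that still exist in Horizon's title table.
stored = {9001: [9002, 9003, 9004, 9005]}
titles = {9002: "Title A", 9003: "Title B", 9004: "Title C"}


def pick_suggestions(bib, stored, titles, show=3, rng=random):
    """Return up to `show` (bib#, title) pairs chosen at random."""
    # Skip suggestions whose title record no longer exists.
    live = [b for b in stored.get(bib, []) if b in titles]
    chosen = rng.sample(live, min(show, len(live)))
    return [(b, titles[b]) for b in chosen]


print(pick_suggestions(9001, stored, titles, show=2))
```

Picking at random from the stored list means repeat visitors to the same page see some variety, even though the underlying list only changes when the batch job is rerun.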


The only real drawback is that it's not working with your circ_tran data in real time - the list of 10 possible suggestions per bib# won't change until I run the slurping Perl script again to rebuild all of the suggestions. On our database of 2,046,180 circ_tran entries, that took about 3 hours to process. So, in theory, you could schedule it to run once a week or once a month.

There are currently 11 responses to “using "circ_tran" to show borrowing suggestions in HIP”


  1. On November 17th, 2005, lukethelibrarian said:

    Okay, that is phenomenally cool, but in the US we have some thorny problems with keeping around any data that links circ transactions with borrowers — the biggest problem being a thing called a National Security Letter. An NSL can be issued at the request of an FBI agent, without the authorization of a judge or the review of a prosecutor. The NSL can oblige a library to give up circulation data like that upon presentation of the letter, and at the same time prohibit the library from informing anyone what has occurred. For this reason, many US libraries purge any records that could tie a circ transaction to a specific borrower pretty much as soon as the book has been returned and the blocks have been cleared. So can you think of any way to aggregate or abstract your data from the original source so you can still get the same valuable results, without maintaining any long-term linkages between bibs and borrowers?

  2. On November 17th, 2005, Davey P said:

    That's a tough one Luke!

    There are several methods of turning a borrower number into something else (e.g. an MD5 hash), but none of them fully erases the trail back to the original borrower.
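To see why a plain hash doesn't fully erase the trail, here is a short Python sketch. It assumes borrower numbers are integers below 1,000,000 (a hypothetical range); because that space is so small, anyone holding a hashed value can simply hash every candidate number until one matches.

```python
import hashlib


def md5_token(borrower_no):
    """Unkeyed MD5 of the borrower number, as a hex string."""
    return hashlib.md5(str(borrower_no).encode("ascii")).hexdigest()


def reverse_token(token, candidates):
    """Brute-force a token by hashing every possible borrower number."""
    for n in candidates:
        if md5_token(n) == token:
            return n
    return None


stored_token = md5_token(123456)
print(reverse_token(stored_token, range(1_000_000)))  # prints 123456
```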

    If you have a high circulation turnover, then the alternative would be to store each of the bibs of the items that a borrower checks out at the same time - e.g. if someone goes to the issue desk with 5 items, record those 5 bibs together. However, don't store anything about the borrower - just record the bib numbers:

    1857,47265,901,11367,91375

    As each borrower checks items out, you'll get strings of bibs to add to your database…

    857,37562,9582,901,1857 (5 items checked out)
    7464,1725 (2 items checked out)
    3874,58948,857 (3 items checked out)

    Then, to generate suggested items for bib# 901, you'd need to:

    1) search through all your strings of bibs and collate all the ones that contain a "901":

    1857,47265,901,11367,91375
    857,37562,9582,901,1857

    2) count how many times each bib occurs in the collated list

    Then present the ones that occurred the most as the suggested items (e.g. 1857).
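Davey's "same checkout" approach can be sketched in Python using the example strings of bib#s from this comment. Only the strings are stored, with nothing about the borrower; the function name is hypothetical.

```python
from collections import Counter

# The example strings of bib#s, one per checkout; no borrower data is kept.
baskets = [
    "1857,47265,901,11367,91375",
    "857,37562,9582,901,1857",
    "7464,1725",
    "3874,58948,857",
]


def basket_suggestions(bib, baskets):
    """Count how often each bib# was checked out alongside `bib`."""
    counts = Counter()
    for line in baskets:
        bibs = line.split(",")
        if bib in bibs:  # 1) keep only the checkouts that include this bib#
            counts.update(b for b in bibs if b != bib)  # 2) tally the rest
    return counts.most_common()


print(basket_suggestions("901", baskets)[0])  # ('1857', 2)
```

Splitting each string into a list before testing membership matters: a plain substring search for "857" would also match "1857".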

    Think of it more as a "people who borrowed this item, also borrowed these items at the same time…"

    At the very worst, all you could deduce from the database would be that at some point in the past, someone borrowed these specific books at the same time. There would be nothing to help work out who they were, when they borrowed those items, or whether they ever borrowed again.

  3. On November 17th, 2005, casey said:

    I agree. This is very cool.

    I have two ideas on this subject:

    1) You have a trusted 3rd party (preferably in a country with no extradition treaty) "launder" the data — encrypt the borrower#'s with an encryption key that you don't know, only they know. So borrower #123 will always get encrypted as "7xl3" or something. That way, you can keep circ history indefinitely without having it tied to the borrower at all. If you get visited by the feds, they have no way of decrypting the borrower#'s since you don't have the key.
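A sketch of the laundering step in Python. It swaps in a keyed hash (HMAC-SHA256) for the encryption Casey describes: like keyed encryption, it always maps the same borrower# to the same token, and without the key a token can't be traced back to a borrower#, even by trying every possible number. The key is generated locally here only so the example runs; in Casey's scheme it would live solely with the third party.

```python
import hashlib
import hmac
import secrets

# Stand-in for the third party's secret key. In the scheme described above,
# the library would never see this value.
THIRD_PARTY_KEY = secrets.token_bytes(32)


def launder(borrower_no, key=THIRD_PARTY_KEY):
    """Map a borrower# to a stable pseudonym using HMAC-SHA256 (first 16 hex chars)."""
    digest = hmac.new(key, str(borrower_no).encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:16]


print(launder(123) == launder(123))  # True: same borrower, same token
```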

    2) Say you are allowed to keep circ history data for a relatively long period of time but not forever (90 days, say). Every 90 days, you encrypt the borrower#'s in the circ history with a one-time pad (basically, a completely random encryption key that is never reused so the only way to crack it is by brute force). So say 1st quarter of 2005, borrower #123 gets encrypted as "7xklj" and 2nd quarter of 2005, borrower #123 gets encrypted as "823w". You have no way of knowing that "823w" is the same person as "7xklj" but you can still know that borrower 7xklj checked out so and so books in a 90 day period. Basically it would make it so you could figure out "borrowers who checked out x also checked out y within 90 days of each other" but again, there's no way to decrypt any of the data older than 90 days — or make inferences between what happened 3-6 months ago and what happened 0-3 months ago.
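Casey's rotating-key idea might look like this in Python. It is a sketch under two assumptions: calendar quarters stand in for the 90-day periods, and a fresh random HMAC key per quarter stands in for the one-time pad (it is not a true one-time pad). Once a quarter's key is thrown away, that quarter's tokens can no longer be recreated or linked to later ones. The class name is hypothetical.

```python
import hashlib
import hmac
import secrets
from datetime import date


def quarter(d):
    """Calendar quarter of a date, e.g. (2005, 2) for May 2005."""
    return (d.year, (d.month - 1) // 3 + 1)


class QuarterlyPseudonymiser:
    """Hypothetical helper: one random key per quarter, discarded afterwards."""

    def __init__(self):
        self._keys = {}

    def token(self, borrower_no, loan_date):
        q = quarter(loan_date)
        if q not in self._keys:
            self._keys[q] = secrets.token_bytes(32)
        mac = hmac.new(self._keys[q], str(borrower_no).encode("ascii"), hashlib.sha256)
        return mac.hexdigest()[:16]

    def forget(self, q):
        """Throw away a quarter's key so its tokens can't be recreated."""
        self._keys.pop(q, None)


p = QuarterlyPseudonymiser()
print(p.token(123, date(2005, 2, 1)) == p.token(123, date(2005, 3, 30)))  # True
print(p.token(123, date(2005, 2, 1)) == p.token(123, date(2005, 5, 1)))   # False
```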

    If you only keep circ history for a few days like we do, the method is basically equivalent to what Davey describes but much messier.

    Finally, you don't have to collect any data yourself to offer such a service. You could just use Amazon Web Services.

  4. On November 17th, 2005, Davey P said:

    Great suggestions Casey :-)

    If you've not seen what Amazon Web Services have to offer, then have a look at this sample XML output:

    http://www.daveyp.com/blog/stuff/032112247X.xml

    …down at the bottom, there's a section marked "SimilarProducts" - those are the ISBNs Amazon use for their "Customers who bought this book also bought".

    All you need to do is to cross-reference the Amazon ISBNs with those in your database - if you get a match, display a link.
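A hedged Python sketch of that cross-referencing step. It assumes the SimilarProducts section holds one `SimilarProduct` element per title with the ISBN in an `ASIN` child (for books, Amazon's ASIN was normally the ISBN-10); check the element names against a real response, since the sample XML in the code is made up. `isbn_to_bib` is a hypothetical lookup built from your own catalogue.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def similar_isbns(xml_text):
    """Collect the ASIN of every SimilarProduct element, in document order."""
    found = []
    for el in ET.fromstring(xml_text).iter():
        if local_name(el.tag) == "SimilarProduct":
            for child in el:
                if local_name(child.tag) == "ASIN" and child.text:
                    found.append(child.text.strip())
    return found


def local_matches(xml_text, isbn_to_bib):
    """Keep only the Amazon suggestions that are also in the local catalogue."""
    return [(i, isbn_to_bib[i]) for i in similar_isbns(xml_text) if i in isbn_to_bib]


# Made-up response fragment and catalogue lookup, for illustration only.
sample = """<ItemLookupResponse xmlns="urn:example">
  <SimilarProducts>
    <SimilarProduct><ASIN>0000000001</ASIN><Title>Book A</Title></SimilarProduct>
    <SimilarProduct><ASIN>0000000002</ASIN><Title>Book B</Title></SimilarProduct>
  </SimilarProducts>
</ItemLookupResponse>"""
print(local_matches(sample, {"0000000002": 9002}))  # [('0000000002', 9002)]
```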

    Casey's second suggestion would ideally suit an academic institution, as student borrowing trends at specific times in the year could partly depend upon the modules they were studying.

    If you were to view the student's entire borrowing history (e.g. over 3 years), then it would contain (in part) an amalgamation of all of their module topics.

    However, sample the data every 90 days and you get a clearer selection of which titles to suggest to another borrower that will (potentially) be the most relevant to them at that moment in time.

    For example, if a student took a three year course in Java then you'd expect them to begin by borrowing books like "Dummies Guide to Java" and "Learn Java in 21 Days". By the end of the course, they might be on to "Advanced Agile Java Development using Scrum"***

    Their entire borrowing history would be a mix of Java titles, whereas a 90 day sample will group together Java titles at a similar skill level.
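Sampling by time period could be sketched in Python as follows: each borrower's loans are grouped into fixed 90-day windows counted from a chosen start date, and a pair of bib#s is only counted when both fall in the same window. The start date, the fixed (rather than rolling) windows and the sample bib#s are all assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import date
from itertools import permutations


def windowed_pairs(loans, start=date(2005, 1, 1), window_days=90):
    """Count co-borrowed bib# pairs, only within the same borrower and window.

    loans: iterable of (borrower#, bib#, loan_date) tuples.
    """
    baskets = defaultdict(set)
    for borrower, bib, when in loans:
        window = (when - start).days // window_days
        baskets[(borrower, window)].add(bib)
    co = defaultdict(Counter)
    for bibs in baskets.values():
        for a, b in permutations(bibs, 2):
            co[a][b] += 1
    return co


# One hypothetical student: two beginner titles early on, an advanced one later.
loans = [
    (1, 101, date(2005, 1, 10)),  # e.g. a beginner's Java guide
    (1, 102, date(2005, 1, 20)),  # another beginner title
    (1, 103, date(2007, 5, 1)),   # an advanced title, two years on
]
co = windowed_pairs(loans)
print(co[101][102], co[101][103])  # 1 0
```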

    *** - After graduation, they will of course be rounded up by a press gang led by Jack Blount (wearing an eye patch and a suspicious looking parrot on one shoulder squawking "Pieces of 8.0! Pieces of 8.0!")

  5. On November 28th, 2005, Lorcan Dempsey's weblog said:

    Circulating intentional data

    I have posted a couple of times recently about intentional data, data that records choices and behaviors. I mentioned holdings data, ILL records, circulation records, and database usage records. One could extend this list to any data which records an i…

  6. On February 27th, 2006, Davey P said:

    Our borrowing suggestions are now available via a web service:

    http://www.daveyp.com/blog/index.php/archives/69

  7. On April 20th, 2006, Alex said:

    I am now investigating adding a JavaScript to the full bib view.

    Would you mind telling me where I can add the script in the fullnonmarcbib.xsl stylesheet?

    Thanks

  8. On April 20th, 2006, Dave Pattern said:

    Hi Alex

    I've added my code just above the section of the XSL file that starts with:

    <!--
    ************************************************
    Javascript
    ************************************************
    -->

  9. On February 1st, 2007, Go John, Go! » "Self-plagiarism is style" said:

    [...] suggestions on our OPAC are very much driven by books recommended on the student reading lists, so it's going to be [...]

  10. On May 22nd, 2008, 2008 — The Year of Making Your Data Work Harder » "Self-plagiarism is style" said:

    [...] Implementation of Library 2.0 and the E-framework" study). We've had circ driven borrowing suggestions on our OPAC since 2005 (were we the first library to do this?) and, more recently, we've used [...]

  11. On November 18th, 2008, Dewey friend wheel » "Self-plagiarism is style" said:

    [...] friend wheel, but using library data, for a while now. Here's a prototype which uses our "people who borrowed this, also borrowed…" data to try find strong borrowing [...]
