Summon 4 HN — bits o’ code

As part of the JISC Summon 4 HN project, we’ll be releasing some chunks of code that I’ve knocked together for our Summon implementation at Huddersfield.
The code will cover these areas:

  1. updating Summon with MARC record additions, updates and deletions from Horizon
  2. providing live availability information from Horizon without resorting to screen-scraping the OPAC
  3. customising 360 Link using jQuery

In theory, the first 2 might also be of interest to Horizon sites that are implementing an alternative OPAC (e.g. VuFind or AquaBrowser) where you need to set up regular MARC exports. The third might be of interest to 360 Link sites in general.
Keep an eye on the Project Code section of the Summon 4 HN blog for details of the code 🙂


I couldn’t find a relevant photo for this blog post, so instead, let’s have another look at those infamous MIMAS #cupcakes from ILI2009 🙂
ili2009_013

Here comes Summ(er|on)

It’s probably a sign of getting old and decrepit, but this year has just flown by — it doesn’t seem like two minutes since we kicked off our implementation of Serials Solutions’ Summon and now it’s gone fully live (it actually went fully live halfway through the Mashed Library event we ran the other week).
woods_004
The bulk of the implementation was done and dusted by early January 2010, and the majority of the implementation time was spent populating 360 Link (the Serials Solutions link resolver) with our journal holdings — a task our Journals Team found much easier than when we implemented SFX back in 2006.  As the plan had always been to run Summon in parallel to MetaLib during the 2009/10 academic year, it meant we had lots of time to play and tweak. 
We flipped the link resolver over from SFX to 360 Link in late January and then formally “soft” launched Summon during the University’s Research Festival in early March.  Throughout the academic year, usage of Summon has been growing and the vast majority of the feedback has been positive 🙂
As part of the JISC Summon4HN Project, we’ll be documenting the implementation and releasing chunks of code that we hope might be of use to the community, including:

  • code for automating the export of deleted, new and updated MARC records from Horizon so that they can be imported into Summon (or VuFind, AquaBrowser, etc)
  • code for creating “dummy” journal title records (so that known journal titles can be easily located in Summon, e.g. American Journal of Nursing)
  • a basic mod_perl implementation of the DLF spec for exposing availability data for library collections
  • details of the various tweaks we’ve made to our 360 Link instance

Also, as part of the roll out of Summon, we’ve been revamping our E-Resources Wiki to provide a browseable list of resources — as with the journal titles, we’ve been dropping dummy MARC records into Summon so that known resources can be located via a search (e.g. Mintel Reports).

Non/low library usage and final grades

Whilst chatting to one of the delegates at yesterday’s “Gaining business intelligence from user activity data” event (my Powerpoint slides can be grabbed from here) about non & low-usage of library services/resources, I began wondering how that relates to final grades.
In the previous blog post, we’ve seen that there appears to be evidence of a correlation between usage and grades, but that doesn’t really give an indication of how many students are non/low users. For example, if we happened to know that 25% of all students never borrow anything from the library, does that mean that 25% of students who gain the highest grades don’t borrow a book?
Let’s churn the data again 🙂
In the following 3 graphs, we’re looking at:

  • X axis: bands of usage (zero usage, then incremental bands of 20, then everything over 180 uses)
  • Y axis: as a percentage, what proportion of the students who achieved a particular grade are in each band

You can click on the graphs to view a full-sized version.
One of the things to look for is which grade peaks in each band of usage.
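As a rough illustration of how the banding works (this is a sketch, not the code used to produce the graphs), the following assumes each student is represented by a hypothetical `(grade, uses)` pair. The function names and the sample data are made up for the example.

```python
from collections import Counter, defaultdict


def usage_band(uses: int) -> str:
    """Map a usage count to a band label: 0, 1-20, 21-40, ..., 161-180, 181+."""
    if uses == 0:
        return "0"
    if uses > 180:
        return "181+"
    lower = ((uses - 1) // 20) * 20 + 1
    return f"{lower}-{lower + 19}"


def band_percentages(students):
    """students: iterable of (grade, uses) pairs.

    Returns {grade: {band: percentage of that grade's students in the band}}.
    """
    counts = defaultdict(Counter)
    for grade, uses in students:
        counts[grade][usage_band(uses)] += 1

    result = {}
    for grade, bands in counts.items():
        total = sum(bands.values())
        result[grade] = {band: 100.0 * n / total for band, n in bands.items()}
    return result


# Made-up sample data, purely for illustration
sample = [
    ("1", 0), ("1", 35), ("1", 200), ("1", 15),
    ("3", 0), ("3", 0), ("3", 5), ("3", 42),
]
print(band_percentages(sample))
```

Because the percentages are calculated within each grade, every grade adds up to 100% across the bands, which is what makes the grades comparable even though far more students get a 2:1 than a third.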
Borrowing
The usage bands represent the number of items borrowed from the library during the final 3 years of study…
horizon
caveat: we have a lot of distance learners across the world and we wouldn’t expect them to borrow anything from the library
In terms of non-usage (i.e. never borrowing an item), there’s a marked difference between those who get the two highest grades (1 and 2:1) and those who get the lowest honours grade (3). It seems that those who get a third-class honour are twice as likely to be non-users as those who get a first-class or 2:1 degree.
E-Resource Usage
The usage bands represent the number of times the student logged into MetaLib (or AthensDA) during the final 3 years of study…
metalib
caveat: this is a relatively crude measure of e-resource usage, as it doesn’t measure what the student accessed or how long they accessed each e-resource
Even at a quick glance, we can see that this graph tells a different story to the previous one: the number of non-users is lower, but there’s a huge (worrying?) amount of low usage (the “1-20” band). I can only speculate on that:

  • did students try logging in but found the e-resources too difficult to use?
  • how much of an impact do the barriers to off-campus access (e.g. having to know when & how to authenticate using Athens or Shibboleth) have on repeat usage?
  • are students finding the materials they need for their studies outside of the subscription materials?

As I mentioned previously, Summon is a different kettle of fish to MetaLib, so it’s unlikely we’ll be able to capture comparative usage data — if you’ve tried using Summon, you’ll know that you don’t need to log in to use it (authentication only kicks in when you try to access the full-text). However, we’re confident that Summon’s ease-of-use and the work we’ve done to improve off-campus access will result in a dramatic increase in e-resource usage.
As before, we see it’s those students who graduate with a third-class honour who are the most likely to be non or low-users of e-resources.
Visits to the Library
The usage bands represent the number of visits to the library during the final 3 years of study…
sentry
caveat: we have a lot of distance learners across the world and we wouldn’t expect them to borrow anything from the library
Again, the graph shows that those who gain a third-class degree are twice as likely to never visit the library as those who gain a first-class or 2:1.

Library usage and final grades

It’s high time I started blogging again, so let’s start off with something that my colleagues in the library have been talking about at recent conferences — the link between the usage of library services and the final academic grades achieved by students.
As a bit of background to this, it’s probably worth mentioning that we’ve had an ongoing project (since 2006?) in the library looking at non and low-usage of library resources. That project has helped identify the long term trends in book borrowing, e-resource usage and library visits by the students at Huddersfield. Plus, we’ve used that information to help identify specific courses and cohorts of students who probably aren’t using the library as much as they should be, as well as when is the most effective time during a course to do refresher training.
Towards the back end of last year, we worked with the Student Records Team to build up a profile of library usage by the previous 2 years’ worth of graduates. For each graduate, we compared their final degree grade with their last 3 years of library usage data, specifically:

  • Items loaned — how many things did they borrow from the library?
  • MetaLib/AthensDA logins — how often did they access e-resources?
  • Entry stats — how many times did they venture in to the library?

Now, I’ll be the first to admit that these are basic & crude measures…

  • A student might borrow many items, but maybe he’s just working his way through our DVD collection for fun.
  • A login to MetaLib doesn’t tell you what they looked at or how long they used our e-resources.
  • Students might (and do) come into the library for purely social reasons.
  • Using the library is just one part of the overall academic experience.

…but they are rough indicators, useful for a quick initial check to see if there is a correlation. Plus, we know from the non & low-usage project that there are still many students who (for many reasons) don’t use the library much.
So, let’s churn the data! 🙂
Here’s the average usage by the 3,400 or so undergraduate degree students who graduated with an honour in the 2007/8 academic year:
2007/8
In terms of visits to the library, there’s no overall correlation (the average number of visits per student ranges from 109 to 120), although we do see some correlation at the level of individual courses. What does this tell us (if anything)? I’d say it’s evidence that the library is for everyone, regardless of their ability and academic prowess.
We do see a correlation with stock usage and e-resource usage. Those who achieved a first (1) on average borrowed twice as many items as those who got a third (3) and logged into MetaLib/AthensDA to access e-resources 3.5 times as much. The correlation is fairly linear across the grades, although there’s a noticeable jump up in e-resource usage (when compared to stock borrowing) in those who gained a first.
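For anyone wanting to run a similar check on their own data, the averaging step can be sketched as below. The record layout `(grade, loans, logins, visits)` and the numbers are invented for the example and are not the Huddersfield figures.

```python
from collections import defaultdict
from statistics import mean


def average_by_grade(records):
    """records: iterable of (grade, loans, logins, visits) tuples.

    Returns {grade: (mean loans, mean logins, mean visits)}.
    """
    grouped = defaultdict(list)
    for grade, loans, logins, visits in records:
        grouped[grade].append((loans, logins, visits))

    return {
        grade: tuple(mean(column) for column in zip(*rows))
        for grade, rows in grouped.items()
    }


# Invented records, purely for illustration
records = [
    ("1", 40, 70, 110),
    ("1", 60, 90, 120),
    ("3", 20, 20, 115),
    ("3", 30, 26, 113),
]
averages = average_by_grade(records)
print(averages)
```

Comparing the per-grade averages column by column gives the kind of ratio quoted above, but it says nothing about statistical significance on its own.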
Now the data for the 3,200 or so students from the following academic year, 2008/9:
2008/9
As before, no particular correlation with visits to the library, but a noticeable correlation with stock & e-resource usage. Again we see that jump in e-resource usage for those who got the highest grade.
Note too that the average usage has increased. We’ve not changed the way we measure logins or item circulation, so this is a real year-on-year growth. (Side note: as we make the move from MetaLib to Summon, the concept of an “e-resource login” will change dramatically, so we won’t be able to accurately compare year-on-year in future)
Finally, here’s both years of graduates’ usage combined onto a single graph:
2007/8 & 2008/9
I’m curious about that jump in e-resource usage. Does it mean, to gain the best marks, students need to be looking online for the best journal articles, rather than relying on the printed page? If that is the case, will Summon have a measurably positive impact on improving grades (it certainly makes it a lot easier to find relevant articles quickly)?
Going forward, we’ve still got a lot of work to do drilling down into the data: analysing it by individual courses, looking deeper into the books that were borrowed and the e-resources that were accessed, etc. We also need to prove that all this has statistical relevance. Not only that, but how can we use that knowledge and insight to improve the services which the library offers? It’d be foolish to say “borrow more books and you’ll get better grades”, but maybe we can continue to help guide students to the most relevant materials for their studies.
It’s all exciting stuff and, believe me, the University of Huddersfield Library is a great environment to work in… I just wish there were more hours in the day! 🙂

Mashed Library 2010 – Chips and Mash

I promise that one of these days I’ll get back into regular blogging (honest!)
DSC_1487
After “Liver and Mash” at Liverpool, it looked likely that there might not be another Mashed Library event in 2010. So, after a bit of a natter with colleagues, we’ve decided to host another event at Huddersfield (with even more cake than last time!)…

Chips and Mash

Mashed Library 2010 – Liverpool

As announced on Twitter last night, the first Mashed Library UK event of 2010 will be taking place in Liverpool on Friday 14th May 🙂
Keep an eye on the following sites for further details!

DSC_7775
It’s become a little tradition to give each event a fun name — we’ve had “Mash Oop North!” (Huddersfield) and “Middlemash” (Birmingham). If you’ve got any suggestions for the Liverpool event, please tweet them to @m8nd1 or @daviddclay, or leave a comment here 🙂
Some (Beatles related) suggestions are:

  • A Hard Day’s Mash
  • All You Need is Mash
  • Sgt Masher’s Mashtastical Mashup Band
  • All Things Must Mash
  • Mash Me Do
  • Eight Mashups a Week
  • I Am the Mashup
  • Mashups (That’s What I Want)
  • I’m API Just to Dance With You

Mashed Library 2009 — Middlemash

Just a quick “heads-up” that the second Mashed Library event of 2009 (“Middlemash”) takes place at Birmingham City University on Monday 30th November.
The registration form went live on Tuesday morning and I’m sure it’ll be another sell-out event, so keep a close eye on the event blog for further details 🙂
mashlib09_018
If you can’t make it to Birmingham in November, then keep an eye on the Mashed Library Wiki for details of the next event, which will hopefully take place at the University of Liverpool in early 2010. Many thanks to David Clay for offering to host the event!

ILI 2009 Presentation

I really struggled to shoehorn everything I wanted to talk about during my ILI 2009 presentation into the slides, so this blog post goes into a bit more depth than I’ll probably talk about…
slide 1 & 2

I’m still in two minds about whether or not the word “exploit” has too many negative connotations, but what the heck!
If you do use any of the content from the presentation, please drop me an email to let me know 🙂
slide 3

As part of the development of the UK version of Horizon back in the early 1990s, libraries requested that the company (Dynix) add code to log all circulation transactions. Horizon was installed at Huddersfield in 1996 and has been logging circulation data since then. At the time of writing this blog post, we’ve got data for 3,157,111 transactions.
slide 4

With that volume of historical data, it seemed sensible to try and create some useful services for our students. In November 2005, we started dabbling with an Amazon-style “people who borrowed this” service on our OPAC. After some initial testing and tweaking, the service went fully live in January 2006. The following month, we added a web service API (named “pewbot”).
To date, we’ve had over 90,000 clicks on the “people who borrowed this, also borrowed…” suggestions, with a peak of 5,229 clicks in a single month (~175 clicks per day). Apart from the “Did you mean?” spelling suggestions, this has been the most popular tweak we’ve made to our OPAC.
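The post doesn’t describe how the suggestions are calculated, so the following is only a minimal sketch of one common approach: counting how often other items were borrowed by the same people. It assumes the circulation log can be reduced to hypothetical `(borrower_id, item_id)` pairs, and the function name is made up for the example.

```python
from collections import Counter, defaultdict


def also_borrowed(transactions, item, top_n=5):
    """transactions: iterable of (borrower_id, item_id) pairs.

    Returns up to top_n (other_item, shared_borrower_count) pairs for
    items borrowed by the same people who borrowed `item`.
    """
    items_by_borrower = defaultdict(set)
    for borrower, borrowed_item in transactions:
        items_by_borrower[borrower].add(borrowed_item)

    counts = Counter()
    for items in items_by_borrower.values():
        if item in items:
            counts.update(items - {item})
    return counts.most_common(top_n)


# Made-up circulation log, purely for illustration
log = [
    ("a", "X"), ("a", "Y"),
    ("b", "X"), ("b", "Y"), ("b", "Z"),
    ("c", "Z"),
]
print(also_borrowed(log, "X"))
```

A real service would probably want a minimum threshold of shared borrowers before showing a suggestion, so that a single person’s eclectic borrowing doesn’t generate odd recommendations.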
slide 5

Because we’re an academic library, we get peaks and troughs of borrowing throughout the academic year. The busiest times are the start of the new academic year in October and Easter.
slide 6

If you compare the number of clicks on the “people who borrowed this, also borrowed..” suggestions, you can see that it’s broadly similar to the borrowing graph, except for the peak usage. Due to the borrowing peak in October, in November a significant portion of our book stock will be on loan. When our students find that the books they want aren’t available, they seem to find the suggestions useful.
I’m hoping to do some analysis to see if there’s a stronger correlation between the suggested books that are clicked on and then borrowed on the same day during November than during the other months.
slide 7

Once a user logs into the OPAC, we can provide a personal suggestion by generating the suggestions for the books they’ve borrowed recently and then picking one of the titles that comes out near the top.
slide 8

I was originally asked to come up with some code to generate new book lists for each of our seven academic schools. It turned out to be extremely hard to figure out which school a book might have been purchased for, so I turned to the historical book circulation data to come up with a better method.
Rather than having a new book list per school, we’re now offering new book lists per course of study.
The way it’s done is really simple — for each course, we analyse all of the books borrowed by students on that course and then automatically build up a Dewey lending profile. Whenever a new book is added to our catalogue, we check to see which courses have previously borrowed heavily from that Dewey class and then add the book details to their feeds.
The feeds are picked up by the University Portal, so students should see the new book list for their course and (touch wood!) the titles will be highly relevant to their studies.
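The Dewey lending profile idea can be sketched roughly as follows. This is an illustration under assumptions rather than the production code: loans are assumed to be hypothetical `(course, dewey_number)` pairs, profiles are built on the first three digits of the class number, and “borrowed heavily” is interpreted as an arbitrary 10% share of a course’s loans.

```python
from collections import Counter, defaultdict


def build_profiles(loans):
    """loans: iterable of (course, dewey_number) pairs, e.g. ("Law", "345.42").

    Returns {course: Counter of three-digit Dewey classes}.
    """
    profiles = defaultdict(Counter)
    for course, dewey in loans:
        profiles[course][dewey[:3]] += 1
    return profiles


def courses_for_new_book(profiles, dewey, min_share=0.10):
    """Courses where the new book's Dewey class makes up at least
    min_share of the course's historical loans (threshold is an assumption)."""
    dewey_class = dewey[:3]
    matches = []
    for course, profile in profiles.items():
        total = sum(profile.values())
        if total and profile[dewey_class] / total >= min_share:
            matches.append(course)
    return sorted(matches)


# Made-up loans, purely for illustration
loans = [
    ("Law", "345.42"), ("Law", "345.05"), ("Law", "340.1"),
    ("Nursing", "610.73"), ("Nursing", "610.73"), ("Nursing", "174.2"),
]
profiles = build_profiles(loans)
print(courses_for_new_book(profiles, "345.9"))
```

Each new catalogue record would be passed through `courses_for_new_book` and its details appended to the feed of every course returned.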
slide 9

One of the comments I frequently hear is that book recommendation services might create a “vicious circle” of borrowing, with only the most popular books being recommended. At Huddersfield, we’ve seen the opposite — since adding recommendations and suggestions, the range of stock being borrowed has started to widen.
From 2000 to 2005, the range of titles being borrowed per year was around 65,000 (which is approximately 25% of the titles held by the library). Since adding the features in early 2006, we’ve seen a year-on-year increase in the range of titles being borrowed. In 2009, we expect to see over 80,000 titles in circulation, which is close to 33% of the titles held by the library.
I strongly believe that by adding serendipity to our catalogue, we’re seeing a very positive trend in borrowing by our students.
slide 10

Not only are students borrowing more widely than before, they’re also borrowing more books than before. From 2000 to 2005, students would borrow an average of 14 books per year. In 2009, we’re expecting to see borrowing increase to nearly 16 books per year. We’re also seeing a year-on-year decrease in renewals — rather than keeping hold of a book and renewing it, students seem to be returning items sooner and borrowing more than ever before.
slide 11

We’re also logging keyword searches on the catalogue — since 2006, we’ve logged over 5 million keyword searches and it’s fun looking at some of the trends.
As we had a bit of dead space on the OPAC front page, we decided to add some “eye candy” — in this case, it’s a keyword cloud of the most popular search terms from the last 48 hours. Looking at the usage statistics, we’re seeing that new students find the cloud a useful way of starting their very first search of the catalogue, with the usage in October nearly twice that of the next highest month.
slide 12

A much more useful service that we’ve built from the keywords is one that suggests good keywords to combine with your current search terms.
In the above example, we start with a general search for “law” which brings back an unmanageable 7000+ results. In the background, the code quickly searches through all of the previous keyword searches that contained law and pulls together the other keywords that are most commonly used in multi-keyword searches that included “law”. With a couple of mouse clicks, the user can quickly narrow the search down to a manageable 34 results for “criminal law statutes“.
There’re two things I really like about this service:
1) I didn’t have to ask our librarians to come up with the lists of good keywords to combine with other keywords — they’ve got much more important things to do with their time 🙂
2) The service acts as a feedback loop — the more searches that are carried out, the better the suggestions become.
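The keyword suggestion idea boils down to counting which other words appear alongside the current term in past multi-keyword searches. Here is a minimal sketch, assuming the search log is simply a list of query strings; the function name and sample log are made up, and the real service may well do more (e.g. stop-word removal).

```python
from collections import Counter


def suggest_keywords(search_log, term, top_n=5):
    """search_log: iterable of past search strings.

    Returns up to top_n words that most often appear alongside `term`
    in multi-keyword searches.
    """
    term = term.lower()
    counts = Counter()
    for query in search_log:
        words = set(query.lower().split())
        if term in words and len(words) > 1:
            counts.update(words - {term})
    return [word for word, _ in counts.most_common(top_n)]


# Made-up search log, purely for illustration
search_log = [
    "criminal law",
    "law",
    "criminal law statutes",
    "contract law",
    "nursing ethics",
    "Criminal Law cases",
]
print(suggest_keywords(search_log, "law"))
```

Every new search that gets logged feeds straight back into the counts, which is the feedback loop mentioned above.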
slide 13

I forget exactly how this came about (but I suspect a conversation with Ken Chad sowed the initial seed), but we decided to release our circulation and recommendation data into “the wild” in December 2008 — see here for the blog post and here for the data.
The data was for every item that has an ISBN in the bibliographic record, as we felt that the ISBN would be the most useful match point for mashing the data up with other web services (e.g. Amazon).
We realised that we’d need to use a licence for the data release and, after a brief discussion with Ken Chad, it became increasingly obvious that a Public Domain licence was the most appropriate. Accordingly, the data was released under a joint Open Data Commons and (partly because we couldn’t decide which licence was the best one!). In other words, we wanted it to be really clear that there were “no strings” attached to how the data could be used.
slide 14

Within a couple of days of releasing the data, Patrick Murray-John at the University of Mary Washington had taken it and “semantified” the data.
A few weeks later, I had the privilege of chatting to Patrick and Richard Wallis when we took part in a Talis Podcast about the data release.
slide 15

My great friend Iman Moradi (formerly a lecturer at Huddersfield and now the Creative Director of Running in the Halls) used some of the library data as part of the Multimedia Design course.
slides 16 & 17

Iman’s students used the library data to generate some really cool data visualisations — it was really hard to narrow them down to just two images for the ILI presentation. The second image made me think of Ranganathan‘s 5th Law of Library Science: “The library is a growing organism” 🙂
slide 18

The JISC funded MOSAIC Project (Making Our Shared Activity Information Count), which followed on from the completed TILE Project, is exploring the benefits that can be derived from library usage and attention data.
Amongst the goals of the project are to:

  • Encourage academic libraries to release aggregated/anonymised usage data under an open licence
  • Develop a prototype search engine capable of providing course/subject specific relevancy ranked results

The prototype search engine is of particular interest, as it uses the pooled usage/attention data to rank results so that the ones which are more relevant to the student (based on their course) are boosted. For example, if a law student did a search for “ethics”, books on legal ethics would be ranked higher than those relating to nursing ethics, ethics in journalism, etc. This is achieved by deep analysis of the behaviour of other law students at a variety of universities.
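To make the boosting idea a bit more concrete, here’s a toy sketch. It is not how the MOSAIC prototype works internally (the post doesn’t say); it simply assumes each result has a hypothetical base relevance score and that we know how many loans of each title came from each course, and it multiplies the score up by the share of loans from the searcher’s course.

```python
def rank_for_course(results, loans_by_course, course, boost=1.0):
    """results: list of (title, base_score) pairs.
    loans_by_course: {title: {course: loan_count}}.

    Returns titles ordered by base score boosted by the share of each
    title's loans that came from students on `course`.
    """
    def boosted(item):
        title, base_score = item
        loans = loans_by_course.get(title, {})
        total = sum(loans.values())
        share = loans.get(course, 0) / total if total else 0.0
        return base_score * (1.0 + boost * share)

    return [title for title, _ in sorted(results, key=boosted, reverse=True)]


# Made-up results and usage data, purely for illustration
results = [
    ("Nursing Ethics", 1.0),
    ("Legal Ethics", 0.9),
    ("Ethics in Journalism", 0.8),
]
loans_by_course = {
    "Nursing Ethics": {"Nursing": 40, "Law": 1},
    "Legal Ethics": {"Law": 30},
    "Ethics in Journalism": {"Journalism": 12},
}
print(rank_for_course(results, loans_by_course, "Law"))
```

With this sample data a law student sees “Legal Ethics” first, whereas a nursing student searching for the same word would see “Nursing Ethics” at the top.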
slide 19

The MOSAIC Project is also encouraging the developer community to engage with the usage data, and this included sponsorship of a developer competition.
slides 20 & 21

It was hard to pick which competition entries to include in the presentation, so I just picked a couple of them at random. The winning entry, and the two runners up, should be announced shortly — keep an eye on the project web site!
slide 22

The library usage graphs on slides 9 and 10 clearly show that borrower behaviour has changed since the start of 2006. Given that this change coincided with the introduction of suggestions, recommendations and serendipity in the library catalogue, I believe that there’s a compelling argument that they have played a role in initiating that change.
With the continuing push for Open Data (e.g. see the recent TED talk by Tim Berners-Lee), I believe libraries should be seriously considering releasing their usage and attention data.
slide 23

Most usage based services require some initial data to work with. So, given that disk storage space is so cheap, it makes sense to capture as much usage/attention data as possible in advance, even if you have no immediate thoughts about how to utilise it.

Simple API for JISC MOSAIC Project Developer Competition data

For those of you interested in the developer competition being run by the JISC MOSAIC Project, I’ve put together a quick & dirty API for the available data sets. If it’s easier for you, you can use this API to develop your competition entry rather than working with the entire downloaded data set.

edit (31/Jul/2009): Just to clarify — the developer competition is open to anyone, not just UK residents (however, UK law applies to how the competition is being run). Fingers crossed, the Project Team is hopeful that a few more UK academic libraries will be adding their data sets to the pot in early August.

The URL to use for the API is https://library.hud.ac.uk/mosaic/api.pl and you’ll need to supply a ucas and/or isbn parameter to get a response back (in XML).

The “ucas” value is a UCAS Course Code. You can find these codes by going to the UCAS web site and doing a “search by subject”. Not all codes will generate output using the API, but you can find a list of codes that do appear in the MOSAIC data sets here.
If you use both a “ucas” and “isbn” value, the output will be limited to just transactions for that ISBN on courses with that UCAS course code.
You can also use these extra parameters in the URL…

  • show=summary — only show the summary section in the XML output
  • show=data — only show the data in the XML output (i.e. hide the summary)
  • prog=… — only show data for the specified progression level (e.g. staff, UG1, etc, see documentation for full list)
  • year=… — only show data for the specified academic year (e.g. 2005 = academic year 2005/6)
  • rows=… — max number of rows of data to include (default is 500) n.b. the summary section shows the breakdown for all rows, not just the ones included by the rows limit
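Putting the parameters together, a request URL can be built along these lines. This sketch only constructs the URL (it doesn’t fetch anything, as the API may not always be available), and the UCAS code in the example is just a placeholder, so check the list of codes that actually appear in the data sets.

```python
from urllib.parse import urlencode

API_URL = "https://library.hud.ac.uk/mosaic/api.pl"


def build_query(ucas=None, isbn=None, **extra):
    """Build an API URL from a ucas and/or isbn value plus any of the
    optional parameters (show, prog, year, rows)."""
    if not ucas and not isbn:
        raise ValueError("supply a ucas and/or isbn parameter")

    params = {}
    if ucas:
        params["ucas"] = ucas
    if isbn:
        params["isbn"] = isbn
    params.update(extra)
    return f"{API_URL}?{urlencode(params)}"


# "M100" is a placeholder UCAS code used only for illustration
print(build_query(ucas="M100", show="summary", year=2005))
```

The resulting URL can then be passed to whatever HTTP client and XML parser you prefer.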

The format of the XML is pretty much the same as shown in the project documentation guide, except that I’ve added a summary section to the output.
Notes
The API was knocked together quite quickly, so please report any bugs! Also, I can’t guarantee that the API is 100% stable, so please let me know (e.g. via Twitter) if it appears to be down.

Mashed Oop Multimedia

Just in case it’s of interest to anyone, we’ve started uploading videos of the opening sessions from “Mash Oop North” to Vimeo and the Internet Archive (see this blog post for links).
With the free Vimeo account, you can only upload up to 500MB a week, so it’s going to take a few weeks to get them all uploaded. However, you can find them all already on the Internet Archive.
As a taster, here’s Brendan Dawes (Creative Director at Magnetic North) strutting his funky stuff…

Mash Oop North – Brendan Dawes from Dave Pattern on Vimeo.

There’s also quite a few photos on Flickr (tagged with mashlib09)…
mashlib09_020
mashlib09_029
mashlib09_016
mashlib09_011