"Self-plagiarism is style"

Web service for the free book usage data

For ages I've been meaning to get around to adding a web service front end to the book usage data that we released in December. So, better late than never, here it is!

It's not the fastest bit of code I've ever written, but (if there's enough interest) I could speed it up.

The web service can be called in a couple of different ways:

1) using an ISBN

Examples:
a) http://library.hud.ac.uk/api/usagedata/isbn=0415014190 ("Language in the news")
b) http://library.hud.ac.uk/api/usagedata/isbn=159308000X ("The Adventures of Huckleberry Finn")

If a match is found, data for one or more items will be returned. This includes FRBR-style matching using the LibraryThing thingISBN data. The second example shows this: we don't hold an item that exactly matches the given ISBN.
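The FRBR-style fallback can be pictured like this. It assumes related editions are only consulted when there's no exact match; the post doesn't say whether they're always included. This is a minimal Python sketch, not the service's actual code. `find_items`, the `holdings` and `related` dictionaries and the placeholder ISBNs are all hypothetical. `related` stands in for whatever the thingISBN data returns for a given ISBN.

```python
from typing import Dict, List, Set


def find_items(isbn: str,
               holdings: Dict[str, List[int]],
               related: Dict[str, Set[str]]) -> List[int]:
    """Return item IDs for an ISBN, falling back to other editions of the same work."""
    # Exact match: we hold this very edition.
    if isbn in holdings:
        return list(holdings[isbn])
    # FRBR-style fallback: collect items for every related ISBN we know about.
    items: List[int] = []
    for other in sorted(related.get(isbn, set())):
        items.extend(holdings.get(other, []))
    return items


# Placeholder data for illustration only (not real ISBNs or item IDs).
example_holdings = {"ISBN-B": [101], "ISBN-C": [102, 103]}
example_related = {"ISBN-A": {"ISBN-B", "ISBN-C"}}

print(find_items("ISBN-A", example_holdings, example_related))  # [101, 102, 103]
```

Sorting the related ISBNs only keeps the output order predictable. The real service may order or group matches differently.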

2) using an ID number

Examples:

a) http://library.hud.ac.uk/api/usagedata/id=125120 ("Language and power")

The item ID numbers are included in the suggestion data and are the internal bibliographic ID numbers used by our library management system.
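If you want to call the service from code, the two URL forms above are easy to build. The helper names in this minimal Python sketch are hypothetical:

- `usage_url` takes exactly one of an ISBN or an item ID.
- `isbn_is_valid` checks the ISBN-10 or ISBN-13 check digit before any request is made.

The client-side check is an assumption of this sketch; the post doesn't say how the service handles invalid ISBNs. The base URL is the one from the examples, which edit 2 says was later changed slightly.

```python
from typing import Optional

# Base URL taken from the examples above (edit 2 says it was later altered).
BASE_URL = "http://library.hud.ac.uk/api/usagedata/"


def isbn_is_valid(isbn: str) -> bool:
    """Check the check digit of an ISBN-10 or ISBN-13 (no hyphens)."""
    if len(isbn) == 10:
        body, last = isbn[:9], isbn[9].upper()
        if not body.isdigit() or not (last.isdigit() or last == "X"):
            return False
        digits = [int(c) for c in body] + [10 if last == "X" else int(last)]
        # ISBN-10: weights 10 down to 1; the total must be divisible by 11.
        return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0
    if len(isbn) == 13 and isbn.isdigit():
        # ISBN-13: weights alternate 1, 3, 1, 3...; the total must be divisible by 10.
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(isbn)) % 10 == 0
    return False


def usage_url(isbn: Optional[str] = None, item_id: Optional[int] = None) -> str:
    """Build a request URL from either an ISBN or an internal item ID (not both)."""
    if (isbn is None) == (item_id is None):
        raise ValueError("pass exactly one of isbn or item_id")
    if isbn is not None:
        cleaned = isbn.replace("-", "").strip().upper()
        if not isbn_is_valid(cleaned):
            raise ValueError(f"invalid ISBN: {isbn!r}")
        return f"{BASE_URL}isbn={cleaned}"
    return f"{BASE_URL}id={int(item_id)}"


print(usage_url(isbn="159308000X"))  # http://library.hud.ac.uk/api/usagedata/isbn=159308000X
print(usage_url(item_id=125120))     # http://library.hud.ac.uk/api/usagedata/id=125120
```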

——————-

edit 1: I should also have mentioned that the XML returned is essentially the same format as described here.
edit 2: I've now rewritten the code as a mod_perl script (to make it faster when using ISBNs) and slightly altered the URL.

Keeping everyone happy at a conference

At Mashed Library UK 2009, we're planning to kick the event off with six 30-minute opening sessions. We've got two rooms, so there'll be a session running in each room at the same time. Since a delegate can't be in two places at once, they'll only be able to go to three of the six sessions. So, how do you keep everyone happy and avoid too many clashes (i.e. having to miss a session you'd quite like to have gone to)?

Having never organised an event before, I'm guessing the usual approach would be to pair up sessions that target different audiences? However, that sounds like a potential headache inducer and I'm a programmer, not a planner!

So, here's what we're going to do once we've got all six sessions finalised: each of the 60-odd delegates (and by that I mean we've got more than 60 delegates!) will rank the sessions in order of preference. Their 1st, 2nd and 3rd choices will be the three sessions that they'd most like to go to.

With that kind of data, you'd expect to see some clustering (i.e. delegates making the same or similar choices). So, in theory, there will be an optimal sequencing of sessions that gives the most delegates the best chance of going to their top three choices.

There's a wide variety of programming techniques for finding optimal solutions to problems, from the simple to the complex (e.g. simulated annealing and genetic algorithms). However, because I'd got a bath running, I decided to knock up a quick hack using the simplest method — randomly generate a session sequence and then see how well it meets the choices of the delegates. By the way, if you want to learn more about calculating optimal solutions, see "Programming Collective Intelligence" by Toby Segaran (ISBN 9780596529321).

With any optimal solution code, you need a way of measuring the success of a given solution. To my mind, that would be "happiness". If you find a solution that lets a delegate attend their top three choices, they'll be very happy, but if their 1st and 2nd choices clash, they won't be. Once you've calculated the overall "happiness" for all the delegates, you can compare that particular solution with other random solutions (i.e. "does this session sequence generate more or less happiness than the previous one?").

I hadn't planned on releasing the code, as it really was a 5-minute "quick and dirty" hack, but Ben tweeted to say he might find it useful, so I've uploaded the Perl script here. I've also included a sample file containing some dummy delegate choices.

For each delegate, there's a comma-separated list giving the rank they've assigned to each session, in session order (1 = top choice)…

Andy    2,4,3,5,6,1

…so Andy's top choice is session 6, followed by session 1, then session 3, etc.
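To make the file format concrete, here's a minimal Python sketch that turns a line like Andy's into his list of ranks and his choice order. `parse_line` is a hypothetical helper, not code from the uploaded script. It assumes the name and the ranks are separated by whitespace, as in the sample.

```python
from typing import List, Tuple


def parse_line(line: str) -> Tuple[str, List[int], List[int]]:
    """Parse 'Name  r1,r2,...' where r_i is the rank given to session i (1 = top)."""
    name, field = line.strip().rsplit(None, 1)
    ranks = [int(r) for r in field.split(",")]
    if sorted(ranks) != list(range(1, len(ranks) + 1)):
        raise ValueError(f"{name}: ranks must be a permutation of 1..{len(ranks)}")
    # Sort the session numbers by the rank the delegate gave them.
    choices = sorted(range(1, len(ranks) + 1), key=lambda s: ranks[s - 1])
    return name, ranks, choices


name, ranks, choices = parse_line("Andy    2,4,3,5,6,1")
print(name, choices)  # Andy [6, 1, 3, 2, 4, 5]
```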

If you run the Perl script, it'll pick a random session sequence and calculate the happiness. It'll keep looping and trying to find better solutions until it finds one that can't be improved upon. You'd probably want to run the code several times to make sure that the final solution really is the best one. You might also want to try one of the alternative $overall calculations to see if that produces the same session sequence.
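The random search described above can be sketched in a few lines of Python. This isn't the uploaded Perl script: `random_sequence`, `random_search` and the toy scoring function are hypothetical. "Can't be improved upon" is approximated here by stopping after `patience` consecutive random sequences that fail to beat the best so far. That stopping rule is an assumption; the script may use a different one. Passing different `seed` values is the equivalent of running the code several times.

```python
import random
from typing import Callable, List, Optional, Tuple

Slots = List[Tuple[int, int]]  # each pair of sessions shares a time slot


def random_sequence(sessions: List[int], rng: random.Random) -> Slots:
    """Shuffle the sessions and pair them off into time slots."""
    order = list(sessions)
    rng.shuffle(order)
    return [(order[i], order[i + 1]) for i in range(0, len(order), 2)]


def random_search(sessions: List[int],
                  score: Callable[[Slots], float],
                  patience: int = 1000,
                  seed: Optional[int] = None) -> Tuple[Slots, float]:
    """Keep the best random sequence; stop after `patience` tries with no improvement."""
    rng = random.Random(seed)
    best = random_sequence(sessions, rng)
    best_score = score(best)
    misses = 0
    while misses < patience:
        candidate = random_sequence(sessions, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score, misses = candidate, candidate_score, 0
        else:
            misses += 1
    return best, best_score


def toy_score(slots: Slots) -> float:
    """Toy score for demonstration only: +1 if sessions 1 and 2 share a slot."""
    return sum(1 for a, b in slots if {a, b} == {1, 2})


best, best_score = random_search([1, 2, 3, 4, 5, 6], toy_score, seed=42)
print(best, best_score)
```

Suppose slot order doesn't affect the score. Then six sessions can only be split into three pairs in 15 distinct ways (5 × 3 × 1), so for data this small you could also just check every one.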

Here's an example of an early solution…

[1]     session 1 = 11 delegate(s)
[1]     session 6 = 4 delegate(s)

[2]     session 5 = 6 delegate(s)
[2]     session 4 = 9 delegate(s)

[3]     session 2 = 8 delegate(s)
[3]     session 3 = 7 delegate(s)

HAPPINESS = 87 (5.8)

        1       Andy    -4.8
        3       Beth    -2.8
        3       Cary    -2.8
        9       Dave    +3.2
        5       Earl    -0.8
        9       Fred    +3.2
        9       Gene    +3.2
        3       Hans    -2.8
        9       Iggy    +3.2
        5       Jane    -0.8
        5       Karl    -0.8
        9       Leah    +3.2
        9       Macy    +3.2
        3       Neil    -2.8
        5       Owen    -0.8

CLASHES = 7 / OVERALL = 12.4285714285714 / DIFF = 38.4

In the above output, it's proposing to run sessions 1 & 6 together, then 5 & 4, and finally 2 & 3. By looking at the delegate choices, you can easily work out which of the two concurrent sessions each delegate would prefer to go to (e.g. 11 delegates would choose session 1 over session 6).
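Working out which concurrent session each delegate would choose is a simple comparison of ranks. The sketch below uses hypothetical names and is written in Python rather than the script's Perl. It assumes each delegate's preferences are stored as the ranks from the sample file, in session order. It produces counts like the "session 1 = 11 delegate(s)" lines above; here it's fed only Andy's line, since that's the only real data in the post.

```python
from collections import Counter
from typing import Dict, List, Tuple


def attendance(prefs: Dict[str, List[int]],
               sequence: List[Tuple[int, int]]) -> Counter:
    """Count how many delegates would choose each session in its time slot.

    prefs maps each delegate to the ranks they gave sessions 1..n, in session
    order (the sample file's layout, 1 = top choice).
    """
    counts: Counter = Counter()
    for ranks in prefs.values():
        for a, b in sequence:
            # A delegate goes to whichever of the two concurrent sessions they ranked higher.
            counts[a if ranks[a - 1] < ranks[b - 1] else b] += 1
    return counts


early = [(1, 6), (5, 4), (2, 3)]  # the early solution's slots from the output above
counts = attendance({"Andy": [2, 4, 3, 5, 6, 1]}, early)
for slot, (a, b) in enumerate(early, start=1):
    print(f"[{slot}] session {a} = {counts[a]} delegate(s)")
    print(f"[{slot}] session {b} = {counts[b]} delegate(s)")
```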

The code also calculates a "happiness" value for each delegate. If a delegate gets to go to their 1st, 2nd and 3rd choices, then they'd get a maximum happiness score of 9 (3 x 3 points). If a 1st choice session is being run at the same time as their 2nd choice (or a 2nd at the same time as the 3rd), that would make them unhappy, so a point is deducted. If a 1st choice runs at the same time as their 3rd choice, they'd probably accept that (however, nothing is added to their happiness score).
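Here's one possible reading of those rules as code:

- 3 points for each top-three choice the delegate can actually attend.
- 1 point off for each slot where their 1st and 2nd (or 2nd and 3rd) choices clash.

The weights are an assumption. The sample output contains scores of 3 and 1, which this reading can't produce: it gives Andy 5 for the early solution, where the script reports 1. So the original script must score compromises differently. Treat the hypothetical `happiness` function as an illustration of the idea, not a reimplementation.

```python
from typing import List, Tuple

# Rank pairs that count as a clash when they share a slot (1st & 3rd is tolerated).
CLASH_PAIRS = {(1, 2), (2, 3)}


def happiness(ranks: List[int], sequence: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (score, clashes) for one delegate under the assumed weights.

    ranks[i] is the rank the delegate gave session i + 1 (1 = top choice).
    """
    score = 0
    clashes = 0
    for a, b in sequence:
        better, worse = sorted((ranks[a - 1], ranks[b - 1]))
        # The delegate attends the better-ranked session; 3 points if it's a top-three choice.
        if better <= 3:
            score += 3
        # Lose a point if the pair is 1st & 2nd or 2nd & 3rd.
        if (better, worse) in CLASH_PAIRS:
            clashes += 1
            score -= 1
    return score, clashes


print(happiness([2, 4, 3, 5, 6, 1], [(1, 6), (5, 4), (2, 3)]))  # (5, 1)
```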

Once all the scores have been calculated, we get an overall happiness of 87 (out of a possible 135, i.e. 15 delegates x the maximum happiness score of 9) and the average happiness is 5.8 out of 9.

We can also see how (un)happy each delegate is and how much they deviate from the average happiness. Dave, Fred, Gene, Iggy, Leah and Macy all get to go to their top 3 choices, so they've all got scores of 9 out of 9. Andy is very unhappy (1 out of 9). The others are somewhere in the middle: they've all had to make compromises and won't be going to all of their top 3 sessions.
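The totals and deviations above are straightforward arithmetic. As a check, this Python sketch recomputes them from the per-delegate scores in the early solution. `summarise` is a hypothetical helper.

```python
from typing import Dict, Tuple

MAX_SCORE = 9  # 3 points for each of the top three choices


def summarise(scores: Dict[str, int]) -> Tuple[int, int, float, Dict[str, float]]:
    """Return total happiness, maximum possible, average and per-delegate deviation."""
    total = sum(scores.values())
    possible = MAX_SCORE * len(scores)
    average = total / len(scores)
    deviations = {name: score - average for name, score in scores.items()}
    return total, possible, average, deviations


# Per-delegate scores from the early solution above.
early_scores = {"Andy": 1, "Beth": 3, "Cary": 3, "Dave": 9, "Earl": 5,
                "Fred": 9, "Gene": 9, "Hans": 3, "Iggy": 9, "Jane": 5,
                "Karl": 5, "Leah": 9, "Macy": 9, "Neil": 3, "Owen": 5}

total, possible, average, deviations = summarise(early_scores)
print(total, possible, round(average, 2))  # 87 135 5.8
print(f"Andy {deviations['Andy']:+.1f}")    # Andy -4.8
```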

There are 7 clashes (when a 1st choice runs at the same time as the 2nd, or the 2nd at the same time as the 3rd). Ideally, we'd like to keep the clashes to a minimum.

Here's an example of a better solution (which might actually be the optimal solution for the dummy data)…

[1]     session 3 = 9 delegate(s)
[1]     session 5 = 6 delegate(s)

[2]     session 4 = 9 delegate(s)
[2]     session 6 = 6 delegate(s)

[3]     session 1 = 10 delegate(s)
[3]     session 2 = 5 delegate(s)

HAPPINESS = 101 (6.73333333333333)

        5       Andy    -1.73333333333333
        9       Beth    +2.26666666666667
        3       Cary    -3.73333333333333
        3       Dave    -3.73333333333333
        5       Earl    -1.73333333333333
        3       Fred    -3.73333333333333
        5       Gene    -1.73333333333333
        9       Hans    +2.26666666666667
        5       Iggy    -1.73333333333333
        9       Jane    +2.26666666666667
        9       Karl    +2.26666666666667
        9       Leah    +2.26666666666667
        9       Macy    +2.26666666666667
        9       Neil    +2.26666666666667
        9       Owen    +2.26666666666667

CLASHES = 2 / OVERALL = 50.5 / DIFF = 36.2666666666667

The average happiness is now up to 6.73 per delegate and there are only 2 clashes, which is much better. Cary, Dave and Fred will be the most affected by this particular session scheduling, but we now have 8 delegates attending their top 3 choices (up from 6).
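The post doesn't spell out how OVERALL and DIFF are calculated, but both sample outputs are consistent with two formulas:

- OVERALL = HAPPINESS / CLASHES (87 / 7 ≈ 12.43 and 101 / 2 = 50.5).
- DIFF = the sum of each delegate's absolute deviation from the average (38.4 and ≈ 36.27).

Here's a Python sketch under that inferred assumption, with a hypothetical `overall_and_diff` helper. The script's behaviour with no clashes isn't shown, so this sketch simply returns infinity in that case.

```python
from typing import List, Tuple


def overall_and_diff(scores: List[int], clashes: int) -> Tuple[float, float]:
    """OVERALL and DIFF as inferred from the sample output (not from the script)."""
    happiness = sum(scores)
    average = happiness / len(scores)
    # Inferred: OVERALL = HAPPINESS / CLASHES; infinity for a clash-free sequence is a sketch choice.
    overall = happiness / clashes if clashes else float("inf")
    # Inferred: DIFF = sum of each delegate's absolute deviation from the average.
    diff = sum(abs(score - average) for score in scores)
    return overall, diff


early = [1, 3, 3, 9, 5, 9, 9, 3, 9, 5, 5, 9, 9, 3, 5]   # Andy..Owen, early solution
better = [5, 9, 3, 3, 5, 3, 5, 9, 5, 9, 9, 9, 9, 9, 9]  # Andy..Owen, better solution

for label, scores, clashes in (("early", early, 7), ("better", better, 2)):
    overall, diff = overall_and_diff(scores, clashes)
    print(f"{label}: OVERALL = {overall:.4f} / DIFF = {diff:.4f}")
```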

So, the big question is: what happens when we get the real data from the 60-odd delegates who are coming to Mashed Library? Stay tuned for the answer!

Mashed Library UK 2009 — Open for Business!

We're now accepting registrations for Mashed Library UK 2009: Mash Oop North!

Please see the event blog for further details.

————

Blimey — we've "sold out" already! Any bookings from 2:20pm onwards today will be placed on a waiting list.

Tracking the library angle on swine flu

For anyone who's interested in how libraries and librarians are blogging about the swine flu outbreak, I've just added a dedicated RSS feed over on the HotStuff 2.0 blog: http://www.daveyp.com/hotstuff/?p=268

Transcript of the #cilip2 Twitter hashtag

Despite a widespread network failure that seemed to affect quite a few universities, I finally managed to pick up all of the #cilip2 tweets from today's event: http://www.daveyp.com/blog/stuff/cilip2.html

Whenever I get a spare half-an-hour, I'll do some analysis of the tweets. If anyone wants a tab-separated version of the data, you can grab it from here.

Sunny day

A gorgeous sunny day, ideal for a walk in the wood

…or for lazing around in the garden…

OCLC to launch web based ILS

I'm guessing it's safe to post this now, although I'm still expecting a phone call from lawyers representing OCLC and/or The Library Journal after "accidentally" stumbling on the news before it officially broke :-S

The LJ article (written by Marshall Breeding) isn't live at the moment, but the announcement has appeared on the OCLC home pages:

- OCLC announces strategy to move library management services to Web scale
- Andrew Pace: And now for something completely different

Twittering ideas

Is your library looking a little tired and shabby? Are your patrons listless and lackadaisical? Are you in need of inspiration? Have you shifted your paradigms recently? Do you believe that ideas should come at regular 15 minute intervals and always be less than 140 characters long?

I should have thought of hooking the Library 2.0 Idea Generator into Twitter a long time ago :-D

Keyword search data

We've been logging all keyword searches on our OPAC for nearly 3 years and now have details for over 3 million searches. Just in case the data is of any use to anyone, I've uploaded an aggregated XML version to our web server: http://library.hud.ac.uk/data/keyworddata/

As with the usage data, we're putting it out there with no strings attached by using an Open Data Commons Licence.

The XML file contains a list of about 8,500 keywords. For each keyword, there's a list of other terms that have been used with that keyword in multi-term searches. The readme file contains more information about the structure.
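The aggregation behind that file can be sketched as a simple co-occurrence count. This is a hypothetical Python illustration, and it makes some assumptions:

- Searches are split into terms on whitespace and lower-cased.
- A term repeated within one search is counted once.

It doesn't attempt to reproduce the XML structure, which is documented in the readme. The example searches are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def cooccurrences(searches: Iterable[str]) -> Dict[str, Counter]:
    """For each term, count the other terms it was searched with."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for search in searches:
        terms = set(search.lower().split())
        if len(terms) < 2:
            continue  # single-term searches have no companion terms
        for term in terms:
            for other in terms - {term}:
                table[term][other] += 1
    return dict(table)


# Made-up example searches, for illustration only.
table = cooccurrences(["social work", "Social policy", "work", "social work"])
print(table["social"].most_common())  # [('work', 2), ('policy', 1)]
```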

Read my shocking story

…now, that's more like it :-)