code – Self Plagiarism is Style

Keeping everyone happy at a conference

At Mashed Library UK 2009, we’re planning to kick the event off with six 30 minute opening sessions. We’ve got two rooms, so there’ll be a session running in each room at the same time. Since a delegate can’t be in two places at the same time, they’ll only be able to go to three of the six sessions. So, how do you ensure that you keep everyone happy and that you don’t have too many clashes (i.e. having to miss a session you’d have quite liked to have gone to)?
Having never organised an event before, I’m guessing the usual way would be to try and schedule sessions together that target different audiences? However, that sounds like a potential headache inducer and I’m a programmer, not a planner!
So, what we’re going to do, once we’ve got all six sessions finalised, is to let each of the 60 odd delegates (and by that I mean we’ve got more than 60 delegates!) rank the sessions in order of preference. So, their 1st, 2nd, and 3rd choices would be the three sessions that they’d most like to go to.
With that kind of data, you’d expect to see some clustering (i.e. delegates making the same or similar choices) and so (in theory) there will be an optimal sequencing of sessions that will give the most delegates the best chance to going to their three top choices.
There’s a wide variety of programming techniques for finding optimal solutions to problems, from the simple to the complex (e.g. simulated annealing and genetic algorithms). However, because I’d got a bath running, I decided to knock up a quick hack using the simplest method — randomly generate a session sequence and then see how well it meets the choices of the delegates. By the way, if you want to learn more about calculating optimal solutions, see “Programming Collective Intelligence” by Toby Segaran (ISBN 9780596529321).
With any optimal solution code, you need to way of measuring the success of a given solution. To my mind, that would be “happiness” — if you find a solution that gives a delegate the ability to attend their top three choices, they’ll be very happy, but if you have a session clash for their 1st and 2nd choices, they won’t be happy. Once you’ve calculated the overall “happiness” for all the delegates, then that allows you to compare that particular solution with other random solutions (i.e. “does this session sequence generate more happiness or less that the previous one?”)
I hadn’t planned on releasing the code, as it really was a 5 minute “quick and dirty” hack, but Ben tweeted to say he might find it useful, so I’ve uploaded the Perl script to here. I’ve also included a sample file containing some dummy delegate choices.
For each delegate, there’s a comma separated list showing their session preference (1=top choice)…

Andy    2,4,3,5,6,1

…so Andy’s top choice is session 6, followed by session 1, then session 3, etc.
If you run the Perl script, it’ll pick a random session sequence and calculate the happiness. It’ll keep on looping and trying to find better solutions until it finds one that can’t be improved upon. You’d probably want to run the code several times to ensure that the final solution really is the best one. You might want to also try one of the alternative $overall calculations to see if that produces the same session sequence.
Here’s an example of an early solution…

[1]     session 1 = 11 delegate(s)
[1]     session 6 = 4 delegate(s)
[2]     session 5 = 6 delegate(s)
[2]     session 4 = 9 delegate(s)
[3]     session 2 = 8 delegate(s)
[3]     session 3 = 7 delegate(s)
HAPPINESS = 87 (5.8)
        1       Andy    -4.8
        3       Beth    -2.8
        3       Cary    -2.8
        9       Dave    +3.2
        5       Earl    -0.8
        9       Fred    +3.2
        9       Gene    +3.2
        3       Hans    -2.8
        9       Iggy    +3.2
        5       Jane    -0.8
        5       Karl    -0.8
        9       Leah    +3.2
        9       Macy    +3.2
        3       Neil    -2.8
        5       Owen    -0.8
CLASHES = 7 / OVERALL = 12.4285714285714 / DIFF = 38.4

In the above output, it’s proposing to run sessions 1 & 6 together, then 5 & 4, and finally 2 & 3. By looking at the delegate choices, you can easily calculate which of the two concurrent sessions each delegate would prefer to go to (i.e. 11 delegates would choose to go to session 1).
The code also calculates a “happiness” value for each delegate. If a delegate gets to go to their 1st, 2nd and 3rd choices, then they’d get a maximum happiness score of 9 (3 x 3 points). If a 1st choice session is being run at the same time as their 2nd choice (or a 2nd at the same time as the 3rd), that would make them unhappy, so a point is deducted. If a 1st choice runs at the same time as their 3rd choice, they’d probably accept that (however, nothing is added to their happiness score).
Once all the scores have been calculated, we get an overall happiness of 87 (out of a possible 135, i.e. 15 delegates x the maximum happiness score of 9) and the average happiness is 5.8 out of 9.
We can also see the how (un)happy each delegate is and how much they deviate from the average happiness. Dave, Fred, Gene, Iggy, Leah and Macy all get to go to their top 3 choices, so they’ve all got scores of 9 out of 9. Andy is very unhappy (1 out of 9). The others are somewhere in the middle, so they’ve all had to make compromises and won’t be going to their top 3 sessions.
There are 7 clashes (when a 1st choice runs at the same time as the 2nd, or the 2nd at the same time as the 3rd). Ideally, we’d like to keep the clashes to a minimum.
Here’s an example of a better solution (which might actually be the optimal solution for the dummy data)…

[1]     session 3 = 9 delegate(s)
[1]     session 5 = 6 delegate(s)
[2]     session 4 = 9 delegate(s)
[2]     session 6 = 6 delegate(s)
[3]     session 1 = 10 delegate(s)
[3]     session 2 = 5 delegate(s)
HAPPINESS = 101 (6.73333333333333)
        5       Andy    -1.73333333333333
        9       Beth    +2.26666666666667
        3       Cary    -3.73333333333333
        3       Dave    -3.73333333333333
        5       Earl    -1.73333333333333
        3       Fred    -3.73333333333333
        5       Gene    -1.73333333333333
        9       Hans    +2.26666666666667
        5       Iggy    -1.73333333333333
        9       Jane    +2.26666666666667
        9       Karl    +2.26666666666667
        9       Leah    +2.26666666666667
        9       Macy    +2.26666666666667
        9       Neil    +2.26666666666667
        9       Owen    +2.26666666666667
CLASHES = 2 / OVERALL = 50.5 / DIFF = 36.2666666666667

The average happiness is now up to 6.73 per delegate and there are only 2 clashes, which is much better. Cary, Dave and Fred will be the most affected by this particular session scheduling, but we now have 8 delegates attending their top choices.
So, the big question will be: what happens when we get the real data from the 60 odd delegates who are coming to Mashed Library? Stay tuned for the answer!

HIPpie — how to build a dictionary

Many thanks to those of you who’ve tested the code from yesterday! Those of you outside of the UK might want to see if this version works slightly faster for you:
hippie_spellcheck_v0.02.txt
The next thing I’ll be looking at is how to optimise the spellchecker dictionary for each library. Some of you will already have read this in the email I sent out this morning or in the comment I left previously, but I’m thinking of attacking it this way:
1) Start off with a standard word list (e.g. the 1000 most commonly used English words) to create the spellcheck dictionary for your library, as the vast majority should match something on your catalogue.
2) Add some extra code to your HIP so that all successful keyword searches get logged. Those keywords can then be added to your dictionary.
It could even be that starting with an empty dictionary might prove to be more effective (i.e. don’t bother with step 1) — just let the “network effect” of your users searching your OPAC generate the dictionary from scratch (how “2.0” is that?!)
To avoid any privacy issues, the code for capturing the successful keywords could be hosted locally on your own web server (I should be able to knock up suitable Perl and PHP scripts for you to use). Then, periodically, you’d upload your keyword list to HIPpie so that it can add the words to your spellchecker dictionary.
What about if you don’t have SirsiDynix HIP? Well, as mentioned previously, the spellchecker has been implemented as a web service (more info here), and the HIP spellchecker makes use of that web service to get a suggestion. At the moment it only returns text or XML, but I’m planning to add JSON as an option soon. Also, if you have a look at the HIP stylesheet changes, you can see the general flow of the code:
1) insert a div with an id of “hippie_spellchecker” into the HTML
2) make a call to “https://library.hud.ac.uk/hippie_perl/spellchecker2.pl” with your library ID (currently “demo”) and the search term(s) as the parameters
3) the call to “spellchecker2.pl” returns JavaScript to update the div from step 1
4) clicking on the spelling suggestion triggers the “hippie_search” JavaScript function which is responsible for creating a search URL suitable for the OPAC (which might include things like a session ID or an index to search)
None of the above 4 steps are specifically tied to the SirsiDynix HIP and should be transferable to other OPACs. I’ve put together a small sample HTML page that does nothing apart from pull in a suggestion using those 4 steps:
example001.html
If you do want to have a go with your own OPAC, please let me know — at some point I’ll need people to register their libraries so that each can have their own dictionary, and I might start limiting the number of requests that any single IP address can make using the “demo” account. Also, it would be good to build up a collection of working implementations for different OPACs.

HIPpie “Did you mean?” ready for testing (again!)

A thousand apologies to those of you who’ve been waiting for HIPpie to reach the testing phase — has it really been 6 months since I last posted anything?! HIPpie was/is a project that I’ll be doing in my spare time and, unfortunately, since Christmas, my spare time has been taken up with everything but working on HIPpie!
Anyway, having realised that it’s so long since I posted anything, I was shamed into making some time and I’m now at the stage where some brave HIP 3.x library can alpha test the spellchecker code. Ideally you want to be doing this on a test HIP 3.x server, unless you’re feeling particularly reckless.
The usual caveats apply — make sure you safely back up any files you edit and you promise not to hold me responsible if your server room mysteriously burns down shortly after you add the code. Also, altering your XSL stylesheets may have an impact on what support SirsiDynix will be able to give you.
To test the code, you’ll need to edit the searchinput.xsl stylesheet. Once you’ve found the file, make a safe backup before you make the changes! Open up the file and scroll down to around line 580 — you should see a <center> tag. After that tag, you need to insert (i.e. copy & paste) in the contents of this file:
hippie_spellcheck_v0.01.txt
Save the altered file and give your HIP server a minute to pick up the altered stylesheet. Now fire up a web browser and run a search for a misspelled word. If you get an error message, then double-check the changes you made to the stylesheet and, if all else fails, you can revert back to your backed up version. Touch wood, you should get a “did you mean” suggestion which looks like this:
hippie_v001
If you do test the code, please feed back!
Notes
This test version of the code is using a fairly small American dictionary of words, so you may not get appropriate suggestions for your locale.