"Self-plagiarism is style"

HIPpie — how to build a dictionary

2nd July 2008

Many thanks to those of you who've tested the code from yesterday! Those of you outside of the UK might want to see if this version works slightly faster for you:

hippie_spellcheck_v0.02.txt

The next thing I'll be looking at is how to optimise the spellchecker dictionary for each library. Some of you will already have read this in the email I sent out this morning or in the comment I left previously, but I'm thinking of attacking it this way:

1) Start off with a standard word list (e.g. the 1000 most commonly used English words) to create the spellcheck dictionary for your library, as the vast majority of those words should match something in your catalogue.

2) Add some extra code to your HIP so that all successful keyword searches get logged. Those keywords can then be added to your dictionary.

It could even be that starting with an empty dictionary might prove to be more effective (i.e. don't bother with step 1) — just let the "network effect" of your users searching your OPAC generate the dictionary from scratch (how "2.0" is that?!)
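To make the two options concrete, here is a minimal sketch in Python of how such a dictionary might be assembled. It is not the HIPpie code itself: the function names, the word-splitting rule and the idea of keeping a count per keyword are all assumptions made for illustration.

```python
import re

def normalise(search):
    """Split a search string into lowercase keywords (assumed tokenising rule)."""
    return re.findall(r"[a-z0-9']+", search.lower())

def build_dictionary(seed_words, successful_searches):
    """Combine an optional seed word list with keywords from successful searches.

    Returns a dict mapping each word to the number of successful searches
    it appeared in; seed words that nobody has searched for stay at 0.
    """
    dictionary = {}
    for word in seed_words:
        dictionary.setdefault(word.lower(), 0)
    for search in successful_searches:
        for keyword in normalise(search):
            dictionary[keyword] = dictionary.get(keyword, 0) + 1
    return dictionary

# Step 1 plus step 2: start from a (tiny, made-up) seed list.
seeded = build_dictionary(["the", "history", "music"],
                          ["History of Jazz", "jazz piano"])

# Step 2 only: start from an empty dictionary and let searches fill it.
from_scratch = build_dictionary([], ["History of Jazz", "jazz piano"])
```

With the empty seed list, the dictionary contains only words that have actually matched something in the catalogue, which is the "from scratch" variant described above.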

To avoid any privacy issues, the code for capturing the successful keywords could be hosted locally on your own web server (I should be able to knock up suitable Perl and PHP scripts for you to use). Then, periodically, you'd upload your keyword list to HIPpie so that it can add the words to your spellchecker dictionary.
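As a rough illustration of that local capture step, the sketch below is written in Python rather than the Perl or PHP mentioned above. The log file name, the `hit_count` argument and both function names are hypothetical, and the upload itself is left out because its format has not been described yet.

```python
import os
import tempfile

def log_successful_search(log_path, keywords, hit_count):
    """Append keywords to the local log only when the search found something.

    Only the keywords are stored: no session ID, IP address or timestamp.
    """
    if hit_count <= 0:
        return False
    with open(log_path, "a", encoding="utf-8") as log:
        for keyword in keywords.lower().split():
            log.write(keyword + "\n")
    return True

def export_keyword_list(log_path):
    """Return the sorted, de-duplicated keywords ready for a periodic upload."""
    with open(log_path, encoding="utf-8") as log:
        return sorted({line.strip() for line in log if line.strip()})

# Example run against a temporary log file.
log_path = os.path.join(tempfile.mkdtemp(), "successful_keywords.log")
log_successful_search(log_path, "Jazz piano", hit_count=12)
log_successful_search(log_path, "jazz history", hit_count=3)
log_successful_search(log_path, "jaz", hit_count=0)  # no hits, so not logged
upload_list = export_keyword_list(log_path)
```

Keeping the log free of anything except the keywords means the list that eventually leaves your server says nothing about who searched for what.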

What if you don't have SirsiDynix HIP? Well, as mentioned previously, the spellchecker has been implemented as a web service (more info here), and the HIP spellchecker makes use of that web service to get a suggestion. At the moment it only returns text or XML, but I'm planning to add JSON as an option soon. Also, if you have a look at the HIP stylesheet changes, you can see the general flow of the code:

1) insert a div with an id of "hippie_spellchecker" into the HTML

2) make a call to "http://library.hud.ac.uk/hippie_perl/spellchecker2.pl" with your library ID (currently "demo") and the search term(s) as the parameters

3) the call to "spellchecker2.pl" returns JavaScript to update the div from step 1

4) clicking on the spelling suggestion triggers the "hippie_search" JavaScript function which is responsible for creating a search URL suitable for the OPAC (which might include things like a session ID or an index to search)

None of the above 4 steps are specifically tied to the SirsiDynix HIP and should be transferable to other OPACs. I've put together a small sample HTML page that does nothing apart from pull in a suggestion using those 4 steps:

example001.html
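The URL handling in steps 2 and 4 can be sketched as follows. This is Python purely for illustration (in the page itself it would be JavaScript), and the query parameter names `library`, `term`, `index` and `session`, along with the example OPAC address, are assumptions; only the spellchecker2.pl address and the "demo" library ID come from the steps above.

```python
from urllib.parse import urlencode

SPELLCHECKER_URL = "http://library.hud.ac.uk/hippie_perl/spellchecker2.pl"

def spellchecker_request_url(library_id, search_terms):
    """Step 2: build the call to the spellchecker web service.

    "library" and "term" are hypothetical parameter names.
    """
    query = urlencode({"library": library_id, "term": search_terms})
    return SPELLCHECKER_URL + "?" + query

def opac_search_url(base_url, suggestion, session_id=None, index=None):
    """Step 4: turn a clicked suggestion into a search URL for a given OPAC.

    The parameter names are placeholders; each OPAC will have its own.
    """
    params = {"term": suggestion}
    if index:
        params["index"] = index
    if session_id:
        params["session"] = session_id
    return base_url + "?" + urlencode(params)

request_url = spellchecker_request_url("demo", "jaz piano")
search_url = opac_search_url("http://opac.example.org/search", "jazz piano",
                             session_id="abc123", index="keyword")
```

Only the second function should need rewriting for a different OPAC, since that is where things like the session ID and search index get added.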

If you do want to have a go with your own OPAC, please let me know — at some point I'll need people to register their libraries so that each can have their own dictionary, and I might start limiting the number of requests that any single IP address can make using the "demo" account. Also, it would be good to build up a collection of working implementations for different OPACs.
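One simple way the per-IP limit on the "demo" account could work is a fixed-window counter, sketched below in Python. The class name, the limits used in the example and the choice of a fixed window are all assumptions, not a description of what HIPpie does or will do.

```python
import time

class DemoRateLimiter:
    """Allow at most max_requests per IP address in each time window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.windows = {}  # ip address -> (window start time, request count)

    def allow(self, ip_address, now=None):
        """Return True if this request is within the limit, else False."""
        now = time.time() if now is None else now
        start, count = self.windows.get(ip_address, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired, so start a fresh one.
            start, count = now, 0
        if count >= self.max_requests:
            self.windows[ip_address] = (start, count)
            return False
        self.windows[ip_address] = (start, count + 1)
        return True

# Example: 2 requests per minute for each IP address (made-up numbers).
limiter = DemoRateLimiter(max_requests=2, window_seconds=60)
results = [limiter.allow("192.0.2.1", now=t) for t in (0, 1, 2, 61)]
```

Registered libraries would presumably bypass this check, since it is only meant to stop a single address from hammering the shared "demo" dictionary.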

There are currently 6 responses to “HIPpie — how to build a dictionary”

  1. On July 11th, 2008, Chip Kruthoffer said:

    Dave, just a note to let you know we set this up this afternoon and it seems to be working very well. Thanks!

    http://staff.lanepl.org/?q=node/515

  2. On July 11th, 2008, CH said:

    Is there any library that uses your scripts with PICA catalogues? I wonder if and how it works.

  3. On July 11th, 2008, Dave Pattern said:

    Hi Chip — that's really cool! As soon as the scripts are ready for building custom dictionaries, I'll let you know.

    I'm not aware of any PICA catalogues using the script yet, but please feel free to try and figure out a way of making it work :-)

  4. On August 7th, 2008, Colleen Medling said:

    Dave,

    This is very, very nice! I've added it to our development box and, with Admin's blessing, plan to have it available to our patrons. You are a gem.

  5. On August 18th, 2008, Chip Kruthoffer said:

    Just FYI, got an email about this today:
    http://www.jaunter.com/

    (We'd looked at them ages ago.)

  6. On August 19th, 2008, Dave Pattern said:

    Thanks Chip — I've never heard of Jaunter before! It looks like they're storing details of all successful searches.
