HIPpie — how to build a dictionary
Many thanks to those of you who've tested the code from yesterday! Those of you outside of the UK might want to see if this version works slightly faster for you:
The next thing I'll be looking at is how to optimise the spellchecker dictionary for each library. Some of you will already have read this in the email I sent out this morning or in the comment I left previously, but I'm thinking of attacking it this way:
1) Start off with a standard word list (e.g. the 1000 most commonly used English words) to create the spellcheck dictionary for your library, as the vast majority of those words should match something in your catalogue.
2) Add some extra code to your HIP so that all successful keyword searches get logged. Those keywords can then be added to your dictionary.
Starting with an empty dictionary might even prove to be more effective (i.e. don't bother with step 1): just let the "network effect" of your users searching your OPAC generate the dictionary from scratch (how "2.0" is that?!)
To avoid any privacy issues, the code for capturing the successful keywords could be hosted locally on your own web server (I should be able to knock up suitable Perl and PHP scripts for you to use). Then, periodically, you'd upload your keyword list to HIPpie so that it can add the words to your spellchecker dictionary.
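As a rough illustration of steps 1 and 2, here is a minimal Python sketch of local keyword logging and dictionary building. It is not the planned Perl or PHP script. The function names, the one-word-per-line log format and the idea of passing in a hit count are all assumptions made for the example.

```python
import re
from pathlib import Path

# Runs of letters only; digits and punctuation are dropped (a simplification).
WORD_RE = re.compile(r"[^\W\d_]+")


def log_successful_search(log_path: Path, query: str, hit_count: int) -> None:
    """Append the keywords of a search to a local log, but only if it found something.

    hit_count is assumed to be the number of results the OPAC returned.
    """
    if hit_count <= 0:
        return  # failed searches are probably misspellings, so skip them
    words = WORD_RE.findall(query.lower())
    if words:
        with log_path.open("a", encoding="utf-8") as log:
            log.write("\n".join(words) + "\n")


def build_dictionary(seed_words, log_path: Path) -> set:
    """Merge an optional seed word list (step 1) with the logged keywords (step 2)."""
    dictionary = {word.lower() for word in seed_words}
    if log_path.exists():
        with log_path.open(encoding="utf-8") as log:
            dictionary.update(line.strip() for line in log if line.strip())
    return dictionary
```

Passing an empty seed list to `build_dictionary` gives the "start from scratch" variant, where the dictionary contains only words that users have searched for successfully.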
What if you don't have SirsiDynix HIP? Well, as mentioned previously, the spellchecker has been implemented as a web service (more info here), and the HIP spellchecker makes use of that web service to get a suggestion. At the moment it only returns text or XML, but I'm planning to add JSON as an option soon. Also, if you have a look at the HIP stylesheet changes, you can see the general flow of the code:
1) insert a div with an id of "hippie_spellchecker" into the HTML
2) make a call to "http://library.hud.ac.uk/hippie_perl/spellchecker2.pl" with your library ID (currently "demo") and the search term(s) as the parameters
3) the call to "spellchecker2.pl" returns JavaScript to update the div from step 1
4) clicking on the spelling suggestion triggers the "hippie_search" JavaScript function which is responsible for creating a search URL suitable for the OPAC (which might include things like a session ID or an index to search)
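To make steps 2 and 4 a bit more concrete, here is a small Python sketch of the two URLs involved. The real code runs as JavaScript in the browser, so this only shows the URL construction. The query parameter names ("library", "term", "session", "index") and the example OPAC address are placeholders invented for the example, not the actual names used by spellchecker2.pl or by any particular OPAC.

```python
from urllib.parse import urlencode

SPELLCHECKER_URL = "http://library.hud.ac.uk/hippie_perl/spellchecker2.pl"


def spellchecker_request_url(library_id: str, terms: str) -> str:
    """Step 2: build the call to the spellchecker web service.

    "library" and "term" are hypothetical parameter names.
    """
    return SPELLCHECKER_URL + "?" + urlencode({"library": library_id, "term": terms})


def opac_search_url(base_url: str, suggestion: str, session_id=None, index=None) -> str:
    """Step 4: build an OPAC search URL for the suggested spelling.

    The session ID and index are optional because not every OPAC needs them;
    their parameter names are hypothetical too.
    """
    params = {"term": suggestion}
    if session_id:
        params["session"] = session_id
    if index:
        params["index"] = index
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(spellchecker_request_url("demo", "beowolf poem"))
    print(opac_search_url("http://opac.example.org/search", "beowulf poem",
                          session_id="abc123", index="keyword"))
```

Keeping the OPAC-specific part in its own function mirrors the role of "hippie_search": only that piece should need rewriting for a different OPAC.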
None of the above 4 steps are specifically tied to the SirsiDynix HIP and should be transferable to other OPACs. I've put together a small sample HTML page that does nothing apart from pull in a suggestion using those 4 steps:
If you do want to have a go with your own OPAC, please let me know. At some point I'll need people to register their libraries so that each one can have its own dictionary, and I might start limiting the number of requests that any single IP address can make using the "demo" account. Also, it would be good to build up a collection of working implementations for different OPACs.