Relevancy ranking in Summon

Yesterday, Tim Fletcher tweeted me a question about Summon:

How does Summon rank results? is there a logic?

…it’s not the kind of question that you can answer in 140 characters, but I quickly knocked off an email to Tim. This morning David F. Flanders suggested I should also blog the response.
So, first of all, a quick caveat: much of the following was gleaned from various presentations over the last couple of years and may not be 100% accurate (I’m particularly good at misunremembering stuff!).
The first time I saw Summon (back in early 2009), I believe Serials Solutions were still using the default relevancy ranking that comes with the Open Source Lucene software (its scoring formula is documented on the Lucene website). In a nutshell, Lucene generates a score for each indexed item that matches the search query, and those items are then sorted by score (in descending order) to produce the ranked results.
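To make the "score, then sort" idea a bit more concrete, here's a minimal Python sketch loosely modelled on Lucene's classic TF-IDF similarity (the default back then; Lucene 6 later switched its default to BM25). It keeps only term frequency (tf = √freq) and inverse document frequency (idf = 1 + ln(N / (df + 1))) and leaves out field-length norms, coordination and query normalisation, so treat it as an illustration rather than how Lucene (let alone Summon) actually computes its scores. The sample documents are made up.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace (real Lucene analyzers do far more)
    return text.lower().split()


def rank(query, docs):
    """Score every document that matches the query; return (doc_id, score) pairs, best first."""
    n_docs = len(docs)
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    terms = tokenize(query)
    # Document frequency: how many documents contain each query term
    doc_freq = {t: sum(1 for counts in term_counts.values() if t in counts) for t in terms}

    results = []
    for doc_id, counts in term_counts.items():
        score = 0.0
        for t in terms:
            freq = counts[t]  # Counter returns 0 for missing terms
            if freq == 0:
                continue
            tf = math.sqrt(freq)                              # classic Lucene tf
            idf = 1.0 + math.log(n_docs / (doc_freq[t] + 1))  # classic Lucene idf
            score += tf * idf * idf  # idf counts twice (query weight and document weight)
        if score > 0:  # only items that match the query get ranked
            results.append((doc_id, score))

    # Highest score first
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


docs = {
    "a": "africa hospital funding",
    "b": "hospital hospital management",
    "c": "european history",
}
# "a" matches both terms (including the rarer "africa") so it outranks "b"; "c" doesn't match
print(rank("africa hospital", docs))
```

Rare terms (high idf) count for more than common ones, which is why the single mention of "africa" outweighs "b" mentioning "hospital" twice.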
I’ve read quite a few times that the relevancy ranking engine in Lucene is regarded as one of the best, which might be one of the reasons why SirsiDynix recently moved Enterprise from using Brainware to Lucene.
When you mention Lucene, chances are Solr won’t be too far behind. Solr (which is also Open Source) extends Lucene to provide a host of extra features, including facets.
As Summon has developed, and in response to customer feedback, Serials Solutions have gradually tweaked the way their Lucene installation generates the scores by giving each result an additional boost (or reduction) depending on a variety of factors, including:

  • Currency – newer items are given a slight boost over older items
  • Content type – books, ebooks and journal articles get a boost to their scores, whilst newspaper articles and book reviews have their scores reduced
  • Local collections – things that come from the user’s library (e.g. books, repository items, local archives, etc) get a little boost
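One simple way to layer those adjustments on top of a Lucene score is to multiply it by a factor for each one. The Python sketch below does just that; every multiplier and the shape of the currency curve are hypothetical values I've picked for illustration, as Serials Solutions haven't published theirs.

```python
from datetime import date

# Hypothetical multipliers: the real values Serials Solutions use aren't published
CONTENT_TYPE_BOOST = {
    "book": 1.2,
    "ebook": 1.2,
    "journal article": 1.2,
    "newspaper article": 0.7,
    "book review": 0.7,
}
LOCAL_COLLECTION_BOOST = 1.1
CURRENCY_BOOST = 0.1  # maximum extra weight, given to items published this year


def boosted_score(base_score, content_type, pub_year, is_local, this_year=None):
    """Adjust a base relevancy score for content type, currency and local holdings."""
    if this_year is None:
        this_year = date.today().year
    score = base_score
    # Content type: boost or reduce; unknown types are left unchanged
    score *= CONTENT_TYPE_BOOST.get(content_type, 1.0)
    # Currency: a slight boost that shrinks as the item gets older
    age = max(this_year - pub_year, 0)
    score *= 1.0 + CURRENCY_BOOST / (1 + age)
    # Local collections: items from the user's own library get a little extra
    if is_local:
        score *= LOCAL_COLLECTION_BOOST
    return score


# Same base score, different metadata
article = boosted_score(2.0, "journal article", 2010, is_local=False, this_year=2010)
review = boosted_score(2.0, "book review", 2010, is_local=False, this_year=2010)
local_book = boosted_score(2.0, "book", 1995, is_local=True, this_year=2010)
print(article, review, local_book)
```

Multiplying (rather than adding) keeps each tweak proportional to the underlying relevancy score, so in this sketch a boost nudges an item up the list rather than overriding how well it actually matches the query.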

Additionally, the Summon search engine handles certain words and phrases differently. For example, a Lucene index is often configured with a stemmer that treats the singular and plural versions of words as the same, so searches for “africa hospital” and “africas hospitals” bring back roughly the same number of results. However, Summon understands that “africa aid” isn’t the same thing as “africa aids”.
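A common way to get that behaviour is to give the stemmer a list of protected words that it must leave alone (Solr, for instance, lets you supply one to its stemming filters). Here's a deliberately crude Python sketch of the idea; the plural-stripping rules and the one-word `PROTECTED` list are hypothetical stand-ins for a real stemmer such as Porter's and for whatever list Summon actually uses.

```python
# Hypothetical protected-terms list: words the stemmer must not reduce to a singular
PROTECTED = {"aids"}


def naive_stem(word):
    """Very crude plural stripper, standing in for a real stemmer such as Porter's."""
    w = word.lower()
    if w in PROTECTED:
        return w  # leave protected terms exactly as typed
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"  # studies -> study
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]  # hospitals -> hospital (but leave "class" alone)
    return w


def normalise(query):
    """Stem every word in a query so that plural and singular forms match."""
    return [naive_stem(w) for w in query.split()]


print(normalise("africas hospitals"))  # same terms as "africa hospital"
print(normalise("africa aids"))        # "aids" is protected, so it doesn't collapse to "aid"
```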
Given that few users go beyond the first page of results (I was told the exact figure last week, but it’s slipped from my memory; I think it was less than 5%?), Serials Solutions put a lot of effort into ensuring that the most relevant results appear on that first page. With the Summon master index fast approaching 1,000,000,000 items, that’s no trivial task!
As they say, the proof of the pudding is in the eating, so feel free to run some searches on our Summon instance to see how well you think it ranks the results.

Summon 4 HN — bits o’ code

As part of the JISC Summon 4 HN project, we’ll be releasing some chunks of code that I’ve knocked together for our Summon implementation at Huddersfield.
The code will cover these areas:

  1. updating Summon with MARC record additions, updates and deletions from Horizon
  2. providing live availability information from Horizon without resorting to screen-scraping the OPAC
  3. customising 360 Link using jQuery
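The first of those boils down to working out what has changed in the catalogue since the last export. One generic way to do that, and not necessarily how the Summon 4 HN code does it, is to take two full snapshots keyed on bib record ID and compare a checksum of each record. In this Python sketch the snapshot format (a dict mapping record ID to serialised MARC text) is an assumption made for illustration.

```python
import hashlib


def fingerprint(record_text):
    """Checksum of a record's serialised form, used to spot changed records."""
    return hashlib.sha1(record_text.encode("utf-8")).hexdigest()


def diff_exports(previous, current):
    """
    previous, current: dicts mapping a bib record ID to its serialised MARC text
    (a hypothetical snapshot format). Returns sorted lists of (added, updated, deleted) IDs.
    """
    prev_fp = {rid: fingerprint(text) for rid, text in previous.items()}
    curr_fp = {rid: fingerprint(text) for rid, text in current.items()}
    added = sorted(curr_fp.keys() - prev_fp.keys())
    deleted = sorted(prev_fp.keys() - curr_fp.keys())
    updated = sorted(rid for rid in curr_fp.keys() & prev_fp.keys()
                     if curr_fp[rid] != prev_fp[rid])
    return added, updated, deleted


yesterday = {"1001": "record one", "1002": "record two", "1003": "record three"}
today = {"1002": "record two", "1003": "record three (edited)", "1004": "record four"}
print(diff_exports(yesterday, today))
```

Diffing snapshots is simple but means exporting everything each time; if the ILS keeps reliable modification dates, querying on those is usually cheaper.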

In theory, the first two might also be of interest to Horizon sites that are implementing an alternative OPAC (e.g. VuFind or AquaBrowser) where you need to set up regular MARC exports. The third might be of interest to 360 Link sites in general.
Keep an eye on the Project Code section of the Summon 4 HN blog for details of the code 🙂


I couldn’t find a relevant photo for this blog post, so instead, let’s have another look at those infamous MIMAS #cupcakes from ILI2009 🙂

Here comes Summ(er|on)

It’s probably a sign of getting old and decrepit, but this year has just flown by: it doesn’t seem like two minutes since we kicked off our implementation of Serials Solutions’ Summon, and now it’s fully live (the switchover actually happened halfway through the Mashed Library event we ran the other week).
The bulk of the implementation was done and dusted by early January 2010, and most of the implementation time was spent populating 360 Link (the Serials Solutions link resolver) with our journal holdings, a task our Journals Team found much easier than when we implemented SFX back in 2006. As the plan had always been to run Summon in parallel with MetaLib during the 2009/10 academic year, we had lots of time to play and tweak.
We flipped the link resolver over from SFX to 360 Link in late January and then formally “soft” launched Summon during the University’s Research Festival in early March. Throughout the academic year, usage of Summon has been growing and the vast majority of the feedback has been positive 🙂
As part of the JISC Summon4HN Project, we’ll be documenting the implementation and releasing chunks of code that we hope might be of use to the community, including:

  • code for automating the export of deleted, new and updated MARC records from Horizon so that they can be imported into Summon (or VuFind, AquaBrowser, etc)
  • code for creating “dummy” journal title records (so that known journal titles can be easily located in Summon, e.g. American Journal of Nursing)
  • a basic mod_perl implementation of the DLF spec for exposing availability data for library collections
  • details of the various tweaks we’ve made to our 360 Link instance
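A “dummy” journal title record probably needs little more than a title for a known-item search to find it. Here's a hypothetical Python helper that writes one in MarcEdit-style mnemonic text, where blanks in the leader and indicators are shown as backslashes. The record ID, the fields chosen (001, 022, 245, 856) and the leader values are illustrative assumptions, so check them against whatever format your Summon (or VuFind) loader actually expects.

```python
def dummy_journal_record(record_id, title, issn=None, url=None):
    """Return a minimal serial record as MarcEdit-style mnemonic lines.

    Hypothetical helper: field choices are illustrative, not a cataloguing standard.
    """
    lines = [
        # 24-character leader: 'n' new, 'a' language material, 's' serial
        r"=LDR  00000nas\\2200000\\\4500",
        f"=001  {record_id}",  # control number (hypothetical ID scheme)
    ]
    if issn:
        lines.append(rf"=022  \\$a{issn}")  # ISSN, both indicators blank
    lines.append(f"=245  00$a{title}")      # title proper
    if url:
        lines.append(f"=856  40$u{url}")    # link to the online journal over HTTP
    return "\n".join(lines)


print(dummy_journal_record("dummy-0001", "American Journal of Nursing",
                           url="http://example.org/ajn"))
```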

Also, as part of the roll out of Summon, we’ve been revamping our E-Resources Wiki to provide a browseable list of resources — as with the journal titles, we’ve been dropping dummy MARC records into Summon so that known resources can be located via a search (e.g. Mintel Reports).