Searching Group Agenda for 2005, June 21

Meeting notes for this meeting

Chris's concerns (from his blog):

Currently the search works well, in that it's functionally correct and returns correct and complete results. The main points I'll be making are:

  • We have two audiences: the librarians and the public
  • Most important, we return correct results; speed comes a close second
  • The public do not care (for the most part) why/how the results are derived, just as long as they are right
  • Innovative tools will win us a little fame

I can't stress the correct results enough; it's very frustrating to search for an item, see that it's on the shelf, then find it isn't.

I agree that our goals are (1) accuracy and (2) speed. As I mentioned recently to a French audience, Koha's search results are currently 'correct and complete' for small collections. But when you have 150K records and a search returns an unsorted list of (say) 5,000 matching items, that's just not a feasible system. NPL actually has a sorting routine for just such a case (using the current MARC setup); however, its overhead means searches that large take 1-3 minutes to return results. So for larger collections, speed and accuracy are closely related. -- Joshua

1. Zebra
  • Zebra is an indexing and search retrieval back end
  • It indexes many formats including native MARC21 and UNIMARC
  • The retrieval engine is a fully standards-compliant Z39.50 server
  • Z39.50 is not the _latest_ technology for searching ... but:
  • Zebra, when combined with yaz-proxy, can convert CQL queries to RPN (the default Z39.50 query type)
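
To make the CQL-to-RPN idea concrete, here is a minimal sketch in Python of what such a conversion looks like. It is not how yaz-proxy does it: it handles only flat `index = term` clauses joined by and/or/not (no parentheses, relations other than `=`, or modifiers), and the index names in `BIB1_USE` are an illustrative assumption. A real setup reads that mapping from a configuration file. The numbers are standard Bib-1 use attributes, and the output is written in PQF, the prefix notation YAZ tools use for RPN queries.

```python
import re

# Assumed, illustrative mapping from CQL index names to Bib-1 "use" attributes.
BIB1_USE = {
    "dc.title": 4,       # Title
    "dc.creator": 1003,  # Author
    "dc.subject": 21,    # Subject heading
    "bath.isbn": 7,      # ISBN
}
ANY_USE = 1016           # "Any": used when a clause names no index

# A token is a quoted string, an equals sign, or a bare word.
TOKEN = re.compile(r'"([^"]*)"|(=)|([^\s="]+)')


def tokenize(cql):
    """Split a CQL string into (kind, text) pairs."""
    tokens = []
    for match in TOKEN.finditer(cql):
        if match.group(1) is not None:
            tokens.append(("quoted", match.group(1)))
        elif match.group(2) is not None:
            tokens.append(("eq", "="))
        else:
            tokens.append(("word", match.group(3)))
    return tokens


def clause_to_pqf(tokens, pos):
    """Translate one clause starting at tokens[pos]; return (pqf, next_pos)."""
    use = ANY_USE
    # "index = term" form: look up the index and skip past the "=".
    if pos + 1 < len(tokens) and tokens[pos + 1][0] == "eq":
        index = tokens[pos][1]
        if index not in BIB1_USE:
            raise ValueError(f"unknown index: {index}")
        use = BIB1_USE[index]
        pos += 2
    if pos >= len(tokens) or tokens[pos][0] == "eq":
        raise ValueError("expected a search term")
    return f'@attr 1={use} "{tokens[pos][1]}"', pos + 1


def cql_to_pqf(cql):
    """Convert a flat CQL query to PQF, combining clauses left to right."""
    tokens = tokenize(cql)
    if not tokens:
        raise ValueError("empty query")
    pqf, pos = clause_to_pqf(tokens, 0)
    while pos < len(tokens):
        kind, text = tokens[pos]
        op = text.lower()
        if kind != "word" or op not in ("and", "or", "not"):
            raise ValueError(f"expected and/or/not, got: {text}")
        right, pos = clause_to_pqf(tokens, pos + 1)
        # PQF is prefix notation: the operator comes before both operands.
        pqf = f"@{op} {pqf} {right}"
    return pqf


print(cql_to_pqf('dc.title = "war and peace" and dc.creator = tolstoy'))
```

The point for our discussion is that the CQL side is something a person (or a URL) can carry comfortably, while the attribute numbers stay hidden in the mapping.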

So our questions are:

  • Is CQL (Common Query Language) the way to go?
  • Should it replace the MARC tables in Koha?

2. OpenSearch

With some additional elements, OpenSearch could be the ultimate federated search engine. Mike and I have also found a way to pass CQL queries through OpenSearch (though that's not in the original spec).
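
As a rough illustration of passing CQL through OpenSearch, the sketch below fills an OpenSearch-style URL template with a URL-encoded CQL query. The host, path and parameter names in `TEMPLATE` are hypothetical, and nothing in the OpenSearch spec says the server will interpret `{searchTerms}` as CQL; that part would be our own convention.

```python
from urllib.parse import quote

# Hypothetical URL template of the kind an OpenSearch description advertises.
TEMPLATE = (
    "http://opac.example.org/opensearch"
    "?q={searchTerms}&startPage={startPage}&count={count}"
)


def opensearch_url(template, cql, start_page=1, count=20):
    """Fill the template, percent-encoding the CQL query as the search terms."""
    values = {
        "searchTerms": quote(cql, safe=""),
        "startPage": str(start_page),
        "count": str(count),
    }
    url = template
    for name, value in values.items():
        url = url.replace("{" + name + "}", value)
    return url


print(opensearch_url(TEMPLATE, 'dc.title = "koha"'))
```

A client that knows nothing about CQL can still use the same template with plain keywords, which is what makes this approach attractive for federated searching.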

Other References:

If you're new to Koha, you may find these references useful in evaluating these topics.

  • Demos with 150K Records:
         Current OPAC:
         Plucene demo:
         marc_words demo:
         mysqlfulltext demo:
