Searching Group Meeting 2005-06-02
Searching Group Agenda for 2005, June 21
Meeting notes for this meeting
Chris's concerns (from his blog):
Currently the search works well, in that it's functionally correct and returns correct and complete results. The main points I'll be making are:
- We have 2 audiences, the librarians and the public
- Most important, we return correct results; speed comes a close second
- The public do not care (for the most part) why/how the results are derived, just as long as they are right
- Innovative tools will win us a little fame
I can't stress the correct results enough; it's very frustrating to search for an item, see that it's on the shelf, then find it isn't.
I agree that our goal is 1. accuracy and 2. speed. As I mentioned recently to a French audience, Koha's search results are currently 'correct and complete' for small collections. But when you have 150K records and a search returns an unsorted list of (say) 5,000 matching items, that's just not a feasible system. NPL actually has a sorting routine for just such a case (using the current MARC setup); however, its overhead means searches that large take 1-3 minutes to return results. So for larger collections speed and accuracy are closely related. -- Joshua
1. Zebra
- Zebra is an indexing and search retrieval back end
- It indexes many formats including native MARC21 and UNIMARC
- The retrieval engine is a fully standards-compliant Z39.50 server
- Z39.50 is not the _latest_ technology for searching ... but :
- Zebra, when combined with yaz-proxy (http://www.indexdata.dk/yazproxy), can convert CQL to RPN queries (the default Z39.50 query type)
So our questions are:
- is CQL (Common Query Language) the way to go?
http://www.loc.gov/z3950/agency/zing/cql/
- should it replace the MARC tables in Koha?
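To make the CQL question a bit more concrete, here is a rough sketch of what a CQL-to-RPN mapping looks like for a very small subset of CQL (whitespace-separated `index = term` clauses joined by and/or/not, no parentheses). The output is written in PQF, Index Data's prefix notation for RPN queries. This is only an illustration: yaz-proxy does the real conversion from its own configuration, and the index names in `USE_ATTRIBUTES` and all function names below are assumptions made up for this example.

```python
import re

# Bib-1 "use" attribute numbers for a few common access points.
# The CQL index names on the left are assumptions for this sketch.
USE_ATTRIBUTES = {"title": 4, "author": 1003, "subject": 21, "any": 1016}

# A token is either a double-quoted phrase or a run of non-space characters,
# so '=' must be surrounded by whitespace in this simplified grammar.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def tokenize(cql):
    """Split a CQL string into (kind, text) tokens; phrases stay whole."""
    tokens = []
    for match in TOKEN.finditer(cql):
        if match.group(1) is not None:
            tokens.append(("phrase", match.group(1)))
        else:
            tokens.append(("word", match.group(2)))
    return tokens


def clause_to_pqf(tokens, pos):
    """Convert one 'index = term' or bare-term clause; return (pqf, next_pos)."""
    index = "any"  # a bare term searches the catch-all index
    if pos + 1 < len(tokens) and tokens[pos + 1] == ("word", "="):
        index = tokens[pos][1].lower()
        pos += 2
    if pos >= len(tokens):
        raise ValueError("missing search term")
    if index not in USE_ATTRIBUTES:
        raise ValueError("unknown index: %s" % index)
    kind, term = tokens[pos]
    rendered = '"%s"' % term if kind == "phrase" else term
    return "@attr 1=%d %s" % (USE_ATTRIBUTES[index], rendered), pos + 1


def cql_to_pqf(cql):
    """Translate the simplified CQL subset into a prefix (PQF) query string."""
    tokens = tokenize(cql)
    pqf, pos = clause_to_pqf(tokens, 0)
    # Booleans are applied left to right; PQF puts the operator first.
    while pos < len(tokens):
        kind, operator = tokens[pos]
        if kind != "word" or operator.lower() not in ("and", "or", "not"):
            raise ValueError("expected and/or/not, got: %s" % operator)
        right, pos = clause_to_pqf(tokens, pos + 1)
        pqf = "@%s %s %s" % (operator.lower(), pqf, right)
    return pqf


print(cql_to_pqf('title = "snow crash" and author = stephenson'))
# prints: @and @attr 1=4 "snow crash" @attr 1=1003 stephenson
```

The point for the discussion is that the query a user or client writes (CQL) stays readable, while the Z39.50 server still receives the RPN form it expects.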
2. OpenSearch
- http://opensearch.a9.com
- http://dilettantes.blogspot.com/2005/06/gussying-up-opensearch.html
- http://liblime.com/opensearchportal.html
So with some additional elements added, OpenSearch could be the ultimate federated searching engine. Mike and I have also found a way to pass CQL queries through OpenSearch (though that's not in the original spec).
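The notes above don't say how the CQL pass-through works. One plausible approach, sketched here purely as an assumption, is to put the percent-encoded CQL string into the `{searchTerms}` slot of an OpenSearch URL template and let the server interpret it. The template URL and the function name below are hypothetical.

```python
from urllib.parse import quote

# Hypothetical OpenSearch URL template; {searchTerms} is the placeholder
# that the OpenSearch description format uses for the user's query.
TEMPLATE = "http://example.org/opensearch?q={searchTerms}&format=rss"


def build_search_url(template, cql_query):
    """Fill {searchTerms} with a percent-encoded CQL query string."""
    # safe="" makes quote() encode every reserved character, including '/'.
    return template.replace("{searchTerms}", quote(cql_query, safe=""))


print(build_search_url(TEMPLATE, 'title = "snow crash"'))
# prints: http://example.org/opensearch?q=title%20%3D%20%22snow%20crash%22&format=rss
```

A plain OpenSearch client would treat the encoded string as ordinary search terms, so this only helps if the server on the other end knows to parse the value as CQL.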
Other References:
If you're new to Koha you may find these references useful in evaluating these topics.
- Demos with 150K Records:
  - Current OPAC: http://search.athenscounty.lib.oh.us
  - Plucene demo: http://search.athenscounty.lib.oh.us/cgi-bin/koha/plucene/search.cgi?query=stephenson
  - marc_words demo:
  - mysqlfulltext demo:
  - Zebra: http://liblime.com/zap/advanced.html