Switch to Solr RFC
From Zebra to Solr
Status: | unknown | |
Sponsored by: | St Etienne University | |
Developed by: | BibLibre | |
Expected for: | 2011-05-01 | |
Bug number: | Bug 5360 | |
Work in progress repository: | No URL given. | |
Description: | We propose to switch from Zebra to Solr as the indexing engine. See our mail on koha-devel and the thread that follows: http://lists.koha-community.org/pipermail/koha-devel/2010-October/034447.html
Zebra is fast and embeds a native Z39.50 server. But it also has some major drawbacks that we have to cope with every day and that make it quite difficult to maintain.
I think everyone agrees that we have to refactor C4::Search. The query parser cannot handle all the configuration options independently. The use of USMARC as the internal format for bibliographic records also comes with a serious limitation of 9999 bytes, which is not enough for big records with many items (this limitation is Koha's fault, not Zebra's). BibLibre has investigated a catalogue based on Solr; a university in France contracted us for that development. This university is in contact with the whole community here in France, and Solr will certainly be adopted by libraries France-wide (citation required: web searching did not find any statement of intent for all French libraries to adopt Solr). We plan to release the code on our git repository in early spring next year and rebase it on whatever Koha version is released at that time, 3.4 or 3.6.
Solr indexes data that is sent to it over HTTP.
You can see the results on solr.biblibre.com and catalogue.solr.biblibre.com: http://catalogue.solr.biblibre.com/cgi-bin/koha/opac-search.pl?q=jean and http://solr.biblibre.com/cgi-bin/koha/admin/admin-home.pl (you can log in there with demo/demo as login/password). http://solr.biblibre.com/cgi-bin/koha/solr/indexes.pl is the page where people can manage their indexes and links. a) Librarians can define their own indexes, and there is a plugin that fetches data from rejected authorities and from authorised_values (this could/should have been achieved with Zebra, but only with major work on XSLT). b) The number of lines of code in C4/Search.pm could be reduced tenfold. You can test from the poc_solr branch on git://git.biblibre.com/koha_biblibre.git, but you have to install Solr. |
Concerns
- Does it run on a free-software Java implementation? The Getting Started page says "Gnu's GCJ is not supported and does not work with Solr."
- Solr uses at least 1 GB of memory by default, and http://vufind.org/wiki/performance suggests more than 4 GB, which would require more expensive hosting than Koha 3.2 does. This concern is also supported by this post on Index Data's blog.
- Zebra provides Z39.50 and SRU targets. How will this be taken care of in a Solr solution?
- Zebra support should be retained at least until such a time that there is a general consensus to deprecate it.
- See the advantages and disadvantages section of Record Indexing and Retrieval Options for Koha for lists of advantages and disadvantages.
Meetings
How To Help
- Add your test query use cases.
- Local Searching.
- Simple User Test Queries for Koha with Solr/Lucene. Common user queries specify no search options. To return the result set the user intended, they often need to be rewritten in the appropriate query language and/or rely on carefully chosen defaults previously set by the staff administrator.
- Solr/Lucene Test Queries for Koha
- SimpleServer as a Z39.50/SRU Server Test Queries for Koha with Solr/Lucene
- Metasearch (Federated Search).
- Local Searching.
- Add your wishlist
- Test the solr branch wip/solr and provide feedback
Documentation for Solr
Record Indexing and Retrieval Options for Koha
The comparative options content originally posted here has been moved to Record Indexing and Retrieval Options for Koha.
Options Advantages and Disadvantages
See Options Advantages and Disadvantages.
Configuration
Record Indexing Server Aspect of Configuration
Solr/Lucene Record Indexing Server Aspect of Configuration
See the summary description of BibLibre work on Solr/Lucene Record Indexing Server Aspect of Configuration.
Z39.50/SRU Server Aspect of Configuration
SimpleServer and YAZ with Solr/Lucene Z39.50/SRU Server Aspect of Configuration
See the summary description of SimpleServer and YAZ with Solr/Lucene Z39.50/SRU Server Aspect of Configuration.
Z39.50/SRU Client Aspect of Configuration
Metasearch (Federated Search) Aspect of Configuration
Planning is still needed for any Koha OPAC metasearch functionality with Solr/Lucene.
Code Functionality
Record Indexing Aspect of Code Functionality
Data::SearchEngine::Solr and Solr/Lucene Record Indexing Aspect of Code Functionality
See the summary description of BibLibre work on Data-SearchEngine-Solr and Solr/Lucene Record Indexing Aspect of Code Functionality.
Local Record Retrieval Aspect of Code Functionality
Data::SearchEngine::Solr and Solr/Lucene Local Record Retrieval Aspect of Code Functionality
See the summary description of BibLibre work on Data-SearchEngine-Solr and Solr/Lucene Local Record Retrieval Aspect of Code Functionality.
Z39.50/SRU Server Aspect of Code Functionality
SimpleServer with Data::SearchEngine::Solr and Solr/Lucene Z39.50/SRU Server Aspect of Code Functionality
See the summary description of BibLibre work on SimpleServer with Data-SearchEngine-Solr and Solr/Lucene Z39.50/SRU Server Aspect of Code Functionality.
Metasearch (Federated Search) Aspect of Code Functionality
Planning is still needed for any Koha OPAC metasearch functionality with Solr/Lucene.
#kohahack12
Debrief of #kohahack12
Big work in progress
Different tasks:
- inventory of what we have in the BibLibre Solr version and in the community Zebra version
- move the Zebra-specific code from opac-search.pl into .pm files
- write a Data::SearchEngine::Zebra module, like the Solr CPAN modules
- implement these in an abstract search layer
To do next:
- think about queries and how to handle both engines in the Koha UI
- begin to add use cases to see how the layer responds
References:
- how it works today in biblibre version: http://wiki.koha-community.org/wiki/Switch_to_Solr_RFC#how_it_works_today
- how it could work tomorrow in koha community version: https://github.com/clrh/wip-searchengine-layer
- Data::SearchEngine::Zebra somewhere here: https://github.com/biblibre
- needs review: https://docs.google.com/a/biblibre.com/drawings/d/1ZdsQsoThYgIVSgH3LqgRZy17xm9X7XkLT6RG3fDYCzs/edit
- BZ 5166 7759 and 7430
how it works today
- Goal: explain how Solr works in Koha (git.biblibre.com:koha_biblibre.git, branch dev/solr)
- Start of documentation: http://descartes.biblibre.com/docs/solr/en/x14060.html#techsearchguide1
- Where Koha currently uses Zebra explicitly: http://wiki.koha-community.org/w/images/Zebrism.odt
External libs
- (from CPAN)
- https://github.com/gphat/data-searchengine
- Data::SearchEngine::Solr
- WebService::Solr
C4::Search::Engine.pm
- find_searchengine
- search
if C4::Context->preference("SearchEngine") eq "Solr"
calls C4::Search::Engine::Solr::SimpleSearch
- index
- add_to_index_queue
- escape_string
C4::Search.pm
(should not be like that)
AddToIndexQueue
SimpleSearch
calls C4::Search::Engine::find_searchengine, then calls C4::Search::Engine::search
IndexRecord
same scheme as SimpleSearch, but calls "index"
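The delegation described above (C4::Search asks the engine layer which back end is configured, then forwards the call) can be sketched as follows. This is a minimal illustration using core Perl only and is not the BibLibre code: the `$SEARCH_ENGINE` variable stands in for `C4::Context->preference("SearchEngine")`, and the `%ENGINES` table, `simple_search` and `index_record` are hypothetical names.

```perl
use strict;
use warnings;

# Stand-in for C4::Context->preference("SearchEngine") (hypothetical).
our $SEARCH_ENGINE = 'Solr';

# Hypothetical back ends. In Koha these would be modules such as
# C4::Search::Engine::Solr; here they are plain hashes of code references.
my %ENGINES = (
    Solr => {
        search => sub { my ($q)   = @_; return "solr:$q" },
        index  => sub { my ($ids) = @_; return 'solr indexed ' . scalar @$ids },
    },
    Zebra => {
        search => sub { my ($q)   = @_; return "zebra:$q" },
        index  => sub { my ($ids) = @_; return 'zebra indexed ' . scalar @$ids },
    },
);

# Look up the configured back end once per call.
sub find_searchengine {
    my $engine = $ENGINES{$SEARCH_ENGINE}
        or die "Unknown search engine: $SEARCH_ENGINE\n";
    return $engine;
}

# Callers never name a back end; they only ask the layer to search or index.
sub simple_search { my ($q)   = @_; return find_searchengine()->{search}->($q) }
sub index_record  { my ($ids) = @_; return find_searchengine()->{index}->($ids) }

print simple_search('poterie'), "\n";
print index_record( [ 16204, 16501 ] ), "\n";
```

With this shape, adding another engine would mean adding one entry to the table, and the calling code stays unchanged.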
C4::Search::Engine::Solr.pm
sub SimpleSearch $q, $filters, $params, $caller
- $q: the query
- $filters: filters
- $params: $page (page number), $count (maximum number of results to return), $sort (indexes to sort on, written "idx_name (asc|desc)")
- $facets
- open a connection: C4::Search::Engine::Solr creates one with Data::SearchEngine::Solr->new
- get facets
- apply filters
- launch the query, built with Data::SearchEngine::Query->new (page, count, query); the search returns Data::SearchEngine::Solr::Results
- get results: Data::SearchEngine::Solr::Results
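To make the role of the parameters above concrete, here is a small sketch of how a query, filters, page, count and sort could map onto Solr's standard request parameters (`q`, `fq`, `start`, `rows`, `sort`). It uses core Perl only and does not call Data::SearchEngine::Solr. The function names, the default count of 20, and the `field:"value"` filter form are assumptions made for illustration, not taken from the Koha code.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Minimal percent-encoding of a string as UTF-8 bytes (illustrative helper).
sub escape_param {
    my ($s) = @_;
    my $bytes = encode( 'UTF-8', $s );
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

# Turn SimpleSearch-style arguments into a Solr query string.
sub solr_query_string {
    my ( $q, $filters, $params ) = @_;
    my $page  = $params->{page}  || 1;
    my $count = $params->{count} || 20;    # assumed default
    my @pairs = (
        [ 'q',     $q ],
        [ 'start', ( $page - 1 ) * $count ],    # Solr offsets are zero-based
        [ 'rows',  $count ],
    );
    push @pairs, [ 'sort', $params->{sort} ] if $params->{sort};
    for my $field ( sort keys %{ $filters || {} } ) {
        # Assumed filter form: one fq parameter per field, exact phrase match.
        push @pairs, [ 'fq', sprintf( '%s:"%s"', $field, $filters->{$field} ) ];
    }
    return join '&', map { $_->[0] . '=' . escape_param( $_->[1] ) } @pairs;
}

print solr_query_string(
    'poterie',
    { str_holdingbranch => 'CENTRE' },
    { page => 2, count => 30, sort => 'score desc' }
), "\n";
# prints q=poterie&start=30&rows=30&sort=score%20desc&fq=str_holdingbranch%3A%22CENTRE%22
```

The "idx_name (asc|desc)" form of $sort already matches Solr's own sort syntax, so in this sketch it is passed through unchanged.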
sub IndexRecord
- loop over identifiers
- call getauthority or getbiblio (depending on parameters)
- instantiate a Solr document: Data::SearchEngine::Item->new
- loop over indexes (cf. the indexes table). If an index is linked to a plugin, the plugin returns the data to index. If the field is a controlfield, it is indexed directly; otherwise we loop over its subfields. If the value is a date, it is normalized to ISO format.
- if it's a bibliographic record, get authorised fields and index labels
- create the index, passing it its values: Data::SearchEngine::Item->set_value
- send it to Solr: Data::SearchEngine::Solr->add (uses WebService::Solr to send everything to Solr)
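The indexing loop above can be sketched with plain Perl data structures. This is only an illustration under several assumptions: the record is a hash of tags (a control field is a plain string, a data field is a list of subfield hashes), each index definition has hypothetical `name`, `type`, `mappings` and optional `plugin` keys, the tags are illustrative, and the `type_name` field naming follows the `str_author` style visible in the result dumps. The real code builds a Data::SearchEngine::Item instead of a hash.

```perl
use strict;
use warnings;

# Normalise a few common date spellings to the ISO 8601 form Solr expects.
# Which input forms the real code accepts is an assumption.
sub normalize_date {
    my ($d) = @_;
    return sprintf( '%04d-%02d-%02dT00:00:00Z', $3, $2, $1 )
        if $d =~ m!^(\d{2})/(\d{2})/(\d{4})$!;
    return sprintf( '%04d-%02d-%02dT00:00:00Z', $1, $2, $3 )
        if $d =~ m!^(\d{4})-?(\d{2})-?(\d{2})$!;
    return sprintf( '%04d-01-01T00:00:00Z', $1 ) if $d =~ m!^(\d{4})$!;
    return;    # unparseable value: skip it
}

sub build_document {
    my ( $id, $record, $indexes ) = @_;
    my %doc = ( id => $id );
    for my $index (@$indexes) {
        my @values;
        if ( $index->{plugin} ) {
            # A plugin computes the values itself from the whole record.
            @values = $index->{plugin}->($record);
        }
        else {
            for my $mapping ( @{ $index->{mappings} } ) {
                my ( $tag, $code ) = split /\$/, $mapping;
                my $field = $record->{$tag};
                next unless defined $field;
                if ( !ref $field ) {    # control field: index it as is
                    push @values, $field;
                }
                else {                  # data field: loop over subfields
                    push @values, grep { defined } map { $_->{$code} } @$field;
                }
            }
        }
        @values = grep { defined } map { normalize_date($_) } @values
            if $index->{type} eq 'date';
        $doc{ $index->{type} . '_' . $index->{name} } = \@values if @values;
    }
    return \%doc;
}

my $record = {
    '001' => '16204',
    '700' => [ { a => 'Cosentino', b => 'Peter' } ],
    '801' => [ { c => '20110501' } ],
};
my $indexes = [
    { name => 'recordid', type => 'int',  mappings => ['001'] },
    { name => 'author',   type => 'str',  mappings => [ '700$a', '700$b' ] },
    { name => 'date',     type => 'date', mappings => ['801$c'] },
];
my $doc = build_document( 'biblio_16204', $record, $indexes );
print join( ', ', @{ $doc->{str_author} } ), "\n";
```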
examples:
one result: Data::SearchEngine::Solr::Results
$VAR1 = bless( {
  'spell_suggestions' => {},
  'query' => bless( {
    'count' => '30',
    'page' => 1,
    'query' => "encyclop\x{e9}die poterie",
    'filters' => {},
    'facets' => {}
  }, 'Data::SearchEngine::Query' ),
  'spell_frequencies' => {},
  'pager' => bless( {
    'total_entries' => 1,
    'entries_per_page' => 30,
    'current_page' => 1
  }, 'Data::SearchEngine::Paginator' ),
  'items' => [
    bless( {
      'values' => {
        'recordid' => '16204',
        'id' => 'biblio_16204'
      },
      'score' => 0,
      'id' => 'biblio_16204'
    }, 'Data::SearchEngine::Item' )
  ],
  'facets' => {
    'str_title-series' => [],
    'str_author' => [
      '30387',
      1,
      '39966',
      1,
      'Cosentino',
      1,
      'Cosentino Peter',
      1,
      'Peter',
      1,
      'Peter Cosentino',
      1
    ],
    'str_subject' => [
      '1166',
      1,
      '373',
      1,
      "C\x{e9}ramique",
      1,
      'technique',
      1
    ],
    'str_holdingbranch' => [
      'CENTRE',
      1
    ],
    'str_ccode' => [
      '1',
      1
    ]
  },
  'elapsed' => '0.0799260139465332'
}, 'Data::SearchEngine::Solr::Results' );
6 results: Data::SearchEngine::Solr::Results
$VAR1 = bless( {
  'spell_suggestions' => {},
  'query' => bless( {
    'count' => '30',
    'page' => 1,
    'query' => 'poterie',
    'filters' => {},
    'facets' => {}
  }, 'Data::SearchEngine::Query' ),
  'spell_frequencies' => {},
  'pager' => bless( {
    'total_entries' => 6,
    'entries_per_page' => 30,
    'current_page' => 1
  }, 'Data::SearchEngine::Paginator' ),
  'items' => [
    bless( {
      'values' => {
        'recordid' => '16501',
        'id' => 'biblio_16501'
      },
      'score' => 0,
      'id' => 'biblio_16501'
    }, 'Data::SearchEngine::Item' ),
    bless( {
      'values' => {
        'recordid' => '16204',
        'id' => 'biblio_16204'
      },
      'score' => 0,
      'id' => 'biblio_16204'
    }, 'Data::SearchEngine::Item' ),
    bless( {
      'values' => {
        'recordid' => '9861',
        'id' => 'biblio_9861'
      },
      'score' => 0,
      'id' => 'biblio_9861'
    }, 'Data::SearchEngine::Item' ),
    bless( {
      'values' => {
        'recordid' => '18307',
        'id' => 'biblio_18307'
      },
      'score' => 0,
      'id' => 'biblio_18307'
    }, 'Data::SearchEngine::Item' ),
    bless( {
      'values' => {
        'recordid' => '18535',
        'id' => 'biblio_18535'
      },
      'score' => 0,
      'id' => 'biblio_18535'
    }, 'Data::SearchEngine::Item' ),
    bless( {
      'values' => {
        'recordid' => '19763',
        'id' => 'biblio_19763'
      },
      'score' => 0,
      'id' => 'biblio_19763'
    }, 'Data::SearchEngine::Item' )
  ],
  'facets' => {
    'str_title-series' => [
      "Dialogues c\x{e9}ramiques",
      2,
      "Dialogues c\x{e9}ramiques (Dunkerque).",
      2
    ],
    'str_author' => [
      '2980',
      2,
      '33518',
      2,
      'Dunkerque, Nord',
      2,
      "Mus\x{e9}e d'art contemporain",
      2,
      '19913',
      1,
      '19914',
      1,
      '24720',
      1,
      '24721',
      1,
      '30387',
      1,
      '30807',
      1,
      '35619',
      1,
      '35620',
      1,
      '39966',
      1,
      '41085',
      1,
      '48684',
      1,
      '48685',
      1,
      'Cosentino',
      1,
      'Cosentino Peter',
      1,
      'Daniel de',
      1,
      'Daniel de Montmollin (textes)',
      1,
      "Dunkerque, Mus\x{e9}e d'art contemporain",
      1,
      "Fr\x{e8}re de Taiz\x{e9}",
      1,
      'Hamid',
      1,
      'Montmollin',
      1,
      'Montmollin Daniel de',
      1,
      "Mus\x{e9}e d'art contemporain, Dunkerque",
      1,
      'Ouazzani',
      1,
      'Ouazzani Thami',
      1,
      'Peter',
      1,
      'Peter Cosentino',
      1,
      'Pierre-Yves Videlier (illustrations )',
      1,
      'Potter',
      1,
      'Potter Tony',
      1,
      'Thami',
      1,
      'Tony',
      1,
      'Triki',
      1,
      'Triki Hamid',
      1,
      'Videlier',
      1,
      'Videlier Yves',
      1,
      'Yves',
      1,
      "avec la participation de Brigitte Barberi Daum ; photographies Christian Lignon, G\x{e9}rard Dufrene ; conception et r\x{e9}alisation graphique Amina Bennani",
      1,
      'textes Hamid Triki, Thami Ouazzani',
      1
    ],
    'str_subject' => [
      "C\x{e9}ramique",
      2,
      '1166',
      1,
      '18355',
      1,
      '18384',
      1,
      '1947-....',
      1,
      '1958-....',
      1,
      '19802',
      1,
      '33519',
      1,
      '33824',
      1,
      '35622',
      1,
      '373',
      1,
      'Camille',
      1,
      "Chagu\x{e9}",
      1,
      'Expositions',
      1,
      'Maroc',
      1,
      'Safi',
      1,
      "Thi\x{e9}baut",
      1,
      'Virot',
      1,
      'technique',
      1
    ],
    'str_holdingbranch' => [
      'CENTRE',
      5,
      'AURENCE',
      1
    ],
    'str_ccode' => [
      '1',
      6
    ]
  },
  'elapsed' => '0.0832719802856445'
}, 'Data::SearchEngine::Solr::Results' );
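In both dumps, each facet is a flat array that alternates a value with its count. A caller that wants to display facets will usually regroup them first. The sketch below shows one way to do that in core Perl; `facet_pairs` is a hypothetical helper name, not part of Data::SearchEngine.

```perl
use strict;
use warnings;

# Regroup a flat [value, count, value, count, ...] facet array into a list
# of { value, count } hashes, keeping the order returned by Solr.
# A trailing value without a count is ignored.
sub facet_pairs {
    my ($flat) = @_;
    my @pairs;
    for ( my $i = 0 ; $i < $#$flat ; $i += 2 ) {
        push @pairs, { value => $flat->[$i], count => $flat->[ $i + 1 ] };
    }
    return \@pairs;
}

# The str_holdingbranch facet from the six-result example.
my $branches = facet_pairs( [ 'CENTRE', 5, 'AURENCE', 1 ] );
printf "%s (%d)\n", $_->{value}, $_->{count} for @$branches;
```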