Record Indexing and Retrieval Options for Koha

From Koha Wiki

Comparative summary of record indexing and retrieval options for Koha.

[The following may be oversimplified or even mistaken in some important respects. Most of the information is taken from inspection of source code, testing, and imperfect knowledge of use. Some questions about options have been helpfully answered by Index Data and Knowledge Integration from October through early December 2010. As of 16 December 2010, some follow-up questions about options have not been answered by either Index Data or Knowledge Integration. Some code functionality descriptions are guessed logically and may be especially liable to be mistaken, particularly in the interactions between YAZ and other Index Data products. Please correct mistakes and make other improvements.]

Lists of Software Used for Record Indexing and Retrieval Options

All options are mix and match options; none of the options which may be developed should be considered mutually exclusive.

Local Record Indexing and Retrieval Options

  • A words index table in the Koha SQL database is currently used in Koha as the nozebra option. This option has been widely used in at least Chinese Koha implementations because Zebra originally lacked Unicode support, among other problems with Zebra.
  • Net::Z3950::ZOOM, YAZ, Pazpar2, and Zebra are currently used for Koha.
  • Data::SearchEngine::Solr and Solr/Lucene are being used by BibLibre in development for their Switch to Solr RFC.

Z39.50/SRU Server Options

  • Zebra and YAZ are currently being used for Koha.
  • Net::Z3950::SimpleServer and YAZ with Data::SearchEngine::Solr and Solr/Lucene.
  • Net::Z3950::Simple2ZOOM and YAZ with Data::SearchEngine::Solr and Solr/Lucene.
  • JZKit, CQL-Java with Data::SearchEngine::Solr and Solr/Lucene.
    • JZKit from Knowledge Integration provides in one codebase functionality for which Index Data has several products dependent upon YAZ. JZKit is roughly equivalent to YAZ plus some of the products complementing YAZ, with less functionality in server-side query support but more functionality for metasearch, and is available under a free software AGPL 3 license.

Metasearch (Federated Search) Options

  • Net::Z3950::ZOOM, YAZ and Pazpar2 are currently being used for Koha.
    • Metaproxy is included in the option details described below.
  • JZKit and CQL-Java.
  • Net::Z3950::ZOOM, YAZ, Data::SearchEngine::Solr, Solr/Lucene, Data::SearchEngine::*, LWP, and new Koha metasearch management code.

Options Advantages and Disadvantages

The options are not currently listed with enough permutations to be mutually exclusive. They are listed as mix and match options.

Local Record Indexing and Retrieval Options Advantages and Disadvantages

Words Index Table in the Koha SQL Database for Local Record Indexing and Retrieval Advantages and Disadvantages

Advantages of Words Index Table in the Koha SQL Database for Local Record Indexing and Retrieval
  • The nozebra option using the nozebra table for indexing is simple to implement and run.
Disadvantages of Words Index Table in the Koha SQL Database for Local Record Indexing and Retrieval
  • The existing Koha implementation using the nozebra table in the Koha SQL database is extremely simplistic.
    • Support for many library standards based queries and other sophisticated queries is limited at best.
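The words-index approach can be illustrated with a minimal sketch (the table layout and function names here are hypothetical, not Koha's actual nozebra code): each record contributes one row per word, and a query simply intersects the per-word record sets, which is why phrase, truncation, relation, and other structured queries are out of reach.

```python
import re
from collections import defaultdict

# Stand-in for the nozebra words table: word -> set of record numbers.
index = defaultdict(set)

def index_record(biblionumber, text):
    # One row per distinct word per record; no positions, no field information.
    for word in set(re.findall(r"\w+", text.lower())):
        index[word].add(biblionumber)

def search(query):
    # A query is just the intersection of the per-word record sets.
    sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*sets) if sets else set()

index_record(1, "Programming Perl")
index_record(2, "Learning Perl the hard way")
print(sorted(search("perl programming")))  # [1]
```

Because the structure stores nothing but word membership, anything beyond boolean AND over whole words has to be bolted on afterwards, which is the limitation described above.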

Net::Z3950::ZOOM, YAZ, Pazpar2, and Zebra for Local Record Indexing and Retrieval Advantages and Disadvantages

Advantages of Net::Z3950::ZOOM, YAZ, Pazpar2, and Zebra for Local Record Indexing and Retrieval
  • Supports library standards even for local record indexing as used in Koha.
  • Many sophisticated options for record indexing and queries are available in Zebra which have yet to be configured in Koha.
  • Mature and widely used free software Z39.50/SRU server with the exception of the more recently added support for Unicode searching.
    • Unicode searching uses libicu which is the C version of International Components for Unicode (ICU), also used in Java and thus used by Solr/Lucene via icu4j. However, the Zebra specific wrapper for libicu lacks maturity unlike the much more widely used and tested Solr/Lucene wrapper for icu4j.
  • Pazpar2 supports metasearch (federated search) for Z39.50/SRU and other database targets including Solr/Lucene. Pazpar2 metasearching is not dependent upon Zebra and does not exclude the possibility of metasearch with a non-Zebra option for local record indexing. See #Net::Z3950::ZOOM, YAZ, Pazpar2, and Metaproxy for Metasearch Advantages and Disadvantages further below.
  • Pazpar2 supports primitive record merging/deduplication and FRBRisation clustering.
  • Compiled C code provides very fast performance with the exception of problems when using the ICU with Zebra.
Disadvantages of Net::Z3950::ZOOM, YAZ, Pazpar2, and Zebra for Local Record Indexing and Retrieval
  • Zebra has bugs when configured for Unicode searching which uses the ICU. Zebra support for the ICU was added in 2008 and thus is still an immature implementation. As stated in the Unicode support qualification for the maturity advantage of Zebra, usage of libicu is wrapped in immature translator code specific to Zebra. The Zebra specific wrapper for Unicode support using libicu is much less widely used and less well tested than the very widely used and tested wrapper for Unicode support in Solr/Lucene, using icu4j.
    • CPU usage increases greatly when using the ICU for multiple concurrent queries. [Needs quantification.] Perhaps this problem is similar to the previously fixed Zebra bug 1978 where some attribute option which is problematically inefficient under the ICU is being sent to Zebra.
    • Scan queries used in Koha for facets do not return results when using the ICU. See Zebra bug 2048, ICU inverse maps does not work. The bug was marked won't fix on 21 October 2010 by Sebastian Hammer as part of a systematic closing of some open bugs for which a fix was neither readily apparent nor funded for fixing after having been open for more than a year. Index Data is open to fixing the bug with support funding. A workaround could be to parse the facets out of the full records returned in the result set without using a scan query.
    • Truncation other than right truncation returns an unsupported truncation attribute error, error 120, when using the ICU. See Zebra bug 2049, Truncation=regular does not work with ICU normalized terms. A patch to fix right truncation was committed in 2008. The remaining unfixed aspects of the bug were marked won't fix on 21 October 2010 by Sebastian Hammer as part of a systematic closing of some open bugs for which a fix was neither readily apparent nor funded for fixing after having been open for more than a year. Index Data is open to fixing the remaining aspects of the bug with support funding.
    • Position queries return false matches as if position had not been queried or as if position anywhere had been specified.
    • Completeness queries return an unsupported use attribute error, error 114, when using the ICU. Aside from actually working correctly, an unsupported completeness attribute error, error 122, would be a more appropriate error.
  • Updating Zebra indexes sometimes fails with insufficient error reporting.
  • Memory leaks have been reported for Zebra.
    • Zebra frequently fails to close persistent connections properly.
    • Some users have opted to run Zebra from cron and on demand instead of as a daemon to control memory usage.
  • Pazpar2 disadvantages especially in relying upon CCL are described further below in #Disadvantages of Net::Z3950::ZOOM, YAZ, Pazpar2, and Metaproxy for Metasearch.
  • No web based interface has been written for managing configuration files limiting the accessibility of configuration to users who know how to edit configuration files on the command line.
  • A support contract for the Koha community collectively with Index Data would be needed to fix Zebra bugs and guide best practise implementation for Koha.
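The scan-query workaround mentioned above, parsing facets out of the returned result set rather than asking Zebra to scan, can be sketched as follows; the record layout here is a hypothetical simplification, not Koha's actual retrieval format.

```python
from collections import Counter

def facets_from_results(records, facet_field):
    # Build facet counts by inspecting the retrieved records themselves,
    # avoiding a Zebra scan query (which fails under the ICU, bug 2048).
    counts = Counter()
    for record in records:
        counts.update(record.get(facet_field, []))
    return counts.most_common()

results = [
    {"author": ["Wall, Larry"], "itemtype": ["BOOK"]},
    {"author": ["Wall, Larry"], "itemtype": ["CD"]},
    {"author": ["Christiansen, Tom"], "itemtype": ["BOOK"]},
]
print(facets_from_results(results, "author"))
# [('Wall, Larry', 2), ('Christiansen, Tom', 1)]
```

The trade-off is that the counts only reflect the records actually fetched, so facets computed this way are approximate for large result sets.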

Data::SearchEngine::Solr and Solr/Lucene for Local Record Indexing and Retrieval Advantages and Disadvantages

Advantages of Data::SearchEngine::Solr and Solr/Lucene for Local Record Indexing and Retrieval
  • Web based configuration of indexes using Solr/Lucene would be accessible to users for whom editing configuration files on the command line would not be an option.
  • Lucene might be used directly to work around some shortcomings of Solr/Lucene.
  • Java based Solr/Lucene is liable to be more accessible for source code modifications to Koha's Perl developers than C based Zebra.
  • Metasearch (federated search) of other Solr/Lucene database targets is supported in Solr/Lucene 1.4.
  • Solr/Lucene is an Apache Foundation project with a large development and support community.
    • Wikipedia's use of Lucene shows the scalability of Lucene for a relatively simple data schema.
    • Simple popular OPACs such as VuFind, BlackLight, and Open Library use Solr/Lucene.
  • Data::SearchEngine::Solr provides abstraction independent of Koha.
Disadvantages of Data::SearchEngine::Solr and Solr/Lucene for Local Record Indexing and Retrieval
  • Out of memory errors are a common problem for Solr/Lucene.
  • Hardware requirements for Koha users would be significantly greater with Solr/Lucene than with Zebra. [Needs comparative quantification.]
    • RAM requirements would be significantly greater than for Zebra.
    • An additional computer would be needed to run a Java server, such as Tomcat, at an acceptable performance level for many Koha users' collections when using Solr/Lucene.
  • Query support limitations.
    • Ordered proximity searches are not supported by Solr/Lucene, except for phrase searches.
    • Left anchored queries may not be supported by Solr/Lucene.
    • Completeness queries may not be supported by Solr/Lucene.

[All those things, though currently supported with direct input with Zebra for local record indexing, are not promoted in the web interface of Koha... so would not be a big loss... Completeness works only on phrase search, and not with ICU enabled.] Advantages and disadvantages include not only functionality which is currently well implemented but also underused and unused functionality which enables future features.

  • Metasearch (federated search) using Solr/Lucene is limited to other Solr/Lucene database targets excluding the wealth of Z39.50/SRU (the library database API) and other database targets. Consequently, most database targets in the library world would be excluded from Solr/Lucene metasearch.
  • Some popular uses of Solr/Lucene are an insufficient recommendation.
    • Wikipedia uses Lucene directly, not Solr/Lucene, so it shows the scalability of Lucene with a relatively simple data schema but not of Solr/Lucene. Donations to the Wikimedia Foundation for Wikipedia are sufficient to afford the hardware needed to make Wikipedia's use of Lucene scale.
    • Popular OPACs using Solr/Lucene are overly simplistic. Full library automation systems which are not using Solr/Lucene are needed for most library tasks at libraries which use one of the popular OPACs using Solr/Lucene; the popular OPACs are thus running alongside full library automation systems which do not use Solr/Lucene. Problems with the popular OPACs, such as lack of authority control and a faceting model which ignores library standards, are not necessarily intrinsic to Solr/Lucene. The criticism merely qualifies the advantage of the existence of popular OPACs using Solr/Lucene.
      • [I (hdl) do not think that BlackLight or VuFind search is overly simplistic compared to other proprietary ILSs... or even that Koha advanced search would be really oversimplified by this change.] Some aspects of the current record indexing and retrieval model in Koha are overly simplistic in a comparable manner to popular OPACs using Solr/Lucene, and it would be unfair to criticise BibLibre's work on Solr/Lucene for replicating existing Koha functionality which can always be improved later.
  • No data type is available for storing binary data, which prevents storing complete MARC records as-is in MARC communications (NISO Z39.2, ISO 2709) format directly in Solr/Lucene without corruption. The problem is discussed in the blacklight-development list thread storing marc21 in solr. Encapsulating MARC records in base 64 encoding for Solr/Lucene, or using the binary data type in Lucene bypassing Solr/Lucene, may be alternatives, but using the blob data type in the Koha SQL database may be better.
  • Data::SearchEngine is a relatively new Perl module which has not yet withstood the test of time with wide usage.
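The base 64 encapsulation alternative mentioned above amounts to a few lines of code; the bytes below are an abbreviated stand-in for a real ISO 2709 record, not an actual valid record.

```python
import base64

# Raw MARC communications format bytes (abbreviated; \x1e and \x1d stand in
# for the field and record terminators that make the blob unsafe as plain text).
iso2709_blob = b"00054nam a2200037 a 4500\x1e100\x1faWall, Larry\x1e\x1d"

encoded = base64.b64encode(iso2709_blob).decode("ascii")  # safe in a Solr text field
decoded = base64.b64decode(encoded)
assert decoded == iso2709_blob  # lossless round trip
```

The cost of this approach is roughly a third more storage per record, which is one reason a blob column in the Koha SQL database may be the better choice.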

Z39.50/SRU Server Options Advantages and Disadvantages

Zebra and YAZ for a Z39.50/SRU Server with Solr/Lucene Advantages and Disadvantages

See #Net::Z3950::ZOOM, YAZ, Pazpar2, and Zebra for Local Record Indexing and Retrieval Advantages and Disadvantages above.

SimpleServer and YAZ for a Z39.50/SRU Server with Solr/Lucene Advantages and Disadvantages

Advantages of SimpleServer and YAZ for a Z39.50/SRU Server with Solr/Lucene
  • Perl based Net::Z3950::SimpleServer is accessible to Perl developers contributing to Koha for further development.
Disadvantages of SimpleServer and YAZ for a Z39.50/SRU Server with Solr/Lucene
  • Parsing queries is left as an exercise to the programmer.
  • Some query mapping to Solr/Lucene may be limited by lack of underlying Solr/Lucene support. Additional result set filtering in Koha could overcome some Solr/Lucene limitations.
    • Bib-1 position attributes may be unsupportable as Solr/Lucene queries for lack of underlying Solr/Lucene support.
    • Bib-1 completeness attributes may be unsupportable as Solr/Lucene queries for lack of underlying Solr/Lucene support.
  • Limitations in rewriting PQF connectors as Solr/Lucene queries.
    • Ordered proximity queries are unsupported by Solr/Lucene except for phrase queries.
  • A support contract for the Koha community collectively with Index Data might be needed to learn how to use some insufficiently documented features of SimpleServer.
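The additional result set filtering mentioned above can be sketched as follows (the record layout and function name are hypothetical): a Bib-1 completeness query that Solr/Lucene cannot express is emulated by over-fetching and post-filtering in Koha.

```python
def filter_complete_field(records, field, term):
    # Emulate a Bib-1 completeness "complete field" match by keeping only
    # records where a whole field value equals the term after normalisation.
    def norm(s):
        return " ".join(s.lower().split())
    return [r for r in records
            if any(norm(value) == norm(term) for value in r.get(field, []))]

results = [
    {"title": ["Perl"]},
    {"title": ["Programming Perl"]},
]
print(filter_complete_field(results, "title", "perl"))  # [{'title': ['Perl']}]
```

Post-filtering is only viable when the broader Solr/Lucene query keeps the intermediate result set small enough to fetch and inspect.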

Simple2ZOOM and YAZ for a Z39.50/SRU Server with Solr/Lucene Advantages and Disadvantages

Advantages of Simple2ZOOM and YAZ for a Z39.50/SRU Server with Solr/Lucene
  • Perl based Net::Z3950::Simple2ZOOM is accessible to Perl developers contributing to Koha for further development.
  • Evergreen uses Simple2ZOOM.
Disadvantages of Simple2ZOOM and YAZ for a Z39.50/SRU Server with Solr/Lucene
  • Only index (Bib-1 use attributes) mapping to Solr/Lucene is supported.
    • Bib-1 structure attributes, such as the distinction between word list and phrase queries, are unsupported.
    • Bib-1 relation attributes, such as range queries, are unsupported.
    • Bib-1 position attributes are unsupported. (May be a problem for lack of underlying Solr/Lucene support.)
    • Bib-1 truncation attributes are unsupported.
    • Bib-1 completeness attributes are unsupported. (May be a problem for lack of underlying Solr/Lucene support.)
  • Limitations in rewriting PQF connectors as Solr/Lucene queries.
    • Support for the OR and NOT connectors is unknown.
    • The proximity connector is unsupported.
  • Untested and incompletely defined solution.
    • A support contract for the Koha community collectively with Index Data might be needed for development to make it a sufficiently comparable alternative to Zebra.
  • Evergreen use of Simple2ZOOM has limitations inherent to Simple2ZOOM. Very different record indexing and retrieval models between Koha and Evergreen provide little helpful opportunity for Koha to borrow code.

JZKit for a Z39.50/SRU Server with Solr/Lucene Advantages and Disadvantages

Advantages of JZKit for a Z39.50/SRU Server with Solr/Lucene
  • Java based, matching Java based Solr/Lucene.
  • Recommended by Index Data as a Java Z39.50/SRU toolkit.
  • Ian Ibbotson, developer of JZKit, has a long history of work on Z39.50 as well as collaborating with Index Data developers such as Adam Dickmeiss.
  • JZKit is integrated into some library automation systems. [Knowing which ones would be helpful.]
Disadvantages of JZKit for a Z39.50/SRU Server with Solr/Lucene
  • Java based.
    • No Perl bindings for integrating with Perl based Koha if that would ever be desirable.
    • Few Java experts in the Perl based Koha community for modifying the source code.
  • Only index (Bib-1 use attributes) mapping to Solr/Lucene is supported.
    • Bib-1 structure attributes, such as the distinction between word list and phrase queries, are unsupported.
    • Bib-1 relation attributes, such as range queries, are unsupported.
    • Bib-1 position attributes are unsupported. (May be a problem for lack of underlying Solr/Lucene support.)
    • Bib-1 truncation attributes are unsupported.
    • Bib-1 completeness attributes are unsupported. (May be a problem for lack of underlying Solr/Lucene support.)
  • Limitations in rewriting PQF connectors as Solr/Lucene queries.
    • Support for the OR and NOT connectors has not been identified in the source code for the publicly released core toolkit.
    • The proximity connector is unsupported.
  • Never developed as a product.
  • No documentation.
  • Some developed features may not have been integrated into the publicly released core toolkit. (This disadvantage may apply to every other option, such as those from Index Data, but, unlike Index Data, there is no non-free complementary software from Knowledge Integration.)
  • A support contract for the Koha community collectively with Knowledge Integration would be needed for development to make it a sufficiently comparable alternative to Zebra.

Metasearch (Federated Search) Options Advantages and Disadvantages

Net::Z3950::ZOOM, YAZ, Pazpar2, and Metaproxy for Metasearch Advantages and Disadvantages

Advantages of Net::Z3950::ZOOM, YAZ, Pazpar2, and Metaproxy for Metasearch
  • Supports many different types of database targets including Solr/Lucene targets.
  • Supports primitive record merging/deduplication and FRBRisation clustering.
  • Compiled C code provides very fast performance.
Disadvantages of Net::Z3950::ZOOM, YAZ, Pazpar2, and Metaproxy for Metasearch
  • CCL is the base query language used by Pazpar2 for conversion into other query languages.
    • The level of support for query conversion to other query languages is limited for non Z39.50/SRU targets.
  • Some complementary software from Index Data is not available under a free software license. (This is a problem for having a totally free software solution. As programs separate from Koha using a standard communication protocol, there would be no known adverse impact on the Koha software license when using the identified non-free software from Index Data which complements Pazpar2.)
    • Database based Torus administration for Pazpar2 is not available with a free software license.
  • A support contract for the Koha community collectively with Index Data would be needed to guide best practise implementation for Koha and provide possible additional development.

The record merging/de-duplication model in Pazpar2 is not modeled on the real world with imperfect data. Pazpar2 uses an all or nothing model for match points with inadequate normalisation which is liable to produce false matches and miss valid matches.
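The difference between all-or-nothing matching and normalised matching can be seen in a small sketch (the normalisation rules here are illustrative, not Pazpar2's or MELVYL's actual rules):

```python
import re

def normalise(match_point):
    # Lowercase, strip punctuation, and collapse whitespace before comparing,
    # instead of requiring byte-for-byte equality of match points.
    s = re.sub(r"[^\w\s]", "", match_point.lower())
    return " ".join(s.split())

a = "Hamlet / by William Shakespeare."
b = "HAMLET, by William  Shakespeare"
print(a == b)                        # False: an all-or-nothing match misses the pair
print(normalise(a) == normalise(b))  # True once both sides are normalised
```

A real merging algorithm would go further, weighting several match points rather than relying on any single one, which is what the MELVYL work described below provides.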

A good manifestation record merging model is available from work at MELVYL, union catalogue for the University of California libraries. Karen Coyle has provided the MELVYL record merging algorithm used for books as part of her work with Open Library. The algorithm includes the weightings which were not published in MELVYL online catalog reference manual.

JZKit for Metasearch Advantages and Disadvantages

Advantages of JZKit for Metasearch
  • Supports different types of database targets including Solr/Lucene targets.
  • Query limitations identified for JZKit as a Solr/Lucene gateway would not apply as a metasearch client to Z39.50/SRU targets.
  • Administration of metasearch targets would use an SQL database for which the supporting software is available with a free software AGPL 3 license.
  • Recommended by Index Data as a Java Z39.50/SRU toolkit.
  • Ian Ibbotson, developer of JZKit, has a long history of work on Z39.50 as well as collaborating with Index Data developers such as Adam Dickmeiss.
  • JZKit is integrated into some library automation systems. [Knowing which ones would be helpful.]
Disadvantages of JZKit for Metasearch
  • Java based.
    • No Perl bindings for integrating with Perl based Koha. The utility of sending queries via Net::Z3950::ZOOM to YAZ and then to JZKit and obtaining metasearch result sets is unknown. If using JZKit via Net::Z3950::ZOOM and YAZ would be possible, the JZKit limitations for mapping queries to Solr/Lucene queries would apply.
    • Few Java experts in the Perl based Koha community for modifying the source code.
  • Never developed as a product.
  • No documentation.
  • Some developed features may not have been integrated into the publicly released core toolkit. (This disadvantage may apply to every other option, such as those from Index Data, but, unlike Index Data, there is no non-free complementary software from Knowledge Integration.)
  • A support contract for the Koha community collectively with Knowledge Integration would be needed to understand how to configure and run JZKit properly as well as obtaining features not integrated into the public release.

Net::Z3950::ZOOM, YAZ, Data::SearchEngine::Solr, Solr/Lucene, Data::SearchEngine::*, LWP, and New Koha Code for Metasearch Advantages and Disadvantages

Advantages of Net::Z3950::ZOOM, YAZ, Data::SearchEngine::Solr, Solr/Lucene, Data::SearchEngine::*, LWP, and New Koha Code for Metasearch
  • Maximum control would be available for allowing complex queries automatically optimised for particular database targets.
  • Support would exist for different types of database targets including Solr/Lucene targets.
  • Perl based metasearch management code would maximise accessibility to Perl developers contributing to Koha for further development.
  • Some experience already exists in the Koha community using YAZ as a metasearch client.
Disadvantages of Net::Z3950::ZOOM, YAZ, Data::SearchEngine::Solr, Solr/Lucene, Data::SearchEngine::*, LWP, and New Koha Code for Metasearch
  • Perl based.
    • Interpreted Perl code may be too slow relative to compiled C code in Pazpar2 and Metaproxy.
  • Community of pre-existing metasearch users would not exist for new metasearch management code written for Koha.
  • Optimising use of YAZ for efficiency as a metasearch client may not be widely understood.
  • A support contract for the Koha community collectively with Index Data would be needed to understand how to use YAZ most efficiently with new Koha metasearch management code.

Configuration

[Configuration options which are not especially important for distinguishing comparative differences in capabilities between possible software options have been omitted. A complete summary of configuration options, including tokenisation configuration, stop words, etc., may also be helpful. Please add summaries of omitted configuration options if you are inclined.]

Record Indexing Server Aspect of Configuration

Words Index Table in the Koha SQL Database Record Indexing Server Aspect of Configuration

Populating the Koha nozebra table with words from MARC records uses the configuration of the kohafield and seealso columns from the Koha MARC frameworks, which define the MARC to Koha SQL database column mapping. The mapping uses a single MARC field and subfield combination for each Koha SQL column; that combination is expressed in the Koha MARC frameworks using the kohafield column and extended by the seealso column.

See the Koha source code example of a Koha MARC framework. The staff administration user interface uses admin/koha2marclinks.pl for basic MARC to Koha SQL database column mapping supported by the Koha MARC frameworks.

The staff administration interface has an interface for editing the kohafield and seealso column values for Koha MARC frameworks using admin/marc_subfields_structure.pl for editing Koha MARC bibliographic frameworks and admin/auth_subfields_structure.pl for editing Koha MARC authorities frameworks.

Changes to the index mapping are updated using the command line script misc/migration_tools/rebuild_nozebra.pl.

Zebra Record Indexing Server Aspect of Configuration

Zebra configuration includes associating Zebra indexes based on MARC record elements used with particular Z39.50 use attributes.

See Koha source code examples of etc/zebradb/marc_defs/marc21/biblios/record.abs configuration for Zebra indexes and etc/zebradb/biblios/etc/bib1.att for Bib-1 Z39.50 attributes. XPath based indexing for .abs files would be more flexible than the deprecated work in Koha. Although Koha has not yet changed to XPath based indexing, two forks of Koha have used XPath indexing to allow extended functionality in an XML record model which contains MARCXML records within differing container partitions.

Changes to the index mapping are updated using the command line script misc/migration_tools/rebuild_zebra.pl.
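For orientation, the shape of the two configuration files referenced above looks roughly like the following; the specific fields, index names, and attribute numbers are illustrative, so see Koha's actual etc/zebradb files for the full mappings.

```text
# record.abs: map MARC fields to Zebra indexes ("melm" = MARC element)
melm 100$a    Author
melm 245$a    Title,Title:p

# bib1.att: bind index names to Bib-1 use attribute numbers
att 1003      Author
att 4         Title
```

A query carrying Bib-1 use attribute 4 is thus resolved through bib1.att to the Title index, whose contents record.abs drew from MARC field 245 subfield a.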

Solr/Lucene Record Indexing Server Aspect of Configuration

Indexes could be configured by data-config.xml in Solr/Lucene. Solr/Lucene has a web interface for configuration.

In the BibLibre proof of concept and work in progress for Solr/Lucene based record indexing, the Koha SQL database indexes table is used to store user specified indexes and the mapping of user specified MARC subfields from user specified fields to particular indexes. The staff administration interface uses solr/indexes.pl to specify indexes and solr/mappings.pl to specify mappings.

Changes to the Koha index mappings are updated on the fly [i.e. no cronjob required anymore] or using the command line script (if some batch process is required) misc/migration_tools/rebuild_solr.pl .

Z39.50/SRU Server Aspect of Configuration

Zebra and YAZ Z39.50/SRU Server Aspect of Configuration

Zebra Z39.50/SRU Explain records are configured by an XML file which needs to be updated to specify supported options whenever the supported options change. See the Koha source code example of etc/zebradb/explain-biblios.xml.

YAZ configuration includes CQL to PQF mappings for the Zebra server to support SRU CQL query language. The mappings can be used when called by the Zebra server for SRU queries or in using YAZ as a client to query any Z39.50 server with SRU CQL queries passed to YAZ for translation. See the Koha source code example of etc/zebradb/cql.properties.
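A few representative lines in YAZ's CQL-to-PQF properties format illustrate the mapping described above; the particular index names and attribute values are illustrative, not a copy of Koha's file.

```text
# CQL index name -> PQF attributes
index.cql.serverChoice = 1=1016
index.cql.title        = 1=4
index.cql.author       = 1=1003
```

With such a file, an SRU query like title=hamlet is rewritten to the PQF query @attr 1=4 hamlet before it reaches Zebra.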

The YAZ retrieval facility is configured in yazgfs.xml with appropriate linked XSLT stylesheets to transform the record syntax returned by Zebra into an appropriate record syntax such as MARC 21, MARCXML, Dublin Core, etc. See the Koha source code example of yazgfs in etc/koha-conf.xml.

SimpleServer and YAZ with Solr/Lucene Z39.50/SRU Server Aspect of Configuration

CQL to PQF mapping for SimpleServer is configured by configuring the pqf.properties file from YAZ.

Explain records are configured in yazgfs.xml from YAZ.

Creating other possible configuration files is left as an exercise for the implementing programmer.

Simple2ZOOM with Solr/Lucene Z39.50/SRU Server Aspect of Configuration

PQF to CQL mapping for Simple2ZOOM is configured in a Simple2ZOOM XML configuration file described in Net::Z3950::Simple2ZOOM::Config. The CQL index mapping specified by the index element nested within the map element in the configuration file would be used to specify a Solr/Lucene index.
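Based on the description above, a Simple2ZOOM mapping entry might look roughly like the following sketch. Apart from the map and index elements mentioned above, the surrounding element and attribute names are assumptions; consult Net::Z3950::Simple2ZOOM::Config for the real schema.

```xml
<database name="biblios">
  <!-- Bib-1 use attribute 4 (Title) rewritten to a backend CQL index "title" -->
  <map use="4"><index>title</index></map>
  <map use="1003"><index>author</index></map>
</database>
```

Because only the index element is mapped, this is also where the limitation to use attributes alone, with no structure, relation, position, truncation, or completeness handling, becomes visible.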

The YAZ retrieval facility may need to be specially configured in yazgfs.xml with appropriate linked XSLT stylesheets to transform whatever record syntax is returned by Solr/Lucene into an appropriate record syntax such as MARC 21, MARCXML, Dublin Core, etc. Special XSLT stylesheets may not be needed if Koha would intercept the Solr/Lucene query.

Explain records are configured in yazgfs.xml from YAZ.

JZKit and CQL-Java with Solr/Lucene Z39.50/SRU Server Aspect of Configuration

Solr/Lucene indexes for Z39.50 Bib-1 use attributes in JZKit would be configured in config/crosswalks/QueryModel/bib-1.xml.

JZKit CQL support would be provided by CQL-Java. CQL to PQF mappings for CQL-Java are contained in etc/pqf.properties.

Examination of the JZKit source code has been insufficient to know whether Z39.50 server Explain records are generated from the actual JZKit server configuration or from configuring an a2j.properties file. Having Explain records based on the actual support configured on the server would be a significant improvement over the static Explain configuration files used in Index Data products.

Z39.50/SRU Client Aspect of Configuration

YAZ Z39.50/SRU Client Aspect of Configuration

CCL to PQF mappings for the YAZ client to support the CCL query language are configured in a ccl.properties file. See the Koha source code example of etc/zebradb/ccl.properties.
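Representative lines in YAZ's ccl.properties format map CCL qualifiers to Bib-1 attributes; the particular qualifiers and attribute values here are illustrative rather than copied from Koha's file.

```text
# CCL qualifier -> Bib-1 attributes (u = use attribute, s = structure)
ti   u=4     s=al
au   u=1003  s=al
kw   u=1016  s=al
```

A CCL query such as ti=hamlet is then translated to a PQF query carrying use attribute 4 before being sent to the Z39.50 target.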

JZKit and CQL-Java Z39.50/SRU Client Aspect of Configuration

JZKit supports CCL via CQL-Java client support; however, the means of configuration has not been identified. No investigation has been done into how CQL-Java manages CCL queries or specifically how CCL might be configured for CQL-Java.

Metasearch (Federated Search) Aspect of Configuration

YAZ Copy Cataloguing Metasearch Aspect of Configuration

Koha copy cataloguing Z39.50 targets are configured from cataloguing/z3950_search.pl and stored in the z3950servers table in the Koha SQL database.

Pazpar2 and Metaproxy Metasearch Aspect of Configuration

Pazpar2 can be configured for metasearch functionality to query multiple databases in addition to the Koha Zebra records database and not merely Z39.50 targets. Additional database targets may be added to the Koha source code example of etc/pazpar2/pazpar2.xml and each additional target would have an appropriate database target configuration file such as the example of the Koha Zebra target database configuration etc/pazpar2/koha-biblios.xml. In the current version of Pazpar2, Solr/Lucene databases can be added as targets.

Pazpar2 configuration can be used for primitive FRBRisation intending to group various manifestations of a work together. See the Koha source code example of etc/pazpar2/marc21-work-groups.xsl.

Pazpar2 configuration includes primitive record merging/de-duplication rules. See the Koha source code examples in etc/pazpar2/koha-biblios.xml and etc/pazpar2/marc21-work-groups.xsl.

Metaproxy configuration uses XML files for each database target with parameters sufficient for Metaproxy. The source code has an example of target authentication contained in etc/example.target-auth. XSLT record stylesheets are identified in etc/retrieval-info.xml. No investigation has been done into how to configure Metaproxy and Pazpar2 to work well together.

JZKit Metasearch Aspect of Configuration

JZKit has some functionality such as metasearch similar to Pazpar2 from Index Data. Configuration of that functionality is unknown except that configuration of database targets queried is stored in an SQL database.

Code Functionality

Record Indexing Aspect of Code Functionality

Words Index Table in the Koha SQL Database Record Indexing Aspect of Code Functionality

[Someone interested may wish to add a summary of code functionality for the nozebra option.]

Zebra Record Indexing Aspect of Code Functionality

Zebra indexes are updated by Koha scripts, such as misc/bin/zebraqueue_daemon.pl when records are added, updated, or deleted.
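In a typical installation the zebraqueue is processed either by the daemon or by a periodic cron job; a hypothetical crontab entry (the install path is illustrative) might look like:

```
# Process queued additions, updates, and deletions (-z) for both
# bibliographic (-b) and authority (-a) records every ten minutes.
*/10 * * * * /usr/share/koha/bin/migration_tools/rebuild_zebra.pl -b -a -z
```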

Data::SearchEngine::Solr and Solr/Lucene Record Indexing Aspect of Code Functionality

Solr/Lucene indexes would be updated by Koha scripts when records are added, updated, or deleted. In the BibLibre work in progress for Solr/Lucene based indexing, calls to C4::Search, such as C4::Search::IndexRecord would read the index configuration in the SQL database and then call Data::SearchEngine to update the index.

There is a major commit in the BibLibre git branch for the BibLibre Solr/Lucene proof of concept for which the diff gives a good idea of the starting approach taken by BibLibre.

Local Record Retrieval Aspect of Code Functionality

Words Index Table in the Koha SQL Database Local Record Retrieval Aspect of Code Functionality

[Someone interested may wish to add a summary of code functionality for the nozebra option.]

Net::Z3950::ZOOM, YAZ, Pazpar2, and Zebra Local Record Retrieval Aspect of Code Functionality

CCL queries are generated from opac/opac-search.pl and staff catalogue/search.pl forms. The CCL queries are passed to C4::Search.

In addition to the advanced search form, a single text input element is provided in the OPAC and staff client for typing queries to be passed to C4::Search. CCL, CQL, and PQF queries may be passed to C4::Search. C4::Search::buildQuery unnecessarily removes much potentially meaningful syntax passed from typed queries and should be fixed.
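For instance, a typed CCL query and its PQF equivalent (using the standard Bib-1 use attributes for title, author, and subject) might look like:

```
# CCL, as typed into the single input box
ti=piano and au=mozart not su=opera

# Equivalent PQF, as ultimately sent to Zebra
@not @and @attr 1=4 piano @attr 1=1003 mozart @attr 1=21 opera
```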

C4::Search calls C4::Search::PazPar2 for returning grouped and deduplicated records from Zebra and possible additional database targets as configured. CCL queries are sent via Net::Z3950::ZOOM to YAZ and then to Pazpar2. Pazpar2 converts the CCL queries to PQF queries for Zebra directly based on its own configuration or indirectly via YAZ.
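C4::Search::PazPar2 drives Pazpar2 through its HTTP webservice, which follows roughly an init/search/show sequence (the port and session value here are illustrative):

```
http://localhost:9004/search.pz2?command=init
http://localhost:9004/search.pz2?command=search&session=335654&query=ti%3Dpiano
http://localhost:9004/search.pz2?command=show&session=335654&start=0&num=20
```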

Queries are passed from Pazpar2 to YAZ, and then to Zebra. Zebra checks against its indexes for matches to queries. Zebra returns result sets or other appropriate responses to YAZ for presentation, after possible filtering by Pazpar2. Additional processing of the result sets is done by Koha, possibly including initiating Z39.50 scan queries for faceting.

Data::SearchEngine::Solr and Solr/Lucene Local Record Retrieval Aspect of Code Functionality

Abstracted queries would be generated from the Koha user interface for opac/opac-search.pl and staff catalogue/search.pl forms. See the partial work in opac/opac-search.pl diff for BibLibre's work in progress for Solr/Lucene.

In addition to the advanced search form, a single text input element is provided in the OPAC and staff client for typing queries to be passed to C4::Search.

Queries would be passed to C4::Search. Solr/Lucene search syntax would be generated for queries to the local Solr/Lucene using C4::Search::Solr. If properly abstracted, the same query could be used to generate queries in other query languages such as CCL, PQF, CQL, etc. as appropriate for metasearch or some other purpose. Queries for the local record index would be passed from C4::Search::Solr to Solr/Lucene.
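As an illustration of what such abstraction would buy, one abstract title search could be rendered into each target syntax (the Solr field name here is hypothetical, since BibLibre's index naming is not settled):

```
# CCL          ti=piano
# PQF          @attr 1=4 piano
# CQL          dc.title = piano
# Solr/Lucene  title:piano
```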

Solr/Lucene would check indexes for matches to queries. Solr/Lucene would return the result sets including facets or other appropriate responses. Additional processing of result sets would be done by Koha. BibLibre Solr/Lucene work in progress is using Data::Pagination to control pagination of the result set.

Z39.50/SRU Server Aspect of Code Functionality

Zebra and YAZ Z39.50/SRU Server Aspect of Code Functionality

Zebra receives Explain requests from clients and returns Explain records to requesting clients.

The Zebra server receives Z39.50/SRU queries from Z39.50/SRU clients. Zebra passes CQL queries to YAZ to be rewritten as PQF queries.
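The CQL-to-PQF rewriting step can be illustrated as follows (the attribute numbers depend on the CQL-to-PQF mapping file in use):

```
# CQL, as received from an SRU client
dc.title = piano and dc.creator = mozart

# PQF, after rewriting by YAZ
@and @attr 1=4 piano @attr 1=1003 mozart
```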

Zebra checks against its indexes for matches to queries. Zebra returns result sets or other appropriate responses, such as error codes, to YAZ for transformation. YAZ uses XSLT transformations to transform the record syntax returned by Zebra into the record syntax requested by the Z39.50/SRU client. The transformed records are then returned to the requesting Z39.50/SRU clients.

SimpleServer with Data::SearchEngine::Solr and Solr/Lucene Z39.50/SRU Server Aspect of Code Functionality

SimpleServer would receive Explain requests from clients and return Explain records to requesting clients.

SimpleServer would receive Z39.50/SRU queries from Z39.50/SRU clients. SimpleServer would pass CQL queries to YAZ to be rewritten as PQF queries.

SimpleServer provides a hash for the query as part of a Net::Z3950::APDU::Query object. Parsing the query directly is an alternative to using the hash provided. See BibLibre work on a PQF parser for SimpleServer as a gateway to C4::Search in misc/z3950.pl as part of their Koha git repository for Solr/Lucene work and in lib/Regexp/Grammars/Z3950/RPN.pm for their rg-z3950-rpn git repository.

SimpleServer would pass PQF queries to Koha. Koha would rewrite the queries for Data::SearchEngine::Solr and pass the queries to Data::SearchEngine::Solr. Data::SearchEngine::Solr would write the queries as Solr/Lucene queries and pass the queries to Solr/Lucene.

Solr/Lucene would check against its indexes for matches to queries. Solr/Lucene would return result sets or other appropriate responses to Koha.

Koha would retrieve full bibliographic records matching the query result set IDs from the SQL database. Koha would return result sets or other appropriate responses, such as error codes, to SimpleServer. SimpleServer would return result sets or other appropriate responses, such as error codes, to the requesting Z39.50/SRU clients.

Simple2ZOOM with Data::SearchEngine::Solr and Solr/Lucene Z39.50/SRU Server Aspect of Code Functionality

[Untested and incompletely described functionality for using Simple2ZOOM]

Simple2ZOOM would receive Explain requests from clients and return Explain records to requesting clients.

Simple2ZOOM would receive Z39.50/SRU queries from Z39.50/SRU clients. Simple2ZOOM would rewrite PQF queries as CQL queries.

Simple2ZOOM would pass the rewritten CQL queries to Koha. Koha would rewrite the queries for Data::SearchEngine::Solr and pass the queries to Data::SearchEngine::Solr. Data::SearchEngine::Solr would write the queries as Solr/Lucene queries and pass the queries to Solr/Lucene.

Solr/Lucene would check against its indexes for matches to queries. Solr/Lucene would return result sets or other appropriate responses to Koha.

Koha would retrieve full bibliographic records matching the query result set IDs from the SQL database. Koha would return result sets or other appropriate responses, such as error codes, to Simple2ZOOM. Simple2ZOOM would return result sets or other appropriate responses, such as error codes, to the requesting Z39.50/SRU clients.

JZKit, CQL-Java with Data::SearchEngine::Solr and Solr/Lucene Z39.50/SRU Server Aspect of Code Functionality

JZKit would receive Explain requests from clients and return Explain records to requesting clients.

JZKit would receive Z39.50/SRU queries from Z39.50/SRU clients. JZKit would pass CQL queries to CQL-Java to be rewritten as PQF queries.

JZKit would rewrite PQF queries as Solr/Lucene queries and pass them to Koha. Koha would pass the queries to Solr/Lucene via Data::SearchEngine::Solr.

Solr/Lucene would check against its indexes for matches to queries. Solr/Lucene would return result sets or other appropriate responses to Koha.

Koha would retrieve full bibliographic records matching the query result set IDs from the SQL database. Koha would return result sets or other appropriate responses, such as error codes, to JZKit. JZKit would return result sets or other appropriate responses, such as error codes, to the requesting Z39.50/SRU clients.

Metasearch (Federated Search) Aspect of Code Functionality

Net::Z3950::ZOOM, YAZ, Pazpar2, and Metaproxy Metasearch Aspect of Code Functionality

Net::Z3950::ZOOM and YAZ Metasearch Copy Cataloguing Aspect of Code Functionality

[The existing copy cataloguing code functionality is described here. Different implementations would be possible including some options which may use Pazpar2 and Metaproxy.]

The staff Z39.50 copy cataloguing client cataloguing/z3950_search.pl search form generates PQF queries for Z39.50 targets and calls Net::Z3950::ZOOM directly which in turn calls YAZ for querying the various intended targets.
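A typical copy-cataloguing lookup, such as an ISBN search, would be expressed in PQF along these lines (the ISBN is arbitrary):

```
# Bib-1 use attribute 7 selects the ISBN index
@attr 1=7 0571191479
```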

The staff Z39.50 copy cataloguing client queries are passed from YAZ to one or more remote Z39.50 servers. The remote Z39.50 servers return the result sets or other appropriate responses, such as error codes, to YAZ for presentation.

Net::Z3950::ZOOM, YAZ, Pazpar2, and Metaproxy Metasearch Non-Copy Cataloguing Aspect of Code Functionality

CCL queries are generated from opac/opac-search.pl and staff catalogue/search.pl forms. The CCL queries are passed to C4::Search.

In addition to the advanced search form, a single text input element is provided in the OPAC and staff client for typing queries to be passed to C4::Search. CCL, CQL, and PQF queries may be passed to C4::Search. C4::Search::buildQuery unnecessarily removes much potentially meaningful syntax passed from typed queries and should be fixed.

C4::Search would call Metaproxy, which would in turn call C4::Search::PazPar2, for returning grouped and deduplicated records from configured database targets. [Metaproxy has not yet been configured for Koha; metasearch currently works in Koha with Pazpar2 and without Metaproxy.] CCL queries are sent via Net::Z3950::ZOOM to YAZ and then to Pazpar2. Pazpar2 converts the CCL queries to the query language configured for each database target, including Z39.50/SRU, Solr/Lucene, and other types of targets.

Queries are passed from Pazpar2 to YAZ, and then to each database target. Each database target checks against its indexes for matches to queries. Each database target returns result sets or other appropriate responses, such as error codes, to YAZ for presentation after possible filtering by Pazpar2 and Metaproxy. Additional processing of the result sets is done by Koha.

JZKit Metasearch Aspect of Code Functionality

[There are no Perl bindings for Java-based JZKit equivalent to the Net::Z3950::ZOOM Perl bindings for C-based YAZ. There has been no answer from Knowledge Integration about whether the following functionality would work as a workaround for not having Perl bindings for JZKit.]

Abstracted queries would be generated from opac/opac-search.pl and staff catalogue/search.pl forms. The queries would be passed to C4::Search.

In addition to the advanced search form, a single text input element would be provided in the OPAC and staff client for typing queries to be passed to C4::Search.

C4::Search would rewrite abstracted user queries as CCL, CQL, or PQF queries for use with JZKit. C4::Search would pass queries to Net::Z3950::ZOOM, to YAZ, and then to JZKit.

Queries would be passed from JZKit to each database target. Each database target would check against its indexes for matches to queries. Each database target would return result sets or other appropriate responses, such as error codes, to JZKit. JZKit would return the combined result sets or other appropriate responses, such as error codes, to YAZ for presentation. Additional processing of the result sets would be done by Koha.
