Troubleshooting Zebra

From Koha Wiki

Typical Zebra problems and solutions.

General information

Zebra is a separate database system which is optimized for bibliographic data.  Zebra should have a cron job that re-indexes on a regular basis, pulling any new data out of MySQL and organizing it into the Zebra database.

If you are having problems, you should:

  1. Manually run the re-indexing to see if you spot any error messages.
  2. Manually run a command-line search from yaz-client and see if that returns results.
  3. Keep watching your Apache and Koha log files, even while you debug Zebra.  Often an issue in Zebra will show up in those logs.

Indexing issues

Doing a manual re-index

Often this is a good way of clearing out problems. The method is detailed in this FAQ entry.
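
On a package install, a full manual re-index is usually run through the koha-rebuild-zebra wrapper. The following is a minimal sketch: the instance name "library" and the run helper with its DRY_RUN switch are illustrative assumptions, not part of Koha.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Full (-f), verbose (-v) re-index for one Koha instance.
reindex_instance() {
    run sudo koha-rebuild-zebra -v -f "$1"
}

# Usage: DRY_RUN=1 reindex_instance library
```

Running with DRY_RUN=1 first lets you check the command before it touches the index.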

No such record type

When reindexing Zebra, you may see the following after the export phase has completed:

15:21:07-05/01 zebraidx(1502) [warn] No such record type: dom./home/robin/koha-dev/etc/zebradb/biblios/etc/dom-config-marc.xml

This seems to mean that Zebra is looking in the wrong place for its modules. In Debian, they are in /usr/lib/x86_64-linux-gnu/idzebra-2.0/modules; in other distros they may be somewhere under /usr/lib64, for example.

Edit the files:

zebra-authorities.cfg
zebra-authorities-dom.cfg
zebra-biblios.cfg
zebra-biblios-dom.cfg

and set modulePath to point to the directory that contains mod-grs-marc.so, for example:

# modulePath - where to look for loadable zebra modules
modulePath: /usr/lib/x86_64-linux-gnu/idzebra-2.0/modules

Note: upgrading to recent versions of the packages will patch the installed files for you.
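
If you are not sure where the modules live on your system, you can search for the file named above. This is a small sketch; the search roots in the usage line are assumptions and may differ on your distro.

```shell
# Print every directory under the given roots that contains mod-grs-marc.so.
find_zebra_module_dir() {
    find "$@" -type f -name 'mod-grs-marc.so' 2>/dev/null |
        while IFS= read -r path; do
            dirname "$path"
        done
}

# Usage (search roots are assumptions; adjust for your distro):
#   find_zebra_module_dir /usr/lib /usr/lib64
```

The directory it prints is the value to use for modulePath.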

Which indexes are defined?

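  • You can list the indexes defined for a record by running it through the index definitions stylesheet with xsltproc.
  • Replace INSTANCE_NAME below with the name of your instance.
  • On recent versions, query the metadata field of the biblio_metadata table instead.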
 xsltproc /etc/koha/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl \
 <(sudo koha-mysql INSTANCE_NAME <<< \
 "select  marcxml from biblioitems where ExtractValue( marcxml, '//datafield[@tag=035]/subfield[@code=\"a\"]' ) != '' limit 1\\G" | \
 sed -n 's/marcxml: //;2,$p' )
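
The sed step in that pipeline only removes the row banner and the column label that \G output adds. A sketch of the same step as a reusable function follows; the biblio_metadata query in the usage comment is an assumption based on the note about recent versions, so check it against your schema.

```shell
# Strip the "*** 1. row ***" banner (line 1) and the "column: " prefix from
# mysql \G output, leaving only the XML. The column name is the first argument.
strip_vertical_output() {
    sed -n "s/^$1: //;2,\$p"
}

# Usage on a recent install (assumes biblio_metadata.metadata holds the MARCXML):
#   xsltproc /etc/koha/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl \
#     <(sudo koha-mysql INSTANCE_NAME <<< "select metadata from biblio_metadata limit 1\\G" | \
#       strip_vertical_output metadata)
```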

Record length of 101459 is larger than the MARC spec allows (99999 bytes)

The MARC disk format (ISO 2709) is old and crappy by modern standards. One symptom of this is that a MARC record cannot be larger than 99,999 bytes. You will hit this limit if you try to index a record that has a large number of items (common with serials, for example), or that simply has a lot of text in the record itself.

Koha can do the indexing using the MARCXML format rather than ISO 2709, which gets around the problem. To do this, add '-x' to the rebuild_zebra.pl command when indexing biblios. You can't use this option when indexing authorities, but that matters less.

This has been the default in package installations for some time.
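
The limit exists because an ISO 2709 record stores its own length as a five-digit decimal number in the first five characters of the leader. As a rough sketch, the function below reads that declared length from the first record in a file; the file name in the usage line is hypothetical.

```shell
# Print the record length declared in the leader of the first record in an
# ISO 2709 file (its first five bytes), with leading zeros removed.
marc_declared_length() {
    head -c 5 "$1" | sed 's/^0*//'
}

# Usage: marc_declared_length exported_biblios.mrc
```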

How many records are searchable?

There is probably more than one way to do this, but the following search will return all records indexed in your library:

 allrecords,AlwaysMatches=""

Comparing the counts from the staff interface and the OPAC can be useful if you are using OPACSuppression. You can also compare the number to the number of records in your installation:

 SELECT COUNT(biblionumber) FROM biblio;
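
To compare the two numbers from the command line, you can pull the hit count out of yaz-client output. This is a sketch only: the usage comment assumes that your yaz-client accepts commands on standard input and that the socket path matches a package install.

```shell
# Extract the hit count from yaz-client output read on standard input.
zebra_hit_count() {
    sed -n 's/^Number of hits: \([0-9][0-9]*\),.*/\1/p'
}

# Usage (assumes commands can be piped to yaz-client on stdin):
#   printf 'base biblios\nquerytype ccl2rpn\nf allrecords,AlwaysMatches=""\nquit\n' |
#     yaz-client -c /etc/koha/zebradb/ccl.properties unix:/var/run/koha/INSTANCE_NAME/bibliosocket |
#     zebra_hit_count
```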

Records with lots of items not indexing

You may notice that a record with a lot of items isn't retrievable using Zebra. It's probably because it's over Zebra's default limit of 1MB.

You can increase this limit (e.g. to 4096) by changing "zebra_max_record_size" in koha-conf.xml and then re-indexing the problem records. They will likely be slow to load, but they should now index successfully.
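
Before changing the setting, it can help to check what it is currently set to. A minimal sketch, assuming the value appears on a single line in the file; the path in the usage line is an assumption for package installs.

```shell
# Print the value of <zebra_max_record_size> from a koha-conf.xml file.
# Prints nothing if the element is absent.
zebra_max_record_size() {
    sed -n 's:.*<zebra_max_record_size>\([0-9][0-9]*\)</zebra_max_record_size>.*:\1:p' "$1"
}

# Usage (path is an assumption; adjust for your install):
#   zebra_max_record_size /etc/koha/sites/INSTANCE_NAME/koha-conf.xml
```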

Searching and Searching Issues

Manually searching with yaz-client

yaz-client is the command-line tool that connects directly to zebra.

A typical connection looks like:

$ yaz-client -c /etc/koha/zebradb/ccl.properties unix:/var/run/koha/zebradb/bibliosocket


A typical connection on a package install looks like:

$ yaz-client -c /etc/koha/zebradb/ccl.properties unix:/var/run/koha/$instance/bibliosocket

where $instance is the name of the instance you'd like to connect to.

The -c /etc/koha/zebradb/ccl.properties portion gives yaz-client the ability to use CCL, a simpler way to state searches.  Koha uses CCL when it generates search strings, so when you do tests manually, it helps to use this -c flag.
Alternatively, once connected to Zebra, you can use set_cclfile /etc/koha/zebradb/ccl.properties to tell yaz-client to use the ccl.properties file from Koha.

The unix:/var/run/koha/zebradb/bibliosocket is the connection string that comes from your /etc/koha/koha-conf.xml file.  That string could also be tcp:@, in which case you should connect with tcp:[ip-address-of-your-computer], for example tcp:127.0.0.1.
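
The translation from the configured listen address to something yaz-client can connect to can be sketched as follows. The function name and the default host are illustrative assumptions.

```shell
# Turn a configured listen address into an address yaz-client can connect to.
# "tcp:@..." means "listen on all interfaces", so substitute a concrete host
# (second argument, default 127.0.0.1). Other addresses pass through unchanged.
client_address() {
    host=${2:-127.0.0.1}
    case "$1" in
        tcp:@*) printf 'tcp:%s%s\n' "$host" "${1#tcp:@}" ;;
        *)      printf '%s\n' "$1" ;;
    esac
}

# Usage: yaz-client -c /etc/koha/zebradb/ccl.properties "$(client_address 'tcp:@')"
```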

Once you have connected, you need to define what dataset to search.  Usually, you will choose "biblios":

Z> base biblios

For authorities, you'd choose:

Z> base authorities

You might consider storing configuration options in ~/.yazclientrc so that you don't have to configure yaz-client for each connection.
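
As a sketch of what such a file might contain, the function below writes the commands used on this page into an rc file. It assumes that yaz-client runs each line as an ordinary command at startup; note that it overwrites the target file.

```shell
# Write a starter rc file for yaz-client to the given path (overwrites it).
# Each line is an ordinary yaz-client command, as typed at the Z> prompt.
write_yazclientrc() {
    cat > "$1" <<'EOF'
set_cclfile /etc/koha/zebradb/ccl.properties
querytype ccl2rpn
base biblios
EOF
}

# Usage: write_yazclientrc "$HOME/.yazclientrc"
```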

Now you are ready to do your search.  A simple search will look something like

Z> f love
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 229, setno 1
SearchResult-1: term=love cnt=229
records returned: 0
Elapsed: 0.027506

Here, f is short for "find", and "love" is the keyword we are looking for.

If you want to do a more complex search, testing out the ccl parser, you do something like:

Z> querytype ccl2rpn
Z> f love or hate
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 230, setno 4
SearchResult-1: term=love cnt=229, term=hate cnt=1
records returned: 0
Elapsed: 0.001270 

Here, querytype tells yaz-client that we are going to give it a "CCL" search string.  We can then use the search terms that Koha displays on the results page for a search:

Home › Results of search for 'kw,wrdl: love'

(This text was copied from a Koha search.)

Default values when using CCL

When you're writing CCL, you should keep in mind that there are many hidden default values affecting your search.

  • First, if you don't provide an index, the default of "kw" will be used.

From the Zebra documentation: "If no use attribute is provided, a default of BIB-1 Use Any (1016) is assumed."

  • Second, if you don't provide an index (ie qualifier), the attributes for "term" will be used.

From the YAZ documentation: "The attributes for the special qualifier name term are used when no CCL qualifier is given in a query."

Taking these two facts into account, consider the following examples:

  • (1) "love or hate" in yaz-client is actually '"kw,term,phr=love" or "kw,term,phr=hate"'.

However, "love or hate" in Koha is likely to be '"kw,wrdl=love or hate"', which gets translated to '"kw,wrdl=love or kw,term,phr=hate"'. Despite these differences, both queries should return exactly the same search results, because "term" contains the special attribute "s=al", which acts like "wrdl" (i.e. it tokenizes the term using spaces and ANDs the tokens together).

  • (2) "love hate" in yaz-client is actually '"kw,term,phr=love hate"' which is essentially equivalent to '"kw,wrdl=love hate"' which is equivalent to '"kw,st-word=love" and "kw,st-word=hate"'.

However, a search for '"ti=love hate"' is very different. That query is actually equivalent to '"ti,phr=love hate"'. You'll likely get very few matches for this search in comparison to "love hate". This might not seem obvious at first. It's also worth noting that you can use parentheses to group query elements. For instance:

  • (3) "ti=(love or hate)" is equivalent to '"ti,phr=love" or "ti,phr=hate"'

You can learn more about CCL and RPN at the following links:

Why does a search return that result

Sometimes it's pretty confusing why zebra gives particular results to a search. You can make yaz-client tell you what it's actually using to get the results.

$ yaz-client unix:/var/run/koha/demo/bibliosocket
Connecting...OK.
Sent initrequest.
Connection accepted by v3 target.
ID     : 81
Name   : Zebra Information Server/GFS/YAZ
Version: 4.2.30 98864b44c654645bc16b2c54f822dc2e45a93031
Options: search present delSet triggerResourceCtrl scan sort extendedServices namedResultSets
Elapsed: 0.001347
Z> format xml
Z> elements zebra::snippet
Z> base biblios
Z> find tra                   <-- the search term to test
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 5, setno 1
SearchResult-1: term=tra cnt=5
records returned: 0
Elapsed: 0.005154
Z> show 1
Sent presentRequest (1+1).
Records: 1
Record type: XML
<record xmlns="http://www.indexdata.com/zebra/">
<snippet name="Any" type="w">306.76 <s>TRA</s></snippet>                 <-- these snippets show you what matched your search
  <snippet name="Any" type="w">306_760000000000000_<s>TRA</s></snippet>
  <snippet name="Any" type="w">306.76 <s>TRA</s></snippet>
</record>nextResultSetPosition = 2
Elapsed: 0.049820
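
When a record produces many snippets, it can be easier to list only the highlighted terms. A small sketch that reads yaz-client output on standard input:

```shell
# Print the text of every <s>...</s> highlight found on standard input,
# one match per line.
snippet_matches() {
    grep -o '<s>[^<]*</s>' | sed 's/<[^>]*>//g'
}
```

For the record shown above, this would print TRA once for each of the three snippets.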

If you see:

Diagnostic message(s) from database:
[25] Specified element set name not valid for specified database -- v2 addinfo 'zebra::snippet'

then make your /etc/koha/marc21-retrieval-info-bib-dom.xml file (for package installs) look like:

<?xml version="1.0" encoding="UTF-8"?>
<retrievalinfo xmlns="http://indexdata.com/yaz">
<retrieval syntax="marc21" name="F">
<backend syntax="xml" name="marc">
<marc inputformat="xml" outputformat="marc"
      inputcharset="utf-8"
      outputcharset="utf-8"/>
</backend>
</retrieval>
<retrieval syntax="marc21" name="B">
<backend syntax="xml" name="marc">
<marc inputformat="xml" outputformat="marc"
      inputcharset="utf-8"
      outputcharset="utf-8"/>
</backend>
</retrieval>
<retrieval syntax="xml" /> 
<retrieval syntax="xml" name="index"/> 
<retrieval syntax="xml" name="marc"
       identifier="info:srw/schema/1/marcxml-v1.1"/>
<retrieval syntax="xml" name="marcxml"
       identifier="info:srw/schema/1/marcxml-v1.1"/>
<retrieval syntax="xml" name="zebra::*"/>
</retrievalinfo>

the important lines being:

   <retrieval syntax="xml" /> 

and maybe:

   <retrieval syntax="xml" name="zebra::*"/>

These should already be present in recent Koha versions.

Watching Zebrasrv

If you can manually search with yaz-client but koha is still not searching properly, you may want to run zebrasrv manually with logging turned on, and watch for errors.

zebrasrv is the server side of zebra. It is usually started through the init scripts in /etc/init.d.

If this documentation gets out of date, you should be able to reconstruct the command line needed to start zebrasrv manually by looking at that file.

zebrasrv typically runs as another user (koha), so to do the test correctly you will want to be that user.

The commandline to run zebrasrv may look something like one of the following:

$ /usr/bin/zebrasrv -v none,fatal,warn -f /etc/koha/koha-conf.xml -u netadmin -S
$ /usr/bin/zebrasrv -v none,fatal,warn,all,sessiondetail,zebraapi,requestdetail -f /etc/koha/koha-conf.xml -u netadmin -S

The second one has a lot more logging enabled than the first one.  The -v option tells which logging areas to enable.

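The -f option tells zebrasrv where your configuration file is.

The -u option tells zebrasrv which user to run as.

The -S option tells zebrasrv not to fork a new process for each connection, so everything runs in a single process and you can see all the connection information in your terminal.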

You should run this and then do a search from Koha.  

You might see something like: 

[request] Auth idPass user -
[request] Init ERROR 1011 ID:81/81

This indicates a bad username or password.  Koha reads the username and password from the same file that Zebra uses, but some characters are treated differently.  If you are getting a user/password error, try changing the username or password to remove non-alphanumeric characters such as #$%^&*.
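
If you have saved the zebrasrv output to a file, you can count how many logins were rejected this way. A minimal sketch; the log file name in the usage line is hypothetical.

```shell
# Count "Init ERROR 1011" lines (rejected logins) in zebrasrv output.
# Reads the named file, or standard input if no file is given.
count_init_auth_errors() {
    grep -c 'Init ERROR 1011' "$@" || true
}

# Usage: count_init_auth_errors zebrasrv-debug.log
```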

Searching with PQF/RPN

By default, when you're searching Zebra, it will want to use "querytype" of "prefix", which is short for "Prefix Query Format" or PQF.

The syntax can be intimidating, but it's much more powerful than using ccl2rpn. It looks like the following:

@or @attr 1=1016 love @attr 1=1016 hate 

Or it could be written as:

@attr 1=1016 @or love hate 

While this might not seem particularly difficult to understand, it can be more difficult for more complex queries.

While it's technical, you can read about the PQF grammar at http://www.indexdata.com/yaz/doc/tools.html#PQF.
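
Because PQF is a prefix notation, joining N terms needs N-1 operators placed in front of them. The hypothetical helper below builds such a query for single-word terms under one Use attribute; it does no quoting, so terms containing spaces would need extra care.

```shell
# Build a PQF query that ORs together one or more single-word terms
# under one Bib-1 Use attribute. N terms need N-1 leading @or operators.
pqf_or() {
    use=$1
    shift
    [ "$#" -ge 1 ] || return 1
    ops=''
    i=1
    while [ "$i" -lt "$#" ]; do
        ops="${ops}@or "
        i=$((i + 1))
    done
    printf '@attr 1=%s %s%s\n' "$use" "$ops" "$*"
}

# Usage: pqf_or 1016 love hate
```

With the two terms used above, this produces the second form of the example query.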

David Cook has spent hours reading documentation and experimenting with queries, so you might also look at his comments on the following Bugzilla reports:

One of these days, he'll write a clear and concise guide to understanding and writing PQF queries, but today is not that day. He hopes that people can glean knowledge from his ramblings and resource suggestions.

Error messages in Zebra/YAZ

What is an Unsupported Use attribute?

Sometimes, you might encounter a diagnostic message from Zebra that says "[114] Unsupported Use attribute". Take the following example:

Z> f kw,st-numeric="Harry Potter"
Sent searchRequest.
Received SearchResponse.
Search was a bloomin' failure.
Number of hits: 0, setno 33
Result Set Status: none
records returned: 0
Diagnostic message(s) from database:
    [114] Unsupported Use attribute -- v2 addinfo '1016'
Elapsed: 0.000738

We might try looking up the error message (http://www.loc.gov/z3950/agency/contributions/1.html), but that's not very helpful. Instead, let's think about the query.

"kw" means we're searching the "Any" index, which exists in the "w" (word) and "p" (phrase) registers. However, "st-numeric" means we're checking in the "n" (numeric) register.

But what about this "v2 addinfo '1016'"? Well, 1016 is the attribute number for "kw". We get this message because there is no "Any" index in the "n" register.

Makes sense.

However, take a look at another example...

Z> f Host-item,wrdl="Harry Potter"
Sent searchRequest.
Received SearchResponse.
Search was a bloomin' failure.
Number of hits: 0, setno 36
Result Set Status: none
records returned: 0
Diagnostic message(s) from database:
    [114] Unsupported Use attribute -- v2 addinfo '1033'
Elapsed: 0.000728


In this case, "Host-item" means we're searching in the "Host-item" index. Like above, 1033 is the attribute number for "Host-item". "wrdl" stands for "word list" and it checks the "w" and "p" registers (according to http://www.indexdata.com/zebra/doc/querymodel-zebra.html#querymodel-pqf-apt-mapping). If we look at "biblio-zebra-indexdefs.xsl", we see that "Host-item" has an index in the "w" register... so why are we getting this error?

Well, the index is only created in the "w" register if there's something to put in it. In this case, my records don't contain any data that would be indexed into a Host-item index, so we get this diagnostic message!
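
When you are testing many queries, it can help to reduce the output to the diagnostic code and the addinfo value, which here is the attribute number Zebra is complaining about. A sketch that reads yaz-client output on standard input:

```shell
# Print "CODE ADDINFO" for each diagnostic line in yaz-client output on stdin.
zebra_diagnostics() {
    sed -n "s/^ *\[\([0-9][0-9]*\)] .* -- v2 addinfo '\(.*\)'\$/\1 \2/p"
}
```

For the two examples above, it would print "114 1016" and "114 1033".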

What is an Unsupported Truncation attribute?

Typically, we use the Bib-1 attribute set when querying Zebra (https://www.loc.gov/z3950/agency/defns/bib1.html).

If we get a message saying "Unsupported Truncation attribute", it means that a particular Bib-1 Truncation Attribute we're trying to use isn't supported by Zebra.

This should be rare, as Zebra claims to support all truncation attributes (http://www.indexdata.com/zebra/doc/querymodel-rpn.html).

However, if you're using ICU, it's possible that you will see this message, as it doesn't support all truncation attributes. (By the way, you can verify that you're using ICU by checking zebradb/etc/default.idx for the icuchain directive.)
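
The ICU check can be scripted as follows. This is a sketch only; the full path in the usage line is an assumption and depends on your install.

```shell
# Succeed if the given default.idx contains an icuchain directive
# on a line that is not commented out.
uses_icu() {
    grep -q '^[[:space:]]*icuchain' "$1"
}

# Usage (path is an assumption):
#   if uses_icu /etc/koha/zebradb/etc/default.idx; then echo "ICU"; else echo "not ICU"; fi
```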

If you use "@attr 5=103" or the "fuzzy" CCL qualifier in your query, you'll likely see the following returned by yaz-client:

[120] Unsupported Truncation attribute -- v2 addinfo '103'

In your Zebra server log, you might see something like this:

12:19:22-05/05 zebrasrv(2) [request] Search biblios ERROR 120 1 1+0 RPN @attrset Bib-1 @attr 1=1016 @attr 5=103 test

If you're not specifying @attr 5=103 in your PQF query or "fuzzy" in your CCL query, chances are that you have the system preference QueryFuzzy enabled. If that's the case, you will need to disable it before you're able to successfully search Zebra again.
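
To confirm that fuzzy searches are what is failing, you can filter the server log for error 120 combined with truncation attribute 103. A minimal sketch; the log path in the usage line is an assumption.

```shell
# Print server log lines where a search failed with error 120 while using
# the fuzzy truncation attribute (@attr 5=103).
# Reads the named files, or standard input if none are given.
fuzzy_failures() {
    grep 'ERROR 120 ' "$@" | grep -F '@attr 5=103' || true
}

# Usage (log path is an assumption):
#   fuzzy_failures /var/log/koha/INSTANCE_NAME/zebra-output.log
```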

Is your zebra running?

You better go catch it then.