Search Info

From Koha Wiki

For people used to web searches and database-oriented applications, Koha's way of dealing with searches (since v.3) might seem a bit alien. This is a small set of notes (original article) on some not very obvious questions which I had to answer in order to do some configuration that seemed trivial at the beginning. I am not saying this is wrong or ineffective, just different, and different means time to learn. I’m writing this to help you reduce this time.


Search does not happen in the database

…and reindexing (in this case called "fast-indexing") is done at fixed intervals

Koha stores its content in the database but does not use the database for searching. Instead, all new content is indexed into another application called Zebra. If you have dealt with previous versions of Koha, imagine Zebra as your own Z39.50 server, which is in fact what it is.

So one of the first things to know is that reindexing is neither automatic nor triggered directly by a database modification. Instead, when an item is added or modified, an entry is added to the zebraqueue. Fast indexing is done at fixed intervals and is triggered by an external, cron-launched script. During fast indexing, the script reads the new entries in the zebraqueue and adds or modifies the Zebra indexes accordingly.

Therefore, if you add a biblio you will not find it in search immediately; you have to wait for the fast indexing to take place. This can be quite annoying if, for example, you need to add an authority and reuse it on the spot when you add your biblio. Of course you can schedule the fast indexing to run quite often to keep the interval short, but a delay will exist nevertheless.
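To make the delay concrete, here is a toy model in Python of the "save now, index later" behaviour described above. It is only a sketch: the class and method names (ToyCatalogue, save, run_indexer, search) are invented for illustration and have nothing to do with Koha's or Zebra's real code; the only thing it borrows is the idea of a queue drained by a periodically launched job.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalogue:
    """Illustrative model only: a 'database', a pending queue and a search index."""
    database: dict = field(default_factory=dict)  # record id -> title
    queue: list = field(default_factory=list)     # record ids waiting to be indexed
    index: dict = field(default_factory=dict)     # word -> set of record ids

    def save(self, record_id, title):
        # Saving writes to the database and queues the record; the index is untouched.
        self.database[record_id] = title
        self.queue.append(record_id)

    def run_indexer(self):
        # Stands in for the cron-launched script: drain the queue into the index.
        while self.queue:
            record_id = self.queue.pop(0)
            # Drop stale entries for this record, then index its current words.
            for ids in self.index.values():
                ids.discard(record_id)
            for word in self.database[record_id].lower().split():
                self.index.setdefault(word, set()).add(record_id)

    def search(self, word):
        # Search only ever looks at the index, never at the database.
        return sorted(self.index.get(word.lower(), set()))


catalogue = ToyCatalogue()
catalogue.save(1, "Marketing basics")
before = catalogue.search("marketing")  # the indexer has not run yet
catalogue.run_indexer()
after = catalogue.search("marketing")   # now the record is visible
print(before, after)
```

The record is in the "database" from the moment it is saved, but the search only sees it once the indexer has run, which is exactly the gap you notice when you try to reuse a freshly added authority.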


Search means PQF

Since Z39.50 is very old, it must have been designed by someone who was teaching computer science and could not imagine that people might not know Polish notation, so expect some very strange queries. But first things first: what are these queries? Koha communicates with Zebra using a query language (PQF) which does not look very meaningful at first. And this is not just for searching but for everything, such as selecting a biblio or authority by id. You can find these queries if you look in koha/var/log/koha-zebradaemon-output.log. There are a lot of documents on the net but, as I said, it’s a different approach: you will find the grammar of the language but very few explained examples. Let’s have one:

Assume you are doing a keyword search for “marketing”. Here is the PQF for it:

@attrset Bib-1 @attr 1=1016 @attr 4=6 @attr 5=1 marketing

In translation, search:

  • using the attributes defined in Bib-1 (i.e. the file koha/etc/zebradb/biblios/etc/bib1.att) and
    * in use attribute 1016 (@attr 1=1016), which means the “Any” index
    * where the term can be a word list (@attr 4=6)
    * using right truncation (@attr 5=1)
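If it helps to see the pieces as parameters, the query above can be assembled with a few lines of Python. This is only a sketch for reading PQF, not something Koha does: the function name pqf_term and its arguments are made up for illustration, and the defaults simply mirror the example (word list, right truncation).

```python
def pqf_term(term, use, structure=6, truncation=1, attrset="Bib-1"):
    """Build a single-term PQF query string (hypothetical helper, not a Koha API).

    use        -> @attr 1=...  which index to search (1016 is "Any")
    structure  -> @attr 4=...  how the term is interpreted (6 is word list)
    truncation -> @attr 5=...  truncation mode (1 is right truncation)
    """
    if " " in term:
        # PQF needs double quotes around a term that contains spaces.
        term = f'"{term}"'
    return (
        f"@attrset {attrset} @attr 1={use} "
        f"@attr 4={structure} @attr 5={truncation} {term}"
    )


keyword_query = pqf_term("marketing", use=1016)
print(keyword_query)
```

Changing use to another number is all it takes to point the same search at a different index.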


But how can you do an or?

Let’s assume you want to search in Author or Title or Subject. Can you guess the PQF?

@attrset Bib-1 @attr 4=6 @attr 5=1 @or @attr 1=1003 marketing @or @attr 1=4 marketing @attr 1=21 marketing
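The nesting is easier to see when the query is built mechanically. The Python sketch below (the helper name pqf_or is hypothetical, written only for this note) puts each @or in front of its two operands, where the second operand may itself be another @or:

```python
def pqf_or(operands):
    """Join sub-queries with @or in prefix notation: @or A @or B C."""
    if not operands:
        raise ValueError("at least one operand is required")
    if len(operands) == 1:
        return operands[0]
    # The operator comes first, then the left operand, then the rest (recursively).
    return f"@or {operands[0]} {pqf_or(operands[1:])}"


# Use attributes taken from the query above: Author, Title, Subject.
fields = {"Author": 1003, "Title": 4, "Subject": 21}
operands = [f"@attr 1={use} marketing" for use in fields.values()]

# The attributes placed before the first @or apply to the whole expression.
query = f"@attrset Bib-1 @attr 4=6 @attr 5=1 {pqf_or(operands)}"
print(query)
```

With three operands you get two @or operators, and with n operands you get n-1 of them, each one written before the things it combines.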

Now you understand why I mentioned Polish notation. But here are some links which you might also find useful:


And how to test my wonderful search?

You can use yaz-client, as for any Z39.50 server. Just connect locally (through the unix: socket) and select the biblios (data)base:

yaz-client unix:/usr/share/koha304/var/run/zebradb/bibliosocket
Z>base biblios
Z>f @attrset Bib-1 @attr 4=6 @attr 5=1 @or @attr 1=1003 marketing @or @attr 1=4 marketing @attr 1=21 marketing

Forget PQF, it’s time for CCL

Were you happy to have learned this wonderful new language? Well, don’t be impatient. The actual search is done using CCL, a newer type of language which is converted into PQF before execution. So if you were wondering what kw,wrdl: marketing means, you just found out: it’s CCL. Don’t worry, you did not learn PQF for nothing, since there are a lot of places where PQF is used directly. Just search for @attr in koha/lib/C4/*.pm and you will find a lot of results. But not for the searches. The CCL equivalent of the above PQF query is:

au,wrdl: marketing or ti,wrdl: marketing or su,wrdl: marketing

Somewhere in Search.pm it gets converted into PQF and sent to Zebra.
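To get a feeling for what such a conversion involves, here is a deliberately tiny Python sketch. It is not Koha's conversion code and does not implement CCL in general: it only understands clauses of the form qualifier,modifier: term joined by "or", and the two lookup tables contain just the values that appear in this article (kw, au, ti, su and wrdl). It also leaves out the truncation attribute, since nothing in the CCL strings shown here maps to it in this sketch.

```python
import re

# Assumed mappings, taken only from the examples in this article.
USE = {"kw": 1016, "au": 1003, "ti": 4, "su": 21}  # qualifier -> @attr 1=...
STRUCTURE = {"wrdl": 6}                            # modifier  -> @attr 4=...


def clause_to_pqf(clause):
    """Translate one 'qualifier,modifier: term' clause into PQF attributes."""
    match = re.fullmatch(r"\s*(\w+),(\w+):\s*(\S+)\s*", clause)
    if match is None:
        raise ValueError(f"unsupported clause: {clause!r}")
    qualifier, modifier, term = match.groups()
    return f"@attr 1={USE[qualifier]} @attr 4={STRUCTURE[modifier]} {term}"


def toy_ccl_to_pqf(ccl):
    """Translate clauses joined by 'or' into a prefix-notation PQF query."""
    clauses = [clause_to_pqf(part) for part in re.split(r"\s+or\s+", ccl)]
    query = clauses[-1]
    # Wrap from the right so the result reads: @or A @or B C
    for clause in reversed(clauses[:-1]):
        query = f"@or {clause} {query}"
    return f"@attrset Bib-1 {query}"


print(toy_ccl_to_pqf("au,wrdl: marketing or ti,wrdl: marketing or su,wrdl: marketing"))
```

The output repeats the structure attribute on every term instead of stating it once up front, but the shape is the same as the hand-written PQF: one @or per "or", written before its operands.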

But where is MARC?

You might ask what these 1016, 1003, 4 and 21 are. They don’t look like any known MARC fields. In fact they are not; they are Zebra indexes (remember the “Any” index?). This is another thing to bump your head against, but as your head might be spinning already, it deserves a different post.