Recovering from a corrupted Zebra index
Occasionally your Zebra server may begin segfaulting because of a corrupted index. In many cases, recovery requires a full reindex, but there are ways to make the process less painful, especially for catalogs large enough that a full reindex takes an hour or more.
Zebra keeps the indexes in files. This means that you can take snapshots of the indexes to back them up.
If you have no backups of your Zebra index, your only recourse is a full reindex in place, which in many cases takes hours. Your catalog will be down for the duration.
The Koha Debian packages take these index backups for you as part of the package backup procedure. Koha users with manual installs will need to:
- locate their Zebra indexes
- write a shell script to take a nightly backup
- put that script in their crontab
We recommend keeping the five most recent backup sets, just in case.
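For manual installs, the backup step might be sketched as follows. This is a minimal illustration, not the script the packages use: the function name backup_zebra_index, the zebra-index-DATE.tar.gz naming scheme, and every path shown are assumptions to adapt to your own install. It also assumes nothing is writing to the index while tar runs.

```shell
#!/bin/sh
# Sketch: snapshot a Zebra index directory and prune old snapshots.
# The function name, archive naming and all paths are illustrative assumptions.

backup_zebra_index() {
    index_dir=$1    # directory holding the Zebra index files
    backup_dir=$2   # where snapshots are stored
    keep=${3:-5}    # number of snapshots to retain (default: 5)

    [ -d "$index_dir" ] || { echo "no such directory: $index_dir" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1

    stamp=$(date +%Y-%m-%d-%H%M%S)
    archive="$backup_dir/zebra-index-$stamp.tar.gz"

    # -C keeps the paths inside the archive relative to the parent directory.
    tar -czf "$archive" -C "$(dirname "$index_dir")" "$(basename "$index_dir")" || return 1

    # Remove all but the newest $keep snapshots; the timestamped names
    # sort chronologically, so a reverse sort lists the newest first.
    ls -1 "$backup_dir"/zebra-index-*.tar.gz | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
        rm -f -- "$old"
    done
}

# Example call (hypothetical paths):
#   backup_zebra_index /path/to/zebra/biblios /path/to/backups 5
```

Saved as an executable script that ends with the example call, this could be run nightly from a crontab entry such as `0 2 * * * /usr/local/bin/zebra-index-backup.sh`, where both the schedule and the script path are hypothetical.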
In the event of a corrupted index killing your Zebra, *assuming you have backups,* the general procedure is as follows:
- get your backup index files and have them ready (for the packages, those are kept on the machine at /var/spool/koha/instancename/instancename-date.tar.gz; extract the archive, and the files and directories you need will be in its var/lib/koha/instancename/biblios directory)
- stop zebrasrv and take your catalog down (for the packages, that's koha-disable instancename)
- locate your corrupt Zebra indexes (for the packages, this will be /var/lib/koha/instancename/biblios). Before deleting them, double-check the permissions and owner/group on the index files and write them down: you'll need to reapply them to the backups. Then delete the corrupt indexes.
- copy the backup files into their place
- reapply, if necessary, the permissions and owner/group on the restored index files
- bring up your catalog (with the packages, that's koha-enable instancename)
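The file-handling part of this procedure could be sketched as the shell function below. It is an illustration under stated assumptions rather than an official Koha tool: the name restore_zebra_index is hypothetical, and instead of deleting the corrupt index it moves it aside to a ".corrupt" directory, so that nothing is lost if the backup turns out to be unusable. Delete that directory yourself once the catalog is working again. The owner and group are read from the old index directory and reapplied to the restored files.

```shell
#!/bin/sh
# Sketch: replace a corrupt Zebra index directory with one from a backup tarball.
# The function name is hypothetical; run as a user allowed to chown the files.

restore_zebra_index() {
    archive=$1     # backup tarball containing the index directory
    member=$2      # path of the index directory inside the tarball
    index_dir=$3   # live (corrupt) index directory to replace

    [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 1; }
    [ -d "$index_dir" ] || { echo "no such directory: $index_dir" >&2; return 1; }
    [ ! -e "$index_dir.corrupt" ] || { echo "$index_dir.corrupt already exists" >&2; return 1; }

    # Record the numeric owner:group of the live index before touching it.
    owner=$(ls -ldn "$index_dir" | awk '{print $3 ":" $4}')

    # Unpack into a scratch directory first, so a bad archive cannot
    # leave the live location half-restored. -p keeps the stored modes.
    scratch=$(mktemp -d "$index_dir.restore.XXXXXX") || return 1
    tar -xzpf "$archive" -C "$scratch" "$member" || { rm -rf "$scratch"; return 1; }

    # Move the corrupt index aside, put the backup in its place,
    # then reapply the recorded owner and group.
    mv "$index_dir" "$index_dir.corrupt" &&
    mv "$scratch/$member" "$index_dir" &&
    chown -R "$owner" "$index_dir"
    status=$?
    rm -rf "$scratch"
    return $status
}

# With the packages, the surrounding steps would look like:
#   koha-disable instancename
#   restore_zebra_index /var/spool/koha/instancename/instancename-date.tar.gz \
#       var/lib/koha/instancename/biblios /var/lib/koha/instancename/biblios
#   koha-enable instancename
```

Extracting to a scratch directory next to the live index keeps the final swap to two renames on the same filesystem, which shortens the window in which no index is in place.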
This will bring your indexes back to a working state, minus any additions made between the index backup and the restore. The library can resume normal business at this point.
To recover fully, run a full reindex to pick up the items cataloged between the time the snapshot was taken and the time the failure occurred. Do this after hours.
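On a package install, the full reindex is normally started with the koha-rebuild-zebra command. The wrapper below is a small sketch: the function name full_reindex is hypothetical, and the -f (full) and -v (verbose) options should be checked against the koha-rebuild-zebra man page for your Koha version. Manual installs typically use Koha's rebuild_zebra.pl script instead, which is not covered here.

```shell
#!/bin/sh
# Sketch: run a full Zebra reindex for one package-installed Koha instance.
# The function name is hypothetical; verify the flags against your Koha version.

full_reindex() {
    instance=$1   # Koha instance name, as used with koha-enable/koha-disable

    if ! command -v koha-rebuild-zebra >/dev/null 2>&1; then
        echo "koha-rebuild-zebra not found (is this a package install?)" >&2
        return 127
    fi

    # -f requests a full rebuild, -v verbose output.
    koha-rebuild-zebra -f -v "$instance"
}

# Example call:
#   full_reindex instancename
```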