Bug7284 authority matching improvement

From Koha Wiki

Authority matching improvement

Description

At present, automatic authority matching is of limited use because it fails on headings with more than one subfield, does not take subfield codes into account, and treats punctuation as significant. An improved matching algorithm should be able to match such headings to the correct authorities.
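To illustrate the three requirements (several subfields, subfield codes, punctuation), here is a minimal Python sketch of a comparison key. It is not Koha's implementation: the function name `match_key` and the (code, value) pair representation are hypothetical, and the real matching depends on the Zebra index configuration.

```python
import string

# Map every ASCII punctuation character to a space (illustrative choice).
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def match_key(subfields):
    """Build a comparison key from a list of (subfield_code, value) pairs.

    Hypothetical helper: it keeps the subfield codes, ignores punctuation
    and case, and collapses runs of whitespace.
    """
    key = []
    for code, value in subfields:
        cleaned = value.translate(_PUNCT_TO_SPACE).lower()
        key.append((code, " ".join(cleaned.split())))
    return tuple(key)


with_punct = [("a", "Camins-Esakov, Jared,"), ("x", "Coin collections.")]
without_punct = [("a", "Camins-Esakov, Jared"), ("x", "Coin collections")]
other_code = [("a", "Camins-Esakov, Jared"), ("v", "Coin collections")]

same = match_key(with_punct) == match_key(without_punct)
different = match_key(without_punct) != match_key(other_code)
```

In this sketch, punctuation differences do not prevent a match, while the same text under a different subfield code does.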

This development changes the behaviour of authority matching for:

  • libraries having BiblioAddsAuthorities ON
  • libraries running misc/link_bibs_to_authorities.pl


Upgrading from a previous version

If you upgrade from 3.6 to 3.8, authority matching will stop working until you have updated the Zebra configuration files. The following updates are needed:

Copy the following files from the Koha source into the Zebra database:

  • etc/zebradb/authorities/etc/bib1.att,
  • etc/zebradb/marc_defs/marc21/authorities/authority-koha-indexdefs.xml,
  • etc/zebradb/marc_defs/marc21/authorities/authority-zebra-indexdefs.xsl,
  • etc/zebradb/marc_defs/marc21/authorities/koha-indexdefs-to-zebra.xsl, and
  • etc/zebradb/marc_defs/unimarc/authorities/record.abs

Reindex all authorities: misc/migration_tools/rebuild_zebra.pl -a -r


New system preferences created by the upgrade

  • AutoCreateAuthorities - When this and BiblioAddsAuthorities are both turned on, automatically create authority records for headings that don't have any authority link when cataloging. When BiblioAddsAuthorities is on and AutoCreateAuthorities is turned off, do not automatically generate authority records, but allow the user to enter headings that don't match an existing authority. When BiblioAddsAuthorities is off, this has no effect.
  • LinkerModule - Chooses which linker module to use for matching headings (current options are described in the LinkerModule section below: "Default," "FirstMatch," and "LastMatch")
  • LinkerOptions - A pipe-separated list of options to set for the authority linker
  • LinkerRelink - When turned on, the linker will confirm the links for headings that have previously been linked to an authority record when it runs. When turned off, any heading with an existing link will be ignored.
  • LinkerKeepStale - When turned on, the linker will never *delete* a link to an authority record, though, depending on the value of LinkerRelink, it may change the link.
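The interaction between LinkerRelink and LinkerKeepStale can be sketched as a small decision function. This is one reading of the descriptions above, written in Python for illustration only; the function and parameter names are hypothetical and are not part of Koha.

```python
def resolve_link(current_link, matched_authid, relink, keep_stale):
    """Return the authority id a heading should end up linked to, or None.

    current_link   -- authority id already on the heading, or None
    matched_authid -- authority id the linker found on this run, or None
    relink         -- value of LinkerRelink (assumed boolean)
    keep_stale     -- value of LinkerKeepStale (assumed boolean)
    """
    # LinkerRelink off: headings that already have a link are ignored.
    if current_link is not None and not relink:
        return current_link
    # A fresh match wins; this may change an existing link.
    if matched_authid is not None:
        return matched_authid
    # No match: LinkerKeepStale prevents the old link from being deleted.
    if current_link is not None and keep_stale:
        return current_link
    return None
```

For example, under this reading `resolve_link(7, None, relink=True, keep_stale=False)` drops the link, while the same call with `keep_stale=True` keeps authority 7.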


LinkerModule

Currently available linker modules are:

  • Default: retains the current behavior of only creating links when there is an exact match to one and only one authority record; if the 'broader_headings' option is enabled, it will also try to link headings to authority records for broader headings by removing subfields from the end of the heading (NOTE: test the results before enabling broader_headings in a production system, because its usefulness depends heavily on each site's authority file)
  • FirstMatch: based on Default, creates a link to the *first* authority record that matches a given heading, even if more than one authority record matches
  • LastMatch: based on Default, creates a link to the *last* authority record that matches a given heading, even if more than one authority record matches
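The difference between the three modules comes down to what happens when a heading matches more than one authority record. A minimal Python sketch, assuming the matches arrive as an ordered list of authority ids (the function `choose_authority` and the sample ids are hypothetical):

```python
def choose_authority(matches, module="Default"):
    """Pick the authority id to link to, or None if no link is made.

    matches -- authority ids in the order the search returned them
    module  -- one of the LinkerModule values: Default, FirstMatch, LastMatch
    """
    if not matches:
        return None
    if module == "Default":
        # Link only when exactly one authority record matches.
        return matches[0] if len(matches) == 1 else None
    if module == "FirstMatch":
        return matches[0]
    if module == "LastMatch":
        return matches[-1]
    raise ValueError(f"unknown linker module: {module}")


ambiguous = [101, 205, 309]  # made-up ids for three matching records
default_choice = choose_authority(ambiguous, "Default")
first_choice = choose_authority(ambiguous, "FirstMatch")
last_choice = choose_authority(ambiguous, "LastMatch")
```

With three matching records, Default makes no link, FirstMatch picks 101, and LastMatch picks 309.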

LinkerOptions

  • At present the only option implemented is "broader_headings". With this option, the linker will try to match the following heading as follows:

    =600 10$aCamins-Esakov, Jared$xCoin collections$vCatalogs$vEarly works to 1800.

    First: Camins-Esakov, Jared--Coin collections--Catalogs--Early works to 1800
    Next: Camins-Esakov, Jared--Coin collections--Catalogs
    Next: Camins-Esakov, Jared--Coin collections
    Next: Camins-Esakov, Jared (matches! if a previous attempt had matched, it would not have tried this)
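The fallback sequence in this example can be sketched in Python as follows. This is an illustration of the idea, not Koha's code: the helper names, the (code, value) representation, the `lookup` callable, and the authority id 42 are all assumptions made for the example.

```python
def candidate_headings(subfields):
    """Yield the full heading first, then progressively broader ones,
    each time dropping one more subfield from the end."""
    for end in range(len(subfields), 0, -1):
        yield "--".join(value for _code, value in subfields[:end])


def link_with_broader_headings(subfields, lookup):
    """Return (authid, heading) for the first candidate that `lookup`
    resolves, or (None, None) if none of them matches."""
    for heading in candidate_headings(subfields):
        authid = lookup(heading)
        if authid is not None:
            return authid, heading  # stop at the first (most specific) match
    return None, None


heading_600 = [
    ("a", "Camins-Esakov, Jared"),
    ("x", "Coin collections"),
    ("v", "Catalogs"),
    ("v", "Early works to 1800"),
]
# Hypothetical authority file containing only the personal name.
authority_file = {"Camins-Esakov, Jared": 42}

tried = list(candidate_headings(heading_600))
result = link_with_broader_headings(heading_600, authority_file.get)
```

Here `tried` lists the four headings in the order shown above, and `result` is `(42, "Camins-Esakov, Jared")` because only the broadest heading exists in the sample authority file.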

This is probably relevant only to MARC21 and LCSH, but could potentially be of great use to libraries that make heavy use of floating subdivisions.

Improved Koha's authority linker cron job (misc/link_bibs_to_authorities.pl)

Added the following options to the misc/link_bibs_to_authorities.pl script:

  • --auth-limit - Only process those headings that match the authorities matching the user-specified WHERE clause.
  • --bib-limit - Only process those bib records that match the user-specified WHERE clause.
  • --commit - Commit the results to the database after every N records are processed.
  • --link-report - Display a report of all the headings that were processed.