ICU Chains Library
ICU Chains Configuration and Customization
After completing the steps in ICU_chains_configuration, you may need to modify words-icu.xml or phrases-icu.xml to suit the grammar of the language being searched.
See http://www.indexdata.com/zebra/doc/icuchain-files.html, http://www.indexdata.com/yaz/doc/yaz-icu.html, http://userguide.icu-project.org/transforms/general/rules and http://www.unicode.org/cldr/charts/latest/transforms/index.html
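As orientation, the chains below mostly share the same generic steps, with language-specific transliterate rules added between them. The following annotated skeleton is only a sketch assembled from the steps that recur in the entries on this page. The locale "en" is a placeholder, and the comments are a reading of the YAZ ICU and ICU transform documentation linked above, not tested behaviour.

```xml
<icu_chain locale="en"> <!-- placeholder locale; use the locale of your language -->
  <!-- replace an apostrophe with a space -->
  <transliterate rule="\'>\ "/>
  <!-- remove a hyphen that directly follows a number character -->
  <transliterate rule="[:Number:] { '-' > '' "/>
  <!-- drop control characters -->
  <transform rule="[:Control:] Any-Remove"/>
  <!-- split the input into tokens using ICU line-break rules -->
  <tokenize rule="l"/>
  <!-- strip whitespace and punctuation from each token -->
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <!-- decompose, remove combining marks, then recompose -->
  <transform rule="NFD"/>
  <transform rule="[:Nonspacing Mark:] Remove"/>
  <transform rule="NFC"/>
  <!-- language-specific transliterate rules usually go here -->
  <!-- the term as it stands at this point is the form kept for display -->
  <display/>
  <!-- lowercase the term for indexing -->
  <casemap rule="l"/>
</icu_chain>
```

Steps are applied in the order they appear, so a rule that depends on diacritics has to come before the NFD and mark-removal steps, while a rule that should see the stripped form belongs after them.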
Languages
Arabic
- Developer: Yasserkad
- File: words-icu.xml
- Language: Arabic
- Locale: ar
- Status: Complete
<icu_chain locale="ar">
<transliterate rule="\'>\ "/>
<transliterate rule="[:Number:] { '-' > '' "/>
<transform rule="[:Control:] Any-Remove"/>
<tokenize rule="l"/>
<transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
<transform rule="NFD"/>
<transform rule="[:Nonspacing Mark:] Remove"/>
<transform rule="NFC"/>
<transliterate rule="{ الا > ا "/>
<transliterate rule="{ الأ > أ "/>
<transliterate rule="{ الإ > إ "/>
<transliterate rule="{ الآ > آ "/>
<transliterate rule="{ الب > ب "/>
<transliterate rule="{ الت > ت "/>
<transliterate rule="{ الث > ث "/>
<transliterate rule="{ الج > ج "/>
<transliterate rule="{ الح > ح "/>
<transliterate rule="{ الخ > خ "/>
<transliterate rule="{ الد > د "/>
<transliterate rule="{ الذ > ذ "/>
<transliterate rule="{ الر > ر "/>
<transliterate rule="{ الز > ز "/>
<transliterate rule="{ الس > س "/>
<transliterate rule="{ الش > ش "/>
<transliterate rule="{ الص > ص "/>
<transliterate rule="{ الض > ض "/>
<transliterate rule="{ الط > ط "/>
<transliterate rule="{ الظ > ظ "/>
<transliterate rule="{ الع > ع "/>
<transliterate rule="{ الغ > غ "/>
<transliterate rule="{ الف > ف "/>
<transliterate rule="{ الق > ق "/>
<transliterate rule="{ الك > ك "/>
<transliterate rule="{ الل > ل "/>
<transliterate rule="{ الم > م "/>
<transliterate rule="{ الن > ن "/>
<transliterate rule="{ اله > ه "/>
<transliterate rule="{ الو > و "/>
<transliterate rule="{ الي > ي "/>
<display/>
<casemap rule="l"/>
</icu_chain>
Chinese / zh_TW
- Developer: Thomas -- https://groups.google.com/forum/#!msg/kohataiwan/BlGak5iVvgE/u3-37wepdmYJ
- File: words-icu.xml
- Language: Chinese / zh_TW
- Locale: zh_TW
- Status: Untested
<icu_chain locale="zh_TW.UTF-8">
<transliterate rule="\'>\ "/>
<transliterate rule="[:Number:] { '-' > '' "/>
<transform rule="[:Control:] Any-Remove"/>
<tokenize rule="l"/>
<transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
<transform rule="NFD"/>
<transform rule="[:Nonspacing Mark:] Remove"/>
<transform rule="NFC"/>
<display/>
<casemap rule="l"/>
</icu_chain>
Kurdish (کوردی)
- Developer: D.Roshani
- File: words-icu.xml
- Language: Kurdish (کوردی)
- Locale: ku
- Status: Untested
<icu_chain locale="ku">
<transliterate rule="\'>\ "/>
<transliterate rule="[:Number:] { '-' > '' "/>
<transform rule="[:Control:] Any-Remove"/>
<tokenize rule="l"/>
<transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
<transform rule="NFD"/>
<transform rule="[:Nonspacing Mark:] Remove"/>
<transform rule="NFC"/>
<transliterate rule="{ ئ > ئـ "/>
<transliterate rule="{ ئا > ا "/>
<transliterate rule="{ بێ > ب "/>
<transliterate rule="{ پێ > پ "/>
<transliterate rule="{ تێ > ت "/>
<transliterate rule="{ جێ > ج "/>
<transliterate rule="{ چێ > چ "/>
<transliterate rule="{ حێ > ح "/>
<transliterate rule="{ خێ > خ "/>
<transliterate rule="{ دال > د "/>
<transliterate rule="{ رێ > ر "/>
<transliterate rule="{ ڕێ > ڕ "/>
<transliterate rule="{ زێ > ز "/>
<transliterate rule="{ ژێ > ژ "/>
<transliterate rule="{ سێ > س "/>
<transliterate rule="{ شێ > ش "/>
<transliterate rule="{ عین > ع "/>
<transliterate rule="{ غین > غ "/>
<transliterate rule="{ فێ > ف "/>
<transliterate rule="{ ڤێ > ڤ "/>
<transliterate rule="{ قێ > ق "/>
<transliterate rule="{ کێ > ک "/>
<transliterate rule="{ گێ > گ "/>
<transliterate rule="{ لێ > ل "/>
<transliterate rule="{ ڵێ > ڵ "/>
<transliterate rule="{ لام > م "/>
<transliterate rule="{ نوون > ن "/>
<transliterate rule="{ هە > ھ "/>
<transliterate rule="{ ئه > ە "/>
<transliterate rule="{ ئو > و "/>
<transliterate rule="{ ئۆ > ۆ "/>
<transliterate rule="{ ئوو > وو "/>
<transliterate rule="{ ئی > ی "/>
<transliterate rule="{ ئێ > ێ "/>
<display/>
<casemap rule="l"/>
</icu_chain>
Polish
- Developer: Fsomers
- File: words-icu.xml
- Language: Polish
- Locale: pl
- Status: Incomplete
<transliterate rule="{ ą > a "/>
<transliterate rule="{ Ą > a "/>
<transliterate rule="{ ć > c "/>
<transliterate rule="{ Ć > c "/>
<transliterate rule="{ ę > e "/>
<transliterate rule="{ Ę > e "/>
<transliterate rule="{ ł > l "/>
<transliterate rule="{ Ł > l "/>
<transliterate rule="{ ń > n "/>
<transliterate rule="{ Ń > n "/>
<transliterate rule="{ ó > o "/>
<transliterate rule="{ Ó > o "/>
<transliterate rule="{ ś > s "/>
<transliterate rule="{ Ś > s "/>
<transliterate rule="{ ź > z "/>
<transliterate rule="{ Ź > z "/>
<transliterate rule="{ ż > z "/>
<transliterate rule="{ Ż > z "/>
Note: only the transliteration rules have been contributed so far. A complete <icu_chain locale="pl"> definition is still needed here.
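Until a tested chain is contributed, a complete Polish definition might look like the sketch below. It is an assumption, not a verified configuration: it places the rules listed above in the same position that the Arabic and Kurdish chains on this page use for their language-specific rules, and surrounds them with the generic steps those chains share.

```xml
<icu_chain locale="pl">
  <transliterate rule="\'>\ "/>
  <transliterate rule="[:Number:] { '-' > '' "/>
  <transform rule="[:Control:] Any-Remove"/>
  <tokenize rule="l"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <transform rule="NFD"/>
  <transform rule="[:Nonspacing Mark:] Remove"/>
  <transform rule="NFC"/>
  <!-- Polish rules as listed above, unchanged -->
  <transliterate rule="{ ą > a "/>
  <transliterate rule="{ Ą > a "/>
  <transliterate rule="{ ć > c "/>
  <transliterate rule="{ Ć > c "/>
  <transliterate rule="{ ę > e "/>
  <transliterate rule="{ Ę > e "/>
  <transliterate rule="{ ł > l "/>
  <transliterate rule="{ Ł > l "/>
  <transliterate rule="{ ń > n "/>
  <transliterate rule="{ Ń > n "/>
  <transliterate rule="{ ó > o "/>
  <transliterate rule="{ Ó > o "/>
  <transliterate rule="{ ś > s "/>
  <transliterate rule="{ Ś > s "/>
  <transliterate rule="{ ź > z "/>
  <transliterate rule="{ Ź > z "/>
  <transliterate rule="{ ż > z "/>
  <transliterate rule="{ Ż > z "/>
  <display/>
  <casemap rule="l"/>
</icu_chain>
```

In this arrangement the NFD and mark-removal steps already strip the diacritics from every letter except ł and Ł, which have no canonical decomposition, so most of the explicit rules act only as a safety net. The sketch should be checked against real Polish records before its status is changed from Incomplete.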
Swedish
- Developer: Gaetan Boisson, Fridolin Somers
- File: words-icu.xml
- Language: Swedish
- Locale: sv-SE
- Status: Untested
- Notes:
<icu_chain locale="sv-SE">
<transform rule="[^åäöÅÄÖ] NFD"/><!-- do not strip diacritics from these characters -->
</icu_chain>
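The single transform above only has an effect when it replaces the plain NFD step of a fuller chain. A possible arrangement, assuming the same generic steps used by the other chains on this page and not tested against a Swedish catalogue, is:

```xml
<icu_chain locale="sv-SE">
  <transliterate rule="\'>\ "/>
  <transliterate rule="[:Number:] { '-' > '' "/>
  <transform rule="[:Control:] Any-Remove"/>
  <tokenize rule="l"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <!-- decompose every character except å, ä, ö and their capitals -->
  <transform rule="[^åäöÅÄÖ] NFD"/>
  <!-- the protected letters were not decomposed, so they keep their marks -->
  <transform rule="[:Nonspacing Mark:] Remove"/>
  <transform rule="NFC"/>
  <display/>
  <casemap rule="l"/>
</icu_chain>
```

The filter protects only precomposed characters. If the input already contains å, ä or ö as a base letter followed by a combining mark, the mark-removal step will still strip it, so this sketch assumes the records are stored in NFC.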
Thai
- Developer: Ajahn Ratanawanno -- http://lists.indexdata.dk/pipermail/zebralist/2015-August/002630.html
- File: words-icu.xml
- Language: Thai
- Locale: th
- Status: Untested
- Notes: According to the linked mailing list thread, "The result in searching in Thai seems to be much better but still not really satisfy me if I compare to http://koha.library.tu.ac.th/, maybe I don't have search by keyword enable like their library.", so this chain may need further work. The administrators at http://koha.library.tu.ac.th/ may be able to share their words-icu.xml.
<icu_chain locale="th">
<transliterate rule="\'>\ "/>
<transliterate rule="[:Number:] { '-' > '' "/>
<transform rule="[:Control:] Any-Remove"/>
<tokenize rule="l"/>
<transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
<transform rule="NFD"/>
<transform rule="[:Nonspacing Mark:] Remove"/>
<transform rule="NFC"/>
<display/>
<casemap rule="l"/>
</icu_chain>