Proposal for Wiki Curator 3.20 Thomas Dukleth

A proposal from Thomas Dukleth for curating the Koha wiki.

General Comment

This proposal has been greatly revised following suggestions in discussion with Martin Renvoize. Most importantly, Martin has attributed some problems to the use of Postgres for the MediaWiki database and has identified the need to migrate the database to MySQL.

I propose to fix some significant outstanding problems in the wiki. They were created in the rush with which the Koha MediaWiki experiment suddenly became the only Koha wiki without sufficient testing, after PTFS/LibLime unfortunately responded by taking down the previous DokuWiki based wiki when the Koha community set up their own independent copy of the Koha bugzilla database. I had advocated for testing MediaWiki as a replacement for DokuWiki, citing Semantic MediaWiki as one significant reason for favouring MediaWiki. I had been unaware that implementing MediaWiki using a Postgres database would prevent important Semantic MediaWiki extensions from working properly. At the time when we were first testing MediaWiki, the MediaWiki PostgreSQL page did not link to the specific set of bugs, and those of us preparing a test implementation were not aware of significant specific problems with using Postgres as a MediaWiki database. Without knowledge of a sufficient reason not to use Postgres, we were persuaded by the real advantages of Postgres as a database preference generally. Problems of the initial rushed effort have been compounded by a more recent extended period of neglect of the wiki on my part.

The need to migrate the database, which imposes difficulties of its own, is a reason not to introduce structural changes to the database which might complicate or prevent a successful migration effort. Briefly, that means waiting on implementing Semantic MediaWiki, with the structural changes it entails, and waiting on upgrading the MediaWiki version. Issues with MediaWiki in Postgres may explain why command line tests were failing when we were considering actual use of Semantic MediaWiki in 2010. [See Migrating to MySQL further below for consideration of migration difficulties.]

On other issues, I propose to take a light touch towards maintenance overall, but not the entirely absent touch into which my efforts had devolved recently. In other words, I do not view the role as policing the wiki. As a corollary principle, I intend to follow the 'first do no harm' doctrine, which on some occasions has not been followed by people making a sincere effort at wiki maintenance. Curation and maintenance would require far too much effort without reasonably maintained features which help all contributors do what is necessary to maintain the wiki as they create content.

Galen Charlton at Equinox has been providing administration and hosting of MediaWiki which I hope they will continue to provide.

Major Goals

Navigation Maintenance

Navigation has become increasingly problematic.

Every wiki category needs to be a subcategory of one and only one higher level category with one unifying top level home category for navigation to work comprehensively from the wiki main page even as we prepare to transition to using Semantic MediaWiki extensions with a more faceted use of categories. [The top level category need not be named 'Home' as it had been historically. 'Koha' may be a sufficient top level category name for a Koha wiki.]
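The single-parent, single-root requirement can be checked mechanically once the category relationships have been listed. The following is only a minimal sketch in Python, assuming the relationships have already been gathered by hand or by some export into a mapping from each category to its one parent. The function name, the mapping, and the example category names other than 'Koha' and 'Development' are hypothetical, and 'Koha' as the top level name is merely the possibility mentioned above.

```python
def categories_not_reaching_root(parent_of, root="Koha"):
    """Return categories whose chain of parents never arrives at `root`.

    `parent_of` is an assumed mapping: category name -> its single parent.
    """
    unreachable = []
    for category in parent_of:
        seen = set()
        current = category
        # Walk upwards until the root, a missing parent, or a loop is met.
        while current != root and current in parent_of and current not in seen:
            seen.add(current)
            current = parent_of[current]
        if current != root:
            unreachable.append(category)
    return sorted(unreachable)


# Hypothetical data for illustration only.
example = {
    "Development": "Koha",
    "RFCs": "Development",
    "Stray": "Nowhere",  # parent is not itself categorised
    "A": "B",            # A and B point at each other
    "B": "A",
}
print(categories_not_reaching_root(example))
```

Any category reported by such a check would be one from which navigation cannot lead back to the wiki main page.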

The SelectCategory extension used for maintaining authority control consistency for category assignment when creating and editing pages does not scale and needs to be retired, but probably not before we can transition to a properly implemented set of Semantic MediaWiki extensions. [See Phasing Out SelectCategory Extension further below for replacing SelectCategory with the Semantic Forms extension.]

First Priority Problems for Navigation Maintenance

The Category Breadcrumb extension is intended to provide contextual navigation links from every page to related higher level categories for efficient browsing of wiki content, as a classification browsing feature could for a library automation system. However, lack of attention to the wiki for an extended period has allowed improperly specified category relationships to produce a mess of redundant and disjointed links, even where categories appear to have been applied correctly to the affected pages themselves at one time.

Currently, Category Breadcrumb is revealing problems which should be fixed. The problems might be hidden using a later upstream version or effort might be taken to patch Category Breadcrumb but such a possibility would merely mask the underlying problems about category assignment and relationships which ought to be fixed. I had to modify Category Breadcrumb for it to work at all in the first place during a period of neglect by its original author. After migrating the database to properly support Semantic MediaWiki, more faceted use of categories and some further development of Semantic MediaWiki to better support contextual navigation might replace the functionality currently provided by Category Breadcrumb.

While I am still investigating all the causes of the problems leading to improper display of categories, as of the beginning of April 2015 I have been able to confirm well known recurring causes, determine some others, and make a reasonable guess about the remaining aspect of one. Following good testing procedures and the ever important "first do no harm" doctrine, I have continued to check for the causes of all the problems instead of rushing carelessly to correct those which I have currently identified and possibly creating additional problems. I have also delayed creating some wiki best practises pages until my investigation better reveals the actual problematic construction of categories.

Redundant Application of Parent Categories

There are some obvious instances which need correcting in which parent categories have been applied to a page in addition to applying the most specific relevant child category or categories. For the purposes of navigation, parent, grandparent, etc. relationships are automatically inherited by child categories. Instances of redundant use of parent categories have been multiplied when the form of an existing page has been copied to make new similar pages. Prominent examples of copying the form of previous pages to create new pages with the same problem are many IRC meeting pages.
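Finding such pages could be partly automated. The following is a minimal sketch in Python, assuming the parent relationships and the categories written directly on a page are already available as plain data. The function names and the 'Meetings' parent category are hypothetical illustrations, not a description of the actual wiki.

```python
def ancestors(category, parent_of):
    """Collect every category above `category`, following single parents."""
    found = []
    current = parent_of.get(category)
    # Stop at the top of the hierarchy or if a loop is encountered.
    while current is not None and current not in found:
        found.append(current)
        current = parent_of.get(current)
    return found


def redundant_categories(page_categories, parent_of):
    """Return categories on a page already implied by a more specific one."""
    implied = set()
    for category in page_categories:
        implied.update(ancestors(category, parent_of))
    return sorted(c for c in page_categories if c in implied)


# Hypothetical hierarchy: IRC Meetings -> Meetings -> Koha.
hierarchy = {"IRC Meetings": "Meetings", "Meetings": "Koha"}
print(redundant_categories(["IRC Meetings", "Meetings", "Koha"], hierarchy))
```

A page reported with a non-empty list would be a candidate for having the listed parent categories removed, leaving only the most specific child category.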

I have raised the issue of redundant use of parent categories in the past on the mailing list. I will address the issue in some best practises pages linked from the main page.

Some people may have been motivated to redundantly add parent categories to the categories for a wiki page possibly as an additional aid to full text searching with the knowledge that MediaWiki categories are ultimately flat. The ultimately flat structure of categories means that only directly specified categories are stored with a wiki page itself, not hierarchical relationships specified in category pages for categories. However, such concerns for full text searching would be much better addressed by any of the following.

Appropriate naming of subsidiary categories already covers known instances of the problem, such as including the text 'RFCs' at the end of a category name for designating categories matching bugzilla. (After database migration to properly support Semantic MediaWiki, the practise of appending contextual designators, such as the text 'RFCs', to category names may need changing in favour of templates for more faceted generic reuse of categories if we would have Semantic Forms providing a value list feature for consistency.)

Use of templates can designate the overall basic type of wiki page, such as a template for RFCs, without redundantly designating the category. Additionally, use of other semantic elements in page templates can designate other aspects, such as the status of an RFC. However, replacing categories with semantic elements in a page should await a proper implementation of Semantic MediaWiki, for which semantic page templates may be able to provide a value list for consistency of application.

A more faceted use of categories as part of a transition to Semantic MediaWiki.

Multiple Parent Categories

Many cases of what appear to be orphan navigation links are the accidental result of the sincere efforts of some people to be helpful in introducing reorganised category relationships which have not been consistent in maintaining parent child relationships for categories. Multiple parent categories for a single category are semantically dubious and lead to confused category parentage which may appear as redundant category orphans in the navigation links presented by Category Breadcrumb.

When contemplating category organisation, care should be taken to avoid obvious semantic ambiguity from incomplete work. Currently, an incomplete example of category reorganisation has left a mess from parallel but overlapping thesauri. Multiple category parents might be semantically correct for distinct thesauri but distinct thesauri should not share the same unitary MediaWiki category. The good intention behind the parallel example was not to create a distinct thesaurus but to test an example for category reorganisation for which the work is incomplete.

I have not found any MediaWiki maintenance tool for tracing categories for which multiple parents have been assigned. We could write a screen scraping tool but the number of categories which would need checking currently is quite small. The obvious category for which multiple parent assignment needs correcting is 'Development'. Further examination may show others.
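Whatever means is used to collect the data, whether screen scraping or copying by hand, the check itself is simple. The following is a minimal sketch in Python, assuming the parent assignments found on category pages have been collected as (child, parent) pairs. The function name and the 'Community' parent are hypothetical. 'Development' is used only because it is the known case named above.

```python
from collections import defaultdict


def categories_with_multiple_parents(parent_links):
    """Map each category having more than one distinct parent to its parents.

    `parent_links` is an assumed iterable of (child, parent) name pairs.
    """
    parents = defaultdict(set)
    for child, parent in parent_links:
        parents[child].add(parent)  # duplicates of the same pair collapse
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


# Hypothetical pairs for illustration only.
links = [
    ("Development", "Koha"),
    ("Development", "Community"),
    ("RFCs", "Development"),
]
print(categories_with_multiple_parents(links))
```

With so few categories to check at present, running such a check by hand over a short list may be quicker than writing the collection step.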

Mysterious Unchangeable Category Assignments

Templates which apply categories need management and/or restraining to avoid creating redundant navigation links when child categories with appropriate specificity are applied to individual pages. At one time, we had a problem with overly general category assignment by template: when a more specific child category was assigned to a page using the template, the general assignment could not be corrected in that page without removing the otherwise unproblematic template from the page. At the time when I noticed such problems, I raised the issue on the mailing list; manually applied the category, or a more specific child category, to each page; and removed the redundant overly general category assignment from the relevant template.

At the beginning of April 2015 when I had exhausted various possibilities for why the 'IRC Meetings' category seemed to be assigned mysteriously to all IRC meeting wiki pages redundantly to any actual assignment for the individual wiki pages themselves, Katrin Fischer noticed that "IRC Meetings" was the name of both a category and a wiki page. Thanks again Katrin for the insight. Any pages linked from the 'IRC Meetings' page seemed to have the mysterious category assignment. While the mysterious assignment was not necessarily wrong it should not be unchangeable or redundant with category assignments to the wiki pages themselves.
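Other names shared between a wiki page and a category could be looked for in the same way Katrin spotted this one. The following is a minimal sketch in Python, assuming lists of page titles and category names have been obtained separately. The function name and the example titles other than 'IRC Meetings' are hypothetical. Matching is exact, including capitalisation.

```python
def shared_names(page_titles, category_names):
    """Return names used both as a wiki page title and as a category name."""
    categories = set(category_names)
    # Exact comparison: names differing only in capitalisation do not match.
    return sorted(title for title in set(page_titles) if title in categories)


# Hypothetical lists for illustration only.
pages = ["IRC Meetings", "Main Page", "IRC meetings"]
categories = ["IRC Meetings", "Development"]
print(shared_names(pages, categories))
```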

Various workarounds are possible by very slightly changing the name of either the wiki page or the category with the same name. Merely changing the capitalisation of one would be sufficient as a workaround for this case.

Addressing the underlying problem would be better. There may be a bug in either my modified version of Category Breadcrumb which we have installed or MediaWiki. My suspicion lies with Category Breadcrumb and I will check what might be done to fix the matter as it may have been caught and fixed upstream. There is always the possibility that someone working on MediaWiki thought of the behaviour as a feature and not a bug at one time.

Ongoing Issues

Both the uncategorised categories and the uncategorised pages problems are rather small. Uncategorised pages and categories had been a problem for almost all the pages which had been added in the previous DokuWiki wiki over the years. Full text search was especially insufficient to compensate for lack of categorisation.

Categories need to be examined for appropriate parallels in the bug tracker for adding bug pages to the wiki as needed. Categories also need to be examined for other parallels which may be helpful and necessary.

I think like a library cataloguer with a great concern for semantic means of locating information. There has often been a conceptual mismatch between project developers and library science. There has also often been a conceptual mismatch between most librarians and library science. I intend to do my best to bridge those gaps using the wiki where necessary and helpful.

Some wiki navigation has been lost from the good efforts of some people to 'clean' navigation. Outdated content should be updated even minimally if it can be and archived in a manner which still allows it to be found semantically if it cannot. See categories for which content may have been lost in a semantically relevant manner. Marking pages as outdated with an info box may be more effective than moving or recategorising content in a manner which loses semantic access. Since the issue was raised in the first part of the #koha general IRC meeting of 17 December 2014, I will research marking outdated content in a manner by which such content can be identified as part of a text query, which should help interested people more easily identify what needs updating. [In all cases, we never follow the much abused Wikipedia project practise of deleting pages with useful content for lack of notability (a spam control measure often gone amuck with some Wikipedia editors). We do delete actual spam (a distinction lost on some Wikipedia editors).]

Deactivating Ordinary Breadcrumb Extension

The ordinary Breadcrumb extension, which provides a means to retrace steps beyond the common forward and back function of web browsers, is probably more of a nuisance than a help to most people. Given the perceived general consensus against it, including my own view, I deactivated it at the beginning of April 2015. People needing such functionality would be better served by a web browser history extension running locally in their own web browsers as opposed to a MediaWiki extension running on the wiki server.

In the former Koha wiki, using DokuWiki, for which most pages had been created without any category assignment, ordinary breadcrumbs functionality was useful as a minimal navigational aid while we worked on correcting the problem of lack of navigation.

Encouraging Best Practises

As I had done with the previous Koha community wiki, formerly in DokuWiki, I will create some pages suggesting some helpful content creation practises and explaining how some of the existing navigation features work, so that more people may be better able to use them to find wiki content by classified subject categories in addition to text queries. The more people who understand how some features work the easier the wiki will be to maintain for everyone.

Road Map

Elements of a Koha development road map had become lost with the change to a time based release from the former feature based release cycle. Existing pages should be examined for possibly incorporating or adapting them to serve part of the function of a roadmap or for linking from a current roadmap page or set of pages.

Other people have made a good start on the Koha roadmap. As always with Koha, additional effort by anyone is welcome.

Fixing Breakage

Please telephone me +1 212 674-3783 if the wiki seems to be fundamentally broken. Telephoning me is often the best way to obtain my attention in a reasonably quick time as I often have not had time to read most email in real time.

Deferred Goals

Major Development

I suggest deferring any major development such as upgrading to a more recent MediaWiki version etc. to the Koha 3.22 release cycle when more of my time could be available for managing the task and testing carefully or otherwise after some more routine neglected maintenance and preparation has been done.

Migrating to MySQL

To successfully migrate the database to MySQL, we may find that we need to stay at the version of MediaWiki we have been using, or first upgrade to exactly the most appropriate version and no further, while still using the Postgres database, in order to migrate to a MySQL database using the only set of scripts available for the migration task. Great care must be taken, with much testing of copies of the database, to avoid upgrading or migrating the production database in some way which might leave us with neither an effective way forward for migrating to MySQL nor an effective way back for any newly added content. We should also avoid leaving the edit history and user data behind as would be the consequence of using Manual dumpBackup to create an XML dump of the wiki for importing into a MySQL database.

Upgrading MediaWiki

The installed MediaWiki extensions which I have modified may break with an upgrade as they have in the past. However, SelectCategory and Multi-Category Search should be left behind once we have implemented Semantic MediaWiki correctly. Even CategoryBreadcrumb might be superseded by a good implementation of Semantic MediaWiki. I would otherwise need to commit my modifications for at least CategoryBreadcrumb upstream and/or otherwise prepare for the changes to avoid having broken features. See timeline.

Implementing Semantic MediaWiki

I would consult with Martin Renvoize and others who have some significant experience implementing Semantic MediaWiki properly to ensure that we will have an implementation that is reasonable, will pass tests appropriately, and will not merely seem to work with some known implementation difficulties waiting to bite, as we have had over the use of Postgres as a database.

Phasing Out SelectCategory Extension

When we have migrated the database to properly support Semantic MediaWiki, we should consider the possibility of modifying some Semantic MediaWiki code in consultation with upstream to complement the limitations of auto-completion guesses as currently provided by Semantic MediaWiki and Semantic Forms extensions with some of the greater utility of selecting from a context sensitive visible list such as provided by the Semantic Drilldown extension, similar to what has been provided by SelectCategory functionality but in a more constrained context sensitive manner. [In the past, I have seen a Semantic MediaWiki implementation in which the issue of listing as opposed to guessing values had been addressed by some features of that wiki. However, as with any implementation of anything, most Semantic MediaWiki implementations are not good examples of the potential for what may be done for providing semantic access to wiki content.]

In the absence of extending Semantic Forms to provide selection lists in addition to the existing functionality of guessing with auto-complete, I find the SelectCategory extension so helpful in my own work that I may even opt to have SelectCategory work for my user only, even after we transition to Semantic MediaWiki extensions. Sadly, even with Semantic MediaWiki, category redirects would not work as an alternative means of authority control for categories analogous to tracings and references in library subject authority records.

Phasing Out Multi-Category Search Extension

Multi-Category Search may be helpful as we use categories in a more faceted manner in transition to supporting Semantic MediaWiki. However, as with the SelectCategory extension, Multi-Category Search does not scale. After we have migrated the database to properly support Semantic MediaWiki, Multi-Category Search should be replaced by a good implementation of Semantic MediaWiki with Semantic Forms and Semantic Drilldown extensions.

The Multi-Category Search extension may seem to be broken in the Google Chrome web browser, but the extent of my testing shows that to be merely a display issue for some explanatory text and button labels. Functionality seems unimpaired. Perhaps a more recent upstream version would resolve the issue, but any other work on the wiki seems more important, especially with the Multi-Category Search extension due to be phased out as we transition to using Semantic MediaWiki.

Multi-Category Search has been useful for searching using a category name precisely instead of merely guessing at a word in the category name when conducting a full text search. However, such functionality should be integrated with a full text search and not merely a separate feature. The author intended Multi-Category Search to be used for finding wiki pages which have a particular combination of categories as may be especially helpful in a more faceted use of categories.

Even without a more faceted approach, it is sometimes appropriate to apply multiple highly specific category assignments to pages for which an individual general category is insufficient for capturing the various issues treated in the page, especially for some individual RFC pages which affect very specific but distinct aspects of Koha.

Wikipedia Templates

Many of the fine features seen in Wikipedia, such as special templates, require an extraordinary amount of work to implement. In the case of Wikipedia, that work is easily provided by an army of contributors which may not necessarily be available all the time from the Koha community. The number of template dependencies for some simple Wikipedia info boxes can be extraordinarily large. However, contributors are always welcome to help do what is needed to provide what people want from the wiki.

Timeline

Various responsibilities have taken my time away from Koha which has led to my own neglect of the Koha wiki. These responsibilities will continue to take most of my time until at least February 2015. [Extreme cold preventing recovery from some seasonal respiratory infection took away far too much of my time from mid January to late March.]

I expect to progressively increase the time I have available for the wiki during Koha 3.20.

As stated above, I suggest deferring major development such as database migration to MySQL, upgrading MediaWiki, and implementing Semantic MediaWiki to the Koha 3.22 release cycle to allow proper time to test well and avoid breaking things including any modified MediaWiki extensions which would be needed until Semantic MediaWiki would be running well.

Background

Towards the end of the time when the Koha project had been using the former Koha wiki using DokuWiki, I had done much work to improve navigation following the model which we had developed at the first KohaCon in 2006. I had multiple pending proposals with some modification of DokuWiki code and CSS for improving navigation and readability.

I worked on implementing MediaWiki, as an alternative to DokuWiki, as a larger project with more features and extensions for navigation etc. and more familiar to more people from the Wikipedia implementation for which MediaWiki was created.

I have maintained my own modifications of the SelectCategory and CategoryBreadcrumb extensions for providing authority control consistency and improving navigation in the Koha wiki.

I have generally worked on the Koha wiki in sprints of effort especially during the time of KohaCon when I had been unable to attend the conference.

In 2005, before Koha had a single proper documentation document, I had created a very comprehensive combination roadmap and feature list which also linked into various documents which served as documentation for Koha at the time. Some of that work could be reimplemented to serve current needs.