Git Splitting and Shrinking

From Koha Wiki

On this page I (Frédéric) open the discussion about splitting the current Koha repository into two or more sub-repositories.

Situation

The current Koha Git repository has a long and valuable history. It contains everything, and every contribution, since the beginning of the project. This long history has a price:

  • Size -- The current Koha repository is large: after packing, it occupies about 171M. This is increasingly problematic because the translation files (.po files) keep growing at an accelerating pace.
  • Monolithic -- It contains parts of the project that could or should be managed separately: .po files for example, but also static files, images, libraries such as jQuery, etc.

We could imagine three scenarios for splitting and shrinking the Koha Git repository:

  1. Keep history -- Create two (or more) Koha sub-projects, one for Koha code (koha-core) and one for Koha .po files (koha-i18n), and keep the whole history in both repositories.
  2. Keep history as history -- Freeze the current Git repository, in which Koha history will live forever as a koha-legacy repo, and initialize two new, clean repositories.
  3. Keep-loose history -- Keep the history in koha-core, start fresh for koha-i18n.


Evaluation

I've played with this idea and tried to evaluate the size gain. Here is a table comparing Git repo sizes for the current Koha (koha-legacy), Koha without .po files (koha-core), and the Koha translation files (koha-i18n), each with its history and as a fresh Git repo without history:

Git repository size comparison

Repository     With history   New repo
koha-legacy    171M           20M
koha-core      40M            9M
koha-i18n      130M           11M

But the choice is complicated by the fact that splitting the repository while keeping its history is not that simple (see below). In the table, the 40M size for a koha-core repo with history is theoretical. In practice, it is not possible to completely remove the .po files from the repo without also removing part of its history: if we keep all the tags, the .po files can't be removed, and the koha-core repo then weighs 116M rather than 40M. There is a workaround, however: removing all the tags brings the size down to 40M. We keep all the history, but it is no longer possible to reach a Koha version by its tag (this remains possible with koha-legacy).
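
To put these figures in proportion, here is a small sketch that computes the relative saving against koha-legacy. The function name reduction_pct is made up for this illustration; the sizes are the ones quoted in this section, and the result is truncated to a whole percent.

```shell
# Integer percentage saved when shrinking from size $1 to size $2 (same unit, here MB).
reduction_pct() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# Sizes quoted in this section: koha-legacy 171M, koha-core 40M (no tags) or 116M (tags kept).
echo "koha-core without tags: $(reduction_pct 171 40)% smaller than koha-legacy"
echo "koha-core with tags:    $(reduction_pct 171 116)% smaller than koha-legacy"
```

With these numbers the saving is about 76% when the tags are dropped, against about 32% when they are kept.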

Howto split and shrink

Mirror the Koha Git repository, track all its branches, and compact it:

git clone --mirror git://git.koha-community.org/koha koha-full
cd koha-full
for branch in $(git branch -a | grep remotes | grep -v HEAD | grep -v main); do
   # create a local branch for each remote branch; ${branch##*/} keeps only the last path component
   git branch --track "${branch##*/}" "$branch"
done
git gc --aggressive --prune=now
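
The sizes quoted on this page can be checked at each step with git count-objects. The following is only a sketch: pack_size_kib is a hypothetical helper name, and it reports the size of the pack files, which may differ slightly from what du shows for the whole directory.

```shell
# Hypothetical helper: print the size of a repository's pack files in KiB.
# Usage: pack_size_kib <repo-dir>
pack_size_kib() {
    # "git count-objects -v" prints a line "size-pack: <n>", where <n> is in KiB
    git -C "$1" count-objects -v | awk '/^size-pack:/ { print $2 }'
}
```

For example, running pack_size_kib koha-full before and after git gc would show how much the compaction saved.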

Optionally, it's possible to remove the remote branches fetched locally (all those branches created to follow bugs and enhancements):

# delete every remote-tracking branch under origin/new/
for b in $(git branch -r | grep 'origin/new/'); do
   git branch -d -r "$b"
done

Create two repositories, koha-core and koha-i18n:

cd ..
cp -r koha-full koha-core
cp -r koha-full koha-i18n

Keep only .po files in koha-i18n:

cd koha-i18n
git filter-branch --subdirectory-filter misc/translator/po -- --all
git gc --aggressive --prune=now

Keep all except .po files in koha-core:

cd ../koha-core
git filter-branch -f --index-filter \
   'git rm --cached -r --ignore-unmatch misc/translator/po' \
   --prune-empty -- --all
git gc --aggressive --prune=now
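
One thing worth checking at this point: git filter-branch keeps a backup of the rewritten refs under refs/original/, and as long as those backups (and reflog entries) exist, git gc cannot discard the old objects. The sketch below follows the usual shrinking checklist from the git filter-branch documentation; drop_filter_branch_backups is a hypothetical helper name, and whether this changes the sizes measured on this page has not been verified.

```shell
# Hypothetical helper: drop filter-branch backups so that gc can reclaim space.
# Usage: drop_filter_branch_backups <repo-dir>
drop_filter_branch_backups() {
    # Delete every backup ref that filter-branch stored under refs/original/.
    git -C "$1" for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git -C "$1" update-ref -d "$ref"
    done
    # Expire reflog entries that still point at the pre-rewrite commits.
    git -C "$1" reflog expire --expire=now --all
    # Repack and drop the objects that are no longer reachable.
    git -C "$1" gc --quiet --prune=now
}
```

This is destructive: once the backups are gone, the pre-rewrite history can only be recovered from another copy such as koha-full.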

We can see that koha-core still contains .po files and has a size of 116M:

git rev-list --objects --all | grep "\.po$" | less
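
To find out which refs keep the .po files alive, the same check can be run ref by ref. This is a sketch only; refs_with_po is a hypothetical helper name, and on a repository the size of Koha's it may take a while because it walks the history once per ref.

```shell
# Hypothetical helper: print every ref whose history still reaches a .po file.
# Usage: refs_with_po <repo-dir>
refs_with_po() {
    git -C "$1" for-each-ref --format='%(refname)' |
    while read -r ref; do
        # rev-list --objects prints "<sha> <path>" for each reachable tree and blob
        if git -C "$1" rev-list --objects "$ref" | grep -q '\.po$'; then
            printf '%s\n' "$ref"
        fi
    done
}
```

If only refs/tags/... entries are printed, the tags are indeed what holds the .po files in the repository.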

Solution -- koha-core is 116M because of the repo tags. If we remove them (is that acceptable?), we can slim koha-core down to 40M:

for tag in $(git tag); do echo "$tag"; git tag -d "$tag"; done
git gc --prune=now
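
An alternative that might avoid losing the tags, not tried in the experiment described here: by default git filter-branch leaves tags pointing at the old, unfiltered commits, but with --tag-name-filter cat it rewrites each tag to point at the filtered commit. The sketch below assumes this is acceptable for Koha; strip_path_rewriting_tags is a hypothetical helper name, the rewritten tags get new commit ids, and signed tags lose their signature.

```shell
# Hypothetical helper: remove a path from all history and rewrite the tags as well.
# Usage: strip_path_rewriting_tags <repo-dir> <path-to-remove>
strip_path_rewriting_tags() {
    # FILTER_BRANCH_SQUELCH_WARNING silences the notice printed by recent Git versions.
    # --tag-name-filter cat keeps each tag name and moves it to the rewritten commit.
    FILTER_BRANCH_SQUELCH_WARNING=1 git -C "$1" filter-branch -f \
        --index-filter "git rm --cached -r -q --ignore-unmatch '$2'" \
        --tag-name-filter cat \
        --prune-empty -- --all
}
```

A call such as strip_path_rewriting_tags koha-core misc/translator/po would then replace both the filter-branch step and the tag removal, if the resulting size turns out to be close to the 40M figure.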

Howto use Koha main repo with localization repo

At the end of the day, we have two Koha repositories (and perhaps more later):

  • koha-core -- all Koha resources except .po files
  • koha-i18n -- all Koha .po files

Developers who don't have to deal with internationalization just have to clone koha-core as before.

git clone git://git.koha-community.org/koha-core


Developers who need localization files, and international implementers who want to deploy Koha using Git, need to clone both repositories, something like:

git clone git://git.koha-community.org/koha-core
cd koha-core
git submodule add git://git.koha-community.org/koha-i18n misc/translator/po
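
Note that git submodule add only has to be run once, by whoever sets up koha-core: it records the submodule in a .gitmodules file that is then committed. If that were done, other developers would only need to initialise the submodule after cloning. A minimal sketch, assuming the submodule entry has already been committed to koha-core (clone_with_translations is a hypothetical helper name):

```shell
# Hypothetical helper: clone a repository and populate the submodules it declares.
# Usage: clone_with_translations <core-url> <target-dir>
clone_with_translations() {
    git clone -q "$1" "$2" &&
        # fetch and check out every submodule listed in .gitmodules
        git -C "$2" submodule --quiet update --init
}
```

With the URL used on this page, this would be called as clone_with_translations git://git.koha-community.org/koha-core koha-core. Developers who don't need the .po files could keep using a plain git clone and simply leave the submodule uninitialised.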


Discussion

Discussion restarted at http://comments.gmane.org/gmane.education.libraries.koha.devel/7609

Choosing a scenario

  • How about scenario 3 -- Keep history for main repo, keep the history in koha-core, start fresh for koha-l8n --Chris 17:53, 12 September 2011 (EDT)
  • Yes, it would stop koha-core inflation while keeping the history available in the main repo. Please note that, unfortunately, koha-core size couldn't be reduced to the theoretical 40M value (see Evaluation section above) --Frédéric
  • I prefer scenario 3 by far (for example: 2 weeks ago I had to git blame and search for some code that was more than 4 years old, so it's useful) --Paul Poulain 04:03, 13 September 2011 (EDT)
  • Don't forget that even in scenario 2, koha-legacy repo can be used to blame code --fdemians 05:20, 13 September 2011 (EDT)
  • I think I still prefer 3, having to flick between 2 repos to do blame will get harder and more annoying over time.--Chris 05:27, 13 September 2011 (EDT)
  • cleaning git repo = can't this be achieved with some tricks like those explained here : http://help.github.com/remove-sensitive-data/ ? --Paul Poulain 06:28, 13 September 2011 (EDT)
  • Thanks Paul, your link is interesting. I've already used the techniques described in this article. After a few more tests, I've discovered that the impossibility of really removing .po files from koha-core comes from tags. I've updated this page accordingly. We may have to choose whether to lose or keep the tags in the Koha repo. --fdemians 13:49, 13 September 2011 (EDT)
  • I'm going to vote for keeping the tags, they are hugely useful, especially when answering questions. "Line 321 of C4/Auth.pm in version 3.2.10 has a bug on it" git checkout v3.02.10, check line 321 --Chris 16:30, 13 September 2011 (EDT)
  • About the tags, I agree with chris that they're useful. But maybe not *all* of them. For example, there are many from version 1 or 2 that are probably useless. Could we have cheese & dessert and manually recreate many tags, but not all? --Paul Poulain 11:06, 15 September 2011 (EDT)
  • I would go for option 3 -- marcelr

Scheduling and coordinating the operation

If we choose a scenario (scenario 3), we have to figure out:

  • when to actually split the current Koha repo on git.koha-community.org
  • who does the job
  • the implications for developers