svndumpfilter & svndumpfilter2: Extract svn paths to a separate repository

If you have an svn repository and want to extract one or more paths into a separate repository, the job can be tedious, especially if the origin repository is very big.

svndumpfilter

svn’s svndumpfilter tool helps here: create a dump of the origin repository, use svndumpfilter to keep only the paths you would like to extract, and load the filtered dump into a new repository. The general syntax is:

svnadmin dump /path/to/origin-svn | svndumpfilter include --drop-empty-revs --renumber-revs path1-to-extract path2-to-extract > /path/to/filtered.dump

The options --drop-empty-revs and --renumber-revs drop all revisions that contain no changes to the extracted paths and renumber the remaining revisions for the new repository. To load the dump into a new repository, use:

svnadmin create /path/to/new-svn && svnadmin load /path/to/new-svn < /path/to/filtered.dump
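
When the two commands above are scripted, it may help to know that a plain shell pipeline reports only the exit status of its last command, so a failing svnadmin dump can go unnoticed. The following is a minimal sketch, assuming bash (for pipefail); filter_and_load is a hypothetical helper name, not part of Subversion:

```shell
#!/usr/bin/env bash
# Sketch only: filter_and_load is a hypothetical helper, not an svn tool.
# Usage: filter_and_load ORIGIN_REPO NEW_REPO DUMP_FILE PATH...
filter_and_load() {
    local origin=$1 target=$2 dump=$3
    shift 3
    (
        # pipefail: the pipeline fails if svnadmin OR svndumpfilter fails.
        set -o pipefail
        svnadmin dump "$origin" \
            | svndumpfilter include --drop-empty-revs --renumber-revs "$@" > "$dump"
    ) || { echo "filtering failed, not loading $dump" >&2; return 1; }
    # Only reached when the filtered dump was written successfully.
    svnadmin create "$target" && svnadmin load "$target" < "$dump"
}
```

It could be called as: filter_and_load /path/to/origin-svn /path/to/new-svn /path/to/filtered.dump path1-to-extract path2-to-extract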

svndumpfilter works as long as no commit moves or copies files from a path that you do not want to extract into a path that you do want to extract. In such situations the copy source is missing from the filtered dump and you will get an error like:

svndumpfilter: Invalid copy source path '/.../.../'
svnadmin: Can't write to stream: Broken pipe
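
The path named in this error is the one svndumpfilter is missing. As a small sketch, it can be pulled out of the saved error output with sed. The helper name missing_copy_source is hypothetical, and the pattern assumes the message has the format shown above, which may differ between Subversion versions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read svndumpfilter's error output on stdin and print
# every path reported as an invalid copy source.
# Assumption: the message format matches the error quoted in this post.
missing_copy_source() {
    sed -n "s/.*Invalid copy source path '\(.*\)'.*/\1/p"
}
```

For example, after redirecting the stderr of svndumpfilter to a file with 2> /path/to/filter.err, running missing_copy_source < /path/to/filter.err prints the offending path.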

svndumpfilter2

This is where svndumpfilter2 comes into play; from the manual:

“[…] This is similar in concept to the official tool ‘svndumpfilter’, but it’s able to cope with revisions which copy files into the area of interest from outside it (in which situation a Node-copyfrom won’t be valid in the output dump file) […]”

The syntax is as follows; note that, unlike svndumpfilter, the script also takes the path to the repository as an argument:

svnadmin dump /path/to/origin-svn | ./svndumpfilter2 --renumber-revs --drop-empty-revs /path/to/origin-svn path1-to-extract path2-to-extract > /path/to/filtered.dump

Creating a new repository and importing the dump works as described above.

svndumpfilter & svndumpfilter2

Unfortunately, you can run out of memory when using svndumpfilter2 on a big repository. One way to handle the problem is to first use svndumpfilter to extract the smallest possible set of paths that contains both the paths you would like to extract and all paths that svndumpfilter needs to create a consistent dump file. This dump file is loaded into a new temporary repository, and in a second step svndumpfilter2 is used to extract the paths that were originally meant to be separated.

In summary, the commands are:

  1. Try to export to a dump file using svndumpfilter (if this step succeeds, you are done):
    svnadmin dump /path/to/origin-svn | svndumpfilter include --drop-empty-revs --renumber-revs path1-to-extract path2-to-extract > /path/to/filtered.dump
  2. Add the “missing” path named in the error message and try again:
    svnadmin dump /path/to/origin-svn | svndumpfilter include --drop-empty-revs --renumber-revs path1-to-extract path2-to-extract path1-needed-by-svndumpfilter > /path/to/filtered.dump
  3. Repeat step 2 until the dump file can be created.
  4. Import the dump file to a temporary repository:
    svnadmin create /path/to/tmp-svn && svnadmin load /path/to/tmp-svn < /path/to/filtered.dump
  5. Use svndumpfilter2 to extract the paths from the temporary repository:
    svnadmin dump /path/to/tmp-svn | ./svndumpfilter2 --renumber-revs --drop-empty-revs /path/to/tmp-svn path1-to-extract path2-to-extract > /path/to/filtered.dump
  6. Load the dump into your new repository and remove the temporary one created in step 4.
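
Steps 1 to 3 can be tedious by hand, so here is a rough sketch of how they might be automated. It assumes bash and assumes that the error message has the "Invalid copy source path" format quoted earlier in this post; filter_with_retry and its arguments are hypothetical names. Every attempt dumps the whole origin repository again, so this can take a long time on a big repository:

```shell
#!/usr/bin/env bash
# Sketch of steps 1-3; filter_with_retry is a hypothetical helper, not an svn tool.
# Usage: filter_with_retry ORIGIN_REPO DUMP_FILE MAX_TRIES PATH...
filter_with_retry() {
    local origin=$1 dump=$2 tries=$3 errlog missing
    shift 3
    errlog=$(mktemp) || return 1
    while [ "$tries" -gt 0 ]; do
        # The pipeline's exit status is that of svndumpfilter, its last command.
        if svnadmin dump -q "$origin" \
            | svndumpfilter include --drop-empty-revs --renumber-revs "$@" \
                > "$dump" 2> "$errlog"
        then
            rm -f "$errlog"
            printf 'included: %s\n' "$@"   # final include list, for reference
            return 0
        fi
        # Assumption: the error looks like "Invalid copy source path '/some/path'".
        missing=$(sed -n "s/.*Invalid copy source path '\(.*\)'.*/\1/p" "$errlog" | head -n 1)
        if [ -z "$missing" ]; then
            cat "$errlog" >&2              # some other failure: show it and give up
            rm -f "$errlog"
            return 1
        fi
        echo "adding missing copy source: $missing" >&2
        set -- "$@" "$missing"             # step 2: extend the include list
        tries=$((tries - 1))
    done
    rm -f "$errlog"
    echo "gave up after too many attempts" >&2
    return 1
}
```

A call could look like: filter_with_retry /path/to/origin-svn /path/to/filtered.dump 10 path1-to-extract path2-to-extract. The sketch adds exactly the path reported by svndumpfilter; adding a common parent directory by hand may need fewer rounds.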

Naturally, this only helps if the temporary repository is much smaller than the origin repository, because that is what reduces the memory consumption of svndumpfilter2.
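
A quick way to check this before running svndumpfilter2 is to compare the on-disk size of the two repository directories. This is only a rough sketch: the helper names repo_kib and is_smaller are hypothetical, and disk usage is merely a proxy for how much memory svndumpfilter2 will actually need:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the on-disk size of two repository directories.
repo_kib() {
    # du -sk prints "<size in KiB><TAB><path>"; keep only the size.
    du -sk "$1" | cut -f1
}

# Succeeds (exit status 0) if the first directory is smaller than the second.
is_smaller() {
    [ "$(repo_kib "$1")" -lt "$(repo_kib "$2")" ]
}
```

For example: is_smaller /path/to/tmp-svn /path/to/origin-svn && echo "temporary repository is smaller"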

There is also a svndumpfilter3 script, based on svndumpfilter2, which reduces the memory consumption, but it seems to be unmaintained at the moment and has reported bugs that are not fixed.