svndumpfilter & svndumpfilter2: Extract svn paths to a separate repository

If you have an svn repository and would like to extract one or more paths into a separate repository, it can be exhausting, especially if the origin repository is very big.

svndumpfilter

Subversion’s svndumpfilter tool helps here: you can create a dump of the origin repository, use svndumpfilter to keep only the paths you would like to extract, and load the filtered dump into a new repository. The general syntax is:

svnadmin dump /path/to/origin-svn | svndumpfilter include --drop-empty-revs --renumber-revs path1-to-extract path2-to-extract > /path/to/filtered.dump

The options --drop-empty-revs and --renumber-revs drop all revisions that make no changes to the extracted paths and renumber the remaining revisions for the new repository. To load the dump into a new repository use

svnadmin create /path/to/new-svn && svnadmin load /path/to/new-svn < /path/to/filtered.dump
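
The two commands above can be wrapped into one small shell function. This is only a sketch: the function name extract_paths and its argument order are made up for illustration, and it assumes that svnadmin and svndumpfilter are on the PATH.

```shell
# Sketch: extract paths from an origin repository into a new repository.
# extract_paths is a hypothetical helper name, not part of Subversion.
# Usage: extract_paths ORIGIN_REPO NEW_REPO DUMP_FILE PATH...
extract_paths() {
    origin=$1
    new_repo=$2
    dump_file=$3
    shift 3
    # A pipeline returns the status of its last command, so a failing
    # svndumpfilter stops the function before anything is loaded.
    svnadmin dump "$origin" |
        svndumpfilter include --drop-empty-revs --renumber-revs "$@" \
            > "$dump_file" || return 1
    # Create the new repository and load the filtered dump into it.
    svnadmin create "$new_repo" && svnadmin load "$new_repo" < "$dump_file"
}
```

It could then be called as: extract_paths /path/to/origin-svn /path/to/new-svn /path/to/filtered.dump path1-to-extract path2-to-extract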

svndumpfilter works as long as no commit moves or copies files from a path that you do not want to extract to a path that you do want to extract. If there is such a commit, you will get an error like:

svndumpfilter: Invalid copy source path '/.../.../'
svnadmin: Can't write to stream: Broken pipe
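
To find out which path is missing, the path can be pulled out of that error message. A minimal sketch, assuming the message has the form shown above (the pattern also tolerates extra text such as an error code in front of it); missing_copy_source is a hypothetical helper name:

```shell
# Print the path from an "Invalid copy source path" message read on stdin.
# Prints nothing if no such message is present.
# Possible use (stderr of svndumpfilter goes into the pipe, the dump is discarded):
#   svnadmin dump /path/to/origin-svn |
#       svndumpfilter include path1-to-extract 2>&1 >/dev/null |
#       missing_copy_source
missing_copy_source() {
    sed -n "s/.*Invalid copy source path '\(.*\)'.*/\1/p" | head -n 1
}
```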

svndumpfilter2

This is where svndumpfilter2 comes into play; from the manual:

“[…] This is similar in concept to the official tool `svndumpfilter’, but it’s able to cope with revisions which copy files into the area of interest from outside it (in which situation a Node-copyfrom won’t be valid in the output dump file) […]”

The syntax is as follows:

svnadmin dump /path/to/origin-svn | ./svndumpfilter2 --renumber-revs --drop-empty-revs /path/to/origin-svn path1-to-extract path2-to-extract > /path/to/filtered.dump

Creating a new repository and importing the dump works as described above.
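
Since svndumpfilter2 is a separate script and not part of Subversion, it is worth checking that it is really there before starting a long dump. A sketch, where run_svndumpfilter2 and its argument order are hypothetical; the options are the ones used above:

```shell
# Sketch: run svndumpfilter2 only if the script exists and is executable.
# run_svndumpfilter2 is a hypothetical helper name.
# Usage: run_svndumpfilter2 SCRIPT REPO DUMP_FILE PATH...
run_svndumpfilter2() {
    script=$1
    repo=$2
    dump_file=$3
    shift 3
    if [ ! -x "$script" ]; then
        echo "svndumpfilter2 not found or not executable: $script" >&2
        return 1
    fi
    # svndumpfilter2 takes the repository path as well as the dump on stdin.
    svnadmin dump "$repo" |
        "$script" --renumber-revs --drop-empty-revs "$repo" "$@" > "$dump_file"
}
```

For example: run_svndumpfilter2 ./svndumpfilter2 /path/to/origin-svn /path/to/filtered.dump path1-to-extract path2-to-extract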

svndumpfilter & svndumpfilter2

Unfortunately, svndumpfilter2 can run out of memory. To work around this, first use svndumpfilter to extract the smallest possible set of paths that includes all the paths you would like to extract plus all paths that svndumpfilter needs to create a consistent dump file. Load this dump file into a new temporary repository, and in a second step use svndumpfilter2 on the temporary repository to extract the paths you originally wanted to separate.

In summary, the commands are:

  1. Try to export to a dump file using svndumpfilter (if this step succeeds, you are done):
    svnadmin dump /path/to/origin-svn | svndumpfilter include --drop-empty-revs --renumber-revs path1-to-extract path2-to-extract > /path/to/filtered.dump
  2. Add the “missing” path and try it again:
    svnadmin dump /path/to/origin-svn | svndumpfilter include --drop-empty-revs --renumber-revs path1-to-extract path2-to-extract path1-needed-by-svndumpfilter > /path/to/filtered.dump
  3. Repeat step 2 until the dump file can be created.
  4. Import the dump file to a temporary repository:
    svnadmin create /path/to/tmp-svn && svnadmin load /path/to/tmp-svn < /path/to/filtered.dump
  5. Use svndumpfilter2 to extract the paths from the temporary repository:
    svnadmin dump /path/to/tmp-svn | ./svndumpfilter2 --renumber-revs --drop-empty-revs /path/to/tmp-svn path1-to-extract path2-to-extract > /path/to/filtered.dump
  6. Import the dump into your new repository and remove the temporary one created in step 4.
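
Steps 1 to 3 lend themselves to a loop. The following is a rough sketch, not a finished tool: filter_until_consistent is a hypothetical helper, it assumes the error message format shown earlier, and it assumes that adding the reported copy source path is what svndumpfilter needs. Every retry dumps the whole origin repository again, so it can take a long time on big repositories.

```shell
# Sketch of steps 1-3: retry svndumpfilter, adding each missing copy source.
# filter_until_consistent is a hypothetical helper name.
# Usage: filter_until_consistent ORIGIN_REPO DUMP_FILE PATH...
# On success, prints the final list of paths, one per line.
filter_until_consistent() {
    origin=$1
    dump_file=$2
    shift 2
    err_file=$(mktemp) || return 1
    while :; do
        if svnadmin dump "$origin" |
            svndumpfilter include --drop-empty-revs --renumber-revs "$@" \
                > "$dump_file" 2> "$err_file"; then
            rm -f "$err_file"
            printf '%s\n' "$@"
            return 0
        fi
        # Pull the missing path out of the error message, if there is one.
        missing=$(sed -n "s/.*Invalid copy source path '\(.*\)'.*/\1/p" "$err_file" | head -n 1)
        if [ -z "$missing" ]; then
            cat "$err_file" >&2   # some other error: show it and give up
            rm -f "$err_file"
            return 1
        fi
        # Give up if the path was already added, to avoid an endless loop.
        for p in "$@"; do
            if [ "$p" = "$missing" ]; then
                echo "still failing on $missing, giving up" >&2
                rm -f "$err_file"
                return 1
            fi
        done
        echo "adding missing path: $missing" >&2
        set -- "$@" "$missing"
    done
}
```

The resulting dump file can then be loaded into the temporary repository as in step 4.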

Naturally, this only helps if the temporary repository is much smaller than the origin repository, because that is what reduces the memory consumption of svndumpfilter2.

There is also an svndumpfilter3 script, based on svndumpfilter2, that reduces the memory consumption, but it seems to be unmaintained at the moment and has reported bugs that are not fixed.

Comments

  1. Ismo Hääväräinen

    It seems that you’re not the only one fed up with the cumbersomeness of svndumpfilter. There now is a tool named svndumpsanitizer that seems to be the best I’ve tried in this category so far. http://miria.linuxmaniac.net/svndumpsanitizer/

  2. Maurits van Rees

    svndumpsanitizer looked nice but did not work for me; calling svndumpfilter after it for the final cleanup (removing empty revisions and renumbering) I either got the ‘Invalid copy source path’ again or no superfluous revisions were actually removed. This might be because the original repository had a rename in it from folder ‘Foo’ to ‘foo’ and this was the exact directory that I needed the dump for.

    Anyway, svndumpfilter2 worked perfectly for me, so thanks to Simon Tatham for creating it and thanks to you for blogging about it!

  3. Harald Wilhelmi

    Just recently I released svnfiltereddump. It is still beta but should not have the ‘Invalid copy source path’ issue. See https://github.com/TNG/svnfiltereddump and https://github.com/TNG/svnfiltereddump/wiki/Formated-Man-Page.

  4. akash dubey

    Thanks mate for the nice tutorial. I was quite interested in using the tool https://github.com/TNG/svnfiltereddump.

    I was able to install it using “sudo easy_install svnfiltereddump” (I am on python-2.6), and it was damn easy to do 🙂. However, the man pages never came along with that, not sure why, and I am not able to pave my way using svnfiltereddump -h (it gives the info, but does not spoonfeed :/). So I do have a huge repo to filter, svndumpfilter/2/3/4 have not been helpful, and I am kind of counting on your tool. Can you please enlighten me, bud!

  5. Sri

    Step 5 throws the following error:
    -bash: ./svndumpfilter2: No such file or directory
    svnadmin: Can’t write to stream: Broken pipe

    Where is svndumpfilter ? How do I use it?

  6. Jan Jonas

    Hi Sri,
    you can download svndumpfilter2 here.

  7. Michel Bisson

    Thanks so much. You wouldn’t believe how much time my coworker and I have spent on this problem. We tried just about everything except this concept of reducing the dump file size via a temporary repo to be able to use svndumpfilter2.
    I ran into some problems though, and that is also why I’m writing this comment.
    When the first dump file was created and I tried to load it into a temporary repo, the loading stopped very early with an error saying the Fire doesn’t exist. I searched Google for quite a while until someone mentioned that loading ‘filtered’ dumps very often fails on brand new repos. Hummm!! I thought at first I just needed to commit some junk into it and try again. That didn’t work. Then I created all the paths concerned in my operation and deleted them immediately. BANG, it all worked beautifully afterwards.
    My guess is that it failed because a filtered dump contains unknown paths and the load program doesn’t seem to be able to create parent directories automatically. After I had created the paths and deleted them, it seems the parent paths were still there and the new path could be created from the dump file. I automated the whole operation in a bash script and now I’m really pleased I can do the path moves from one repo to another without any trouble. Thanks to you and your marvelous idea.

  8. Michel Bisson

    Sorry for the typing mistakes. I’m still excited about having finally solved this issue.
    So the error I had was ‘File doesn’t exist’ and not Fire doesn’t…
