Revision as of 16:56, 3 December 2009 by Jglick

No longer in use by NetBeans. See: ExternalBinaries


External Binaries Storage in Mercurial

Mercurial, like other distributed version control systems (DVCS), keeps the whole history of your repository on your local disk. This history is copied from a server when you do a remote clone. (Doing a local clone will under most conditions share storage of the repository.)

While there have been proposals for changes to Hg to keep only a subset of history locally, downloading the remainder on demand from an official source, no such functionality is expected in the near future.

Since the NetBeans project uses a number of large external binaries such as libraries, and since our build system expects them to be kept under version control (we are not using e.g. Maven to retrieve named versions from a public repository), it is desirable to logically manage them with Hg yet not keep historical binaries in the actual Hg repository. Otherwise the repository could get substantially bigger as binaries are routinely upgraded.

The simplest solution seems to be to use Hg's encode/decode hooks. This is now set up on the NetBeans repository.

Storage model and basic hook operation

The system relies on a custom Hg extension, external.py. It is generic (could be used with other projects). The extension must be registered in .hg/hgrc so Hg will load it:

[extensions]
external = /path/to/external.py

Each external binary is actually stored in the repository as a special text file. You can see this representation using the hg cat command. For example:

$ hg cat o.apache.tools.ant.module/external/ant-libs-1.7.0.zip
<<<EXTERNAL 5FE8B5F60AEFC07CB4174DB9603897425788D80E-ant-libs-1.7.0.zip>>>

Here 5FE...80E is the SHA-1 hash of the file's contents. In most cases, when you upgrade a binary, it will have a new filename, e.g. ant-libs-1.7.1.zip. On occasion, however, developers upload a subtly different version of a binary without changing its name. The external hook is not confused by such situations, since it always identifies files by their unique hash.
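As a rough illustration of this naming scheme, the following Python sketch builds the placeholder text for a binary and parses it back. The function names are hypothetical and are not taken from external.py; only the <<<EXTERNAL HASH-name>>> layout and the uppercase SHA-1 hash are taken from the example above.

```python
import hashlib
import re

# Layout copied from the `hg cat` example: 40 uppercase hex digits, a dash, the filename.
PLACEHOLDER = re.compile(r"^<<<EXTERNAL ([0-9A-F]{40})-(.+)>>>\s*$")

def placeholder_for(name, data):
    """Return the one-line text that stands in for the binary (hypothetical helper)."""
    digest = hashlib.sha1(data).hexdigest().upper()
    return "<<<EXTERNAL %s-%s>>>" % (digest, name)

def parse_placeholder(text):
    """Return (sha1_hex, filename), or None if the text is not a placeholder."""
    match = PLACEHOLDER.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Because the hash is part of the identifier, two different files that share a name produce two different placeholders.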

The encode and decode hooks must also be registered so that the extension is used on certain files:

[encode]
*/external/*.{zip,jar,gz,bz2,gem,dll} = upload: https://hg.netbeans.org/binaries/upload
[decode]
*/external/*.{zip,jar,gz,bz2,gem,dll} = download: http://hg.netbeans.org/binaries/

(Binary files with other name patterns, e.g. ZIPs used in test cases, will not be stored specially. Remember that it is NB policy that all third-party binaries must be kept in external directories directly beneath module project directories, and that in the new Hg repository layout, module project directories are all at top level. Hence the glob patterns shown above should cover all third-party binaries. Text files such as licenses in external directories are stored normally.)

The decode hook will try to download the binary, identified by its hash and filename, from the download location the first time you check out this file with the hook registered. The actual binary is then placed into your working copy of the file so you can build with it.

It will also save the binary to a cache directory: ~/.hgexternalcache, or on Windows C:\Documents and Settings\John Q. Hacker\.hgexternalcache. (Can be overridden with the environment variable HGEXTERNALCACHE.) Any subsequent checkout will just use the cache, so you will not need to be online. (If another developer pushed a new external binary into the repository and you pulled and checked out, then just that new binary would be downloaded and cached.)
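The cache lookup described here could look roughly like the Python sketch below. It is only an illustration: the function names, the download callback, and the one-file-per-key layout of the cache directory are assumptions, not details taken from external.py. Only the default location and the HGEXTERNALCACHE override come from the text above.

```python
import os

def cache_dir(environ=None):
    """Return the cache directory, honouring HGEXTERNALCACHE if it is set."""
    env = os.environ if environ is None else environ
    override = env.get("HGEXTERNALCACHE")
    if override:
        return override
    return os.path.join(os.path.expanduser("~"), ".hgexternalcache")

def fetch_binary(key, download, environ=None):
    """Return the bytes for `key` (the HASH-name string), preferring the cache.

    `download` is a caller-supplied function (hypothetical) that fetches
    the bytes for a key from the server.
    """
    directory = cache_dir(environ)
    cached = os.path.join(directory, key)  # assumed layout: one file per key
    if os.path.exists(cached):
        with open(cached, "rb") as f:
            return f.read()  # no network access needed
    data = download(key)
    os.makedirs(directory, exist_ok=True)
    with open(cached, "wb") as f:
        f.write(data)
    return data
```

With this shape, only the first request for a given key reaches the network; later checkouts read the cached copy.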

The encode hook is very similar. It is run when committing a new or modified binary, or just doing some other operation which checks the file's contents, such as hg stat or hg di. If the external binary (identified by contents, not just name) is not already present in your cache, the hook:

  1. checks if it is already on the server (unusual but possible), in which case it is simply downloaded
  2. otherwise, tries to upload it to the specified form (an HTTP POST operation)
  3. verifies that it can be downloaded from the server
  4. adds it to your cache
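The steps above can be sketched in Python as follows. This is not the code of external.py: the function name and the three callbacks (exists_on_server, upload, download) are hypothetical stand-ins for the real HTTP operations, and the one-file-per-key cache layout is an assumption.

```python
import hashlib
import os

def ensure_published(name, data, cache, exists_on_server, upload, download):
    """Make sure the binary is on the server and in the local cache.

    Returns the HASH-name key under which the binary is stored.
    """
    key = "%s-%s" % (hashlib.sha1(data).hexdigest().upper(), name)
    cached = os.path.join(cache, key)
    if os.path.exists(cached):
        return key  # already cached: nothing to do, no network needed
    if not exists_on_server(key):
        upload(key, data)  # stands in for the HTTP POST to the upload form
    fetched = download(key)  # verify that the server really serves it
    if fetched != data:
        raise IOError("server copy of %s does not match local file" % key)
    os.makedirs(cache, exist_ok=True)
    with open(cached, "wb") as f:
        f.write(fetched)
    return key
```

Passing the network operations in as functions keeps the sketch runnable offline, for example against an in-memory dictionary that plays the role of the server.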

Since these operations use the network, you need to be online when first working with the new binary.

https://hg.netbeans.org/binaries/upload is password-protected; you need to use the same username and password as for pushing to the Hg repositories. You will be prompted for these if an upload is needed. You can also keep your username and optionally password in hgrc as a convenience, so you will not be prompted:

*/external/*.{zip,jar,gz,bz2,gem,dll} = \
  upload: https://jhacker:supersecret@hg.netbeans.org/binaries/upload

(There should be no line break in the real file!)

You can also do such uploads manually if you need to. Just open https://hg.netbeans.org/binaries/upload in a web browser. If prompted, authenticate with your regular Hg username and password. A file upload form with a submit button will appear. Choose the binary on disk, e.g. /path/to/nb_all/o.apache.tools.ant.module/external/ant-libs-1.7.0.zip. It will be saved under the correct name including hash automatically.

The upshot is that once the encode and decode hooks have been correctly registered, you do not need to pay much attention to the fact that binaries are hosted remotely. You can just use normal Hg operations to check out working copies, update, merge, add, remove, rename, commit, push, etc. You will notice that diffs show the special text files, but diffing binary files would not work anyway so this is no loss.

Hook registration

One problem remains: how is the external extension registered? Each developer could be required to register it by hand for each new clone, but this would quickly become tiresome. Instead, there is some automation included in the build system. Here is how it works:

  1. You do a clone of the main repository. The initial checkout will show the special text contents for the binaries, so they are not usable yet.
  2. You run some Ant build target. nbbuild/build.xml#bootstrap is always run before anything else on a fresh checkout. A special Ant task is run which performs extension registration.
  3. If run from inside a Hg repository (not e.g. a source download), and external.py has not yet been installed, it is installed now.
  4. .hg/hgrc will be edited to register the extension and the encode/decode hooks.
  5. The existing checked-out "binaries" are deleted and checked out again using the decode hook. If some files are missing from your cache, they will be downloaded now, so you need to be online.
  6. If the contrib repository is present, this will be fixed up as well in the same way.
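The real registration is done by an Ant task, but the hgrc editing step can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function name, the use of Python's configparser (whose syntax only approximates Mercurial's, so directives such as %include are not handled and comments are lost on rewrite), and the placement of the entries in the standard [extensions], [encode] and [decode] sections.

```python
import configparser

PATTERN = "*/external/*.{zip,jar,gz,bz2,gem,dll}"

def register(hgrc_path, external_py, upload_url, download_url):
    """Add the extension and hook entries to hgrc; return True if it changed."""
    cfg = configparser.RawConfigParser(delimiters=("=",))
    cfg.optionxform = str  # keep option names exactly as written
    cfg.read(hgrc_path)    # a missing file is treated as empty
    wanted = [
        ("extensions", "external", external_py),
        ("encode", PATTERN, "upload: " + upload_url),
        ("decode", PATTERN, "download: " + download_url),
    ]
    changed = False
    for section, key, value in wanted:
        if not cfg.has_section(section):
            cfg.add_section(section)
        if not cfg.has_option(section, key) or cfg.get(section, key) != value:
            cfg.set(section, key, value)
            changed = True
    if changed:
        with open(hgrc_path, "w") as f:
            cfg.write(f)
    return changed
```

The function is idempotent: running it a second time on the same file finds the entries already present and leaves the file untouched, which matches the idea that registration happens only if it has not been done yet.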

Since these steps are automated, you should not need to pay much attention. Just

$ hg clone http://hg.netbeans.org/main nb_all
$ ant -f nb_all/nbbuild/build.xml

should suffice to retrieve sources, download binaries, and perform a complete build.

Interaction with CRLF hooks

A complication arises for Windows users who have configured CRLF line ending translation as an encode/decode hook. This would normally be done in C:\Mercurial\Mercurial.ini or similar:

[encode]
** = cleverencode:
[decode]
** = cleverdecode:

Such a configuration will unfortunately override the external hooks. The bootstrap target should automatically detect this situation and abort. You just need to edit your configuration to say

[encode]
{}** = cleverencode:
[decode]
{}** = cleverdecode:

(This is semantically the same, but sorts after */external/... and so does not take precedence!)
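The ordering argument can be checked with a few lines of Python. This sketch assumes that the conventional match-all pattern is ** and that patterns are compared by plain character order; it only demonstrates the sort order and says nothing about how Hg itself picks a filter.

```python
# Patterns as they might appear in the [encode] or [decode] section.
patterns = [
    "{}**",                                   # workaround match-all pattern
    "*/external/*.{zip,jar,gz,bz2,gem,dll}",  # external hook pattern
    "**",                                     # conventional match-all pattern
]

ordered = sorted(patterns)
for pattern in ordered:
    print(pattern)
```

The plain ** sorts before the */external/ pattern because "*" precedes "/" in character order, while {}** sorts after it because "{" follows "*".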

After saving your modified global Hg configuration, just rerun the Ant build and it ought to work.

In the future this annoyance ought to be automated away. The main obstacle is finding Mercurial.ini on your disk reliably.

Source code

For those interested in implementation:


Known issues

  • MQ does not seem to like adding or removing external binaries (Hg bug #887).
  • A merge conflict (i.e. a binary changed differently in the two branches being merged) causes the merge operation to abort, rather than just reporting the conflict.
  • HTTP proxies are not handled yet (NB bug #136322).
  • If you hg up null, your local copy of the hook implementation disappears, complicating subsequent operations.
  • Encode/decode hooks are not prioritized well, complicating registration of the external hook on Windows (Hg bug #195). The workaround is to use a special match-all pattern.

Other approaches considered

  1. Do nothing and hope that Hg will one day support TrimmingHistory.
  2. Use an experimental clone --punch to pull only part of the history of the repository. Requires a patched Hg.
  3. Use an Ant task to perform downloads of binaries according to a textual manifest during the build. Requires manual binary upload and special editing of manifest files. This is now in use instead of the solution described here.
  4. Use Ivy or Maven to manage libraries. Would require all libs used in NB to be present in an official repository. Requires diligence in never touching a given version of a lib. Extra build steps and technology to learn and deploy.
  5. More recently available: BfilesExtension