HgMigration

Technical status of CVS -> Mercurial migration

Status and technical details of switching the VCS for NetBeans sources to Mercurial.


Content migration script

In apisupportx/hgimport (in trunk). Expects to migrate ../.., i.e. the directory containing NetBeans sources. No particular module list: check out . (everything). Only a clean checkout (unbuilt sources) is supported. All output is created in apisupportx/hgimport/out.

Daily builder Artifacts should include several demo Hg repos. These will be recreated each day, so for testing purposes only.

Status: all standard IDE clusters buildable. Javadoc builds. NBMs for stableuc build. Experimental modules build. JNLP builds. Tests build.

Script functions:

  • Module paths normalized to abbreviated CNB with common prefixes implied;
    e.g. java/j2seproject -> java.j2seproject,
    core/progress -> api.progress,
    openide/loaders -> openide.loaders
      • regular naming eliminates guesses about where module sources are
      • flat structure should be more efficient, easier to search
      • look at a preview
  • Several repos created, such as:
      • main with:
          • all modules in standard distro
          • perhaps all modules on stable UC
          • basic build infrastructure and IDE branding
          • test infrastructure
      • contrib with additional buildable modules
          • would for now expect to be checked out as subdir of main,
            perhaps using "forest" extension
      • www with */www (most importantly, testwww/www)
      • misc with unbuildable modules, ...anything else uncategorized
          • truly obsolete stuff should be deleted from CVS trunk first
  • .cvsignore files translated to .hgignore automatically
  • new file paths should be automatically substituted in relevant files
      • */build.xml, */nbproject/project.properties, ...
      • need to check carefully for actual file paths,
        e.g. not all occurrences of string image should be replaced with o.n.m.image!
      • should produce as output a *.diff of all substitutions, for human review
      • complete translation table should be saved for future use
          • (for example, could be used by Sustaining to create an automatic translation script
            for patches created in new structure to be backported or vice-versa)
  • files with excessively long Hg storage paths are not imported
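
The path substitution step in the list above might be sketched as follows. This is an illustration only, not the actual script in apisupportx/hgimport: the table holds just the three example pairs given above, and the names TRANSLATION, substitute_paths and review_diff are hypothetical. Old paths are replaced only where they stand alone as path segments, and a unified diff of the result is produced for human review.

```python
import difflib
import re

# Hypothetical excerpt of the translation table (old CVS path -> new Hg path).
# Only the three pairs mentioned in the text are included.
TRANSLATION = {
    "java/j2seproject": "java.j2seproject",
    "core/progress": "api.progress",
    "openide/loaders": "openide.loaders",
}

# One alternation, longest path first, so all replacements happen in a single
# pass. The lookarounds reject matches embedded in a longer name, such as
# "core/progressbar" or "xcore/progress".
_OLD_PATH = re.compile(
    r"(?<![\w.-])("
    + "|".join(re.escape(p) for p in sorted(TRANSLATION, key=len, reverse=True))
    + r")(?![\w.-])"
)


def substitute_paths(text):
    """Return text with old module paths replaced by their new names."""
    return _OLD_PATH.sub(lambda m: TRANSLATION[m.group(1)], text)


def review_diff(filename, text):
    """Return a unified diff of the substitutions made in one file."""
    new_text = substitute_paths(text)
    return "".join(
        difflib.unified_diff(
            text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=filename,
            tofile=filename,
        )
    )
```

The sketch only renames paths; it does not adjust relative prefixes such as ../.. whose depth changes in a flat layout, so the diff would still need a human look.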

Source reorgs are conservative with respect to build system: module file paths change, but not basic build logic.

Files with disposition TBD

Performance tests

performance
enterprise/performancetests
j2ee/performancetests
languages/performancetests
mobility/performancetests
uml/performancetests
visualweb/test/performance
web/performancetests

are now put in main/performance and subdirectories thereof. Oleg Khokhlov says this is OK.

Migrate history or not?

It is viable to import some CVS history into Hg.

Specifically, trunk revisions of text files will be imported. History of binary files will not be imported (just the last trunk revision). History of CVS branches will not be imported. History of files deleted in the CVS trunk will not be imported.

CVS branch names (e.g. release60) and tags applied to branches (e.g. release60-BLDwhatever) will not be imported. Trunk tags will be imported as Hg tags, though we will probably want to delete most of them later, since most are meaningless daily build tags. (In Hg, unlike CVS, tags are just part of versioned data, so deleted tags could be recovered if they were ever needed.)

This subset of history is enough to perform hg annotate and similar commands on files in the repository. You even get changesets (a single commit spanning multiple files) - though changes to binary files like icons, and file deletions, will not be represented.

Any text files mistakenly committed with CRNL line endings will also be converted to NL in the imported history.

.cvsignore files are not imported at all, since they are collected into a .hgignore file.
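
Collecting .cvsignore files into a single .hgignore could look roughly like this. It is a sketch under assumptions, not the migration script itself: the function name and its input (a mapping from directory to .cvsignore text) are hypothetical, each pattern is simply prefixed with its directory, and the CVS "!" reset marker is skipped. Note that glob patterns in .hgignore are not rooted, so a prefixed pattern could in principle also match deeper in the tree.

```python
import posixpath


def cvsignore_to_hgignore(cvsignores):
    """Merge per-directory .cvsignore contents into one .hgignore text.

    cvsignores maps a directory path relative to the repository root
    ("." for the root itself) to the text of its .cvsignore file.
    """
    lines = ["syntax: glob"]
    for directory in sorted(cvsignores):
        # A .cvsignore holds whitespace-separated shell patterns.
        for pattern in cvsignores[directory].split():
            if pattern == "!":
                # "!" resets the CVS ignore list; there is no direct
                # equivalent here, so it is skipped.
                continue
            lines.append(posixpath.normpath(posixpath.join(directory, pattern)))
    return "\n".join(lines) + "\n"
```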

Importing full history would suffer from two problems:

  1. The hg convert command aborts if it gets too many revisions at once. (This might be fixable using some tricks, but would be extra work.)
  2. The initial repository size would be much bigger, making network clones slow and disk usage heavy. (Storing line-by-line diffs for text files which are present in the repo anyway only seems to add 110Mb of extra space for the main repository, which is reasonable.)

Windows-specific notes

Standard IDE known to build on Windows, but more testing is in order.

To clone the repo on Windows, you must be sure to use a relatively short base path (up to around 50 characters), because of the unresolved issue with long repository filenames (see Hg bug #839 below).
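
To see why store paths grow, here is a rough sketch that estimates the store name of a tracked file. It assumes a simplified form of Mercurial's store filename encoding, in which each uppercase ASCII letter becomes an underscore plus the lowercase letter and each underscore is doubled; escapes for reserved characters and Windows device names are left out, and the data/ prefix and .i suffix are likewise assumptions about the store layout. The function name is hypothetical.

```python
def encode_store_name(path):
    """Approximate the store name for a tracked file path.

    Simplified: only uppercase ASCII letters and underscores are escaped.
    """
    out = []
    for ch in path:
        if ch == "_":
            out.append("__")
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "data/" + "".join(out) + ".i"
```

Under these assumptions a CamelCase Java filename gains one character per capital letter, which adds up quickly on deeply nested source paths.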

Developers must commit text files with NL line endings. However, CRNL <-> NL translation on Windows is not on by default with Mercurial. It can be enabled by editing the config file (e.g. C:\Mercurial\Mercurial.ini) to turn on cleverencode and cleverdecode mode, which autodetects binary files based on NUL. Seems to work as expected (IDE builds and runs successfully, at least). Repository after import should be free of text files containing CRNL.

Will have an incoming hook on public repositories to reject changesets with bad line endings. Otherwise Windows developers who forget to configure the encoder can cause trouble. (See Hg RFE #882 below.)
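
The check such a hook would perform can be sketched in a few lines. This is an assumption about the logic only (binary means "contains a NUL byte", as described above); it is not the actual hook, and the function names are hypothetical. Wiring it into Mercurial's hook mechanism is left out.

```python
def is_binary(data):
    """Treat content as binary if it contains a NUL byte."""
    return b"\0" in data


def has_bad_line_endings(data):
    """True for text content that contains CRNL line endings."""
    return not is_binary(data) and b"\r\n" in data


def to_nl(data):
    """Convert CRNL to NL in text content; leave binary content alone."""
    return data if is_binary(data) else data.replace(b"\r\n", b"\n")
```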

Server

http://hg.netbeans.org/

Users and authentication

We will want to import usernames (and passwords?) from the current netbeans.org auth DB.

One issue is what email address should be configured for the user as far as Hg is concerned. (This can be set in ~/.hgrc or $REPO/.hg/hgrc.) We want these to be predictable for purposes of disaster recovery, hg churn, etc. Probably jhacker@netbeans.org should be used in case jhacker is the login ID. (Or simply jhacker; Hg does not require an actual email address.) Tricky to enforce. One possibility is to set up an incoming changeset hook on the server which checks for non-*@netbeans.org addresses (or apparent usernames which are not in the database) in new changesets; if it finds any, send mail to those people asking them to configure Hg correctly.
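
The test that such a hook might apply to each new changeset's user field can be sketched as below. The function name and the set of known logins are hypothetical stand-ins for a lookup in the auth DB; the sketch accepts either a bare login or login@netbeans.org, optionally wrapped in "Name <...>".

```python
import re

DOMAIN = "netbeans.org"


def committer_ok(user, known_logins):
    """Check a changeset user string against the expected convention."""
    # Pull the address out of "Name <address>" if that form is used.
    m = re.search(r"<([^<>]*)>", user)
    addr = (m.group(1) if m else user).strip()
    login, sep, domain = addr.partition("@")
    if sep and domain.lower() != DOMAIN:
        return False
    return login in known_logins
```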

Commit model

Main proposal is for staged propagation of commits through team branches, with test-based controls to prevent serious problems (e.g. uncompilable sources) in the master repository. Details: MercurialBuildInfrastructurePlan

TBD:

  • What is granularity of "team"? 1-20 people?
  • Because with Hg you have a local repo, a "team of 1" means:
  commit at will, and once a day (or less) do a clean build w/ tests and commit.
  (In the morning! Not right before leaving work!)
  But no protection against developers who refuse to work this way.
  • Who or what merges to master repo - dev lead? Automated process?
  • What is the schedule for pushes to master? ASAP after tests pass?
 Every Monday for team #1, Tuesday for team #2, etc.?
  • When do bugs get marked fixed in the bug tracker?
 When you commit to local repo?
 When you push to team server?
 When team pushes to master?
 Do you put changeset hash in bug database right away,
 or wait for some commit hook to add it for you?
  • Does it really make sense to have a team repo clone,
 or should we instead use named branches to do possibly unstable commits?

Commit notifications

Mail

Hg comes with a script (notify.py) for sending out change notifications. It does not support sending to different people/aliases based on paths of files changed. But this can be fixed pretty easily with a small patch to notify.py.
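
The path-based routing itself is simple; a minimal sketch follows. It does not show the notify.py patch, and the routing table, addresses and function name are all hypothetical. Each changed file is matched against glob patterns, and the union of matching recipients is returned.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: glob over repository paths -> recipients.
ROUTES = [
    ("openide.loaders/*", ["loaders-commits@example.org"]),
    ("java.*/*", ["java-commits@example.org"]),
]
DEFAULT = ["all-commits@example.org"]


def recipients(changed_files, routes=ROUTES, default=DEFAULT):
    """Return sorted recipients for a changeset touching changed_files."""
    found = set()
    for path in changed_files:
        for pattern, addrs in routes:
            if fnmatchcase(path, pattern):
                found.update(addrs)
    # Fall back to a catch-all list when no route matches.
    return sorted(found) or list(default)
```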

RSS

Could perhaps have e.g.

http://hg.netbeans.org/rss/main?modules=o.o.fs,o.o.loaders,...

which would provide personalized RSS feeds of changes in certain areas only.
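
A sketch of the filtering behind such a feed, assuming the modules parameter lists top-level directory names in the repository and that a changeset is represented by its list of changed paths. Both function names are hypothetical, and producing the RSS itself is not shown.

```python
from urllib.parse import parse_qs, urlsplit


def requested_modules(url):
    """Return the set of module names from the feed URL's modules parameter."""
    query = parse_qs(urlsplit(url).query)
    return {m for value in query.get("modules", []) for m in value.split(",") if m}


def touches(changed_files, modules):
    """True if any changed path lies under one of the requested modules."""
    return any(path.split("/", 1)[0] in modules for path in changed_files)
```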

Code reviews

Crucible currently lacks Hg support.

Review Board is O/S and has Hg support.

CIA

Should set up NB in CIA. Hg includes an extension to make this easy.

Performance

Repository size

Affects download (~ initial clone) time. Also affects local disk usage, though hardlinks help.

Rename problem

Hg (as of 0.9.5) increases storage whenever you move files. See Hg bug #883 below.

Because of this problem, we need to do big source reorgs (as in migration script) now, not later.

Binaries

Upgrading e.g. libs/xerces/external/xerces-2.8.0.jar to libs/xerces/external/xerces-2.8.1.jar will result in both binaries being kept in revlogs in the repo. Over time repo size will grow, perhaps substantially. (Initial binary size: 100Mb out of 500Mb total for main repo. In CVS repo, **/external contains 227Mb of historical versions. All **/*.jar/zip, dating to before external dirs, are 2563Mb!)

Solution: HgExternalBinaries

Scan time

With a lot of files, a simple hg status can take some time.

INotify

Linux users can use the INotify extension. Huge speed improvement for certain operations. But still buggy and not yet recommended.

You will however need to run:

sudo sh -c 'echo 32768 > /proc/sys/fs/inotify/max_user_watches'

to use it on the "main" repository due to the large number of directories. The above works for one session; to make it permanent:

sudo sh -c 'echo "fs.inotify.max_user_watches=32768" >> /etc/sysctl.conf'
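
To judge whether the limit is high enough for a given checkout, the directories can be counted and compared with the current setting. This sketch assumes roughly one watch per directory, which is an assumption about the extension rather than something confirmed here; the function names are hypothetical.

```python
import os


def count_directories(root):
    """Count directories under root, including root itself."""
    return sum(1 for _dirpath, _dirnames, _filenames in os.walk(root))


def current_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Read the current per-user inotify watch limit (Linux only)."""
    with open(path) as f:
        return int(f.read())
```

For example, comparing count_directories("main") with current_watch_limit() would show whether the value above needs raising further.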

A Windows port would be great.

Local disk space

Hardlinks

For Unix users, local clones use hardlinks to reuse .hg storage. The actual checkout must be a full copy (in case editors do not create new inodes when saving changes). Should also work for Windows users if using NTFS.

Network clones of course cannot use hardlinks. It is possible to minimize disk usage (and clone time!) when you are creating a local clone of a server clone (e.g. a release repo) and you already have a local clone of master.

Mirrors

Since Hg is fully distributed, it is possible to set up mirrors easily. After cloning from a mirror, you need to edit .hg/hgrc to set the default parent repo.
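
Editing the default path can also be scripted. The sketch below treats .hg/hgrc as a plain INI file and rewrites the default entry in its paths section; it assumes the file uses no features that Python's configparser cannot read, and it drops any comments in the file. The function name and the URL in the usage note are illustrative.

```python
import configparser


def set_default_path(hgrc_path, url):
    """Point the repository's default path at url, creating [paths] if needed."""
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser.read(hgrc_path)    # a missing file is silently ignored
    if not parser.has_section("paths"):
        parser.add_section("paths")
    parser.set("paths", "default", url)
    with open(hgrc_path, "w") as f:
        parser.write(f)
```

For example, set_default_path("main/.hg/hgrc", "http://hg.netbeans.org/main") would make later pulls go to the master server rather than the mirror.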

You can create a local mirror of master on your own computer, too. (Use hg clone -U to skip checkout of tip.) This can be useful: do all pulls into this local mirror, and do most clones starting from it.

Training

TBD.

Collect URLs for useful summaries, tips & tricks, etc. from other teams.

Need list of best practices. At least, unlike CVS, it seems that best practices should actually work once you understand them.

IDE Support

MercurialVersionControl

Documentation

HgMigrationDocs

HgHowTos

Stuff from other Sun teams

Not externally available at the moment.

Java SE: Hg transition

Java SE: Hg general

Scenarios

Actual tested sequences of Hg operations to accomplish certain tasks.

TBD. Need to collect and test one by one, incl. performance.

Repeated merges in repo clones and named branches

Make sure merge tracking is OK.

Merge from server clone to server clone

Can you merge changes from one server-side clone to another? Do you have to make a separate clone of each?

Seems to work fine and you can do a local clone. You need to use hg clone -r using the revision of master in effect at the time the server clone was made (use tags!!), then update from it. You do not get hardlinks between .hg dirs in this case (see RFE below).

Hg Bugs/Problems

Jesse's issue list

P1: fix is critical for migration to succeed

The server should reject bad merges. Hg RFE #1038

Hg does not let you do a clone of part of the repo, or check out part of a full repo. Hg RFE #105 Hg RFE #515

P2: fix will likely be wanted for continued usage

Hg increases storage whenever you move files. Hg bug #883

Keeping binaries in the repository would mean a noticeable increase in repo size after every update to a new version of a third-party library. Currently working around; details: ExternalBinaries. But it would be much preferable to be able to logically keep binaries in the repository yet avoid pulling them until they are really needed: TrimmingHistory

There is no simple way to see what a merge changeset changed beyond what a "simple" automated merge would have done. Hg RFE #981

Frequent failures to push over HTTPS on Windows. Hg bug #1003

Merging a case rename on Windows can cause file deletion. (as yet unfiled)

hg clean does not work correctly on Mac OS X. Hg bug #1097

There is no command to back out a bad merge. Hg RFE #1010

hg import does not work for some people. Hg bug #961

No extension in standard distro with "shelve" functionality. hgattic seems to work well, but few developers are going to download and enable a third-party extension.

hg revert -r ... on an uncommitted merge should issue a very strict warning. (The warning should explain to use hg up -C to start over.) Run by an inexperienced user, it can result in other changesets being "silently" backed out. (as yet unfiled, but see Hg bug #2915)

Handling of self-signed SSL certificates is unfriendly. Hg bug #2596

Bogus merge conflicts. Hg bug #3372

P3: should be fixed as time permits

The Windows setup wizard should offer to enable CRLF translation. Hg RFE #923

Semantics of .hgignore w.r.t. directories are unclear. Hg bug #886. Various other .hgignore problems have been encountered, including Hg bug #951 (now fixed in dev).

There is no simple way to see the patch that would result from all outgoing changesets. Hg RFE #28 and also relates to Hg RFE #219

You are not stopped from transplanting the same changeset twice in different clones. Hg bug #1210

Hg will try to delete an empty dir during update, even if that is your CWD, which can result in odd errors. (as yet unfiled)

kdiff3 does not work out of the box. Hg bug #1118

If you forget to add new files, neither hg ci nor hg fetch will warn you. Hg bug #1318

The parents log keyword requires --debug for full output. Hg bug #1435

hg transplant -s $repo $rev failed to import http://hg.netbeans.org/jet-main/rev/d1f97dcec7ca into core-main; complained that the revision could not be found. hg in -r $rev $repo worked fine. (as yet unfiled)

strace hg di -r release65_base o.n.bootstrap/src/org/netbeans/TopSecurityManager.java shows every file in the checkout being stat'd, for no apparent reason. (as yet unfiled)

hg bisect does not work well with merges. E.g. run

hg bisect -b de345c4fe13f
hg bisect -g be85c2e1048d

and then mark any revision good if it contains nbbuild/test/unit/src/org/netbeans/nbbuild/data/Category.png and you wind up with c429f829c9f3 being blamed, though this is not a descendant of the good revision. (as yet unfiled)

Incorrect merge of file deletion & resurrection. Hg bug #1740 (Probably also cause of: merging produced a spurious conflict (remote changed, local deleted) for main #8041de5d29cd. Something involving repeated merges of a backout.)

hg ann nbbuild/build.xml fails. Hg bug #1924

hg push sometimes requires --force when it should not. Hg bug #1974

hg fetch scans the working copy too many times. Hg bug #2085

hg pull -u scans the working copy unnecessarily. Hg bug #2092

If you include a #branchname in your paths.default, hg fetch does not work. Hg bug #2982

Nested repositories are not handled correctly by inotify. Hg bug #2319

Possible merge bug (symptoms fixed in 69dcf3fcc35d). phejl says "I removed tomcat.obsolete on branch (module and line in cluster.properties); sg-nb reverted some of his changes, mistakenly removing tomcat.obsolete line from cluster.properties on trunk (bdfefc80db7c); sg-nb added tomcat.obsolete to cluster.properties on trunk (c45b7baf4478); I merged the branch to trunk; tomcat.obsolete module removed on trunk, but still in cluster.properties". Sounds like the merge (996de2fd4b52?) somehow did not work correctly - should have either resulted in the entry being missing from cluster.properties, or a merge conflict. Maybe http://mercurial.selenic.com/bts/issue1740 or maybe something else. But ca913542dad4 (last serverplugins-next) still has tomcat.obsolete listed. (phejl says he definitely didn't readd it on the branch manually; maybe it was merged from trunk as he made couple of trunk -> branch merges.) (as yet unfiled)

Grafting does not record copy origins. Hg bug #3265

There is no equivalent of diff --git without --text, i.e. cannot display file renames correctly without also displaying lengthy binary file adds or modifications. (as yet unfiled)

Irrelevant changes to file mode, e.g. in 08f1e909c26c, cannot be suppressed; about 20% of files in the main repo are g+w, for no apparent reason, and changes to such modes (perhaps only on Windows?) appear in the file log. (as yet unfiled)

P4: nice to have

No equivalent to INotify is available for Windows users.

Performance of hg convert is poor when you want to transfer a module's history to another repository (see HgHowTos#TransferAModuleSHistoryToAnotherRepository). Hg RFE #991

hg ci visualweb.* fails if just some, but not all, visualweb.* modules are modified. Hg bug #1925

If you update a dirty checkout and .hgignore changes as a result, it is possible for previously ignored files (e.g. build products from a renamed module) to become no longer ignored, yet there is no warning that this has happened or that it might be safe to delete these files. (as yet unfiled)

hg log --removed needed to see identical parallel changes to a file. Hg bug #2604

Hyperlinks in hgweb dereference branch names. Hg bug #2296

No way to query extra fields in revsets. Hg RFE #2767

Resolved: already fixed in development versions of Hg, or no longer relevant to NB

CRLF autodetection of binary files changed unpleasantly in Hg 1.0. Hg bug #1066 (fixed in dev)

hg fetch can silently include your password in a commit message. Hg bug #909 Prevented using push hook. (fixed in 1.0)

While you can translate LF <-> CRLF on the client, there is no standard server hook to prevent line ending accidents. Hg RFE #882 (The associated patch can be used without being accepted into Hg proper.) Fixed in Hg 1.0 and installed as push hook on server.

Merging in the presence of named branches can be confusing. Hg bug #756 We will not use named branches for now.

Some operations are unnecessarily slow on Windows. Hg bug #952 (fixed in 1.0)

Reverting a single file in NB can be slow. Hg bug #857 (fixed in 1.0)

The exit code of hg push is unreliable. Hg bug #989 (fixed in 1.0)

Specific error messages from server pre-push hooks are not displayed to an HTTPS client. Hg bug #937 (fixed in 1.0, patch applied to live server)

The order of merge parents for hg fetch is often misleading. Hg RFE #1011 (fixed in 1.0)

Merge restores deleted files if you had locally deleted them but did not commit the delete. Hg bug #988 (fixed in 1.0, working around with server hook to prevent damage from being propagated)

Hg does not prevent you from committing only some files from a merge. Hg bug #1049 (fixed in 1.0)

INotify extension is not bundled in the standard distro. Hg RFE #809 (in 1.0, still discouraged for use)

Diffing some files can be slower in Hg 1.0 than in 0.9.5. Hg bug #1090 (fixed in dev)

Renaming SubSet.java to Subset.java might cause an error for a Windows user trying to update sources. Hg bug #750 (fixed in dev) There are other potential problems with case collisions on Windows. Hg bug #593 (fixed in 1.1)

Misleading display of execute permissions. Hg bug #1042 (fixed in 1.0 or 1.0.1)

Long source filenames can become even longer in the repository: Hg bug #839. Worked around by preventing store paths > 206 chars from being pushed. (fixed in 1.1)

Don't accidentally name a package beginning with con rather than com or the repo may be corrupted for Windows users: Hg bug #793 (fixed in 1.1)

Git-mode diffs (better to look at renames) are not supported in the web interface. Hg bug #1258 (fixed in 1.1)

The INotify extension breaks pulling from a bundle. Hg bug #1436 (fixed in 1.1.1)

Should not be necessary to use a "gatekeeper" just to prevent clients from pulling changesets about to be rejected by server hooks. Hg bug #1321 (fixed in 1.2)

Client certificates are not supported for HTTPS authentication. Hg RFE #643 (supposedly fixed in dev)

Pulling over HTTP can occasionally abort if a complex push is in progress. Hg bug #1320 (fixed in 1.1)

hg fetch is much slower in 1.3. Hg bug #1752 (fixed in dev)

hg verify is reporting some issues in the NB main repository in cnd.repository and etl.project (no known problems resulting from these). Seems to get corrected by HgBug47098e67a01d scripts.

HgBug47098e67a01d

Some operations, especially hg clone -r, fail to use hardlinks aggressively enough; this can result in unnecessarily large repository storage when using multiple repositories. Hg RFE #919 (new relink extension in dev)

You cannot reliably ignore whitespace changes in diffs, e.g. wrapping code in a block. Hg bug #127 (fixed in dev)

Slow performance claimed on Windows when using filters. Hg bug #1384 (no longer reproducible)

hg fetch merges incorrectly when using the inotify extension. Hg bug #921 (no longer reproducible)

Diff hangs on certain patches involving renames. Hg bug #1947 (fixed in dev)

hg heads --active shows heads on closed branches. Hg bug #1893 (fixed in 1.5)

Transplant seems to fail on Windows. Hg bug #1077 (works in 1.3+ when patch.eol=crlf as is recommended)

When using inotify, newly created build products appear as ? (unknown) until you touch .hgignore. Hg bug #884 (fixed in 1.6.1)

Revset subtraction syntax seems broken. Hg bug #2485 (fixed in 1.7)

hg help templates should give immediate help on template keywords and filters. Hg RFE #1486 (fixed in dev)

Rejected: still wanted, but closed without fix

You can download unversioned archives of a snapshot of sources from the web interface, but you cannot download the repository (in the preferred "bundle" format ~ *.hg). Hg RFE #713

Per-repository configuration such as username is not copied during local clone. Hg RFE #918

Revert does not clean up additions. Hg bug #3024

Tagging a server repository (with no checkout) is unnecessarily complicated. Hg bug #916

Hudson + Hg Bugs/Problems

Hudson does not support the Forest extension. Hudson RFE #1143

Hudson has difficulty linking to hgweb. Hudson bug #1038

Hudson should record the Mercurial ID of the repository in the build's metadata somewhere.

