HgMigration
Technical status of CVS -> Mercurial migration
Status and technical details of switching the VCS for NetBeans sources to Mercurial.
Other pages of interest
- HgMigrationPlan and HgMigrationChecklist
- HgTrainingMaterials and HgTrainingDelivery
- HgHowTos
- HgTeamIntegration
Content migration script
In apisupportx/hgimport (in trunk). Expects to migrate ../.., i.e. the directory containing the NetBeans sources. No particular module list is required: check out . (everything). Only a clean checkout (unbuilt sources) is supported. All output is created in apisupportx/hgimport/out.
Daily builder Artifacts should include several demo Hg repos. These will be recreated each day, so they are for testing purposes only.
Status: all standard IDE clusters buildable. Javadoc builds. NBMs for stableuc build. experimental modules build. JNLP builds. Tests build.
Script functions:
- Module paths normalized to abbreviated CNB with common prefixes implied;
e.g. java/j2seproject -> java.j2seproject, core/progress -> api.progress, openide/loaders -> openide.loaders
- regular naming eliminates guesses about where module sources are
- flat structure should be more efficient, easier to search
- look at a preview
- Several repos created, such as:
- main with:
- all modules in standard distro
- perhaps all modules on stable UC
- basic build infrastructure and IDE branding
- test infrastructure
- contrib with additional buildable modules
- would for now expect to be checked out as subdir of main,
perhaps using "forest" extension
- www with */www (most importantly, testwww/www)
- misc with unbuildable modules, ...anything else uncategorized
- truly obsolete stuff should be deleted from CVS trunk first
- .cvsignore files translated to .hgignore automatically
- new file paths should be automatically substituted in relevant files
- */build.xml, */nbproject/project.properties, ...
- need to check carefully for actual file paths,
e.g. not all occurrences of string image should be replaced with o.n.m.image!
- should produce as output a *.diff of all substitutions, for human review
- complete translation table should be saved for future use
- (for example, could be used by Sustaining to create an automatic translation script
for patches created in new structure to be backported or vice-versa)
- files with excessively long Hg storage paths are not imported
Source reorgs are conservative with respect to build system: module file paths change, but not basic build logic.
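The path substitution described above can be sketched roughly as follows. This is only an illustration, not the actual hgimport logic: the function name translate_path and the longest-prefix lookup are assumptions, and the table holds just the three example mappings listed earlier. Matching on whole path components (rather than raw substrings) is one way to avoid replacing an unrelated occurrence of a string such as image.

```python
# Hypothetical translation table; only these three entries come from the
# examples on this page. The real table is produced by the migration script.
TRANSLATION = {
    "java/j2seproject": "java.j2seproject",
    "core/progress": "api.progress",
    "openide/loaders": "openide.loaders",
}


def translate_path(old_path, table=TRANSLATION):
    """Map an old CVS path to its new location via the longest matching module prefix.

    Paths that fall under no known module are returned unchanged.
    """
    parts = old_path.split("/")
    # Try the longest candidate prefix first so that nested modules win.
    for end in range(len(parts), 0, -1):
        prefix = "/".join(parts[:end])
        if prefix in table:
            return "/".join([table[prefix]] + parts[end:])
    return old_path
```

For example, translate_path("core/progress/src/Foo.java") yields "api.progress/src/Foo.java". The same saved table could in principle be inverted to translate patches in the other direction.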
Files with disposition TBD
Performance tests
The directories performance, enterprise/performancetests, j2ee/performancetests, languages/performancetests, mobility/performancetests, uml/performancetests, visualweb/test/performance and web/performancetests are now put in main/performance and subdirectories thereof. Oleg Khokhlov says this is OK.
Migrate history or not?
It is viable to import some CVS history into Hg.
Specifically, trunk revisions of text files will be imported. History of binary files will not be imported (just the last trunk revision). History of CVS branches will not be imported. History of files deleted in the CVS trunk will not be imported.
CVS branch names (e.g. release60) and tags applied to branches (e.g. release60-BLDwhatever) will not be imported. Trunk tags will be imported as Hg tags, though we will probably want to delete most of them later, since most are meaningless daily build tags. (In Hg, unlike CVS, tags are just part of versioned data, so deleted tags could be recovered if they were ever needed.)
This subset of history is enough to perform hg annotate and similar commands on files in the repository. You even get changesets (a single commit spanning multiple files) - though changes to binary files like icons, and file deletions, will not be represented.
Any text files mistakenly committed with CRNL line endings will also be converted to NL in the imported history.
.cvsignore files are not imported at all, since they are collected into a .hgignore file.
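As a rough illustration of collecting per-directory .cvsignore files into one .hgignore, here is a minimal sketch. It is not the migration script's code: the function names are hypothetical, and the choice to emit root-anchored regexp patterns (so that an entry only applies to its own directory) is an assumption about how one might do it.

```python
def _glob_to_regexp(pattern):
    """Translate the simple wildcards * and ? into a regexp; escape everything else."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        elif ch.isalnum() or ch in "/_-":
            out.append(ch)
        else:
            out.append("\\" + ch)
    return "".join(out)


def cvsignore_to_hgignore(cvsignores):
    """Build .hgignore text from a mapping of directory -> .cvsignore contents.

    .cvsignore entries are whitespace-separated and apply to their own
    directory only, so each one is emitted as a regexp anchored at the root.
    """
    lines = ["syntax: regexp"]
    for directory in sorted(cvsignores):
        for entry in cvsignores[directory].split():
            path = entry if directory in ("", ".") else directory + "/" + entry
            lines.append("^" + _glob_to_regexp(path) + "$")
    return "\n".join(lines) + "\n"
```

Calling cvsignore_to_hgignore({"java.j2seproject": "build\n*.class\n"}) produces one header line plus one anchored pattern per entry.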
Importing full history would suffer from two problems:
- The hg convert command aborts if it gets too many revisions at once.
(This might be fixable using some tricks, but would be extra work.)
- The initial repository size would be much bigger,
making network clones slow and disk usage heavy. (Storing line-by-line diffs for text files which are present in the repo anyway only seems to add 110Mb of extra space for the main repository, which is reasonable.)
Windows-specific notes
Standard IDE known to build on Windows, but more testing is in order.
To clone the repo on Windows, you must be sure to use a relatively short base path (up to around 50 characters), because of the unresolved issue with long repository filenames (see Hg bug #839 below).
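To get a feel for why long filenames are a problem, the following sketch approximates how a tracked path grows inside the Hg store. It is a simplification under stated assumptions: only the escaping of uppercase letters and underscores is modelled, other escapes are ignored, and whether the 206-character limit cited for Hg bug #839 counts the data/ prefix and .i suffix is a guess. The function names are hypothetical.

```python
import string

MAX_STORE_PATH = 206  # limit cited for Hg bug #839 on this page


def approx_store_path(path):
    """Approximate the path Hg uses under .hg/store for a tracked file.

    Simplified assumption: uppercase letters become "_" + lowercase,
    "_" becomes "__", and the result is wrapped as data/<name>.i.
    Other escapes applied by Hg (e.g. for reserved characters) are ignored.
    """
    encoded = []
    for ch in path:
        if ch in string.ascii_uppercase:
            encoded.append("_" + ch.lower())
        elif ch == "_":
            encoded.append("__")
        else:
            encoded.append(ch)
    return "data/" + "".join(encoded) + ".i"


def too_long(paths, limit=MAX_STORE_PATH):
    """Return the paths whose approximate store path exceeds the limit."""
    return [p for p in paths if len(approx_store_path(p)) > limit]
```

Since every capital letter costs an extra character, CamelCase Java class names lengthen store paths noticeably, which is why the base path of the clone should be kept short.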
Developers must commit text files with NL line endings. However, CRNL <-> NL translation on Windows is not on by default with Mercurial. It can be enabled by editing the config file (e.g. C:\Mercurial\Mercurial.ini) to turn on cleverencode and cleverdecode mode, which autodetects binary files based on NUL. Seems to work as expected (IDE builds and runs successfully, at least). Repository after import should be free of text files containing CRNL.
Will have an incoming hook on public repositories to reject changesets with bad line endings. Otherwise Windows developers who forget to configure the encoder can cause trouble. (See Hg RFE #882 below.)
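The core check such a hook would need can be sketched as below. This is only the content test, using the same NUL-based binary detection mentioned above; wiring it into an actual Hg hook (iterating over incoming changesets and their files) is not shown, and the function names are hypothetical.

```python
def is_binary(data):
    """Treat content as binary if it contains a NUL byte."""
    return b"\0" in data


def has_bad_line_endings(data):
    """Return True for text content that contains a CR+NL byte sequence."""
    return not is_binary(data) and b"\r\n" in data


def rejected_files(files):
    """Given a mapping of path -> bytes, list the paths a hook would reject."""
    return sorted(path for path, data in files.items() if has_bad_line_endings(data))
```

A hook built on this would refuse the whole push if rejected_files returns anything, and print the offending paths so the developer can fix the encoder configuration.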
Server
Users and authentication
We will want to import usernames (and passwords?) from the current netbeans.org auth DB.
One issue is what email address should be configured for the user as far as Hg is concerned. (This can be set in ~/.hgrc or $REPO/.hg/hgrc.) We want these to be predictable for purposes of disaster recovery, hg churn, etc. Probably jhacker@netbeans.org should be used in case jhacker is the login ID. (Or simply jhacker; Hg does not require an actual email address.) Tricky to enforce. One possibility is to set up an incoming changeset hook on the server which checks for non-*@netbeans.org addresses (or apparent usernames which are not in the database) in new changesets; if it finds any, send mail to those people asking them to configure Hg correctly.
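A minimal sketch of the check such a hook could perform on the author field of new changesets follows. The function names and the accepted author formats are assumptions; fetching the authors from the incoming changesets, looking up the auth database, and sending the reminder mail are left out.

```python
import re

DOMAIN = "netbeans.org"


def login_of(author):
    """Extract the login from an Hg author string, or None if it is not acceptable.

    Accepts "jhacker", "jhacker@netbeans.org" and "J. Hacker <jhacker@netbeans.org>".
    """
    match = re.search(r"<([^<>]+)>", author)
    address = (match.group(1) if match else author).strip()
    if "@" not in address:
        return address or None
    local, _, domain = address.rpartition("@")
    return local if domain.lower() == DOMAIN and local else None


def suspect_authors(authors, known_logins):
    """Return the author strings that should trigger a reminder mail."""
    return [a for a in authors if login_of(a) not in known_logins]
```

Here known_logins stands in for the set of login IDs in the netbeans.org database; any author with a foreign address or an unknown login is reported.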
Commit model
Main proposal is for staged propagation of commits through team branches, with test-based controls to prevent serious problems (e.g. uncompilable sources) in the master repository. Details: MercurialBuildInfrastructurePlan
TBD:
- What is granularity of "team"? 1-20 people?
- Because with Hg you have a local repo, a "team of 1" means:
commit at will, and once a day (or less) do a clean build w/ tests and commit. (In the morning! Not right before leaving work!) But no protection against developers who refuse to work this way.
- Who or what merges to master repo - dev lead? Automated process?
- What is the schedule for pushes to master? ASAP after tests pass?
Every Monday for team #1, Tuesday for team #2, etc.?
- When do bugs get marked fixed in the bug tracker?
When you commit to local repo? When you push to team server? When team pushes to master? Do you put changeset hash in bug database right away, or wait for some commit hook to add it for you?
- Does it really make sense to have a team repo clone,
or should we instead use named branches to do possibly unstable commits?
Commit notifications
Hg comes with a script (notify.py) for sending out change notifications. It does not support sending to different people/aliases based on the paths of files changed, but this can be fixed pretty easily with a small patch to notify.py.
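The path-based routing could look something like the following sketch. It is not the patch to notify.py: the routing table, the mail aliases (all under example.org) and the function name are hypothetical, and only the decision of who gets mail is shown.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: glob pattern on file path -> mail alias.
ROUTES = [
    ("openide.loaders/*", "loaders-commits@example.org"),
    ("java.j2seproject/*", "j2se-commits@example.org"),
]


def recipients_for(changed_files, routes=ROUTES, default="all-commits@example.org"):
    """Return the sorted list of aliases to notify for one changeset.

    Files matching no pattern fall back to the default alias.
    """
    recipients = set()
    for path in changed_files:
        matched = [alias for pattern, alias in routes if fnmatchcase(path, pattern)]
        recipients.update(matched or [default])
    return sorted(recipients)
```

A changeset touching both openide.loaders and an unlisted module would thus be mailed to the loaders alias and to the default alias.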
RSS
Could perhaps have e.g.
http://hg.netbeans.org/rss/main?modules=o.o.fs,o.o.loaders,...
which would provide personalized RSS feeds of changes in certain areas only.
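One way the server side of such a feed might select changesets is sketched below. The function names are hypothetical, and the sketch assumes the requested module names equal the first path component of changed files; how abbreviations such as o.o.fs would be expanded is not specified here and is not handled.

```python
from urllib.parse import parse_qs, urlparse


def requested_modules(feed_url):
    """Extract the comma-separated module list from the modules= query parameter."""
    query = parse_qs(urlparse(feed_url).query)
    values = query.get("modules", [""])
    return {m for m in values[0].split(",") if m}


def filter_changes(changes, modules):
    """Keep the IDs of changesets touching at least one file under a requested module.

    changes is a list of (changeset_id, [file paths]); the module is assumed
    to be the first path component of each file.
    """
    return [
        cid for cid, files in changes
        if any(f.split("/", 1)[0] in modules for f in files)
    ]
```

The feed generator would then render only the changesets returned by filter_changes as RSS items.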
Code reviews
Crucible currently lacks Hg support.
Review Board is open source and has Hg support.
CIA
Should set up NB in CIA. Hg includes an extension to make this easy.
Performance
Repository size
Affects download (~ initial clone) time. Also affects local disk usage, though hardlinks help.
Rename problem
Hg (as of 0.9.5) increases storage whenever you move files. See Hg bug #883 below.
Because of this problem, we need to do big source reorgs (as in migration script) now, not later.
Binaries
Upgrading e.g. libs/xerces/external/xerces-2.8.0.jar to libs/xerces/external/xerces-2.8.1.jar will result in both binaries being kept in revlogs in the repo. Over time repo size will grow, perhaps substantially. (Initial binary size: 100Mb out of 500Mb total for main repo. In CVS repo, **/external contains 227Mb of historical versions. All **/*.jar/zip, dating to before external dirs, are 2563Mb!)
Solution: HgExternalBinaries
Scan time
With a lot of files, a simple hg status can take some time.
INotify
Linux users can use the INotify extension. Huge speed improvement for certain operations. But still buggy and not yet recommended.
You will however need to run:
sudo sh -c 'echo 32768 > /proc/sys/fs/inotify/max_user_watches'
to use it on the "main" repository due to the large number of directories. The above works for one session; to make it permanent:
sudo sh -c 'echo "fs.inotify.max_user_watches=32768" >> /etc/sysctl.conf'
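To judge whether the limit is high enough for a given checkout, one can compare the number of directories with the current kernel setting. The sketch below assumes roughly one watch per directory, which is how inotify generally works but may not match exactly what the extension does; the function names are hypothetical.

```python
import os


def count_directories(root):
    """Count all directories under root, including root itself."""
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        total += 1
    return total


def current_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Read the kernel's per-user inotify watch limit (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())
```

If count_directories on the checkout comes close to or exceeds current_watch_limit(), the limit needs to be raised as shown above.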
A Windows port would be great.
Local disk space
Hardlinks
For Unix users, local clones use hardlinks to reuse .hg storage. The actual checkout must be a full copy (in case editors do not create new inodes when saving changes). Should also work for Windows users if using NTFS.
Network clones of course cannot use hardlinks. It is possible to minimize disk usage (and clone time!) when you are creating a local clone of a server clone (e.g. a release repo) and you already have a local clone of master.
Mirrors
Since Hg is fully distributed, it is possible to set up mirrors easily. After cloning from a mirror, you need to edit .hg/hgrc to set the default parent repo.
You can create a local mirror of master on your own computer, too. (Use hg clone -U to skip checkout of tip.) This can be useful: do all pulls into this local mirror, and do most clones starting from it.
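Editing the default parent after cloning from a mirror can be scripted. The sketch below rewrites the default entry in the [paths] section of .hg/hgrc using Python's configparser. It is only suitable for simple hgrc files (comments are dropped on rewrite, and constructs that configparser does not understand will fail), and the function name is hypothetical.

```python
import configparser
import os


def set_default_parent(repo_dir, url):
    """Point the [paths] default entry of repo_dir/.hg/hgrc at url.

    Note: configparser drops comments when it rewrites the file.
    """
    hgrc = os.path.join(repo_dir, ".hg", "hgrc")
    config = configparser.ConfigParser(interpolation=None)
    config.read(hgrc)  # a missing file is silently treated as empty
    if not config.has_section("paths"):
        config.add_section("paths")
    config.set("paths", "default", url)
    with open(hgrc, "w") as f:
        config.write(f)
```

After cloning from a mirror, calling set_default_parent on the new clone with the master repository URL makes later pulls and pushes go to master by default.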
Training
TBD.
Collect URLs for useful summaries, tips & tricks, etc. from other teams.
Need list of best practices. At least, unlike CVS, it seems that best practices should actually work once you understand them.
IDE Support
Documentation
Stuff from other Sun teams
Not externally available at the moment.
Scenarios
Actual tested sequences of Hg operations to accomplish certain tasks.
TBD. Need to collect and test one by one, incl. performance.
Repeated merges in repo clones and named branches
Make sure merge tracking is OK.
Merge from server clone to server clone
Can you merge changes from one server-side clone to another? Do you have to make a separate clone of each?
Seems to work fine and you can do a local clone. You need to use hg clone -r using the revision of master in effect at the time the server clone was made (use tags!!), then update from it. You do not get hardlinks between .hg dirs in this case (see RFE below).
Hg Bugs/Problems
P1: fix is critical for migration to succeed
The server should reject bad merges. Hg RFE #1038
Hg does not let you do a clone of part of the repo, or check out part of a full repo. Hg RFE #105 Hg RFE #515
P2: fix will likely be wanted for continued usage
Hg increases storage whenever you move files. Hg bug #883
Keeping binaries in the repository would mean a noticeable increase in repo size after every update to a new version of a third-party library. Currently working around; details: ExternalBinaries. But it would be much preferable to be able to logically keep binaries in the repository yet avoid pulling them until they are really needed: TrimmingHistory
There is no simple way to see what a merge changeset changed beyond what a "simple" automated merge would have done. Hg RFE #981
Frequent failures to push over HTTPS on Windows. Hg bug #1003
Merging a case rename on Windows can cause file deletion. as yet unfiled
Transplant seems to fail on Windows. Hg bug #1077
hg clean does not work correctly on Mac OS X. Hg bug #1097
There is no command to back out a bad merge. Hg RFE #1010
hg import does not work for some people. Hg bug #961
No extension in standard distro with "shelve" functionality. hgattic seems to work well, but few developers are going to download and enable a third-party extension.
hg revert -r ... on an uncommitted merge should issue a very strict warning. (The warning should explain to use hg up -C to start over.) Run by an inexperienced user, it can result in other changesets being "silently" backed out. as yet unfiled
P3: should be fixed as time permits
When using inotify, newly created build products appear as ? (unknown) until you touch .hgignore. Hg bug #884
The Windows setup wizard should offer to enable CRLF translation. Hg RFE #923
Semantics of .hgignore w.r.t. directories is unclear. Hg bug #886 Various other .hgignore problems have been encountered, including Hg bug #951 (now fixed in dev)
There is no simple way to see the patch that would result from all outgoing changesets. Hg RFE #28 and also relates to Hg RFE #219
Tagging a server repository (with no checkout) is unnecessarily complicated. Hg bug #916
You are not stopped from transplanting the same changeset twice in different clones. Hg bug #1210
Hg will try to delete an empty dir during update, even if that is your CWD, which can result in odd errors. as yet unfiled
kdiff3 does not work out of the box. Hg bug #1118
If you forget to add new files, neither hg ci nor hg fetch will warn you. Hg bug #1318
The parents log keyword requires --debug for full output. Hg bug #1435
hg transplant -s $repo $rev failed to import http://hg.netbeans.org/jet-main/rev/d1f97dcec7ca into core-main; complained that the revision could not be found. hg in -r $rev $repo worked fine. as yet unfiled
strace hg di -r release65_base o.n.bootstrap/src/org/netbeans/TopSecurityManager.java shows every file in the checkout being stat'd, for no apparent reason. as yet unfiled
hg bisect does not work well with merges. E.g. run
hg bisect -b de345c4fe13f
hg bisect -g be85c2e1048d
and then mark any revision good if it contains nbbuild/test/unit/src/org/netbeans/nbbuild/data/Category.png and you wind up with c429f829c9f3 being blamed, though this is not a descendant of the good revision. as yet unfiled
Incorrect merge of file deletion & resurrection. Hg bug #1740 (Probably also cause of: merging produced a spurious conflict (remote changed, local deleted) for main #8041de5d29cd. Something involving repeated merges of a backout.)
hg heads --active shows heads on closed branches. Hg bug #1893
hg ann nbbuild/build.xml fails. Hg bug #1924
hg push sometimes requires --force when it should not. Hg bug #1974
P4: nice to have
You can download unversioned archives of a snapshot of sources from the web interface, but you cannot download the repository (in the preferred "bundle" format ~ *.hg). Hg RFE #713
No equivalent to INotify is available for Windows users.
Per-repository configuration such as username is not copied during local clone. Hg RFE #918
Performance of hg convert is poor when you want to transfer a module's history to another repository (see HgHowTos#TransferAModuleSHistoryToAnotherRepository). Hg RFE #991
diff --reverse does not seem to work. as yet unfiled
hg ci visualweb.* fails if just some, but not all, visualweb.* modules are modified. Hg bug #1925
If you update a dirty checkout and .hgignore changes as a result, it is possible for previously ignored files (e.g. build products from a renamed module) to become no longer ignored, yet there is no warning that this has happened or that it might be safe to delete these files. as yet unfiled
Resolved: already fixed in development versions of Hg, or no longer relevant to NB
CRLF autodetection of binary files changed unpleasantly in Hg 1.0. Hg bug #1066 (fixed in dev)
hg fetch can silently include your password in a commit message. Hg bug #909 Prevented using push hook. (fixed in 1.0)
While you can translate LF <-> CRLF on the client, there is no standard server hook to prevent line ending accidents. Hg RFE #882 (The associated patch can be used without being accepted into Hg proper.) Fixed in Hg 1.0 and installed as push hook on server.
Merging in the presence of named branches can be confusing. Hg bug #756 We will not use named branches for now.
Some operations are unnecessarily slow on Windows. Hg bug #952 (fixed in 1.0)
Reverting a single file in NB can be slow. Hg bug #857 (fixed in 1.0)
The exit code of hg push is unreliable. Hg bug #989 (fixed in 1.0)
Specific error messages from server pre-push hooks are not displayed to an HTTPS client. Hg bug #937 (fixed in 1.0, patch applied to live server)
The order of merge parents for hg fetch is often misleading. Hg RFE #1011 (fixed in 1.0)
Merge restores deleted files if you had locally deleted them but did not commit the delete. Hg bug #988 (fixed in 1.0, working around with server hook to prevent damage from being propagated)
Hg does not prevent you from committing only some files from a merge. Hg bug #1049 (fixed in 1.0)
INotify extension is not bundled in the standard distro. Hg RFE #809 (in 1.0, still discouraged for use)
Diffing some files can be slower in Hg 1.0 than in 0.9.5. Hg bug #1090 (fixed in dev)
Renaming SubSet.java to Subset.java might cause an error for a Windows user trying to update sources. Hg bug #750 (fixed in dev) There are other potential problems with case collisions on Windows. Hg bug #593 (fixed in 1.1)
Misleading display of execute permissions. Hg bug #1042 (fixed in 1.0 or 1.0.1)
Long source filenames can become even longer in the repository: Hg bug #839. Worked around by preventing store paths > 206 chars from being pushed. (fixed in 1.1)
Don't accidentally name a package beginning with con rather than com or the repo may be corrupted for Windows users: Hg bug #793 (fixed in 1.1)
Git-mode diffs (better to look at renames) are not supported in the web interface. Hg bug #1258 (fixed in 1.1)
The INotify extension breaks pulling from a bundle. Hg bug #1436 (fixed in 1.1.1)
There is no hg help templates topic to get immediate help on template keywords and filters. Hg RFE #1486 (being fixed in dev)
Should not be necessary to use a "gatekeeper" just to prevent clients from pulling changesets about to be rejected by server hooks. Hg bug #1321 (fixed in 1.2)
Client certificates are not supported for HTTPS authentication. Hg RFE #643 (supposedly fixed in dev)
Pulling over HTTP can occasionally abort if a complex push is in progress. Hg bug #1320 (fixed in 1.1)
hg fetch is much slower in 1.3. Hg bug #1752 (fixed in dev)
hg verify is reporting some issues in the NB main repository in cnd.repository and etl.project (no known problems resulting from these). Seems to get corrected by HgBug47098e67a01d scripts.
Some operations, especially hg clone -r, fail to use hardlinks aggressively enough; this can result in unnecessarily large repository storage when using multiple repositories. Hg RFE #919 (new relink extension in dev)
You cannot reliably ignore whitespace changes in diffs, e.g. wrapping code in a block. Hg bug #127 (fixed in dev)
Slow performance claimed on Windows when using filters. Hg bug #1384 no longer reproducible
hg fetch merges incorrectly when using the inotify extension. Hg bug #921 no longer reproducible
Diff hangs on certain patches involving renames. Hg bug #1947 (fixed in dev)
Hudson + Hg Bugs/Problems
Hudson does not support the Forest extension. Hudson RFE #1143
Hudson has difficulty linking to hgweb. Hudson bug #1038
Hudson should record the Mercurial ID of the repository in the build's metadata somewhere.

