This proposal is intended to supersede MercurialBuildInfrastructurePlan and address the questions raised in MercurialBuildInfrastructurePlanQa.
Currently people push to the main repository much as they did under CVS. This creates several problems:
Mercurial never merges implicitly, so any push to a repository must be preceded by a merge if anyone else has pushed changes since the revision the developer based their work on. While the fetch command (provided by the fetch extension) makes merging fairly simple, during peak hours it is sometimes necessary to fetch more than once before a push succeeds.
This contention is caused partly by people being too eager to push right after committing, and partly by too many people pushing to one central location.
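The fetch-then-push cycle, including the retries needed under contention, can be sketched as follows. This is a minimal sketch under stated assumptions: `push_with_fetch`, the `run` callable (which executes an hg command and returns its exit status) and `max_attempts` are hypothetical names. It relies on hg fetch from the fetch extension, which must be enabled, and on hg push returning 0 when it pushed and 1 when there was nothing to push.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes hg arguments and returns the command's exit status.
Runner = Callable[[Sequence[str]], int]


def run_hg(args: Sequence[str]) -> int:
    """Run `hg ARGS...` in the current directory and return its exit status."""
    return subprocess.run(["hg", *args]).returncode


def push_with_fetch(main_url: str, run: Runner = run_hg, max_attempts: int = 3) -> bool:
    """Fetch from main, then push to it, retrying when another push got there first.

    Returns True once a push succeeds (or there is nothing to push) and False if
    fetch fails or every attempt is rejected.
    """
    for _ in range(max_attempts):
        # hg fetch (fetch extension) pulls, merges and commits the merge in one step.
        if run(["fetch", main_url]) != 0:
            return False  # e.g. merge conflicts or a network error: needs a human
        status = run(["push", main_url])
        if status in (0, 1):  # hg push: 0 = pushed, 1 = nothing to push
            return True
        # Any other status (such as an abort because the push would create a new
        # remote head after someone else pushed) means: fetch again and retry.
    return False
```

During peak hours the loop may go around several times, which is exactly the repeated fetching described above.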
It is lamentably common for someone not only to commit a bogus change that includes uncompilable source code, but also to push it immediately, breaking the repository for everyone. Furthermore, most people apparently pay no attention to broken-build emails and often leave work right after pushing, so that someone else has to fix or back out their bad change.
This behavior has never been OK, and while using CVS we had written policies forbidding it. But it seems no one cares. Stricter enforcement is needed for such a large group of developers.
Similarly, changes commonly cause some tests to fail. In some cases this is hard for the developer to avoid, since the continuous builders run many tests that developers do not, including some that require nontrivial setup (e.g. GlassFish, WTK).
While not as severe as a broken build, an unstable build is of questionable quality, and breakages caused by one team should not hurt other teams.
The obvious solution to these problems is to adopt per-team workspaces. Each developer would then push only to his or her own team's repo. If that repo is broken by a change, only fellow team members are affected, and they are the people most likely to pressure the culprit to fix the problem, or to know what to do if the culprit has left for the day.
An automated system would then periodically pick up changes from various teams and integrate them into the main repository after verifying that they do not cause any apparent problems.
There should be a process running on a single server (say, deadlock.netbeans.org) which repeatedly runs the following:
The suggested setup in Hudson is to treat the script above as a project and schedule it to run every ten minutes (for example). If there was nothing to do and a run finished in one minute, Hudson would wait nine minutes before trying again; this avoids constant network access even over holidays, when nothing is changing. If there were changes to process, the run would surely take much longer than the interval; Hudson would then simply queue up one new instance of the project, which would start running as soon as the previous one finished.
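The timing described above can be checked with a small simulation. This is a sketch of the assumed scheduling semantics (a timer fires every `period` minutes, and a trigger arriving while a run is in progress queues at most one pending run), not Hudson's actual scheduler; `simulate_starts` and `next_trigger` are hypothetical helpers.

```python
import math
from typing import List


def next_trigger(t: float, period: float) -> float:
    """First timer trigger strictly after time t (triggers fire at 0, period, 2*period, ...)."""
    return (math.floor(t / period) + 1) * period


def simulate_starts(durations: List[float], period: float = 10.0) -> List[float]:
    """Return the start time, in minutes, of each successive run of the job.

    Assumed semantics: an idle job starts at the next trigger; a trigger that
    fires while a run is in progress queues one pending run (later triggers are
    merged into it), and that run starts the moment the current one finishes.
    """
    starts: List[float] = []
    start = 0.0  # the first trigger fires at time 0
    for duration in durations:
        starts.append(start)
        finish = start + duration
        trigger = next_trigger(start, period)
        # A trigger during the run means a queued instance starts right at `finish`;
        # otherwise the job sits idle until the next trigger.
        start = finish if trigger <= finish else trigger
    return starts
```

For instance, run durations of 1, 1, 25, 1 and 1 minutes give start times 0, 10, 20, 45 and 50: the long run starting at minute 20 absorbs the triggers at minutes 30 and 40 into a single queued run, which starts as soon as it finishes at minute 45.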
Under this scenario, if everyone pushes to team repos, main should never get broken. In practice, there are reasons for some direct pushes to main, and these will occasionally cause build errors. In such a case, the merge server simply does nothing until main is fixed by another direct push.
There could still be a read-only main-golden repository which is always guaranteed to be buildable. But it is of no use to the merge server, which needs to sync with main itself.
Fewer manual merges will likely be necessary in such a scenario. A developer will only have to merge with changes made by other team members and with the few automated pulls from main each day. Even if most developer pushes still require a merge, it is at least unlikely that they will require repeated merges to succeed.
Each successful team-to-main integration will create one or two automated merge changesets.
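The following hypothetical sketch models one round of the merge server using only the behavior described in this proposal: skip the round while main is broken, try to integrate each team repo, and send main back to the teams. The class `MergeCycle` and its three callbacks are invented for illustration; a real server would back them with hg commands and the build. Sending main back only to teams whose integration succeeded is a design choice of this sketch, so that a team repo that still needs a manual merge does not receive a second head.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MergeCycle:
    """One round of a hypothetical merge server; every callback is an assumption.

    main_is_healthy(): does main currently build and pass the quick tests?
    try_integrate(team): merge the team repo with main, build and test the
        result, and push it to main on success; returns whether it succeeded.
    push_main_to(team): send main's changesets back out to the team repo.
    """
    teams: List[str]
    main_is_healthy: Callable[[], bool]
    try_integrate: Callable[[str], bool]
    push_main_to: Callable[[str], None]
    log: List[str] = field(default_factory=list)

    def run_once(self) -> Dict[str, bool]:
        results: Dict[str, bool] = {}
        if not self.main_is_healthy():
            # Main was broken by a direct push: do nothing until it is fixed.
            self.log.append("main broken; skipping round")
            return results
        for team in self.teams:
            results[team] = self.try_integrate(team)
            self.log.append(f"{team}: {'integrated' if results[team] else 'rejected'}")
        # Send main (now containing every successful integration) back only to
        # teams that integrated; the others must first merge with main by hand.
        for team in self.teams:
            if results[team]:
                self.push_main_to(team)
        return results
```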
For this proposal, the only real requirements on team repositories are that they be accessible from the builder machine, which generally means hosted somewhere outside SWAN (Sun's internal network), and that an account on the builder machine have both pull and push access to them.
Most teams would likely prefer to host their repositories on hg.netbeans.org for visibility and consistent developer access control, though a team repo could be hosted somewhere geographically closer to the team if necessary. hg.netbeans.org should also be backed up regularly, though for Mercurial this is not much of an advantage, since every clone already holds a complete copy of the repository history.
The team repo should use the same pre-push hooks as main does, so that for example a developer accidentally pushing a file using CRLF line endings will be blocked immediately at the team level.
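The core of such a CRLF check might look like the sketch below. It is an illustration under assumptions: the incoming files are given as a hypothetical name-to-bytes mapping, and wiring the check into Mercurial (for example as a pretxnchangegroup hook that collects file contents from the incoming changesets) is not shown. `files_with_crlf` and `reject_crlf` are invented names.

```python
from typing import Dict, List


def files_with_crlf(files: Dict[str, bytes]) -> List[str]:
    """Return, sorted, the names of text files that contain CRLF line endings.

    A file containing a NUL byte is treated as binary and skipped, a common
    heuristic for telling binary files from text.
    """
    return sorted(
        name for name, data in files.items()
        if b"\0" not in data and b"\r\n" in data
    )


def reject_crlf(files: Dict[str, bytes]) -> bool:
    """Hypothetical hook body: report offenders and return True to reject the push.

    Returning a true value to signal failure follows the convention Mercurial
    uses for in-process Python hooks.
    """
    offenders = files_with_crlf(files)
    for name in offenders:
        print(f"rejected: {name} uses CRLF line endings")
    return bool(offenders)
```

Running the same check on both the team repos and main means a bad file is stopped at whichever repository it reaches first.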
Team repos can, and are encouraged to, have their own continuous builds. These need not run on the same server (in fact, it might be necessary to run them elsewhere for performance reasons).
The primary purpose of the merge server described above is to pick up changes from teams to pull into main and to push changes in main back out to teams; checking that the result compiles and passes basic tests is merely a necessity for that. A team continuous build, by contrast, would exist primarily to inform developers of mistakes early. It could therefore run less stable tests, or slower tests of special interest to the team. For example, the CND team's continuous build should probably run CND functional tests in GUI mode.
The merge server's correctness does not depend on team continuous builds being operational or checking anything in particular, so it is acceptable for teams to maintain their own continuous builds if they wish to retain more control over them.
The merge server is not intended to replace automated builds that run more complete test suites. In particular, its builds must be fairly quick and fully reproducible.
QE will likely still want to run other automated builds which go through comprehensive test suites and report on the status of individual tests across builds, as well as across different platforms and operating systems.
Tests in a full suite are not necessarily 100% reproducible: GUI-mode tests are usually not completely reliable, and even some unit tests rely on thread scheduling or garbage collection behavior.
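One rough way to screen candidate tests for the merge server is to run each test several times and see whether its outcome ever changes. This sketch assumes a hypothetical `run_test` callable that runs one test and returns whether it passed; `classify_test` and the run count are likewise illustrative choices, not part of any existing tool.

```python
from collections import Counter
from typing import Callable


def classify_test(run_test: Callable[[], bool], runs: int = 5) -> str:
    """Run a test `runs` times and classify how reproducible its outcome is.

    Returns "stable-pass", "stable-fail" or "flaky" (outcomes differed).
    Agreement over a few runs does not prove a test is reproducible; it only
    filters out tests that are obviously unreliable.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    outcomes = Counter(bool(run_test()) for _ in range(runs))
    if len(outcomes) > 1:
        return "flaky"
    return "stable-pass" if outcomes[True] else "stable-fail"
```

Only tests that come out "stable-pass" would be reasonable candidates; the others belong in the fuller QE or team builds.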
Examples of quick and repeatable tests suited for the merge server:
If a team keeps its repo in good condition, fixes pushed to it should propagate to main within a few hours, which lets QE validate bug fixes using continuous builds from main. It would also be possible to publish binaries from unsuccessful team merge attempts that succeeded in building the IDE but failed some subsequent tests; this would also let QE quickly validate fixes made in an unstable team repo.
It may happen that a change which is perfectly fine inside a team repo is unusable when merged with main. For example, a team member might have begun using some API which was just changed by another team.
Changes in a team repo may also conflict textually with changes in main, so that an automated merge cannot succeed, let alone pass a build and tests.
In either case, it is simply up to the team members to resolve the problem. Sometimes a locally safe change in the team repo will also make it safe to merge with main. More commonly, though, someone will need to actually merge with main and deal with the conflict by hand: pull from main, merge, fix the sources, commit, and push back to the team repo. The merge server's next check of this team's repo will verify whether the conflict was resolved properly. Since Mercurial "knows" which changesets have already been merged into a repo, it should not be a problem for a team repo to have independently merged in changes from main.
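The manual resolution steps map onto ordinary hg commands. The sketch below only builds the command lines, to be run inside a clone of the team repo; the helper name `merge_with_main_steps` and the example URLs are hypothetical. The hg resolve step is there because hg commit refuses to commit a merge while files are still marked unresolved.

```python
from typing import List, Tuple

Command = List[str]


def merge_with_main_steps(main_url: str, team_url: str, message: str) -> Tuple[List[Command], List[Command]]:
    """Return the hg commands run before and after the manual editing step.

    Meant to be run inside a clone of the team repo.
    """
    before_editing = [
        ["hg", "pull", main_url],  # bring main's new changesets into the clone
        ["hg", "merge"],           # start the merge; conflicting files stay unresolved
    ]
    # ...the developer now fixes the sources by hand...
    after_editing = [
        ["hg", "resolve", "--mark", "--all"],  # mark the hand-fixed files resolved
        ["hg", "commit", "-m", message],       # record the merge changeset
        ["hg", "push", team_url],              # publish it for the next merge-server round
    ]
    return before_editing, after_editing
```

Each command list can be passed to subprocess.run; note that hg merge exits with status 1 when it leaves unresolved files, which is expected here rather than a failure.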