(See JavaScanningTest for 7.1 measurements)
I'm going to test Java scanning in a Beta Dev build of NetBeans IDE 7.0 and will update these notes as I go. I'm starting with a clean userdir and with this dev build:
Product Version = NetBeans IDE Dev (Build 101129-b6130c56ca59) (#b6130c56ca59)
Operating System = Linux version 2.6.35-23-generic running on amd64
Java; VM; Vendor = 1.6.0_22; Java HotSpot(TM) 64-Bit Server VM 17.1-b03;
Runtime = Java(TM) SE Runtime Environment 1.6.0_22-b04
Java Home = /usr/lib/jvm/java-6-sun-220.127.116.11/jre
and I'm running the IDE with
as suggested in issue 177950 to get more detailed logging.
Test 1 - opening and closing projects
I'm following instructions in issue 169252 and starting with opening web.jsf project. Initial scanning took:
Complete indexing of 610 source roots took: 644022 ms (New or modified files: 18148, Deleted files: 0) [Adding listeners took: 1747 ms]
Beta2: Complete indexing of 605 source roots took: 584011 ms (New or modified files: 18203, Deleted files: 0) [Adding listeners took: 1114 ms]
The number itself does not say anything. It may be perfectly OK. It is relative and that's not what I'm after. web.jsf has quite a big list of classpath dependencies, and scanning all of them might just need that much time. I will test that assumption later. Let's restart the IDE and see how long it takes to reload the indexed info. Ideally that should be quick:
Complete indexing of 610 source roots took: 279685 ms (New or modified files: 0, Deleted files: 0) [Adding listeners took: 1952 ms]
Beta2: I'm quite consistently getting one of two times, either
T 000:00:52.288 T 000:00:51.793 T 000:00:40.538
or
T 000:01:38.709 T 000:01:36.411 T 000:01:38.702 T 000:01:51.139
It is much better in Beta2, but there seems to be an issue that randomly makes the indexing take twice as long.
Well, that's not a good result. It's almost half the time of full scanning. Let's 'Go to Type' AntProjectHelper and open it. The response is instant. Great. CTRL+SHIFT+1 and open project.ant module. It is one of the dependencies so no scanning should be necessary:
Complete indexing of 2 source roots took: 4842 ms (New or modified files: 40, Deleted files: 0) [Adding listeners took: 9 ms]
newRootsToScan(2)=
file:/home/dev/main/project.ant/antsrc/
file:/home/dev/main/project.ant/test/unit/src/
Pretty good. Let's close both projects. For some reason, closing both projects triggers scanning although the IDE is now completely empty: no project, no file in the editor, everything is closed. Scanning took:
Complete indexing of 611 source roots took: 248263 ms (New or modified files: 0, Deleted files: 0) [Adding listeners took: 5 ms]
Beta2: Scanning is much faster but still happens:
Resolving dependencies took: 2,045 ms
Complete indexing of 44 binary roots took: 205 ms
Complete indexing of 419 source roots took: 9893 ms (New or modified files: 0, Deleted files: 0) [Adding listeners took: 550 ms]
That's wrong and useless. First, all these projects were scanned a couple of minutes ago and I have not changed a single file, so the IDE should not need to recheck all of them again. Second, I closed the projects, so I will not need this parsing information. From the logging (see log1) it looks like PATHS_REMOVED was correctly received, but for some reason all previous dependencies are kept and rescanned.
Now, let's try to reopen the closed projects. The IDE was not restarted, so it should know that all source folders were scanned about 1 minute ago and are up to date. Let's try Open Recent Project: project.ant. Scanning took (the label changed after a while to "Refreshing indicies"):
Complete indexing of 118 source roots took: 31835 ms (New or modified files: 0, Deleted files: 0) [Adding listeners took: 233 ms]
Beta2: Scanning is much faster but still happens: Complete indexing of 75 source roots took: 3219 ms (New or modified files: 0, Deleted files: 0)
which is followed by a ton of INFO exceptions. I'm not sure how relevant they are, but see the attachment (exp1); they finished with:
Finished RefreshCifIndices@6bde06[followUpJob=false, checkEditor=false indexer=TaskListIndexer/2 () in 21,000 ms with result Done
Let's also reopen web.jsf:
Complete indexing of 493 source roots took: 196282 ms (New or modified files: 0, Deleted files: 0) [Adding listeners took: 598 ms]
I know that web.jsf has lots of dependencies, but the figures for closing and reopening projects are quite shocking, to the point of being unusable: about 4 minutes of scanning after the projects were closed and about 4 minutes after they were reopened. That explains why I see scanning progress so frequently. I had thought that, to make the IDE and scanning faster, it would be better to close projects I do not need, reopen the ones I work on, and keep the number of open projects between 5 and 10. But for projects like web.jsf this does not make any difference, and for the NetBeans EE team most of the projects we work with have dependencies similar to web.jsf.
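The summary lines quoted in this test can be compared mechanically rather than by eye. The following is a minimal sketch, assuming the log line keeps the "Complete indexing of N source roots took: X ms (...)" shape shown above; the class `IndexingLogLine` and its members are hypothetical helpers for these notes, not part of NetBeans.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a NetBeans API): pulls the figures out of one indexing summary line. */
final class IndexingLogLine {
    // Assumed shape, copied from the log output quoted in these notes.
    private static final Pattern SUMMARY = Pattern.compile(
            "Complete indexing of (\\d+) source roots took: (\\d+) ms "
            + "\\(New or modified files: (\\d+), Deleted files: (\\d+)\\)");

    final int roots;
    final long millis;
    final int modifiedFiles;
    final int deletedFiles;

    private IndexingLogLine(int roots, long millis, int modifiedFiles, int deletedFiles) {
        this.roots = roots;
        this.millis = millis;
        this.modifiedFiles = modifiedFiles;
        this.deletedFiles = deletedFiles;
    }

    /** Returns the parsed figures, or empty if the line is not a source-root summary. */
    static Optional<IndexingLogLine> parse(String line) {
        Matcher m = SUMMARY.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new IndexingLogLine(
                Integer.parseInt(m.group(1)),
                Long.parseLong(m.group(2)),
                Integer.parseInt(m.group(3)),
                Integer.parseInt(m.group(4))));
    }

    /** Average time spent per source root, in milliseconds. */
    double millisPerRoot() {
        return (double) millis / roots;
    }

    public static void main(String[] args) {
        IndexingLogLine initial = parse("Complete indexing of 610 source roots took: 644022 ms "
                + "(New or modified files: 18148, Deleted files: 0) [Adding listeners took: 1747 ms]").get();
        IndexingLogLine restart = parse("Complete indexing of 610 source roots took: 279685 ms "
                + "(New or modified files: 0, Deleted files: 0) [Adding listeners took: 1952 ms]").get();
        // Ratio of the up-to-date check after restart to the initial full scan.
        System.out.println("restart / initial = " + (double) restart.millis / initial.millis);
        System.out.println("ms per root after restart = " + restart.millisPerRoot());
    }
}
```

For the two dev-build lines above, the ratio comes out at roughly 0.43, which matches the "almost half the time of full scanning" remark.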
Test 2 - Direct versus Indirect dependencies
Summary of the problem
Let me start from the beginning just to double check my understanding of the problem. Please correct me where necessary.
Let's say, for example, that:
- compilation classpath of Module A contains Module B; and
- compilation classpath of Module B contains Module C; and
- compilation classpath of Module C contains Jar D;
that means that:
- B is a direct dependency of A; and
- C is a direct dependency of B; and
- D is a direct dependency of C; and
- C and D are indirect dependencies of A; and
- D is an indirect dependency of B
Let's say we opened project A in the IDE. Both the direct and the indirect classpath are indexed, that is A, B, C, D.
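To make the terminology concrete, here is a minimal sketch of the A, B, C, D example as a small graph, with indirect dependencies computed as "everything reachable minus the direct ones". The class `ClasspathGraph` and its methods are hypothetical names for illustration only; this is not how the IDE models classpaths.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of compilation classpaths: root name -> roots on its classpath. */
final class ClasspathGraph {
    private final Map<String, List<String>> direct = new LinkedHashMap<>();

    /** Declares that the compilation classpath of {@code root} contains {@code on}. */
    ClasspathGraph depends(String root, String... on) {
        direct.put(root, Arrays.asList(on));
        return this;
    }

    Set<String> directDeps(String root) {
        return new LinkedHashSet<>(direct.getOrDefault(root, Collections.<String>emptyList()));
    }

    /** Everything reachable from {@code root} (direct and indirect), excluding {@code root} itself. */
    Set<String> allDeps(String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(directDeps(root));
        while (!queue.isEmpty()) {
            String next = queue.poll();
            // seen.add returns false for already visited roots, which also guards against cycles.
            if (!next.equals(root) && seen.add(next)) {
                queue.addAll(directDeps(next));
            }
        }
        return seen;
    }

    /** Reachable roots that are not on the direct classpath of {@code root}. */
    Set<String> indirectDeps(String root) {
        Set<String> result = allDeps(root);
        result.removeAll(directDeps(root));
        return result;
    }

    public static void main(String[] args) {
        ClasspathGraph g = new ClasspathGraph()
                .depends("A", "B")
                .depends("B", "C")
                .depends("C", "D");
        System.out.println("direct deps of A:   " + g.directDeps("A"));
        System.out.println("indirect deps of A: " + g.indirectDeps("A"));
        System.out.println("indirect deps of B: " + g.indirectDeps("B"));
    }
}
```

With project A open, the current behaviour corresponds to indexing A plus `allDeps("A")`, while the "direct only" proposal corresponds to A plus `directDeps("A")`.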
Jesse's argument is that only direct classpath indexing is necessary. He says "Accuracy is important (1) for code actually in open projects, (2) for signatures which might affect error badges on open projects (i.e. direct deps)". I agree with that. If Module A is opened, why should Modules C and D be scanned when the user has not shown any intention to use them?
Honza and Tomas's argument is that scanning of indirect dependencies is necessary. Two examples were given:
- Example 1) If a class from a direct dependency is opened in the IDE, for example class Foo from Module B, then Java features like Code Completion, Goto Source, etc. require that all direct dependencies of Module B (that is, A's indirect dependencies) are indexed. There are a couple of possible solutions:
- Answer A) Module B is not opened in the IDE, and therefore Goto Source, Code Completion, etc. may not work properly and the user has to open Module B first. I think this answer is wrong: it would force the user to do steps which the IDE should do automatically, so I do not consider this option.
tzezula: I agree this is not an option as it makes the IDE usability much worse.
- Answer B) Trigger scanning of Module B's direct dependencies when the first class from Module B is opened in the editor.
tzezula: There are two problems with this solution. The first is the semi-"random" scans. The first scan is shorter, but further scans are invoked each time you navigate to a type in a direct dependency, which makes users complain that the IDE does nothing but scan (users are not complaining about the length of the initial scan but about the strange scans invoked during IDE usage). The second problem is the total number of scanned roots, which may be twice as big as it is currently. Java is a statically, strongly typed language, so when you navigate to a directly dependent module you need to scan its direct dependencies (the indirect dependencies of the top project) and completely rescan the module you navigated into, to fix the error types caused by missing dependencies.
jlahoda: (re tzezula) The point exactly: currently, after the scan is done, navigation should not cause further scans. With the proposed change, navigation could cause scanning (possibly a very long one, if the newly opened file belongs to a big source root or to a source root with many direct dependencies). Further navigation may or (more likely) may not work during this scanning, as there may or may not be enough data to perform the navigation. So this would basically force the user to interrupt work "randomly" and busy-wait for the IDE.
- Example 2) GoTo Type would not offer classes from indirect dependencies. Honza says "I, for one, quite often open classes from un-opened projects via Go to Type, and I do not really care if the class is in direct dependency or in indirect dependency. (I typically try to type the class name, and in the rare case the containing source root is not accessible transitively from the opened projects, I open it)". I do this frequently myself, but I consider it more of a workaround for a missing feature: what I really want is to tell the IDE where all my projects are, regardless of dependencies and of which projects I have open, and then have GoTo Type offer all of them. For lack of this feature, the workaround is to open one of the top-level projects, after which GoTo Type works in most cases. It would be good if it always worked, that is, even for a type which is in neither the direct nor the indirect dependencies of the opened projects.
tzezula: The list of classes can be produced only by scanning or building+processing. The scanning can be done on a server and the index can be downloaded. This needs VCS + build server + parsing.api cooperation. The parsing API already allows downloading the index for a given root from a URL.
jlahoda: prototype of remoting exists here ("remoting" folder in the checkout). Contact me if you want to know more.
I looked at the web.jsf project as an example and collected some data. The attached document lists direct and indirect dependencies for web.jsf/src and web.jsf/test/qa-functional/src/. There are 6 reports: three for web.jsf and three for the functional tests of web.jsf. Direct and indirect dependencies are listed first, followed by a simplified tree of these dependencies.
Here is my conclusion. Scanning of indirect dependencies does not scale. The higher in the dependency tree a project sits, the longer the scanning will be, regardless of the size of its direct dependencies and regardless of the number of opened projects. In my experience, confirmed by Denis, this can take up to 10 minutes, which is not acceptable.
tzezula: 10 minutes only with an empty userdir; after it's done once, you don't do a full scan any more.
Idea A, Idea B and Idea C: the problems are explained above, namely the "random" scans and a total scan time of 2x the current time.
Idea D: we are going to add the logging.
Idea E: a low-priority thread does not exist on modern OSes, as the priority is dynamic. This would require an FS.runBackgroundIO API in file systems, as modules like java, j2ee, php, etc. are doing the IO. It would also need a custom implementation of Lucene Directory which uses FS.runBackgroundIO.
Note Y: there is some "remedy" we added in NB 7.1, the possibility to download an existing index. If the build machine creates the index for you, then when you open the project you just download the index from a remote URL.
Some ideas on how this could be improved:
- Idea A) Give scanned roots different importance and handle them accordingly. For example, any open project or direct dependency has scanning priority P1; indirect dependencies have priority P2; scanning of any other sources is P3 (not sure there is such a case). P2 and P3 scanning should be done in the background in a low-priority task that does not block anything; some scanning could be done at most once a day(??).
- Idea B) Consider the depth of indirect dependencies. Looking at the dependency tree of web.jsf/src, the deeper you go into indirect dependencies, the less likely it is that the user will need them, so their scanning should be done later or with lower priority.
- Idea C) Decide on different strategies based on the size of the dependency graph: for a small graph, index everything in one go; for large graphs, apply scanning priorities, or simply do not scan everything, or scan indirect dependencies only to a certain depth.
- Idea D) Try different approaches to scanning experimentally and record how often scanning is done, why, and how long it took. That could provide data based on which we could tune it or decide in favor of a particular solution.
- Idea E) Have a dedicated low-priority thread index everything that has not been indexed yet, so that GoTo Type works.
- Note X) The nature of features depending on Java indexing is that they must gracefully handle the state when scanning data are not ready yet. So everything should be in place for a "scanning on demand" mode.
- Note Y) Nothing is going to help much in the case of the web.jsf functional tests. From the tree of its dependencies it looks like it is very shallow. Is that right, or have I made a mistake in generating this tree? Most of the dependencies seem to come directly from web.jsf/test/qa-functional/src/ and indirectly from web.kit/test/qa-functional/src/.
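Ideas A and B can be illustrated with a short sketch: walk the dependency graph breadth-first from the opened roots, record each root's depth, and derive a scanning priority from it. This is only an illustration under stated assumptions. The class `ScanPriorities`, the map-based graph and the rule "depth 0 or 1 is P1, anything deeper is P2" are all hypothetical; nothing here reflects the actual indexing implementation, and it does not address the objections about on-demand scans raised in the comments above.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of depth-based scan priorities (Ideas A and B). */
final class ScanPriorities {

    /**
     * Breadth-first walk from the opened roots.
     * Returns root -> depth (0 = opened root, 1 = direct dependency, 2+ = indirect),
     * iterating in non-decreasing depth order, i.e. a possible scan order.
     */
    static Map<String, Integer> depths(Map<String, List<String>> directDeps,
                                       Collection<String> openRoots) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        for (String root : openRoots) {
            if (depth.putIfAbsent(root, 0) == null) {
                queue.add(root);
            }
        }
        while (!queue.isEmpty()) {
            String current = queue.poll();
            int next = depth.get(current) + 1;
            for (String dep : directDeps.getOrDefault(current, Collections.<String>emptyList())) {
                // The first visit wins, so each root keeps its shortest distance.
                if (depth.putIfAbsent(dep, next) == null) {
                    queue.add(dep);
                }
            }
        }
        return depth;
    }

    /** Assumed rule: opened roots and direct dependencies are P1, everything deeper is P2. */
    static String priority(int depth) {
        return depth <= 1 ? "P1" : "P2";
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("A", Arrays.asList("B"));
        graph.put("B", Arrays.asList("C"));
        graph.put("C", Arrays.asList("D"));
        for (Map.Entry<String, Integer> e : depths(graph, Arrays.asList("A")).entrySet()) {
            System.out.println(e.getKey() + " depth=" + e.getValue() + " " + priority(e.getValue()));
        }
    }
}
```

A depth cutoff as in Idea C would amount to not enqueuing roots beyond a chosen depth. Whether the P2 work can really run without disturbing the user is exactly what the reply to Idea E questions.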