ParsingAPIRequirements

Revision as of 15:24, 5 November 2009 by Admin (Talk | contribs)

Parsing API - Usecases, Requirements

  1. Mixing languages that are almost unrelated (Victor Vasilyev): For example, HTML, XML, or SVG embedded in PHP, or Java, JavaScript, ... embedded in a Velocity template.
    1. The format of embedded content can be defined by the project.
  2. Establish processing phases for sources embedded in other sources (Victor Vasilyev): After each phase, a virtual source for the next processing phase should be generated.
  3. Conditional processing (Victor Vasilyev): If a source for a phase declares conditional branches, the resulting virtual source will have variants that depend on the control flow.
  4. "API must be well documented. Developer documentation is a must." (Peter Nabbefeld)
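
To illustrate use cases 1 and 2, here is a minimal sketch of how a virtual source for one embedded language might be generated from a host document. It is not the Parsing API: the names VirtualSource, EmbeddingExtractor, and toHostOffset are hypothetical, and the plain open/close tag scan stands in for whatever embedding format a project defines. The sketch concatenates the embedded regions and keeps an offset map so that positions in the virtual source can be translated back to the host file.

```java
import java.util.ArrayList;
import java.util.List;

/** Virtual source for one embedded language, with a map back to host offsets. */
final class VirtualSource {
    final String text;
    // Each entry is {virtualStart, hostStart, length} for one embedded region.
    private final List<int[]> regions;

    VirtualSource(String text, List<int[]> regions) {
        this.text = text;
        this.regions = regions;
    }

    /** Returns the host offset for a virtual offset, or -1 for separator characters. */
    int toHostOffset(int virtualOffset) {
        for (int[] r : regions) {
            if (virtualOffset >= r[0] && virtualOffset < r[0] + r[2]) {
                return r[1] + (virtualOffset - r[0]);
            }
        }
        return -1;
    }
}

/** Hypothetical helper: not part of any NetBeans API. */
final class EmbeddingExtractor {
    /** Collects the text between each openTag/closeTag pair into one virtual source. */
    static VirtualSource extract(String host, String openTag, String closeTag) {
        StringBuilder virtual = new StringBuilder();
        List<int[]> regions = new ArrayList<>();
        int from = 0;
        while (true) {
            int open = host.indexOf(openTag, from);
            if (open < 0) break;
            int start = open + openTag.length();
            int end = host.indexOf(closeTag, start);
            if (end < 0) break; // unterminated block: ignored in this sketch
            if (virtual.length() > 0) virtual.append('\n'); // separator, not mapped
            regions.add(new int[] {virtual.length(), start, end - start});
            virtual.append(host, start, end);
            from = end + closeTag.length();
        }
        return new VirtualSource(virtual.toString(), regions);
    }
}
```

A later processing phase could run the same kind of extraction on the virtual source itself, which is one way to read the "phases" requirement; a real implementation would also need to handle the conditional variants described in use case 3.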


Indexing - Requirements

  • I.1 - (Issue 152541) The first time you open the IDE on a new project, the indexing system should scan through all the source files and index them.
  • I.2 - (Issue 152541) On subsequent startups, only files that have changed since the last IDE session should be indexed again.
  • I.3 - (Issue 152541) The implementation should handle "index cleanup", e.g. removing data associated with files that are deleted or renamed. Similarly, at startup, if the IDE notices that files have been deleted (outside of the IDE), the data associated with these files should be removed as well.
  • I.4 - (Issue 152541) The indexing system should allow multiple indexers to process a given file. For example, an RHTML/ERB file should be scanned not just by the Ruby indexer for the Ruby code, but by the JavaScript indexer for potential JavaScript inside <script> blocks, and possibly by the CSS and HTML indexers (if any) as well. As part of this, indexing must support indexing of embedded files as well.
  • I.5 - (Issue 152541) Indexers should be able to skip files they deem unimportant. For example, the JavaScript indexer might want to skip generated GWT JavaScript files, or optimized (compressed) JavaScript files.
  • I.6 - (P3) The indexing system must support "preindexing". This allows us to index libraries (such as the Ruby and Rails libraries, the JavaScript core libraries, and so on) in advance, and ship this binary data along with the IDE. When the indexing system notices it is about to index a source root it has preindexed data for, it should just substitute the preindexed data instead. Ruby on Rails indexing used to take several minutes at startup; with preindexing it dropped to less than a second.
  • I.7 - (filesystems) The ability to index not just files on disk, but also files in .jar files, and files in the System File System.
  • I.8 - (the same as I.5) The indexing system should support the ability for different languages to scan different source roots. For example, in a Web project, JavaScript should only scan the web folder, and Java only the src folder.
  • I.9 - (Issue 152541) The indexing system should support isolation, such that each language has its own unique Lucene repository tree it can use without interference from other languages.
  • I.10 - (Issue 152541) The indexing system should support versioning, such that:
    1. The language plugin can rev its version number whenever it changes the index data incompatibly, and the indexing system will force reindexing.
    2. The indexing infrastructure can rev its own version number whenever it changes the index data incompatibly (for example, by changing to a new version of Lucene), and the indexing system will force reindexing.
  • I.11 - (Issue 152541) The indexing system should be tied into the parsing infrastructure in such a way that any attempt to query the index after a file has changed will first parse and index that file, so that the query processes up-to-date index data.
  • I.12 - (filesystems) Startup indexing must be very fast for projects that have already been indexed in the past. For this reason, the Java and GSF indexers operated on java.io.File instead of FileObject, and indexing decisions are not based on mimetypes. I see that Jan Lahoda has filed an issue on this, so perhaps using mimetypes won't be prohibitively expensive in the future.
  • I.13 - (Issue 152541) The query system should let you query by exact name, prefix, regular expression, or camel case.
  • I.14 - (P4) The query system should let you (optionally) control which keys you want loaded into the document. (This is for performance reasons; if you only plan to look at a few keys, there is no need to load all the other ones).
  • I.15 - (Issue 152541) The Java module has some pretty advanced indexing needs with its .sig files, dependency tracking etc. It must be possible for the Java module to use the Parsing API and provide its own indexing implementation independent of the Indexing API. This is probably obvious, but I'm including it for completeness since with this single exception, I would like the Parsing API and Indexing API to be joined at the hip: "The Parsing and Indexing API".
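
Requirements I.2, I.3, and I.10 together describe a decision that has to be made at startup: which files to index again and which stale entries to drop. The following is a minimal in-memory sketch of that decision, not the actual indexing implementation. The names IndexState, ReindexPlan, and ReindexPlanner are hypothetical, and comparing last-modified timestamps is an assumption about how "changed since the last IDE session" could be detected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** What was recorded at the end of the previous session (hypothetical structure). */
final class IndexState {
    final int indexerVersion;
    final Map<String, Long> lastIndexed = new HashMap<>(); // path -> timestamp when indexed

    IndexState(int indexerVersion) {
        this.indexerVersion = indexerVersion;
    }
}

/** Result of comparing the stored state with the files currently present. */
final class ReindexPlan {
    final List<String> toIndex = new ArrayList<>();
    final List<String> toRemove = new ArrayList<>();
}

final class ReindexPlanner {
    static ReindexPlan plan(IndexState stored, int currentVersion, Map<String, Long> onDisk) {
        ReindexPlan plan = new ReindexPlan();
        // I.10: an incompatible version change forces every file to be reindexed.
        boolean versionChanged = stored.indexerVersion != currentVersion;
        for (String path : new TreeSet<>(onDisk.keySet())) {
            Long seen = stored.lastIndexed.get(path);
            long modified = onDisk.get(path);
            // I.1/I.2: index files that are new or whose timestamp differs.
            if (versionChanged || seen == null || seen.longValue() != modified) {
                plan.toIndex.add(path);
            }
        }
        // I.3: entries for files that no longer exist are scheduled for cleanup.
        for (String path : new TreeSet<>(stored.lastIndexed.keySet())) {
            if (!onDisk.containsKey(path)) {
                plan.toRemove.add(path);
            }
        }
        return plan;
    }
}
```

In this sketch a renamed file shows up as one removal plus one new file, which matches the cleanup behaviour described in I.3. Paths are sorted only to make the result deterministic.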

Performance Requirements

  • Perf.1 - (GSF) Reduce duplicated classes - remove the gsfret clone, base GSF on the common ClassPath.
  • Perf.2 - (GSF) Reduce the number of threads - remove the GSF Working thread and the GSF RepositoryUpdater thread.
  • Perf.3 - Prevent the RequestProcessor thread from being woken every 2 seconds.
  • Perf.4 - Build index of file paths - used by Go To File.
  • Perf.5 - (GSF, tasklist) Sources are crawled just once by a single thread, which reduces IO cache thrashing and disk seeking - remove GSF RepositoryUpdater, task list uses indexing API.
  • Perf.6 - (P3) Do NOT index already closed projects.
  • Perf.7 - Wait until all projects are opened, bulk mode.
  • Perf.8 - Access to the ClassIndex should work during the initial scan (IZ 158176).
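
Perf.5, combined with I.4 and I.5, suggests one crawl that hands each file to every interested indexer instead of one crawl per indexer. The sketch below shows that shape under simplifying assumptions: FileIndexer, RecordingIndexer, and SinglePassCrawler are hypothetical names, paths are plain strings rather than files, and the "indexing" step only records the path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical indexer contract: not the Indexing API. */
interface FileIndexer {
    boolean accepts(String path); // I.5: an indexer may skip files it deems unimportant
    void index(String path);
}

/** Test-friendly indexer that only remembers which paths it was given. */
final class RecordingIndexer implements FileIndexer {
    private final Predicate<String> filter;
    final List<String> indexed = new ArrayList<>();

    RecordingIndexer(Predicate<String> filter) {
        this.filter = filter;
    }

    @Override
    public boolean accepts(String path) {
        return filter.test(path);
    }

    @Override
    public void index(String path) {
        indexed.add(path);
    }
}

final class SinglePassCrawler {
    /** Visits each path exactly once and offers it to every indexer (I.4). */
    static int crawl(List<String> paths, List<FileIndexer> indexers) {
        int visited = 0;
        for (String path : paths) {
            visited++;
            for (FileIndexer indexer : indexers) {
                if (indexer.accepts(path)) {
                    indexer.index(path);
                }
            }
        }
        return visited;
    }
}
```

An indexer that accepts every path could double as the file path index mentioned in Perf.4, since it sees each file during the same single pass.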