Migrating CND Editor to new APIs



This document discusses CND Editor use cases and how to move to the new APIs.

Language embedding and tokenizing

Q: One token or multiple?
A: One token per embedding/inlined block

Q: Who is responsible for tokenizing and mapping identifiers to keywords? Plug-ins?
A: A general lexer for C/C++, plus a lexer for C++ and a lexer for C that map ID->Keyword. The maps are configured using InputAttributes (stored in document properties).
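
The ID->Keyword mapping from the answer above could look roughly like the following sketch. All names here (KeywordFilter, Flavor) are hypothetical, the keyword lists are deliberately tiny, and a real implementation would presumably take the flavor from InputAttributes rather than from a constructor argument.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: a general lexer yields identifiers, a per-language filter promotes some to keywords. */
class KeywordFilter {
    enum Flavor { C, CPP }

    // Deliberately incomplete keyword lists, for illustration only.
    private static final Set<String> COMMON =
            new HashSet<>(Arrays.asList("int", "double", "if", "else", "for", "return"));
    private static final Set<String> CPP_ONLY =
            new HashSet<>(Arrays.asList("class", "namespace", "template", "virtual"));

    private final Flavor flavor;

    KeywordFilter(Flavor flavor) {
        this.flavor = flavor;
    }

    /** Returns true if the identifier text should be reported as a keyword for this flavor. */
    boolean isKeyword(String identifier) {
        if (COMMON.contains(identifier)) {
            return true;
        }
        return flavor == Flavor.CPP && CPP_ONLY.contains(identifier);
    }
}
```

With this split, "class" stays an identifier in a C file and becomes a keyword in a C++ file, while the tokenizing code itself is shared.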

Q: How to find the end of an inlined language block in difficult cases?
A: Delegate recognition of the end of the block to a registered handler.

interface InlinedLanguageHandler {
   /**
    * Checks whether a token of the top lexer corresponds to the start of a supported inlined language,
    * e.g. the "asm" token for inlined ASM, the "sql" token for inlined SQL, or "#" for the preprocessor.
    */
   boolean isStartToken(Token token);

   /**
    * Eats characters from the lexer input till the end of the supported inlined language block.
    * Called by the top lexer to delegate recognition of the inlined block boundaries.
    * @return end position of the inlined block or -1 if not recognized (e.g. "sql" was not followed by the "exec" command for inlined SQL)
    */
   int skipInlinedLanguage(LexerInput input, InputAttributes attrs);
}
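
As an illustration of what such a handler might do for GCC-style inlined asm, here is a minimal sketch. It is not the NetBeans API: AsmBlockScanner is a hypothetical name and it reads from a CharSequence instead of a LexerInput so that it stays self-contained. It assumes the block is a balanced "(...)" and it does not understand comments inside the block.

```java
/** Hypothetical stand-in for an inlined-asm handler; works on a CharSequence, not on LexerInput. */
class AsmBlockScanner {

    /** True if the identifier text starts an inlined asm block. */
    static boolean isStartToken(String tokenText) {
        return tokenText.equals("asm") || tokenText.equals("__asm__");
    }

    /**
     * Scans from the offset just after the start token to the end of the block.
     * @return offset just past the closing ')' or -1 if no balanced "(...)" follows
     */
    static int skipInlinedLanguage(CharSequence text, int offset) {
        int i = offset;
        // Skip whitespace and qualifiers such as "volatile" before the opening parenthesis.
        while (i < text.length()
                && (Character.isWhitespace(text.charAt(i)) || Character.isJavaIdentifierPart(text.charAt(i)))) {
            i++;
        }
        if (i >= text.length() || text.charAt(i) != '(') {
            return -1; // not a block this sketch understands
        }
        int depth = 0;
        boolean inString = false;
        for (; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (inString) {
                if (ch == '\\') {
                    i++; // skip the escaped character
                } else if (ch == '"') {
                    inString = false;
                }
            } else if (ch == '"') {
                inString = true;
            } else if (ch == '(') {
                depth++;
            } else if (ch == ')') {
                if (--depth == 0) {
                    return i + 1;
                }
            }
        }
        return -1; // unterminated block
    }
}
```

Because the handler walks the input itself, the top lexer needs no knowledge of asm syntax; it only has to emit one token up to the returned offset.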

Preprocessor influence on tokenizing content

int a = 10;

Here int is an identifier recognized as a keyword.

dou\
ble b = 1.0;

double must be one token as well. But what about the flyweight nature of such a token instance? Should it be handled differently from normal ones?
A: The base general lexer eats "\" followed by CR.
A: In case of a line continuation inside a token, create a non-flyweight one.
Q: Maybe use embedding for "\" in tokens?
A: There is a possibility to create a token with properties.
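
A minimal sketch of the idea, assuming a hypothetical helper (ContinuationAwareReader is not part of any NetBeans API): the reader eats backslash-newline pairs that occur inside an identifier and reports whether it did so, which is the information a lexer would need to decide between a shared flyweight token and a dedicated instance.

```java
/** Hypothetical sketch: reads an identifier and eats backslash-newline pairs found inside it. */
class ContinuationAwareReader {

    /** Logical token text, end offset in the input, and whether a continuation was eaten. */
    static final class Result {
        final String text;
        final int endOffset;
        final boolean hadContinuation;

        Result(String text, int endOffset, boolean hadContinuation) {
            this.text = text;
            this.endOffset = endOffset;
            this.hadContinuation = hadContinuation;
        }
    }

    static Result readIdentifier(CharSequence input, int offset) {
        StringBuilder text = new StringBuilder();
        boolean hadContinuation = false;
        int i = offset;
        while (i < input.length()) {
            char ch = input.charAt(i);
            if (isIdentifierPart(ch)) {
                text.append(ch);
                i++;
                continue;
            }
            // Skip one or more continuations, but only if the identifier goes on after them.
            int j = i;
            int skip;
            while ((skip = continuationLength(input, j)) > 0) {
                j += skip;
            }
            if (j > i && j < input.length() && isIdentifierPart(input.charAt(j))) {
                hadContinuation = true;
                i = j;
            } else {
                break;
            }
        }
        return new Result(text.toString(), i, hadContinuation);
    }

    private static boolean isIdentifierPart(char ch) {
        return Character.isLetterOrDigit(ch) || ch == '_';
    }

    /** Length of a backslash plus line terminator (LF, CR LF or CR) at offset i, or 0 if there is none. */
    private static int continuationLength(CharSequence input, int i) {
        if (i + 1 >= input.length() || input.charAt(i) != '\\') {
            return 0;
        }
        char next = input.charAt(i + 1);
        if (next == '\n') {
            return 2;
        }
        if (next == '\r') {
            return (i + 2 < input.length() && input.charAt(i + 2) == '\n') ? 3 : 2;
        }
        return 0;
    }
}
```

A caller could hand out the shared flyweight for "double" only when hadContinuation is false, since a flyweight cannot carry the extra length of the split spelling.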

Preprocessor directives

#define X(a) \
int a = 10;\
int a##1 = 11;\
int a##2 = 12;

Should the whole "#... 12;" be one token at the top of the hierarchy? Is the preprocessor token then tokenized by another lexer? Should line continuations be preserved in this case?
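
If the whole directive becomes one top-level token, the top lexer only has to find where the logical line ends. A minimal sketch under that assumption follows; DirectiveScanner is a hypothetical name and the sketch ignores comments that span lines inside a directive.

```java
/** Hypothetical sketch: finds the end of a preprocessor directive, honoring line continuations. */
class DirectiveScanner {

    /**
     * @param offset position of the '#' that starts the directive
     * @return offset of the line terminator that ends the directive, or the input length
     */
    static int directiveEnd(CharSequence src, int offset) {
        int i = offset;
        while (i < src.length()) {
            char ch = src.charAt(i);
            if (ch == '\n' || ch == '\r') {
                break; // an unescaped line end terminates the directive
            }
            if (ch == '\\' && i + 1 < src.length()) {
                char next = src.charAt(i + 1);
                if (next == '\n') {
                    i += 2; // the continuation stays inside the token
                    continue;
                }
                if (next == '\r') {
                    i += (i + 2 < src.length() && src.charAt(i + 2) == '\n') ? 3 : 2;
                    continue;
                }
            }
            i++;
        }
        return i;
    }
}
```

The continuations are kept in the token text here, so a nested preprocessor lexer would still see them and could preserve them.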

Support for different #pragma sections

Different pragma sections can have different meanings, so there needs to be a way to register handlers.

#pragma omp parallel for shared(array, array1, array2, dim) private(ii, jj, kk)
for (ii = 0; ii < dim; ii++) {
  for (jj = 0; jj < dim; jj++) {
    for (kk = 0; kk < dim; kk++) {
      array[ii][jj] = array1[ii][kk] * array2[kk][jj];
    }
  }
}

#pragma omp section. If the corresponding OpenMP support is installed, syntax coloring (id->keyword) should be delegated to it.
A: use new Highlighting SPI and color recognized tokens
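
Registering handlers per pragma could be sketched as below. This is only an illustration of the registry idea; PragmaHandlerRegistry and PragmaHandler are hypothetical names and have nothing to do with the actual Highlighting SPI.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: pragma-specific handlers decide which identifiers are colored as keywords. */
class PragmaHandlerRegistry {

    /** Hypothetical handler contract: knows the keywords of one pragma family. */
    interface PragmaHandler {
        boolean isKeyword(String identifier);
    }

    private final Map<String, PragmaHandler> handlers = new HashMap<>();

    void register(String pragmaName, PragmaHandler handler) {
        handlers.put(pragmaName, handler);
    }

    /** True only if a handler is registered for the pragma on this line and it claims the identifier. */
    boolean isKeyword(String pragmaLine, String identifier) {
        String name = pragmaName(pragmaLine);
        PragmaHandler handler = name == null ? null : handlers.get(name);
        return handler != null && handler.isKeyword(identifier);
    }

    /** Extracts "omp" from "#pragma omp parallel for ...", or returns null if the line is not a pragma. */
    static String pragmaName(String line) {
        String s = line.trim();
        if (!s.startsWith("#")) {
            return null;
        }
        String[] words = s.substring(1).trim().split("\\s+");
        if (words.length < 2 || !words[0].equals("pragma")) {
            return null;
        }
        return words[1];
    }
}
```

When no OpenMP handler is installed, the lookup simply returns false and the words after "#pragma omp" stay plain identifiers.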

Inlined assembler

Q: One token with embedding or multiple tokens? We don't want to create our own rules; there must be an asm lexer to delegate tokenization to.
A: One token for the embedded block + a registered handler.

Basic inline:

asm("movl %ecx, %eax"); /* moves the contents of ecx to eax */
 __asm__ ("movl %eax, %ebx\n\t"
          "movl $56, %esi\n\t"
          "movl %ecx, $label(%edx,%ebx,$4)\n\t"
          "movb %ah, (%ebx)");


        int a=10, b;
        asm ("movl %1, %%eax; "
             "movl %%eax, %0;"
             :"=r"(b)        /* output */
             :"r"(a)         /* input */
             :"%eax"         /* clobbered register */
             );

        __asm__ __volatile__(
                      "   lock       ;\n"
                      "   addl %1,%0 ;\n"
                      : "=m"  (my_var)
                      : "ir"  (my_int), "m" (my_var)
                      : /* no clobber-list */
                      );

// Compute the tangent of x
real tan(real x)
{
   asm
   {
       fld     x[EBP]                  ; // load x
       fxam                            ; // test for oddball values
       fstsw   AX                      ;
       sahf                            ;
       jc      trigerr                 ; // x is NAN, infinity, or empty
                                         // 387's can handle denormals
SC18:  fptan                           ;
       fstp    ST(0)                   ; // dump X, which is always 1
       fstsw   AX                      ;
       sahf                            ;
       jnp     Lret                    ; // C2 = 1 (x is out of range)
       // Do argument reduction to bring x into range
       fldpi                           ;
       fxch                            ;
SC17:  fprem1                          ;
       fstsw   AX                      ;
       sahf                            ;
       jp      SC17                    ;
       fstp    ST(1)                   ; // remove pi from stack
       jmp     SC18                    ;
   }
trigerr:
   return real.nan;
Lret:
   ;
}

Embedded SQL

Again, one token or multiple? How? The lookahead would be too big, because there can be any number of spaces between EXEC and SQL.

   ME" INTO :n FROM staff WHERE name='Sa\
   int main() {

A: The handler for the "exec" identifier is responsible for finding the end of the block.
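
A minimal sketch of such a handler, under the assumption that an embedded SQL block runs from "exec", any amount of whitespace and "sql" up to the next ';' outside a quoted string. EmbeddedSqlScanner is a hypothetical name and it reads from a CharSequence rather than from a LexerInput.

```java
/** Hypothetical sketch: finds the end of an embedded SQL block after the "exec" identifier. */
class EmbeddedSqlScanner {

    /**
     * @param offset position just after the "exec" identifier
     * @return offset just past the terminating ';' or -1 if "exec" is not followed by "sql"
     */
    static int skipInlinedLanguage(CharSequence text, int offset) {
        int i = offset;
        // Any amount of whitespace, including newlines, may separate EXEC and SQL.
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        if (i == offset || i + 3 > text.length()) {
            return -1;
        }
        if (!text.subSequence(i, i + 3).toString().equalsIgnoreCase("sql")) {
            return -1;
        }
        i += 3;
        if (i < text.length() && Character.isJavaIdentifierPart(text.charAt(i))) {
            return -1; // something like "exec sqlfoo" is not embedded SQL
        }
        boolean inString = false;
        for (; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch == '\'') {
                inString = !inString; // a doubled quote toggles twice and stays inside the string
            } else if (ch == ';' && !inString) {
                return i + 1;
            }
        }
        return -1; // unterminated statement
    }
}
```

Since the handler consumes whitespace itself, the top lexer does not need a fixed lookahead to bridge the gap between EXEC and SQL.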


When a new line is typed in the editor, the indentation engine is asked to indent the next line(s).

Delegation of indentation

There can be different indentation settings/rules for embedded languages, and we don't want to handle everything in one place. How can indentation be delegated to the indenter associated with an embedded language?

Should the delegator provide a callback reporting the current/last/base indentation position?

#if A
#  if B
int ab = 1;
#  else
int ab = 0;
#  endif
#endif

A: Engines communicate through the Document's properties.
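
Passing data through document properties can be sketched with the standard Swing Document API (putProperty/getProperty). The property key and the IndentHandoff class are assumptions made for this example; they are not existing NetBeans constants.

```java
import javax.swing.text.Document;

/** Hypothetical sketch: two indentation engines exchange the base indent via a document property. */
class IndentHandoff {

    /** Hypothetical property key, not a NetBeans constant. */
    static final String BASE_INDENT_KEY = "cnd.indent.base";

    /** Called by the delegating (top-level) engine before it hands over to an embedded one. */
    static void publishBaseIndent(Document doc, int columns) {
        doc.putProperty(BASE_INDENT_KEY, Integer.valueOf(columns));
    }

    /** Called by the embedded-language engine; falls back to 0 when nothing was published. */
    static int readBaseIndent(Document doc) {
        Object value = doc.getProperty(BASE_INDENT_KEY);
        return value instanceof Integer ? (Integer) value : 0;
    }
}
```

The engines stay decoupled this way: neither needs a reference to the other, only to the shared document and an agreed key.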

Preprocessor branches

#ifdef __DEBUG
if (deep_check()) {
#else
if (check()) {
#endif
   int a = 0;
}

The problem is to detect the indent after each line when Enter is pressed. Typing # in the first position of a line should reindent it from the current language indent position to the preprocessor indent position.
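
The reindent step for a line that starts with # could be sketched as follows. It assumes a style with '#' in column 0 and two spaces per nesting level between '#' and the directive name, as in the "#  if B" example; DirectiveReindenter is a hypothetical helper and the nesting depth is expected to come from the caller.

```java
/** Hypothetical sketch: moves a directive line from the code indent to the preprocessor indent. */
class DirectiveReindenter {

    /**
     * @param line  the line as typed, possibly carrying the indent of the surrounding code
     * @param depth number of enclosing #if/#ifdef/#ifndef blocks
     * @return the line with '#' in column 0 and two spaces per level before the directive name;
     *         lines that are not directives are returned unchanged
     */
    static String reindent(String line, int depth) {
        String trimmed = line.trim();
        if (!trimmed.startsWith("#")) {
            return line;
        }
        String directive = trimmed.substring(1).trim();
        StringBuilder sb = new StringBuilder("#");
        for (int i = 0; i < depth * 2; i++) {
            sb.append(' ');
        }
        return sb.append(directive).toString();
    }
}
```

Ordinary code lines are left to the language indenter, which is why the method returns them untouched.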

There could be problems with pairing the opening "{" and closing "}" when trying to prevent insertion of a new unbalanced curly brace.
A: Not clear what to do.


Delegation of formatting

Q: How can formatting be delegated to the formatter associated with an embedded language?
A: Engines communicate through the Document's properties.


We are trying to distinguish the C and C++ languages by file extension (CDataObject, CCDataObject), because some C++ keywords are not C keywords and we'd like to treat them as identifiers. In addition, there can be different compiler-specific extensions (GNU extensions) for both C and C++. Could this be solved by attributes passed to the lexer?

The real problem is with header files. We have one HDataObject and do not distinguish between C and C++ in headers; C++ is always used. The MIME type for headers is the same as for C++ sources. But it is possible to create a "New C Header" through the "New" template wizard. This file is created with, e.g., an "h" extension but with different content than a "New C++ Header". How can we remember the C style vs. C++ style chosen at creation time? After the IDE is restarted, only the DataLoader/MIMEResolver has a chance to detect the type of the file.

A: Ask someone in core

Code Completion

We'd like to have more phases for pressing Ctrl+Space to improve responsiveness.

  • The first Ctrl+Space on an empty context usually shows only file-local content (very fast).
  • The second Ctrl+Space shows all of the above + everything from the current Project context (usually also not very slow).
  • The next Ctrl+Space shows all of the above + the content of all used libraries (usually quite slow).
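
The phases listed above can be sketched as a mapping from the number of consecutive Ctrl+Space presses to the widest scope that is queried. PhasedCompletion and its Scope enum are hypothetical names; the item lists stand in for whatever the real providers return.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: each further Ctrl+Space press widens the completion scope. */
class PhasedCompletion {
    enum Scope { FILE, PROJECT, LIBRARIES }

    /** Maps the number of consecutive Ctrl+Space presses (1-based) to the widest scope to query. */
    static Scope scopeFor(int pressCount) {
        if (pressCount <= 1) {
            return Scope.FILE;
        }
        if (pressCount == 2) {
            return Scope.PROJECT;
        }
        return Scope.LIBRARIES;
    }

    /** Collects items from every scope up to and including the requested one. */
    static List<String> collect(int pressCount, List<String> file, List<String> project, List<String> libraries) {
        Scope scope = scopeFor(pressCount);
        List<String> result = new ArrayList<>(file);
        if (scope.compareTo(Scope.PROJECT) >= 0) {
            result.addAll(project);
        }
        if (scope == Scope.LIBRARIES) {
            result.addAll(libraries);
        }
        return result;
    }
}
```

The slow library lookup is only paid for on the third press, which is the responsiveness gain the proposal is after.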

A: Two modes at most (one is better), plus IZ#122012 (low performance of completion):

  1. Prevent sorting
  2. Prevent displaying of items

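re #1: in fact we cannot skip this phase. Sorting is really "quick" on already sorted collections (close to O(n) comparisons for an adaptive merge sort such as the one behind Collections.sort, although not for a textbook quicksort), so there is no reason to introduce additional complexity for SPI implementers.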
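
The cost of sorting input that is already in order can be checked by counting comparator calls. The sketch below uses java.util.Collections.sort from the JDK; whether the completion infrastructure uses that method is an assumption, and SortedInputCost is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: counts how many comparisons the JDK sort needs for a given input. */
class SortedInputCost {

    /** Sorts a copy of the list and returns the number of comparator calls made by the sort. */
    static long countComparisons(List<Integer> items) {
        final long[] count = {0};
        List<Integer> copy = new ArrayList<>(items);
        Collections.sort(copy, new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                count[0]++;
                return a.compareTo(b);
            }
        });
        return count[0];
    }
}
```

Running it on an ascending list and on a shuffled copy of the same list shows the difference: the ordered input needs roughly one comparison per element, the shuffled one many times more.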
re #2: the infrastructure introduces a method for SPI implementers, CompletionResultSet.setHideExtraItems(boolean). SPI implementers put all their items into the result set and call this method.
If none of the providers sets the flag, the threshold is the number of all items.
The infrastructure sorts all items and, after sorting, displays only the first threshold items, plus a "More..." item as the last one (if the flag is ON). All calculations are based on these threshold items.
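
The display rule described above could be sketched like this. ThresholdDisplay is a hypothetical helper, items are plain strings for simplicity, and the threshold is passed in as a parameter because the notes do not say where its value comes from.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: sort everything, then show only the first items plus a "More..." entry. */
class ThresholdDisplay {
    static final String MORE = "More...";

    /**
     * @param items          all items put into the result set by all providers
     * @param hideExtraItems true if some provider set the hide-extra-items flag
     * @param threshold      number of items to show when the flag is on (assumed parameter)
     */
    static List<String> visibleItems(List<String> items, boolean hideExtraItems, int threshold) {
        List<String> sorted = new ArrayList<>(items);
        Collections.sort(sorted);
        // Without the flag the threshold is the number of all items.
        int limit = hideExtraItems ? Math.min(threshold, sorted.size()) : sorted.size();
        List<String> visible = new ArrayList<>(sorted.subList(0, limit));
        if (limit < sorted.size()) {
            visible.add(MORE);
        }
        return visible;
    }
}
```

Sorting still runs over the full list, so the first threshold items are the same ones a full display would start with; only rendering is cut short.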
