CNDEditor61
Migrating CND Editor to new APIs
Goal
This document discusses CND Editor use cases and how to move to the new APIs.
Language embedding and tokenizing
Q: One token or multiple?
A: One token per embedded/inlined block.
Q: Who is responsible for tokenizing and mapping identifiers to keywords? Plug-ins?
A: A general lexer for C/C++, plus a lexer for C++ and a lexer for C that map ID->Keyword. The maps are configured using InputAttributes (stored in document properties).
Q: How to find the end of an inlined language block in difficult cases?
A: Delegate recognition of the end of the block to a registered handler:
interface InlinedLanguageHandler {
    /**
     * Checks whether a token of the top lexer corresponds to the start of a supported inlined language,
     * i.e. the "asm" token for inlined ASM, the "sql" token for inlined SQL, or "#" for the preprocessor.
     */
    boolean isStartToken(Token token);

    /**
     * Eats characters from the lexer input up to the end of the supported inlined language block.
     * The method is called by the top lexer to delegate recognition of the inlined block boundaries.
     * @return end position of the inlined block, or -1 if not recognized (i.e. "sql" was not followed by an "exec" command for inlined SQL)
     */
    int skipInlinedLanguage(LexerInput input, InputAttributes attrs);
}
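The delegation above could look roughly like the following sketch for inlined assembler. It is only an illustration under assumptions: CharInput is a hypothetical stand-in for LexerInput, the token is reduced to its text, InputAttributes is left out, and AsmHandler and TopLexer are invented names. Comments inside the asm block are not handled.

```java
import java.util.List;

/** Hypothetical stand-in for the lexer input: reads characters one at a time. */
final class CharInput {
    static final int EOF = -1;
    private final String text;
    private int pos;

    CharInput(String text, int start) { this.text = text; this.pos = start; }

    int read() { return pos < text.length() ? text.charAt(pos++) : EOF; }

    int position() { return pos; }
}

/** Simplified form of the proposed handler (token reduced to its text). */
interface InlinedLanguageHandler {
    boolean isStartToken(String tokenText);

    /** @return end offset (exclusive) of the inlined block, or -1 if not recognized */
    int skipInlinedLanguage(CharInput input);
}

/** Recognizes asm(...) and asm{...} by matching the outermost bracket pair. */
final class AsmHandler implements InlinedLanguageHandler {
    @Override
    public boolean isStartToken(String tokenText) {
        return "asm".equals(tokenText);
    }

    @Override
    public int skipInlinedLanguage(CharInput input) {
        int c = input.read();
        // Skip whitespace and qualifiers such as "volatile" before the opening bracket.
        while (c != CharInput.EOF && c != '(' && c != '{') {
            if (!Character.isWhitespace(c) && !Character.isLetter(c) && c != '_') {
                return -1;
            }
            c = input.read();
        }
        if (c == CharInput.EOF) {
            return -1;
        }
        char open = (char) c;
        char close = open == '(' ? ')' : '}';
        int depth = 1;
        boolean inString = false;
        while ((c = input.read()) != CharInput.EOF) {
            if (inString) {
                if (c == '\\') {
                    input.read();              // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == open) {
                depth++;
            } else if (c == close && --depth == 0) {
                return input.position();       // offset just past the closing bracket
            }
        }
        return -1;                             // unterminated block
    }
}

/** The top lexer asks each registered handler in turn. */
final class TopLexer {
    static int endOfInlinedBlock(List<InlinedLanguageHandler> handlers,
                                 String tokenText, String source, int offsetAfterToken) {
        for (InlinedLanguageHandler h : handlers) {
            if (h.isStartToken(tokenText)) {
                return h.skipInlinedLanguage(new CharInput(source, offsetAfterToken));
            }
        }
        return -1;
    }
}
```

With this shape the top lexer never needs to know the rules of the inlined language; it only creates one token from the start token up to the returned offset.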
Preprocessor influence on tokenizing content
int a = 10;
"int" is an identifier recognized as a keyword.
dou\
ble b = 1.0;
"double" must be one token here as well. But what about the flyweight nature of such a token instance? Should it be handled differently from normal ones?
A: The base general lexer eats "\" followed by CR.
A: When a line continuation occurs inside a token, create a non-flyweight token.
Q: Maybe use an embedding for "\" inside tokens?
A: It is possible to create a token with properties.
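A minimal sketch of the two answers above, assuming the identifier text has already been cut out of the input: line continuations are removed before the ID->Keyword lookup, and a flyweight (shared) token is only allowed when the raw text equals the keyword text. The class names, the keyword sets, and the Tok type are hypothetical and stand in for the real lexer API and InputAttributes.

```java
import java.util.Set;

/** Hypothetical token result: keeps the raw text and says whether a shared instance may be used. */
final class Tok {
    final String rawText;
    final boolean keyword;
    final boolean flyweight;

    Tok(String rawText, boolean keyword, boolean flyweight) {
        this.rawText = rawText;
        this.keyword = keyword;
        this.flyweight = flyweight;
    }
}

final class ContinuationAwareClassifier {
    // Example keyword maps; in the editor they would be configured per language.
    static final Set<String> C_KEYWORDS = Set.of("int", "double", "char", "return");
    static final Set<String> CPP_KEYWORDS =
            Set.of("int", "double", "char", "return", "class", "namespace");

    /** Removes every backslash that is directly followed by a line break (LF, CR or CRLF). */
    static String stripContinuations(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                char next = raw.charAt(i + 1);
                if (next == '\n') {
                    i++;
                    continue;
                }
                if (next == '\r') {
                    i++;
                    if (i + 1 < raw.length() && raw.charAt(i + 1) == '\n') {
                        i++;
                    }
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Maps an identifier to a keyword using the given map; split tokens are never flyweight. */
    static Tok classify(String raw, Set<String> keywords) {
        String logical = stripContinuations(raw);
        boolean keyword = keywords.contains(logical);
        boolean flyweight = keyword && raw.equals(logical);
        return new Tok(raw, keyword, flyweight);
    }
}
```

The same lookup also covers the C versus C++ difference: classify("class", C_KEYWORDS) stays an identifier, while classify("class", CPP_KEYWORDS) becomes a keyword.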
Preprocessor directives
#define X(a) \
int a = 10;\
int a##1 = 11;\
int a##2 = 12;
Should "#... 12;" be one token at the top of the hierarchy? Is the preprocessor token then tokenized by another lexer? Should line continuations be preserved in this case?
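If the whole directive is treated as one top-level token, its end is the first line break that is not escaped by a backslash. The following is a sketch of that boundary search only; DirectiveScanner is a hypothetical name and the method works on plain text rather than on the real lexer input.

```java
final class DirectiveScanner {
    /**
     * Returns the offset of the line break that ends the directive starting at {@code start},
     * or the text length if the directive runs to the end of the text.
     * A backslash directly followed by LF, CR or CRLF continues the directive.
     */
    static int endOfDirective(CharSequence text, int start) {
        int i = start;
        int len = text.length();
        while (i < len) {
            char c = text.charAt(i);
            if (c == '\\' && i + 1 < len) {
                char next = text.charAt(i + 1);
                if (next == '\n') {
                    i += 2;
                    continue;
                }
                if (next == '\r') {
                    i += 2;
                    if (i < len && text.charAt(i) == '\n') {
                        i++;
                    }
                    continue;
                }
            }
            if (c == '\n' || c == '\r') {
                return i;
            }
            i++;
        }
        return len;
    }
}
```

The returned range still contains the backslashes and line breaks, so a second lexer that tokenizes the directive body can decide itself whether to preserve them.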
Support for different #pragma sections
Different pragma sections can have different meanings. We need the possibility to register handlers for them.
#pragma omp parallel for shared(array, array1, array2, dim) private(ii, jj, kk)
for (ii = 0; ii < dim; ii++) {
for (jj = 0; jj < dim; jj++) {
for (kk = 0; kk < dim; kk++) {
array[ii][jj] = array1[ii][kk] * array2[kk][jj];
}
}
}
#pragma omp section. If the corresponding OMP support is installed, syntax coloring (id->keyword) should be delegated to it.
A: Use the new Highlighting SPI and color the recognized tokens.
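Handler registration for pragma sections could be sketched as below. This is not the Highlighting SPI itself; PragmaHandler and PragmaRegistry are hypothetical names, and the OMP keyword set used in the usage example is only an illustrative subset. The registry answers which identifiers of a pragma line the installed support wants colored as keywords.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical handler: supplies the words one pragma family treats as keywords. */
interface PragmaHandler {
    Set<String> keywords();
}

final class PragmaRegistry {
    private final Map<String, PragmaHandler> handlers = new HashMap<>();

    void register(String family, PragmaHandler handler) {
        handlers.put(family, handler);
    }

    /** Returns, in source order, the identifiers of a #pragma line to color as keywords. */
    List<String> keywordsIn(String pragmaLine) {
        String[] words = pragmaLine.trim().split("[^A-Za-z0-9_#]+");
        List<String> result = new ArrayList<>();
        if (words.length < 2 || !"#pragma".equals(words[0])) {
            return result;
        }
        PragmaHandler handler = handlers.get(words[1]);
        if (handler == null) {
            return result; // no support installed: everything stays an identifier
        }
        for (int i = 2; i < words.length; i++) {
            if (handler.keywords().contains(words[i])) {
                result.add(words[i]);
            }
        }
        return result;
    }
}
```

For example, after `registry.register("omp", () -> Set.of("parallel", "for", "shared", "private"))`, the pragma line from the example above yields parallel, for, shared and private, while array names and loop variables stay identifiers.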
Inlined assembler
Q: One token with embedding or multiple tokens? We don't want to create our own rules; there must be an asm lexer to delegate tokenization to.
A: One token for the embedded block + a registered handler.
Basic inline:
asm("movl %ecx, %eax"); /* moves the contents of ecx to eax */
asm ("movl %eax, %ebx\n\t"
"movl $56, %esi\n\t"
"movl %ecx, $label(%edx,%ebx,$4)\n\t"
"movb %ah, (%ebx)");
Extended:
int a=10, b;
asm ("movl %1, %%eax;
movl %%eax, %0;"
:"=r"(b) /* output */
:"r"(a) /* input */
:"%eax" /* clobbered register */
);
asm volatile(
" lock ;\n"
" addl %1,%0 ;\n"
: "=m" (my_var)
: "ir" (my_int), "m" (my_var)
: /* no clobber-list */
);
// Compute the tangent of x
real tan(real x)
{
asm
{
fld x[EBP] ; // load x
fxam ; // test for oddball values
fstsw AX ;
sahf ;
jc trigerr ; // x is NAN, infinity, or empty
// 387's can handle denormals
SC18: fptan ;
fstp ST(0) ; // dump X, which is always 1
fstsw AX ;
sahf ;
jnp Lret ; // C2 = 1 (x is out of range)
// Do argument reduction to bring x into range
fldpi ;
fxch ;
SC17: fprem1 ;
fstsw AX ;
sahf ;
jp SC17 ;
fstp ST(1) ; // remove pi from stack
jmp SC18 ;
}
trigerr:
return real.nan;
Lret:
;
}
Embedded SQL
Again, one token or multiple? How? The lookahead would be too big, because there can be any number of spaces between EXEC and SQL.
EXEC SQL SELECT "NA\
ME" INTO :n FROM staff WHERE name='Sa\
nders';
int main() {
}
A: The handler for the "exec" identifier is responsible for finding the end of the block.
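A sketch of what such a handler could do once the top lexer has produced the "exec" identifier: skip any amount of whitespace, require the word SQL, and then scan to the terminating semicolon while ignoring semicolons inside quoted literals. ExecSqlScanner is a hypothetical name, the scan works on plain text instead of the lexer input, and treating the first unquoted ";" as the end of the block is an assumption.

```java
final class ExecSqlScanner {
    /**
     * @param src       the source text
     * @param afterExec offset just past the "exec" identifier
     * @return offset just past the terminating ';', or -1 if this is not an EXEC SQL block
     */
    static int endOfBlock(String src, int afterExec) {
        int i = afterExec;
        while (i < src.length() && Character.isWhitespace(src.charAt(i))) {
            i++;
        }
        // "exec" must be followed by whitespace and then the whole word SQL (any case).
        if (i == afterExec || !src.regionMatches(true, i, "SQL", 0, 3)) {
            return -1;
        }
        i += 3;
        if (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) {
            return -1;
        }
        char quote = 0; // 0 means "not inside a literal"
        for (; i < src.length(); i++) {
            char c = src.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;
                }
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == ';') {
                return i + 1;
            }
        }
        return -1; // no terminating ';' found
    }
}
```

Because the handler reads forward on its own, the top lexer needs no fixed lookahead: a line continuation inside a literal, as in the example above, is simply part of the skipped text.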
Indentation
When a new line is typed in the editor, the indentation engine is asked to indent the next line(s).
Delegation of indentation
There can be different indentation settings/rules for embedded languages. We don't want to handle everything in one place.
How to delegate indentation to the indenter associated with the embedded language?
Should the delegator provide a callback for the current/last/base indentation position?
#if A
# if B
int ab = 1;
# else
int ab = 0;
# endif
#endif
A: The engines communicate through the Document's properties.
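As an illustration of this answer, the sketch below uses the standard javax.swing.text.Document property methods (putProperty/getProperty). The property key and the IndentHandoff class are assumptions made for the example; the outer engine publishes its base indent and the embedded-language engine adds its own relative indent to it.

```java
import javax.swing.text.Document;

final class IndentHandoff {
    /** Hypothetical property key; the real key would be agreed between the engines. */
    static final String BASE_INDENT_KEY = "cnd.indent.base";

    /** Called by the outer (delegating) engine before handing a line to the embedded engine. */
    static void publishBaseIndent(Document doc, int columns) {
        doc.putProperty(BASE_INDENT_KEY, columns);
    }

    /** Called by the embedded engine: its own relative indent is applied on top of the base. */
    static int indentForEmbeddedLine(Document doc, int relativeIndent) {
        Object base = doc.getProperty(BASE_INDENT_KEY);
        int baseIndent = base instanceof Integer ? (Integer) base : 0;
        return baseIndent + relativeIndent;
    }
}
```

No direct callback between the engines is needed in this variant; an engine that finds no published value simply falls back to a base indent of 0.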
Preprocessor branches
#ifdef __DEBUG
if (deep_check()) {
#else
if (check()) {
#endif
int a = 0;
}
The problem is determining the indent for each new line when Enter is pressed. Typing # in the first position of a line should reindent that line from the current language indent position to the preprocessor indent position.
There could also be problems with pairing the opening "{" and closing "}" when trying to prevent insertion of a new unbalanced curly brace.
A: Not clear what to do.
Formatting
Delegation of formatting
How to delegate formatting to the formatter associated with the embedded language?
A: The engines communicate through the Document's properties.
DataObjects
We try to distinguish C and C++ by file extension (CDataObject, CCDataObject), because some C++ keywords are not C keywords and we'd like to treat them as identifiers in C. In addition, there can be different compiler-specific extensions (GNU extensions) for both C and C++. Could this be solved by attributes passed to the lexer?
The real problem is with header files. We have one HDataObject and do not distinguish C from C++ in headers; C++ is always used. The MIME type for headers is the same as for C++ sources. However, a "New C Header" can be created through the "New" template wizard. Such a file is created with, e.g., an "h" extension but with different content than a "New C++ Header". How do we remember the C-style vs. C++-style choice made at creation time? After the IDE is reloaded, only the DataLoader/MIMEResolver has a chance to detect the file type.
A: Ask someone in core.
Code Completion
We'd like to have more phases for pressing Ctrl+Space, to improve responsiveness.
- The first Ctrl+Space in an empty context usually shows only file-local content (very fast).
- The second Ctrl+Space shows all of the above plus everything from the current project context (usually not very slow either).
- The next Ctrl+Space shows all of the above plus the content of all used libraries (usually quite slow).
A: Two modes at most (one is better) + IZ#122012: low performance of completion
- #1 prevent the sort
- #2 prevent displaying of items
re #1: in fact we cannot skip this phase, because the sort is really quick on already sorted collections (just O(n)), and there is no reason to introduce additional complexity for SPI implementers.
re #2: the infrastructure introduces a method for SPI implementers, CompletionResultSet.setHideExtraItems(boolean). SPI implementers put all their items into the result set and call this method.
If no provider sets the flag, the threshold is the number of all items.
The infrastructure sorts all items and, after sorting, displays only the first threshold elements plus a "More..." item as the last item (when the flag is ON). All calculations are based on these threshold elements.
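The display rule described above can be sketched as follows. This is only an illustration of the proposal, not the completion infrastructure: CompletionDisplay is a hypothetical class, items are reduced to plain strings, and the threshold is passed in explicitly because the notes do not say how it is chosen when the flag is ON.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CompletionDisplay {
    static final String MORE = "More...";

    /**
     * Sorts all items, then keeps only the first {@code threshold} of them when
     * {@code hideExtraItems} is set, appending a "More..." entry if anything was cut off.
     * Without the flag the threshold is the number of all items.
     */
    static List<String> visibleItems(List<String> allItems, boolean hideExtraItems, int threshold) {
        List<String> sorted = new ArrayList<>(allItems);
        Collections.sort(sorted);
        int limit = hideExtraItems ? Math.min(threshold, sorted.size()) : sorted.size();
        List<String> visible = new ArrayList<>(sorted.subList(0, limit));
        if (limit < sorted.size()) {
            visible.add(MORE);
        }
        return visible;
    }
}
```

Sorting still runs over the full list, as argued in the answer to #1; only the number of displayed items is reduced.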

