The NetBeans project called Generic Languages Framework (GLF) allows you to define a programming language and integrate it into the NetBeans IDE. In the first step you describe the language: its lexical part (tokens defined by regular expressions) and its syntax (grammar rules). In the second step you define how the language is visualized in NetBeans. You can define colors for tokens (syntax coloring) and code folding (based on grammar rules). You can also define which parts of the language are displayed in the navigator, how the language is indented, and many other features.
This tutorial guides you through the process of creating a new NetBeans module, adding language support to it, describing the lexical and syntactic structure of your language, and adding support for all standard editor features for your language.
All NetBeans distributions contain support for creating NetBeans modules, so you can easily extend the NetBeans IDE with your own plug-in modules. Use the File / New Project action from the main menu to create a new NetBeans Module project:
The next page of the "New Project" wizard allows you to specify the name and location of your project:
Define the root package name and module name on the third page:
The new project is created when you press the Finish button.
Now add Language Support to your project. Select the root node of your project and invoke the New > Other action from the pop-up menu. The New File wizard opens. Select the Module Development / Language Support template:
Select the MIME type and file extensions for your language:
This wizard generates a mime-resolver.xml file, a language.nbs file, and some registrations in your XML layer file (Layer.xml). The mime-resolver.xml file defines the mapping between the MIME type and the file extensions. The language.nbs file contains the description of your language (tokens, grammar and all features).
The language.nbs file generated by the New File wizard contains an example definition of an artificial language. Feel free to delete the whole content of this file, or use it as a starting point for your experiments.
You can build and install your module now. Select the root node of your module and invoke the "Build" and "Install/Reload in Development IDE" actions. Your plug-in is installed into NetBeans.
Open Window / Favorites and create an example file (Example.foo). The file opens in the editor and you can edit it as a plain text file.
Now we can start to modify the language definition file. If you would like to see your changes immediately, without being slowed down by the build / reinstall cycle, open the live version of your language definition file:
You can now change the definition of your language, and all changes are applied automatically to all open files recognized by your language (Example.foo) immediately after you save the language definition file.
First add some content to your example file (Example.foo):
while identifier 1234 "string"
Now we can start describing the tokens of our language:
TOKEN:keyword: ("while" | "if" | "else" | "var")
TOKEN:identifier: ( ['a'-'z' 'A'-'Z'] ['a'-'z' 'A'-'Z' '0'-'9']*)
TOKEN:operator: ("(" | ")" | "=" | "{" | "}" | "+" | "-")
TOKEN:number: ( ['0'-'9']+)
TOKEN:whitespace: ( [' ' '\t' '\n' '\r']+)
TOKEN:string: ( "\"" [^ "\""]* "\"")
As you can see, tokens are described by regular expressions. "while" represents a string and 'a' represents a character. The | operator represents alternation, and ? represents zero or one occurrence of the preceding element. A list of all regular expression constructs supported by GLF can be found in the NBS Language Description.
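The GLF lexer reads the source text from the first character and tries to apply the regular expressions to it. Two simple rules define which regular expression "wins": the longest match wins, and if several expressions match text of the same length, the one defined first in the NBS file wins.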
For example, the input text "whileee" is recognized (by the lexer defined above) as one identifier token. The first part of this text could be the keyword "while", but the longer match wins. The text "if" is recognized as a keyword token. It matches the identifier pattern too, but the keyword definition precedes the identifier definition.
Sometimes, during development of your lexer, it is useful to see its output directly. That is why we designed the Tokens View. It can be opened from the main menu: Window / Other / Tokens View.
Sometimes it is hard to describe your language by one plain list of tokens. The GLF lexer allows you to define several groups of tokens that are relevant in different contexts. The lexer has a state, and there is a separate list of tokens for each state. Each recognized token can change the state of the lexer. The following example shows this functionality:
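The two rules can be sketched in plain Java. This is only an illustration built on java.util.regex, not the GLF lexer itself; the class name LongestMatchDemo and the "type:text" output format are assumptions made for this example, and the token patterns are hand-translated from the NBS definitions above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "longest match wins, first definition breaks ties".
class LongestMatchDemo {

    // LinkedHashMap keeps the definition order, which decides ties.
    static final Map<String, Pattern> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("keyword", Pattern.compile("while|if|else|var"));
        TOKENS.put("identifier", Pattern.compile("[a-zA-Z][a-zA-Z0-9]*"));
        TOKENS.put("operator", Pattern.compile("[(){}=+-]"));
        TOKENS.put("number", Pattern.compile("[0-9]+"));
        TOKENS.put("whitespace", Pattern.compile("[ \\t\\n\\r]+"));
        TOKENS.put("string", Pattern.compile("\"[^\"]*\""));
    }

    static List<String> tokenize(String input) {
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> entry : TOKENS.entrySet()) {
                Matcher m = entry.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly longer matches replace the current best, so on a tie
                // the earlier definition is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestType = entry.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestType == null) {
                throw new IllegalArgumentException("No token matches at offset " + pos);
            }
            result.add(bestType + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("whileee")); // [identifier:whileee]
        System.out.println(tokenize("if"));      // [keyword:if]
        System.out.println(tokenize("while identifier 1234 \"string\""));
    }
}
```

Using a strict "greater than" comparison is what implements the tie-break: a later definition can only win by matching more text.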
Source text:
while identifier /* * author Jan */ 1234 "string"
NBS file:
TOKEN:keyword: ("while" | "if" | "else")
TOKEN:identifier: ( ['a'-'z' 'A'-'Z'] ['a'-'z' 'A'-'Z' '0'-'9']*)
TOKEN:operator: ("(" | ")" | "=" | "{" | "}" | "+" | "-")
TOKEN:number: ( ['0'-'9']+)
TOKEN:whitespace: ( [' ' '\t' '\n' '\r']+)
TOKEN:string: ( "\"" [^ "\""]* "\"")
TOKEN:comment: ("/*"):<IN_COMMENT>
<IN_COMMENT> {
TOKEN:comment: (.)
TOKEN:keyword: ("author")
TOKEN:comment: ("*/"):<DEFAULT>
}
The string "author" is recognized as a keyword only inside comments. Outside of a comment it is recognized as an identifier.
Now that the lexer is finished, we can define the grammar for this language:
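The effect of lexer states can be sketched in Java as well. This is a simplified model, not GLF code: the class StatefulLexerDemo, the Rule helper and the "type:text" output format are hypothetical, the operator and string tokens are left out for brevity, and it is assumed that "." also matches line breaks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of a lexer with one token list per state.
class StatefulLexerDemo {

    static final class Rule {
        final String type;
        final Pattern pattern;
        final String nextState; // null means "stay in the current state"

        Rule(String type, String regex, String nextState) {
            this.type = type;
            this.pattern = Pattern.compile(regex);
            this.nextState = nextState;
        }
    }

    static final Map<String, List<Rule>> RULES = new HashMap<>();
    static {
        RULES.put("DEFAULT", Arrays.asList(
                new Rule("keyword", "while|if|else", null),
                new Rule("identifier", "[a-zA-Z][a-zA-Z0-9]*", null),
                new Rule("number", "[0-9]+", null),
                new Rule("whitespace", "[ \\t\\n\\r]+", null),
                new Rule("comment", "/\\*", "IN_COMMENT")));
        RULES.put("IN_COMMENT", Arrays.asList(
                new Rule("comment", "(?s).", null),
                new Rule("keyword", "author", null),
                new Rule("comment", "\\*/", "DEFAULT")));
    }

    static List<String> tokenize(String input) {
        List<String> result = new ArrayList<>();
        String state = "DEFAULT";
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestEnd = pos;
            // Only the rules of the current state are tried; longest match wins.
            for (Rule rule : RULES.get(state)) {
                Matcher m = rule.pattern.matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = rule;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("No token matches at offset " + pos);
            }
            result.add(best.type + ":" + input.substring(pos, bestEnd));
            if (best.nextState != null) {
                state = best.nextState; // the recognized token switches the state
            }
            pos = bestEnd;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a /* author */ author"));
    }
}
```

In this model the first "author" is tokenized in the IN_COMMENT state and becomes a keyword, while the second one is tokenized in the DEFAULT state and becomes an identifier.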
SKIP:whitespace
SKIP:comment
S = Statement*;
Statement = WhileStatement |
IfStatement |
DeclarationStatement |
Assignment |
BlockStatement;
WhileStatement = "while" "(" Expression ")" Statement;
IfStatement = "if" "(" Expression ")" Statement ["else" Statement];
DeclarationStatement = "var" Identifier;
Identifier = <identifier>;
BlockStatement = "{" Statement* "}";
Assignment = Identifier "=" Expression;
Expression = Identifier [("+" | "-") Expression];
"S" is the start symbol for the GLF parser. The "SKIP" command defines token types that the parser should ignore. A source file of the language defined by this grammar consists of zero or more statements. There are five types of statements: while loop, conditional statement, variable declaration, assignment and block of statements. Expressions can contain only the "+" and "-" operators.
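GLF uses extended BNF to express grammars. Nonterminals are represented by simple names (like WhileStatement). There are three ways to represent terminal symbols (tokens): by text ("while" matches a token with the text "while"), by type (<identifier> matches any token of type identifier), or by both type and text (<keyword, "while">).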
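You can use the following constructs in grammar rules; the grammar above uses "|" for alternation, "*" for zero or more repetitions, "[ ]" for an optional part and "( )" for grouping.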
See the list of all operators that can be used in an NBS file in the NBS Language Description.
The current version of GLF contains a simple LL(k) parser, so your grammar has to be LL(k) too.
Use the AST View if you would like to see the output of your parser. The AST View displays the GLF parser output (called the parse tree) generated for your input file. It can be opened from the main menu (Window / Other / AST View):
Notice that GLF automatically highlights all syntax errors once the grammar of your language is described in the NBS file. A grammar definition is not compulsory; many useful features can be based directly on lexical analysis.
We now have the lexical and syntax analyzers defined. The list of tokens produced by the lexer and the parse tree produced by the syntax analyzer serve as the basis for the other features supported by GLF. While the syntax of token and grammar definitions is similar to other parser generator tools like JavaCC, the definition of features is similar to CSS.
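To see why the example grammar fits an LL parser, here is a hand-written recursive descent recognizer for it that needs only one token of lookahead (LL(1), a special case of LL(k)). It is a sketch for illustration, not the GLF parser: the class RecursiveDescentDemo and its accepts method are hypothetical, the input is assumed to be already tokenized with whitespace and comments skipped, and it only reports whether the input is accepted instead of building a parse tree.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical recognizer for the tutorial grammar, one method per nonterminal.
class RecursiveDescentDemo {
    private static final List<String> KEYWORDS = Arrays.asList("while", "if", "else", "var");

    private final List<String> tokens;
    private int pos = 0;

    private RecursiveDescentDemo(List<String> tokens) {
        this.tokens = tokens;
    }

    // S = Statement*;
    static boolean accepts(String... tokens) {
        RecursiveDescentDemo parser = new RecursiveDescentDemo(Arrays.asList(tokens));
        try {
            while (parser.peek() != null) {
                parser.statement();
            }
            return true;
        } catch (IllegalStateException syntaxError) {
            return false;
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    private void expect(String text) {
        if (!text.equals(peek())) {
            throw new IllegalStateException("Expected " + text + " at token " + pos);
        }
        pos++;
    }

    private void identifier() {
        String t = peek();
        if (t == null || KEYWORDS.contains(t) || !t.matches("[a-zA-Z][a-zA-Z0-9]*")) {
            throw new IllegalStateException("Expected identifier at token " + pos);
        }
        pos++;
    }

    // The next token alone decides which alternative of Statement applies.
    private void statement() {
        String t = peek();
        if ("while".equals(t)) {
            expect("while"); expect("("); expression(); expect(")"); statement();
        } else if ("if".equals(t)) {
            expect("if"); expect("("); expression(); expect(")"); statement();
            if ("else".equals(peek())) { pos++; statement(); }
        } else if ("var".equals(t)) {
            pos++; identifier();
        } else if ("{".equals(t)) {
            pos++;
            while (peek() != null && !"}".equals(peek())) { statement(); }
            expect("}");
        } else {
            identifier(); expect("="); expression(); // Assignment
        }
    }

    // Expression = Identifier [("+" | "-") Expression];
    private void expression() {
        identifier();
        if ("+".equals(peek()) || "-".equals(peek())) { pos++; expression(); }
    }

    public static void main(String[] args) {
        System.out.println(accepts("while", "(", "a", ")", "{", "var", "b", "b", "=", "a", "+", "b", "}"));
        System.out.println(accepts("while", "a"));
    }
}
```

A grammar in which two alternatives of the same rule start with the same tokens would need more lookahead, or would have to be rewritten before a parser of this kind could handle it.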
Fonts and colors for tokens are inherited from the IDE defaults if you use one of the predefined token names ("keyword", "operator", "string", "character", "number", "identifier", "comment" and "whitespace"). In other cases you should define colors and fonts directly in the NBS file:
COLOR:special_token:{
foreground_color:"blue";
background_color:"0xff0f1d";
font_type:"bold";
}
You can redefine the color of a nonterminal too:
COLOR:DeclarationStatement.Identifier: {
background_color:"0xe0d0e0";
}
This statement defines the background color for an "Identifier" parse tree node that is nested in a "DeclarationStatement" node.
The power of the GLF engine can be significantly increased by calling Java methods from NBS files. The following example marks some identifiers based on the result of a Java method call. This approach can be used for semantic coloring.
COLOR:DeclarationStatement.Identifier: {
condition:org.foo.Foo.markIdentifierCondition;
background_color:"0xe0d0e0";
}
Support for indentation is simple. Add the following piece of code to your NBS file:
INDENT "{:}"
INDENT "(:)"
INDENT "\\s*(((if|while)\\s*\\(|else\\s*|else\\s+if\\s*\\(|for\\s*\\(.*\\))[^{;]*)"
The first two lines define pairs of brackets. The indentation of lines between these brackets is increased automatically. The third line defines conditional indentation: a line that follows a line matching this regular expression is indented.
GLF allows you to easily define code folding based on tokens or nonterminals of your language:
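To get a feeling for which lines the conditional indentation expression selects, you can try it with java.util.regex. The sketch below assumes that the whole line has to match the expression and that GLF's regular expression dialect behaves like Java's for this pattern; neither is confirmed here, and the class IndentRuleDemo and its method are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical check of the conditional INDENT expression against sample lines.
class IndentRuleDemo {

    // Same expression as in the third INDENT line of the NBS file.
    static final Pattern INDENT_AFTER = Pattern.compile(
            "\\s*(((if|while)\\s*\\(|else\\s*|else\\s+if\\s*\\(|for\\s*\\(.*\\))[^{;]*)");

    // Assumption: the line following a fully matching line gets indented.
    static boolean indentsNextLine(String line) {
        return INDENT_AFTER.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"if (a)", "else", "while (a) {", "if (a) b = c;", "a = b;"};
        for (String line : samples) {
            System.out.println(line + " -> " + indentsNextLine(line));
        }
    }
}
```

Under these assumptions, a line such as "while (a) {" does not match, because the expression excludes "{" and ";" after the condition; such lines are covered by the bracket pair rule instead.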
FOLD:WhileStatement
Code folds are added automatically for all while statements. You can define more types of folds, and names for the fold actions too:
FOLD:WhileStatement: {
expand_type_action_name:"Expand While";
collapse_type_action_name:"Collapse While";
}
FOLD:IfStatement: {
expand_type_action_name:"Expand If";
collapse_type_action_name:"Collapse If";
}
You can also define which parts of your language are displayed in the navigator:
NAVIGATOR:DeclarationStatement: {
display_name: "variable $Identifier$";
icon: "/org/netbeans/modules/languages/resources/variable.gif";
}
!! Error Recovery
!! Embedding of Different Languages