The NBS language is used to describe programming languages, and integrate them with NetBeans. With it, you can define the tokens and grammar of a language, and how to present this language in the IDE. One *.nbs file defines one programming language.
The lexical analyzer is the first part of every parser and compiler. It reads a source file and breaks it up into a stream of tokens. A token is something like a word in a given language. Tokens are typically defined by regular expressions.
Example:
TOKEN:number:( ['0' - '9']+ )
This line of an nbs file defines a type of token named "number".
Syntax:
tokenDefinition = "TOKEN" ":" tokenTypeName ":" "(" regularExpression ")";
tokenTypeName = <identifier>;
Regular expression constructs:
| 'a' | character a |
| "abc" | string abc; the syntax is the same as in Java (\t, \n, ...) |
| "ab"i | case-insensitive string, i.e. ab, Ab, aB or AB |
| ['a' 'b' 'c'] | character a, b, or c (simple class) |
| [^'a' 'b' 'c'] | any character except a, b, or c (negation) |
| ['a'-'z' 'A'-'Z'] | a through z or A through Z, inclusive (range) |
| . | any character |
| 'a'? | character a once or not at all |
| 'a'+ | character a one or more times |
| 'a'* | character a zero or more times |
| XY | X followed by Y |
| X \| Y | either X or Y |
| (X) | X, as a capturing group |
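To make these constructs more concrete, the following Java sketch shows roughly what a stateless lexer does with a handful of token definitions. It is only an illustration: it uses java.util.regex in place of the NBS regular-expression engine, the class name SimpleLexer and the method tokenize are hypothetical, and the operator and whitespace token types are assumed here in addition to the number token defined earlier.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the NetBeans API.
final class SimpleLexer {

    // Token type name -> java.util.regex equivalent of the NBS expression.
    private static final Map<String, Pattern> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("number", Pattern.compile("[0-9]+"));            // ['0'-'9']+
        TOKENS.put("operator", Pattern.compile("[*+]"));            // '*' or '+'
        TOKENS.put("whitespace", Pattern.compile("[ \\t\\n\\r]+")); // [' ' '\t' '\n' '\r']+
    }

    // Splits the input into <type,"value"> strings, trying token types in declaration order.
    static List<String> tokenize(String input) {
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (Map.Entry<String, Pattern> e : TOKENS.entrySet()) {
                Matcher m = e.getValue().matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > pos) {
                    result.add("<" + e.getKey() + ",\"" + m.group() + "\">");
                    pos = m.end();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("No token matches at offset " + pos);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12+3"));
    }
}
```

Calling SimpleLexer.tokenize("12+3") yields three tokens: a number, an operator, and another number.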
It is hard to describe some languages using a stateless lexical analyzer. For this reason, the nbs language contains support for states during lexical analysis.
Example:
<IN_COMMENT_STATE>:TOKEN:comment_end:( "*/" ):<DEFAULT_STATE>
The "comment_end" token switches the lexer state from "IN_COMMENT_STATE" to "DEFAULT_STATE".
Syntax:
tokenDefinition = ["<" initialState ">" ":"] "TOKEN" ":" tokenTypeName ":" "(" regularExpression ")" [":" "<" finalState ">"];
initialState = <identifier>;
finalState = <identifier>;
The state does not change if you do not specify a final state. The default state is named "<DEFAULT>". You can also group several token definitions that share a common initial state (see the example below).
Syntax:
tokenGroupDefinition = "<" initialState ">" "{"
(
"TOKEN" ":" tokenName ":" "(" regularExpression ")" [":" "<" finalState ">"]
)*
"}";
The following simple example shows most of the TOKEN keyword features. It defines tokens for *.properties files:
TOKEN:key:( [^ "=" "\n" "\r"]* ):<BEFORE_EQUAL>
<BEFORE_EQUAL> {
TOKEN:whitespace:( ["\n" "\r"]+ ):<DEFAULT>
TOKEN:operator:( "=" ):<AFTER_EQUAL>
}
<AFTER_EQUAL> {
TOKEN:whitespace:( ["\n" "\r"]+ ):<DEFAULT>
TOKEN:value:( [^ "\n" "\r"]* )
}
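The way these definitions drive the lexer from state to state can be sketched in Java. This is a simplified model, not the NBS implementation: the class PropertiesLexer is hypothetical, java.util.regex stands in for the NBS expressions, and the handling of empty matches (accepted only when the rule changes the state, and never emitted as a token) is an assumption made so that the sketch always terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the NetBeans API.
final class PropertiesLexer {

    // One token definition: initial state, token type, pattern, final state (null = keep state).
    private static final class Rule {
        final String from, type, to;
        final Pattern pattern;
        Rule(String from, String type, String regex, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
            this.pattern = Pattern.compile(regex);
        }
    }

    // The five TOKEN definitions of the *.properties example, in the same order.
    private static final List<Rule> RULES = List.of(
        new Rule("DEFAULT", "key", "[^=\\n\\r]*", "BEFORE_EQUAL"),
        new Rule("BEFORE_EQUAL", "whitespace", "[\\n\\r]+", "DEFAULT"),
        new Rule("BEFORE_EQUAL", "operator", "=", "AFTER_EQUAL"),
        new Rule("AFTER_EQUAL", "whitespace", "[\\n\\r]+", "DEFAULT"),
        new Rule("AFTER_EQUAL", "value", "[^\\n\\r]*", null)
    );

    // Returns tokens as "type:text" strings.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        String state = "DEFAULT";
        int pos = 0;
        while (pos < input.length()) {
            Rule applied = null;
            for (Rule r : RULES) {
                if (!r.from.equals(state)) {
                    continue; // rule belongs to another lexer state
                }
                Matcher m = r.pattern.matcher(input).region(pos, input.length());
                boolean changesState = r.to != null && !r.to.equals(state);
                // Assumption: an empty match counts only if it moves the lexer to another state.
                if (m.lookingAt() && (m.end() > pos || changesState)) {
                    if (m.end() > pos) {
                        tokens.add(r.type + ":" + m.group());
                    }
                    pos = m.end();
                    applied = r;
                    break;
                }
            }
            if (applied == null) {
                throw new IllegalArgumentException("No token matches at offset " + pos);
            }
            if (applied.to != null) {
                state = applied.to;
            }
        }
        return tokens;
    }
}
```

For the input "a=b" followed by a line break and "c=d", the sketch produces a key, an operator, and a value for each line, separated by one whitespace token that returns the lexer to the DEFAULT state.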
On occasion, it may prove difficult to describe some tokens using a regular expression. In these cases, you may implement parts of your tokenizer in Java.
TOKEN:special_token: {
start_state: "BEFORE_SPECIAL";
call: org.foo.Foo.myMethod;
end_state: "AFTER_SPECIAL";
}
The org.foo.Foo class looks like this:
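package org.foo;

import org.netbeans.api.languages.ASTToken;
import org.netbeans.api.languages.CharInput;

public class Foo {

    // Assumption: the MIME type of your language. The original snippet
    // does not define this constant; "text/x-foo" is a placeholder.
    private static final String MIME_TYPE = "text/x-foo";

    // Returns { token, name of the new state }, or null if nothing was recognized.
    public static Object[] myMethod (CharInput input) {
        int start = input.getIndex ();
        // Consume characters up to the next '/'.
        while (!input.eof () &&
               input.next () != '/'
        ) {
            input.read ();
        }
        if (input.next () == '/')
            return new Object[] {
                ASTToken.create (MIME_TYPE, "js_operator", "", 0),
                "NEW_STATE"
            };
        // No '/' found: restore the original input position.
        input.setIndex (start);
        return null;
    }
}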
The method called from the TOKEN definition must take a single parameter of type CharInput. It should return the parsed token and the new state of the tokenizer.
The syntax analyzer reads a stream of tokens and creates an AST (Abstract Syntax Tree). A syntax definition is optional: some IDE features (for example, token coloring) can be based directly on the lexical analyzer. A grammar is described in a form similar to that of JavaCC (extended BNF). The current version of GLF contains a simple LL syntax analyzer. LR and LALR grammars are not accepted.
Example:
S = (Statement)*;
Statement = WhileStatement | IfStatement | ExpressionStatement;
WhileStatement = "while" "(" ConditionalExpression ")" Block;
IfStatement = "if" "(" ConditionalExpression ")" Block ["else" Block];
Block = "{" (Statement)* "}";
ConditionalExpression = <identifier>;
ExpressionStatement = <identifier>;
A grammar rule can contain:
| WhileStatement | nonterminal |
| ( expression )* | repeat expression zero or more times |
| ( expression )+ | repeat expression one or more times |
| [ expression ] | expression once or not at all |
| "if" | value of token (regardless of token type) |
| <identifier> | type of token (regardless of token value) |
| <keyword,"if"> | token type and value |
There are typically some tokens that should be ignored during syntax analysis, e.g. spaces. Use the SKIP keyword to define them.
Syntax:
skipDefinition = "SKIP" ":" tokenTypeName;
Example:
# definition of tokens
TOKEN:number:( ['0' - '9']+ )
TOKEN:operator:( '*' | '+' )
TOKEN:whitespace:( [' ' '\t' '\n' '\r']+ )
# grammar
SKIP:whitespace
S = additiveExpression;
additiveExpression = multiplicativeExpression "+" additiveExpression;
additiveExpression = multiplicativeExpression;
multiplicativeExpression = operand "*" multiplicativeExpression;
multiplicativeExpression = operand;
operand = <number>;
Input: 1 + 2 * 3
After lexical analysis (tokens): <number,"1"> <operator,"+"> <number,"2"> <operator,"*"> <number,"3">
After syntactic analysis (parse tree):
S
- additiveExpression
  - multiplicativeExpression
    - operand
      - <number,"1">
  - <operator,"+">
  - additiveExpression
    - multiplicativeExpression
      - operand
        - <number,"2">
      - <operator,"*">
      - multiplicativeExpression
        - operand
          - <number,"3">
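The derivation of this tree can be reproduced with a small recursive-descent (LL) parser. The Java sketch below is an illustration under stated assumptions, not the GLF parser: the class ExpressionParser is hypothetical, it contains its own minimal lexer, and because the two alternatives of each rule share a common prefix, it parses the shared part first and then uses one token of lookahead to decide whether the longer alternative applies. The tree is printed with two spaces of indentation per nesting level.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the NetBeans API.
final class ExpressionParser {
    private final List<String[]> tokens = new ArrayList<>(); // {type, value} pairs
    private final StringBuilder tree = new StringBuilder();
    private int pos = 0;

    // Minimal lexer: numbers and the two operators. Whitespace is dropped,
    // which is the effect SKIP:whitespace has on the token stream.
    private ExpressionParser(String input) {
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '+' || c == '*') {
                tokens.add(new String[] {"operator", String.valueOf(c)});
                i++;
            } else if (c >= '0' && c <= '9') {
                int start = i;
                while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
                    i++;
                }
                tokens.add(new String[] {"number", input.substring(start, i)});
            } else {
                throw new IllegalArgumentException("Unexpected character at offset " + i);
            }
        }
    }

    // Parses the input and returns the parse tree as indented text.
    static String parse(String input) {
        ExpressionParser p = new ExpressionParser(input);
        p.tree.append("S\n");
        p.additiveExpression(1);
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token at index " + p.pos);
        }
        return p.tree.toString();
    }

    // additiveExpression = multiplicativeExpression ["+" additiveExpression]
    private void additiveExpression(int depth) {
        line(depth, "additiveExpression");
        multiplicativeExpression(depth + 1);
        if (nextIs("+")) {
            token(depth + 1);
            additiveExpression(depth + 1);
        }
    }

    // multiplicativeExpression = operand ["*" multiplicativeExpression]
    private void multiplicativeExpression(int depth) {
        line(depth, "multiplicativeExpression");
        operand(depth + 1);
        if (nextIs("*")) {
            token(depth + 1);
            multiplicativeExpression(depth + 1);
        }
    }

    // operand = <number>
    private void operand(int depth) {
        line(depth, "operand");
        if (pos >= tokens.size() || !tokens.get(pos)[0].equals("number")) {
            throw new IllegalArgumentException("Expected <number> at token index " + pos);
        }
        token(depth + 1);
    }

    private boolean nextIs(String value) {
        return pos < tokens.size() && tokens.get(pos)[1].equals(value);
    }

    // Consumes the current token and adds it to the tree as a leaf.
    private void token(int depth) {
        String[] t = tokens.get(pos++);
        line(depth, "<" + t[0] + ",\"" + t[1] + "\">");
    }

    private void line(int depth, String text) {
        tree.append("  ".repeat(depth - 1)).append("- ").append(text).append('\n');
    }
}
```

ExpressionParser.parse("1 + 2 * 3") returns the same nesting as the parse tree above: the "*" operator ends up deeper in the tree than "+", which is how the grammar encodes operator precedence.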
This feature of an nbs file lets you define the color or font type for tokens or grammar rules.
The following example defines the foreground color and font type for the number token type:
COLOR:number: {
foreground_color:"orange";
font_type:"bold";
}
Syntax of COLOR feature:
"COLOR" ":" Selector ":" "{"
( propertyName ":" propertyValue )*
"}"
Syntax:
colorDefinition = "COLOR" ":" identifier ":" "{" ( parameter )* "}";
identifier = <identifier> ( "." <identifier> )*;
parameter = parameterName ":" parameterValue;
parameterValue = <string> | <identifier>;
Here, identifier is the name of a grammar rule, the value of a token, or the name of a token type. The names of grammar rules can be nested: "method.name" specifies the color of the "name" nonterminal embedded in the "method" nonterminal. The corresponding grammar rule should look like this:
method = modifiers returnType name ...;
Supported properties:
Any grammar rule can be folded.
Syntax:
foldDefinition = "FOLD" ":" identifier [ ":" parameters ( "\"" text "\"" ) | methodCall ]
Where identifier is the name of some grammar rule.
Examples:
FOLD:additiveExpression
FOLD:additiveExpression:"$multiplicativeExpression$ + $additiveExpression$"
FOLD:additiveExpression:org.foo.Foo.method
If part of the code is folded, some text is displayed in its place. There are three ways to specify this text.
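For the string form, one plausible reading is that each $name$ placeholder is replaced by the source text of the correspondingly named part of the folded rule. The Java sketch below illustrates only that substitution step. The class FoldText, the method expand, and the map of child texts are hypothetical, and the treatment of unknown names (replaced by an empty string) is an assumption rather than documented behavior.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the NetBeans API.
final class FoldText {

    // Matches placeholders of the form $name$.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)\\$");

    // Replaces every $name$ in the template with the text recorded for that name.
    // Assumption: names without an entry are replaced by an empty string.
    static String expand(String template, Map<String, String> childText) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String text = childText.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the template "$multiplicativeExpression$ + $additiveExpression$" and the child texts "1" and "2 * 3", expand returns "1 + 2 * 3".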
The NAVIGATOR command lets you reflect the usage of some grammar rules (some parts of the parse tree) in the NetBeans navigator. The navigator can contain a list or a tree of elements. Each node in the navigator can have an icon, a tooltip, and an action assigned to it.
Syntax:
navigatorDefinition = "NAVIGATOR" ":" identifier ":" "{" parameters "}"
Where identifier is the name of some grammar rule.
Example:
NAVIGATOR:method: {
display_name: "$method_name$ ($parametersList$)";
tooltip: "$modifiers$ $type$ $method_name$ ($parametersList$) $throws$";
icon: "/org/netbeans/modules/languages/resources/method.gif";
}
Supported properties:
An NBS file consists of statements. Each statement defines a token, a group of tokens, a grammar rule, or a feature.
S = (Statement)*;
Statement = TokenStatement | TokenGroupStatement | GrammarRuleStatement | FeatureStatement;
Token definitions in an NBS file are similar to those in JavaCC:
TokenStatement = [LexerState ":"] "TOKEN" ":" TokenName ":" "(" RegularExpression ")" [":" LexerState];
LexerState = "<" <identifier> ">";
TokenName = <identifier>;
TokenGroupStatement = LexerState ":" "{" (TokenStatementWithoutInitialState)+ "}";
TokenStatementWithoutInitialState = "TOKEN" ":" TokenName ":" "(" RegularExpression ")" [":" LexerState];
GrammarRuleStatement = grLeftSide "=" grRightSide ";";
grLeftSide = <identifier>;
grRightSide = grChoice grRightSide1;
grRightSide1 = "|" grChoice grRightSide1;
grRightSide1 = ;
grChoice = grPart grChoice;
grChoice = ;
grPart = <identifier> grOperator;
grPart = tokenDef grOperator;
grPart = <string> grOperator;
grPart = "[" grRightSide "]";
grPart = "(" grRightSide ")" grOperator;
grOperator = "+";
grOperator = "*";
grOperator = "?";
grOperator = ;
tokenDef = "<" <identifier> tokenDef1 ">";
tokenDef1 = "," <string>;
tokenDef1 = ;
There are several ways to declare a feature statement:
FeatureStatement = Keyword Value |
Keyword ":" Selector |
Keyword ":" Selector ":" Value;
Keyword = <keyword>;
Keyword defines the type of feature ("COLOR", "INDENTATION", ...). Selector defines where the feature should be applied. It can be the type of a token or the name of a grammar rule (nonterminal). A selector can also specify a path. For example, "Method.Name" identifies "Name" inside the Method grammar rule:
Method = "method" Name "(" Parameters ")" Block;
Name = <identifier>;
COLOR:Method.Name: {font_type:"bold";}
Syntax of selector:
Selector = <identifier> ("." <identifier>)*;
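How a dotted selector could be matched against a node of the parse tree can be sketched in Java. The matching rule used here is an assumption for illustration only: a selector applies when its parts are equal to the last elements of the path from the root to the node. The class SelectorMatcher and the method matches are hypothetical, and the actual rule applied by the framework may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the NetBeans API.
final class SelectorMatcher {

    // Returns true if the dotted selector equals the tail of the path
    // leading from the root of the parse tree to the current node.
    static boolean matches(String selector, List<String> path) {
        List<String> parts = Arrays.asList(selector.split("\\."));
        if (parts.size() > path.size()) {
            return false;
        }
        return path.subList(path.size() - parts.size(), path.size()).equals(parts);
    }
}
```

Under this assumption, "Method.Name" matches a Name node whose parent is Method, but not a Name node nested in some other rule, while the plain selector "Name" matches both.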
There are several ways to declare the value of a feature:
Value = StringValue | MethodCallValue | CompoundValue;
StringValue = <string>;
MethodCallValue = <identifier> ("." <identifier>)*;
RegularExpressionValue = "(" RegularExpression ")";
CompoundValue = "{" (PropertyName ":" PropertyValue ";")* "}";
PropertyName = <identifier>;
PropertyValue = StringValue | MethodCallValue | RegularExpressionValue;