SchliemannNBSLanguageDescription

NBS Language Description

The NBS language is used to describe programming languages, and integrate them with NetBeans. With it, you can define the tokens and grammar of a language, and how to present this language in the IDE. One *.nbs file defines one programming language.


Defining tokens

The lexical analyzer is the first part of each parser and compiler. It reads a source file and breaks it up into a stream of tokens. A token is something like a word in a given language. Tokens are typically defined by regular expressions.

Example:

TOKEN:number:( ['0'-'9']+ )

This line of an nbs file defines a type of token named "number".

Syntax:

tokenDefinition = "TOKEN" ":" tokenTypeName ":" "(" regularExpression ")";
tokenTypeName = <identifier>;

Regular expression constructs:

'a'                  character a
"abc"                string abc; escape syntax is the same as in Java (\t, \n, ...)
"ab"i                case-insensitive string, i.e. ab, Ab, aB or AB
['a' 'b' 'c']        character a, b, or c (simple class)
[^'a' 'b' 'c']       any character except a, b, or c (negation)
['a'-'z' 'A'-'Z']    a through z or A through Z, inclusive (range)
.                    any character
'a'?                 character a, once or not at all
'a'+                 character a, one or more times
'a'*                 character a, zero or more times
XY                   X followed by Y
X | Y                either X or Y
(X)                  X, as a capturing group
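
The constructs above can be combined into complete token definitions. The following is a minimal sketch; the token names (number, variable, keyword, line_comment) are illustrative assumptions and do not come from any shipped language definition.

```nbs
# an optional minus sign followed by one or more digits
TOKEN:number:( '-'? ['0'-'9']+ )
# a dollar sign followed by one or more letters
TOKEN:variable:( '$' ['a'-'z' 'A'-'Z']+ )
# "begin" or "end" in any letter case
TOKEN:keyword:( "begin"i | "end"i )
# "//" followed by everything up to the end of the line
TOKEN:line_comment:( "//" [^"\n" "\r"]* )
```

The definitions were chosen so that no input matches two of them, because this description does not say how the lexer resolves such overlaps.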


It is hard to describe some languages using a stateless lexical analyzer. For this reason, the nbs language contains support for states during lexical analysis.

Example:

<IN_COMMENT_STATE>:TOKEN:comment_end:( "*/" ):<DEFAULT_STATE>

The "comment_end" token switches the lexer state from "IN_COMMENT_STATE" to "DEFAULT_STATE".

Syntax:

tokenDefinition = [ "<" initialState ">" ":" ] "TOKEN" ":" tokenTypeName ":" "(" regularExpression ")" [ ":" "<" finalState ">" ];
initialState = <identifier>;
finalState = <identifier>;

The state is not changed if you do not specify a final state. The default state is named "<DEFAULT>". It is also possible to group several token definitions that share a common initial state (see the example below).

Syntax:

tokenGroupDefinition = "<" initialState ">" "{"
  (
    "TOKEN" ":" tokenTypeName ":" "(" regularExpression ")" [ ":" "<" finalState ">" ]
  )*
"}";

The following simple example shows most of the TOKEN keyword features. It defines tokens for *.properties files:

TOKEN:key:( [^"=""\n""\r"]* ):<BEFORE_EQUAL>

<BEFORE_EQUAL> {
    TOKEN:whitespace:( ["\n""\r"]+ ):<DEFAULT>
    TOKEN:operator:( "=" ):<AFTER_EQUAL>
}

<AFTER_EQUAL> {
    TOKEN:whitespace:( ["\n""\r"]+ ):<DEFAULT>
    TOKEN:value:( [^"\n""\r"]* )
}
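
As a second, hedged sketch of lexer states, block comments could be tokenized as follows. The state name IN_COMMENT and the token names are assumptions made for illustration. The sketch is deliberately incomplete: a "*" inside a comment that is not followed by "/" is not matched by any rule here.

```nbs
# "/*" opens a comment and switches to the IN_COMMENT state;
# comment_text names no final state, so the lexer stays in IN_COMMENT;
# "*/" closes the comment and switches back to the default state
TOKEN:comment_start:( "/*" ):<IN_COMMENT>

<IN_COMMENT> {
    TOKEN:comment_text:( [^"*"]+ )
    TOKEN:comment_end:( "*/" ):<DEFAULT>
}
```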


Parsing tokens in Java

On occasion, it may prove difficult to describe some tokens using a regular expression. In these cases, you may implement parts of your tokenizer in Java.

TOKEN:special_token: {
    call: org.foo.Foo.myMethod;
}

The org.foo.Foo class looks like this:

package org.foo;

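// The classes used below are assumed to live in the same API package as CharInput.
import org.netbeans.api.languages.ASTToken;
import org.netbeans.api.languages.CharInput;
import org.netbeans.api.languages.Language;
import org.netbeans.api.languages.LanguageDefinitionNotFoundException;
import org.netbeans.api.languages.LanguagesManager;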

public class Foo { 

    private static final String MIME_TYPE = "text/x-foo";

    public static Object[] myMethod (CharInput input) {
        int start = input.getIndex ();

        while (!input.eof () && input.next () != '/') {
            input.read ();
        }

        if (input.next () == '/') {
            Language language;

            try {
                language = LanguagesManager.get().getLanguage(MIME_TYPE);
            } catch (LanguageDefinitionNotFoundException ex) {
                ex.printStackTrace();
                return null;
            }

            return new Object[] {
                ASTToken.create (language, "js_operator", "", 0, 0, null),
                null
            };
        }

        input.setIndex (start);
        return null;
    }
}

The method called from the TOKEN definition should take one parameter of type CharInput. It should return an array whose first element is the parsed token and whose second element is the new state of the tokenizer, or null. TODO: Document what a null state means.


Defining the grammar

The syntax analyzer reads a stream of tokens and creates an AST (Abstract Syntax Tree). A syntax definition is optional. Some IDE features (for example, token coloring) can be based directly on the lexical analyzer. A grammar is described in a form similar to that of JavaCC (extended BNF). The current version of GLF contains a simple LL syntax analyzer. LR and LALR grammars are not accepted.

Example:

S = (Statement)*;
Statement = WhileStatement | IfStatement | ExpressionStatement;
WhileStatement = "while" "(" ConditionalExpression ")" Block;
IfStatement = "if" "(" ConditionalExpression ")" Block ["else" Block];
Block = "{" (Statement)* "}";
ConditionalExpression = <identifier>;
ExpressionStatement = <identifier>;

A grammar rule can contain:

WhileStatement       nonterminal
( expression )*      repeat expression zero or more times
( expression )+      repeat expression one or more times
[ expression ]       expression once or not at all
"if"                 value of token (regardless of token type)
<identifier>         type of token (regardless of token value)
<keyword,"if">       token type and value
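
The elements listed above can be mixed freely within one rule. The following rules are a sketch only; the nonterminal names and the "keyword", "identifier" and "number" token types are assumed for illustration.

```nbs
VariableDeclaration = <keyword,"var"> <identifier> [ "=" Expression ] ";";
ArgumentList = Expression ( "," Expression )*;
Expression = <identifier> | <number>;
```

Here <keyword,"var"> matches only a keyword token whose text is var, while "=" and ";" match by text regardless of token type.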


There are typically some tokens that should be ignored during syntax analysis, e.g. spaces. Use the SKIP keyword to define them.

Syntax:

skipDefinition = "SKIP" ":" tokenTypeName;
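
For instance, a language that wants the parser to ignore both whitespace and line comments might declare the following. This is a sketch; the comment token and its "#" prefix are assumptions for illustration.

```nbs
TOKEN:whitespace:( [" " "\t" "\n" "\r"]+ )
TOKEN:comment:( "#" [^"\n" "\r"]* )

SKIP:whitespace
SKIP:comment
```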


Lexical and syntax analysis example

# definition of tokens
TOKEN:number:( ['0'-'9']+ )
TOKEN:operator:( '*' | '+' )
TOKEN:whitespace:( [" " "\t" "\n" "\r"]+ )

# grammar
SKIP:whitespace
S = additiveExpression;
additiveExpression = multiplicativeExpression "+" additiveExpression;
additiveExpression = multiplicativeExpression;
multiplicativeExpression = operand "*" multiplicativeExpression;
multiplicativeExpression = operand;
operand = <number>;

Input: 1 + 2 * 3
After lexical analysis (tokens, with the skipped whitespace tokens omitted): <number,"1"> <operator,"+"> <number,"2"> <operator,"*"> <number,"3">
After syntactic analysis (parse tree):

S
    - additiveExpression
        - multiplicativeExpression
            - operand
                - <number,"1">
        - <operator,"+">
        - additiveExpression
            - multiplicativeExpression
                - operand
                    - <number,"2">
                - <operator,"*">
                - multiplicativeExpression
                    - operand
                        - <number,"3">
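
As a hedged extension of this example, parenthesized sub-expressions could be supported by adding the parentheses to the operator token and giving operand a second rule. This is only a sketch in the style of the example (one rule per alternative) and has not been checked against the GLF parser.

```nbs
TOKEN:operator:( '*' | '+' | '(' | ')' )

operand = <number>;
operand = "(" additiveExpression ")";
```

With these rules, an input such as (1 + 2) * 3 would place the addition inside the first operand of the multiplication.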


Features

The NBS language provides a number of other features for defining the behavior of the editing environment. Typically, feature declarations take the following shape:

FEATURE_NAME:tokenOrNodeTypeName: {
    property1: "String that may contain $path.expressions$ which allow you to go down in the syntax tree from the context node and will be replaced with the string value of the respective token. If the context node is a token itself, get its value by $$.";
    property2:( ["R""r" ] "egular" ( " " )+ "expression" );
    property3: com.example.yourmodule.YourLanguage.method # will be executed to get the property value, must have exactly one argument of type SyntaxContext, appropriate return type depends on the respective property
    # ...
}

Using Java methods to define property values is generally possible for all properties.


Syntax coloring

This feature of an nbs file allows you to define the color or font type for tokens or grammar rules.

The following example defines the foreground color and font type for the number token type:

COLOR:number: {
    foreground_color:"orange";
    font_type:"bold";
}

Syntax of the COLOR feature:

"COLOR" ":" Selector ":" "{"
    ( propertyName ":" propertyValue )*
"}"

Syntax:

colorDefinition = "COLOR" ":" identifier ":" "{" ( parameter )* "}";
identifier = <identifier> ( "." <identifier> )*;
parameter = parameterName ":" parameterValue;
parameterValue = <string> | <identifier>;

Here, identifier is the name of a grammar rule, the value of a token, or the name of a token type. The names of grammar rules can be nested: "method.name" specifies the color of the "name" nonterminal embedded in the "method" nonterminal. The corresponding grammar rule should look like this:

method = modifiers returnType name ...;


Supported properties:

  • color_name: Name of the color. If it is not specified, elementName is used as the name of the color.
  • default_coloring: Defines parent coloring (operator, keyword, identifier, whitespace, number, char, string, comment).
  • foreground_color: Foreground color (for example "white", "FF00FF").
  • background_color: Background color.
  • underline_color: Underline color.
  • wave_underline_color: Wavy underline color.
  • strike_through_color: Strike-through color.
  • font_name: Name of font.
  • font_type: Font type (like "bold" or "italics-bold").
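
A short sketch of how these properties might be combined. The token type comment and the nested selector method.name are assumptions for illustration; the property names and values are taken from the list above.

```nbs
COLOR:comment: {
    default_coloring:"comment";
}

COLOR:method.name: {
    foreground_color:"FF00FF";
    font_type:"bold";
}
```

The first definition inherits everything from the parent "comment" coloring. The second applies only to a name nonterminal nested inside a method nonterminal.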



Code folding

Any grammar rule can be folded.

Syntax:

foldDefinition = "FOLD" ":" identifier [ ":" ( <string> | methodCall ) ];

Where identifier is the name of some grammar rule.

Examples:

FOLD:additiveExpression
FOLD:additiveExpression:"$multiplicativeExpression$ + $additiveExpression$"
FOLD:additiveExpression:org.foo.Foo.method

When part of the code is folded, some text is displayed in its place. There are three ways to specify this text:

  1. If you do not specify the text, the default text ("...") is used.
  2. You can specify the text directly; it may contain expressions such as $multiplicativeExpression$.
  3. The text can be obtained from a method call.
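
Applied to the while/if grammar from the "Defining the grammar" section, the first two options might look like the sketch below. The placeholder text is an assumption; $ConditionalExpression$ is a path expression that refers to the nonterminal of that name inside WhileStatement.

```nbs
FOLD:Block
FOLD:WhileStatement:"while ($ConditionalExpression$) {...}"
```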



Navigator

The NAVIGATOR command lets you show occurrences of some grammar rules (some parts of the parse tree) in the NetBeans Navigator. The Navigator can contain a list or a tree of elements. Each node in the Navigator can have an icon, a tooltip, and an action assigned to it.

Syntax:

navigatorDefinition = "NAVIGATOR" ":" identifier ":" "{" parameters "}"

Where identifier is the name of some grammar rule.

Example:

NAVIGATOR:method: {
    display_name: "$method_name$ ($parametersList$)";
    tooltip: "$modifiers$ $type$ $method_name$ ($parametersList$) $throws$";
    icon: "/org/netbeans/modules/languages/resources/method.gif";
}


Supported properties:

  • display_name: The display name of the node in Navigator. The display name may contain parameters such as $method_name$, $parametersList$. TODO: Document complete list.
  • icon: The path of the icon to display for the node in Navigator. A default icon is supplied if none is specified.
  • tooltip: The tooltip for the node in Navigator. The tooltip may contain parameters such as $method_name$, $parametersList$. TODO: Document complete list.
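
A smaller sketch that relies on the default icon, again using the while/if grammar from the "Defining the grammar" section. The display and tooltip strings are assumptions for illustration.

```nbs
NAVIGATOR:WhileStatement: {
    display_name: "while ($ConditionalExpression$)";
    tooltip: "while loop";
}
```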



Imports


Code completion


Actions


Tooltips


Marking declarations and usages

In code, certain tokens such as the names of variables and functions occur multiple times. Typically, one of those occurrences is a declaration, such as in the definition of a function. The other occurrences we will call usages. The SEMANTIC_DECLARATION and SEMANTIC_USAGE keywords allow you to make explicit the connection between those multiple occurrences. This automatically enables the following UI features:

  • When the user clicks on one occurrence of a token, all of its other occurrences will be highlighted (background color, marker at the margin).
  • Ctrl-clicking on a usage takes you to the declaration.

Here's an example for the Prolog language:

# declaration and usage of predicates
SEMANTIC_DECLARATION:identifier: {
    condition: tralesld.geewhiz.PrologNBS.isPredicateDeclaration;
    name: tralesld.geewhiz.PrologNBS.predicateName;
    type:"method";
}
# TODO Can we get rid of the squiggly lines for "unused" declarations?
SEMANTIC_USAGE:identifier: {
    condition: tralesld.geewhiz.PrologNBS.isPredicateUsage;
    name: tralesld.geewhiz.PrologNBS.predicateName;
    type:"method";
}
# TODO treat all non-initial clause heads as usages?

# declaration and usage of variables
SEMANTIC_DECLARATION:variable: {
    name: tralesld.geewhiz.PrologNBS.variableIdentifier;
    type:"variable";
}
SEMANTIC_USAGE:variable: {
    name: tralesld.geewhiz.PrologNBS.variableIdentifier;
    type:"variable";
}

Supported properties:

  • condition: as with other features, takes a boolean value that enables or disables this feature for a given token
  • name: arbitrary string value that must be identical for occurrences of the same token and different for occurrences of different tokens
  • type: TODO: Document. The examples above use "method" and "variable".
  • ...


Indentation


Properties


Annotations


Appendix I: NBS file syntax.

Statements

An NBS file consists of statements. Each statement defines a token, a group of tokens, a grammar rule, or a feature.

S = (Statement)*;
Statement = TokenStatement | TokenGroupStatement | GrammarRuleStatement | FeatureStatement;

Tokens

Token definitions in an NBS file are similar to those in JavaCC:

TokenStatement = [ LexerState ":" ] "TOKEN" ":" TokenName ":" "(" RegularExpression ")" [ ":" LexerState ];
LexerState = "<" <identifier> ">";
TokenName = <identifier>;

Group of Tokens

TokenGroupStatement = LexerState "{" (TokenStatementWithoutInitialState)+ "}";
TokenStatementWithoutInitialState = "TOKEN" ":" TokenName ":" "(" RegularExpression ")" [ ":" LexerState ];

Grammar Rules

GrammarRuleStatement = grLeftSide "=" grRightSide ";";
grLeftSide = <identifier>;
grRightSide = grChoice grRightSide1;
grRightSide1 = "|" grChoice grRightSide1;
grRightSide1 = ;
grChoice = grPart grChoice;
grChoice = ;
grPart = <identifier> grOperator;
grPart = tokenDef grOperator;
grPart = <string> grOperator;
grPart = "[" grRightSide "]";
grPart = "(" grRightSide ")" grOperator;
grOperator = "+";
grOperator = "*";
grOperator = "?";
grOperator = ;
tokenDef = "<" <identifier> tokenDef1 ">";
tokenDef1 = "," <string>;
tokenDef1 = ;

Features

There are several ways to declare a feature statement:

FeatureStatement = Keyword Value         |
                   Keyword ":" Selector  |
                   Keyword ":" Selector ":" Value;
Keyword = <keyword>;

The keyword defines the type of feature ("COLOR", "INDENTATION", ...). The selector defines where the feature should be applied; it can be a token type or the name of a grammar rule (nonterminal). A selector can also specify a path. For example, "Method.Name" identifies "Name" inside the Method grammar rule:

Method = "method" Name "(" Parameters ")" Block;
Name = <identifier>;

COLOR:Method.Name: {font_type:"bold";}

Syntax of selector:

Selector = <identifier> ("." <identifier>)*;

There are several ways to declare the value of a feature:

Value = StringValue | MethodCallValue | CompoundValue;
StringValue = <string>;
MethodCallValue = <identifier> ("." <identifier>)*;
RegularExpressionValue = "(" RegularExpression ")";
CompoundValue = "{" (PropertyName ":" PropertyValue ";")* "}";

PropertyName = <identifier>;
PropertyValue = StringValue | MethodCallValue | RegularExpressionValue;
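
To tie the productions above to concrete statements, the sketch below shows one statement for each of the two selector-based FeatureStatement shapes and for two of the value kinds. It reuses features from earlier sections; the Block rule and the folded text are assumptions for illustration.

```nbs
# Keyword ":" Selector
FOLD:Block

# Keyword ":" Selector ":" Value, with a StringValue
FOLD:Block:"{...}"

# Keyword ":" Selector ":" Value, with a CompoundValue
COLOR:number: {
    foreground_color:"orange";
    font_type:"bold";
}
```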