This tutorial assumes familiarity with parsing expression grammars (PEGs).
Grammars must begin with an “@grammar” statement, which determines the namespace the output grammar lives within.
@grammar MyGrammar
Grammars can be nested within other namespaces in the output code.
@module grandparent_namespace.parent_namespace
@grammar MyGrammar
Comments are C++-style.
// a single line comment
/* a comment that ends at a star followed by a slash */
Unlike in PEGs, whitespace is optional between elements in a sequence. The “Spacing” rule determines what is considered whitespace (and thus should contain no sequences itself). The symbol ^ can be used between elements in a sequence that must not have whitespace between them.
@grammar javascript
Spacing <- \s+ // \s matches tabs, vertical tabs and spaces.
Identifier <- [a-zA-Z_] ^ [a-zA-Z0-9_]*
VarDeclaration <- "var" Identifier
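The spacing rules above can be illustrated with a small hand-written recognizer. This is a minimal C++17 sketch, not the code nyu generates: the namespace spacing_sketch and the helpers skip_spacing, parse_identifier and parse_var_declaration are hypothetical names, and std::isspace stands in for the Spacing rule.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

namespace spacing_sketch {

// Hypothetical stand-in for Spacing*: skips any run of whitespace.
inline void skip_spacing(std::string_view in, std::size_t& pos) {
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
}

// Identifier <- [a-zA-Z_] ^ [a-zA-Z0-9_]*
// The two parts are joined with ^, so no spacing is skipped between them
// and the result is stored as a single string.
inline std::optional<std::string> parse_identifier(std::string_view in, std::size_t& pos) {
    auto is_start = [](unsigned char c) { return std::isalpha(c) || c == '_'; };
    auto is_rest = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    if (pos >= in.size() || !is_start(in[pos])) return std::nullopt;
    const std::size_t begin = pos++;
    while (pos < in.size() && is_rest(in[pos])) ++pos;
    return std::string(in.substr(begin, pos - begin));
}

// VarDeclaration <- "var" Identifier
// The elements are in a plain sequence, so optional spacing is skipped
// between "var" and the identifier.
inline std::optional<std::string> parse_var_declaration(std::string_view in) {
    std::size_t pos = 0;
    if (in.substr(pos, 3) != "var") return std::nullopt;
    pos += 3;
    skip_spacing(in, pos);
    return parse_identifier(in, pos);
}

}  // namespace spacing_sketch
```

In this sketch, spacing_sketch::parse_var_declaration("var foo") yields "foo". Because whitespace between sequence elements is optional, the input "varfoo" is accepted as well.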
@grammar list
Spacing <- \s
// * or + repetitions of a character-storing parser store a string.
IdSuffix <- [0-9a-zA-Z_]*
// Spacing* is allowed between parsers in a sequence. Spacing* is not
// allowed between parsers joined with ^.
// The storage types of adjacent string/char storing parsers joined with ^
// are collapsed to store a single string, so Id stores a string.
Id <- [a-zA-Z_] ^ IdSuffix
// (P % Q) matches many P joined with Q (with optional Spacing* around
// the Q) and stores vector[storage type of P]
// In Grammar "[", "," and "]" are not stored as they always match the
// same data. (P ^% Q) is the same as (P % Q) but does not allow
// Spacing* to match around Q.
Grammar <- "[" (Id % ",") "]"
// Grammar stores: vector<string>
// parses: [ hello, king ]
// as: vector<string>("hello", "king")
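The behaviour of the join operator in the list grammar can be sketched by hand. The following C++17 code is an illustration only, not nyu output: the namespace list_sketch and its helper functions are hypothetical, std::isspace stands in for Spacing, and the sketch assumes that (P % Q) may match zero items.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

namespace list_sketch {

// Hypothetical stand-in for Spacing*.
inline void skip_spacing(std::string_view in, std::size_t& pos) {
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
}

// Id <- [a-zA-Z_] ^ IdSuffix, stored as one string.
inline bool parse_id(std::string_view in, std::size_t& pos, std::string& out) {
    auto is_start = [](unsigned char c) { return std::isalpha(c) || c == '_'; };
    auto is_rest = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    if (pos >= in.size() || !is_start(in[pos])) return false;
    const std::size_t begin = pos++;
    while (pos < in.size() && is_rest(in[pos])) ++pos;
    out = std::string(in.substr(begin, pos - begin));
    return true;
}

// Matches a fixed string; nothing is stored because it always matches the same data.
inline bool parse_literal(std::string_view in, std::size_t& pos, std::string_view lit) {
    if (in.substr(pos, lit.size()) != lit) return false;
    pos += lit.size();
    return true;
}

// Grammar <- "[" (Id % ",") "]"
inline std::optional<std::vector<std::string>> parse_list(std::string_view in) {
    std::size_t pos = 0;
    std::vector<std::string> ids;
    if (!parse_literal(in, pos, "[")) return std::nullopt;
    skip_spacing(in, pos);
    std::string id;
    if (parse_id(in, pos, id)) {  // assumption: an empty list is allowed
        ids.push_back(id);
        for (;;) {
            const std::size_t save = pos;  // backtrack point if "," Id fails
            skip_spacing(in, pos);
            if (!parse_literal(in, pos, ",")) { pos = save; break; }
            skip_spacing(in, pos);
            if (!parse_id(in, pos, id)) { pos = save; break; }
            ids.push_back(id);
        }
    }
    skip_spacing(in, pos);
    if (!parse_literal(in, pos, "]")) return std::nullopt;
    return ids;
}

}  // namespace list_sketch
```

Only the Id matches end up in the result, mirroring the vector<string> storage type described in the comments above.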
@grammar js
Spacing <- \s
Id <- [a-zA-Z_] ^ [0-9a-zA-Z_]*
// Sequences store tuples. Sub-tuples are broken down into the parent
// tuple type, and a tuple that stores a single type is broken down into
// that type.
// FuncCall stores tuple<string, string>.
FuncCall <- Id "(" Id ")"
// FuncCall stores tuple<string, string> and Id stores string so
// Grammar stores vector< variant<string, tuple<string, string> > >
// Duplicate types are collapsed into a single entry in a variant, and a
// variant that stores a single type is collapsed to that type.
Grammar <- (FuncCall / Id)+
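To make the storage types of the js grammar concrete, they can be written out as C++17 aliases. This is a sketch under assumptions: the namespace js_sketch, the alias names and the describe helper are hypothetical, and the standard library types stand in for whatever vector, variant and tuple types nyu actually emits.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <variant>
#include <vector>

namespace js_sketch {

using Id = std::string;                                 // Id stores string
using FuncCall = std::tuple<std::string, std::string>;  // FuncCall stores tuple<string, string>
using Item = std::variant<Id, FuncCall>;                // (FuncCall / Id)
using Grammar = std::vector<Item>;                      // (FuncCall / Id)+

// Hypothetical helper that inspects which alternative an item holds.
inline std::string describe(const Item& item) {
    if (const auto* id = std::get_if<Id>(&item)) return "id " + *id;
    const auto& [callee, argument] = std::get<FuncCall>(item);
    return "call " + callee + "(" + argument + ")";
}

}  // namespace js_sketch
```

Under these assumptions, an input such as print(x) y would correspond to a Grammar value holding one FuncCall followed by one Id.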
@grammar mathematics_basic
Spacing <- \s
// <int- stores the resulting parsed string as an int.
Number < int - [0-9]+
// <= creates a "node parser". Node parsers store a new type with the same
// name as the parsing rule. The storage types of node parsers are not
// flattened into the storage type of including parsers.
Product <= Number %+ "*"
// %+ is like % but at least one join item must be stored.
// P %+ Q matches P (Q P)* and stores vector[storage type of P]
Addition <= Product %+ "+"
// Spacing* is allowed between elements in each P in P+ or P* unless
// P stores a character. P^+ is the same as P+ with no spacing allowed.
Grammar <- Addition+
// These examples use [ .. ] to represent a stored list/vector type.
// stores: vector<Addition>
// creates:
// class Product {
// vector<int> value_;
// }
// class Addition {
// vector<Product> value_;
// }
// parses: 4 + 2
// as: [ Addition[Product[4], Product[2]] ]
// parses: 4
// as: [ Addition[Product[4]] ]
// parses: 4 7
// as: [ Addition[Product[4]], Addition[Product[7]] ]
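The node classes listed above can be walked like any other tree. The following C++17 sketch mirrors them with public members so that a parse result can be built and evaluated by hand; the namespace basic_sketch and the evaluate functions are hypothetical and are not part of nyu.

```cpp
#include <cassert>
#include <vector>

namespace basic_sketch {

// Mirrors "Product <= Number %+ "*"": a node holding the joined ints.
struct Product {
    std::vector<int> value_;
};

// Mirrors "Addition <= Product %+ "+"": a node holding the joined products.
struct Addition {
    std::vector<Product> value_;
};

// Mirrors "Grammar <- Addition+".
using Grammar = std::vector<Addition>;

// Multiplies the factors stored in a Product node.
inline int evaluate(const Product& product) {
    int result = 1;
    for (int factor : product.value_) result *= factor;
    return result;
}

// Sums the evaluated products stored in an Addition node.
inline int evaluate(const Addition& addition) {
    int sum = 0;
    for (const Product& product : addition.value_) sum += evaluate(product);
    return sum;
}

}  // namespace basic_sketch
```

Because node types are not flattened, every number is wrapped in a Product and every Product in an Addition, even when the input is a single number such as 4.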
@grammar mathematics_basic
Spacing <- \s
Number < int - [0-9]+
// Expression recursively refers to itself through Term. This would not be
// possible if Expression were not a node parser, because the storage type
// of Expression would then recursively depend on itself.
// Term stores variant<int, Expression>
Term <= Number / "(" Expression ")"
Product <= Term %+ "*"
Addition <= Product %+ "+"
Expression <= Addition
Grammar <- Expression+
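The reason a named node type makes recursion possible can be shown with plain C++17 types. This sketch assumes that Product is built from Term, as the comment about recursion implies, and it uses std::shared_ptr for the recursive link; both the pointer indirection and the names in recursive_sketch are assumptions, not a description of the code nyu generates.

```cpp
#include <cassert>
#include <memory>
#include <variant>
#include <vector>

namespace recursive_sketch {

// A named node type can be forward declared, which breaks the type cycle.
struct Expression;

struct Term {
    // Either a number or a parenthesised sub-expression.
    std::variant<int, std::shared_ptr<Expression>> value_;
};

struct Product {
    std::vector<Term> value_;
};

struct Addition {
    std::vector<Product> value_;
};

struct Expression {
    Addition value_;
};

inline int evaluate(const Expression& expression);

inline int evaluate(const Term& term) {
    if (const int* number = std::get_if<int>(&term.value_)) return *number;
    return evaluate(*std::get<std::shared_ptr<Expression>>(term.value_));
}

inline int evaluate(const Product& product) {
    int result = 1;
    for (const Term& term : product.value_) result *= evaluate(term);
    return result;
}

inline int evaluate(const Addition& addition) {
    int sum = 0;
    for (const Product& product : addition.value_) sum += evaluate(product);
    return sum;
}

inline int evaluate(const Expression& expression) {
    return evaluate(expression.value_);
}

}  // namespace recursive_sketch
```

Without a named Expression type, the alias for Term would have to contain its own definition, which is the recursive dependency the comment above describes.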
@grammar mathematics
Spacing <- \s
Number < int - [0-9]+
// |% parses the same data as %+ but stores parsed data differently.
// The node type is only created if the join matches more than one item,
// otherwise it stores the item to the left of the |%. The resulting type
// of the whole expression is a variant that can store either type.
// In this case Product stores variant<int, Product> which is populated
// with either int or Product depending on whether Number matches one or
// many times.
Product <= Number |% "*"
// Node parsers that use |+, |* or |% cannot refer to themselves.
// Addition stores: variant<Addition, storage type of Product>
// expand: variant<Addition, variant<Product, int>>
// collapse: variant<Addition, Product, int>
Addition <= Product |% "+"
Grammar <- Addition+
// stores: vector< variant<int, Product, Addition> >
// creates:
// class Product {
// vector<int> value_;
// }
// class Addition {
// vector<variant<int, Product>> value_;
// }
// parses: 4 + 2
// as: [ Addition[4, 2] ]
// parses: 4
// as: [ 4 ]
// parses: 4 7
// as: [ 4, 7 ]
// parses: 4 + 2 * 7
// as: [ Addition[4, Product[2, 7]] ]
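The collapsed variant types produced by |% can be consumed with a visitor. The sketch below models the storage types listed above with std::variant and evaluates a parse result; the namespace collapse_sketch, the Evaluator visitor and the evaluate function are hypothetical names chosen for illustration, and the use of std::variant is an assumption about the generated code.

```cpp
#include <cassert>
#include <variant>
#include <vector>

namespace collapse_sketch {

// Created only when Number matches more than once.
struct Product {
    std::vector<int> value_;
};

// Created only when Product matches more than once.
struct Addition {
    std::vector<std::variant<int, Product>> value_;
};

// Storage type of one Addition match after collapsing nested variants.
using Item = std::variant<int, Product, Addition>;
using Grammar = std::vector<Item>;

// Visitor with one overload per type that can appear in the variants.
struct Evaluator {
    int operator()(int number) const { return number; }

    int operator()(const Product& product) const {
        int result = 1;
        for (int factor : product.value_) result *= factor;
        return result;
    }

    int operator()(const Addition& addition) const {
        int sum = 0;
        for (const auto& operand : addition.value_) sum += std::visit(*this, operand);
        return sum;
    }
};

inline int evaluate(const Item& item) {
    return std::visit(Evaluator{}, item);
}

}  // namespace collapse_sketch
```

With this layout a lone number is stored as a plain int, so no wrapper nodes need to be unpacked for the simplest inputs.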
@grammar hash
Id <- [a-zA-Z]+
// (KeyPair <- Id "=" Id) would store tuple<string, string>, but because
// the first Id is prefixed with "#", key_value<string, string> is stored
// instead.
KeyPair <- #Id "=" Id
// (P % Q) would normally store vector<storage type of P>, but when the
// storage type of P is key_value<...> then it stores a vector_hash_map.
// A vector_hash_map stores the order in which items were inserted
// in addition to a hash index which can be used for fast access to
// a stored item based on its key. This storage behaviour is the same
// for all parsers that can store vectors.
Grammar <- "{" KeyPair % "," "}"
// Using { key -> value, ... } to represent the hash map type nyu creates
// parses "{ first = hello, second = bye }" as:
// {
// "first" -> "hello",
// "second" -> "bye"
// }
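The combination of insertion order and hash lookup described above can be sketched as a vector of pairs plus an index. This is not nyu's vector_hash_map: the class name VectorHashMap, its member functions and the choice to reject duplicate keys are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

namespace hash_sketch {

class VectorHashMap {
public:
    using Entry = std::pair<std::string, std::string>;

    // Appends a key/value pair. Assumption: a duplicate key is rejected
    // and leaves the container unchanged.
    bool insert(std::string key, std::string value) {
        const bool inserted = index_.try_emplace(key, items_.size()).second;
        if (inserted) items_.emplace_back(std::move(key), std::move(value));
        return inserted;
    }

    // Looks up a value through the hash index; returns nullptr if absent.
    const std::string* find(const std::string& key) const {
        const auto it = index_.find(key);
        return it == index_.end() ? nullptr : &items_[it->second].second;
    }

    // Entries in the order in which they were inserted.
    const std::vector<Entry>& items() const { return items_; }

private:
    std::vector<Entry> items_;                            // preserves insertion order
    std::unordered_map<std::string, std::size_t> index_;  // key -> position in items_
};

}  // namespace hash_sketch
```

Storing positions rather than pointers in the index keeps lookups valid when the vector reallocates as it grows.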