chilon::parser tutorial - part 7

This example will show how to build a tree of operators to allow mathematical symbols to be handled. This can easily be integrated into the javascript parser discussed in the previous tutorials.

The parse data is a series of mathematical equations over integers.
8
5 * 6
1 + 2 + (3)
1 * 2 * 3
1 * (3 + 6) * 10

Two parsers will be discussed in this section, first a naive parser will be created, then an identical parser that stores a small syntax tree will be discussed.

First the naive parser:

#include <common.hpp>

typedef char_range<'0', '9'> Number;
typedef many_plus<Number>    Integer;

struct Expression;

typedef choice<
    Integer,
    sequence< char_<'('>, node<Expression>, char_<')'> >
> Term;

struct Multiplication : simple_node<Multiplication,
    joined_plus< char_<'*'>, Term >
> {};

struct Addition : simple_node<Addition,
    joined_plus< char_<'+'>, Multiplication >
> {};

struct Expression : simple_node<Expression, Addition> {};

int main(int argc, char *argv[]) {
    return test< many<Expression> >(argc, argv);
}

The output:

[
    Expression: Addition: [ Multiplication: [ "8" ] ]
    Expression: Addition: [ Multiplication: [
        "5"
        "6"
    ] ]
    Expression: Addition: [
        Multiplication: [ "1" ]
        Multiplication: [ "2" ]
        Multiplication: [ Expression: Addition: [ Multiplication: [ "3" ] ] ]
    ]
    Expression: Addition: [ Multiplication: [
        "1"
        "2"
        "3"
    ] ]
    Expression: Addition: [ Multiplication: [
        "1"
        Expression: Addition: [
            Multiplication: [ "3" ]
            Multiplication: [ "6" ]
        ]
        "10"
    ] ]
]

In the previous abstract syntax tree there are many vectors with a single element that represent a sum with a single element (hence not really a sum at all). It is easy to collapse these nodes to allow a variants to store nodes only when necessary using the tree_join template.

#include <common.hpp>

typedef char_range<'0', '9'> Number;
typedef many_plus<Number>    Integer;

struct Expression;

typedef choice<
    Integer,
    sequence< char_<'('>, node<Expression>, char_<')'> >
> Term;

// tree_joined is like joined_plus, but stores a
//      variant(vector(subexpression store), subexpression store)
// Any variants within subexpression are collapsed into this variant.
// If only a single sub-expression is matches then the latter storage
// type is used, otherwise the former variant type is populated.
struct Multiplication : simple_node<Multiplication,
    tree_joined< char_<'*'>, Term >
> {};

struct Addition : simple_node<Addition,
    tree_joined< char_<'+'>, Multiplication >
> {};

struct Expression : simple_node<Expression, Addition> {};

int main(int argc, char *argv[]) {
    return test< many<Expression> >(argc, argv);
}

The new neater output:

[
    Expression: "8"
    Expression: Multiplication: [
        "5"
        "6"
    ]
    Expression: Addition: [
        "1"
        "2"
        Expression: "3"
    ]
    Expression: Multiplication: [
        "1"
        "2"
        "3"
    ]
    Expression: Multiplication: [
        "1"
        Expression: Addition: [
            "3"
            "6"
        ]
        "10"
    ]
]

chilon::parser also provides tree_many which behaves identically to tree_join but where the join parser is always the whitespace parser.

previous | next