Expression parser understand

gengjiawen · January 3, 2021, 2:02pm

How to better understand this pic. Is this ocml grammar ?

beefok · January 4, 2021, 3:48pm

With the exception of ‘/’ instead of ‘|’ and ‘<-’ instead of ‘=’, this is just EBNF https://en.wikipedia.org/wiki/Extended_Backus–Naur_form#:~:text=In%20computer%20science%2C%20extended%20Backus,as%20a%20computer%20programming%20language.

keleshev · January 4, 2021, 3:50pm

Thanks for the question, @gengjiawen!

It looks like this is from section 6.4 Expression grammar. This grammar corresponds to the expression part of the subset of TypeScript/JavaScript that we are trying to implement in the book. It uses PEG notation (Parsing Expression Grammar). You might have heard of other similar notations such as EBNF, for example. The previous chapter 5 introduced parsing combinators and along with a light introduction of the PEG notation. (Perhaps, too light?)

The following paper (PDF) introduced PEG and its notation and is very approachable if you want to have dive into this area: https://bford.info/pub/lang/peg.pdf.

beefok · January 4, 2021, 3:52pm

Ah PEG, gotcha! It is exactly the correct syntax, haha.

keleshev · January 4, 2021, 3:57pm

@beefok, yeah PEG and EBNF are very similar. For mundane academic reasons I decided to use PEG for the book, even though EBNF is generally more familiar.

gengjiawen · January 5, 2021, 8:28am

I find this very hard to understand. Maybe add more context and give an introduction here here will make it easier to read.

keleshev · January 5, 2021, 3:11pm

Thanks for your feedback, @gengjiawen. I will give it some more thought and will try to improve this part, but this will probably take some time. In the meanwhile feel free to post more questions.

Here’s some informal description of what this PEG grammar stands for. It has operators similar to regular expressions: * means repetition, ? means optionality. However, there are differences too. Notably, strings must be quoted. This way "hi" ? means a grammar that corresponds to either hi or an empty string.

Grammar rules are just names for existing expressions. If you define a rule hi <- "hi" ?, then you can write just hi instead of "hi" ?.

There are parser tools that generate parsers from this kind of grammar description, however, in the book we implement the parser manually (using parser combinators), however, it still helps (at least to me) to think about grammars first before jumping to implement the parser combinators.

The expression grammar from the book doesn’t use quoted strings. Instead, it follows this convention that simplest rules (our pseudo-tokens) are capitalized. They are defined using the token parser combinator.

      args <- (expression (COMMA expression)*)?
      call <- ID LEFT_PAREN args RIGHT_PAREN
      atom <- call / ID / NUMBER
            / LEFT_PAREN expression RIGHT_PAREN
      unary <- NOT? atom
      product <- unary ((STAR / SLASH) unary)*
      sum <- product ((PLUS / MINUS) product)*
      comparison <- sum ((EQUAL / NOT_EQUAL) sum)*
      expression <- comparison

This grammar is pretty involved, but the book goes over each of the rule and shows how to implement it using our parser combinators.

Let’s look at one of the rules, call:

call <- ID LEFT_PAREN args RIGHT_PAREN

It says that a call starts with an identifier, followed by an opening parenthesis, then arguments (defined separately), then a closing parenthesis. This way f(x) fits our call rule, but f[x] does not.

The implementation of call from the book, using parser combinators:

      // call <- ID LEFT_PAREN args RIGHT_PAREN
      let call =
        ID.and(LEFT_PAREN.and(args.and(RIGHT_PAREN)))

Hope this helps! If not, let me know. I appreciate all the feedback.

gengjiawen · January 11, 2021, 3:47am

I still find this a bits of hard to understand. Maybe we can apply tdd here.

Like write a simple test case for

args <- (expression (COMMA expression)*)?

This maybe easier to understand.

gengjiawen · January 11, 2021, 3:49am

Also I think I need to understand PEG to move on. Any other material other than the pdf essay.

keleshev · January 11, 2021, 6:48pm

You can find some test that might be helpful in the book’s repo:

github.com

keleshev/compiling-to-assembly-from-scratch/blob/572074fcf78b8c3fec30a992767e3f9fbd50ba98/compiler.ts#L755-L770


test("Expression parser", () => {
  console.log();
  let [x, y, z] = [new Id('x'), new Id('y'), new Id('z')];
  let parse = (string) => expression.parseStringToCompletion(string);

  console.assert(parse('x + y + z').equals(new Add(new Add(x, y), z)));
  console.assert(parse('x + y * z').equals(new Add(x, new Multiply(y, z))));
  console.assert(parse('x * y + z').equals(new Add(new Multiply(x, y), z)));
  console.assert(parse('(x + y) * z').equals(new Multiply(new Add(x, y), z)));
  console.assert(parse('x == y + z').equals(new Equal(x, new Add(y, z))));
  console.assert(parse('x + y == z').equals(new Equal(new Add(x, y), z)));

  console.assert(parse('f()').equals(new Call('f', [])));
  console.assert(parse('f(x)').equals(new Call('f', [x])));
  console.assert(parse('f(x, y, z)').equals(new Call('f', [x, y, z])));
});

If you take call, as an example:

      // call <- ID LEFT_PAREN args RIGHT_PAREN
      let call =
        ID.and(LEFT_PAREN.and(args.and(RIGHT_PAREN)))

You can write a test like this:

let node = call.parseStringToCompletion("f(x)");
console.assert(node.equals(new Call("f", new Id("x"))));

Apart from the paper (wich is quite approachable, as I remember), maybe this Wikipedia example could be of some help?

gengjiawen · January 13, 2021, 2:42am

Thanks，this is more clear.

Maybe we can reorganise book split every PEG rule with code and test.

This surely helps for readers like me.

keleshev · January 13, 2021, 7:44am

Maybe we can reorganise book split every PEG rule with code and test.

Maybe this is the way to go; I’ll have to think about that.