How to better understand this pic. Is this ocml grammar ?
With the exception of ‘/’ instead of ‘|’ and ‘<-’ instead of ‘=’, this is just EBNF https://en.wikipedia.org/wiki/Extended_Backus–Naur_form#:~:text=In%20computer%20science%2C%20extended%20Backus,as%20a%20computer%20programming%20language.
Thanks for the question, @gengjiawen!
It looks like this is from section 6.4 Expression grammar. This grammar corresponds to the expression part of the subset of TypeScript/JavaScript that we are trying to implement in the book. It uses PEG notation (Parsing Expression Grammar). You might have heard of other similar notations such as EBNF, for example. The previous chapter 5 introduced parsing combinators and along with a light introduction of the PEG notation. (Perhaps, too light?)
The following paper (PDF) introduced PEG and its notation and is very approachable if you want to have dive into this area: https://bford.info/pub/lang/peg.pdf.
Ah PEG, gotcha! It is exactly the correct syntax, haha.
@beefok, yeah PEG and EBNF are very similar. For mundane academic reasons I decided to use PEG for the book, even though EBNF is generally more familiar.
I find this very hard to understand. Maybe add more context and give an introduction here here will make it easier to read.
Thanks for your feedback, @gengjiawen. I will give it some more thought and will try to improve this part, but this will probably take some time. In the meanwhile feel free to post more questions.
Here’s some informal description of what this PEG grammar stands for. It has operators similar to regular expressions: *
means repetition, ?
means optionality. However, there are differences too. Notably, strings must be quoted. This way "hi" ?
means a grammar that corresponds to either hi
or an empty string.
Grammar rules are just names for existing expressions. If you define a rule hi <- "hi" ?
, then you can write just hi
instead of "hi" ?
.
There are parser tools that generate parsers from this kind of grammar description, however, in the book we implement the parser manually (using parser combinators), however, it still helps (at least to me) to think about grammars first before jumping to implement the parser combinators.
The expression grammar from the book doesn’t use quoted strings. Instead, it follows this convention that simplest rules (our pseudo-tokens) are capitalized. They are defined using the token
parser combinator.
args <- (expression (COMMA expression)*)?
call <- ID LEFT_PAREN args RIGHT_PAREN
atom <- call / ID / NUMBER
/ LEFT_PAREN expression RIGHT_PAREN
unary <- NOT? atom
product <- unary ((STAR / SLASH) unary)*
sum <- product ((PLUS / MINUS) product)*
comparison <- sum ((EQUAL / NOT_EQUAL) sum)*
expression <- comparison
This grammar is pretty involved, but the book goes over each of the rule and shows how to implement it using our parser combinators.
Let’s look at one of the rules, call
:
call <- ID LEFT_PAREN args RIGHT_PAREN
It says that a call starts with an identifier, followed by an opening parenthesis, then arguments (defined separately), then a closing parenthesis. This way f(x)
fits our call
rule, but f[x]
does not.
The implementation of call
from the book, using parser combinators:
// call <- ID LEFT_PAREN args RIGHT_PAREN
let call =
ID.and(LEFT_PAREN.and(args.and(RIGHT_PAREN)))
Hope this helps! If not, let me know. I appreciate all the feedback.
I still find this a bits of hard to understand. Maybe we can apply tdd here.
Like write a simple test case for
args <- (expression (COMMA expression)*)?
This maybe easier to understand.
Also I think I need to understand PEG to move on. Any other material other than the pdf essay.
You can find some test that might be helpful in the book’s repo:
If you take call
, as an example:
// call <- ID LEFT_PAREN args RIGHT_PAREN
let call =
ID.and(LEFT_PAREN.and(args.and(RIGHT_PAREN)))
You can write a test like this:
let node = call.parseStringToCompletion("f(x)");
console.assert(node.equals(new Call("f", new Id("x"))));
Apart from the paper (wich is quite approachable, as I remember), maybe this Wikipedia example could be of some help?
Thanks,this is more clear.
Maybe we can reorganise book split every PEG rule with code and test.
This surely helps for readers like me.
Maybe we can reorganise book split every PEG rule with code and test.
Maybe this is the way to go; I’ll have to think about that.