@Apos I meant the date should be on the title page
@CJlambda by the way, I wrote the prototype of the book’s compiler in OCaml. I will upload that version to the book’s repo. A few people already showed interest in it.
You read my mind, because that was going to be my first project: converting it to OCaml!
I have uploaded the OCaml version of the compiler to the book’s repo:
@CJlambda I hope I don’t spoil it for you!
I put it into the contrib folder. More contributions are welcome: versions of the compiler in other languages, for example. I imagine this will be valuable for other readers who want to see how to adapt the book’s code to other languages.
p44: I assume some words are left there by accident
this way is very much not unlike writing code
with lower precedence rules building upon higher precedence rules.
I find precedence language confusing in general, and in this case lower is actually higher than higher since it’s “upon”, isn’t it?
I’m enjoying the draft. Thanks for writing it!
I also liked your blog post on your writing process using markdown, and I look forward to seeing the sequel, if you decide to write it.
In chapter 2, perhaps you could say in a sentence what strict and loose equality are. I have no experience in JavaScript, and this was the only thing I had to Google to understand.
Here are some typos I found in the July 13 draft. I agree with CJlambda’s comments on the first few dozen pages.
In Appendix B:
“First one is the GNU Assembler” should be “The first one…”
“A small Rosetta-stone—style side-by-side comparisson cheat-sheet for completeness” has a typo (“comparisson” instead of comparison), and likely “Rosetta-stone—style” should be “Rosetta Stone–style,” with an en dash instead of an em dash. I found this tricky to punctuate correctly, and I had to do some Googling to get it right, so let me explain my reasoning. First, Rosetta Stone is a proper noun (see e.g. Wikipedia), so there should be no hyphen, and “stone” should be capitalized. Second, since you use Grammarly, let me appeal to their blog post about en dashes. The “Ming Dynasty–style” example is very similar (under “Using an En Dash with Complex Compound Adjectives”).
I think “cheat sheet” is the more standard form of “cheat-sheet” (10x more common on Google Ngram).
In the 19.2 code snippet for ARMASM, probably it would be good to italicize the comments to be consistent with the rest of the book. Also, should “with” be bold? Probably not?
In Appendix A:
“In case you’re a lucky owner” is not really a typo, but it sounds a little clunky to me, and I’d just write “If you’re a lucky owner.”
“Executives” should be “executables,” right?
“Only to good old” -> “Only to a good old.”
“based on” -> “made by” (if I understand correctly?)
Both uses of “Raspberry Pi” in this appendix should likely be “a Raspberry Pi.”
“Debian-based Linux distro” should be “Debian-based Linux distros.”
“We could have used qemu package, use it to configure some emulated hardware configuration, then log into that machine, and so on” might be better as “We could have used the qemu package, emulated some hardware configuration, logged into that machine, and so on.” In particular, this removes the double configure/configuration, which is a little clunky.
“The sudo” should be just “sudo.”
“Turns out” -> “It turns out.”
Instead of “Note,” it’s probably more common to write “Note that.” But this is a subjective point.
Also, I guess all the uses of “static” in the command line snippets here should not be bold.
An extremely minor issue: in the appendix, you write, “Second, you need to install QEMU – software that allows emulating different processors, including ARM,” which uses an en dash to offset a remark. On pages 69 and 72 (for example) you use unspaced em dashes. On page 8 you use an em dash with spaces. It seems you use the unspaced em dash the most, so perhaps the other examples should be edited to conform with that usage? (I didn’t read this far but instead just used ctrl-f to search the .pdf.)
Chapter 3:
“Only once the compilation is finished the resulting program can be run.” I’m not sure whether this is wrong, but I think more idiomatic is “Only once the compilation is finished can the resulting program be run.”
“Straight-forward” -> “Straightforward”
Chapter 4, page 25: “The Call node is interesting because it has both a primitive string and an array of AST its members” should be “as its members.”
Visually, you can think of it this way:
product <- unary ((STAR / SLASH) unary)*
sum <- product ((PLUS / MINUS) product)*
comparison <- sum ((EQUAL / NOT_EQUAL) sum)*
Comparison has the lowest precedence, while product has the highest.
If you see this: 1 + 2 * 3 != 4 * 5 + 6
You want the tree to look like: ((1 + (2 * 3)) != ((4 * 5) + 6))
We start parsing by looking for a comparison. Comparison looks for a sum. Sum looks for a product. Product looks for a unary, which ends up finding the 1. Now product looks for a * or /. It doesn’t find any, so we go back up to sum. Sum looks for + and finds it. Then it looks for a product again. Product goes down, finds 2, then looks for *. It finds it, then looks for the next unary, aka 3. That’s the part that creates our first inner “parentheses”: (2 * 3). Then we bubble up to sum. We don’t find any + or -, so we get our second “parentheses”: (1 + (2 * 3)). We get back to comparison, and that takes care of the left side.
Does this make more sense? You can think of precedence as parentheses that force an operation to be evaluated first.
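To make the walkthrough concrete, here is a minimal TypeScript sketch of the three rules above. It is not the book's parser: `parenthesize` and `level` are hypothetical names, unary is simplified to a bare number, and I'm assuming EQUAL and NOT_EQUAL are spelled == and !=. Each level asks the next-tighter level for its operands, which is why the rule that sits highest in the call chain binds the loosest.

```typescript
// Hypothetical mini-parser for the three grammar rules quoted above.
// Assumption: unary is just a number token; the real grammar has more cases.
function parenthesize(input: string): string {
  const tokens: string[] = input.match(/\d+|==|!=|[-+*\/]/g) || [];
  let pos = 0;

  // Builds a rule of the shape: level <- next ((op1 / op2) next)*
  // Operands come from `next`, the rule with the next-higher precedence.
  function level(ops: string[], next: () => string): () => string {
    return () => {
      let left = next();
      while (pos < tokens.length && ops.indexOf(tokens[pos]) !== -1) {
        const op = tokens[pos++];
        const right = next();
        left = `(${left} ${op} ${right})`; // group as soon as a pair is found
      }
      return left;
    };
  }

  const unary = (): string => {
    if (pos >= tokens.length) throw new Error("unexpected end of input");
    return tokens[pos++];
  };
  const product = level(["*", "/"], unary);
  const sum = level(["+", "-"], product);
  const comparison = level(["==", "!="], sum);

  const result = comparison(); // parsing starts at the lowest precedence
  if (pos !== tokens.length) throw new Error(`unexpected token ${tokens[pos]}`);
  return result;
}

console.log(parenthesize("1 + 2 * 3 != 4 * 5 + 6"));
// ((1 + (2 * 3)) != ((4 * 5) + 6))
```

The call order is comparison, then sum, then product, then unary, the same descent as described in the walkthrough.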
@patrick thanks for the suggestions, keep 'em coming! I’m incorporating the small fixes into the next version of the draft.
@gmarik good point about the precedence, I will need to give some more thought to how to describe it better. @Apos you’ve provided a good description, I will probably incorporate something similar.
In section 7.13, the first paragraph has the sentence:
Every time we add an instruction to our program, the offset of all the following instructions addresses.
What does that mean?
Yeah, that didn’t come out right. What I meant is to highlight that if you have code like this:
add pc, pc, #8
add r0, r0, #1
add r0, r0, #2
sub r0, r0, #4
And you want to jump to the sub instruction, you add 8 to pc. But if you insert one more instruction like this:
add pc, pc, #8
add r0, r0, #1
add r0, r0, #2
add r0, r0, #3
sub r0, r0, #4
And you still want to jump to sub, then you need to change the pc offset from 8 to 12.
Here, I’m trying to create motivation for labels: a way to get all offsets right, even if more instructions are inserted here and there.
I will have to come up with a better example for the book.
A common way to introduce labels in literature is to say that a label stands for an address. And you use labels because you don’t know the actual addresses yet.
However, in ARM, branching is pc-relative. You don’t jump to an address, but change the value of pc. That’s why I wanted, instead, to introduce branching as a more convenient way to manipulate pc. And the results are mixed.
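For what it's worth, the bookkeeping that labels take off your hands can be sketched in a few lines of TypeScript. This is only an illustration, not how the GNU Assembler works internally: `resolveLabels` and the label name `done` are made up, every instruction is assumed to be 4 bytes wide, and the sketch records each label's offset from the start of the listing rather than modelling how a real branch is encoded relative to pc.

```typescript
// Hypothetical helper: maps each label to the byte offset of the
// instruction that follows it. Assumes fixed 4-byte instructions.
function resolveLabels(lines: string[]): Map<string, number> {
  const labels = new Map<string, number>();
  let offset = 0;
  for (const line of lines) {
    const text = line.trim();
    if (text.endsWith(":")) {
      labels.set(text.slice(0, -1), offset); // a label takes up no space
    } else if (text !== "") {
      offset += 4; // one fixed-width instruction
    }
  }
  return labels;
}

const before = [
  "b done",
  "add r0, r0, #1",
  "add r0, r0, #2",
  "done:",
  "sub r0, r0, #4",
];

// Same program with one more instruction inserted before the label.
const after = [
  "b done",
  "add r0, r0, #1",
  "add r0, r0, #2",
  "add r0, r0, #3",
  "done:",
  "sub r0, r0, #4",
];

console.log(resolveLabels(before).get("done")); // 12
console.log(resolveLabels(after).get("done")); // 16
```

The branch line stays the same in both listings. Only the resolved offset moves, and it is recomputed for you.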
Sounds good. It’s just a case against hardcoding absolute jump values when possible.
So far I like the book, it’s easy to follow and it feels like a spiritual successor to Let’s Build a Compiler.
// returnStatement <- RETURN expression SEMICOLON
let returnStatement: Parser = RETURN.and(expression).bind((term) =>
SEMICOLON.and(constant(new Return(term))));
Am I understanding correctly that the last ‘and(constant…’ is not about matching syntax but rather about completing the match and returning the Return AST?
Would ‘map’ make more sense there instead?
p57: a stray ‘TODO’?
// varStatement <- TODO
// VAR ID ASSIGN expression SEMICOLON
swapped: Code doesn’t match the operator
p82: did the bullet list get collapsed? Next paragraph has the same issue
The data section declared with .data assembly directive is a span of memory that you are allowed to: * read, * write, * but not exe- cute.
In chapter 7.1, the assembly program sets a value to 41 then adds 1 to create 42. I didn’t fully understand the motivation for doing that. Was it to show an addition? Is there a meaning to error code 42? Or is it the answer to the Ultimate Question of Life, the Universe, and Everything?
@gmarik instead of SEMICOLON.and(constant(new Return(term))) it could have been SEMICOLON.map((_) => new Return(term)). Would the latter make more sense to you?
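To show why the two spellings come out the same, here is a stripped-down TypeScript sketch. It is not the book's Parser class: the `run` signature, the `Result` type, and the `literal` helper are assumptions made for illustration, and a plain string stands in for new Return(term). The point is that constant consumes no input, so and(constant(x)) only swaps the parsed value, which is what map does.

```typescript
// Hypothetical, stripped-down parser: a function from (input, index)
// to a [value, nextIndex] pair, or null on failure.
type Result<T> = [T, number] | null;

class Parser<T> {
  constructor(public run: (s: string, i: number) => Result<T>) {}

  // Run this parser, then let `f` choose the next parser from its value.
  bind<U>(f: (value: T) => Parser<U>): Parser<U> {
    return new Parser<U>((s, i) => {
      const r = this.run(s, i);
      return r === null ? null : f(r[0]).run(s, r[1]);
    });
  }

  // Run this parser, discard its value, then run `next`.
  and<U>(next: Parser<U>): Parser<U> {
    return this.bind(() => next);
  }

  // Run this parser and transform its value without consuming more input.
  map<U>(f: (value: T) => U): Parser<U> {
    return this.bind((value) => constant(f(value)));
  }
}

// Always succeeds with `value` and consumes no input.
function constant<T>(value: T): Parser<T> {
  return new Parser<T>((_s, i) => [value, i]);
}

// Matches a literal string at the current position.
function literal(text: string): Parser<string> {
  return new Parser<string>((s, i) =>
    s.startsWith(text, i) ? [text, i + text.length] : null);
}

const SEMICOLON = literal(";");
const viaAnd = SEMICOLON.and(constant("Return")); // stand-in for new Return(term)
const viaMap = SEMICOLON.map((_) => "Return");

console.log(viaAnd.run(";", 0)); // value "Return", next index 1
console.log(viaMap.run(";", 0)); // same result
```

In this sketch map is itself defined through bind and constant, so the two forms differ only in how directly they state the intent.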