What do you think about the draft?

keleshev · July 14, 2020, 6:31am

@Apos I meant the date should be on the title page

keleshev · July 14, 2020, 12:28pm

@CJlambda by the way, I wrote the prototype of the book’s compiler in OCaml. I will upload that version to the book’s repo. A few people already showed interest in it.

CJlambda · July 14, 2020, 12:35pm

You read my mind, because that was going to be my first project: converting it to OCaml!

keleshev · July 14, 2020, 12:46pm

I have uploaded the OCaml version of the compiler to the book’s repo:

@CJlambda I hope I don’t spoil it for you!

I put it into the contrib folder. More contributions are welcome: versions of the compiler in other languages, for example. I imagine this will be valuable for other readers—to see how you can adapt the book’s code to other languages.

gmarik · July 14, 2020, 6:58pm

p44: I assume some words are left there by accident

this way is very much not unlike writing code

gmarik · July 15, 2020, 5:02am

with lower precedence rules building upon higher precedence rules.

I find precedence language confusing in general, and in this case lower is actually higher than higher, since it’s “upon”, isn’t it?

patrick · July 15, 2020, 7:51pm

I’m enjoying the draft. Thanks for writing it!

I also liked your blog post on your writing process using markdown, and I look forward to seeing the sequel, if you decide to write it.

In chapter 2, perhaps you could say in a sentence what strict and loose equality are. I have no experience in Javascript, and this was the only thing I had to Google to understand.

Here are some typos I found in the July 13 draft. I agree with CJlambda’s comments on the first few dozen pages.

In Appendix B:

“First one is the GNU Assembler” should be “The first one…”

“A small Rosetta-stone—style side-by-side comparisson cheat-sheet for completeness” has a typo (“comparisson” instead of comparison), and likely “Rosetta-stone—style” should be “Rosetta Stone–style,” with an en dash instead of an em dash. I found this tricky to punctuate correctly, and I had to do some Googling to get it right, so let me explain my reasoning. First, Rosetta Stone is a proper noun (see e.g. Wikipedia), so there should be no hyphen, and “stone” should be capitalized. Second, since you use Grammarly, let me appeal to their blog post about en dashes. The “Ming Dynasty–style” example is very similar (under “Using an En Dash with Complex Compound Adjectives”).

I think “cheat sheet” is the more standard form of “cheat-sheet” (10x more common on Google Ngram).

In the 19.2 code snippet for ARMASM, probably it would be good to italicize the comments to be consistent with the rest of the book. Also, should “with” be bold? Probably not?

In Appendix A:

“In case you’re a lucky owner” is not really a typo, but it sounds a little clunky to me, and I’d just write “If you’re a lucky owner.”

“Executives” should be “executables,” right?

“Only to good old” -> “Only to a good old.”

“based on” -> “made by” (if I understand correctly?)

Both uses of “Raspberry Pi” in this appendix should likely be “a Raspberry Pi.”

“Debian-based Linux distro” should be “Debian-based Linux distros.”

“We could have used qemu package, use it to configure some emulated hardware configuration, then log into that machine, and so on” might be better as “We could have used the qemu package, emulated some hardware configuration, logged into that machine, and so on.” In particular, this removes the double configure/configuration, which is a little clunky.

“The sudo” should be just “sudo.”

“Turns out” -> “It turns out.”

Instead of “Note,” it’s probably more common to write “Note that.” But this is a subjective point.

Also, I guess all the uses of “static” in the command line snippets here should not be bold.

An extremely minor issue: in the appendix, you write, “Second, you need to install QEMU – software that allows emulating different processors, including ARM,” which uses an en dash to offset a remark. On pages 69 and 72 (for example) you use unspaced em dashes. On page 8 you use an em dash with spaces. It seems you use the unspaced em dash the most, so perhaps the other examples should be edited to conform with that usage? (I didn’t read this far but instead just used ctrl-f to search the .pdf.)

Chapter 3:
“Only once the compilation is finished the resulting program can be run.” I’m not sure whether this is wrong, but I think more idiomatic is “Only once the compilation is finished can the resulting program be run.”

“Straight-forward” -> “Straightforward”

Chapter 4, page 25: “The Call node is interesting because it has both a primitive string and an array of AST its members” should be “as its members.”

Apos · July 15, 2020, 5:31am

Visually, you can think of it this way:

product <- unary ((STAR / SLASH) unary)*
sum <- product ((PLUS / MINUS) product)*
comparison <- sum ((EQUAL / NOT_EQUAL) sum)*

Comparison has the lowest precedence, while product has the highest.

If you see this: 1 + 2 * 3 != 4 * 5 + 6

You want the tree to look like: ((1 + (2 * 3)) != ((4 * 5) + 6))

We start parsing by looking for a comparison. Comparison looks for a sum. Sum looks for a product. Product looks for a unary, which ends up finding the 1. Now product looks for a * or /. It doesn’t find any, so we go back up to sum. Sum looks for + or - and finds +. Then it looks for a product again. Product goes down, finds 2, then looks for * and finds it, then looks for the next unary, which is 3. That’s the part that creates our first inner “parentheses”: (2 * 3). Then we bubble up to sum. We don’t find any + or -, so we get our second “parentheses”: (1 + (2 * 3)). We get back to comparison, and that takes care of the left side.

Does this make more sense? You can think of precedence as a pair of parentheses that forces an operation to get evaluated first.
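
To make the walk-through concrete, here is a minimal TypeScript sketch of the same three rules as a hand-written recursive descent parser. It is not the book's parser: `tokenize`, `parseExpression`, and the simplification of `unary` to a plain number are assumptions made for illustration. It returns the fully parenthesized form of the input so the nesting is visible.

```typescript
// Hypothetical helper: splits the input into numbers and operators.
// Whitespace (and any other unrecognized character) is silently skipped.
function tokenize(source: string): string[] {
  return source.match(/\d+|==|!=|[-+*\/]/g) || [];
}

function parseExpression(source: string): string {
  const tokens = tokenize(source);
  let position = 0;

  // Parses `operand (operator operand)*` and groups the result to the left.
  function binary(operand: () => string, operators: string[]): string {
    let left = operand();
    while (position < tokens.length && operators.indexOf(tokens[position]) !== -1) {
      const operator = tokens[position++];
      const right = operand();
      left = `(${left} ${operator} ${right})`;
    }
    return left;
  }

  // unary is simplified to a number here (an assumption; the post does not define it).
  const unary = (): string => {
    const token = tokens[position++];
    if (token === undefined || !/^\d+$/.test(token)) {
      throw new Error(`Expected a number, got ${token}`);
    }
    return token;
  };

  // Each rule is built from the rule with the next-higher precedence.
  const product = (): string => binary(unary, ["*", "/"]);
  const sum = (): string => binary(product, ["+", "-"]);
  const comparison = (): string => binary(sum, ["==", "!="]);

  const result = comparison();
  if (position !== tokens.length) {
    throw new Error(`Unexpected token ${tokens[position]}`);
  }
  return result;
}

console.log(parseExpression("1 + 2 * 3 != 4 * 5 + 6"));
```

Because `product` is called from inside `sum`, and `sum` from inside `comparison`, the highest-precedence rule sits deepest in the call chain and therefore groups its operands first.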

keleshev · July 15, 2020, 7:16pm

@patrick thanks for the suggestions, keep 'em coming! I’m incorporating the small fixes into the next version of the draft.

@gmarik good point about the precedence, I will need to give some more thought to how to describe it better. @Apos you’ve provided a good description, I will probably incorporate something similar.

Apos · July 15, 2020, 7:44pm

In section 7.13, the first paragraph has the sentence:

Every time we add an instruction to our program, the offset of all the following instructions addresses.

What does that mean?

keleshev · July 15, 2020, 7:52pm

Yeah, that didn’t come out right. What I meant is to highlight that if you have code like this:

    add pc, pc, #8
    add r0, r0, #1  
    add r0, r0, #2  
    sub r0, r0, #4

And you want to jump to the sub instruction, you add 8 to pc. But if you insert one more instruction like this:

    add pc, pc, #8
    add r0, r0, #1  
    add r0, r0, #2  
    add r0, r0, #3  
    sub r0, r0, #4

And you still want to jump to sub, then you need to change the pc offset from 8 to 12.
Here, I’m trying to create motivation for labels: a way to get all offsets right, even if more instructions are inserted here and there.

I will have to come up with a better example for the book.
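
As a rough illustration of what labels buy you, here is a toy TypeScript sketch of two-pass label resolution. All names (`Line`, `branchDistances`, the `done` label) are hypothetical, and this is not how the book's compiler or a real assembler is implemented. The distances are measured from the branching instruction's own address, so they differ by a fixed amount from the constant you would write by hand in `add pc, pc, #…`: on ARM, reading pc yields the current instruction's address plus 8.

```typescript
// A toy program line: either a label definition or an instruction
// that may refer to a label as its branch target.
type Line =
  | { kind: "label"; name: string }
  | { kind: "instruction"; text: string; target?: string };

const INSTRUCTION_SIZE = 4; // every instruction in ARM (A32) state is 4 bytes

// Returns, for each branching instruction, the byte distance from
// that instruction to its target label.
function branchDistances(lines: Line[]): number[] {
  // Pass 1: record the address each label stands for.
  const labelAddress: Record<string, number> = {};
  let address = 0;
  for (const line of lines) {
    if (line.kind === "label") labelAddress[line.name] = address;
    else address += INSTRUCTION_SIZE;
  }

  // Pass 2: turn every label reference into a relative distance.
  const distances: number[] = [];
  address = 0;
  for (const line of lines) {
    if (line.kind === "label") continue;
    if (line.target !== undefined) {
      const target = labelAddress[line.target];
      if (target === undefined) throw new Error(`Unknown label ${line.target}`);
      distances.push(target - address);
    }
    address += INSTRUCTION_SIZE;
  }
  return distances;
}

const before: Line[] = [
  { kind: "instruction", text: "b done", target: "done" },
  { kind: "instruction", text: "add r0, r0, #1" },
  { kind: "instruction", text: "add r0, r0, #2" },
  { kind: "label", name: "done" },
  { kind: "instruction", text: "sub r0, r0, #4" },
];

// The same program with one more instruction inserted before the label.
const after: Line[] = [
  ...before.slice(0, 3),
  { kind: "instruction", text: "add r0, r0, #3" },
  ...before.slice(3),
];

console.log(branchDistances(before), branchDistances(after));
```

Inserting one instruction grows the distance by 4 bytes, but the source line `b done` stays the same; only the resolved number changes.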

keleshev · July 15, 2020, 7:57pm

A common way to introduce labels in literature is to say that a label stands for an address. And you use labels because you don’t know the actual addresses yet.

However, in ARM, branching is pc-relative. You don’t jump to an address, but change the value of pc. That’s why I wanted, instead, to introduce branching as a more convenient way to manipulate pc. And the results are mixed.

Apos · July 15, 2020, 8:03pm

Sounds good. It’s just a case against hardcoding absolute jump values when possible.

Apos · July 15, 2020, 8:05pm

So far I like the book, it’s easy to follow and it feels like a spiritual successor to Let’s Build a Compiler.

keleshev · July 15, 2020, 8:20pm

@Apos I’m glad you like it! Haven’t read that one—gonna check it out.

gmarik · July 16, 2020, 2:43am

    // returnStatement <- RETURN expression SEMICOLON
    let returnStatement: Parser = RETURN.and(expression).bind((term) =>
      SEMICOLON.and(constant(new Return(term))));

Am I understanding correctly that the last ‘and(constant…’ is not about matching syntax, but rather about completing the match and returning the Return AST?
Would ‘map’ make more sense there instead?

gmarik · July 16, 2020, 2:49am

p57: a stray ‘TODO’?

    // varStatement <- TODO
    // VAR ID ASSIGN expression SEMICOLON

gmarik · July 16, 2020, 4:22am

swapped: Code doesn’t match the operator

p82: did the bullet list get collapsed? Next paragraph has the same issue

The data section declared with .data assembly directive is a span of memory that you are allowed to: * read, * write, * but not execute.

Apos · July 16, 2020, 5:50am

In chapter 7.1, the assembly program sets a value to 41 then adds 1 to create 42. I didn’t fully understand the motivation for doing that. Was it to show an addition? Is there a meaning to error code 42? Or is it the answer to the Ultimate Question of Life, the Universe, and Everything?

keleshev · July 16, 2020, 6:39am

@gmarik instead of SEMICOLON.and(constant(new Return(term))) it could have been SEMICOLON.map((_) => new Return(term)). Would the latter make more sense to you?
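
To compare the two spellings side by side, here is a deliberately tiny TypeScript sketch of a combinator library. It is an assumption-laden stand-in, not the book's actual `Parser` class: the string-based `Result` type, the `token` helper, and the number-only `expression` are all made up for illustration. It defines `map` in terms of `bind` and `constant`, which suggests why the two forms behave the same.

```typescript
// A parse either fails (null) or yields a value plus the remaining input.
type Result<T> = { value: T; rest: string } | null;

class Parser<T> {
  constructor(public parse: (input: string) => Result<T>) {}

  // Run this parser, then use its value to choose the next parser.
  bind<U>(callback: (value: T) => Parser<U>): Parser<U> {
    return new Parser<U>((input) => {
      const result = this.parse(input);
      return result === null ? null : callback(result.value).parse(result.rest);
    });
  }

  // Run this parser, discard its value, then run `next`.
  and<U>(next: Parser<U>): Parser<U> {
    return this.bind(() => next);
  }

  // Run this parser and transform its value without consuming more input.
  map<U>(callback: (value: T) => U): Parser<U> {
    return this.bind((value) => constant(callback(value)));
  }
}

// Consumes nothing and always succeeds with `value`.
function constant<U>(value: U): Parser<U> {
  return new Parser<U>((input) => ({ value, rest: input }));
}

// Hypothetical helper: matches literal text, then skips trailing whitespace.
function token(text: string): Parser<string> {
  return new Parser<string>((input) =>
    input.slice(0, text.length) === text
      ? { value: text, rest: input.slice(text.length).replace(/^\s+/, "") }
      : null);
}

class Return {
  constructor(public term: string) {}
}

const RETURN = token("return");
const SEMICOLON = token(";");

// Stand-in for the real expression parser: accepts a single number.
const expression = new Parser<string>((input) => {
  const match = /^(\d+)\s*/.exec(input);
  return match === null ? null : { value: match[1], rest: input.slice(match[0].length) };
});

// The version quoted in the thread.
const withAnd = RETURN.and(expression).bind((term) =>
  SEMICOLON.and(constant(new Return(term))));

// The proposed alternative.
const withMap = RETURN.and(expression).bind((term) =>
  SEMICOLON.map(() => new Return(term)));

console.log(withAnd.parse("return 42;"), withMap.parse("return 42;"));
```

In both versions the semicolon is still matched as syntax; `constant` (or the function passed to `map`) only supplies the value that the whole parser returns.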