The main idea behind building DamnBASIC was to have a tool that would help me build games using a higher language definition and then generate code for platforms that had a MOS 6502 chip (Commodore 64, NES, Atari 800, Apple II, etc…).
The first thing I did was create a BNF document that would allow me to define the syntax structure of the language. I wasn’t looking to build a new type of semantic, I only wanted an easy to use and already known syntax. That’s why I decided to go for a BASIC-ish / C-ish type of syntax. This is how I wanted the language to look:
This looks like something simple and straight forward if you’ve ever programmed in a C based language.
Building the thing
The good thing about having a BNF document is that you can use is as a reference for each step on the building of the compiler. The first thing I looked for were terminals. In this case I had digits [0-9], alphabetic characters [a-zA-Z], multiple symbols [+|-|*|/|…] and finally keywords like “if”, “else”, “func”, etc. Once I’ve read the symbols I was able to construct tokens. Tokens are used by the parser to generate a syntax tree.
Here you can see the whole list of tokens I had.
The token was a very simple structure which only had a type, raw value and the line of the scan.
The scanning routine is very simple and it’s broken in multiple parts in a hierarchical way. For example the first thing I want to scan is white spaces and comments then it would be symbols after that keywords and finally terminals like boolean values, numbers and identifiers. The main loop looks something like this.
Lets break apart the “has_terminal” function so we can see how we can process string of characters and translate that into a usable token. The body of the function is very similar to our main loop. You’ll see this a lot through the compiler because it’s highly based on the BNF document which breaks each step into smaller parts.
Lets finish disecting “has_number”. First we want to check if the current character is a valid number. There are multiple ways we can do that. We could check if the characters ASCII value is in the range of [0-9] values or we could just compare it to all the characters in a string “0123456789”. So what we’ll do is loop until we stop getting a match or we reach the end of the file/string we are reading.
Here you can see a very nice video example explaining more about state machines and the use on language design. Basically the BNF document I wrote is a less “academic” representation of a finite state automata.