Make A Language

Part Three: Defining Variables

9 September 2020

Welcome back! This time, we’ll parse and evaluate variable definitions.

Parsing

Before we can begin writing a parser, we need to decide on a syntax. Since we like Rust, we’ll go for a similar syntax:

let a = 5

To start off, Eldiro will only include immutable variables, or bindings,1 as they’re sometimes called (at least that’s what I’ll be calling them for the rest of this series).

Although we’d usually start writing a test here, I fear that lib.rs will become too cluttered. To ameliorate this, we’ll move lib.rs to expr.rs, remove mod util; from expr.rs, and put the following in lib.rs:

pub mod expr;
mod utils;

We also need to import crate::utils in expr.rs:

use crate::utils;

Now we can start writing a test. Open lib.rs and add pub mod binding_def; to the top, and open up src/binding_def.rs:

use crate::expr::Expr;

#[derive(Debug, PartialEq)]
pub struct BindingDef {
    name: String,
    val: Expr,
}

impl BindingDef {
    pub fn new(s: &str) -> (&str, Self) {
        todo!()
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::expr::{Number, Op};

    #[test]
    fn parse_binding_def() {
        assert_eq!(
            BindingDef::new("let a = 10 / 2"),
            (
                "",
                BindingDef {
                    name: "a".to_string(),
                    val: Expr {
                        lhs: Number(10),
                        rhs: Number(2),
                        op: Op::Div,
                    },
                },
            ),
        );
    }
}

Let’s implement the BindingDef parser:

use crate::utils;

// snip

impl BindingDef {
    pub fn new(s: &str) -> (&str, Self) {
        let s = if s.starts_with("let") {
            &s[3..]
        } else {
            panic!("expected let")
        };
        let (s, _) = utils::extract_whitespace(s);

        let (s, name) = utils::extract_ident(s); // Unimplemented!
        let (s, _) = utils::extract_whitespace(s);

        let s = if s.starts_with('=') {
            &s[1..]
        } else {
            panic!("expected equals sign")
        };
        let (s, _) = utils::extract_whitespace(s);

        let (s, val) = Expr::new(s);

        (
            s,
            Self {
                name: name.to_string(),
                val,
            },
        )
    }
}

We start by checking if the input starts with let; if it does, we strip it off and continue. If it doesn’t, we panic. Next, we strip off whitespace. After this, we extract an identifier using a function we haven’t defined yet, and strip off whitespace. Next, we follow the same pattern we used to strip off the let, but instead this time with =. Finally, we strip off whitespace yet again and can parse an expression from the remaining input.

Although the panicking isn’t pretty, it’s good enough for this tutorial. The repetition, on the other hand, isn’t. Before we get too ahead of ourselves, we should implement utils::extract_ident, though:

// utils.rs

pub(crate) fn extract_ident(s: &str) -> (&str, &str) {
    take_while(|c| c.is_ascii_alphabetic(), s)
}

#[cfg(test)]
mod tests {
    // snip

    #[test]
    fn extract_alphabetic_ident() {
        assert_eq!(extract_ident("abcdEFG stop"), (" stop", "abcdEFG"));
    }
}

Easy enough! But what if we want our identifiers to contain numbers?

pub(crate) fn extract_ident(s: &str) -> (&str, &str) {
    // Notice this is now alphanumeric instead of alphabetic
    take_while(|c| c.is_ascii_alphanumeric(), s)
}

#[cfg(test)]
mod tests {
    // snip

    #[test]
    fn extract_alphanumeric_ident() {
        assert_eq!(extract_ident("foobar1()"), ("()", "foobar1"));
    }
}

Fortunately, this works. It has one problem, though: what if we’re in a situation where either a number or an identifier is permissible? For example, both of the following will work once Eldiro becomes more developed:

let a = 10       # numbers
let n = a        # identifier

A number or an identifier works in the same place.

The problem is that extract_ident allows all characters of the identifier to be numbers, thereby allowing the mistaken treatment of numbers as identifiers. It is for this reason that many programming languages (Rust and C included) don’t allow identifiers to start with a number; Eldiro should follow in their footsteps. Luckily, this is pretty easy to implement:

pub(crate) fn extract_ident(s: &str) -> (&str, &str) {
    let input_starts_with_alphabetic = s
        .chars()
        .next()
        .map(|c| c.is_ascii_alphabetic())
        .unwrap_or(false);

    if input_starts_with_alphabetic {
        take_while(|c| c.is_ascii_alphanumeric(), s)
    } else {
        (s, "")
    }
}

#[cfg(test)]
mod tests {
    // snip

    #[test]
    fn cannot_extract_ident_beginning_with_number() {
        assert_eq!(extract_ident("123abc"), ("123abc", ""));
    }
}

Note how we try to extract the identifier in the test, but the entire input is returned as leftover, meaning the parser didn’t consume anything. This matches what we wrote on the second-last line of extract_ident’s definition: if the input doesn’t start with an alphabetic character, then we know we won’t be able to parse an identifier and can just give up by not consuming anything.

Now that we have utils::extract_ident working, we can move on to the next problem: the duplication of logic in BindingDef::new. Here’s the current code of that function for a quick refresher:

impl BindingDef {
    pub fn new(s: &str) -> (&str, Self) {
        let s = if s.starts_with("let") {
            &s[3..]
        } else {
            panic!("expected let")
        };
        let (s, _) = utils::extract_whitespace(s);

        let (s, name) = utils::extract_ident(s);
        let (s, _) = utils::extract_whitespace(s);

        let s = if s.starts_with('=') {
            &s[1..]
        } else {
            panic!("expected equals sign")
        };
        let (s, _) = utils::extract_whitespace(s);

        let (s, val) = Expr::new(s);

        (
            s,
            Self {
                name: name.to_string(),
                val,
            },
        )
    }
}

For both let and = we follow the same pattern:

  1. Check if the input starts with the desired text
  2. If it does, return the input with the desired text chopped off
  3. If it doesn’t, panic

This can be abstracted away into a function, which we should definitely do since we’ll be using it a lot.2 Inspired by the Nom parsing library,3 we’ll call this function tag:

// utils.rs

pub(crate) fn tag<'a, 'b>(starting_text: &'a str, s: &'b str) -> &'b str {
    if s.starts_with(starting_text) {
        &s[starting_text.len()..]
    } else {
        panic!("expected {}", starting_text);
    }
}

#[cfg(test)]
mod tests {
    // snip

    #[test]
    fn tag_word() {
        assert_eq!(tag("let", "let a"), " a");
    }
}

Those lifetimes on tag might look scary, but all they’re doing is telling Rust that the lifetimes of s and the output are related, while the lifetimes of starting_text and the output aren’t. If we didn’t have the lifetimes, Rust wouldn’t know if the returned value has to live as long as starting_text, or s, or both.

With tag working, let’s go back to BindingDef::new and refactor it to use our new function:

impl BindingDef {
    pub fn new(s: &str) -> (&str, Self) {
        let s = utils::tag("let", s);
        let (s, _) = utils::extract_whitespace(s);

        let (s, name) = utils::extract_ident(s);
        let (s, _) = utils::extract_whitespace(s);

        let s = utils::tag("=", s);
        let (s, _) = utils::extract_whitespace(s);

        let (s, val) = Expr::new(s);

        (
            s,
            Self {
                name: name.to_string(),
                val,
            },
        )
    }
}

Now for the moment of truth:

$ cargo t
running 21 tests
test expr::tests::parse_add_op ... ok
test expr::tests::parse_div_op ... ok
test binding_def::tests::parse_binding_def ... ok
test expr::tests::parse_expr_with_whitespace ... ok
test expr::tests::parse_mul_op ... ok
test expr::tests::parse_number ... ok
test expr::tests::parse_one_plus_two ... ok
test expr::tests::parse_sub_op ... ok
test utils::tests::cannot_extract_ident_beginning_with_number ... ok
test utils::tests::do_not_extract_anything_from_empty_input ... ok
test utils::tests::extract_alphabetic_ident ... ok
test utils::tests::extract_alphanumeric_ident ... ok
test utils::tests::extract_digits_with_no_remainder ... ok
test utils::tests::extract_minus ... ok
test utils::tests::extract_multiple_digits ... ok
test utils::tests::extract_one_digit ... ok
test utils::tests::extract_plus ... ok
test utils::tests::extract_slash ... ok
test utils::tests::extract_spaces ... ok
test utils::tests::tag_word ... ok
test utils::tests::extract_star ... ok

test result: ok. 21 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Huzzah! That output is getting a little long, though, so let’s rerun with --quiet, or -q for short:

$ cargo t -q
running 21 tests
.....................
test result: ok. 21 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Much better. Now that we can parse binding definitions, we need to evaluate them.

Evaluation

Up to now, all we’ve done is parsing. Now it’s time for something different: evaluation. Since Eldiro doesn’t at the moment have a way to access the value of bindings (we haven’t parsed binding usages yet), and can only define new bindings, we need to worry solely about the problem of storing bindings. Of course, we will eventually add a way to get the value of a binding, but we’ll worry about that when we get to it.

Imagine that Eldiro had the ability to access the value of bindings; how would binding storage work? If we have a binding definition let a = 2 + 2, and later use it by typing a, how can we figure out what the value of the usage is? When we’re evaluating the binding usage, we know that a binding is in scope with the name a and a value of 4. We also know that we’re trying to use a binding with the name a. To me, this sounds like a job for a HashMap (also known as a ‘dictionary’ or ‘map’ in other languages). A HashMap allows you to store values, associating them with a key. Given that same key, you can later extract the value associated with that key. In this case, the keys are binding names (Strings), and the values are … well, we’ll see.

Traditionally, all the context needed to evaluate code is called an environment. Let’s define one. Add the module declaration to lib.rs:

pub mod binding_def;
pub mod expr;

mod env;
mod utils;

And create and open src/env.rs:

use std::collections::HashMap;

#[derive(Debug, PartialEq, Default)]
pub(crate) struct Env {
    bindings: HashMap<String, ???>,
}

What do we put where the question marks are? Well, that’s the type of the value of a binding. At the moment Eldiro only supports expressions with numbers, but we’ll add support for all kinds of data types later. To make these future enhancements easier, we can declare Eldiro values as an enum with variants for each data type. We’ll need another module for this:

// lib.rs
pub mod binding_def;
pub mod expr;
pub mod val;

mod env;
mod utils;

And in src/val.rs:

#[derive(Debug, PartialEq)]
pub enum Val {
    Number(i32),
}

Let’s head back over to env.rs so we can use Val as the value type of our bindings HashMap:

use crate::val::Val;
use std::collections::HashMap;

#[derive(Debug, PartialEq, Default)]
pub(crate) struct Env {
    bindings: HashMap<String, Val>,
}

All that’s left is a way for code outside of the env module to insert bindings:

impl Env {
    pub(crate) fn store_binding(&mut self, name: String, val: Val) {
        self.bindings.insert(name, val);
    }
}

Well, that was easy! Let’s go back to binding_def.rs to add an eval method to BindingDef that calls Env::store_binding for us:

use crate::env::Env;

// snip

impl BindingDef {
    // snip

    pub(crate) fn eval(&self, env: &mut Env) {
        env.store_binding(self.name.clone(), self.val.eval());
    }
}

This stores in the environment a binding with the name of the BindingDef we’re evaluating (the bar in let bar = 2 * 10). The value of the new binding we’re storing is the value of the BindingDef we’re evaluating (the 2 * 10 in let bar = 2 * 10). There’s an issue with this code, though: it calls a non-existent method eval on self.val (which is of type Expr); we should implement it:

// expr.rs
use crate::val::Val

// snip

impl Expr {
    // snip

    pub(crate) fn eval(&self) -> Val {
        let Number(lhs) = self.lhs;
        let Number(rhs) = self.rhs;

        let result = match self.op {
            Op::Add => lhs + rhs,
            Op::Sub => lhs - rhs,
            Op::Mul => lhs * rhs,
            Op::Div => lhs / rhs,
        };

        Val::Number(result)
    }
}

This method first extracts the numeric values of the left-hand side and right-hand sides of the expression, and then, depending on the operator, completes the needed calculation. To make sure Expr::eval is working, we should add some tests:

#[cfg(test)]
mod tests {
    // snip

    #[test]
    fn eval_add() {
        assert_eq!(
            Expr {
                lhs: Number(10),
                rhs: Number(10),
                op: Op::Add,
            }
            .eval(),
            Val::Number(20),
        );
    }

    #[test]
    fn eval_sub() {
        assert_eq!(
            Expr {
                lhs: Number(1),
                rhs: Number(5),
                op: Op::Sub,
            }
            .eval(),
            Val::Number(-4),
        );
    }

    #[test]
    fn eval_mul() {
        assert_eq!(
            Expr {
                lhs: Number(5),
                rhs: Number(6),
                op: Op::Mul,
            }
            .eval(),
            Val::Number(30),
        );
    }

    #[test]
    fn eval_div() {
        assert_eq!(
            Expr {
                lhs: Number(200),
                rhs: Number(20),
                op: Op::Div,
            }
            .eval(),
            Val::Number(10),
        );
    }
}

Let’s see if it works:

$ cargo t -q
// lots of warnings
running 25 tests
.........................
test result: ok. 25 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Great!

Next time around, we’ll allow Eldiro’s Exprs to be just numbers by themselves, without requiring that everything is a mathematical operation (so let foo = 100 will work). This might sound trivial, but it actually requires some significant changes across the project.


  1. A variable that you can’t vary doesn’t make much sense. ↩︎

  2. Anywhere where you know you’ll see a given string, the function will be used. Examples include all keywords, parentheses, and some operators. ↩︎

  3. In case you haven’t used Nom before, all the functions in util are loosely modelled off Nom. I decided against directly using Nom because I think it’s easier to understand when you write all the code yourself. In some cases this is impractical, but for something this simple I think the tradeoff is worth it. ↩︎