Make A Language

Part Seven: A REPL

7 October 2020

In case you aren’t familiar with the concept, a read-eval-print-loop, or REPL, is a program that lets you interactively use a programming language. Here’s a hypothetical session with an Eldiro REPL:

$ eldiro
→ 5
5
→ 10 - 7
3
→ let one = 1
→ one
1

To implement such a thing, the REPL needs to have access to Stmt::new, Val and Env (to store the evaluation environment between inputs). We should isolate the REPL from as many internal implementation details as possible – as such, it would be nice if the REPL cannot see that Eldiro has Stmts. Additionally, the REPL should not have to worry about the idea of ‘unconsumed input’ that every parser in Eldiro has.

Rather, there should simply be a top-level parse function that calls the correct parser (in our case Stmt::new), while also returning an error if the input hasn’t been consumed fully:

// lib.rs

pub fn parse(s: &str) -> Result<?, String> {
    let (s, stmt) = stmt::Stmt::new(s)?;

    if s.is_empty() {
        Ok(stmt)
    } else {
        Err("input was not consumed fully by parser".to_string())
    }
}

What should that ? be? Again, we don’t want the REPL to know about Stmt. A solution to this is to create a Parse struct that contains a Stmt, without exposing Stmt itself:

pub struct Parse(stmt::Stmt);

pub fn parse(s: &str) -> Result<Parse, String> {
    let (s, stmt) = stmt::Stmt::new(s)?;

    if s.is_empty() {
        Ok(Parse(stmt))
    } else {
        Err("input was not consumed fully by parser".to_string())
    }
}

We have a bunch of pubs scattered throughout our codebase. Now that parsing is exposed properly, we can prune the pubs down to those which are actually needed. Search and replace "pub " with "" and "pub(crate) " with "" across the whole project to get rid of them. We now need to add all the necessary pub(crate) and pubs back again, following the compiler’s instructions. I’ll go through fixing the first error in detail, but I’ll leave you to do the rest yourself.

$ cargo c
error[E0603]: struct `Env` is private
 --> src/binding_def.rs:1:17
  |
1 | use crate::env::Env;
  |                 ^^^ private struct
  |
note: the struct `Env` is defined here
 --> src/env.rs:5:1
  |
5 | struct Env<'parent> {
  | ^^^^^^^^^^^^^^^^^^^

lots more errors ...

Let’s open up env.rs and make Env pub(crate):1

#[derive(Debug, PartialEq, Default)]
pub(crate) struct Env<'parent> {
    bindings: HashMap<String, Val>,
    parent: Option<&'parent Self>,
}

This change is very large, so feel free to take a look at the diff on GitHub.

Now that the REPL only has access to our parse function and the Parse struct, it needs a way to evaluate parsed Eldiro code. A nice, intuitive interface for this is an eval method on Parse:

// lib.rs

impl Parse {
    pub fn eval(&self, env: &mut Env) -> Result<Val, String> {
        self.0.eval(env)
    }
}

For this to compile, both Env and Val need to be pub:

// env.rs

#[derive(Debug, PartialEq, Default)]
pub struct Env<'parent> {
    bindings: HashMap<String, Val>,
    parent: Option<&'parent Self>,
}
// val.rs

#[derive(Debug, Clone, PartialEq)]
pub enum Val {
    Number(i32),
    Unit,
}

We should re-export Env and Val from lib.rs to make them easy to access. Here’s lib.rs in its entirety:

mod binding_def;
mod env;
mod expr;
mod stmt;
mod utils;
mod val;

pub use env::Env;
pub use val::Val;

pub struct Parse(stmt::Stmt);

impl Parse {
    pub fn eval(&self, env: &mut Env) -> Result<Val, String> {
        self.0.eval(env)
    }
}

pub fn parse(s: &str) -> Result<Parse, String> {
    let (s, stmt) = stmt::Stmt::new(s)?;

    if s.is_empty() {
        Ok(Parse(stmt))
    } else {
        Err("input was not consumed fully by parser".to_string())
    }
}

Now that the minimum and only the minimum interface for a REPL is public, we can start implementing the REPL itself.

As Eldiro becomes more developed, we may want to add fancy features like autocomplete and syntax highlighting to the REPL. Although we could write the code necessary to make use of terminal emulators’ advanced features ourselves, a more reasonable approach is to use libraries that already exist. Say that we create the REPL as part of the Eldiro crate (we would put the code in src/bin/eldiro.rs) and used a terminal colours library. If we ever write, say, a language server for Eldiro that depends on the eldiro crate, that terminal colours library would be pulled in, regardless of whether we actually want it or not. Because of this, it makes more sense to put the REPL into a separate crate. We could put the REPL into its own repository too, but that seems like overkill given the size of the project.

To keep the eldiro crate and the REPL in the same folder, while keeping them separate crates, we need to use Cargo’s Workspaces. Let’s move all of the current eldiro crate into its own folder, first:

$ mkdir -p crates/eldiro
$ git mv {Cargo.toml,src} crates/eldiro/ # Or use ‘mv’ if you aren’t using Git.

We need to instruct Cargo that all the crates in our workspace will reside inside the crates/ directory. To do this, let’s create a Cargo.toml at the root of the repository and fill it with the following:

[workspace]
members = ["crates/*"]

Let’s create a crate for the REPL, now:

$ cargo new crates/eldiro-repl

Hey, hey, wait a second! Later, we’ll want Eldiro to have an interpreter. Wouldn’t it be nice if the binary used to run the REPL functioned also as an interpreter? Something like this would be cool:

$ eldiro
→ 1000
1000
→ ^D
$ cat code.eldiro
A bunch of Eldiro code ...
$ eldiro < code.eldiro # Or alternatively ‘cat code.eldiro | eldiro’
Output of the code

Since the eldiro CLI tool will contain both an interpreter and a REPL, it makes more sense to call the crate eldiro-cli:

$ rm -r crates/eldiro-repl
$ cargo new crates/eldiro-cli

Open up crates/eldiro-cli/Cargo.toml, and add the eldiro library as a dependency:

[dependencies]
eldiro = {path = "../eldiro"}

Let’s start by creating a loop of reading user input:

// crates/eldiro-cli/src/main.rs

use std::io;

fn main() -> io::Result<()> {
    loop {
        let mut input = String::new();
        io::stdin().read_line(&mut input)?;

        dbg!(input);
    }
}

If you run it, you’ll get something like this:

$ cargo r -q
hello
[crates/eldiro-cli/src/main.rs:8] input = "hello\n"
goodbye
[crates/eldiro-cli/src/main.rs:8] input = "goodbye\n"
^C
$

Let’s add a prompt to make the user aware that the REPL is waiting for input:

use std::io;

fn main() -> io::Result<()> {
    loop {
        print!("→ ");

        let mut input = String::new();
        io::stdin().read_line(&mut input)?;

        dbg!(input);
    }
}
$ cargo r -q
test
[crates/eldiro-cli/src/main.rs:10] input = "test\n"
^C
$

Why didn’t our prompt appear? The reason is that STDOUT and STDIN are line-buffered, meaning that output (or input in the case of STDIN) only appears once a whole line has been sent. We can manually flush STDOUT to fix this:

use std::io;

fn main() -> io::Result<()> {
    loop {
        print!("→ ");
        io::stdout().flush()?;

        let mut input = String::new();
        io::stdin().read_line(&mut input)?;

        dbg!(input);
    }
}
$ cargo r -q
→ foo
[crates/eldiro-cli/src/main.rs:11] input = "foo\n"
→ bar
[crates/eldiro-cli/src/main.rs:11] input = "bar\n"
→ baz
[crates/eldiro-cli/src/main.rs:11] input = "baz\n"
→ ^C
$

Cool! There is one functional problem with this code, though: if printing the prompt fails, the program will panic. In this case, panicking isn’t the right course of action in my opinion – if writing to STDOUT fails, we should instead handle the error and exit cleanly:

use std::io::{self, Write};

fn main() -> io::Result<()> {
    loop {
        write!(io::stdout(), "→ ")?;
        io::stdout().flush()?;

        let mut input = String::new();
        io::stdin().read_line(&mut input)?;

        dbg!(input);
    }
}

We’re getting a handle to STDOUT and STDIN every time we go around the loop. Instead, we could do this once before the loop starts, which is probably faster:2

use std::io::{self, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();

    loop {
        write!(stdout, "→ ")?;
        stdout.flush()?;

        let mut input = String::new();
        stdin.read_line(&mut input)?;

        dbg!(input);
    }
}

Finally, we’re allocating a new String on the heap every time a line of input is read from the user. Rather than doing that, we can instead create the String once and read into it each time the loop body runs:

use std::io::{self, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();

    let mut input = String::new();

    loop {
        write!(stdout, "→ ")?;
        stdout.flush()?;

        stdin.read_line(&mut input)?;
        dbg!(&input);
    }
}
$ cargo r -q
→ test
[crates/eldiro-cli/src/main.rs:14] &input = "test\n"
→ again
[crates/eldiro-cli/src/main.rs:14] &input = "test\nagain\n"
→ wait, what?
[crates/eldiro-cli/src/main.rs:14] &input = "test\nagain\nwait, what?\n"
→ ^C
$

The problem here is that .read_line is appending to input, rather than overwriting what’s there. We need to manually call .clear() on input so that it is emptied before we read input into it again:

use std::io::{self, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();

    let mut input = String::new();

    loop {
        write!(stdout, "→ ")?;
        stdout.flush()?;

        stdin.read_line(&mut input)?;
        dbg!(&input);

        input.clear();
    }
}
$ cargo r -q
→ foo
[crates/eldiro-cli/src/main.rs:14] &input = "foo\n"
→ bar
[crates/eldiro-cli/src/main.rs:14] &input = "bar\n"
→ baz
[crates/eldiro-cli/src/main.rs:14] &input = "baz\n"
→ ^C
$

This way we reuse the previous allocation instead of repeatedly creating new Strings.

To be honest, this is all premature optimisation, and makes little to no difference in practice. However, the code now is less repetitive and … well … I guess it’s fun to optimise things :)

Parsing input

Let’s parse the input the user has provided us with, rather than debug-printing it:

use std::io::{self, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();
    let mut stderr = io::stderr();

    let mut input = String::new();

    loop {
        write!(stdout, "→ ")?;
        stdout.flush()?;

        stdin.read_line(&mut input)?;

        match eldiro::parse(&input) {
            Ok(parse) => {
                dbg!(parse);
            }
            Err(msg) => {
                writeln!(stderr, "Parse error: {}", msg)?;
                stderr.flush()?;
            }
        }

        input.clear();
    }
}
$ cargo r -q
error[E0277]: `eldiro::Parse` doesn't implement `std::fmt::Debug`
  --> crates/eldiro-cli/src/main.rs:18:17
   |
18 |                 dbg!(parse);
   |                 ^^^^^^^^^^^^ `eldiro::Parse` cannot be formatted using `{:?}` because it doesn't implement `std::fmt::Debug`
   |
   = help: the trait `std::fmt::Debug` is not implemented for `eldiro::Parse`
   = note: required because of the requirements on the impl of `std::fmt::Debug` for `&eldiro::Parse`
   = note: required by `std::fmt::Debug::fmt`
   = note: this error originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)

Ah, eldiro::Parse doesn’t implement Debug. Let’s derive that now:

// crates/eldiro/src/lib.rs

#[derive(Debug)]
pub struct Parse(stmt::Stmt);

And try running again:

$ cargo r -q
→ 1 + 1
Parse error: input was not consumed fully by parser
→ 9999
Parse error: input was not consumed fully by parser
→ ^C
$

Huh? Let’s add a debug print to see what wasn’t consumed by the parser:

// crates/eldiro/src/lib.rs

pub fn parse(s: &str) -> Result<Parse, String> {
    let (s, stmt) = stmt::Stmt::new(s)?;

    if s.is_empty() {
        Ok(Parse(stmt))
    } else {
        dbg!(s); // here
        Err("input was not consumed fully by parser".to_string())
    }
}
$ cargo r -q
→ 1 + 1
[crates/eldiro/src/lib.rs:26] s = "\n"
Parse error: input was not consumed fully by parser
→ ^C
$

Ah, of course! .read_line leaves in the final newline from us hitting the return key, so we should strip that off before we parse the input:

use std::io::{self, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();
    let mut stderr = io::stderr();

    let mut input = String::new();

    loop {
        write!(stdout, "→ ")?;
        stdout.flush()?;

        stdin.read_line(&mut input)?;

        //                   Here ↓
        match eldiro::parse(input.trim()) {
            Ok(parse) => {
                dbg!(parse);
            }
            Err(msg) => {
                writeln!(stderr, "Parse error: {}", msg)?;
                stderr.flush()?;
            }
        }

        input.clear();
    }
}
$ cargo r -q
→ 1 + 1
[crates/eldiro-cli/src/main.rs:18] parse = Parse(
    Expr(
        Operation {
            lhs: Number(
                1,
            ),
            rhs: Number(
                1,
            ),
            op: Add,
        },
    ),
)
→ 999
[crates/eldiro-cli/src/main.rs:18] parse = Parse(
    Expr(
        Number(
            Number(
                999,
            ),
        ),
    ),
)
→ abc
[crates/eldiro-cli/src/main.rs:18] parse = Parse(
    Expr(
        BindingUsage(
            BindingUsage {
                name: "abc",
            },
        ),
    ),
)

Fantastic! It always feels so good to have different parts of a project come together.

Don’t forget to get rid of the debug-print we added to parse, by the way.

Evaluating the Parse

Now that we’ve parsed the input handed to us by the user, we can evaluate it:

use std::io::{self, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();
    let mut stderr = io::stderr();

    let mut input = String::new();
    let mut env = eldiro::Env::default();

    loop {
        write!(stdout, "→ ")?;
        stdout.flush()?;

        stdin.read_line(&mut input)?;

        match eldiro::parse(input.trim()) {
            Ok(parse) => match parse.eval(&mut env) {
                Ok(val) => {
                    dbg!(val);
                }
                Err(msg) => {
                    writeln!(stderr, "Evaluation error: {}", msg)?;
                    stderr.flush()?;
                }
            },
            Err(msg) => {
                writeln!(stderr, "Parse error: {}", msg)?;
                stderr.flush()?;
            }
        }

        input.clear();
    }
}
$ cargo r -q
→ abc
Evaluation error: binding with name ‘abc’ does not exist
→ let abc = 10
[crates/eldiro-cli/src/main.rs:20] val = Unit
→ abc
[crates/eldiro-cli/src/main.rs:20] val = Number(
    10,
)
→ 10 - 5
[crates/eldiro-cli/src/main.rs:20] val = Number(
    5,
)
→ ^C
$

Isn’t that cool?!

The code could use some cleaning, though. In particular, the nested repetition of writing errors to STDERR is a problem solved by the ? operator:

use std::io::{self, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();
    let mut stderr = io::stderr();

    let mut input = String::new();
    let mut env = eldiro::Env::default();

    loop {
        write!(stdout, "→ ")?;
        stdout.flush()?;

        stdin.read_line(&mut input)?;

        match run(input.trim(), &mut env) {
            Ok(()) => {}
            Err(msg) => {
                writeln!(stderr, "{}", msg)?;
                stderr.flush()?;
            }
        }

        input.clear();
    }
}

fn run(input: &str, env: &mut eldiro::Env) -> Result<(), String> {
    let parse = eldiro::parse(input).map_err(|msg| format!("Parse error: {}", msg))?;

    let evaluated = parse
        .eval(env)
        .map_err(|msg| format!("Evaluation error: {}", msg))?;

    dbg!(evaluated);

    Ok(())
}

That, in my opinion, is much easier to read.

Quality of life improvements

Currently, evaluating a binding definition returns a Unit. This is perfectly reasonable. However, it is a little distracting, in my opinion, when we are constantly reminded of that fact by the REPL:

$ cargo r -q
→ let a = 10 * 5
[crates/eldiro-cli/src/main.rs:36] evaluated = Unit
→ let b = a
[crates/eldiro-cli/src/main.rs:36] evaluated = Unit
→ let c = b
[crates/eldiro-cli/src/main.rs:36] evaluated = Unit
→ c
[crates/eldiro-cli/src/main.rs:36] evaluated = Number(
    50,
)
→ ^C
$

Let’s add a check to only show the evaluated value if it isn’t a Unit:

fn run(input: &str, env: &mut eldiro::Env) -> Result<(), String> {
    let parse = eldiro::parse(input).map_err(|msg| format!("Parse error: {}", msg))?;

    let evaluated = parse
        .eval(env)
        .map_err(|msg| format!("Evaluation error: {}", msg))?;

    if evaluated != eldiro::Val::Unit {
        dbg!(evaluated);
    }

    Ok(())
}
$ cargo r -q
→ let a = 10 * 5
→ let b = a
→ let c = b
→ c
[crates/eldiro-cli/src/main.rs:37] evaluated = Number(
    50,
)
→ ^C
$

Nice, that’s less noisy. We should also define a proper format for displaying Vals so we can customise how they appear:

// crates/eldiro/src/val.rs

use std::fmt;

// snip

impl fmt::Display for Val {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Number(n) => write!(f, "{}", n),
            Self::Unit => write!(f, "Unit"),
        }
    }
}

Let’s make use of this in the REPL:

use std::io::{self, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();
    let mut stderr = io::stderr();

    let mut input = String::new();
    let mut env = eldiro::Env::default();

    loop {
        write!(stdout, "→ ")?;
        stdout.flush()?;

        stdin.read_line(&mut input)?;

        match run(input.trim(), &mut env) {
            Ok(Some(val)) => {
                writeln!(stdout, "{}", val)?;
            }
            Ok(None) => {}
            Err(msg) => {
                writeln!(stderr, "{}", msg)?;
                stderr.flush()?;
            }
        }

        input.clear();
    }
}

fn run(input: &str, env: &mut eldiro::Env) -> Result<Option<eldiro::Val>, String> {
    let parse = eldiro::parse(input).map_err(|msg| format!("Parse error: {}", msg))?;

    let evaluated = parse
        .eval(env)
        .map_err(|msg| format!("Evaluation error: {}", msg))?;

    if evaluated == eldiro::Val::Unit {
        Ok(None)
    } else {
        Ok(Some(evaluated))
    }
}

Let’s see if it works:

$ cargo r -q
→ let calculation = 10 * 10
→ calculation
100
→ ^C
$

Nice, now we don’t have all that extraneous text cluttering the output.

To finish up, let’s remove those unneeded3 braces from the match in main:

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();
    let mut stderr = io::stderr();

    let mut input = String::new();
    let mut env = eldiro::Env::default();

    loop {
        write!(stdout, "→ ")?;
        stdout.flush()?;

        stdin.read_line(&mut input)?;

        match run(input.trim(), &mut env) {
            Ok(Some(val)) => writeln!(stdout, "{}", val)?,
            Ok(None) => {}
            Err(msg) => writeln!(stderr, "{}", msg)?,
        }

        input.clear();
    }
}

In Part Eight we’ll parse and evaluate function definitions.


  1. I know I said earlier that Env would need to be public, but we can do that later. ↩︎

  2. Looking at the code of std::io::stdout(), I can see the construction of an Arc and a RefCell (among other things), the cost of which is avoided (apart from the initial call before the loop). I haven’t run any benchmarks though, and the difference in performance is likely negligible. ↩︎

  3. The .flush() on stderr was unneeded, so it was removed here too. When I realised that, I almost went back to the previous code samples and changed it, but then I noticed that the debug line numbers would be mismatched. I decided to just leave it and change it at the end :) ↩︎