Part Twenty: Testing

I was disciplined with my TDD throughout the early parts of this series – I think this was helpful. Unfortunately, testing has fallen by the wayside in some recent parts; especially the last one. I would like to get back into the habit of TDD, and so I should add tests to areas of the codebase that are missing them.

A weird compilation failure

First off, a small erratum: it turns out that Eldiro fails to compile when the entire workspace’s tests are run:

$ cargo t
# snip
error[E0659]: `parser` is ambiguous (name vs any other name during import resolution)
 --> /home/me/src/eldiro/crates/parser/src/lib.rs:8:5
  |
8 | use parser::{ParseError, Parser};
  |     ^^^^^^ ambiguous name
  |
  = note: `parser` could refer to a crate passed with `--extern`
  = help: use `::parser` to refer to this crate unambiguously
note: `parser` could also refer to the module defined here
 --> /home/me/src/eldiro/crates/parser/src/lib.rs:3:1
  |
3 | mod parser;
  | ^^^^^^^^^^^
  = help: use `crate::parser` to refer to this module unambiguously

error: aborting due to previous error

This is because we have both a module and a crate with the same name, so the import is ambiguous. I’m unsure why this error doesn’t appear with cargo build, but we should fix it anyway:

// crates/parser/src/lib.rs

mod event;
mod grammar;
mod parser;
mod sink;
mod source;

use crate::parser::{ParseError, Parser};
use lexer::Lexer;
use rowan::GreenNode;
use sink::Sink;
use source::Source;
use syntax::SyntaxNode;

Updates to Rowan

Due to a flurry of activity focused on performance, Rowan has had two breaking changes recently. Let’s update both of our crates that depend on Rowan to point to the new version:

# crates/syntax/Cargo.toml

[dependencies]
lexer = {path = "../lexer"}
num-derive = "0.3.3"
num-traits = "0.2.14"
rowan = "0.12.1"
# crates/parser/Cargo.toml

[dependencies]
drop_bomb = "0.1.5"
lexer = {path = "../lexer"}
rowan = "0.12.1"
syntax = {path = "../syntax"}
text-size = "1.0.0"

Thanks to Rust’s static type system and helpful error messages, updating dependencies is a breeze. The only error this change has resulted in is in VariableRef::name. Rowan switched from returning &SmolStr from SyntaxToken::text to returning a simple &str. Let’s update VariableRef::name to convert the &str to a SmolStr:

// crates/ast/src/lib.rs

impl VariableRef {
    pub fn name(&self) -> SmolStr {
        self.0.first_token().unwrap().text().into()
    }
}

All the other AST methods return SyntaxNodes, SyntaxTokens or data types that contain them; let’s return the VariableRef’s first child token directly for consistency:

impl VariableRef {
    pub fn name(&self) -> Option<SyntaxToken> {
        self.0.first_token()
    }
}

This allows us to remove ast’s smol_str dependency:

use syntax::{SyntaxElement, SyntaxKind, SyntaxNode, SyntaxToken};
# Cargo.toml

[dependencies]
syntax = {path = "../syntax"}

Next, we have a usage of VariableRef::name that we have to update:

// crates/hir/src/database.rs

impl Database {
    // snip

    pub(crate) fn lower_expr(&mut self, ast: Option<ast::Expr>) -> Expr {
        if let Some(ast) = ast {
            match ast {
                // snip
                ast::Expr::VariableRef(ast) => Expr::VariableRef {
                    var: ast.name().unwrap().text().into(),
                },
            }
        } else {
            // snip
        }
    }

    // snip
}

That match arm is longer than one line; let’s extract it to a method for consistency:

impl Database {
    // snip

    pub(crate) fn lower_expr(&mut self, ast: Option<ast::Expr>) -> Expr {
        if let Some(ast) = ast {
            match ast {
                // snip
                ast::Expr::VariableRef(ast) => self.lower_variable_ref(ast),
            }
        } else {
            // snip
        }
    }

    // snip

    fn lower_variable_ref(&mut self, ast: ast::VariableRef) -> Expr {
        Expr::VariableRef {
            var: ast.name().unwrap().text().into(),
        }
    }
}

Updating Rowan has given us one more error related to SmolStr vs &str. It’s a trivial fix – all we have to do is change a .clone() to an .into():

impl Database {
    pub(crate) fn lower_stmt(&mut self, ast: ast::Stmt) -> Option<Stmt> {
        let result = match ast {
            ast::Stmt::VariableDef(ast) => Stmt::VariableDef {
                name: ast.name()?.text().into(),
                value: self.lower_expr(ast.value()),
            },
            // snip
        };

        Some(result)
    }

    // snip
}

Running cargo clippy reveals that there is an occurrence where we were converting a &str to a SmolStr using .into() – now that the type has changed to &str, this .into() call does not have an effect. Let’s remove it:

// crates/parser/src/sink.rs

impl<'t, 'input> Sink<'t, 'input> {
    // snip

    fn token(&mut self) {
        let Token { kind, text, .. } = self.tokens[self.cursor];

        self.builder
            .token(EldiroLanguage::kind_to_raw(kind.into()), text);

        self.cursor += 1;
    }
}

Tests for lowering

Let’s write a test for lowering a variable definition:

// crates/hir/src/database.rs

#[cfg(test)]
mod tests {
    use super::*;

    fn parse(input: &str) -> ast::Root {
        ast::Root::cast(parser::parse(input).syntax()).unwrap()
    }

    #[test]
    fn lower_variable_def() {
        let root = parse("let foo = bar");
        let ast = root.stmts().next().unwrap();
        let hir = Database::default().lower_stmt(ast).unwrap();

        assert_eq!(
            hir,
            Stmt::VariableDef {
                name: "foo".into(),
                value: Expr::VariableRef { var: "bar".into() },
            },
        );
    }
}

Note how we need a SyntaxNode to create an ast::Root, from which we then extract an ast::Stmt for lowering. We can only create this SyntaxNode by parsing source code, so we have to depend on parser:

# Cargo.toml

[dependencies]
ast = {path = "../ast"}
la-arena = "0.2.0"
smol_str = "0.1.17"
syntax = {path = "../syntax"}

[dev-dependencies]
parser = {path = "../parser"}

We have an error about Stmt not implementing PartialEq, which we can derive for it and other HIR types:

// lib.rs

#[derive(Debug, PartialEq)]
pub enum Stmt {
    // snip
}

#[derive(Debug, PartialEq)]
pub enum Expr {
    // snip
}

#[derive(Debug, PartialEq)]
pub enum BinaryOp {
    // snip
}

#[derive(Debug, PartialEq)]
pub enum UnaryOp {
    // snip
}
$ cargo t -q --lib

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out


running 1 test
.
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out


running 18 tests
..................
test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out


running 26 tests
..........................
test result: ok. 26 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out


running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Writing tests for each of the other HIR elements will be easier if we write some helper functions:

// database.rs

#[derive(Debug, PartialEq, Default)]
pub struct Database {
    // snip
}

#[cfg(test)]
mod tests {
    use super::*;

    fn parse(input: &str) -> ast::Root {
        ast::Root::cast(parser::parse(input).syntax()).unwrap()
    }

    fn check_stmt(input: &str, expected_hir: Stmt) {
        let root = parse(input);
        let ast = root.stmts().next().unwrap();
        let hir = Database::default().lower_stmt(ast).unwrap();

        assert_eq!(hir, expected_hir);
    }

    fn check_expr(input: &str, expected_hir: Expr, expected_database: Database) {
        let root = parse(input);
        let first_stmt = root.stmts().next().unwrap();
        let ast = match first_stmt {
            ast::Stmt::Expr(ast) => ast,
            _ => unreachable!(),
        };
        let mut database = Database::default();
        let hir = database.lower_expr(Some(ast));

        assert_eq!(hir, expected_hir);
        assert_eq!(database, expected_database);
    }

    #[test]
    fn lower_variable_def() {
        check_stmt(
            "let foo = bar",
            Stmt::VariableDef {
                name: "foo".into(),
                value: Expr::VariableRef { var: "bar".into() },
            },
        );
    }
}
$ cargo t -q -p hir --lib

running 1 test
.
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Let’s write tests for each of the remaining HIR elements:

#[cfg(test)]
mod tests {
    // snip

    #[test]
    fn lower_expr_stmt() {
        check_stmt("123", Stmt::Expr(Expr::Literal { n: 123 }));
    }

    #[test]
    fn lower_binary_expr() {
        let mut exprs = Arena::new();
        let lhs = exprs.alloc(Expr::Literal { n: 1 });
        let rhs = exprs.alloc(Expr::Literal { n: 2 });

        check_expr(
            "1 + 2",
            Expr::Binary {
                lhs,
                rhs,
                op: BinaryOp::Add,
            },
            Database { exprs },
        );
    }

    #[test]
    fn lower_literal() {
        check_expr("999", Expr::Literal { n: 999 }, Database::default());
    }

    #[test]
    fn lower_paren_expr() {
        check_expr(
            "((((((abc))))))",
            Expr::VariableRef { var: "abc".into() },
            Database::default(),
        );
    }

    #[test]
    fn lower_unary_expr() {
        let mut exprs = Arena::new();
        let ten = exprs.alloc(Expr::Literal { n: 10 });

        check_expr(
            "-10",
            Expr::Unary {
                expr: ten,
                op: UnaryOp::Neg,
            },
            Database { exprs },
        );
    }

    #[test]
    fn lower_variable_ref() {
        check_expr(
            "foo",
            Expr::VariableRef { var: "foo".into() },
            Database::default(),
        );
    }
}
$ cargo t -q -p hir --lib

running 7 tests
.......
test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

So far our tests have only included the ‘happy path’, or cases where everything has gone as it should. Let’s add some test cases where the input is malformed, to make sure lowering doesn’t break when the user is typing:

#[cfg(test)]
mod tests {
    // snip

    #[test]
    fn lower_variable_def_without_name() {
        let root = parse("let = 10");
        let ast = root.stmts().next().unwrap();
        assert!(Database::default().lower_stmt(ast).is_none());
    }

    #[test]
    fn lower_variable_def_without_value() {
        check_stmt(
            "let a =",
            Stmt::VariableDef {
                name: "a".into(),
                value: Expr::Missing,
            },
        );
    }

    // snip

    #[test]
    fn lower_binary_expr_without_rhs() {
        let mut exprs = Arena::new();
        let lhs = exprs.alloc(Expr::Literal { n: 10 });
        let rhs = exprs.alloc(Expr::Missing);

        check_expr(
            "10 -",
            Expr::Binary {
                lhs,
                rhs,
                op: BinaryOp::Sub,
            },
            Database { exprs },
        );
    }

    // snip

    #[test]
    fn lower_unary_expr_without_expr() {
        let mut exprs = Arena::new();
        let expr = exprs.alloc(Expr::Missing);

        check_expr(
            "-",
            Expr::Unary {
                expr,
                op: UnaryOp::Neg,
            },
            Database { exprs },
        );
    }

    // snip
}
$ cargo t -q -p hir --lib

running 11 tests
...........
test result: ok. 11 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Testing the parser

The parser’s current test suite is quite extensive; however, we might have missed some problematic inputs. The same goes for lowering, too.

A fuzzer is a program that generates random inputs to another program with the purpose of findings bugs in it. This is perfect for testing our parsing and lowering code. Let’s install cargo-fuzz, a tool that makes it easy to run fuzzers on Rust code. Unfortunately, the fuzzing engine we’ll be using, libFuzzer, only supports x86-64 Linux and x86-64 macOS.

$ cargo install cargo-fuzz

Now that cargo-fuzz is installed, we can create a crate to house our fuzzer. We don’t want the fuzzer to be part of the workspace, so we’ll create our fuzzing crate outside of crates/:

$ cargo new --bin fuzz
warning: compiling this new crate may not work due to invalid workspace configuration

current package believes it's in a workspace when it's not:
current:   /home/me/src/eldiro/fuzz/Cargo.toml
workspace: /home/me/src/eldiro/Cargo.toml

this may be fixable by adding `fuzz` to the `workspace.members` array of the manifest located at: /home/me/src/eldiro/Cargo.toml
Alternatively, to keep it out of the workspace, add the package to the `workspace.exclude` array, or add an empty `[workspace]` table to the package's manifest.
     Created binary (application) `fuzz` package

Let’s address that warning:

# fuzz/Cargo.toml

[workspace]

cargo-fuzz has the concept of a fuzz target, which is a program that receives random data from the fuzzer and tries to run some other code. Let’s create a fuzz target for parsing and lowering:

$ rm -r fuzz/src
$ mkdir fuzz/fuzz_targets
# fuzz/Cargo.toml

[[bin]]
name = "parse_lower"
path = "fuzz_targets/parse_lower.rs"

To write a fuzz target, we need to depend on libFuzzer:

[dependencies]
libfuzzer-sys = "0.3"

And define it:

// fuzz/fuzz_targets/parse_lower.rs

#![no_main]

use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    if let Ok(s) = std::str::from_utf8(data) {
        let parse = parser::parse(s);
        let root = ast::Root::cast(parse.syntax()).unwrap();
        let (_database, _stmts) = hir::lower(root);
    }
});

Note how libFuzzer gives us random bytes, and we can only run the parser and lower the syntax tree if those bytes are valid UTF-8. Let’s add those missing dependencies:

# Cargo.toml

[dependencies]
ast = {path = "../crates/ast"}
hir = {path = "../crates/hir"}
libfuzzer-sys = "0.3"
parser = {path = "../crates/parser"}

Finally, to use the fuzz target in our fuzz crate, we need to declare that the crate is meant to be used with cargo-fuzz:

[package.metadata]
cargo-fuzz = true

cargo-fuzz uses nightly compiler features; we’ll set an override for the fuzz crate only:

$ rustup override set --path fuzz nightly
info: using existing install for 'nightly-x86_64-apple-darwin'
info: override toolchain for 'fuzz' set to 'nightly-x86_64-apple-darwin'

  nightly-x86_64-apple-darwin unchanged - rustc 1.51.0-nightly (1d0d76f8d 2021-01-24)

Cool! And, finally, we can run the fuzzer:

$ cd fuzz
$ cargo fuzz run parse_lower

A ton of output will fly across the screen as the fuzzer tries random inputs. The lines starting with NEW are when the fuzzer is generating new inputs, and the ones starting with REDUCE are those in which the fuzzer tries to make the input smaller and simpler while covering the same codepaths. libFuzzer is an LLVM project, so it makes sense that it would have deep insights into the program’s execution.

The fuzzer found a bug in around one second of fuzzing on my machine; here’s the report:

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: ParseIntError { kind: PosOverflow }', /home/me/src/eldiro/crates/ast/src/lib.rs:107:54

# snip backtrace

────────────────────────────────────────────────────────────────────────────────

Failing input:

	artifacts/parse_lower/crash-3ea7d538477b2fa5533ed073970deff99fec39d5

Output of `std::fmt::Debug`:

	[120, 42, 53, 120, 42, 53, 53, 53, 48, 48, 56, 54, 55, 53, 54, 51, 53, 53, 48, 48, 56, 54, 55, 53, 54, 51, 52, 54, 48, 55, 55, 45, 93, 45]

Reproduce with:

	cargo fuzz run parse_lower artifacts/parse_lower/crash-3ea7d538477b2fa5533ed073970deff99fec39d5

Minimize test case with:

	cargo fuzz tmin parse_lower artifacts/parse_lower/crash-3ea7d538477b2fa5533ed073970deff99fec39d5

────────────────────────────────────────────────────────────────────────────────

Let’s take a look at the failing input:

$ cat artifacts/parse_lower/crash-3ea7d538477b2fa5533ed073970deff99fec39d5
x*5x*55500867563550086756346077-]-

Let’s take a look at what we know:

This makes me think that, when we try to parse an integer out of the source code, the standard library’s integer parser realises that the parsed integer would be larger than the integer type’s maximum value. The panic message points to line 107 of crates/ast/src/lib.rs, which, sure enough, is where we parse integer literals:

impl Literal {
    pub fn parse(&self) -> u64 {
        self.0.first_token().unwrap().text().parse().unwrap() // line 107
    }
}

The solution to this is twofold:

AST validation

It’s great that the fuzzer found this bug for us so quickly! Let’s create a validation module in the ast crate:

// crates/ast/src/lib.rs

pub mod validation;

We’ll traverse down into expressions across the entire AST, validating them as we go:

// crates/ast/src/validation.rs

use crate::{Root, Stmt};

pub fn validate(root: Root) {
    for stmt in root.stmts() {
        match stmt {
            Stmt::VariableDef(variable_def) => {
                if let Some(e) = variable_def.value() {
                    validate_expr(e)
                }
            }
            Stmt::Expr(e) => validate_expr(e),
        }
    }
}

Let’s define validate_expr:

use crate::{Expr, Root, Stmt};

// snip

fn validate_expr(expr: Expr) {
    match expr {
        Expr::BinaryExpr(binary_expr) => {
            if let Some(e) = binary_expr.lhs() {
                validate_expr(e);
            }

            if let Some(e) = binary_expr.rhs() {
                validate_expr(e);
            }
        }
        Expr::Literal(literal) => {
            ???
        }
        Expr::ParenExpr(paren_expr) => {
            if let Some(e) = paren_expr.expr() {
                validate_expr(e);
            }
        }
        Expr::UnaryExpr(unary_expr) => {
            if let Some(e) = unary_expr.expr() {
                validate_expr(e);
            }
        }
        Expr::VariableRef(_) => {}
    }
}

How can we actually check if a number literal is greater than u64::MAX? We can’t parse the number and then check, because we would then run into the bug we’re trying to fix! Instead, we’ll report an error when parsing the number fails, since we know the only reason parsing can fail is if the number literal is too large:

fn validate_expr(expr: Expr) {
    match expr {
        // snip
        Expr::Literal(literal) => {
            if literal.parse().is_none() {
                // report error
            }
        }
        // snip
    }
}

This forces us to make that other change from earlier:

// lib.rs

impl Literal {
    pub fn parse(&self) -> Option<u64> {
        self.0.first_token().unwrap().text().parse().ok()
    }
}

Let’s update the HIR to store an Option<u64> so that analysis can continue in spite of the invalid number literal:

// crates/hir/src/lib.rs

#[derive(Debug, PartialEq)]
pub enum Expr {
    // snip
    Literal {
        /// is `None` if the number is too big to fit in a u64
        n: Option<u64>,
    },
    // snip
}

Unfortunately, we have a number of errors from the lowering tests depending on n being a u64. The errors are straightforward and just involve wrapping numbers in Some(), so I won’t show how to solve them here. Take a look at the diff on GitHub if you get stuck.

We need a new type to represent validation errors:

// crates/ast/src/validation.rs

use crate::{Expr, Root, Stmt};
use text_size::TextRange;

#[derive(Debug, PartialEq)]
pub struct ValidationError {
    message: String,
    range: TextRange,
}
# Cargo.toml

[dependencies]
syntax = {path = "../syntax"}
text-size = "1.1.0"

Let’s store a Vec of these, pushing to it as we go:

// validation.rs

pub fn validate(root: Root) -> Vec<ValidationError> {
    let mut errors = Vec::new();

    for stmt in root.stmts() {
        match stmt {
            Stmt::VariableDef(variable_def) => {
                if let Some(e) = variable_def.value() {
                    validate_expr(e, &mut errors);
                }
            }
            Stmt::Expr(e) => validate_expr(e, &mut errors),
        }
    }

    errors
}

fn validate_expr(expr: Expr, errors: &mut Vec<ValidationError>) {
    match expr {
        Expr::BinaryExpr(binary_expr) => {
            if let Some(e) = binary_expr.lhs() {
                validate_expr(e, errors);
            }

            if let Some(e) = binary_expr.rhs() {
                validate_expr(e, errors);
            }
        }
        Expr::Literal(literal) => {
            if literal.parse().is_none() {
                errors.push(ValidationError {
                    message: format!(
                        "number literal is larger than an integer’s maximum value, {}",
                        u64::MAX,
                    ),
                    range: literal.0.first_token().unwrap().text_range(),
                });
            }
        }
        Expr::ParenExpr(paren_expr) => {
            if let Some(e) = paren_expr.expr() {
                validate_expr(e, errors);
            }
        }
        Expr::UnaryExpr(unary_expr) => {
            if let Some(e) = unary_expr.expr() {
                validate_expr(e, errors);
            }
        }
        // snip
    }
}

It took a lot of code to walk the entire AST for number literals to validate. It would be easier if we could just search the AST recursively from the top. SyntaxNode has a descendants method that does exactly what we want to do – all we need is a way to determine whether a given node is a Literal and, if it is, validate it.

// lib.rs

impl Literal {
    pub fn cast(node: SyntaxNode) -> Option<Self> {
        if node.kind() == SyntaxKind::Literal {
            Some(Self(node))
        } else {
            None
        }
    }

    // snip
}
// validation.rs

use crate::Literal;
use syntax::SyntaxNode;
use text_size::TextRange;

// snip

pub fn validate(node: &SyntaxNode) -> Vec<ValidationError> {
    let mut errors = Vec::new();

    for node in node.descendants() {
        if let Some(literal) = Literal::cast(node) {
            validate_literal(literal, &mut errors)
        }
    }

    errors
}

fn validate_literal(literal: Literal, errors: &mut Vec<ValidationError>) {
    if literal.parse().is_none() {
        errors.push(ValidationError {
            message: format!(
                "number literal is larger than an integer’s maximum value, {}",
                u64::MAX,
            ),
            range: literal.0.first_token().unwrap().text_range(),
        });
    }
}

So much simpler! Let’s write some tests to make sure this is working as expected:1

# Cargo.toml

[dependencies]
syntax = {path = "../syntax"}
text-size = "1.1.0"

[dev-dependencies]
parser = {path = "../parser"}
#[cfg(test)]
mod tests {
    use super::*;
    use std::ops::Range as StdRange;

    fn check(input: &str, expected_errors: &[(&str, StdRange<u32>)]) {
        let parse = parser::parse(input);

        let expected_errors: Vec<_> = expected_errors
            .iter()
            .map(|(message, range)| ValidationError {
                message: message.to_string(),
                range: {
                    let start = range.start.into();
                    let end = range.end.into();
                    TextRange::new(start, end)
                },
            })
            .collect();

        assert_eq!(validate(&parse.syntax()), expected_errors);
    }

    #[test]
    fn validate_ok_literal() {
        check("123", &[]);
    }

    #[test]
    fn validate_too_large_literal() {
        check(
            "99999999999999999999",
            &[(
                "number literal is larger than an integer’s maximum value, 18446744073709551615",
                (0..20),
            )],
        );
    }
}
$ cargo t -q -p ast --lib

running 2 tests
..
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

The tests have revealed a problem with our design, though – if we ever decide to change the wording of an error message, we have to update all the tests, too. I think that we shouldn’t test the specific text of the error message, but just that an error that signifies that a number literal is too large has been reported. We can represent this in code using an enum with a Display implementation:

use crate::Literal;
use std::fmt;
use syntax::SyntaxNode;
use text_size::TextRange;

#[derive(Debug, PartialEq)]
pub struct ValidationError {
    kind: ValidationErrorKind,
    range: TextRange,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ValidationErrorKind {
    NumberLiteralTooLarge,
}

impl fmt::Display for ValidationErrorKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NumberLiteralTooLarge => write!(
                f,
                "number literal is larger than an integer’s maximum value, {}",
                u64::MAX,
            ),
        }
    }
}

// snip

fn validate_literal(literal: Literal, errors: &mut Vec<ValidationError>) {
    if literal.parse().is_none() {
        errors.push(ValidationError {
            kind: ValidationErrorKind::NumberLiteralTooLarge,
            range: literal.0.first_token().unwrap().text_range(),
        });
    }
}

Let’s update the tests:

#[cfg(test)]
mod tests {
    // snip

    fn check(input: &str, expected_errors: &[(ValidationErrorKind, StdRange<u32>)]) {
        // snip

        let expected_errors: Vec<_> = expected_errors
            .iter()
            .map(|(kind, range)| ValidationError {
                kind: *kind,
                range: {
                    // snip
                },
            })
            .collect();

        // snip
    }

    // snip

    #[test]
    fn validate_too_large_literal() {
        check(
            "99999999999999999999",
            &[(ValidationErrorKind::NumberLiteralTooLarge, (0..20))],
        );
    }
}

For the error type we’ve created to be useful, we need to be able to display it to the user. Let’s implement Display for ValidationError:

impl fmt::Display for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "error at {}..{}: {}",
            u32::from(self.range.start()),
            u32::from(self.range.end()),
            self.kind,
        )
    }
}

We can try this out in the REPL:

// crates/eldiro/src/main.rs

fn main() -> io::Result<()> {
    // snip

    loop {
        // snip

        let parse = parse(&input);
        println!("{}", parse.debug_tree());

        let syntax = parse.syntax();

        for error in ast::validation::validate(&syntax) {
            println!("{}", error);
        }

        let root = ast::Root::cast(syntax).unwrap();

        // snip
    }
}
$ cargo r -q
→ 123
Root@0..4
  Literal@0..4
    Number@0..3 "123"
    Whitespace@3..4 "\n"
# snip
→ 999999999999999999999999999
Root@0..28
  Literal@0..28
    Number@0..27 "999999999999999999999 ..."
    Whitespace@27..28 "\n"
error at 0..27: number literal is larger than an integer’s maximum value, 18446744073709551615
# snip

Continuing with fuzzing

I’d argue that most bugs in programming language implementations are in edge-cases that differ slightly from real programs. Currently, the fuzzer is just throwing random input at the parser, hoping to find a bug. We can be more strategic by giving the fuzzer an example to work with – the fuzzer will modify and mutate this example randomly, rather than starting from total randomness.

Let’s clear out the existing corpus – inputs the fuzzer has tried and will work from in future:

$ rm fuzz/corpus/parse_lower/*

We’ll add a single example file to the corpus:

# fuzz/corpus/parse_lower/example

let foo = 10
let bar = 20
let a = foo + (bar * bar - (100 / 33))

a * (a - 10) # the result

This includes every bit of syntax we currently support. Let’s run the fuzzer to see what it finds:

$ cd fuzz
$ cargo fuzz run parse_lower

I let this run for a few minutes and didn’t find anything (I know you’re meant to let fuzzers run for several hours so they can exhaust every possible codepath, but I can’t be bothered), so I decided it was time to add AST validation to the fuzz target.

// fuzz/fuzz_targets/parse_lower.rs

fuzz_target!(|data: &[u8]| {
    if let Ok(s) = std::str::from_utf8(data) {
        let parse = parser::parse(s);
        let syntax = parse.syntax();
        let _validation_errors = ast::validation::validate(&syntax);
        let root = ast::Root::cast(syntax).unwrap();
        let (_database, _stmts) = hir::lower(root);
    }
});

parse_lower isn’t a good name for this fuzz target anymore, as it covers more than just parsing and lowering – let’s rename it to main, because that seems like a good name for the only fuzz target.

$ mv fuzz/fuzz_targets/{parse_lower.rs,main.rs}
$ mkdir fuzz/corpus/main
$ mv fuzz/corpus/{parse_lower,main}/example
$ rm -r fuzz/corpus/parse_lower
# Cargo.toml

[[bin]]
name = "main"
path = "fuzz_targets/main.rs"

We can now run our fuzz target with cd fuzz && cargo fuzz run main. Again, this didn’t find anything within the first few minutes for me.

Conclusion

Thank you for reading! In the next part, we’ll implement string literals.


  1. I really should have done this at the start. This is a nice pattern – you use a fuzzer to find bugs in your code, you write some unit tests to catch the bugs the fuzzer found, and then write code to make the bugs go away. ↩︎