Part Seventeen: Crates

I’m not a fan of how, currently, all the submodules of crate::parser can access the internals of our parser. This access makes sense for marker.rs, but not for, say, expr.rs. These other modules should have to go through the parser’s API. We can enforce these sorts of visibility issues by breaking our project up into crates.

Extracting the lexer

The first part of Eldiro we’ll create a separate crate for is the lexer.

$ cargo new --lib crates/lexer
     Created library `crates/lexer` package
$ git mv -f crates/{eldiro/src/lexer.rs,lexer/src/lib.rs}

We also need to transfer the Logos dependency from eldiro’s Cargo.toml to lexer’s:

# crates/eldiro/Cargo.toml

[dependencies]
drop_bomb = "0.1.5"
num-derive = "0.3.3"
num-traits = "0.2.14"
rowan = "0.10.0"
# crates/lexer/Cargo.toml

[dependencies]
logos = "0.11.4"

crates/eldiro/src/lib.rs is still referring to the lexer module; let’s remove it:

pub mod parser;

mod syntax;

Since naming should show that the lexer is separate from the syntax tree, rename lexer::SyntaxKind to TokenKind. We should remove the node types from TokenKind for the same reason:

// crates/lexer/src/lib.rs

use logos::Logos;

// snip

#[derive(Debug, Copy, Clone, PartialEq, Logos)]
pub(crate) enum TokenKind {
    #[regex("[ \n]+")]
    Whitespace,

    #[token("fn")]
    FnKw,

    #[token("let")]
    LetKw,

    #[regex("[A-Za-z][A-Za-z0-9]*")]
    Ident,

    #[regex("[0-9]+")]
    Number,

    #[token("+")]
    Plus,

    #[token("-")]
    Minus,

    #[token("*")]
    Star,

    #[token("/")]
    Slash,

    #[token("=")]
    Equals,

    #[token("(")]
    LParen,

    #[token(")")]
    RParen,

    #[token("{")]
    LBrace,

    #[token("}")]
    RBrace,

    #[regex("#.*")]
    Comment,

    #[error]
    Error,
}

Delete TokenKind::is_trivia, since the lexer should know nothing about parsing (which trivia certainly is a part of). To allow other code to use our lexer, replace all occurrences of pub(crate) with pub in crates/lexer/src/lib.rs.

Let’s move TokenKind-related code into a separate module, as lib.rs is starting to get cluttered:

mod token_kind;
pub use token_kind::TokenKind;
// crates/lexer/src/token_kind.rs

use logos::Logos;

#[derive(Debug, Copy, Clone, PartialEq, Logos)]
pub enum TokenKind {
    // snip
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::{Lexer, Token};

    // snip
}

Now that that’s done we should make eldiro depend on lexer:

# crates/eldiro/Cargo.toml

[dependencies]
drop_bomb = "0.1.5"
lexer = {path = "../lexer"}
num-derive = "0.3.3"
num-traits = "0.2.14"
rowan = "0.10.0"

Run a project-wide search and replace from crate::lexer to lexer; this will replace all references to the now-deleted lexer module with ones to the new lexer crate.

Let’s define SyntaxKind:

// syntax.rs

use lexer::TokenKind;
use num_derive::{FromPrimitive, ToPrimitive};
use num_traits::{FromPrimitive, ToPrimitive};

#[derive(Debug, Copy, Clone, PartialEq, FromPrimitive, ToPrimitive)]
pub(crate) enum SyntaxKind {
    Whitespace,
    FnKw,
    LetKw,
    Ident,
    Number,
    Plus,
    Minus,
    Star,
    Slash,
    Equals,
    LParen,
    RParen,
    LBrace,
    RBrace,
    Comment,
    Error,
    Root,
    InfixExpr,
    Literal,
    ParenExpr,
    PrefixExpr,
    VariableRef,
}

impl SyntaxKind {
    pub(crate) fn is_trivia(self) -> bool {
        matches!(self, Self::Whitespace | Self::Comment)
    }
}

impl From<TokenKind> for SyntaxKind {
    fn from(token_kind: TokenKind) -> Self {
        match token_kind {
            TokenKind::Whitespace => Self::Whitespace,
            TokenKind::FnKw => Self::FnKw,
            TokenKind::LetKw => Self::LetKw,
            TokenKind::Ident => Self::Ident,
            TokenKind::Number => Self::Number,
            TokenKind::Plus => Self::Plus,
            TokenKind::Minus => Self::Minus,
            TokenKind::Star => Self::Star,
            TokenKind::Slash => Self::Slash,
            TokenKind::Equals => Self::Equals,
            TokenKind::LParen => Self::LParen,
            TokenKind::RParen => Self::RParen,
            TokenKind::LBrace => Self::LBrace,
            TokenKind::RBrace => Self::RBrace,
            TokenKind::Comment => Self::Comment,
            TokenKind::Error => Self::Error,
        }
    }
}

Note how we define a conversion from lexer’s TokenKind to SyntaxKind. The next step is to import SyntaxKind from crate::syntax instead of lexer, making sure to add any conversions where necessary:

// parser.rs

use crate::syntax::{SyntaxKind, SyntaxNode};
use event::Event;
use expr::expr;
use lexer::{Lexer, Token};
use marker::Marker;
use rowan::GreenNode;
use sink::Sink;
use source::Source;
// event.rs

use crate::syntax::SyntaxKind;

#[derive(Debug, PartialEq)]
pub(super) enum Event {
    // snip
}
// expr.rs

use super::marker::CompletedMarker;
use super::Parser;
use crate::syntax::SyntaxKind;
// marker.rs

use super::event::Event;
use super::Parser;
use crate::syntax::SyntaxKind;
use drop_bomb::DropBomb;
// sink.rs

use super::event::Event;
use crate::syntax::{EldiroLanguage, SyntaxKind};
use lexer::Token;
use rowan::{GreenNode, GreenNodeBuilder, Language};
use std::mem;

// snip

impl<'t, 'input> Sink<'t, 'input> {
    // snip

    fn eat_trivia(&mut self) {
        while let Some(token) = self.tokens.get(self.cursor) {
            if !SyntaxKind::from(token.kind).is_trivia() {
                break;
            }

            self.token();
        }
    }

    fn token(&mut self) {
        let Token { kind, text } = self.tokens[self.cursor];

        self.builder
            .token(EldiroLanguage::kind_to_raw(kind.into()), text.into());

        self.cursor += 1;
    }
}
// source.rs

use crate::syntax::SyntaxKind;
use lexer::Token;

// snip

impl<'t, 'input> Source<'t, 'input> {
    // snip

    fn peek_kind_raw(&self) -> Option<SyntaxKind> {
        self.tokens
            .get(self.cursor)
            .map(|Token { kind, .. }| (*kind).into())
    }
}

Running our tests shows how they’re split across crates:

$ cargo t -q --lib

running 17 tests
.................
test result: ok. 17 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out


running 18 tests
..................
test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Extracting syntax

Next up: the syntax module. Let’s again generate a new crate and copy over the module:

$ cargo new --lib crates/syntax
     Created library `crates/syntax` package
$ git mv -f crates/{eldiro/src/syntax.rs,syntax/src/lib.rs}

Let’s add all its dependencies:

# crates/syntax/Cargo.toml

[dependencies]
lexer = {path = "../lexer"}
num-derive = "0.3.3"
num-traits = "0.2.14"
rowan = "0.10.0"

We can now remove num-derive and num-traits from eldiro’s Cargo.toml, while also adding syntax:

# crates/eldiro/Cargo.toml

[dependencies]
drop_bomb = "0.1.5"
lexer = {path = "../lexer"}
rowan = "0.10.0"
syntax = {path = "../syntax"}

Search and replace pub(crate) with pub in crates/syntax/src/lib.rs to make all its items visible to the outside world.

We also need to remove the relevant mod declaration from crates/eldiro/src/lib.rs, which now looks like this:

pub mod parser;

Yet again, run a project-wide search and replace from crate::syntax to syntax.

$ cargo t -q --lib

running 17 tests
.................
test result: ok. 17 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out


running 18 tests
..................
test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Extracting the parser

This is the last crate we’ll extract, so let’s get on with it:

$ cargo new --lib crates/parser
     Created library `crates/parser` package
$ git mv -f crates/{eldiro/src/parser.rs,parser/src/lib.rs}
$ git mv -f crates/{eldiro/src/parser/*,parser/src}
$ rmdir crates/eldiro/src/parser
# crates/parser/Cargo.toml

[dependencies]
drop_bomb = "0.1.5"
lexer = {path = "../lexer"}
rowan = "0.10.0"
syntax = {path = "../syntax"}

[dev-dependencies]
expect-test = "1.0.1"

Run a project-wide search and replace from pub(super) to pub(crate), since all the occurrences of pub(super) are on items in top-level modules of the parser crate. As such, using pub(super) on these items is equivalent to pub(crate), with pub(crate) being the more idiomatic choice.

To prevent all code in the parser crate from being able to see inside of Parser, we’ll move it into its own module.

// lib.rs

mod event;
mod expr;
mod marker;
mod parser;
mod sink;
mod source;

use lexer::Lexer;
use parser::Parser;
use rowan::GreenNode;
use sink::Sink;
use syntax::SyntaxNode;

pub fn parse(input: &str) -> Parse {
    // snip
}

pub struct Parse {
    green_node: GreenNode,
}

impl Parse {
    // snip
}

#[cfg(test)]
fn check(input: &str, expected_tree: expect_test::Expect) {
    let parse = parse(input);
    expected_tree.assert_eq(&parse.debug_tree());
}
// crates/parser/src/parser.rs

use crate::event::Event;
use crate::expr::expr;
use crate::marker::Marker;
use crate::source::Source;
use lexer::Token;
use syntax::SyntaxKind;

pub(crate) struct Parser<'t, 'input> {
    source: Source<'t, 'input>,
    events: Vec<Event>,
}

impl<'t, 'input> Parser<'t, 'input> {
    pub(crate) fn new(tokens: &'t [Token<'input>]) -> Self {
        // snip
    }

    pub(crate) fn parse(mut self) -> Vec<Event> {
        // snip
    }

    pub(crate) fn start(&mut self) -> Marker {
        // snip
    }

    pub(crate) fn bump(&mut self) {
        // snip
    }

    pub(crate) fn at(&mut self, kind: SyntaxKind) -> bool {
        // snip
    }

    pub(crate) fn peek(&mut self) -> Option<SyntaxKind> {
        // snip
    }
}

#[cfg(test)]
mod tests {
    use crate::check;
    use expect_test::expect;

    // snip
}

Now Marker and CompletedMarker can’t see inside of Parser, even though they need to be able to. Let’s move the marker module into parser:

$ mkdir crates/parser/src/parser
$ git mv crates/parser/src/{marker.rs,parser}
// lib.rs

mod event;
mod expr;
mod parser;
mod sink;
mod source;
// parser.rs

pub(crate) mod marker;

use crate::event::Event;
use crate::expr::expr;
use crate::source::Source;
use lexer::Token;
use marker::Marker;
use syntax::SyntaxKind;

We need to update the imports in marker:

// marker.rs

use super::Parser;
use crate::event::Event;
use drop_bomb::DropBomb;
use syntax::SyntaxKind;

There are some errors in expr.rs about an unresolved import that we can fix:

// expr.rs

use crate::parser::marker::CompletedMarker;
use crate::parser::Parser;
use syntax::SyntaxKind;

Let’s modify Parser::new so that it doesn’t have to know how to create a Source:

// parser.rs

pub(crate) mod marker;

use crate::event::Event;
use crate::expr::expr;
use crate::source::Source;
use marker::Marker;
use syntax::SyntaxKind;

// snip

impl<'t, 'input> Parser<'t, 'input> {
    pub(crate) fn new(source: Source<'t, 'input>) -> Self {
        Self {
            source,
            events: Vec::new(),
        }
    }

    // snip
}

This obligates us to update the main parse function:

// lib.rs

mod event;
mod expr;
mod parser;
mod sink;
mod source;

use lexer::Lexer;
use parser::Parser;
use rowan::GreenNode;
use sink::Sink;
use source::Source;
use syntax::SyntaxNode;

pub fn parse(input: &str) -> Parse {
    let tokens: Vec<_> = Lexer::new(input).collect();
    let source = Source::new(&tokens);
    let parser = Parser::new(source);
    let events = parser.parse();
    let sink = Sink::new(&tokens, events);

    Parse {
        green_node: sink.finish(),
    }
}

Next, we’ll create a new module, grammar, and move expr into it:

// lib.rs

mod event;
mod grammar;
mod parser;
mod sink;
mod source;
// crates/parser/src/grammar.rs

mod expr;
pub(crate) use expr::expr;
$ mkdir crates/parser/src/grammar
$ git mv crates/parser/src/{expr.rs,grammar}

Let’s fix the imports in expr.rs:

#[cfg(test)]
mod tests {
    use crate::check;
    use expect_test::expect;

    // snip
}

As we add more and more features to Eldiro, we’ll end up with more and more submodules in grammar. It would be annoying to have the same three lines of imports in each one, so we can move them into grammar:

// grammar.rs

mod expr;
pub(crate) use expr::expr;

use crate::parser::marker::CompletedMarker;
use crate::parser::Parser;
use syntax::SyntaxKind;

And import them from expr.rs:

use super::*;

Let’s fix the import of expr in parser.rs:

pub(crate) mod marker;

use crate::event::Event;
use crate::grammar;
use crate::source::Source;
use marker::Marker;
use syntax::SyntaxKind;

// snip

impl<'t, 'input> Parser<'t, 'input> {
    // snip

    pub(crate) fn parse(mut self) -> Vec<Event> {
        let m = self.start();
        grammar::expr(&mut self);
        m.complete(&mut self, SyntaxKind::Root);

        self.events
    }

    // snip
}

It might be nice to extract that little dance we do with SyntaxKind::Root in Parser::parse into its own function:

// grammar.rs

mod expr;

use crate::parser::marker::CompletedMarker;
use crate::parser::Parser;
use syntax::SyntaxKind;

pub(crate) fn root(p: &mut Parser) -> CompletedMarker {
    let m = p.start();
    expr::expr(p);

    m.complete(p, SyntaxKind::Root)
}

Now grammar::expr::expr doesn’t need to be pub(crate):

// expr.rs

pub(super) fn expr(p: &mut Parser) {
    expr_binding_power(p, 0);
}

Let’s use that root function from grammar in parser:

// parser.rs

impl<'t, 'input> Parser<'t, 'input> {
    // snip

    pub(crate) fn parse(mut self) -> Vec<Event> {
        grammar::root(&mut self);
        self.events
    }

    // snip
}

Finally, let’s remove the parser module declaration from crates/eldiro/src/lib.rs:

// It’s empty!

To make everything compile we can re-export parser::parse from eldiro:

pub mod parser {
    pub use parser::parse;
}
# crates/eldiro/Cargo.toml

[dependencies]
parser = {path = "../parser"}
$ cargo t -q --lib

running 18 tests
..................
test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out


running 17 tests
.................
test result: ok. 17 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

In reality, though, there isn’t a need anymore for the eldiro crate. Let’s delete it:

$ rm -r crates/eldiro

And update eldiro-cli:

# crates/eldiro-cli/Cargo.toml

[dependencies]
parser = {path = "../parser"}
// crates/eldiro-cli/src/main.rs

use parser::parse;
use std::io::{self, Write};

Now that the eldiro crate doesn’t exist anymore, we can rename eldiro-cli to eldiro:

$ git mv crates/{eldiro-cli,eldiro}
# crates/eldiro/Cargo.toml

[package]
name = "eldiro"
$ cargo t -q --lib

running 18 tests
..................
test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out


running 17 tests
.................
test result: ok. 17 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

A random note

Rowan parsers tend to return a CompletedMarker from subparsers, as that gives the parsers that call them the opportunity to call precede. Let’s follow this convention and change the only subparsers that don’t return a CompletedMarker, expr and expr_binding_power:

// expr.rs

pub(super) fn expr(p: &mut Parser) -> Option<CompletedMarker> {
    expr_binding_power(p, 0)
}

fn expr_binding_power(p: &mut Parser, minimum_binding_power: u8) -> Option<CompletedMarker> {
    let mut lhs = lhs(p)?; // we’ll handle errors later.

    loop {
        let op = match p.peek() {
            // snip
            _ => return None, // we’ll handle errors later.
        };

        // snip

        if left_binding_power < minimum_binding_power {
            break;
        }

        // snip
    }

    Some(lhs)
}

Conclusion

As a reward for our efforts I’ve prepared a diagram of the dependency graph of the eldiro crate:

$ cargo install cargo-depgraph
$ # install Graphviz however you like
$ cargo depgraph --all-deps | dot -Tsvg > graph.svg

Dependency graph of eldiro crate

Green indicates a build dependency, meaning a dependency that is needed to build, but doesn’t contribute any code directly. As you can see, the two direct build dependencies that our crates have both provide procedural macros: logos-derive and num-derive.

Blue indicates a development dependency, of which we only have one: expect-test.

In the next part we’ll cover an interesting topic that Rowan is particularly suited to: error recovery, resilience and reporting.