Make A Language

Part Two: Whitespace Support

9 September 2020

Last time, we made a parser for simple unnested mathematical expressions, such as 1+1 or 3*4. In this post, we’ll add support for whitespace (so that users of Eldiro will be able to use 2 + 2 instead of 2+2).

We can achieve this by creating an extract_whitespace function similar to extract_digits:

// utils.rs

// Let’s copy-paste from extract_digits
pub(crate) fn extract_whitespace(s: &str) -> (&str, &str) {
    let whitespace_end = s
        .char_indices()
        .find_map(|(idx, c)| if c == ' ' { None } else { Some(idx) })
        .unwrap_or_else(|| s.len());

    let whitespace = &s[..whitespace_end];
    let remainder = &s[whitespace_end..];
    (remainder, whitespace)
}

#[cfg(test)]
mod tests {
    // snip

    #[test]
    fn extract_spaces() {
        assert_eq!(extract_whitespace("    1"), ("1", "    "));
    }
}

Although this does indeed work, it involves quite a bit of repetition. What if we want a similar function for, say, extracting variable names? Instead of copy-pasting extract_digits yet again, we can define a take_while function with the following signature:

fn take_while(accept: impl Fn(char) -> bool, s: &str) -> (&str, &str)

take_while takes a function called accept that determines whether to accept a given character or not. Once it encounters a character that is not accepted, it stops consuming characters and returns.

pub(crate) fn take_while(accept: impl Fn(char) -> bool, s: &str) -> (&str, &str) {
    let extracted_end = s
        .char_indices()
        .find_map(|(idx, c)| if accept(c) { None } else { Some(idx) })
        .unwrap_or_else(|| s.len());

    let extracted = &s[..extracted_end];
    let remainder = &s[extracted_end..];
    (remainder, extracted)
}

We can now redefine extract_digits and extract_whitespace in terms of take_while:

pub(crate) fn extract_digits(s: &str) -> (&str, &str) {
    take_while(|c| c.is_ascii_digit(), s)
}

pub(crate) fn extract_whitespace(s: &str) -> (&str, &str) {
    take_while(|c| c == ' ', s)
}

And our tests still pass!

To actually allow whitespace inside expressions, we can call extract_whitespace while throwing away the output:

// lib.rs

impl Expr {
    pub fn new(s: &str) -> (&str, Self) {
        let (s, lhs) = Number::new(s);
        let (s, _) = utils::extract_whitespace(s);

        let (s, op) = Op::new(s);
        let (s, _) = utils::extract_whitespace(s);

        let (s, rhs) = Number::new(s);

        (s, Self { lhs, rhs, op })
    }
}

#[cfg(test)]
mod tests {
    // snip

    #[test]
    fn parse_expr_with_whitespace() {
        assert_eq!(
            Expr::new("2 * 2"),
            (
                "",
                Expr {
                    lhs: Number(2),
                    rhs: Number(2),
                    op: Op::Mul,
                },
            ),
        );
    }
}

Note that we aren’t using let _ = utils::extract_whitespace(s), as that would throw away the s that we get back, thereby not progressing through the input. Instead, by using let (s, _) = ... we keep advancing, but throw away the actual whitespace we extracted.

In the next post, Eldiro will gain the ability to create variables. This involves both parsing, as well as the beginnings of an interpreter.