synkit
A toolkit for building round-trip parsers with logos.
What is synkit?
synkit generates syn-like parsing infrastructure from token definitions. Define your tokens once, and get:
- Token enum with logos lexer integration
- Token structs (EqToken, IdentToken, etc.) with typed values
- TokenStream with whitespace skipping, fork/rewind, and span tracking
- Parse/Peek/ToTokens traits for building parsers
- Printer for round-trip code formatting
When to use synkit
| Use Case | synkit | Alternative |
|---|---|---|
| Custom DSL with formatting | ✅ | - |
| Config file parser | ✅ | serde + format-specific crate |
| Code transformation tool | ✅ | - |
| Rust source parsing | ❌ | syn |
| Simple regex matching | ❌ | logos alone |
synkit is ideal when you need:
- Round-trip fidelity: Parse → modify → print without losing formatting
- Span tracking: Precise error locations and source mapping
- Type-safe AST: Strongly-typed nodes with Spanned<T> wrappers
Architecture
flowchart LR
Source["Source<br/>String"] --> TokenStream["TokenStream<br/>(lexer)"]
TokenStream --> AST["AST<br/>(parser)"]
TokenStream --> Span["Span<br/>tracking"]
AST --> Printer["Printer<br/>(output)"]
Quick Example
use synkit::parser_kit;
parser_kit! {
error: MyError,
skip_tokens: [Space],
tokens: {
#[token(" ")]
Space,
#[token("=")]
Eq,
#[regex(r"[a-z]+", |lex| lex.slice().to_string())]
Ident(String),
},
delimiters: {},
span_derives: [Debug, Clone, PartialEq],
token_derives: [Debug, Clone, PartialEq],
}
This generates:
- Token enum with Eq and Ident(String) variants
- EqToken, IdentToken structs
- TokenStream with lex(), parse(), peek()
- Tok![=], Tok![ident] macros
- Parse, Peek, ToTokens, Diagnostic traits
Getting Started
Installation
Add synkit and logos to your Cargo.toml:
[dependencies]
synkit = "0.1"
logos = "0.15"
thiserror = "2" # recommended for error types
Optional Features
# For async streaming with tokio
synkit = { version = "0.1", features = ["tokio"] }
# For async streaming with futures (runtime-agnostic)
synkit = { version = "0.1", features = ["futures"] }
# For std::error::Error implementations
synkit = { version = "0.1", features = ["std"] }
Minimal Example
A complete parser in ~30 lines:
use thiserror::Error;
#[derive(Error, Debug, Clone, Default, PartialEq)]
pub enum LexError {
#[default]
#[error("unknown token")]
Unknown,
#[error("expected {expect}, found {found}")]
Expected { expect: &'static str, found: String },
#[error("expected {expect}")]
Empty { expect: &'static str },
}
synkit::parser_kit! {
error: LexError,
skip_tokens: [Space],
tokens: {
#[token(" ")]
Space,
#[token("=")]
Eq,
#[regex(r"[a-z]+", |lex| lex.slice().to_string())]
#[fmt("identifier")]
Ident(String),
#[regex(r"[0-9]+", |lex| lex.slice().parse().ok())]
#[fmt("number")]
Number(i64),
},
delimiters: {},
span_derives: [Debug, Clone, PartialEq],
token_derives: [Debug, Clone, PartialEq],
}
Using the Generated Code
After parser_kit!, you have access to:
use crate::{
// Span types
Span, Spanned,
// Token enum and structs
tokens::{Token, EqToken, IdentToken, NumberToken},
// Parsing infrastructure
stream::TokenStream,
// Traits
Parse, Peek, ToTokens, Diagnostic,
};
// Lex source into tokens
let mut stream = TokenStream::lex("x = 42")?;
// Parse tokens
let name: Spanned<IdentToken> = stream.parse()?;
let eq: Spanned<EqToken> = stream.parse()?;
let value: Spanned<NumberToken> = stream.parse()?;
assert_eq!(*name.value, "x");
assert_eq!(value.value.0, 42);
Generated Modules
parser_kit! generates these modules in your crate:
| Module | Contents |
|---|---|
| span | Span, RawSpan, Spanned<T> |
| tokens | Token enum, *Token structs, Tok!/SpannedTok! macros |
| stream | TokenStream, MutTokenStream |
| printer | Printer implementation |
| delimiters | Delimiter structs (e.g., Bracket, Brace) |
| traits | Parse, Peek, ToTokens, Diagnostic |
Error Type Requirements
Your error type must:
- Implement Default (for unknown token errors from logos)
- Have variants for parse errors (recommended pattern):
#[derive(Error, Debug, Clone, Default, PartialEq)]
pub enum MyError {
#[default]
#[error("unknown")]
Unknown,
#[error("expected {expect}, found {found}")]
Expected { expect: &'static str, found: String },
#[error("expected {expect}")]
Empty { expect: &'static str },
}
Next Steps
- Concepts - Understand tokens, parsing, spans
- Tutorial - Build a complete TOML parser
- Reference - Full macro documentation
Concepts Overview
This section covers the core concepts in synkit:
- Tokens - Token enum, token structs, and the Tok! macro
- Parsing - Parse and Peek traits, stream operations
- Spans & Errors - Source locations, Spanned<T>, error handling
- Printing - ToTokens trait and round-trip formatting
Core Flow
Source → Lexer → TokenStream → Parse → AST → ToTokens → Output
- Lexer (logos): Converts source string to token sequence
- TokenStream: Wraps tokens with span tracking and skip logic
- Parse: Trait for converting tokens to AST nodes
- AST: Your domain-specific tree structure
- ToTokens: Trait for converting AST back to formatted output
Tokens
synkit generates two representations for each token: an enum variant and a struct.
Token Enum
The Token enum contains all token variants, used by the lexer:
#[derive(Logos, Debug, Clone, PartialEq)]
pub enum Token {
#[token("=")]
Eq,
#[regex(r"[a-z]+", |lex| lex.slice().to_string())]
Ident(String),
#[regex(r"[0-9]+", |lex| lex.slice().parse().ok())]
Number(i64),
}
Token Structs
For each variant, synkit generates a corresponding struct:
// Unit token (no value)
pub struct EqToken;
impl EqToken {
pub fn new() -> Self { Self }
pub fn token(&self) -> Token { Token::Eq }
}
// Token with value
pub struct IdentToken(pub String);
impl IdentToken {
pub fn new(value: String) -> Self { Self(value) }
pub fn token(&self) -> Token { Token::Ident(self.0.clone()) }
}
impl std::ops::Deref for IdentToken {
type Target = String;
fn deref(&self) -> &Self::Target { &self.0 }
}
Token Attributes
#[token(...)] and #[regex(...)]
Standard logos attributes for matching:
#[token("=")] // Exact match
#[regex(r"[a-z]+")] // Regex pattern
#[regex(r"[0-9]+", |lex| lex.slice().parse().ok())] // With callback
#[fmt(...)]
Custom display name for error messages:
#[regex(r"[a-z]+", |lex| lex.slice().to_string())]
#[fmt("identifier")] // Error: "expected identifier, found ..."
Ident(String),
Without #[fmt], uses the variant name in snake_case.
#[derive(...)] on tokens
Additional derives for a specific token struct:
#[regex(r"[A-Za-z_]+", |lex| lex.slice().to_string())]
#[derive(Hash, Eq)] // Only for IdentToken
Ident(String),
priority
Logos priority for overlapping patterns:
#[token("true", priority = 2)] // Higher priority than bare keys
True,
#[regex(r"[A-Za-z]+", priority = 1)]
BareKey(String),
The Tok! Macro
Access token types by their pattern:
// Punctuation - use the literal
Tok![=] // → EqToken
Tok![.] // → DotToken
Tok![,] // → CommaToken
// Keywords - use the keyword
Tok![true] // → TrueToken
Tok![false] // → FalseToken
// Regex tokens - use snake_case name
Tok![ident] // → IdentToken
Tok![number] // → NumberToken
SpannedTok!
Shorthand for Spanned<Tok![...]>:
SpannedTok![=] // → Spanned<EqToken>
SpannedTok![ident] // → Spanned<IdentToken>
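These macros also work in type position, so AST fields can name tokens by their pattern; a small sketch, assuming the macros expand to the plain type names listed above (the Assignment struct is illustrative):
// Sketch: Tok!/SpannedTok! as field types (token names from the examples above)
struct Assignment {
    name: SpannedTok![ident],   // same as Spanned<IdentToken>
    eq: SpannedTok![=],         // same as Spanned<EqToken>
    value: SpannedTok![number], // same as Spanned<NumberToken>
}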
Auto-generated Trait Implementations
Each token struct automatically implements:
| Trait | Purpose |
|---|---|
| Parse | Parse from TokenStream |
| Peek | Check if token matches without consuming |
| Diagnostic | Format name for error messages |
| Display | Human-readable output |
Parsing
Parsing converts a token stream into an AST using the Parse and Peek traits.
The Parse Trait
pub trait Parse: Sized {
fn parse(stream: &mut TokenStream) -> Result<Self, Error>;
}
Token structs implement Parse automatically. For AST nodes, implement manually:
impl Parse for KeyValue {
fn parse(stream: &mut TokenStream) -> Result<Self, TomlError> {
Ok(Self {
key: stream.parse()?,
eq: stream.parse()?,
value: stream.parse()?,
})
}
}
The Peek Trait
Check the next token without consuming:
pub trait Peek {
fn is(token: &Token) -> bool;
fn peek(stream: &TokenStream) -> bool;
}
Use in conditionals and loops:
// Check before parsing
if SimpleKey::peek(stream) {
let key: Spanned<SimpleKey> = stream.parse()?;
}
// Parse while condition holds
while Value::peek(stream) {
items.push(stream.parse()?);
}
TokenStream Operations
Basic Operations
// Create from source
let mut stream = TokenStream::lex("x = 42")?;
// Parse with type inference
let token: Spanned<IdentToken> = stream.parse()?;
// Peek at next token
if stream.peek::<EqToken>() {
// ...
}
// Get next raw token (including skipped)
let raw = stream.next_raw();
Fork and Rewind
Speculatively parse without committing:
let mut fork = stream.fork();
if let Ok(result) = try_parse(&mut fork) {
stream.advance_to(&fork); // Commit
return Ok(result);
}
// Didn't advance - stream unchanged
Whitespace Handling
skip_tokens in parser_kit! defines tokens to skip:
skip_tokens: [Space, Tab],
- stream.next() - Skips whitespace
- stream.next_raw() - Includes whitespace
- stream.peek_token() - Skips whitespace
- stream.peek_token_raw() - Includes whitespace
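A small sketch of the difference, using the Minimal Example's tokens (the exact raw token seen depends on the input):
// Sketch: skipping vs. raw access (tokens from the Minimal Example)
let mut stream = TokenStream::lex("x = 42")?;
let x: Spanned<IdentToken> = stream.parse()?; // consumes `x`
assert!(stream.peek::<EqToken>());            // peek skips the space before `=`
let raw = stream.peek_token_raw();            // raw peek can still see the Space token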
Parsing Patterns
Sequential Fields
impl Parse for Assignment {
fn parse(stream: &mut TokenStream) -> Result<Self, Error> {
Ok(Self {
name: stream.parse()?, // Spanned<IdentToken>
eq: stream.parse()?, // Spanned<EqToken>
value: stream.parse()?, // Spanned<Value>
})
}
}
Enum Variants
Use peek to determine variant:
impl Parse for Value {
fn parse(stream: &mut TokenStream) -> Result<Self, Error> {
if stream.peek::<IntegerToken>() {
Ok(Value::Integer(stream.parse()?))
} else if stream.peek::<StringToken>() {
Ok(Value::String(stream.parse()?))
} else {
Err(Error::expected("value"))
}
}
}
Optional Fields
// Option<T> auto-implements Parse via Peek
let comma: Option<Spanned<CommaToken>> = stream.parse()?;
Repeated Items
// Manual loop
let mut items = Vec::new();
while Value::peek(stream) {
items.push(stream.parse()?);
}
// Using synkit::Repeated
use synkit::Repeated;
let items: Repeated<Value, CommaToken, Spanned<Value>> =
Repeated::parse(stream)?;
Delimited Content
Extract content between delimiters:
// Using the bracket! macro
let mut inner;
let bracket = bracket!(inner in stream);
// inner is a new TokenStream with bracket contents
let items = parse_items(&mut inner)?;
Spans & Errors
Spans track source locations for error reporting and source mapping.
Span Types
RawSpan
Byte offsets into source:
pub struct RawSpan {
pub start: usize,
pub end: usize,
}
Span
Handles both known and synthetic locations:
pub enum Span {
CallSite, // No source location (generated code)
Known(RawSpan), // Actual source position
}
Spanned<T>
Wraps a value with its source span:
pub struct Spanned<T> {
pub value: T,
pub span: Span,
}
Always use Spanned<T> for AST node fields:
pub struct KeyValue {
pub key: Spanned<Key>, // ✓
pub eq: Spanned<EqToken>, // ✓
pub value: Spanned<Value>, // ✓
}
This enables:
- Precise error locations
- Source mapping for transformations
- Hover information in editors
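On the first point, a Known span can be mapped straight back to the offending source text when reporting errors; a minimal sketch (show_error is illustrative):
// Sketch: using a Known span to quote the offending source text
fn show_error(source: &str, span: &Span, msg: &str) {
    match span {
        Span::Known(raw) => {
            let snippet = &source[raw.start..raw.end];
            eprintln!("error: {msg} at bytes {}..{}: `{snippet}`", raw.start, raw.end);
        }
        Span::CallSite => eprintln!("error: {msg} (generated code)"),
    }
}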
Error Handling
Error Type Pattern
#[derive(Error, Debug, Clone, Default, PartialEq)]
pub enum MyError {
#[default]
#[error("unknown token")]
Unknown,
#[error("expected {expect}, found {found}")]
Expected { expect: &'static str, found: String },
#[error("expected {expect}")]
Empty { expect: &'static str },
#[error("{source}")]
Spanned {
#[source]
source: Box<MyError>,
span: Span,
},
}
SpannedError Trait
Attach spans to errors:
impl synkit::SpannedError for MyError {
type Span = Span;
fn with_span(self, span: Span) -> Self {
Self::Spanned {
source: Box::new(self),
span,
}
}
fn span(&self) -> Option<&Span> {
match self {
Self::Spanned { span, .. } => Some(span),
_ => None,
}
}
}
Diagnostic Trait
Provide display names for error messages:
pub trait Diagnostic {
fn fmt() -> &'static str;
}
// Auto-implemented for tokens using #[fmt(...)] or snake_case name
impl Diagnostic for IdentToken {
fn fmt() -> &'static str { "identifier" }
}
Error Helpers
impl MyError {
pub fn expected<D: Diagnostic>(found: &Token) -> Self {
Self::Expected {
expect: D::fmt(),
found: format!("{}", found),
}
}
pub fn empty<D: Diagnostic>() -> Self {
Self::Empty { expect: D::fmt() }
}
}
Error Propagation
Parse implementations automatically wrap errors with spans:
impl Parse for KeyValue {
fn parse(stream: &mut TokenStream) -> Result<Self, MyError> {
Ok(Self {
// If parse fails, error includes span of failed token
key: stream.parse()?,
eq: stream.parse()?,
value: stream.parse()?,
})
}
}
Accessing Spans
let kv: Spanned<KeyValue> = stream.parse()?;
// Get span of entire key-value
let full_span = &kv.span;
// Get span of just the key
let key_span = &kv.value.key.span;
// Get span of the value
let value_span = &kv.value.value.span;
Printing
The ToTokens trait enables round-trip formatting: parse source, modify AST, print back.
The ToTokens Trait
pub trait ToTokens {
fn write(&self, printer: &mut Printer);
}
Implement for each AST node:
impl ToTokens for KeyValue {
fn write(&self, p: &mut Printer) {
self.key.value.write(p);
p.space();
self.eq.value.write(p);
p.space();
self.value.value.write(p);
}
}
Printer Methods
Basic Output
p.word("text"); // Append literal text
p.token(&tok); // Append token's string form
p.space(); // Single space
p.newline(); // Line break
Indentation
p.open_block(); // Increase indent, add newline
p.close_block(); // Decrease indent, add newline
p.indent(); // Just increase indent level
p.dedent(); // Just decrease indent level
Separators
// Write items with separator
p.write_separated(&items, ", ");
// Write with custom logic
for (i, item) in items.iter().enumerate() {
if i > 0 { p.word(", "); }
item.write(p);
}
Converting to String
// Using the trait method
let output = kv.to_string_formatted();
// Manual printer usage
let mut printer = Printer::new();
kv.write(&mut printer);
let output = printer.finish();
Round-trip Example
// Parse
let mut stream = TokenStream::lex("key = 42")?;
let kv: Spanned<KeyValue> = stream.parse()?;
// Modify
let mut modified = kv.value.clone();
modified.value = Spanned {
value: Value::Integer(IntegerToken::new(100)),
span: Span::CallSite,
};
// Print
let output = modified.to_string_formatted();
assert_eq!(output, "key = 100");
Implementation Patterns
Token Structs
Token structs write their token's string form. parser_kit! generates these impls automatically; tokens whose text is transformed during lexing (marked #[no_to_tokens]) need a manual impl to restore it:
impl ToTokens for EqToken {
fn write(&self, p: &mut Printer) {
p.token(&self.token());
}
}
impl ToTokens for BasicStringToken {
fn write(&self, p: &mut Printer) {
// Re-add quotes stripped during lexing
p.word("\"");
p.word(&self.0);
p.word("\"");
}
}
Enum Variants
impl ToTokens for Value {
fn write(&self, p: &mut Printer) {
match self {
Value::String(s) => s.write(p),
Value::Integer(n) => n.write(p),
Value::True(t) => t.write(p),
Value::False(f) => f.write(p),
Value::Array(a) => a.write(p),
Value::InlineTable(t) => t.write(p),
}
}
}
Collections
impl ToTokens for Array {
fn write(&self, p: &mut Printer) {
self.lbracket.value.write(p);
for (i, item) in self.items.iter().enumerate() {
if i > 0 { p.word(", "); }
item.value.value.write(p);
}
self.rbracket.value.write(p);
}
}
Preserving Trivia
For exact round-trip, preserve comments and whitespace:
impl ToTokens for Trivia {
fn write(&self, p: &mut Printer) {
match self {
Trivia::Newline(_) => p.newline(),
Trivia::Comment(c) => {
p.token(&c.value.token());
}
}
}
}
Async Streaming
synkit supports incremental, asynchronous parsing for scenarios where data arrives in chunks:
- Network streams (HTTP, WebSocket, TCP)
- Large file processing
- Real-time log parsing
- Interactive editors
Architecture
┌─────────────┐ chunks ┌──────────────────┐
│ Source │ ──────────────► │ IncrementalLexer │
│ (network, │ │ (tokenizer) │
│ file, etc) │ └────────┬─────────┘
└─────────────┘ │
tokens│
▼
┌────────────────┐
│ IncrementalParse│
│ (parser) │
└────────┬───────┘
│
AST │
nodes ▼
┌────────────────┐
│ Consumer │
└────────────────┘
Key Traits
IncrementalLexer
Lex source text incrementally as chunks arrive:
pub trait IncrementalLexer: Sized {
type Token: Clone;
type Span: Clone;
type Spanned: Clone;
type Error: Display;
fn new() -> Self;
fn feed(&mut self, chunk: &str) -> Result<Vec<Self::Spanned>, Self::Error>;
fn finish(self) -> Result<Vec<Self::Spanned>, Self::Error>;
fn offset(&self) -> usize;
}
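A typical driving loop, sketched with a hypothetical Lexer type implementing this trait:
// Sketch: feeding chunks into an IncrementalLexer (Lexer is a hypothetical impl)
let mut lexer = Lexer::new();
let mut tokens = Vec::new();
for chunk in ["key = ", "42\n"] {
    tokens.extend(lexer.feed(chunk)?);   // complete tokens lexed so far
}
tokens.extend(lexer.finish()?);          // flush anything held at the final boundary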
IncrementalParse
Parse AST nodes incrementally from token buffers:
pub trait IncrementalParse: Sized {
type Token: Clone;
type Error: Display;
fn parse_incremental<S>(
tokens: &[S],
checkpoint: &ParseCheckpoint,
) -> Result<(Option<Self>, ParseCheckpoint), Self::Error>
where
S: AsRef<Self::Token>;
fn can_parse<S>(tokens: &[S], checkpoint: &ParseCheckpoint) -> bool
where
S: AsRef<Self::Token>;
}
ParseCheckpoint
Track parser state across incremental calls:
pub struct ParseCheckpoint {
pub cursor: usize, // Position in token buffer
pub tokens_consumed: usize, // Total tokens processed
pub state: u64, // Parser-specific state
}
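A sketch of how a caller threads the checkpoint between calls (Node, tokens, and process are hypothetical):
// Sketch: pulling AST nodes out of a growing token buffer
let mut checkpoint = ParseCheckpoint::default();
loop {
    let (node, next) = Node::parse_incremental(&tokens, &checkpoint)?;
    checkpoint = next;
    match node {
        Some(node) => process(node),   // a complete node was parsed
        None => break,                 // need more tokens before continuing
    }
}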
Feature Flags
Enable async streaming with feature flags:
# Tokio-based (channels, spawn)
synkit = { version = "0.1", features = ["tokio"] }
# Futures-based (runtime-agnostic Stream trait)
synkit = { version = "0.1", features = ["futures"] }
Tokio Integration
With the tokio feature, use channel-based streaming:
use synkit::async_stream::tokio_impl::{AsyncTokenStream, AstStream};
use tokio::sync::mpsc;
async fn parse_stream<L, T>(mut source_rx: mpsc::Receiver<String>)
where
L: IncrementalLexer,
T: IncrementalParse<Token = L::Token>,
{
let (token_tx, token_rx) = mpsc::channel(32);
let (ast_tx, mut ast_rx) = mpsc::channel(16);
// Lexer task
tokio::spawn(async move {
let mut lexer = AsyncTokenStream::<L>::new(token_tx);
while let Some(chunk) = source_rx.recv().await {
lexer.feed(&chunk).await?;
}
lexer.finish().await?;
});
// Parser task
tokio::spawn(async move {
let mut parser = AstStream::<T, L::Token>::new(token_rx, ast_tx);
parser.run().await?;
});
// Consume AST nodes
while let Some(node) = ast_rx.recv().await {
process(node);
}
}
Futures Integration
With the futures feature, use the Stream trait:
use synkit::async_stream::futures_impl::ParseStream;
use futures::{Stream, StreamExt};
async fn parse_tokens<S, T>(tokens: S)
where
S: Stream<Item = Token>,
T: IncrementalParse<Token = Token>,
{
let mut parse_stream: ParseStream<_, T, _> = ParseStream::new(tokens);
while let Some(result) = parse_stream.next().await {
match result {
Ok(node) => process(node),
Err(e) => handle_error(e),
}
}
}
Error Handling
The StreamError enum covers streaming-specific failures:
pub enum StreamError {
ChannelClosed, // Channel unexpectedly closed
LexError(String), // Lexer error
ParseError(String), // Parser error
IncompleteInput, // EOF with incomplete input
}
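Consumers usually just surface these to the caller; a sketch:
// Sketch: reporting streaming failures
match err {
    StreamError::ChannelClosed => eprintln!("producer task ended unexpectedly"),
    StreamError::LexError(msg) => eprintln!("lex error: {msg}"),
    StreamError::ParseError(msg) => eprintln!("parse error: {msg}"),
    StreamError::IncompleteInput => eprintln!("stream ended in the middle of a node"),
}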
Configuration
Customize buffer sizes and limits:
let config = StreamConfig {
token_buffer_size: 1024, // Token buffer capacity
ast_buffer_size: 64, // AST node buffer capacity
max_chunk_size: 64 * 1024, // Max input chunk size
};
let stream = AsyncTokenStream::with_config(tx, config);
Best Practices
- Return None when incomplete: If parse_incremental can't complete a node, return Ok((None, checkpoint)) rather than an error.
- Implement can_parse: This optimization prevents unnecessary parse attempts when tokens are clearly insufficient.
- Use checkpoints for backtracking: Store parser state in checkpoint.state for complex grammars.
- Handle IncompleteInput: At stream end, incomplete input may be valid (e.g., a truncated file) or an error, depending on your grammar.
- Buffer management: AstStream automatically compacts its buffer. For custom implementations, drain consumed tokens periodically (see the sketch below).
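A sketch of such periodic draining for a hand-rolled buffer (the threshold and variable names are illustrative):
// Sketch: compact a hand-rolled token buffer once enough tokens are consumed
const COMPACT_THRESHOLD: usize = 4096;
if checkpoint.cursor >= COMPACT_THRESHOLD {
    tokens.drain(..checkpoint.cursor); // drop tokens the parser has already consumed
    checkpoint.cursor = 0;             // positions are now relative to the new front
}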
Tutorial: TOML Parser
Build a complete TOML parser with round-trip printing using synkit.
Source Code
📦 Complete source: examples/toml-parser
What You’ll Build
A parser for a TOML subset supporting:
# Comment
key = "value"
number = 42
flag = true
[section]
nested = "data"
[section.subsection]
array = [1, 2, 3]
inline = { a = 1, b = 2 }
Source Code
The complete example lives in examples/toml-parser/. Each chapter references the actual code.
Chapters
- Project Setup - Dependencies, error type, parser_kit! invocation
- Defining Tokens - Token patterns and attributes
- AST Design - Node types with Spanned<T>
- Parse Implementations - Converting tokens to AST
- Round-trip Printing - ToTokens for output
- Visitors - Traversing the AST
- Testing - Parse and round-trip tests
Project Setup
Create the Project
cargo new toml-parser --lib
cd toml-parser
Dependencies
[package]
name = "toml-parser"
version = "0.1.0"
edition = "2024"
[dependencies]
synkit = "0.1"
thiserror = "2"
logos = "0.15"
Error Type
Define an error type that implements Default (required by logos):
#[derive(Error, Debug, Clone, Default, PartialEq)]
pub enum TomlError {
#[default]
#[error("unknown lexing error")]
Unknown,
#[error("expected {expect}, found {found}")]
Expected { expect: &'static str, found: String },
#[error("expected {expect}, found EOF")]
Empty { expect: &'static str },
#[error("unclosed string")]
UnclosedString,
#[error("{source}")]
Spanned {
#[source]
source: Box<TomlError>,
span: Span,
},
}
Key requirements:
- #[default] variant for unknown tokens
- Expected variant with expect and found fields
- Empty variant for EOF errors
- Spanned variant wrapping errors with location
parser_kit! Invocation
The macro generates all parsing infrastructure:
synkit::parser_kit! {
error: TomlError,
skip_tokens: [Space, Tab],
tokens: {
// Whitespace
#[token(" ", priority = 0)]
Space,
#[token("\t", priority = 0)]
Tab,
#[regex(r"\r?\n")]
#[fmt("newline")]
#[no_to_tokens]
Newline,
// Comments
#[regex(r"#[^\n]*", allow_greedy = true)]
#[fmt("comment")]
Comment,
// Punctuation
#[token("=")]
Eq,
#[token(".")]
Dot,
#[token(",")]
Comma,
#[token("[")]
LBracket,
#[token("]")]
RBracket,
#[token("{")]
LBrace,
#[token("}")]
RBrace,
// Keywords/literals
#[token("true")]
True,
#[token("false")]
False,
// Bare keys: alphanumeric, underscores, dashes
#[regex(r"[A-Za-z0-9_-]+", |lex| lex.slice().to_string(), priority = 1)]
#[fmt("bare key")]
#[derive(PartialOrd, Ord, Hash, Eq)]
BareKey(String),
// Basic strings (double-quoted) - needs custom ToTokens for quote handling
#[regex(r#""([^"\\]|\\.)*""#, |lex| {
let s = lex.slice();
// Remove surrounding quotes
s[1..s.len()-1].to_string()
})]
#[fmt("string")]
#[no_to_tokens]
BasicString(String),
// Integers
#[regex(r"-?[0-9]+", |lex| lex.slice().parse::<i64>().ok())]
#[fmt("integer")]
Integer(i64),
},
delimiters: {
Bracket => (LBracket, RBracket),
Brace => (LBrace, RBrace),
},
span_derives: [Debug, Clone, PartialEq, Eq, Hash, Copy],
token_derives: [Clone, PartialEq, Debug],
}
This generates:
- span module with Span, Spanned<T>
- tokens module with the Token enum and *Token structs
- stream module with TokenStream
- traits module with Parse, Peek, ToTokens
- delimiters module with Bracket, Brace
Error Helpers
Add convenience methods for error creation:
impl TomlError {
pub fn expected<D: Diagnostic>(found: &Token) -> Self {
Self::Expected {
expect: D::fmt(),
found: format!("{}", found),
}
}
pub fn empty<D: Diagnostic>() -> Self {
Self::Empty { expect: D::fmt() }
}
}
impl synkit::SpannedError for TomlError {
type Span = Span;
fn with_span(self, span: Span) -> Self {
Self::Spanned {
source: Box::new(self),
span,
}
}
fn span(&self) -> Option<&Span> {
match self {
Self::Spanned { span, .. } => Some(span),
_ => None,
}
}
}
Module Structure
// lib.rs
mod ast;
mod parse;
mod print;
mod visitor;
pub use ast::*;
pub use parse::*;
pub use visitor::*;
Verify Setup
cargo check
The macro should expand without errors. If you see errors about missing traits, ensure your error type has the required variants.
Defining Tokens
The tokens: block in parser_kit! defines your grammar’s lexical elements.
Token Categories
Whitespace Tokens
Skipped during parsing but tracked for round-trip:
// Skipped automatically
#[token(" ", priority = 0)]
Space,
#[token("\t", priority = 0)]
Tab,
// Not skipped - we track these for formatting
#[regex(r"\r?\n")]
#[fmt("newline")]
Newline,
Use skip_tokens: [Space, Tab] to mark tokens to skip.
Punctuation
Simple exact-match tokens:
#[token("=")]
Eq,
#[token(".")]
Dot,
#[token(",")]
Comma,
#[token("[")]
LBracket,
#[token("]")]
RBracket,
#[token("{")]
LBrace,
#[token("}")]
RBrace,
Keywords
Keywords need higher priority than identifiers:
#[token("true")]
True,
#[token("false")]
False,
Value Tokens
Tokens with captured data use callbacks:
// Bare keys: alphanumeric, underscores, dashes
#[regex(r"[A-Za-z0-9_-]+", |lex| lex.slice().to_string(), priority = 1)]
#[fmt("bare key")]
#[derive(PartialOrd, Ord, Hash, Eq)]
BareKey(String),
// Basic strings (double-quoted)
#[regex(r#""([^"\\]|\\.)*""#, |lex| {
let s = lex.slice();
s[1..s.len()-1].to_string() // Strip quotes
})]
#[fmt("string")]
BasicString(String),
// Integers
#[regex(r"-?[0-9]+", |lex| lex.slice().parse::<i64>().ok())]
#[fmt("integer")]
Integer(i64),
Comments
Track but don’t interpret:
#[regex(r"#[^\n]*")]
#[fmt("comment")]
Comment,
Generated Types
For each token, synkit generates:
| Token | Struct | Macro |
|---|---|---|
| Eq | EqToken | Tok![=] |
| Dot | DotToken | Tok![.] |
| BareKey(String) | BareKeyToken(String) | Tok![bare_key] |
| BasicString(String) | BasicStringToken(String) | Tok![basic_string] |
| Integer(i64) | IntegerToken(i64) | Tok![integer] |
Delimiters
Define delimiter pairs for extraction:
delimiters: {
Bracket => (LBracket, RBracket),
Brace => (LBrace, RBrace),
},
Generates Bracket and Brace structs with span information, plus bracket! and brace! macros.
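The generated macros mirror the bracket! example from the Concepts chapter; a sketch with brace! (the KeyValue parse is illustrative):
// Sketch: extracting inline-table contents with the generated brace! macro
let mut inner;
let brace = brace!(inner in stream);        // Brace carries the `{` / `}` spans
let kv: Spanned<KeyValue> = inner.parse()?; // parse from the extracted contents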
Priority Handling
When patterns overlap, use priority:
#[token("true", priority = 2)] // Higher wins
True,
#[regex(r"[A-Za-z]+", priority = 1)]
BareKey(String),
Input "true" matches True, not BareKey("true").
Derives
Control derives at different levels:
// For all tokens
token_derives: [Clone, PartialEq, Debug],
// For specific token
#[derive(Hash, Eq)] // Additional derives for BareKeyToken only
BareKey(String),
// For span types
span_derives: [Debug, Clone, PartialEq, Eq, Hash],
AST Design
Design AST nodes that preserve all information for round-trip formatting.
Design Principles
- Use Spanned<T> for all children - Enables error locations and source mapping
- Include punctuation tokens - Needed for exact round-trip
- Track trivia - Comments and newlines for formatting
Document Structure
/// The root of a TOML document.
/// Contains a sequence of items (key-value pairs or tables).
#[derive(Debug, Clone)]
pub struct Document {
pub items: Vec<DocumentItem>,
}
/// A single item in the document: either a top-level key-value or a table section.
#[derive(Debug, Clone)]
pub enum DocumentItem {
/// A blank line or comment
Trivia(Trivia),
/// A key = value pair at the top level
KeyValue(Spanned<KeyValue>),
/// A [table] section
Table(Spanned<Table>),
}
- Document is the root containing all items
- DocumentItem distinguishes top-level elements
- Trivia captures non-semantic content
Keys
/// A TOML key, which can be bare, quoted, or dotted.
#[derive(Debug, Clone)]
pub enum Key {
/// Bare key: `foo`
Bare(tokens::BareKeyToken),
/// Quoted key: `"foo.bar"`
Quoted(tokens::BasicStringToken),
/// Dotted key: `foo.bar.baz`
Dotted(DottedKey),
}
/// A dotted key like `server.host.name`
#[derive(Debug, Clone)]
pub struct DottedKey {
pub first: Spanned<SimpleKey>,
pub rest: Vec<(Spanned<tokens::DotToken>, Spanned<SimpleKey>)>,
}
/// A simple (non-dotted) key
#[derive(Debug, Clone)]
pub enum SimpleKey {
Bare(tokens::BareKeyToken),
Quoted(tokens::BasicStringToken),
}
- Key enum handles all key forms
- DottedKey preserves dot tokens for round-trip
- SimpleKey is the base case (bare or quoted)
Values
/// A TOML value.
#[derive(Debug, Clone)]
pub enum Value {
/// String value
String(tokens::BasicStringToken),
/// Integer value
Integer(tokens::IntegerToken),
/// Boolean true
True(tokens::TrueToken),
/// Boolean false
False(tokens::FalseToken),
/// Array value
Array(Array),
/// Inline table value
InlineTable(InlineTable),
}
Each variant stores its token type directly, preserving the original representation.
Key-Value Pairs
/// A key-value pair: `key = value`
#[derive(Debug, Clone)]
pub struct KeyValue {
pub key: Spanned<Key>,
pub eq: Spanned<tokens::EqToken>,
pub value: Spanned<Value>,
}
Note how eq stores the equals token—this enables formatting choices like key=value vs key = value.
Tables
/// A table section: `[section]` or `[section.subsection]`
#[derive(Debug, Clone)]
pub struct Table {
pub lbracket: Spanned<tokens::LBracketToken>,
pub name: Spanned<Key>,
pub rbracket: Spanned<tokens::RBracketToken>,
pub items: Vec<TableItem>,
}
/// An item within a table section.
#[derive(Debug, Clone)]
pub enum TableItem {
Trivia(Trivia),
KeyValue(Box<Spanned<KeyValue>>),
}
- Brackets stored explicitly for round-trip
- Items include trivia for blank lines/comments within table
Arrays
/// An array: `[1, 2, 3]`
#[derive(Debug, Clone)]
pub struct Array {
pub lbracket: Spanned<tokens::LBracketToken>,
pub items: Vec<ArrayItem>,
pub rbracket: Spanned<tokens::RBracketToken>,
}
/// An item in an array, including trailing trivia.
#[derive(Debug, Clone)]
pub struct ArrayItem {
pub value: Spanned<Value>,
pub comma: Option<Spanned<tokens::CommaToken>>,
}
ArrayItem includes optional trailing comma—essential for preserving:
[1, 2, 3] # No trailing comma
[1, 2, 3,] # With trailing comma
Inline Tables
/// An inline table: `{ key = value, ... }`
#[derive(Debug, Clone)]
pub struct InlineTable {
pub lbrace: Spanned<tokens::LBraceToken>,
pub items: Vec<InlineTableItem>,
pub rbrace: Spanned<tokens::RBraceToken>,
}
/// An item in an inline table.
#[derive(Debug, Clone)]
pub struct InlineTableItem {
pub kv: Spanned<KeyValue>,
pub comma: Option<Spanned<tokens::CommaToken>>,
}
Similar structure to arrays, with key-value pairs instead of values.
Why This Design?
Span Preservation
Every Spanned<T> carries source location:
let kv: Spanned<KeyValue> = stream.parse()?;
let key_span = &kv.value.key.span; // Location of key
let eq_span = &kv.value.eq.span; // Location of '='
let val_span = &kv.value.value.span; // Location of value
Round-trip Fidelity
Storing tokens enables exact reconstruction:
// Original: key = "value"
// After parse → print:
// key = "value" (identical)
Trivia Handling
Without trivia tracking:
# Comment lost!
key = value
With trivia in AST:
# Comment preserved
key = value
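The Trivia type referenced throughout this chapter is not shown above; a minimal sketch, consistent with how it is matched in the printing chapter (the variant payloads are assumed to be Spanned tokens):
/// Sketch: non-semantic content kept in the AST for round-tripping.
#[derive(Debug, Clone)]
pub enum Trivia {
    /// A line break
    Newline(Spanned<tokens::NewlineToken>),
    /// A `# ...` comment
    Comment(Spanned<tokens::CommentToken>),
}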
Parse Implementations
Convert token streams into AST nodes.
Basic Pattern
impl Parse for MyNode {
fn parse(stream: &mut TokenStream) -> Result<Self, TomlError> {
Ok(Self {
field1: stream.parse()?,
field2: stream.parse()?,
})
}
}
Implementing Peek
For types used in conditionals:
impl Peek for SimpleKey {
fn is(token: &Token) -> bool {
matches!(token, Token::BareKey(_) | Token::BasicString(_))
}
}
impl Parse for SimpleKey {
fn parse(stream: &mut TokenStream) -> Result<Self, TomlError> {
match stream.peek_token().map(|t| &t.value) {
Some(Token::BareKey(_)) => {
let tok: Spanned<tokens::BareKeyToken> = stream.parse()?;
Ok(SimpleKey::Bare(tok.value))
}
Some(Token::BasicString(_)) => {
let tok: Spanned<tokens::BasicStringToken> = stream.parse()?;
Ok(SimpleKey::Quoted(tok.value))
}
Some(other) => Err(TomlError::Expected {
expect: "key",
found: format!("{}", other),
}),
None => Err(TomlError::Empty { expect: "key" }),
}
}
}
Peek::is() checks a token variant; Peek::peek() checks the stream’s next token.
Parsing Keys
impl Peek for Key {
fn is(token: &Token) -> bool {
SimpleKey::is(token)
}
}
impl Parse for Key {
fn parse(stream: &mut TokenStream) -> Result<Self, TomlError> {
let first: Spanned<SimpleKey> = stream.parse()?;
// Check if this is a dotted key
if stream.peek::<tokens::DotToken>() {
let mut rest = Vec::new();
while stream.peek::<tokens::DotToken>() {
let dot: Spanned<tokens::DotToken> = stream.parse()?;
let key: Spanned<SimpleKey> = stream.parse()?;
rest.push((dot, key));
}
Ok(Key::Dotted(DottedKey { first, rest }))
} else {
// Single key
match first.value {
SimpleKey::Bare(tok) => Ok(Key::Bare(tok)),
SimpleKey::Quoted(tok) => Ok(Key::Quoted(tok)),
}
}
}
}
Parsing Values
Match on peeked token to determine variant:
impl Peek for Value {
fn is(token: &Token) -> bool {
matches!(
token,
Token::BasicString(_)
| Token::Integer(_)
| Token::True
| Token::False
| Token::LBracket
| Token::LBrace
)
}
}
impl Parse for Value {
fn parse(stream: &mut TokenStream) -> Result<Self, TomlError> {
match stream.peek_token().map(|t| &t.value) {
Some(Token::BasicString(_)) => {
let tok: Spanned<tokens::BasicStringToken> = stream.parse()?;
Ok(Value::String(tok.value))
}
Some(Token::Integer(_)) => {
let tok: Spanned<tokens::IntegerToken> = stream.parse()?;
Ok(Value::Integer(tok.value))
}
Some(Token::True) => {
let tok: Spanned<tokens::TrueToken> = stream.parse()?;
Ok(Value::True(tok.value))
}
Some(Token::False) => {
let tok: Spanned<tokens::FalseToken> = stream.parse()?;
Ok(Value::False(tok.value))
}
Some(Token::LBracket) => {
let arr = Array::parse(stream)?;
Ok(Value::Array(arr))
}
Some(Token::LBrace) => {
let tbl = InlineTable::parse(stream)?;
Ok(Value::InlineTable(tbl))
}
Some(other) => Err(TomlError::Expected {
expect: "value",
found: format!("{}", other),
}),
None => Err(TomlError::Empty { expect: "value" }),
}
}
}
Arrays with Delimiters
This implementation parses the [ and ] tokens explicitly so they can be reproduced on output:
impl Peek for Array {
fn is(token: &Token) -> bool {
matches!(token, Token::LBracket)
}
}
impl Parse for Array {
fn parse(stream: &mut TokenStream) -> Result<Self, TomlError> {
let lbracket: Spanned<tokens::LBracketToken> = stream.parse()?;
let mut items = Vec::new();
// Skip any leading newlines inside array
while peek_newline(stream) {
let _: Spanned<tokens::NewlineToken> = stream.parse()?;
}
// Parse array items
while stream.peek::<Value>() {
let value: Spanned<Value> = stream.parse()?;
// Skip newlines after value
while peek_newline(stream) {
let _: Spanned<tokens::NewlineToken> = stream.parse()?;
}
let comma = if stream.peek::<tokens::CommaToken>() {
let c: Spanned<tokens::CommaToken> = stream.parse()?;
// Skip newlines after comma
while peek_newline(stream) {
let _: Spanned<tokens::NewlineToken> = stream.parse()?;
}
Some(c)
} else {
None
};
items.push(ArrayItem { value, comma });
}
let rbracket: Spanned<tokens::RBracketToken> = stream.parse()?;
Ok(Array {
lbracket,
items,
rbracket,
})
}
}
Key points:
- The lbracket and rbracket tokens are stored on the Array so they round-trip exactly
- Newlines inside the array are consumed explicitly because only Space and Tab are skipped
- As an alternative, bracket!(inner in stream) extracts the content between [ and ], returns a Bracket struct with span information, and binds inner to a new TokenStream containing only the bracket contents
Inline Tables
Inline tables follow the same pattern, parsing the { and } tokens explicitly:
impl Peek for InlineTable {
fn is(token: &Token) -> bool {
matches!(token, Token::LBrace)
}
}
impl Parse for InlineTable {
fn parse(stream: &mut TokenStream) -> Result<Self, TomlError> {
let lbrace: Spanned<tokens::LBraceToken> = stream.parse()?;
let mut items = Vec::new();
// Parse inline table items
while stream.peek::<Key>() {
let kv: Spanned<KeyValue> = stream.parse()?;
let comma = if stream.peek::<tokens::CommaToken>() {
Some(stream.parse()?)
} else {
None
};
items.push(InlineTableItem { kv, comma });
}
let rbrace: Spanned<tokens::RBraceToken> = stream.parse()?;
Ok(InlineTable {
lbrace,
items,
rbrace,
})
}
}
Tables and Documents
impl Peek for Table {
fn is(token: &Token) -> bool {
matches!(token, Token::LBracket)
}
}
impl Parse for Table {
fn parse(stream: &mut TokenStream) -> Result<Self, TomlError> {
let lbracket: Spanned<tokens::LBracketToken> = stream.parse()?;
let name: Spanned<Key> = stream.parse()?;
let rbracket: Spanned<tokens::RBracketToken> = stream.parse()?;
let mut items = Vec::new();
// Consume trailing content on the header line
while Trivia::peek(stream) {
let trivia = Trivia::parse(stream)?;
items.push(TableItem::Trivia(trivia));
// Stop after we hit a newline
if matches!(items.last(), Some(TableItem::Trivia(Trivia::Newline(_)))) {
break;
}
}
// Parse table contents until we hit another table or EOF
loop {
// Check for trivia (newlines, comments)
if Trivia::peek(stream) {
let trivia = Trivia::parse(stream)?;
items.push(TableItem::Trivia(trivia));
continue;
}
// Check for key-value pair
if stream.peek::<Key>() {
// But first make sure this isn't a table header by checking for `[`
// This is tricky - we need to distinguish `[table]` from key-value
// Since Key::peek checks for bare keys and strings, and table headers
// start with `[`, we need to check `[` first in the document parser
let kv = Box::new(stream.parse()?);
items.push(TableItem::KeyValue(kv));
continue;
}
// Either EOF or another table section
break;
}
Ok(Table {
lbracket,
name,
rbracket,
items,
})
}
}
impl Peek for DocumentItem {
fn is(token: &Token) -> bool {
Trivia::is(token) || Key::is(token) || matches!(token, Token::LBracket)
}
}
impl Parse for Document {
fn parse(stream: &mut TokenStream) -> Result<Self, TomlError> {
let mut items = Vec::new();
loop {
// Check for trivia (newlines, comments)
if Trivia::peek(stream) {
let trivia = Trivia::parse(stream)?;
items.push(DocumentItem::Trivia(trivia));
continue;
}
// Check for table header `[name]`
if stream.peek::<tokens::LBracketToken>() {
let table: Spanned<Table> = stream.parse()?;
items.push(DocumentItem::Table(table));
continue;
}
// Check for key-value pair
if stream.peek::<Key>() {
let kv: Spanned<KeyValue> = stream.parse()?;
items.push(DocumentItem::KeyValue(kv));
continue;
}
// EOF or unknown token
if stream.is_empty() {
break;
}
// Unknown token - error
if let Some(tok) = stream.peek_token() {
return Err(TomlError::Expected {
expect: "key, table, or end of file",
found: format!("{}", tok.value),
});
}
break;
}
Ok(Document { items })
}
}
Error Handling
Expected Token Errors
Some(other) => Err(TomlError::Expected {
expect: "key",
found: format!("{}", other),
}),
EOF Errors
None => Err(TomlError::Empty { expect: "key" }),
Using Diagnostic
// Auto-generated for tokens
impl Diagnostic for BareKeyToken {
fn fmt() -> &'static str { "bare key" } // From #[fmt("bare key")]
}
// Use in errors
Err(TomlError::expected::<BareKeyToken>(found_token))
Parsing Tips
Use peek Before Consuming
if SimpleKey::peek(stream) {
// Safe to parse
let key: Spanned<SimpleKey> = stream.parse()?;
}
Fork for Lookahead
let mut fork = stream.fork();
if try_parse(&mut fork).is_ok() {
stream.advance_to(&fork);
}
Handle Optional Elements
// Option<T> auto-implements Parse if T implements Peek
let comma: Option<Spanned<CommaToken>> = stream.parse()?;
Raw Token Access
For tokens listed in skip_tokens (Space and Tab in this tutorial):
// Use peek_token_raw to see skipped tokens
fn peek_raw(stream: &TokenStream) -> Option<&Token> {
stream.peek_token_raw().map(|t| &t.value)
}
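The Array parser earlier calls a peek_newline helper that was not shown; a minimal sketch, assuming Newline stays out of skip_tokens as configured in this tutorial:
// Sketch: true when the next (non-skipped) token is a newline
fn peek_newline(stream: &TokenStream) -> bool {
    matches!(stream.peek_token().map(|t| &t.value), Some(Token::Newline))
}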
Round-trip Printing
Implement ToTokens to convert AST back to formatted output.
Basic Pattern
impl ToTokens for MyNode {
fn write(&self, p: &mut Printer) {
self.child1.value.write(p);
p.space();
self.child2.value.write(p);
}
}
Token Printing
// Custom ToTokens implementations for tokens that need special handling.
// Other token ToTokens are auto-generated by parser_kit!
impl ToTokens for tokens::BasicStringToken {
fn write(&self, p: &mut Printer) {
// BasicString stores content without quotes, so we add them back for round-trip
p.word("\"");
p.word(&self.0);
p.word("\"");
}
}
impl ToTokens for tokens::NewlineToken {
fn write(&self, p: &mut Printer) {
p.newline();
}
}
Note: BasicStringToken strips quotes during lexing, so we re-add them for output.
Trivia
Preserve newlines and comments:
impl ToTokens for Trivia {
fn write(&self, p: &mut Printer) {
match self {
Trivia::Newline(nl) => nl.value.write(p),
Trivia::Comment(c) => c.value.write(p),
}
}
}
Key-Value Pairs
impl ToTokens for KeyValue {
fn write(&self, p: &mut Printer) {
self.key.value.write(p);
p.space();
self.eq.value.write(p);
p.space();
self.value.value.write(p);
}
}
Spacing around = is a style choice—adjust as needed.
Arrays
Handle items with optional trailing commas:
impl ToTokens for ArrayItem {
fn write(&self, p: &mut Printer) {
self.value.value.write(p);
if let Some(comma) = &self.comma {
comma.value.write(p);
p.space();
}
}
}
impl ToTokens for Array {
fn write(&self, p: &mut Printer) {
self.lbracket.value.write(p);
for item in &self.items {
item.write(p);
}
self.rbracket.value.write(p);
}
}
Tables
impl ToTokens for TableItem {
fn write(&self, p: &mut Printer) {
match self {
TableItem::Trivia(trivia) => trivia.write(p),
TableItem::KeyValue(kv) => kv.value.write(p),
}
}
}
impl ToTokens for Table {
fn write(&self, p: &mut Printer) {
self.lbracket.value.write(p);
self.name.value.write(p);
self.rbracket.value.write(p);
for item in &self.items {
item.write(p);
}
}
}
Documents
impl ToTokens for DocumentItem {
fn write(&self, p: &mut Printer) {
match self {
DocumentItem::Trivia(trivia) => trivia.write(p),
DocumentItem::KeyValue(kv) => kv.value.write(p),
DocumentItem::Table(table) => table.value.write(p),
}
}
}
impl ToTokens for Document {
fn write(&self, p: &mut Printer) {
for item in &self.items {
item.write(p);
}
}
}
Using the Output
// Parse
let mut stream = TokenStream::lex(input)?;
let doc: Spanned<Document> = stream.parse()?;
// Print using trait method
let output = doc.value.to_string_formatted();
// Or manual printer
let mut printer = Printer::new();
doc.value.write(&mut printer);
let output = printer.finish();
Printer Methods Reference
| Method | Effect |
|---|---|
| word(s) | Append string |
| token(&tok) | Append token's display |
| space() | Single space |
| newline() | Line break |
| open_block() | Indent + newline |
| close_block() | Dedent + newline |
| indent() | Increase indent |
| dedent() | Decrease indent |
| write_separated(&items, sep) | Items with separator |
Formatting Choices
The ToTokens implementation defines your output format:
// Compact: key=value
self.key.value.write(p);
self.eq.value.write(p);
self.value.value.write(p);
// Spaced: key = value
self.key.value.write(p);
p.space();
self.eq.value.write(p);
p.space();
self.value.value.write(p);
For exact round-trip, store original spacing as trivia. For normalized output, apply consistent rules in write().
Visitors
The visitor pattern traverses AST nodes without modifying them.
Visitor Trait
/// Visitor trait for traversing TOML AST nodes.
///
/// Implement the `visit_*` methods you care about. Default implementations
/// call the corresponding `walk_*` methods to traverse children.
pub trait TomlVisitor {
fn visit_document(&mut self, doc: &Document) {
self.walk_document(doc);
}
fn visit_document_item(&mut self, item: &DocumentItem) {
self.walk_document_item(item);
}
fn visit_key_value(&mut self, kv: &KeyValue) {
self.walk_key_value(kv);
}
fn visit_key(&mut self, key: &Key) {
self.walk_key(key);
}
fn visit_simple_key(&mut self, key: &SimpleKey) {
let _ = key; // leaf node
}
fn visit_value(&mut self, value: &Value) {
self.walk_value(value);
}
fn visit_table(&mut self, table: &Table) {
self.walk_table(table);
}
fn visit_array(&mut self, array: &Array) {
self.walk_array(array);
}
fn visit_inline_table(&mut self, table: &InlineTable) {
self.walk_inline_table(table);
}
// Walk methods traverse child nodes
fn walk_document(&mut self, doc: &Document) {
for item in &doc.items {
self.visit_document_item(item);
}
}
fn walk_document_item(&mut self, item: &DocumentItem) {
match item {
DocumentItem::Trivia(_) => {}
DocumentItem::KeyValue(kv) => self.visit_key_value(&kv.value),
DocumentItem::Table(table) => self.visit_table(&table.value),
}
}
fn walk_key_value(&mut self, kv: &KeyValue) {
self.visit_key(&kv.key.value);
self.visit_value(&kv.value.value);
}
fn walk_key(&mut self, key: &Key) {
match key {
Key::Bare(tok) => self.visit_simple_key(&SimpleKey::Bare(tok.clone())),
Key::Quoted(tok) => self.visit_simple_key(&SimpleKey::Quoted(tok.clone())),
Key::Dotted(dotted) => {
self.visit_simple_key(&dotted.first.value);
for (_, k) in &dotted.rest {
self.visit_simple_key(&k.value);
}
}
}
}
fn walk_value(&mut self, value: &Value) {
match value {
Value::Array(arr) => self.visit_array(arr),
Value::InlineTable(tbl) => self.visit_inline_table(tbl),
_ => {}
}
}
fn walk_table(&mut self, table: &Table) {
self.visit_key(&table.name.value);
for item in &table.items {
match item {
TableItem::Trivia(_) => {}
TableItem::KeyValue(kv) => self.visit_key_value(&kv.value),
}
}
}
fn walk_array(&mut self, array: &Array) {
for item in &array.items {
self.visit_value(&item.value.value);
}
}
fn walk_inline_table(&mut self, table: &InlineTable) {
for item in &table.items {
self.visit_key_value(&item.kv.value);
}
}
}
Two method types:
- visit_*: Override to handle specific nodes; calls walk_* by default
- walk_*: Traverses children; typically not overridden
Example: Collecting Keys
/// Example visitor: collects all keys in the document.
pub struct KeyCollector {
pub keys: Vec<String>,
}
impl KeyCollector {
pub fn new() -> Self {
Self { keys: Vec::new() }
}
pub fn collect(doc: &Document) -> Vec<String> {
let mut collector = Self::new();
collector.visit_document(doc);
collector.keys
}
}
impl Default for KeyCollector {
fn default() -> Self {
Self::new()
}
}
impl TomlVisitor for KeyCollector {
fn visit_simple_key(&mut self, key: &SimpleKey) {
let name = match key {
SimpleKey::Bare(tok) => tok.0.clone(),
SimpleKey::Quoted(tok) => tok.0.clone(),
};
self.keys.push(name);
}
}
Usage:
let mut collector = KeyCollector::new();
collector.visit_document(&doc.value);
// collector.keys now contains all key names
Example: Counting Values
/// Example visitor: counts values by type.
#[derive(Default, Debug)]
pub struct ValueCounter {
pub strings: usize,
pub integers: usize,
pub booleans: usize,
pub arrays: usize,
pub inline_tables: usize,
}
impl ValueCounter {
pub fn new() -> Self {
Self::default()
}
pub fn count(doc: &Document) -> Self {
let mut counter = Self::new();
counter.visit_document(doc);
counter
}
}
impl TomlVisitor for ValueCounter {
fn visit_value(&mut self, value: &Value) {
match value {
Value::String(_) => self.strings += 1,
Value::Integer(_) => self.integers += 1,
Value::True(_) | Value::False(_) => self.booleans += 1,
Value::Array(arr) => {
self.arrays += 1;
self.visit_array(arr);
}
Value::InlineTable(tbl) => {
self.inline_tables += 1;
self.visit_inline_table(tbl);
}
}
}
}
Example: Finding Tables
/// Example visitor: finds all table names.
pub struct TableFinder {
pub tables: Vec<String>,
}
impl TableFinder {
pub fn new() -> Self {
Self { tables: Vec::new() }
}
pub fn find(doc: &Document) -> Vec<String> {
let mut finder = Self::new();
finder.visit_document(doc);
finder.tables
}
fn key_to_string(key: &Key) -> String {
match key {
Key::Bare(tok) => tok.0.clone(),
Key::Quoted(tok) => format!("\"{}\"", tok.0),
Key::Dotted(dotted) => {
let mut parts = vec![Self::simple_key_to_string(&dotted.first.value)];
for (_, k) in &dotted.rest {
parts.push(Self::simple_key_to_string(&k.value));
}
parts.join(".")
}
}
}
fn simple_key_to_string(key: &SimpleKey) -> String {
match key {
SimpleKey::Bare(tok) => tok.0.clone(),
SimpleKey::Quoted(tok) => format!("\"{}\"", tok.0),
}
}
}
impl Default for TableFinder {
fn default() -> Self {
Self::new()
}
}
impl TomlVisitor for TableFinder {
fn visit_table(&mut self, table: &Table) {
self.tables.push(Self::key_to_string(&table.name.value));
self.walk_table(table);
}
}
Visitor vs Direct Traversal
Visitor pattern when:
- Multiple traversal operations needed
- Want to separate traversal from logic
- Building analysis tools
Direct recursion when:
- One-off transformation
- Simple structure
- Need mutation
Transforming Visitors
For mutation, use a mutable visitor or return new nodes:
pub trait TomlTransform {
fn transform_value(&mut self, value: Value) -> Value {
self.walk_value(value)
}
fn walk_value(&mut self, value: Value) -> Value {
match value {
Value::Array(arr) => Value::Array(self.transform_array(arr)),
other => other,
}
}
// ...
}
Visitor Tips
Selective Traversal
Override visit_* to stop descent:
fn visit_inline_table(&mut self, _table: &InlineTable) {
// Don't call walk_inline_table - skip inline table contents
}
Accumulating Results
Use struct fields:
struct Stats {
tables: usize,
keys: usize,
values: usize,
}
impl TomlVisitor for Stats {
fn visit_table(&mut self, table: &Table) {
self.tables += 1;
self.walk_table(table);
}
// ...
}
Context Tracking
Track path during traversal:
struct PathTracker {
path: Vec<String>,
paths: Vec<String>,
}
impl TomlVisitor for PathTracker {
fn visit_table(&mut self, table: &Table) {
self.path.push(table_name(table));
self.paths.push(self.path.join("."));
self.walk_table(table);
self.path.pop();
}
}
Testing
Verify parsing correctness and round-trip fidelity.
Parse Tests
Test that parsing produces expected AST:
#[test]
fn test_simple_key_value() {
let mut stream = TokenStream::lex("key = \"value\"").unwrap();
let kv: Spanned<KeyValue> = stream.parse().unwrap();
match &kv.value.key.value {
Key::Bare(tok) => assert_eq!(&**tok, "key"),
_ => panic!("expected bare key"),
}
match &kv.value.value.value {
Value::String(tok) => assert_eq!(&**tok, "value"),
_ => panic!("expected string value"),
}
}
Round-trip Tests
Verify parse → print produces equivalent output:
fn roundtrip(input: &str) -> String {
let mut stream = TokenStream::lex(input).unwrap();
let doc: Spanned<Document> = stream.parse().unwrap();
doc.value.to_string_formatted()
}
#[test]
fn test_roundtrip_simple() {
let input = "key = \"value\"";
assert_eq!(roundtrip(input), input);
}
#[test]
fn test_roundtrip_table() {
let input = "[section]\nkey = 42";
assert_eq!(roundtrip(input), input);
}
Snapshot Testing with insta
For complex outputs, use snapshot testing:
use insta::assert_yaml_snapshot;
#[test]
fn snapshot_complex_document() {
let input = r#"
# Header comment
title = "Example"
[server]
host = "localhost"
port = 8080
"#.trim();
let mut stream = TokenStream::lex(input).unwrap();
let doc: Spanned<Document> = stream.parse().unwrap();
let output = doc.value.to_string_formatted();
assert_yaml_snapshot!(output);
}
Run cargo insta test to review and accept snapshots.
Error Tests
Verify error handling:
#[test]
fn test_error_missing_value() {
let mut stream = TokenStream::lex("key =").unwrap();
let result: Result<Spanned<KeyValue>, _> = stream.parse();
assert!(result.is_err());
}
#[test]
fn test_error_invalid_token() {
let result = TokenStream::lex("@invalid");
assert!(result.is_err());
}
Visitor Tests
#[test]
fn test_key_collector() {
let input = "a = 1\nb = 2\n[section]\nc = 3";
let mut stream = TokenStream::lex(input).unwrap();
let doc: Spanned<Document> = stream.parse().unwrap();
let mut collector = KeyCollector::new();
collector.visit_document(&doc.value);
assert_eq!(collector.keys, vec!["a", "b", "c"]);
}
Test Organization
tests/
├── parse_test.rs # Parse correctness
├── roundtrip_test.rs # Round-trip fidelity
└── visitor_test.rs # Visitor behavior
Testing Tips
Test Edge Cases
#[test] fn test_empty_document() { /* ... */ }
#[test] fn test_trailing_comma() { /* ... */ }
#[test] fn test_nested_tables() { /* ... */ }
#[test] fn test_unicode_strings() { /* ... */ }
Property-Based Testing
With proptest:
use proptest::prelude::*;
proptest! {
#[test]
fn roundtrip_integers(n: i64) {
let input = format!("x = {}", n);
let output = roundtrip(&input);
assert_eq!(input, output);
}
}
Debug Output
#[test]
fn debug_parse() {
let mut stream = TokenStream::lex("key = [1, 2]").unwrap();
let doc: Spanned<Document> = stream.parse().unwrap();
// AST structure
dbg!(&doc);
// Formatted output
println!("{}", doc.value.to_string_formatted());
}
Running Tests
# All tests
cargo test
# Specific test file
cargo test --test parse_test
# Update snapshots
cargo insta test --accept
Incremental Parsing
This chapter demonstrates how to add incremental parsing support to the TOML parser for streaming scenarios.
Overview
Incremental parsing allows processing TOML data as it arrives in chunks, useful for:
- Parsing large configuration files without loading entirely into memory
- Processing TOML streams from network connections
- Real-time parsing in editors
Implementing IncrementalLexer
First, wrap the logos lexer with incremental capabilities:
use synkit::async_stream::IncrementalLexer;
pub struct TomlIncrementalLexer {
buffer: String,
offset: usize,
pending_tokens: Vec<Spanned<Token>>,
}
impl IncrementalLexer for TomlIncrementalLexer {
type Token = Token;
type Span = Span;
type Spanned = Spanned<Token>;
type Error = TomlError;
fn new() -> Self {
Self {
buffer: String::new(),
offset: 0,
pending_tokens: Vec::new(),
}
}
fn feed(&mut self, chunk: &str) -> Result<Vec<Self::Spanned>, Self::Error> {
use logos::Logos;
self.buffer.push_str(chunk);
let mut tokens = Vec::new();
let mut lexer = Token::lexer(&self.buffer);
while let Some(result) = lexer.next() {
let span = lexer.span();
let token = result.map_err(|_| TomlError::Unknown)?;
tokens.push(Spanned {
value: token,
span: Span::new(self.offset + span.start, self.offset + span.end),
});
}
// Handle chunk boundaries - hold back potentially incomplete tokens
let emit_count = if self.buffer.ends_with('\n') {
tokens.len()
} else {
tokens.len().saturating_sub(1)
};
let to_emit: Vec<_> = tokens.drain(..emit_count).collect();
self.pending_tokens = tokens;
if let Some(last) = to_emit.last() {
let consumed = last.span.end() - self.offset;
self.buffer.drain(..consumed);
self.offset = last.span.end();
}
Ok(to_emit)
}
fn finish(mut self) -> Result<Vec<Self::Spanned>, Self::Error> {
// Process remaining buffer
if !self.buffer.is_empty() {
use logos::Logos;
let mut lexer = Token::lexer(&self.buffer);
while let Some(result) = lexer.next() {
let span = lexer.span();
let token = result.map_err(|_| TomlError::Unknown)?;
self.pending_tokens.push(Spanned {
value: token,
span: Span::new(self.offset + span.start, self.offset + span.end),
});
}
}
Ok(self.pending_tokens)
}
fn offset(&self) -> usize {
self.offset
}
}
Implementing IncrementalParse
Define an incremental document item that emits as soon as parseable:
use synkit::async_stream::{IncrementalParse, ParseCheckpoint};
#[derive(Debug, Clone)]
pub enum IncrementalDocumentItem {
Trivia(Trivia),
KeyValue(Spanned<KeyValue>),
TableHeader {
lbracket: Spanned<tokens::LBracketToken>,
name: Spanned<Key>,
rbracket: Spanned<tokens::RBracketToken>,
},
}
impl IncrementalParse for IncrementalDocumentItem {
type Token = Token;
type Error = TomlError;
fn parse_incremental<S>(
tokens: &[S],
checkpoint: &ParseCheckpoint,
) -> Result<(Option<Self>, ParseCheckpoint), Self::Error>
where
S: AsRef<Self::Token>,
{
let cursor = checkpoint.cursor;
if cursor >= tokens.len() {
return Ok((None, checkpoint.clone()));
}
let token = tokens[cursor].as_ref();
match token {
// Newline trivia - emit immediately
Token::Newline => {
let item = IncrementalDocumentItem::Trivia(/* ... */);
let new_cp = ParseCheckpoint {
cursor: cursor + 1,
tokens_consumed: checkpoint.tokens_consumed + 1,
state: 0,
};
Ok((Some(item), new_cp))
}
// Table header: need [, name, ]
Token::LBracket => {
if cursor + 2 >= tokens.len() {
// Need more tokens
return Ok((None, checkpoint.clone()));
}
// Parse [name] and emit TableHeader
// ...
}
// Key-value: need key, =, value
Token::BareKey(_) | Token::BasicString(_) => {
if cursor + 2 >= tokens.len() {
return Ok((None, checkpoint.clone()));
}
// Parse key = value and emit KeyValue
// ...
}
// Skip whitespace
Token::Space | Token::Tab => {
let new_cp = ParseCheckpoint {
cursor: cursor + 1,
tokens_consumed: checkpoint.tokens_consumed + 1,
state: checkpoint.state,
};
Self::parse_incremental(tokens, &new_cp)
}
_ => Err(TomlError::Expected {
expect: "key, table header, or trivia",
found: format!("{:?}", token),
}),
}
}
fn can_parse<S>(tokens: &[S], checkpoint: &ParseCheckpoint) -> bool
where
S: AsRef<Self::Token>,
{
checkpoint.cursor < tokens.len()
}
}
Using with Tokio
Stream TOML parsing with tokio channels:
use synkit::async_stream::tokio_impl::AstStream;
use tokio::sync::mpsc;
#[tokio::main]
async fn main() {
let (source_tx, mut source_rx) = mpsc::channel::<String>(8);
let (token_tx, token_rx) = mpsc::channel(32);
let (ast_tx, mut ast_rx) = mpsc::channel(16);
// Lexer task
tokio::spawn(async move {
let mut lexer = TomlIncrementalLexer::new();
while let Some(chunk) = source_rx.recv().await {
for token in lexer.feed(&chunk).unwrap() {
token_tx.send(token).await.unwrap();
}
}
for token in lexer.finish().unwrap() {
token_tx.send(token).await.unwrap();
}
});
// Parser task
tokio::spawn(async move {
let mut parser = AstStream::<IncrementalDocumentItem, Spanned<Token>>::new(
token_rx,
ast_tx
);
parser.run().await.unwrap();
});
// Feed source chunks
source_tx.send("[server]\n".to_string()).await.unwrap();
source_tx.send("host = \"localhost\"\n".to_string()).await.unwrap();
source_tx.send("port = 8080\n".to_string()).await.unwrap();
drop(source_tx);
// Process items as they arrive
while let Some(item) = ast_rx.recv().await {
match item {
IncrementalDocumentItem::TableHeader { name, .. } => {
println!("Found table: {:?}", name);
}
IncrementalDocumentItem::KeyValue(kv) => {
println!("Found key-value: {:?}", kv.value.key);
}
IncrementalDocumentItem::Trivia(_) => {}
}
}
}
Testing Incremental Parsing
Test with various chunk boundaries:
#[test]
fn test_incremental_lexer_chunked() {
let mut lexer = TomlIncrementalLexer::new();
// Split across chunk boundary
let t1 = lexer.feed("ke").unwrap();
let t2 = lexer.feed("y = ").unwrap();
let t3 = lexer.feed("42\n").unwrap();
let remaining = lexer.finish().unwrap();
let total = t1.len() + t2.len() + t3.len() + remaining.len();
// Should produce: key, =, 42, newline
assert!(total >= 4);
}
#[test]
fn test_incremental_parse_needs_more() {
let tokens = vec![
Spanned { value: Token::BareKey("name".into()), span: Span::new(0, 4) },
Spanned { value: Token::Eq, span: Span::new(5, 6) },
// Missing value!
];
let checkpoint = ParseCheckpoint::default();
let (result, _) = IncrementalDocumentItem::parse_incremental(&tokens, &checkpoint).unwrap();
// Should return None, not error
assert!(result.is_none());
}
Summary
Key points for incremental parsing:
- Buffer management: Hold back tokens at chunk boundaries that might be incomplete
- Return None for incomplete: Don’t error when more tokens are needed
- Track offset: Maintain byte offset across chunks for correct spans
- Emit early: Emit AST nodes as soon as they’re complete
- Test boundaries: Test parsing with data split at various points
Tutorial: JSONL Incremental Parser
Build a high-performance streaming JSON Lines parser using synkit’s incremental parsing infrastructure.
Source Code
📦 Complete source: examples/jsonl-parser
What You’ll Learn
- ChunkBoundary - Define where to split token streams
- IncrementalLexer - Buffer partial input, emit complete tokens
- IncrementalParse - Parse from token buffers with checkpoints
- Async streaming - tokio and futures integration
- Stress testing - Validate memory stability under load
JSON Lines Format
JSON Lines uses newline-delimited JSON:
{"user": "alice", "action": "login"}
{"user": "bob", "action": "purchase", "amount": 42.50}
{"user": "alice", "action": "logout"}
Each line is a complete JSON value. This makes JSONL ideal for:
- Log processing
- Event streams
- Large dataset processing
- Network protocols
Why Incremental Parsing?
Traditional parsing loads entire input into memory:
let input = fs::read_to_string("10gb_logs.jsonl")?; // ❌ OOM
let docs: Vec<Log> = parse(&input)?;
Incremental parsing processes chunks:
let mut lexer = JsonIncrementalLexer::new();
while let Some(chunk) = reader.read_chunk().await {
for token in lexer.feed(&chunk)? {
// Process tokens as they arrive
}
}
Prerequisites
- Completed the TOML Parser Tutorial (or familiarity with synkit basics)
- Understanding of async Rust (for chapters 5-6)
Chapters
| Chapter | Topic | Key Concepts |
|---|---|---|
| 1. Token Definition | Token enum and parser_kit! | logos patterns, #[no_to_tokens] |
| 2. Chunk Boundaries | ChunkBoundary trait | depth tracking, boundary detection |
| 3. Incremental Lexer | IncrementalLexer trait | buffering, offset tracking |
| 4. Incremental Parse | IncrementalParse trait | checkpoints, partial results |
| 5. Async Streaming | tokio/futures integration | channels, backpressure |
| 6. Stress Testing | Memory stability | 1M+ events, leak detection |
Token Definition
📦 Source: examples/jsonl-parser/src/lib.rs
Error Type
Define a parser error type with thiserror:
use thiserror::Error;
#[derive(Error, Debug, Clone, Default, PartialEq)]
pub enum JsonError {
#[default]
#[error("unknown lexing error")]
Unknown,
#[error("expected {expect}, found {found}")]
Expected { expect: &'static str, found: String },
#[error("expected {expect}, found EOF")]
Empty { expect: &'static str },
#[error("invalid number: {0}")]
InvalidNumber(String),
#[error("invalid escape sequence")]
InvalidEscape,
#[error("{source}")]
Spanned {
#[source]
source: Box<JsonError>,
span: Span,
},
}
Token Definition
Use parser_kit! to define JSON tokens:
synkit::parser_kit! {
error: JsonError,
skip_tokens: [Space, Tab],
tokens: {
// Whitespace (auto-skipped during parsing)
#[token(" ", priority = 0)]
Space,
#[token("\t", priority = 0)]
Tab,
// Newline is significant - it's our record delimiter
#[regex(r"\r?\n")]
#[fmt("newline")]
#[no_to_tokens] // Custom ToTokens impl
Newline,
// Structural tokens
#[token("{")]
LBrace,
#[token("}")]
RBrace,
#[token("[")]
LBracket,
#[token("]")]
RBracket,
#[token(":")]
Colon,
#[token(",")]
Comma,
// Literals
#[token("null")]
Null,
#[token("true")]
True,
#[token("false")]
False,
// Strings with escape sequences
#[regex(r#""([^"\\]|\\.)*""#, |lex| {
let s = lex.slice();
s[1..s.len()-1].to_string() // Strip quotes
})]
#[fmt("string")]
#[no_to_tokens]
String(String),
// JSON numbers (integers and floats)
#[regex(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?",
|lex| lex.slice().to_string())]
#[fmt("number")]
Number(String),
},
delimiters: {
Brace => (LBrace, RBrace),
Bracket => (LBracket, RBracket),
},
}
Key Points
#[no_to_tokens] for Custom Printing
Some tokens need custom ToTokens implementations:
impl traits::ToTokens for tokens::StringToken {
fn write(&self, p: &mut printer::Printer) {
use synkit::Printer as _;
p.word("\"");
for c in self.0.chars() {
match c {
'"' => p.word("\\\""),
'\\' => p.word("\\\\"),
'\n' => p.word("\\n"),
'\r' => p.word("\\r"),
'\t' => p.word("\\t"),
c => p.char(c),
}
}
p.word("\"");
}
}
Newline as Boundary
Unlike whitespace, Newline is semantically significant in JSONL - it separates records. Keep it in the token stream but handle it specially in parsing.
Next
Chunk Boundaries
📦 Source: examples/jsonl-parser/src/incremental.rs
The ChunkBoundary trait defines where token streams can be safely split for incremental parsing.
The Problem
When processing streaming input, we need to know when we have enough tokens to parse a complete unit. For JSONL, a “complete unit” is a single JSON line ending with a newline.
But we can’t just split on any newline - consider:
{"message": "hello\nworld"}
The \n inside the string is NOT a record boundary.
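A quick check, assuming the Token and TokenStream generated in the previous chapter: the escaped \n stays inside the String token, so it never surfaces as a Newline token and a token-level boundary scan cannot fire inside the string.
let mut stream = TokenStream::lex(r#"{"message": "hello\nworld"}"#).unwrap();
let mut saw_newline = false;
while let Some(tok) = stream.next() {
    // The escape is part of the String token's value, not a separate token.
    saw_newline |= matches!(tok.value, Token::Newline);
}
assert!(!saw_newline);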
ChunkBoundary Trait
pub trait ChunkBoundary {
type Token;
/// Is this token a potential boundary?
fn is_boundary_token(token: &Self::Token) -> bool;
/// Depth change: +1 for openers, -1 for closers
fn depth_delta(token: &Self::Token) -> i32 { 0 }
/// Should this token be skipped when scanning?
fn is_ignorable(token: &Self::Token) -> bool { false }
/// Find next boundary at depth 0
fn find_boundary<S: AsRef<Self::Token>>(
tokens: &[S],
start: usize
) -> Option<usize>;
}
JSONL Implementation
impl ChunkBoundary for JsonLine {
type Token = Token;
#[inline]
fn is_boundary_token(token: &Token) -> bool {
matches!(token, Token::Newline)
}
#[inline]
fn depth_delta(token: &Token) -> i32 {
match token {
Token::LBrace | Token::LBracket => 1, // Opens nesting
Token::RBrace | Token::RBracket => -1, // Closes nesting
_ => 0,
}
}
#[inline]
fn is_ignorable(token: &Token) -> bool {
matches!(token, Token::Space | Token::Tab)
}
}
How It Works
The default find_boundary implementation:
- Starts at depth = 0
- For each token:
  - Adds depth_delta() to depth
  - If depth == 0 and is_boundary_token(): return position + 1
- Returns None if no boundary found (a sketch of this default follows the example below)
Example Token Stream
Tokens:  {    "a"  :    1    }    \n   {    "b"  :    2    }    \n
Depth:   1    1    1    1    0    0    1    1    1    1    0    0
         ^                        ^                             ^
         open                     boundary                      boundary
The first \n at index 5 (after }) is a valid boundary because depth is 0.
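Putting that algorithm into code, a minimal sketch written against the JsonLine impl above (the library's provided default may differ in detail; the function name here is only illustrative):
fn find_json_boundary<S: AsRef<Token>>(tokens: &[S], start: usize) -> Option<usize> {
    let mut depth: i32 = 0;
    for (i, tok) in tokens.iter().enumerate().skip(start) {
        let tok = tok.as_ref();
        if JsonLine::is_ignorable(tok) {
            continue; // whitespace never affects depth or boundaries
        }
        depth += JsonLine::depth_delta(tok);
        if depth == 0 && JsonLine::is_boundary_token(tok) {
            return Some(i + 1); // exclusive: the chunk ends just after this token
        }
    }
    None // no complete record buffered yet
}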
Finding Boundaries
Use find_boundary to locate complete chunks:
let tokens: Vec<Spanned<Token>> = /* from lexer */;
let mut start = 0;
while let Some(end) = JsonLine::find_boundary(&tokens, start) {
let chunk = &tokens[start..end];
let line = parse_json_line(chunk)?;
process(line);
start = end;
}
// tokens[start..] contains incomplete data - wait for more
Design Considerations
Delimiter Matching
For formats with paired delimiters (JSON, TOML, XML), track nesting depth. A boundary is only valid when all delimiters are balanced.
String Literals
Newlines inside strings don’t affect depth because the lexer treats the entire string as one token. The ChunkBoundary operates on tokens, not characters.
Multiple Boundary Types
Some formats have multiple boundary types. For TOML:
fn is_boundary_token(token: &Token) -> bool {
matches!(token, Token::Newline | Token::TableHeader)
}
Next
Chapter 3: Incremental Lexer →
Incremental Lexer
📦 Source: examples/jsonl-parser/src/incremental.rs
The IncrementalLexer trait enables lexing input that arrives in chunks.
The Problem
Network data arrives in arbitrary chunks:
Chunk 1: {"name": "ali
Chunk 2: ce"}\n{"nam
Chunk 3: e": "bob"}\n
We need to:
- Buffer incomplete tokens across chunks
- Emit complete tokens as soon as available
- Track source positions across all chunks
IncrementalLexer Trait
pub trait IncrementalLexer: Sized {
type Token: Clone;
type Span: Clone;
type Spanned: Clone;
type Error: fmt::Display;
/// Create with default capacity
fn new() -> Self;
/// Create with capacity hints for pre-allocation
fn with_capacity_hint(hint: LexerCapacityHint) -> Self;
/// Feed a chunk, return complete tokens
fn feed(&mut self, chunk: &str) -> Result<Vec<Self::Spanned>, Self::Error>;
/// Feed into existing buffer (avoids allocation)
fn feed_into(
&mut self,
chunk: &str,
buffer: &mut Vec<Self::Spanned>
) -> Result<usize, Self::Error>;
/// Finish and return remaining tokens
fn finish(self) -> Result<Vec<Self::Spanned>, Self::Error>;
/// Current byte offset
fn offset(&self) -> usize;
}
JSONL Implementation
pub struct JsonIncrementalLexer {
buffer: String, // Accumulated input
offset: usize, // Total bytes processed
token_hint: usize, // Capacity hint
}
impl IncrementalLexer for JsonIncrementalLexer {
type Token = Token;
type Span = Span;
type Spanned = Spanned<Token>;
type Error = JsonError;
fn new() -> Self {
Self {
buffer: String::new(),
offset: 0,
token_hint: 64,
}
}
fn with_capacity_hint(hint: LexerCapacityHint) -> Self {
Self {
buffer: String::with_capacity(hint.buffer_capacity),
offset: 0,
token_hint: hint.tokens_per_chunk,
}
}
fn feed(&mut self, chunk: &str) -> Result<Vec<Self::Spanned>, Self::Error> {
self.buffer.push_str(chunk);
self.lex_complete_lines()
}
fn finish(self) -> Result<Vec<Self::Spanned>, Self::Error> {
if self.buffer.is_empty() {
return Ok(Vec::new());
}
// Lex remaining buffer
self.lex_buffer(&self.buffer)
}
fn offset(&self) -> usize {
self.offset
}
}
Key Implementation: lex_complete_lines
fn lex_complete_lines(&mut self) -> Result<Vec<Spanned<Token>>, JsonError> {
use logos::Logos;
// Find last newline - only lex complete lines
let split_pos = self.buffer.rfind('\n').map(|p| p + 1);
let (to_lex, remainder) = match split_pos {
Some(pos) if pos < self.buffer.len() => {
// Have remainder after newline
let (prefix, suffix) = self.buffer.split_at(pos);
(prefix.to_string(), suffix.to_string())
}
Some(pos) if pos == self.buffer.len() => {
// Newline at end, no remainder
(std::mem::take(&mut self.buffer), String::new())
}
_ => return Ok(Vec::new()), // No complete lines yet
};
// Lex the complete portion
let mut tokens = Vec::with_capacity(self.token_hint);
let mut lexer = Token::lexer(&to_lex);
while let Some(result) = lexer.next() {
let token = result.map_err(|_| JsonError::Unknown)?;
let span = lexer.span();
tokens.push(Spanned {
value: token,
// Adjust span by global offset
span: Span::new(
self.offset + span.start,
self.offset + span.end
),
});
}
// Update state
self.offset += to_lex.len();
self.buffer = remainder;
Ok(tokens)
}
Capacity Hints
Pre-allocate buffers based on expected input:
// Small: <1KB inputs
let hint = LexerCapacityHint::small();
// Medium: 1KB-64KB (default)
let hint = LexerCapacityHint::medium();
// Large: >64KB
let hint = LexerCapacityHint::large();
// Custom: from expected chunk size
let hint = LexerCapacityHint::from_chunk_size(4096);
let lexer = JsonIncrementalLexer::with_capacity_hint(hint);
Using feed_into for Zero-Copy
Avoid repeated allocations with feed_into:
let mut lexer = JsonIncrementalLexer::new();
let mut token_buffer = Vec::with_capacity(1024);
while let Some(chunk) = source.next_chunk().await {
let added = lexer.feed_into(&chunk, &mut token_buffer)?;
println!("Added {} tokens", added);
// Process and drain tokens...
}
Span Tracking
All spans are global - they reference positions in the complete input:
Chunk 1 (offset 0): {"a":1}\n
Spans: 0-1, 1-4, 4-5, 5-6, 6-7, 7-8
Chunk 2 (offset 8): {"b":2}\n
Spans: 8-9, 9-12, 12-13, 13-14, 14-15, 15-16
^
offset added
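A quick way to verify this, assuming the JsonIncrementalLexer above and the default span derives (PartialEq):
let mut lexer = JsonIncrementalLexer::new();
let chunk1 = lexer.feed("{\"a\":1}\n").unwrap();
let chunk2 = lexer.feed("{\"b\":2}\n").unwrap();
// The first token of each chunk is '{'; the second chunk's span carries
// the 8-byte offset accumulated from chunk 1.
assert_eq!(chunk1[0].span, Span::new(0, 1));
assert_eq!(chunk2[0].span, Span::new(8, 9));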
Next
Chapter 4: Incremental Parse →
Incremental Parse
📦 Source: examples/jsonl-parser/src/incremental.rs
The IncrementalParse trait enables parsing from a growing token buffer.
IncrementalParse Trait
pub trait IncrementalParse: Sized {
type Token: Clone;
type Error: fmt::Display;
/// Attempt to parse from tokens starting at checkpoint
///
/// Returns:
/// - `Ok((Some(node), new_checkpoint))` - Parsed successfully
/// - `Ok((None, checkpoint))` - Need more tokens
/// - `Err(error)` - Unrecoverable error
fn parse_incremental<S>(
tokens: &[S],
checkpoint: &ParseCheckpoint,
) -> Result<(Option<Self>, ParseCheckpoint), Self::Error>
where
S: AsRef<Self::Token>;
/// Check if parsing might succeed with current tokens
fn can_parse<S>(tokens: &[S], checkpoint: &ParseCheckpoint) -> bool
where
S: AsRef<Self::Token>;
}
ParseCheckpoint
Track parser state between parse attempts:
#[derive(Debug, Clone, Default)]
pub struct ParseCheckpoint {
/// Position in token buffer
pub cursor: usize,
/// Tokens consumed (for buffer compaction)
pub tokens_consumed: usize,
/// Custom state (e.g., nesting depth)
pub state: u64,
}
JSONL Implementation Strategy
Rather than re-implementing parsing logic, we reuse the standard Parse trait:
impl IncrementalParse for JsonLine {
type Token = Token;
type Error = JsonError;
fn parse_incremental<S>(
tokens: &[S],
checkpoint: &ParseCheckpoint,
) -> Result<(Option<Self>, ParseCheckpoint), Self::Error>
where
S: AsRef<Self::Token>,
{
// 1. Find chunk boundary
let boundary = match Self::find_boundary(tokens, checkpoint.cursor) {
Some(b) => b,
None => return Ok((None, checkpoint.clone())), // Need more
};
// 2. Extract chunk tokens
let chunk = &tokens[checkpoint.cursor..boundary];
// 3. Build TokenStream from chunk
let stream_tokens: Vec<_> = chunk.iter()
.map(|s| /* convert to SpannedToken */)
.collect();
let mut stream = TokenStream::from_tokens(/* ... */);
// 4. Use standard Parse implementation
let line = JsonLine::parse(&mut stream)?;
// 5. Return with updated checkpoint
let consumed = boundary - checkpoint.cursor;
Ok((
Some(line),
ParseCheckpoint {
cursor: boundary,
tokens_consumed: checkpoint.tokens_consumed + consumed,
state: 0,
},
))
}
fn can_parse<S>(tokens: &[S], checkpoint: &ParseCheckpoint) -> bool
where
S: AsRef<Self::Token>,
{
// Can parse if there's a complete chunk
Self::find_boundary(tokens, checkpoint.cursor).is_some()
}
}
Key Design: Reuse Parse Trait
The incremental parser delegates to the standard Parse implementation. This ensures:
- Consistency - Same parsing logic for sync and async
- Maintainability - One parser implementation to update
- Testing - Sync tests validate incremental behavior
Using IncrementalBuffer
The IncrementalBuffer helper manages tokens efficiently:
use synkit::async_stream::{IncrementalBuffer, parse_available_chunks};
let mut buffer = IncrementalBuffer::with_capacity(1024);
let mut lexer = JsonIncrementalLexer::new();
// Feed tokens
buffer.extend(lexer.feed(chunk)?);
// Parse all available chunks
let results = parse_available_chunks::<JsonLine, _, _, _, _>(
&mut buffer,
|tokens| {
let mut stream = TokenStream::from_tokens(/* ... */);
JsonLine::parse(&mut stream)
},
)?;
// Compact buffer to release memory
buffer.compact();
IncrementalBuffer Operations
// Access unconsumed tokens
let remaining = buffer.remaining();
// Mark tokens as consumed
buffer.consume(count);
// Remove consumed tokens from memory
buffer.compact();
// Check size
let len = buffer.len(); // Unconsumed count
let total = buffer.total_tokens(); // Including consumed
Error Handling
Return errors for unrecoverable parsing failures:
fn parse_incremental<S>(
tokens: &[S],
checkpoint: &ParseCheckpoint,
) -> Result<(Option<Self>, ParseCheckpoint), Self::Error> {
// ...
match JsonLine::parse(&mut stream) {
Ok(line) => Ok((Some(line), new_checkpoint)),
Err(e) => {
// For recoverable errors, could return Ok((None, ...))
// For unrecoverable, propagate the error
Err(e)
}
}
}
Checkpoint State
Use state: u64 for parser-specific context:
// Example: Track nesting depth
let checkpoint = ParseCheckpoint {
cursor: 100,
tokens_consumed: 50,
state: 3, // Currently at depth 3
};
Next
Async Streaming
📦 Source: examples/jsonl-parser/src/incremental.rs
synkit provides async streaming support via tokio and futures feature flags.
Feature Flags
# Cargo.toml
# For tokio runtime
synkit = { version = "0.1", features = ["tokio"] }
# For runtime-agnostic futures
synkit = { version = "0.1", features = ["futures"] }
# For both
synkit = { version = "0.1", features = ["tokio", "futures"] }
Architecture
┌──────────┐ ┌───────────────────┐ ┌──────────────┐
│ Source │────▶│ AsyncTokenStream │────▶│ AstStream │────▶ Consumer
│ (chunks) │ │ (lexer) │ │ (parser) │
└──────────┘ └───────────────────┘ └──────────────┘
│ │
mpsc::channel mpsc::channel
Tokio Implementation
AsyncTokenStream
Receives source chunks, emits tokens:
use synkit::async_stream::tokio_impl::AsyncTokenStream;
use tokio::sync::mpsc;
let (token_tx, token_rx) = mpsc::channel(1024);
let mut lexer = AsyncTokenStream::<JsonIncrementalLexer>::new(token_tx);
// Feed chunks
lexer.feed(chunk).await?;
// Signal completion
lexer.finish().await?;
AstStream
Receives tokens, emits AST nodes:
use synkit::async_stream::tokio_impl::AstStream;
let (ast_tx, mut ast_rx) = mpsc::channel(64);
let mut parser = AstStream::<JsonLine, Token>::new(token_rx, ast_tx);
// Run until token stream exhausted
parser.run().await?;
Full Pipeline
use synkit::async_stream::{StreamConfig, tokio_impl::*};
use tokio::sync::mpsc;
use futures_util::{Stream, StreamExt};
async fn process_jsonl_stream(
mut source: impl Stream<Item = String> + Unpin,
) -> Result<Vec<JsonLine>, StreamError> {
let config = StreamConfig::medium();
// Create channels
let (token_tx, token_rx) = mpsc::channel(config.token_buffer_size);
let (ast_tx, mut ast_rx) = mpsc::channel(config.ast_buffer_size);
// Spawn lexer task
let lexer_handle = tokio::spawn(async move {
let mut lexer = AsyncTokenStream::<JsonIncrementalLexer>::with_config(
token_tx,
config.clone()
);
while let Some(chunk) = source.next().await {
lexer.feed(&chunk).await?;
}
lexer.finish().await
});
// Spawn parser task
let parser_handle = tokio::spawn(async move {
let mut parser = AstStream::<JsonLine, Token>::with_config(
token_rx,
ast_tx,
config,
);
parser.run().await
});
// Collect results
let mut results = Vec::new();
while let Some(line) = ast_rx.recv().await {
results.push(line);
}
// Wait for tasks
lexer_handle.await??;
parser_handle.await??;
Ok(results)
}
StreamConfig
Configure buffer sizes and limits:
let config = StreamConfig {
token_buffer_size: 1024, // Channel + buffer capacity
ast_buffer_size: 64, // AST channel capacity
max_chunk_size: 64 * 1024, // Reject chunks > 64KB
lexer_hint: LexerCapacityHint::medium(),
};
// Or use presets
let config = StreamConfig::small(); // <1KB inputs
let config = StreamConfig::medium(); // 1KB-64KB (default)
let config = StreamConfig::large(); // >64KB inputs
Futures Implementation
For runtime-agnostic streaming, use ParseStream:
use synkit::async_stream::futures_impl::ParseStream;
use futures_util::StreamExt;
let token_stream = /* any Stream<Item = Token> produced by the lexer stage */;
let mut parse_stream = ParseStream::<_, JsonLine, _>::new(token_stream);
while let Some(result) = parse_stream.next().await {
match result {
Ok(line) => process(line),
Err(e) => handle_error(e),
}
}
Error Handling
StreamError covers all streaming failure modes:
pub enum StreamError {
ChannelClosed, // Unexpected channel closure
LexError(String), // Lexer failure
ParseError(String), // Parser failure
IncompleteInput, // EOF with partial data
ChunkTooLarge { size, max }, // Input exceeds limit
BufferOverflow { current, max }, // Buffer exceeded
Timeout, // Deadline exceeded
ResourceLimit { resource, current, max },
}
Handle errors appropriately:
match parser.run().await {
Ok(()) => println!("Complete"),
Err(StreamError::IncompleteInput) => {
eprintln!("Warning: truncated input");
}
Err(StreamError::ParseError(msg)) => {
eprintln!("Parse error: {}", msg);
// Could log and continue with next record
}
Err(e) => return Err(e.into()),
}
Backpressure
Channel-based streaming provides natural backpressure:
- If consumer is slow, channels fill up
- Producers block on send().await
- Memory usage stays bounded
Configure based on throughput needs:
// High throughput, more memory
let (tx, rx) = mpsc::channel(4096);
// Low latency, less memory
let (tx, rx) = mpsc::channel(16);
Next
Stress Testing
Validate incremental parsers handle high throughput without memory leaks.
Test Strategy
- Volume - Process millions of events
- Memory stability - Track buffer sizes, detect leaks
- Varied input - Different object sizes and structures
- Buffer compaction - Verify consumed tokens are released
Million Event Test
#[test]
fn test_million_events_no_memory_leak() {
let config = StressConfig {
event_count: 1_000_000,
chunk_size: 4096,
memory_check_interval: 100_000,
max_memory_growth: 2.0,
};
// Use a real newline: inside r#"..."# the \n would be two literal characters.
let input = concat!(r#"{"id": 1, "name": "test", "value": 42.5}"#, "\n");
let mut lexer = JsonIncrementalLexer::new();
let mut token_buffer: Vec<Spanned<Token>> = Vec::new();
let mut checkpoint = ParseCheckpoint::default();
let mut total_parsed = 0;
let mut memory_tracker = MemoryTracker::new();
for i in 0..config.event_count {
// Feed one line
token_buffer.extend(lexer.feed(input).unwrap());
// Parse available
loop {
match JsonLine::parse_incremental(&token_buffer, &checkpoint) {
Ok((Some(_line), new_cp)) => {
total_parsed += 1;
checkpoint = new_cp;
}
Ok((None, _)) => break,
Err(e) => panic!("Parse error at event {}: {}", i, e),
}
}
// Compact frequently to avoid memory growth
if checkpoint.tokens_consumed > 500 {
token_buffer.drain(..checkpoint.tokens_consumed);
checkpoint.cursor -= checkpoint.tokens_consumed;
checkpoint.tokens_consumed = 0;
}
// Memory sampling
if i % config.memory_check_interval == 0 {
memory_tracker.sample(token_buffer.len(), 0);
}
}
assert_eq!(total_parsed, config.event_count);
assert!(memory_tracker.is_stable(config.max_memory_growth));
}
Memory Tracking
struct MemoryTracker {
initial_estimate: usize,
samples: Vec<usize>,
}
impl MemoryTracker {
fn sample(&mut self, token_buffer_size: usize, line_buffer_size: usize) {
let estimate = token_buffer_size * size_of::<Spanned<Token>>()
+ line_buffer_size * size_of::<JsonLine>();
if self.initial_estimate == 0 {
self.initial_estimate = estimate.max(1);
}
self.samples.push(estimate);
}
fn max_growth_ratio(&self) -> f64 {
let max = self.samples.iter().max().copied().unwrap_or(0);
max as f64 / self.initial_estimate as f64
}
fn is_stable(&self, max_growth: f64) -> bool {
self.max_growth_ratio() <= max_growth
}
}
Varied Input Test
Test with different JSON structures:
#[test]
fn test_varied_objects_stress() {
let objects = vec![
r#"{"type": "simple", "value": 1}"#,
r#"{"type": "nested", "data": {"inner": true}}"#,
r#"{"type": "array", "items": [1, 2, 3, 4, 5]}"#,
r#"{"type": "complex", "users": [{"name": "a"}], "count": 2}"#,
];
for i in 0..500_000 {
let obj = objects[i % objects.len()];
let input = format!("{}\n", obj);
// Feed, parse, verify...
}
}
Buffer Compaction
Critical for memory stability:
// Bad: Buffer grows unbounded
loop {
token_buffer.extend(lexer.feed(chunk)?);
while let Some(line) = parse(&token_buffer)? {
// Parse but never compact
}
}
// Good: Compact after consuming
loop {
token_buffer.extend(lexer.feed(chunk)?);
while let Some(line) = parse(&token_buffer)? {
checkpoint = new_checkpoint;
}
// Compact when enough consumed
if checkpoint.tokens_consumed > THRESHOLD {
token_buffer.drain(..checkpoint.tokens_consumed);
checkpoint.cursor -= checkpoint.tokens_consumed;
checkpoint.tokens_consumed = 0;
}
}
Performance Metrics
Track throughput:
let start = Instant::now();
// ... process events ...
let elapsed = start.elapsed();
let rate = total_parsed as f64 / elapsed.as_secs_f64();
println!(
"Processed {} events in {:?} ({:.0} events/sec)",
total_parsed, elapsed, rate
);
Expected performance (rough guidelines):
- Simple objects: 500K-1M events/sec
- Complex nested: 100K-300K events/sec
- Memory growth: <2x initial
Running Tests
# Run stress tests (may take minutes)
cd examples/jsonl-parser
cargo test stress -- --nocapture
# With release optimizations
cargo test --release stress -- --nocapture
Summary
Incremental parsing requires careful attention to:
- Buffer management - Compact regularly
- Memory bounds - Track growth, fail on overflow
- Throughput - Profile hot paths
- Correctness - Same results as sync parsing
The JSONL parser demonstrates these patterns at scale.
parser_kit! Macro
The parser_kit! macro generates parsing infrastructure from token definitions.
Syntax
synkit::parser_kit! {
error: ErrorType,
skip_tokens: [Token1, Token2],
#[logos(skip r"...")] // Optional logos-level attributes
tokens: {
#[token("=")]
Eq,
#[regex(r"[a-z]+", |lex| lex.slice().to_string())]
#[fmt("identifier")]
Ident(String),
},
delimiters: {
Bracket => (LBracket, RBracket),
},
span_derives: [Debug, Clone, PartialEq],
token_derives: [Debug, Clone, PartialEq],
custom_derives: [],
}
Fields
error: ErrorType (required)
Your error type. Must implement Default:
#[derive(Default)]
pub enum MyError {
#[default]
Unknown,
// ...
}
skip_tokens: [...] (required)
Tokens to skip during parsing. Typically whitespace:
skip_tokens: [Space, Tab],
Skipped tokens don’t appear in stream.next() but are visible in stream.next_raw().
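For example, assuming a grammar with Space in skip_tokens plus Eq and Ident tokens (a sketch, not exact generated output):
let mut stream = TokenStream::lex("a = b").unwrap();
// next() never yields the skipped Space tokens...
assert!(matches!(stream.next().map(|t| t.value), Some(Token::Ident(_))));
assert!(matches!(stream.next().map(|t| t.value), Some(Token::Eq)));

// ...while next_raw() walks every token, whitespace included.
let mut raw = TokenStream::lex("a = b").unwrap();
assert!(matches!(raw.next_raw().map(|t| t.value), Some(Token::Ident(_))));
assert!(matches!(raw.next_raw().map(|t| t.value), Some(Token::Space)));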
tokens: { ... } (required)
Token definitions using logos attributes.
Unit Tokens
#[token("=")]
Eq,
Generates EqToken with new() and token() methods.
Tokens with Values
#[regex(r"[a-z]+", |lex| lex.slice().to_string())]
Ident(String),
Generates IdentToken(String) implementing Deref<Target=String>.
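Which means String and &str methods are available directly on the token (illustrative):
let ident = IdentToken("width".to_string());
assert_eq!(ident.len(), 5);     // str methods via Deref
assert_eq!(&*ident, "width");   // explicit deref to the inner String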
Token Attributes
| Attribute | Purpose |
|---|---|
| #[token("...")] | Exact string match |
| #[regex(r"...")] | Regex pattern |
| #[regex(r"...", callback)] | Regex with value extraction |
| #[fmt("name")] | Display name for errors |
| #[derive(...)] | Additional derives for this token |
| priority = N | Logos priority for conflicts |
delimiters: { ... } (optional)
Delimiter pair definitions:
delimiters: {
Bracket => (LBracket, RBracket),
Brace => (LBrace, RBrace),
Paren => (LParen, RParen),
},
Generates:
- Struct (e.g., Bracket) storing spans
- Macro (e.g., bracket!) for extraction
span_derives: [...] (optional)
Derives for Span, RawSpan, Spanned<T>:
span_derives: [Debug, Clone, PartialEq, Eq, Hash],
Default: Debug, Clone, PartialEq, Eq, Hash
token_derives: [...] (optional)
Derives for all token structs:
token_derives: [Debug, Clone, PartialEq],
custom_derives: [...] (optional)
Additional derives for all generated types:
custom_derives: [serde::Serialize],
Generated Modules
span
pub struct RawSpan { pub start: usize, pub end: usize }
pub enum Span { CallSite, Known(RawSpan) }
pub struct Spanned<T> { pub value: T, pub span: Span }
tokens
pub enum Token { Eq, Ident(String), ... }
pub struct EqToken;
pub struct IdentToken(pub String);
// Macros
macro_rules! Tok { ... }
macro_rules! SpannedTok { ... }
stream
pub struct TokenStream { ... }
pub struct MutTokenStream<'a> { ... }
impl TokenStream {
pub fn lex(source: &str) -> Result<Self, Error>;
pub fn parse<T: Parse>(&mut self) -> Result<Spanned<T>, Error>;
pub fn peek<T: Peek>(&self) -> bool;
pub fn fork(&self) -> Self;
pub fn advance_to(&mut self, other: &Self);
}
printer
pub struct Printer { ... }
impl Printer {
pub fn new() -> Self;
pub fn finish(self) -> String;
pub fn word(&mut self, s: &str);
pub fn token(&mut self, tok: &Token);
pub fn space(&mut self);
pub fn newline(&mut self);
pub fn open_block(&mut self);
pub fn close_block(&mut self);
}
delimiters
For each delimiter definition:
pub struct Bracket { pub span: Span }
macro_rules! bracket {
($inner:ident in $stream:expr) => { ... }
}
traits
pub trait Parse: Sized {
fn parse(stream: &mut TokenStream) -> Result<Self, Error>;
}
pub trait Peek {
fn is(token: &Token) -> bool;
fn peek(stream: &TokenStream) -> bool;
}
pub trait ToTokens {
fn write(&self, printer: &mut Printer);
fn to_string_formatted(&self) -> String;
}
pub trait Diagnostic {
fn fmt() -> &'static str;
}
Expansion Example
Input:
synkit::parser_kit! {
error: E,
skip_tokens: [],
tokens: {
#[token("=")]
Eq,
},
delimiters: {},
span_derives: [Debug],
token_derives: [Debug],
}
Expands to ~500 lines including all modules, traits, and implementations.
Core Traits
synkit provides traits in two locations:
- synkit (synkit-core): Generic traits for library-level abstractions
- Generated traits module: Concrete implementations for your grammar
Parse
Convert tokens to AST nodes.
pub trait Parse: Sized {
fn parse(stream: &mut TokenStream) -> Result<Self, Error>;
}
Auto-implementations
Token structs implement Parse automatically:
// Generated for EqToken
impl Parse for EqToken {
fn parse(stream: &mut TokenStream) -> Result<Self, Error> {
match stream.next() {
Some(tok) => match &tok.value {
Token::Eq => Ok(EqToken::new()),
other => Err(Error::expected::<Self>(other)),
},
None => Err(Error::empty::<Self>()),
}
}
}
Blanket Implementations
// Option<T> parses if T::peek() succeeds
impl<T: Parse + Peek> Parse for Option<T> { ... }
// Box<T> wraps parsed value
impl<T: Parse> Parse for Box<T> { ... }
// Spanned<T> wraps with span
impl<T: Parse> Parse for Spanned<T> { ... }
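A sketch of the Option impl in use, assuming NumberToken implements both Parse and Peek (generated token structs normally do):
pub struct Assignment {
    pub name: Spanned<IdentToken>,
    pub eq: Spanned<EqToken>,
    // Present only if a number follows; otherwise parses to None.
    pub value: Option<NumberToken>,
}

impl Parse for Assignment {
    fn parse(stream: &mut TokenStream) -> Result<Self, Error> {
        Ok(Self {
            name: stream.parse()?,
            eq: stream.parse()?,
            value: Option::<NumberToken>::parse(stream)?,
        })
    }
}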
Peek
Check next token without consuming.
pub trait Peek {
fn is(token: &Token) -> bool;
fn peek(stream: &TokenStream) -> bool;
}
Usage
// In conditionals
if Value::peek(stream) {
let v: Spanned<Value> = stream.parse()?;
}
// In loops
while SimpleKey::peek(stream) {
items.push(stream.parse()?);
}
ToTokens
Convert AST back to formatted output.
pub trait ToTokens {
fn write(&self, printer: &mut Printer);
fn to_string_formatted(&self) -> String {
let mut p = Printer::new();
self.write(&mut p);
p.finish()
}
}
Blanket Implementations
impl<T: ToTokens> ToTokens for Spanned<T> {
fn write(&self, p: &mut Printer) {
self.value.write(p);
}
}
impl<T: ToTokens> ToTokens for Option<T> {
fn write(&self, p: &mut Printer) {
if let Some(v) = self { v.write(p); }
}
}
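Rendering is then a one-liner; this sketch assumes the generated impl for EqToken writes its literal text:
let eq = EqToken::new();
// Assumption: unit tokens print the string given to #[token("...")].
assert_eq!(eq.to_string_formatted(), "=");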
Diagnostic
Provide display name for error messages.
pub trait Diagnostic {
fn fmt() -> &'static str;
}
Auto-generated using #[fmt("...")] or snake_case variant name.
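For a token declared with #[fmt("identifier")], the generated impl amounts to roughly this sketch:
impl Diagnostic for IdentToken {
    fn fmt() -> &'static str {
        // Taken from #[fmt("identifier")]; without the attribute this falls
        // back to the snake_case variant name ("ident").
        "identifier"
    }
}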
IncrementalParse
Parse AST nodes incrementally from a token buffer with checkpoint-based state.
pub trait IncrementalParse: Sized {
fn parse_incremental(
tokens: &[Token],
checkpoint: &ParseCheckpoint,
) -> Result<(Option<Self>, ParseCheckpoint), Error>;
fn can_parse(tokens: &[Token], checkpoint: &ParseCheckpoint) -> bool;
}
Usage
impl IncrementalParse for KeyValue {
fn parse_incremental(
tokens: &[Token],
checkpoint: &ParseCheckpoint,
) -> Result<(Option<Self>, ParseCheckpoint), TomlError> {
let cursor = checkpoint.cursor;
// Need at least 3 tokens: key = value
if cursor + 2 >= tokens.len() {
return Ok((None, checkpoint.clone()));
}
// Parse key = value pattern
// ...
let new_cp = ParseCheckpoint {
cursor: cursor + 3,
tokens_consumed: checkpoint.tokens_consumed + 3,
state: 0,
};
Ok((Some(kv), new_cp))
}
fn can_parse(tokens: &[Token], checkpoint: &ParseCheckpoint) -> bool {
checkpoint.cursor < tokens.len()
}
}
With Async Streaming
use synkit::async_stream::tokio_impl::AstStream;
let (token_tx, token_rx) = mpsc::channel(32);
let (ast_tx, mut ast_rx) = mpsc::channel(16);
tokio::spawn(async move {
let mut parser = AstStream::<KeyValue, Token>::new(token_rx, ast_tx);
parser.run().await?;
});
while let Some(kv) = ast_rx.recv().await {
process_key_value(kv);
}
TokenStream (core trait)
Generic stream interface from synkit-core:
pub trait TokenStream {
type Token;
type Span;
type Spanned;
type Error;
fn next(&mut self) -> Option<Self::Spanned>;
fn peek_token(&self) -> Option<&Self::Spanned>;
fn next_raw(&mut self) -> Option<Self::Spanned>;
fn peek_token_raw(&self) -> Option<&Self::Spanned>;
}
The generated stream::TokenStream implements this trait.
Printer (core trait)
Generic printer interface from synkit-core:
pub trait Printer {
fn word(&mut self, s: &str);
fn token<T: std::fmt::Display>(&mut self, tok: &T);
fn space(&mut self);
fn newline(&mut self);
fn open_block(&mut self);
fn close_block(&mut self);
fn indent(&mut self);
fn dedent(&mut self);
fn write_separated<T, F>(&mut self, items: &[T], sep: &str, f: F)
where F: Fn(&T, &mut Self);
}
SpannedError
Attach source spans to errors:
pub trait SpannedError: Sized {
type Span;
fn with_span(self, span: Self::Span) -> Self;
fn span(&self) -> Option<&Self::Span>;
}
Implementation pattern:
impl SpannedError for MyError {
type Span = Span;
fn with_span(self, span: Span) -> Self {
Self::Spanned { source: Box::new(self), span }
}
fn span(&self) -> Option<&Span> {
match self {
Self::Spanned { span, .. } => Some(span),
_ => None,
}
}
}
SpanLike / SpannedLike
Abstractions for span types:
pub trait SpanLike {
fn call_site() -> Self;
fn new(start: usize, end: usize) -> Self;
}
pub trait SpannedLike<T> {
type Span: SpanLike;
fn new(value: T, span: Self::Span) -> Self;
fn value(&self) -> &T;
fn span(&self) -> &Self::Span;
}
Enable generic code over different span implementations.
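For instance, a helper generic over any span type (a sketch; it relies on the start()/end() accessors referenced in the Safety & Clamping chapter):
fn covering_span<S: SpanLike>(first: &S, last: &S) -> S {
    // Union of two spans, mirroring SpanLike::join(): min of starts, max of ends.
    S::new(
        first.start().min(last.start()),
        first.end().max(last.end()),
    )
}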
Container Types
synkit provides container types for common parsing patterns.
Punctuated Sequence Types
Three wrapper types for punctuated sequences with different trailing policies:
| Type | Trailing Separator | Use Case |
|---|---|---|
| Punctuated<T, P> | Optional | Array literals: [1, 2, 3] or [1, 2, 3,] |
| Separated<T, P> | Forbidden | Function args: f(a, b, c) |
| Terminated<T, P> | Required | Statements: use foo; use bar; |
Punctuated
use synkit::Punctuated;
// Optional trailing comma
let items: Punctuated<Value, CommaToken> = parse_punctuated(&mut stream)?;
for value in items.iter() {
process(value);
}
// Check if trailing comma present
if items.trailing_punct() {
// ...
}
Separated
use synkit::Separated;
// Trailing separator is an error
let args: Separated<Arg, CommaToken> = parse_separated(&mut stream)?;
Terminated
use synkit::Terminated;
// Each statement must end with separator
let stmts: Terminated<Stmt, SemiToken> = parse_terminated(&mut stream)?;
Common Methods
All three types share these methods via PunctuatedInner:
fn new() -> Self;
fn with_capacity(capacity: usize) -> Self;
fn push_value(&mut self, value: T);
fn push_punct(&mut self, punct: P);
fn len(&self) -> usize;
fn is_empty(&self) -> bool;
fn iter(&self) -> impl Iterator<Item = &T>;
fn pairs(&self) -> impl Iterator<Item = (&T, Option<&P>)>;
fn first(&self) -> Option<&T>;
fn last(&self) -> Option<&T>;
fn trailing_punct(&self) -> bool;
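For example, pairs() yields each value with its following separator, which is handy for custom printing (illustrative; process is a placeholder):
// Walk values together with their (optional) following separators.
for (value, sep) in items.pairs() {
    process(value);
    if sep.is_none() {
        // Only the final value can lack a trailing separator here.
    }
}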
Repeated
Alternative sequence type preserving separator tokens:
use synkit::Repeated;
pub struct Repeated<T, Sep, Spanned> {
pub values: Vec<RepeatedItem<T, Sep, Spanned>>,
}
pub struct RepeatedItem<T, Sep, Spanned> {
pub value: Spanned,
pub sep: Option<Spanned>,
}
Use Repeated when you need to preserve separator token information (e.g., for source-accurate reprinting).
Methods
fn empty() -> Self;
fn with_capacity(capacity: usize) -> Self;
fn len(&self) -> usize;
fn is_empty(&self) -> bool;
fn iter(&self) -> impl Iterator<Item = &RepeatedItem<...>>;
fn push(&mut self, item: RepeatedItem<...>);
Delimited
Value enclosed by delimiters:
use synkit::Delimited;
pub struct Delimited<T, Span> {
pub span: Span, // Span covering "[...]" or "{...}"
pub inner: T, // The content
}
Created automatically by delimiter macros:
let mut inner;
let bracket = bracket!(inner in stream);
// bracket.span covers "[" through "]"
// inner is a TokenStream of the contents
Usage Patterns
Comma-Separated Arguments
pub struct FnCall {
pub name: Spanned<IdentToken>,
pub paren: Paren,
pub args: Separated<Expr, CommaToken>, // No trailing comma
}
Array with Optional Trailing
pub struct Array {
pub bracket: Bracket,
pub items: Punctuated<Expr, CommaToken>, // Optional trailing
}
Statement Block
pub struct Block {
pub brace: Brace,
pub stmts: Terminated<Stmt, SemiToken>, // Required trailing
}
When none of the container types fits, parse the repetition by hand with Peek:
// Parse arms manually for control
let mut arms = Vec::new();
while Pattern::peek(stream) {
arms.push(stream.parse::<Spanned<MatchArm>>()?);
}
Printing Containers
impl<T: ToTokens, Sep: ToTokens> ToTokens for Repeated<T, Sep, Spanned<T>> {
fn write(&self, p: &mut Printer) {
for (i, item) in self.iter().enumerate() {
if i > 0 {
// Separator between items
p.word(", ");
}
item.value.write(p);
}
// Handle trailing separator if present
if self.has_trailing() {
p.word(",");
}
}
}
Safety & Clamping Behavior
synkit uses safe Rust throughout and employs defensive clamping to prevent panics from edge-case inputs. This page documents behaviors where invalid inputs are silently corrected rather than rejected.
Span Length Calculation
The SpanLike::len() method uses saturating subtraction:
fn len(&self) -> usize {
self.end().saturating_sub(self.start())
}
Behavior: If end < start (an inverted span), returns 0 instead of panicking or wrapping around.
Rationale: Inverted spans can occur as sentinel values or from malformed input. Returning zero length treats them as empty spans.
Span Join
The SpanLike::join() method computes the union of two spans:
fn join(&self, other: &Self) -> Self {
Self::new(self.start().min(other.start()), self.end().max(other.end()))
}
Behavior: Uses min() for start and max() for end. No validation that inputs are well-formed.
Rationale: Mathematical min/max cannot overflow or panic. Even inverted input spans produce a consistent result.
Incremental Buffer Consumption
The IncrementalBuffer::consume() method advances the cursor:
pub fn consume(&mut self, n: usize) {
self.cursor = (self.cursor + n).min(self.tokens.len());
}
Behavior: If n exceeds remaining tokens, cursor clamps to buffer length.
Rationale: Allows callers to safely “consume all” by passing usize::MAX. Prevents out-of-bounds access.
Generated TokenStream Rewind
The TokenStream::rewind() method generated by parser_kit! uses clamp:
fn rewind(&mut self, pos: usize) {
self.cursor = pos.clamp(self.range_start, self.range_end);
}
Behavior: Invalid positions are silently adjusted to the valid range [range_start, range_end].
Rationale: Parsing backtrack positions may become stale after buffer modifications. Clamping ensures the cursor remains valid.
When Clamping Matters
These behaviors are designed to:
- Prevent panics in library code - synkit never panics on edge-case numeric inputs
- Allow sentinel values - Special spans like (0, 0) or (MAX, MAX) work safely
- Support defensive programming - Callers don’t need to pre-validate every operation
When to Validate Explicitly
If your application requires strict validation (e.g., rejecting inverted spans), add checks at parse boundaries:
fn validate_span(span: &impl SpanLike) -> Result<(), MyError> {
if span.end() < span.start() {
return Err(MyError::InvalidSpan);
}
Ok(())
}
Resource Limits
For protection against resource exhaustion (e.g., deeply nested input), see:
- StreamError::ResourceLimit for runtime limit checking
- StreamConfig for configuring buffer sizes
- ParseConfig for nesting depth (when using recursion limits)
Security Considerations
synkit is designed for parsing untrusted input. This page documents the security model, protections, and best practices for generated parsers.
No Unsafe Code
synkit uses zero unsafe blocks in core, macros, and kit crates. Memory safety is guaranteed by the Rust compiler.
# Verify yourself
grep -r "unsafe" core/src macros/src kit/src
# Returns no matches
Resource Exhaustion Protection
Recursion Limits
Deeply nested input like [[[[[[...]]]]]] can cause stack overflow. synkit provides configurable recursion limits:
use synkit::ParseConfig;
// Default: 128 levels (matches serde_json)
let config = ParseConfig::default();
// Stricter limit for untrusted input
let config = ParseConfig::new()
.with_max_recursion_depth(32);
// Track depth manually in your parser
use synkit::RecursionGuard;
struct MyParser {
depth: RecursionGuard,
config: ParseConfig,
}
impl MyParser {
fn parse_nested(&mut self) -> Result<(), MyError> {
self.depth.enter(self.config.max_recursion_depth)?;
// ... parse nested content ...
self.depth.exit();
Ok(())
}
}
Token Limits
Prevent CPU exhaustion from extremely long inputs:
let config = ParseConfig::new()
.with_max_tokens(100_000); // Fail after 100k tokens
Buffer Limits (Streaming)
For incremental parsing, StreamConfig controls memory usage:
use synkit::StreamConfig;
let config = StreamConfig {
max_chunk_size: 16 * 1024, // 16KB max per chunk
token_buffer_size: 1024, // Token buffer capacity
ast_buffer_size: 64, // AST node buffer
..StreamConfig::default()
};
Exceeding limits produces explicit errors:
| Error | Trigger |
|---|---|
| StreamError::ChunkTooLarge | Input chunk > max_chunk_size |
| StreamError::BufferOverflow | Token buffer exceeded capacity |
| StreamError::ResourceLimit | Generic limit exceeded |
| Error::RecursionLimitExceeded | Nesting depth > max_recursion_depth |
| Error::TokenLimitExceeded | Token count > max_tokens |
Integer Safety
All span arithmetic uses saturating operations to prevent overflow panics:
// Span length - saturating subtraction
fn len(&self) -> usize {
self.end().saturating_sub(self.start())
}
// Recursion guard - saturating increment
self.depth = self.depth.saturating_add(1);
// Cursor bounds - clamped to valid range
self.cursor = pos.clamp(self.range_start, self.range_end);
See Safety & Clamping for detailed behavior documentation.
Memory Safety
Generated TokenStream uses Arc for shared ownership:
pub struct TokenStream {
source: Arc<str>, // Shared source text
tokens: Arc<Vec<Token>>, // Shared token buffer
// ... cursors are Copy types
}
Benefits:
- fork() is zero-copy (Arc::clone only); see the sketch below
- Thread-safe: TokenStream is Send + Sync
- No dangling references possible
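A sketch of what zero-copy forking enables: speculative parsing that commits only on success (use_number is a placeholder).
// Speculatively try to parse a number; commit the fork only on success.
let mut attempt = stream.fork();
match attempt.parse::<NumberToken>() {
    Ok(num) => {
        stream.advance_to(&attempt); // commit: stream now points past `num`
        use_number(num);             // placeholder for real handling
    }
    Err(_) => {
        // `stream` was never touched; fall back to another production.
    }
}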
Fuzz Testing
synkit includes continuous fuzz testing via cargo-fuzz:
# Run lexer fuzzer
cargo +nightly fuzz run fuzz_lexer
# Run parser fuzzer
cargo +nightly fuzz run fuzz_parser
Fuzz targets exercise:
- Arbitrary UTF-8 input
- Edge cases in span arithmetic
- Token stream operations
- Incremental buffer management
Adding Fuzz Tests for Your Parser
#![no_main]
use libfuzzer_sys::fuzz_target;
fuzz_target!(|data: &[u8]| {
if let Ok(s) = std::str::from_utf8(data) {
// Ignore lex errors, just ensure no panics
let _ = my_parser::TokenStream::lex(s);
}
});
Security Checklist
When building a parser for untrusted input:
- Set max_recursion_depth appropriate for your format
- Set max_tokens to prevent CPU exhaustion
- Use StreamConfig limits for streaming parsers
- Handle all error variants (don’t unwrap)
- Add fuzz tests for your grammar
- Consider timeout limits at the application layer
Threat Model
synkit protects against:
| Threat | Protection |
|---|---|
| Stack overflow | Recursion limits |
| Memory exhaustion | Buffer limits, Arc sharing |
| CPU exhaustion | Token limits |
| Integer overflow | Saturating arithmetic |
| Undefined behavior | No unsafe code |
synkit does NOT protect against:
| Threat | Mitigation |
|---|---|
| Regex backtracking (logos) | Use logos’ regex restrictions |
| Application-level DoS | Add timeouts in your application |
| Malicious AST semantics | Validate AST after parsing |
Reporting Vulnerabilities
Please open a GitHub security advisory.
Testing Generated Code
This guide covers testing strategies for parsers built with synkit, from unit tests to fuzz testing.
Unit Testing
Token-Level Tests
Test individual token recognition:
#[test]
fn test_lex_identifier() {
let stream = TokenStream::lex("foo_bar").unwrap();
let tok = stream.peek_token().unwrap();
assert!(matches!(tok.value, Token::Ident(_)));
if let Token::Ident(s) = &tok.value {
assert_eq!(s, "foo_bar");
}
}
#[test]
fn test_lex_rejects_invalid() {
// Logos returns errors for unrecognized input
let result = TokenStream::lex("\x00\x01\x02");
assert!(result.is_err());
}
Span Accuracy Tests
Verify spans point to correct source locations:
#[test]
fn test_span_accuracy() {
let source = "let x = 42";
let mut stream = TokenStream::lex(source).unwrap();
let kw: Spanned<LetToken> = stream.parse().unwrap();
assert_eq!(&source[kw.span.start()..kw.span.end()], "let");
let name: Spanned<IdentToken> = stream.parse().unwrap();
assert_eq!(&source[name.span.start()..name.span.end()], "x");
}
Parse Tests
Test AST construction:
#[test]
fn test_parse_key_value() {
let mut stream = TokenStream::lex("name = \"Alice\"").unwrap();
let kv: Spanned<KeyValue> = stream.parse().unwrap();
assert!(matches!(kv.key.value, Key::Bare(_)));
assert!(matches!(kv.value.value, Value::String(_)));
}
#[test]
fn test_parse_error_recovery() {
let mut stream = TokenStream::lex("= value").unwrap();
let result: Result<Spanned<KeyValue>, _> = stream.parse();
assert!(result.is_err());
// Verify error message is helpful
let err = result.unwrap_err();
assert!(err.to_string().contains("expected"));
}
Round-Trip Testing
Verify parse-then-print produces equivalent output:
#[test]
fn test_roundtrip() {
let original = "name = \"value\"\ncount = 42";
let mut stream = TokenStream::lex(original).unwrap();
let doc: Document = stream.parse().unwrap();
let mut printer = Printer::new();
doc.write(&mut printer);
let output = printer.finish();
// Re-parse and compare AST
let mut stream2 = TokenStream::lex(&output).unwrap();
let doc2: Document = stream2.parse().unwrap();
assert_eq!(format!("{:?}", doc), format!("{:?}", doc2));
}
Snapshot Testing
Use insta for golden-file testing:
use insta::assert_snapshot;
#[test]
fn snapshot_complex_document() {
let input = include_str!("fixtures/complex.toml");
let mut stream = TokenStream::lex(input).unwrap();
let doc: Document = stream.parse().unwrap();
assert_snapshot!(format!("{:#?}", doc));
}
#[test]
fn snapshot_formatted_output() {
let input = "messy = \"spacing\"";
let doc: Document = parse(input).unwrap();
let mut printer = Printer::new();
doc.write(&mut printer);
assert_snapshot!(printer.finish());
}
Parameterized Tests
Use test-case for table-driven tests:
use test_case::test_case;
#[test_case("42", Value::Integer(42); "positive integer")]
#[test_case("-17", Value::Integer(-17); "negative integer")]
#[test_case("true", Value::Bool(true); "boolean true")]
#[test_case("false", Value::Bool(false); "boolean false")]
fn test_parse_value(input: &str, expected: Value) {
let mut stream = TokenStream::lex(input).unwrap();
let value: Spanned<Value> = stream.parse().unwrap();
assert_eq!(value.value, expected);
}
Edge Case Testing
Test boundary conditions:
#[test]
fn test_empty_input() {
let stream = TokenStream::lex("").unwrap();
assert!(stream.is_empty());
}
#[test]
fn test_whitespace_only() {
let mut stream = TokenStream::lex(" \t\n ").unwrap();
// peek_token skips whitespace
assert!(stream.peek_token().is_none());
}
#[test]
fn test_max_nesting() {
let nested = "[".repeat(200) + &"]".repeat(200);
let result = parse_array(&nested);
// Should fail with recursion limit error
assert!(matches!(
result,
Err(MyError::RecursionLimit { .. })
));
}
#[test]
fn test_unicode_boundaries() {
// Multi-byte UTF-8: emoji is 4 bytes
let input = "key = \"hello 🦀 world\"";
let mut stream = TokenStream::lex(input).unwrap();
let kv: Spanned<KeyValue> = stream.parse().unwrap();
// Spans should be valid UTF-8 boundaries
let slice = &input[kv.span.start()..kv.span.end()];
assert!(slice.is_char_boundary(0));
}
Fuzz Testing
Setup
Add fuzz targets to your project:
# fuzz/Cargo.toml
[package]
name = "my-parser-fuzz"
version = "0.0.0"
publish = false
edition = "2021"
[package.metadata]
cargo-fuzz = true
[[bin]]
name = "fuzz_lexer"
path = "fuzz_targets/fuzz_lexer.rs"
test = false
doc = false
bench = false
[[bin]]
name = "fuzz_parser"
path = "fuzz_targets/fuzz_parser.rs"
test = false
doc = false
bench = false
[dependencies]
libfuzzer-sys = "0.4"
my-parser = { path = ".." }
Lexer Fuzzing
// fuzz/fuzz_targets/fuzz_lexer.rs
#![no_main]
use libfuzzer_sys::fuzz_target;
fuzz_target!(|data: &[u8]| {
if let Ok(s) = std::str::from_utf8(data) {
// Should never panic
let _ = my_parser::TokenStream::lex(s);
}
});
Parser Fuzzing
// fuzz/fuzz_targets/fuzz_parser.rs
#![no_main]
use libfuzzer_sys::fuzz_target;
fuzz_target!(|data: &[u8]| {
if let Ok(s) = std::str::from_utf8(data) {
if let Ok(mut stream) = my_parser::TokenStream::lex(s) {
// Parse should never panic, only return errors
let _: Result<Document, _> = stream.parse();
}
}
});
Running Fuzzers
# Install cargo-fuzz (requires nightly)
cargo install cargo-fuzz
# Run lexer fuzzer
cargo +nightly fuzz run fuzz_lexer
# Run with timeout and iterations
cargo +nightly fuzz run fuzz_parser -- -max_total_time=60
# Run with corpus
cargo +nightly fuzz run fuzz_parser corpus/parser/
Integration Testing
Test complete workflows:
#[test]
fn test_parse_real_file() {
let content = std::fs::read_to_string("fixtures/config.toml").unwrap();
let doc = parse(&content).expect("should parse real config file");
// Verify expected structure
assert!(doc.get_table("server").is_some());
assert!(doc.get_value("server.port").is_some());
}
Benchmarking
Use divan or criterion for performance testing:
use divan::Bencher;
#[divan::bench]
fn bench_lex_small(bencher: Bencher) {
let input = include_str!("fixtures/small.toml");
bencher.bench(|| TokenStream::lex(input).unwrap());
}
#[divan::bench(args = [100, 1000, 10000])]
fn bench_lex_lines(bencher: Bencher, lines: usize) {
let input = "key = \"value\"\n".repeat(lines);
bencher.bench(|| TokenStream::lex(&input).unwrap());
}
CI Configuration
Example GitHub Actions workflow:
name: Test
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- run: cargo test --all-features
fuzz:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@nightly
- run: cargo install cargo-fuzz
- run: cargo +nightly fuzz run fuzz_lexer -- -max_total_time=30
- run: cargo +nightly fuzz run fuzz_parser -- -max_total_time=30
Test Coverage
Use cargo-llvm-cov for coverage reports:
cargo install cargo-llvm-cov
cargo llvm-cov --html
open target/llvm-cov/html/index.html
Aim for high coverage on:
- All token variants
- All AST node types
- Error paths
- Edge cases (empty, whitespace, limits)