Understanding Spans
Spans are a fundamental concept in the Huff Neo compiler that track the source location of tokens, AST nodes, and bytecode. They enable accurate error reporting and source mapping for debugging.
What is a Span?
A Span represents a contiguous range of the source code, expressed as byte offsets:
pub struct Span {
    pub start: usize,                  // Byte offset where the span starts
    pub end: usize,                    // Byte offset where the span ends (exclusive)
    pub file: Option<Arc<FileSource>>, // Reference to the source file
}
Exclusive End Convention
Huff Neo uses exclusive end positions, following Rust's Range convention:
- start points to the first character of the span
- end points to the position after the last character
- This matches Rust's range syntax: source[start..end]
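As an illustration of the convention, here is a minimal sketch of a simplified span type. It is a stand-in for the compiler's Span, not its actual definition: the file field is omitted, and the text helper is a hypothetical name introduced for this example.

```rust
/// Simplified stand-in for the compiler's `Span` (the `file` field is omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize, // inclusive byte offset
    pub end: usize,   // exclusive byte offset
}

impl Span {
    /// Number of bytes covered. No `+ 1` is needed because `end` is exclusive.
    pub fn len(&self) -> usize {
        self.end - self.start
    }

    /// Hypothetical helper: returns the source text covered by this span.
    pub fn text<'a>(&self, source: &'a str) -> &'a str {
        &source[self.start..self.end]
    }
}
```

With an exclusive end, the length is simply end - start, and the span can be handed to a slice expression unchanged.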
Example
Given the source code:
#define macro MAIN() = takes(0) returns (0) {
    0x42
    add
}
The span for the token 0x42 would be:
- Text: "0x42"
- Start: 50 (position of '0')
- End: 54 (position after '2')
- Length: 4 characters
- Extract with: source[50..54] → "0x42"
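The offsets above can be checked with a short sketch. It assumes the macro body is indented with four spaces and that lines end with a single "\n"; the locate function is a hypothetical helper, not part of the compiler.

```rust
/// The example source, assuming four-space indentation and `\n` line endings.
const SOURCE: &str = "#define macro MAIN() = takes(0) returns (0) {\n    0x42\n    add\n}";

/// Hypothetical helper: returns the `(start, end)` byte offsets of the first
/// occurrence of `token` in `source`, with `end` exclusive.
fn locate(source: &str, token: &str) -> Option<(usize, usize)> {
    let start = source.find(token)?;
    Some((start, start + token.len()))
}
```

The first line is 45 bytes long, the newline sits at offset 45, and the four spaces occupy offsets 46 to 49, so locate(SOURCE, "0x42") yields Some((50, 54)).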
Working with Spans
Creating Spans
The lexer creates spans as it tokenizes:
// In the lexer, when we recognize a token:
let start = self.position; // Position before the token is consumed
let token_text = self.eat_while(|c| c.is_alphanumeric());
let end = start + token_text.len(); // Exclusive end, measured from start
Token {
    kind: TokenKind::Ident(token_text),
    span: Span::new(start..end, file),
}
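For readers who want to experiment, the following is a self-contained sketch of the same idea. It is not the Huff Neo lexer: the Lexer and Token types here are simplified assumptions, a plain Range<usize> stands in for Span, and only alphanumeric tokens are recognized.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
struct Token {
    text: String,
    span: Range<usize>, // stand-in for the compiler's Span
}

struct Lexer<'a> {
    source: &'a str,
    position: usize, // byte offset of the next unread character
}

impl<'a> Lexer<'a> {
    fn new(source: &'a str) -> Self {
        Lexer { source, position: 0 }
    }

    /// Consumes characters while `pred` holds and returns the consumed slice.
    fn eat_while(&mut self, pred: impl Fn(char) -> bool) -> &'a str {
        let source = self.source;
        let start = self.position;
        for c in source[start..].chars() {
            if !pred(c) {
                break;
            }
            // Advance by the character's encoded width, not by 1.
            self.position += c.len_utf8();
        }
        &source[start..self.position]
    }

    /// Returns the next alphanumeric token, skipping everything else.
    fn next_token(&mut self) -> Option<Token> {
        self.eat_while(|c| !c.is_alphanumeric());
        let start = self.position;
        let text = self.eat_while(|c| c.is_alphanumeric());
        if text.is_empty() {
            return None;
        }
        let end = start + text.len(); // exclusive end
        Some(Token { text: text.to_string(), span: start..end })
    }
}
```

Because eat_while advances position in this sketch, the end offset is derived from start rather than from the updated position, which avoids counting the token length twice.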
Using Spans for Error Reporting
Spans enable precise error messages:
// When reporting an error:
return Err(CodegenError {
    kind: CodegenErrorKind::UnmatchedJumpLabel,
    span: label.span, // Points exactly to the problematic label
    token: Some(label.clone()),
});
This produces output like:
Error: Unmatched jump label
--> example.huff:5:8
|
5 | unknown_label jumpi
| ^^^^^^^^^^^^^
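A diagnostic like this needs two things from a span: a line and column for the header, and a caret run as wide as the span. The sketch below shows one way to derive both. The functions line_col and underline are hypothetical names, columns are counted in bytes for simplicity, and the span is assumed to lie on a single line; the compiler's real renderer may differ.

```rust
/// Converts a byte offset into a 1-based (line, column) pair.
/// Columns are counted in bytes here, which is a simplification.
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let before = &source[..offset];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    (line, offset - line_start + 1)
}

/// Renders the offending line with carets under `start..end`.
/// Assumes the span does not cross a line boundary.
fn underline(source: &str, start: usize, end: usize) -> String {
    let (line, col) = line_col(source, start);
    let text = source.lines().nth(line - 1).unwrap_or("");
    // The gutter is "<line> | ", i.e. the line number plus three characters.
    let pad = " ".repeat(line.to_string().len() + 3 + col - 1);
    let carets = "^".repeat(end - start);
    format!("{} | {}\n{}{}", line, text, pad, carets)
}
```

The caret count is end - start, which again relies on the exclusive end.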
Span Preservation During Compilation
Spans flow through the compilation pipeline:
- Lexer → Creates initial spans for tokens
- Parser → Combines token spans into AST node spans
- Codegen → Preserves spans when generating bytecode
- Source Maps → Maps bytecode positions back to source spans
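The parser step, combining token spans into a node span, can be sketched as taking the smallest range that covers all child spans. The merge and node_span names below are assumptions made for this example, and the span type is simplified (no file reference).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize, // inclusive byte offset
    end: usize,   // exclusive byte offset
}

impl Span {
    /// Smallest span that covers both `self` and `other`.
    fn merge(self, other: Span) -> Span {
        Span {
            start: self.start.min(other.start),
            end: self.end.max(other.end),
        }
    }
}

/// Span of an AST node built from a run of token spans.
/// Returns `None` when there are no tokens to cover.
fn node_span(tokens: &[Span]) -> Option<Span> {
    let (first, rest) = tokens.split_first()?;
    Some(rest.iter().fold(*first, |acc, s| acc.merge(*s)))
}
```

A real implementation would presumably also check that all spans refer to the same file before merging them.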
Special Cases
Macro Expansion
When expanding macros, we preserve the span of the actual operation, not the invocation:
#define macro ADD() = takes(2) returns (1) {
    add // Span points here
}

#define macro MAIN() = takes(0) returns (0) {
    0x01
    0x02
    ADD() // NOT here - we want the span of 'add', not 'ADD()'
}
The source map for the add opcode points to line 2 (the add inside ADD), not line 8 (the ADD() invocation).
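One way to picture this behavior is an expansion pass that copies each opcode of the macro body together with its original span and discards the span of the invocation. The sketch below is purely illustrative: the Statement enum and expand function are hypothetical, and the byte offsets in the usage note are made up.

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq)]
enum Statement {
    /// A plain opcode with the span of its own source text.
    Opcode { name: &'static str, span: Range<usize> },
    /// A macro invocation, holding the already-parsed body of the macro.
    Invoke { macro_body: Vec<Statement>, span: Range<usize> },
}

/// Flattens statements into (opcode, span) pairs.
/// Opcodes from a macro body keep the spans they had in the macro definition.
fn expand(statements: &[Statement], out: &mut Vec<(&'static str, Range<usize>)>) {
    for stmt in statements {
        match stmt {
            Statement::Opcode { name, span } => out.push((*name, span.clone())),
            // The invocation's own span is intentionally not recorded.
            Statement::Invoke { macro_body, .. } => expand(macro_body, out),
        }
    }
}
```

Expanding an Invoke whose body holds an add opcode at 37..40 produces a single entry ("add", 37..40), regardless of where the invocation itself appears.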
Constants and Arguments
For constants and macro arguments, the span includes the brackets:
#define constant VALUE = 0x42
#define macro MAIN() = takes(0) returns (0) {
    [VALUE] // Span covers all of "[VALUE]" including brackets
}
Jump Labels
Jump labels have two different spans:
- Label definition: label:
  - Span covers "label:"
- Label reference: label jumpi
  - Span covers just "label"
Common Pitfalls
Off-by-One Errors
Remember that end is exclusive:
// WRONG - treats end as inclusive
let text = &source[span.start..=span.end];
// CORRECT - end is exclusive
let text = &source[span.start..span.end];
Multi-byte Characters
Spans use byte offsets, not character counts. Be careful with UTF-8:
// The lexer handles this correctly by using byte positions
let end_position = self.position + char.len_utf8();
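To make the difference between byte offsets and character counts concrete, here is a small sketch using only the standard library. The char_spans function is a hypothetical helper written for this example.

```rust
/// Returns the byte span `(start, end)` of every character in `source`,
/// with `end` exclusive.
fn char_spans(source: &str) -> Vec<(usize, usize)> {
    source
        .char_indices()
        .map(|(start, c)| (start, start + c.len_utf8()))
        .collect()
}
```

For the three-character string "a\u{e9}b" (where U+00E9 takes two bytes in UTF-8), the spans are (0, 1), (1, 3), and (3, 4), so the last character starts at byte 3 even though it is the third character. Slicing a str at an offset that falls inside a multi-byte character panics, which is why spans must always land on character boundaries.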
Empty Spans
Empty spans (where start == end) are valid and used for:
- EOF tokens
- Inserted/synthetic tokens
- Zero-width positions
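A minimal sketch of how an empty span behaves, using a simplified span type. The is_empty and eof names are assumptions for illustration, not confirmed parts of the compiler's API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize, // inclusive byte offset
    end: usize,   // exclusive byte offset
}

impl Span {
    /// True when the span covers zero bytes.
    fn is_empty(&self) -> bool {
        self.start == self.end
    }

    /// Hypothetical constructor: a zero-width span at the end of `source`,
    /// as might be attached to an EOF token.
    fn eof(source: &str) -> Span {
        Span { start: source.len(), end: source.len() }
    }
}
```

Slicing with an empty span is well defined and yields an empty string, even when both offsets equal the source length, so code that extracts text from spans does not need a special case for EOF.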