Deep Dive Research · Compiler Infrastructure

Multi-Target
Compiler Architecture

A comprehensive exploration of modern compiler design implementing lexical analysis with SLY, LALR(1) parsing with AST generation, semantic type checking with symbol tables, stack-based intermediate code generation, and multiple backend targets including LLVM native code, WebAssembly binary encoding, Python transpilation, and direct interpretation.

Source → Lexer → Parser → AST → Checker → IR → Backend

9 Project Phases · 4 Backend Targets · 50+ IR Opcodes · 4 Data Types · 10 WASM Sections

Language Design

Source Language Specification

A statically-typed systems programming language with direct memory access.

Type System

The language features four built-in primitive types with explicit sizing for predictable performance across all backend targets:

Type    Size     Description                  Example
int     32-bit   Signed integer               var x int = 42;
float   64-bit   IEEE 754 double precision    var pi float = 3.14159;
char    8-bit    Single byte character        var c char = 'A';
bool    1-bit    Boolean true/false           var flag bool = true;
No Automatic Type Coercion

Integers and floats cannot be mixed in expressions. Operations require both operands to be the same type. For x / y to be legal, both x and y must match. The result type is always the same as the operands. Integer division truncates. Explicit type casting functions must be used for conversions.

Lexical Conventions

The language follows standard lexical conventions for identifiers, literals, and comments:

  • Statements: Terminated by semicolons (print 3;)
  • Single-line comments: // This is a comment
  • Multi-line comments: /* This spans multiple lines */
  • Identifiers: Letters, digits, and underscores; must not start with a digit
  • Character literals: Single quotes with escape codes ('h', '\n', '\xhh')

Reserved words: const else import false func if print return true while var break continue

Operators and Precedence

Binary operators are left-associative, except the relational operators, which are non-associative. Precedence from highest to lowest:

precedence_rules.txt
// Highest precedence
Unary:     +  -  !          // Unary plus, minus, logical negation
Factor:    *  /             // Multiplication, division
Term:      +  -             // Addition, subtraction
Relation:  < <= > >= == !=  // Comparisons (non-associative!)
Logic:     &&               // Logical AND
Logic:     ||               // Logical OR (lowest precedence)

Critical: Relational operators may NOT be chained. a < b < c is illegal; use a < b && b < c instead. Logical operators implement short-circuit evaluation: in a && b, if a is false, b is not evaluated.

Control Flow

The language supports structured control flow with if-else conditionals and while loops. The conditional expression must evaluate to type bool:

control_flow.src
// If-else statement (else is optional)
if (a < b) {
    minval = a;
} else {
    minval = b;
}

// While loop with break and continue
var n int = 0;
while (n < 10) {
    if (n == 5) {
        break;      // Exit loop early
    }
    print n;
    n = n + 1;
}

Functions

Functions are defined with explicit parameter and return types. External functions can be imported for runtime integration:

functions.src
// External function import (resolved by linker/loader)
import func sin(x float) float;

// Recursive function definition
func fibonacci(n int) int {
    if (n <= 2) {
        return 1;
    } else {
        return fibonacci(n-1) + fibonacci(n-2);
    }
}

// Entry point: main() takes no args, returns int
func main() int {
    var i int = 0;
    while (i < 20) {
        print fibonacci(i);
        i = i + 1;
    }
    return 0;
}
Scoping Rules

Global scope: Declarations outside functions are visible to the entire program.

Local scope: Declarations inside functions are only visible within that function.

No nested functions: Function definitions cannot appear inside other functions.

Initialization order: Global variable initializations execute before main().

Direct Memory Access

The language provides direct memory access using the backtick (`) operator. Memory addresses are integers, and the type is inferred from context:

memory_access.src
// Store values at memory addresses
`128 = 45;        // Store int 45 at address 128
`128 = 4.5;       // Store float 4.5 at address 128
`128 = 'x';       // Store byte 'x' at address 128

// Write array to memory (4 bytes per int)
var addr int = 1000;
var n int = 0;
while (n < 100) {
    `(addr + n*4) = n;
    n = n + 1;
}

// Read and sum from memory
var total int = 0;
n = 0;
while (n < 100) {
    total = total + `(addr + n*4);
    n = n + 1;
}
print total;  // 4950
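The backtick examples above map onto a flat byte array. As a sanity check of the address arithmetic (4 bytes per int), here is a small Python model using bytearray and struct — a sketch of the memory model, not the compiler's actual runtime:

```python
import struct

# A toy model of the flat byte-addressable memory the backtick operator
# exposes. Addresses are plain integers; ints occupy 4 bytes each.
memory = bytearray(4096)

def poke_int(addr, value):
    struct.pack_into('<i', memory, addr, value)   # store 4-byte little-endian int

def peek_int(addr):
    return struct.unpack_from('<i', memory, addr)[0]

addr = 1000
for n in range(100):
    poke_int(addr + n * 4, n)        # `(addr + n*4) = n;

total = sum(peek_int(addr + n * 4) for n in range(100))
print(total)  # 4950
```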

Phase 1

Lexical Analysis with SLY

Breaking source code into tokens using regex-based pattern matching.

What is Lexical Analysis?

The lexer (tokenizer) is the first phase of compilation. It transforms raw source text into a stream of tokens—the atomic units of the language's grammar. For example:

tokenization_example.txt
// Input source:
a = 3 + (4 * 5)

// Output tokens:
ID('a') ASSIGN NUMBER(3) PLUS LPAREN NUMBER(4) TIMES NUMBER(5) RPAREN

Using SLY (Sly Lex Yacc)

The compiler uses the SLY framework for lexing. Lexers are defined as classes where token patterns are specified as regular expressions:

tokenizer.py
from sly import Lexer

class SimpleLexer(Lexer):
    # Token names (terminals)
    tokens = { NUMBER, ID, PLUS, TIMES, LPAREN, RPAREN,
               ASSIGN, LT, LE, GT, GE, EQ, NE, IF, ELSE, WHILE }
    
    # Ignored characters (whitespace)
    ignore = ' \t'
    
    # Token patterns (order matters for overlapping patterns!)
    NUMBER = r'\d+'
    ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
    PLUS = r'\+'
    TIMES = r'\*'
    LPAREN = r'\('
    RPAREN = r'\)'
    
    # Multi-character operators (must come before single-char!)
    LE = r'<='
    GE = r'>='
    EQ = r'=='
    NE = r'!='
    LT = r'<'
    GT = r'>'
    ASSIGN = r'='
    
    # Reserved words (remap identifiers)
    ID['if'] = IF
    ID['else'] = ELSE
    ID['while'] = WHILE
    
    # Track line numbers
    @_(r'\n+')
    def ignore_newline(self, t):
        self.lineno += t.value.count('\n')
    
    # Error handling
    def error(self, t):
        print(f'Bad character {t.value[0]!r} at line {self.lineno}')
        self.index += 1
Pattern Ordering Critical

SLY matches patterns in the order defined. Multi-character operators like <= must be defined before single-character <, otherwise <= would be tokenized as LT followed by ASSIGN.
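The effect is easy to demonstrate with Python's re module, which — like ordered token matching — tries alternatives left to right:

```python
import re

# With the shorter pattern first, the regex engine commits to '<'
# and never tries '<=' — the same failure mode as a mis-ordered lexer.
wrong_order = re.match(r'<|<=', '<=').group()   # matches only '<'
right_order = re.match(r'<=|<', '<=').group()   # matches '<='
print(wrong_order, right_order)  # < <=
```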

Token Output

Each token carries its type, value, line number, and character index:

token_output.txt
Token(type='IF', value='if', lineno=2, index=12)
Token(type='ID', value='a', lineno=2, index=15)
Token(type='LT', value='<', lineno=2, index=17)
Token(type='ID', value='b', lineno=2, index=19)
Token(type='LE', value='<=', lineno=3, index=31)

Phase 2

Parsing & Abstract Syntax Trees

Transforming token streams into hierarchical tree structures.

What is Parsing?

The parser consumes the token stream and produces an Abstract Syntax Tree (AST)—a hierarchical representation of the program's structure. Consider this expression:

ast_example.txt
// Source: a = 2 + 3 * (4 + 5)
// AST Structure:

              =
            /   \
          "a"    +
               /   \
              2     *
                  /   \
                 3     +
                     /   \
                    4     5

Grammar Rules in SLY

Grammar rules are specified using decorated methods. The decorator gives the right-hand side, and the method name is the left-hand side:

parser.py
from sly import Parser

class SimpleParser(Parser):
    tokens = SimpleLexer.tokens
    
    # Grammar:  expr ::= expr PLUS term | term
    @_('expr PLUS term')
    def expr(self, p):
        return BinOp('+', p.expr, p.term)
    
    @_('term')
    def expr(self, p):
        return p.term
    
    # Grammar:  term ::= term TIMES factor | factor
    @_('term TIMES factor')
    def term(self, p):
        return BinOp('*', p.term, p.factor)
    
    @_('factor')
    def term(self, p):
        return p.factor
    
    # Grammar:  factor ::= LPAREN expr RPAREN | NUMBER | ID
    @_('LPAREN expr RPAREN')
    def factor(self, p):
        return p.expr
    
    @_('NUMBER')
    def factor(self, p):
        return Number(p.NUMBER)
    
    @_('ID')
    def factor(self, p):
        return Identifier(p.ID)

AST Node Classes

Each construct in the language has a corresponding AST node class:

ast.py
class AST:
    pass

class Assignment(AST):
    def __init__(self, location, value):
        self.location = location
        self.value = value

class BinOp(AST):
    def __init__(self, operator, left, right):
        self.operator = operator
        self.left = left
        self.right = right

class Number(AST):
    def __init__(self, value):
        self.value = value

class Identifier(AST):
    def __init__(self, name):
        self.name = name

class IfStatement(AST):
    def __init__(self, condition, then_body, else_body):
        self.condition = condition
        self.then_body = then_body
        self.else_body = else_body

class WhileLoop(AST):
    def __init__(self, condition, body):
        self.condition = condition
        self.body = body

class FunctionDef(AST):
    def __init__(self, name, params, return_type, body):
        self.name = name
        self.params = params
        self.return_type = return_type
        self.body = body
First Rule = Top of Grammar

SLY treats the first decorated rule (first method with @_(...)) as the top of the grammar. This should be your highest-level construct, typically representing an entire program or module.
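To see the earlier diagram as data, here is the right-hand side of a = 2 + 3 * (4 + 5) built from trimmed-down node classes, plus an illustrative evaluate helper (not part of the compiler) that confirms the tree shape encodes precedence:

```python
class BinOp:
    def __init__(self, operator, left, right):
        self.operator, self.left, self.right = operator, left, right

class Number:
    def __init__(self, value):
        self.value = value

# 2 + 3 * (4 + 5), as the parser would build it:
# '*' binds tighter than '+', so it sits deeper in the tree.
tree = BinOp('+', Number(2),
             BinOp('*', Number(3),
                   BinOp('+', Number(4), Number(5))))

def evaluate(node):
    '''Tiny recursive walk over the tree (illustration only).'''
    if isinstance(node, Number):
        return node.value
    ops = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}
    return ops[node.operator](evaluate(node.left), evaluate(node.right))

print(evaluate(tree))  # 29
```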

Phase 3

Type Checking & Semantic Analysis

Validating program correctness before code generation.

The Visitor Pattern

Type checking traverses the AST using the Visitor pattern. A base NodeVisitor class dispatches to specific visit_* methods based on node type:

checker.py
class TypeChecker(NodeVisitor):
    def __init__(self):
        self.symbols = {}  # Symbol table: name → type
    
    def visit_Number(self, node):
        node.type = 'int'  # Annotate with type
    
    def visit_Identifier(self, node):
        if node.name not in self.symbols:
            print(f'Error: {node.name} not defined')
        else:
            node.type = self.symbols[node.name]
    
    def visit_BinOp(self, node):
        self.visit(node.left)
        self.visit(node.right)
        left_type = getattr(node.left, 'type', None)
        right_type = getattr(node.right, 'type', None)
        
        if left_type != right_type:
            print(f'Error: Type mismatch {left_type} vs {right_type}')
            node.type = 'error'
        else:
            node.type = left_type

Symbol Tables

The compiler maintains symbol tables mapping identifiers to their types. With functions, two-level scoping is needed:

symbol_tables.py
class TypeChecker(NodeVisitor):
    def __init__(self):
        self.global_symbols = {}  # Global scope
        self.local_symbols = {}   # Local scope (inside function)
        self.in_function = False
    
    def lookup(self, name):
        # Check local first, then global
        if name in self.local_symbols:
            return self.local_symbols[name]
        return self.global_symbols.get(name)
    
    def define(self, name, type_info):
        if self.in_function:
            self.local_symbols[name] = type_info
        else:
            self.global_symbols[name] = type_info

Type Capabilities

A table defines which operations are valid for each type combination:

type_capabilities.py
# (left_type, operator, right_type) → result_type
supported_binops = {
    ('int', '+', 'int'): 'int',
    ('int', '-', 'int'): 'int',
    ('int', '*', 'int'): 'int',
    ('int', '/', 'int'): 'int',
    ('float', '+', 'float'): 'float',
    ('float', '-', 'float'): 'float',
    ('float', '*', 'float'): 'float',
    ('float', '/', 'float'): 'float',
    ('int', '<', 'int'): 'bool',
    ('float', '<', 'float'): 'bool',
    # ... etc
}
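A dictionary lookup is all the checker needs for binary operators. The check_binop helper below is illustrative (and the dict a subset of the full table), but it shows how a missing entry doubles as the "no implicit coercion" rule:

```python
# (left_type, operator, right_type) → result_type; absent = illegal
supported_binops = {
    ('int', '+', 'int'): 'int',
    ('float', '+', 'float'): 'float',
    ('int', '<', 'int'): 'bool',
}

def check_binop(left_type, op, right_type):
    '''Return the result type, or 'error' if the combination is unsupported.'''
    result = supported_binops.get((left_type, op, right_type))
    return result if result is not None else 'error'

print(check_binop('int', '+', 'int'))     # int
print(check_binop('int', '+', 'float'))   # error — no implicit coercion
print(check_binop('int', '<', 'int'))     # bool
```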
Semantic Checks Performed

Undefined names: Using a variable before declaration

Type mismatches: int + float without explicit cast

Invalid operations: char * char not supported

Function signatures: Argument count and types must match

Return type: Returned value must match declared return type

Execution Target

Stack Machine Interpreter

Direct execution of IR code on a simulated machine.

Interpreter Architecture

The interpreter simulates a stack machine with memory, a value stack, and a program counter:

interpreter.py
class Interpreter:
    def __init__(self):
        self.store = {}    # Variable storage
        self.stack = []    # Operand stack
        self.pc = 0        # Program counter
    
    def run(self, code):
        self.pc = 0
        while self.pc < len(code):
            op, *opargs = code[self.pc]
            getattr(self, f'run_{op}')(*opargs)
            self.pc += 1
    
    def push(self, item):
        self.stack.append(item)
    
    def pop(self):
        return self.stack.pop()
    
    # Variable operations
    def run_VARI(self, name):
        self.store[name] = None
    
    def run_CONSTI(self, value):
        self.push(value)
    
    def run_STORE(self, name):
        self.store[name] = self.pop()
    
    def run_LOAD(self, name):
        self.push(self.store[name])
    
    # Arithmetic operations
    def run_ADDI(self):
        self.push(self.pop() + self.pop())
    
    def run_MULI(self):
        self.push(self.pop() * self.pop())
    
    def run_SUBI(self):
        right = self.pop()
        left = self.pop()
        self.push(left - right)  # Order matters!
    
    # I/O
    def run_PRINTI(self):
        print(self.pop())

Example Execution

The following source code and its corresponding IR execution:

execution_trace.txt
// Source:
var x int = 4;
var y int = 5;
var d int = x * x + y * y;
print d;

// IR Code:
[('VARI', 'x'),     // Declare x
 ('CONSTI', 4),     // Push 4
 ('STORE', 'x'),     // x = 4
 ('VARI', 'y'),     // Declare y
 ('CONSTI', 5),     // Push 5
 ('STORE', 'y'),     // y = 5
 ('VARI', 'd'),     // Declare d
 ('LOAD', 'x'),      // Push x (4)
 ('LOAD', 'x'),      // Push x (4)
 ('MULI',),         // Pop 4,4 → Push 16
 ('LOAD', 'y'),      // Push y (5)
 ('LOAD', 'y'),      // Push y (5)
 ('MULI',),         // Pop 5,5 → Push 25
 ('ADDI',),         // Pop 16,25 → Push 41
 ('STORE', 'd'),     // d = 41
 ('LOAD', 'd'),      // Push d (41)
 ('PRINTI',)]       // Print 41

// Output: 41
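A stripped-down version of the interpreter can execute this exact trace. In this sketch, PRINTI collects values into a list rather than printing so the run is easy to inspect, and the pc loop is omitted since the trace contains no control flow:

```python
class MiniInterpreter:
    '''Minimal stack-machine interpreter for the straight-line trace above.'''
    def __init__(self):
        self.store, self.stack, self.output = {}, [], []
    def run(self, code):
        for op, *args in code:
            getattr(self, f'run_{op}')(*args)
    def run_VARI(self, name):    self.store[name] = None
    def run_CONSTI(self, value): self.stack.append(value)
    def run_STORE(self, name):   self.store[name] = self.stack.pop()
    def run_LOAD(self, name):    self.stack.append(self.store[name])
    def run_MULI(self):          self.stack.append(self.stack.pop() * self.stack.pop())
    def run_ADDI(self):          self.stack.append(self.stack.pop() + self.stack.pop())
    def run_PRINTI(self):        self.output.append(self.stack.pop())

code = [('VARI', 'x'), ('CONSTI', 4), ('STORE', 'x'),
        ('VARI', 'y'), ('CONSTI', 5), ('STORE', 'y'),
        ('VARI', 'd'),
        ('LOAD', 'x'), ('LOAD', 'x'), ('MULI',),
        ('LOAD', 'y'), ('LOAD', 'y'), ('MULI',),
        ('ADDI',),
        ('STORE', 'd'), ('LOAD', 'd'), ('PRINTI',)]

interp = MiniInterpreter()
interp.run(code)
print(interp.output)  # [41]
```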
Why Interpret?

Direct interpretation enables rapid development and testing without a compilation step. This is essentially how CPython, Ruby, and PHP execute programs: a virtual machine steps through bytecode one instruction at a time. The tradeoff is performance: interpreted code runs substantially slower than compiled native code.

Intermediate Representation

Stack Machine IR Design

The portable bridge between frontend and backends.

Design Philosophy

No Registers: Operands are implicitly on the stack—no register allocation needed.

Structured Control: IF/ELSE/ENDIF and LOOP/CBREAK/ENDLOOP—no goto or labels.

WebAssembly Alignment: Near-direct translation to Wasm's stack-based model.

Complete IR Opcode Reference

Category       Opcodes                                     Stack Effect
Constants      CONSTI, CONSTF                              → value
Arithmetic     ADDI/F, SUBI/F, MULI/F, DIVI/F              a, b → result
Comparison     LTI/F, LEI/F, GTI/F, GEI/F, EQI/F, NEI/F    a, b → bool
Logic          ANDI, ORI                                   a, b → result
Local Vars     VARI, VARF                                  — (declare)
Global Vars    GLOBALI, GLOBALF                            — (declare)
Load/Store     LOAD, STORE                                 → value / value →
Conversion     ITOF, FTOI                                  value → converted
I/O            PRINTI, PRINTF, PRINTB                      value → (output)
Memory         GROW, PEEKI/F/B, POKEI/F/B                  varies
Control        IF, ELSE, ENDIF                             cond → (branch)
Loops          LOOP, CBREAK, ENDLOOP                       varies
Functions      CALL, RETURN                                args → result

Structured Control Flow

The IR uses structured control flow without goto statements or labels:

structured_control.txt
// If-statement pattern:
... evaluate condition ...
('IF',)
... consequent code ...
('ELSE',)
... alternative code ...
('ENDIF',)

// While-loop pattern:
('LOOP',)
... evaluate test (invert condition!) ...
('CBREAK',)          // Break if test is true
... loop body ...
('ENDLOOP',)         // Jump back to LOOP
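Because there are no labels, an interpreter or backend that reaches IF must scan forward for the matching ELSE or ENDIF, counting nested blocks along the way. The find_matching helper below is an illustrative sketch of that scan, not the compiler's actual code:

```python
def find_matching(code, start, open_op, close_ops):
    '''Return the index of the marker in close_ops that matches the
    open_op at index start, skipping over nested open_op blocks.'''
    depth = 0
    for i in range(start + 1, len(code)):
        op = code[i][0]
        if op == open_op:
            depth += 1                   # entering a nested block
        elif op in close_ops:
            if depth == 0:
                return i                 # found our marker
            if op == close_ops[-1]:      # only the real closer ends a nested block
                depth -= 1
    raise ValueError('unbalanced control flow in IR')

code = [('IF',), ('CONSTI', 1), ('ELSE',), ('CONSTI', 2), ('ENDIF',)]
print(find_matching(code, 0, 'IF', ('ELSE', 'ENDIF')))  # 2

nested = [('IF',), ('IF',), ('ENDIF',), ('ELSE',), ('ENDIF',)]
print(find_matching(nested, 0, 'IF', ('ELSE', 'ENDIF')))  # 3
```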

Deep Dive

WebAssembly Binary Encoding

Direct .wasm generation without external tools.

What is WebAssembly?

WebAssembly (Wasm) is a binary instruction format designed as a portable compilation target. It runs in web browsers at near-native speed in a memory-safe, sandboxed environment. Key characteristics:

Key Characteristics

Binary Format: Compact binary encoding enables faster parsing than JavaScript text.

Stack Machine: Operations pop operands from and push results onto an implicit stack.

Structured Control: Uses block/loop/if instructions—no arbitrary goto.

Four Value Types: i32, i64, f32, f64—that's it!

LEB128 Variable-Length Encoding

WebAssembly uses LEB128 (Little Endian Base 128) for integers. Each byte uses 7 bits for data and 1 bit as a continuation flag:

leb128.py
import struct

def encode_unsigned(value):
    '''LEB128 unsigned integer encoding.'''
    parts = []
    while value > 0x7f:
        parts.append((value & 0x7f) | 0x80)  # 7 data bits + continuation flag
        value >>= 7
    parts.append(value)  # Final byte: continuation flag clear
    return bytes(parts)

def encode_signed(value):
    '''LEB128 signed integer encoding (handles negatives).'''
    parts = []
    while True:
        byte = value & 0x7f
        value >>= 7  # Python's >> is arithmetic: the sign is preserved
        # Stop once the remaining value matches the sign bit of this byte
        if (value == 0 and not (byte & 0x40)) or (value == -1 and (byte & 0x40)):
            parts.append(byte)
            return bytes(parts)
        parts.append(byte | 0x80)

def encode_f64(value):
    '''Encode a 64-bit float as little-endian IEEE 754.'''
    return struct.pack('<d', value)

# Examples:
# encode_unsigned(0)    → b'\x00'       (1 byte)
# encode_unsigned(127)  → b'\x7f'       (1 byte)
# encode_unsigned(128)  → b'\x80\x01'   (2 bytes)
# encode_unsigned(624485) → b'\xe5\x8e\x26' (3 bytes)
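A decoder is handy for checking byte sequences by hand. The helper below is not part of the compiler — just the inverse of the unsigned encoding, verified against the examples above:

```python
def decode_unsigned(data):
    '''Decode a LEB128 unsigned integer from bytes.'''
    result, shift = 0, 0
    for byte in data:
        result |= (byte & 0x7f) << shift  # 7 data bits per byte, little end first
        shift += 7
        if not (byte & 0x80):             # continuation flag clear: done
            break
    return result

print(decode_unsigned(b'\x7f'))          # 127
print(decode_unsigned(b'\x80\x01'))      # 128
print(decode_unsigned(b'\xe5\x8e\x26'))  # 624485
```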

Module Structure

A .wasm file begins with magic number + version, followed by numbered sections:

Section    ID   Contents
Type        1   Function type signatures (param/return types)
Import      2   External functions (JavaScript FFI)
Function    3   Function declarations (type indices only)
Memory      5   Linear memory specification (pages × 64KB)
Global      6   Global variables with initializers
Export      7   Names visible to JavaScript host
Start       8   Entry point function index
Code       10   Function bodies (locals + instructions)
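Section framing is uniform: a one-byte section ID, a LEB128 byte count, then the payload. The sketch below assembles a minimal module (header plus a Memory section declaring one 64KB page); encode_section is an illustrative helper, and a copy of the LEB128 encoder is included so the snippet stands alone:

```python
def encode_unsigned(value):
    '''Minimal LEB128 unsigned encoder (same as earlier in the document).'''
    parts = []
    while value > 0x7f:
        parts.append((value & 0x7f) | 0x80)
        value >>= 7
    parts.append(value)
    return bytes(parts)

def encode_section(section_id, contents):
    '''Every section: ID byte, LEB128 payload length, payload.'''
    return bytes([section_id]) + encode_unsigned(len(contents)) + contents

MAGIC = b'\x00asm'             # Magic number
VERSION = b'\x01\x00\x00\x00'  # Version 1, little endian

# Memory section (ID 5): vector of 1 memory, limits flags=0 (no max), min=1 page
memory_section = encode_section(5, bytes([0x01, 0x00, 0x01]))

module = MAGIC + VERSION + memory_section
print(module.hex())
```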

Value Type Encoding

i32   0x7F
i64   0x7E
f32   0x7D
f64   0x7C

Instruction Opcodes

Arithmetic

i32.add    0x6A
i32.sub    0x6B
i32.mul    0x6C
i32.div_s  0x6D
f64.add    0xA0
f64.sub    0xA1
f64.mul    0xA2
f64.div    0xA3

Comparisons

i32.eq    0x46
i32.ne    0x47
i32.lt_s  0x48
i32.gt_s  0x4A
i32.le_s  0x4C
i32.ge_s  0x4E
f64.lt    0x63
f64.gt    0x64
f64.le    0x65
f64.ge    0x66

Control Flow

block   0x02
loop    0x03
if      0x04
else    0x05
end     0x0B
br      0x0C
br_if   0x0D
return  0x0F
call    0x10

Variables & Memory

local.get   0x20
local.set   0x21
global.get  0x23
global.set  0x24
i32.load    0x28
f64.load    0x2B
i32.store   0x36
f64.store   0x39

Constants

i32.const  0x41 + LEB128 value
f64.const  0x44 + 8 bytes (IEEE 754)

Wasm Loop Encoding

Loops require an outer block for break targets:

wasm_loop.txt
block {              // Outer block for break target
  loop {             // Inner loop for continue target
    ... evaluate test ...
    br_if 1          // Break to enclosing block if true
    ... loop body ...
    br 0             // Jump back to loop start
  }
  // br_if 1 targets here
}
No Text Names in Wasm

Everything in WebAssembly uses numeric indices—no strings! Functions, locals, globals, and types are all referenced by index. The compiler must maintain mapping tables from names to indices during code generation.

Deep Dive

LLVM Code Generation

High-performance native code via llvmlite.

LLVM Architecture

LLVM is a modular compiler infrastructure that provides a target-independent intermediate representation. The same IR can compile to x86, ARM, RISC-V, or WebAssembly.

Key Concepts

SSA Form: Static Single Assignment—each variable is assigned exactly once. This enables powerful optimizations like constant propagation and dead code elimination.

Basic Blocks: Sequences of instructions with single entry and exit points, connected by explicit branch instructions.

Not a Stack Machine: Unlike our IR, LLVM uses registers. Operations produce new temporaries that must be explicitly stored.

LLVM Module Setup

llvm_setup.py
from llvmlite.ir import (
    Module, Function, FunctionType, IRBuilder,
    IntType, DoubleType, VoidType, Constant
)

# Create module and types
mod = Module('mycompiler')
int_type = IntType(32)      # i32
float_type = DoubleType()   # f64
void_type = VoidType()

# Declare a function: int hello()
hello_func = Function(mod, 
    FunctionType(int_type, []), 
    name='hello')

# Add a basic block and builder
block = hello_func.append_basic_block('entry')
builder = IRBuilder(block)

# Return a constant
builder.ret(Constant(int_type, 42))

Stack-to-SSA Translation

The core challenge: translating stack-based IR to SSA form. A value stack tracks intermediate LLVM values:

llvm_codegen.py
class GenerateLLVM:
    def __init__(self):
        self.stack = []        # LLVM value stack
        self.locals = {}       # Local variable pointers
        self.globals = {}      # Global variable pointers
        self.block_stack = []  # Control flow blocks
    
    def push(self, value):
        self.stack.append(value)
    
    def pop(self):
        return self.stack.pop()
    
    def emit_ADDI(self):
        right = self.pop()
        left = self.pop()
        result = self.builder.add(left, right)  # SSA!
        self.push(result)
    
    def emit_MULI(self):
        right = self.pop()
        left = self.pop()
        self.push(self.builder.mul(left, right))
    
    def emit_ADDF(self):
        right = self.pop()
        left = self.pop()
        self.push(self.builder.fadd(left, right))  # Float add
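The same discipline can be demonstrated without llvmlite at all: each binary opcode pops two operands and mints a fresh temporary, which is exactly what makes the output SSA. stack_ir_to_ssa below is an illustrative helper that mimics LLVM's textual IR, not the compiler's GenerateLLVM class:

```python
def stack_ir_to_ssa(code):
    '''Translate straight-line stack IR to SSA-style text lines.'''
    stack, lines, counter = [], [], 0
    for op, *args in code:
        if op == 'CONSTI':
            stack.append(str(args[0]))        # constants are used directly
        elif op in ('ADDI', 'MULI'):
            right, left = stack.pop(), stack.pop()
            temp = f'%t{counter}'             # fresh temporary: assigned once
            counter += 1
            opname = {'ADDI': 'add', 'MULI': 'mul'}[op]
            lines.append(f'{temp} = {opname} i32 {left}, {right}')
            stack.append(temp)
    return lines

# 2 + 3 * 4 as stack IR:
ssa = stack_ir_to_ssa([('CONSTI', 2), ('CONSTI', 3), ('CONSTI', 4),
                       ('MULI',), ('ADDI',)])
print(ssa)  # ['%t0 = mul i32 3, 4', '%t1 = add i32 2, %t0']
```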

Variable Storage

Local variables use alloca for stack slots. Globals use GlobalVariable:

variables.py
from llvmlite.ir import GlobalVariable

def emit_VARI(self, name):
    '''Declare local integer variable.'''
    self.locals[name] = self.builder.alloca(int_type, name=name)

def emit_GLOBALI(self, name):
    '''Declare global integer variable.'''
    var = GlobalVariable(self.module, int_type, name=name)
    var.initializer = Constant(int_type, 0)
    self.globals[name] = var

def emit_LOAD(self, name):
    '''Load variable onto stack.'''
    ptr = self.locals.get(name) or self.globals.get(name)
    self.push(self.builder.load(ptr, name))

def emit_STORE(self, name):
    '''Store top of stack to variable.'''
    ptr = self.locals.get(name) or self.globals.get(name)
    self.builder.store(self.pop(), ptr)

Comparison Operations

LLVM comparisons return i1 (1-bit bool). Zero-extend to i32 for stack compatibility:

comparisons.py
def emit_LTI(self):
    '''Integer less-than comparison.'''
    right = self.pop()
    left = self.pop()
    # icmp returns i1 (1-bit bool)
    result = self.builder.icmp_signed('<', left, right)
    # Zero-extend i1 → i32 for stack
    self.push(self.builder.zext(result, IntType(32)))

def emit_LTF(self):
    '''Float less-than comparison.'''
    right = self.pop()
    left = self.pop()
    result = self.builder.fcmp_ordered('<', left, right)
    self.push(self.builder.zext(result, IntType(32)))

# Available comparisons:
# icmp_signed: '<', '<=', '>', '>=', '==', '!='
# fcmp_ordered: '<', '<=', '>', '>=', '==', '!='

Control Flow with Basic Blocks

Control flow creates new basic blocks. A block stack tracks branch targets:

control_flow.py
def emit_IF(self):
    then_block = self.function.append_basic_block('then')
    else_block = self.function.append_basic_block('else')
    exit_block = self.function.append_basic_block('endif')
    
    # Truncate i32 → i1 for branch condition
    cond = self.builder.trunc(self.pop(), IntType(1))
    self.builder.cbranch(cond, then_block, else_block)
    self.builder.position_at_end(then_block)
    self.block_stack.append((then_block, else_block, exit_block))

def emit_ELSE(self):
    _, else_block, exit_block = self.block_stack[-1]
    self.builder.branch(exit_block)  # End of then-block
    self.builder.position_at_end(else_block)

def emit_ENDIF(self):
    _, _, exit_block = self.block_stack.pop()
    self.builder.branch(exit_block)
    self.builder.position_at_end(exit_block)

def emit_LOOP(self):
    loop_block = self.function.append_basic_block('loop')
    exit_block = self.function.append_basic_block('endloop')
    self.builder.branch(loop_block)
    self.builder.position_at_end(loop_block)
    self.block_stack.append((loop_block, exit_block))

External Function Calls

I/O and runtime functions are implemented in C and linked:

runtime.py
# Declare external C function: void _print_int(int x)
_print_int = Function(mod,
    FunctionType(void_type, [int_type]),
    name='_print_int')

def emit_PRINTI(self):
    self.builder.call(_print_int, [self.pop()])

def emit_CALL(self, name):
    func = self.globals[name]
    # Pop args in reverse order
    args = [self.pop() for _ in range(len(func.args))][::-1]
    result = self.builder.call(func, args)
    self.push(result)
Every Block Must Terminate

LLVM requires every basic block to end with a terminator instruction (ret, br, or cbranch). Forgetting this causes cryptic errors. Also, you cannot add instructions after a terminator—they're unreachable.

Compiler Phases

Multi-Phase Architecture

Nine project phases building a complete compiler pipeline.

📝

1. Lexical Analysis

Tokenization using SLY framework. Recognizes keywords, identifiers, literals (int, float, char with escape codes), operators, and handles comments. Tracks line numbers for error reporting.

🌳

2. Syntax Parsing

LALR(1) parser generating typed Abstract Syntax Trees. Enforces operator precedence, associativity, and non-associativity for relational operators.

3. Type Checking

Semantic validation with two-level symbol tables (global/local). Type compatibility, undefined reference detection, function signature validation.

⚙️

4. IR Generation

Stack-based intermediate representation with typed opcodes. Target-independent code enabling multiple backends.

▶️

5. Interpreter

Direct IR execution for rapid development. Stack-based VM with simulated memory, call frames, and control flow.

🔧

6-7. LLVM Backend

Native code via llvmlite. SSA form, basic blocks, comparison with zero-extension, C runtime integration.

🌐

8. WebAssembly

Direct binary .wasm encoding. LEB128, module sections, type signatures, JavaScript FFI.