---
name: compiler-design
description: >
  Use this skill for designing and implementing compilers, assemblers, and language processors
  specifically for or targeting the Commodore 64 and 128. Covers lexical analysis, parsing,
  code generation for 6510, cross-compilation, and building assemblers in BASIC or ML.
  Sources: Compiler Design and Implementation (64 and 128), COMPUTE!'s SpeedScript source.
---

# Compiler Design and Implementation for the C64/128

## Overview

Writing a compiler or assembler on (or for) the Commodore 64 requires special consideration for:
- **Memory constraints**: 64KB total, ~38KB for BASIC programs, ~4KB free upper RAM
- **Speed**: 1 MHz CPU — interpreter overhead is expensive; native code is essential
- **Target architecture**: 6510 with its register-sparse, accumulator-centric ISA
- **Output format**: C64 `.prg` files (2-byte load address header + raw binary)

---

## Assembler Design on the C64

### Two-Pass Assembly

Standard two-pass technique (used by most C64 assemblers of the period):

**Pass 1 — Symbol collection**:
1. Scan source code line by line
2. Record each label and its current address in a symbol table
3. Count instruction bytes to advance the address counter
4. Do NOT produce output

**Pass 2 — Code generation**:
1. Re-scan source code
2. Look up all labels/symbols in the symbol table
3. Emit object bytes to output buffer or file
4. Report unresolved symbols as errors
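
The two passes above can be sketched on a modern host in Python. This is a minimal illustration, not a complete assembler: the `LABEL: MNEMONIC OPERAND` source format, the four-entry `OPCODES` table, and the helper names `parse_line`, `mode_of`, and `assemble` are all assumptions made for the example.

```python
# Minimal two-pass assembler sketch (hypothetical source format and names).
# Operands are either a label or a $hex literal; '#' marks immediate mode.

OPCODES = {  # (mnemonic, mode) -> (opcode byte, total instruction size)
    ("LDA", "imm"): (0xA9, 2),
    ("STA", "abs"): (0x8D, 3),
    ("JMP", "abs"): (0x4C, 3),
    ("RTS", "imp"): (0x60, 1),
}

def parse_line(line):
    """Split 'LABEL: MNEMONIC OPERAND ; comment' into (label, mnemonic, operand)."""
    line = line.split(";")[0].strip()
    label = None
    if ":" in line:
        label, line = (part.strip() for part in line.split(":", 1))
    parts = line.split()
    mnemonic = parts[0].upper() if parts else None
    operand = parts[1] if len(parts) > 1 else None
    return label, mnemonic, operand

def mode_of(operand):
    if operand is None:
        return "imp"
    return "imm" if operand.startswith("#") else "abs"

def assemble(source, origin):
    lines = [parse_line(text) for text in source.splitlines()]

    # Pass 1: record labels and count bytes; nothing is emitted.
    symbols, pc = {}, origin
    for label, mnemonic, operand in lines:
        if label:
            symbols[label] = pc
        if mnemonic:
            pc += OPCODES[(mnemonic, mode_of(operand))][1]

    # Pass 2: emit bytes, resolving operands from the finished symbol table.
    out = bytearray()
    for _, mnemonic, operand in lines:
        if not mnemonic:
            continue
        mode = mode_of(operand)
        out.append(OPCODES[(mnemonic, mode)][0])
        if mode == "imp":
            continue
        text = operand.lstrip("#")
        if text.startswith("$"):
            value = int(text[1:], 16)
        elif text in symbols:
            value = symbols[text]
        else:
            raise ValueError(f"unresolved symbol: {text}")
        out.append(value & 0xFF)        # low byte first (little-endian)
        if mode == "abs":
            out.append(value >> 8)
    return bytes(out)
```

Because `DONE` in a line such as `JMP DONE` is only looked up in Pass 2, it can be defined later in the source without any special handling.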

### Symbol Table Implementation

For a C64 assembler written in BASIC or ML, the symbol table is typically:
- A sorted array of name/value pairs (use binary search for speed)
- Names limited to 6-8 characters to conserve memory
- Values stored as 16-bit addresses

```basic
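100 REM SIMPLE SYMBOL TABLE IN BASIC (ARRAY-BASED)
110 REM GOSUB 100=INIT, 200=ADD, 300=FIND
120 REM SHORT NAMES: ONLY 2 CHARS ARE SIGNIFICANT, AND
130 REM SYMVAL/VALUE WOULD TOKENIZE THE KEYWORD VAL
140 DIM SN$(100): REM SYMBOL NAMES
150 DIM SV(100): REM VALUES 0-65535 (% STOPS AT 32767)
160 NS=0: RETURN
200 REM ADD SYMBOL NM$ WITH VALUE V
210 SN$(NS)=NM$: SV(NS)=V: NS=NS+1: RETURN
300 REM FIND SYMBOL NM$ (LINEAR SEARCH)
310 REM RETURNS F=VALUE, OR F=-1 IF NOT FOUND
320 F=-1: IF NS=0 THEN RETURN
330 FOR I=0 TO NS-1
340 IF SN$(I)=NM$ THEN F=SV(I): I=NS-1
350 NEXT I
360 RETURN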
```
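
The sorted-array variant with binary search, mentioned in the list above, might look like this in a host-side Python cross-assembler. It is a sketch under assumptions: the `SymbolTable` class, its method names, and the 8-character truncation default are illustrative choices, not taken from any particular assembler.

```python
import bisect

class SymbolTable:
    """Sorted parallel arrays of names and 16-bit values, searched with bisect."""

    def __init__(self, max_name_len=8):
        self.names = []    # kept in sorted order
        self.values = []   # values[i] belongs to names[i]
        self.max_name_len = max_name_len

    def _key(self, name):
        # Truncate and fold case, mimicking short fixed-width symbol names.
        return name[: self.max_name_len].upper()

    def define(self, name, value):
        key = self._key(name)
        i = bisect.bisect_left(self.names, key)
        if i < len(self.names) and self.names[i] == key:
            raise ValueError(f"duplicate symbol: {key}")
        self.names.insert(i, key)
        self.values.insert(i, value & 0xFFFF)

    def lookup(self, name):
        key = self._key(name)
        i = bisect.bisect_left(self.names, key)
        if i < len(self.names) and self.names[i] == key:
            return self.values[i]
        return None  # not found
```

Lookup takes O(log n) comparisons instead of the O(n) of a linear scan; the cost moves to insertion, which has to shift later entries to keep the arrays sorted.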

### 6510 Instruction Encoding

Each instruction consists of an opcode byte followed by 0, 1, or 2 operand bytes.
The assembler must map mnemonic + addressing mode → opcode byte.

```
Encoding pattern for most 6502 instructions (opcode bits aaabbbcc):
Bits 7-5 (aaa): instruction select within group
Bits 4-2 (bbb): addressing mode
Bits 1-0 (cc):  instruction group

Groups (instructions listed in ascending aaa order):
  cc=00: BIT, JMP, JMP(), STY, LDY, CPY, CPX   (aaa=001..111)
  cc=01: ORA, AND, EOR, ADC, STA, LDA, CMP, SBC (aaa=000..111)
  cc=10: ASL, ROL, LSR, ROR, STX, LDX, DEC, INC (aaa=000..111)

Addressing mode encoding (bbb, for group cc=01):
  000: (zp,X)    001: zp      010: #imm    011: abs
  100: (zp),Y   101: zp,X    110: abs,Y   111: abs,X

Example: LDA #imm = 101 010 01 = $A9
```
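
For the accumulator group (low two opcode bits `01`), the mnemonic and addressing mode can be combined arithmetically rather than looked up in a full table. A short Python sketch follows; the names `GROUP_01`, `MODES_01`, and `encode_group01` are illustrative, and the other groups have more exceptions than this one.

```python
# Accumulator group: opcode = aaa bbb 01, where aaa selects the instruction
# and bbb selects the addressing mode.
GROUP_01 = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]
MODES_01 = ["(zp,X)", "zp", "#imm", "abs", "(zp),Y", "zp,X", "abs,Y", "abs,X"]

def encode_group01(mnemonic, mode):
    """Return the opcode byte for an accumulator-group instruction."""
    if mnemonic == "STA" and mode == "#imm":
        raise ValueError("STA has no immediate form")
    aaa = GROUP_01.index(mnemonic)   # raises ValueError for other mnemonics
    bbb = MODES_01.index(mode)
    return (aaa << 5) | (bbb << 2) | 0b01
```

For example, `encode_group01("LDA", "#imm")` yields `0xA9` and `encode_group01("STA", "abs")` yields `0x8D`.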

### Expression Evaluator

An assembler's expression evaluator handles operands like `LABEL+2`, `$C000+OFFSET`, `>ADDR`:

```basic
' Operators to support:
' + - * / (arithmetic)
' AND OR EOR NOT (bitwise)
' < (low byte), > (high byte)
' Precedence: NOT > */  > +- > AND > OR/EOR
```
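
The precedence levels listed above map naturally onto a small recursive evaluator. The Python sketch below makes several assumptions: results wrap to 16 bits, the prefix `<` and `>` operators apply to the whole expression that follows them (assemblers differ on this), and the function names are invented for the example.

```python
import re

TOKEN_RE = re.compile(r"\s*(\$[0-9A-Fa-f]+|%[01]+|\d+|[A-Za-z_]\w*|[-+*/()<>])")
# Binary operator levels, loosest binding first: OR/EOR, AND, + -, * /
BINARY_LEVELS = [("OR", "EOR"), ("AND",), ("+", "-"), ("*", "/")]

def tokenize(text):
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise ValueError(f"bad character at column {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def apply_binary(op, a, b):
    if op == "+":
        result = a + b
    elif op == "-":
        result = a - b
    elif op == "*":
        result = a * b
    elif op == "/":
        result = a // b
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    else:  # EOR
        result = a ^ b
    return result & 0xFFFF

def evaluate(text, symbols):
    """Evaluate an operand expression; symbols maps names to 16-bit values."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos].upper() if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def top():
        # Assumption: '<' and '>' take the low/high byte of everything after them.
        if peek() == "<":
            take()
            return top() & 0xFF
        if peek() == ">":
            take()
            return (top() >> 8) & 0xFF
        return binary(0)

    def binary(level):
        if level == len(BINARY_LEVELS):
            return unary()
        left = binary(level + 1)
        while peek() in BINARY_LEVELS[level]:
            op = take().upper()
            left = apply_binary(op, left, binary(level + 1))
        return left

    def unary():
        if peek() == "NOT":
            take()
            return ~unary() & 0xFFFF
        if peek() == "-":
            take()
            return -unary() & 0xFFFF
        return primary()

    def primary():
        tok = take()
        if tok == "(":
            value = top()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return value
        if tok.startswith("$"):
            return int(tok[1:], 16)
        if tok.startswith("%"):
            return int(tok[1:], 2)
        if tok.isdigit():
            return int(tok)
        if tok in symbols:
            return symbols[tok]
        raise ValueError(f"undefined symbol or bad token: {tok}")

    value = top()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return value
```

With `symbols = {"LABEL": 0xC000}`, `evaluate("LABEL+2", symbols)` gives `0xC002` and `evaluate(">LABEL", symbols)` gives `0xC0`.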

### Forward Reference Handling

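When a label is referenced before its definition, the handling depends on the assembler's structure. In the two-pass design above, Pass 1 only needs to reserve the correct operand size (assume a 2-byte absolute address for any symbol not yet defined), and Pass 2 resolves the reference from the completed symbol table. A one-pass assembler has to backpatch instead:
1. Emit placeholder bytes (typically $00 $00)
2. Record the location and label name in a fixup table
3. After the pass completes, apply fixups using the resolved symbol table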
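
A fixup table of this kind can be sketched in a few lines of host-side Python. The `OnePassEmitter` class and its methods are hypothetical names for illustration; only absolute (2-byte) operands are handled.

```python
class OnePassEmitter:
    """Emits code in a single pass and backpatches forward references."""

    def __init__(self, origin):
        self.origin = origin
        self.code = bytearray()
        self.symbols = {}
        self.fixups = []   # list of (offset into code, label name)

    @property
    def pc(self):
        return self.origin + len(self.code)

    def label(self, name):
        self.symbols[name] = self.pc

    def emit(self, *data):
        self.code += bytes(data)

    def emit_abs(self, opcode, target):
        """Emit opcode plus a 16-bit address; use a placeholder if unknown."""
        self.code.append(opcode)
        if target in self.symbols:
            addr = self.symbols[target]
        else:
            self.fixups.append((len(self.code), target))
            addr = 0x0000                      # placeholder $00 $00
        self.code += bytes([addr & 0xFF, addr >> 8])

    def finish(self):
        """Patch every recorded location now that all labels are known."""
        for offset, name in self.fixups:
            if name not in self.symbols:
                raise ValueError(f"unresolved symbol: {name}")
            addr = self.symbols[name]
            self.code[offset] = addr & 0xFF
            self.code[offset + 1] = addr >> 8
        return bytes(self.code)
```

The design trades the second source scan for extra memory: every unresolved reference costs one fixup entry until `finish()` runs.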

---

## Lexical Analysis (Tokenizer)

Lexical analysis is the first stage of any compiler: it breaks the source text into tokens.

### Token Types for an Assembler
```
LABEL      — identifier followed by ':'
OPCODE     — recognized mnemonic (LDA, STA, etc.)
DIRECTIVE  — assembler directive (.BYTE, .WORD, .TEXT, *= etc.)
NUMBER     — decimal nnn, $nnn hex, %nnn binary, 'c' char literal
STRING     — quoted text "..."
OPERATOR   — + - * / < > = ( )
COMMA      — ,
NEWLINE    — end of logical line
EOF        — end of input
```
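
A host-side tokenizer for these token types can be built from one combined regular expression. The Python sketch below is illustrative only: the `MNEMONICS` set is a small subset, `#` is added to the operator set so that immediate operands tokenize, and the extra `SYMBOL` kind (an identifier that is neither a label definition nor a mnemonic) is an assumption not in the list above.

```python
import re

MNEMONICS = {"LDA", "STA", "LDX", "LDY", "JMP", "JSR", "RTS", "BNE", "BEQ"}  # subset

TOKEN_SPEC = [
    ("COMMENT",   r";[^\n]*"),
    ("STRING",    r'"[^"\n]*"'),
    ("NUMBER",    r"\$[0-9A-Fa-f]+|%[01]+|\d+|'[^'\n]'"),
    ("DIRECTIVE", r"\.[A-Za-z]+|\*="),
    ("LABEL",     r"[A-Za-z_]\w*:"),
    ("IDENT",     r"[A-Za-z_]\w*"),
    ("COMMA",     r","),
    ("OPERATOR",  r"[-+*/<>=()#]"),
    ("NEWLINE",   r"\n"),
    ("SKIP",      r"[ \t\r]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (kind, text) pairs ending with an EOF token."""
    tokens, pos = [], 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if not match:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("SKIP", "COMMENT"):
            continue                      # whitespace and comments produce no token
        if kind == "IDENT":
            kind = "OPCODE" if text.upper() in MNEMONICS else "SYMBOL"
        elif kind == "LABEL":
            text = text[:-1]              # drop the trailing ':'
        tokens.append((kind, text))
    tokens.append(("EOF", ""))
    return tokens
```

The order of `TOKEN_SPEC` matters: `*=` has to be tried before the `*` operator, and `LABEL` before the plain identifier pattern.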

### Efficient Lexer in ML

```asm
; Simple character classifier for assembler lexer
; Input: A = character
; Output: A = token class (0=whitespace, 1=alpha, 2=digit, 3=operator, 4=EOL)
; After CMP, carry clear means A < operand (unsigned), so BCC is the
; idiomatic "less than" test for these range checks.

CLASSIFY:
        CMP #$20        ; space
        BEQ IS_SPACE
        CMP #$0D        ; CR
        BEQ IS_EOL
        CMP #$30        ; '0'
        BCC IS_OP
        CMP #$3A        ; past '9'
        BCC IS_DIGIT
        CMP #$41        ; 'A'
        BCC IS_OP
        CMP #$5B        ; past 'Z'
        BCC IS_ALPHA
        ; default: operator
IS_OP:
        LDA #3
        RTS
IS_SPACE:
        LDA #0
        RTS
IS_EOL:
        LDA #4
        RTS
IS_DIGIT:
        LDA #2
        RTS
IS_ALPHA:
        LDA #1
        RTS
```

---

## Parser Design (Recursive Descent)

A recursive-descent parser is ideal for the C64's memory constraints because:
- Small code size (each rule is a subroutine)
- No separate parse table needed
- Easy to hand-code in ML

### Grammar for a Simple BASIC-like Language

```
program     → statement*
statement   → LET var '=' expr NEWLINE
            | PRINT expr NEWLINE
            | IF expr THEN statement
            | GOTO number NEWLINE
            | FOR var '=' expr TO expr [STEP expr] NEWLINE
            | NEXT [var] NEWLINE
            | END NEWLINE

expr        → term (('+' | '-') term)*
term        → factor (('*' | '/') factor)*
factor      → NUMBER | STRING | VAR | '(' expr ')' | '-' factor | NOT factor
```
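
Each grammar rule becomes one routine. The Python sketch below covers only the `expr`, `term`, and `factor` rules and builds a nested-tuple syntax tree. It assumes the input is already a list of token strings and omits `STRING`; the `Parser` class and the tuple shapes are illustrative.

```python
class Parser:
    """Recursive-descent parser for the expr/term/factor rules."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self):
        # expr → term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        # term → factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor → NUMBER | VAR | '(' expr ')' | '-' factor | NOT factor
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok == "-":
            return ("neg", self.factor())
        if tok == "NOT":
            return ("not", self.factor())
        if tok.isdigit():
            return ("num", int(tok))
        if tok.isalpha():
            return ("var", tok)
        raise SyntaxError(f"unexpected token {tok!r}")
```

Operator precedence falls out of the call structure: `term` is called from `expr`, so `*` and `/` bind tighter than `+` and `-`, and the `while` loops make both levels left-associative.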

### Parser Subroutine Template (ML)
```asm
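; Parse an expression; result in FAC1 (using BASIC math)
; Returns with carry set on error, clear on success
; SAVE_FAC1, FADD and FSUB are assumed local helpers: SAVE_FAC1 pushes
; FAC1 onto an evaluation stack; FADD/FSUB pop it and combine it with FAC1

PARSE_EXPR:
        JSR PARSE_TERM      ; parse first term
        BCS PERR
EXPR_LOOP:
        JSR PEEK_TOKEN      ; look at next token
        CMP #TOK_PLUS
        BEQ EXPR_ADD
        CMP #TOK_MINUS
        BEQ EXPR_SUB
        CLC                 ; done: no more + or - (CMP may leave carry set)
        RTS
EXPR_ADD:
        JSR NEXT_TOKEN      ; consume '+'
        JSR SAVE_FAC1       ; save left side
        JSR PARSE_TERM
        BCS PERR
        JSR FADD            ; FAC1 = left + FAC1
        JMP EXPR_LOOP
EXPR_SUB:
        JSR NEXT_TOKEN      ; consume '-'
        JSR SAVE_FAC1       ; save left side
        JSR PARSE_TERM
        BCS PERR
        JSR FSUB            ; FAC1 = left - FAC1
        JMP EXPR_LOOP
PERR:
        SEC
        RTS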
```

---

## Code Generation for 6510

### Register Allocation Strategy

The 6510 has only three 8-bit registers (A, X, Y), none of them fully general-purpose. Effective code generation strategies:

1. **Accumulator-primary**: Keep the most recent value in A; use X/Y for indices and loop counters
2. **Zero-page variables**: Allocate frequently used compiler temporaries in zero page ($FB-$FE free)
3. **Stack-based expression evaluation**: For complex expressions, use the hardware stack (PHA/PLA)
4. **Inline vs. subroutine**: For short sequences (≤8 bytes), inline is faster; longer sequences justify JSR

### Expression Code Generation Example

For `A + B * C` (where A, B, and C are 8-bit zero-page variables; the label `A_VAR` avoids a clash with the accumulator name `A`):
```asm
; Generated code for: A + B * C
        LDA B           ; load B
        STA TEMP        ; save
        LDA C           ; load C
        ; multiply TEMP * accumulator (no hardware multiply; needs an ML routine)
        JSR MULTIPLY    ; result in A (low byte)
        CLC
        ADC A_VAR       ; add A
        STA RESULT
```
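
A code generator that produces sequences like this from a syntax tree can be sketched in host-side Python. The sketch assumes 8-bit values, a nested-tuple tree of the form `("+", left, right)`, `("var", name)`, and `("num", value)`, and a hypothetical `MULTIPLY` routine that multiplies A by `TEMP` and returns the low byte in A. It uses the hardware stack (PHA/PLA) only when the right operand is itself a compound expression.

```python
def gen(node, out):
    """Append 6510 assembly lines to `out` that leave the node's value in A."""
    kind = node[0]
    if kind == "num":
        out.append(f"LDA #{node[1]}")
        return
    if kind == "var":
        out.append(f"LDA {node[1]}")
        return

    op, left, right = node
    gen(left, out)                               # left value now in A
    if right[0] in ("num", "var"):
        # Simple right operand: address it directly, no stack traffic.
        operand = f"#{right[1]}" if right[0] == "num" else right[1]
    else:
        # Compound right operand: park the left value on the hardware stack.
        out.append("PHA")
        gen(right, out)
        out.append("STA TEMP")
        out.append("PLA")
        operand = "TEMP"

    if op == "+":
        out += ["CLC", f"ADC {operand}"]
    elif op == "-":
        out += ["SEC", f"SBC {operand}"]
    elif op == "*":
        if operand != "TEMP":
            out += [f"LDX {operand}", "STX TEMP"]   # move operand into TEMP via X
        out.append("JSR MULTIPLY")                  # hypothetical: A = A * TEMP
    else:
        raise ValueError(f"unsupported operator: {op}")

def compile_expr(node):
    out = []
    gen(node, out)
    return out
```

Generating the left operand first and saving it with `PHA` keeps `TEMP` free for nested subexpressions, at the cost of one stack byte per level of nesting.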

### Peephole Optimization

Common optimizations for 6510 code generation:

| Pattern | Optimized |
|---------|-----------|
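| `STA $xx; LDA $xx` | `STA $xx` (remove redundant load, provided the N/Z flags from `LDA` are not tested) |
| `LDA $xx; CMP #0` | `LDA $xx` (`LDA` already sets N and Z; valid when only `BEQ`/`BNE`/`BMI`/`BPL` follow) |
| `TAX; TXA` | `TAX` (A is unchanged; remove both only if X is dead afterwards) |
| `PHA; PLA` | Remove both if the flags set by `PLA` are not needed |
| Branch over `JMP` (`BEQ skip; JMP far; skip:`) | Opposite-condition branch (`BNE far`) when the target is within branch range |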

---

## Building the C64 `.PRG` File Format

A C64 program file begins with a 2-byte load address (little-endian), followed by raw binary:

```asm
; Assembler output format for a file that loads at $C000:
.BYTE $00, $C0      ; load address: $C000 (low byte first)
.BYTE <code>        ; raw binary from $C000 onward
```

To generate from BASIC:
```basic
10 REM WRITE PRG FILE WITH LOAD ADDRESS $C000 (49152)
20 OPEN 1,8,2,"MYPRG,P,W": REM CHANNEL 2, PROGRAM FILE, WRITE
30 REM LOAD ADDRESS LOW, HIGH; TRAILING ; SUPPRESSES THE CR
40 PRINT#1,CHR$(0);CHR$(192);
50 REM ... WRITE CODE BYTES WITH PRINT#1,CHR$(B); ...
60 CLOSE 1
```
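
On a cross-development host the same file can be produced with a few lines of Python. The function names `build_prg` and `write_prg` are illustrative.

```python
import struct

def build_prg(load_address, code):
    """Return PRG file contents: 2-byte little-endian load address + raw code."""
    if not 0 <= load_address <= 0xFFFF:
        raise ValueError("load address must fit in 16 bits")
    if load_address + len(code) > 0x10000:
        raise ValueError("code runs past the end of the 64KB address space")
    return struct.pack("<H", load_address) + bytes(code)

def write_prg(path, load_address, code):
    with open(path, "wb") as handle:
        handle.write(build_prg(load_address, code))
```

For example, `build_prg(0xC000, b"\x60")` returns the three bytes `00 C0 60`: the load address followed by a single `RTS`.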

---

## SpeedScript Architecture (Real-World Example)

COMPUTE!'s SpeedScript is a complete word processor written in assembly — a model for structured C64 application design:

**Memory layout**:
- `$0200–$02FF`: I/O buffer and workspace
- `$033C–$03FB`: Cassette buffer (used for printer spooling)
- `$0801` onward: Main application code (loaded and `RUN` like a BASIC program via a `SYS` stub)
- `$0400–$07E7`: Screen RAM (1000 character cells)
- `$D800–$DBE7`: Color RAM (controlled directly)

**Key architectural patterns**:
1. **IRQ for keyboard** — custom IRQ handler for responsive input
2. **Modular subroutines** — each function (cursor move, insert, delete) is a separate JSR
3. **Kernal for I/O** — uses standard Kernal OPEN/CLOSE/BASIN/BSOUT for file operations
4. **Direct VIC-II control** — manipulates screen and color RAM directly for speed
5. **Zero-page workspace** — all frequently used pointers and counters in zero page

---

## Cross-Compilation Considerations

When building a compiler that **targets** the C64 (runs on a modern machine, produces C64 code):

1. **Little-endian 16-bit values** throughout
2. **Memory model**: Assume program loads at $0801 (BASIC stub) or $C000 (pure ML)
3. **Calling conventions** (for generated subroutines):
   - Parameters: A (1 byte), X/Y (2 bytes combined), or zero page
   - Return values: A (byte), A/Y (16-bit, A=low), FAC1 (float)
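4. **Stack depth**: The hardware stack is only 256 bytes and each `JSR` uses 2 of them, so recursion that also pushes temporaries should be capped conservatively (~20 levels)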
5. **ROM routines**: Can JSR to Kernal at $FFxx from generated code — document dependencies

### BASIC Line Header for Machine Code Stubs
```
; BASIC stub that does SYS 49152 when RUN (10 SYS 49152)
$0801: $0D,$08  ; link to next line = $080D
       $0A,$00  ; line 10
       $9E      ; SYS token
       $20,"49152",$00  ; " 49152" as text, then end-of-line marker
$080D: $00,$00  ; end of BASIC program
; Stub occupies $0801-$080E; the code at $C000 must be loaded separately
```
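
A cross-compiler can compute these bytes instead of hard-coding them. The Python sketch below builds a one-line `SYS` stub for any target address; the function name and parameters are illustrative, and it writes no space after the `SYS` token, so its byte count is one less than a stub that includes the space.

```python
SYS_TOKEN = 0x9E  # tokenized BASIC keyword SYS

def basic_sys_stub(sys_address, line_number=10, load_address=0x0801):
    """Return the bytes of a tokenized one-line BASIC program: <line> SYS<addr>."""
    # Line body: SYS token, decimal digits as text, end-of-line zero byte.
    body = bytes([SYS_TOKEN]) + str(sys_address).encode("ascii") + b"\x00"
    # The link pointer holds the address of the next line:
    # 2 bytes link + 2 bytes line number + body.
    next_line = load_address + 2 + 2 + len(body)
    return (next_line.to_bytes(2, "little")
            + line_number.to_bytes(2, "little")
            + body
            + b"\x00\x00")   # null link marks the end of the program
```

With a four-digit target the stub is 12 bytes long, so machine code placed directly after it starts at `$0801 + 12 = $080D` (2061), which is why `SYS 2061` is such a common entry point.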

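Minimal one-line BASIC stub as it appears in a listing. The line number is arbitrary (2018 is not required); what matters is that the `SYS` target matches the address where the machine code actually starts ($081A for 2074):
```basic
2018 SYS 2074
```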