Writing a compiler or assembler on (or for) the Commodore 64 requires special consideration for:
.prg files (2-byte load address header + raw binary)
Standard two-pass technique (used by most period C64 assemblers):
Pass 1 — Symbol collection:
Pass 2 — Code generation:
For a C64 assembler written in BASIC or ML, the symbol table is typically:
' Simple symbol table in BASIC (array-based pseudocode; real BASIC V2 needs
' line numbers, REM comments, and names unique in their first two characters)
DIM SYMNAM$(100) ' symbol names
DIM SYMVAL(100) ' 16-bit values 0-65535 (float: % integers are signed and stop at 32767)
NSYM% = 0
' Add symbol
SYMNAM$(NSYM%) = NAME$
SYMVAL(NSYM%) = VALUE
NSYM% = NSYM% + 1
' Find symbol (linear search)
FOR I = 0 TO NSYM%-1
IF SYMNAM$(I) = NAME$ THEN FOUND = SYMVAL(I) : GOTO FOUND_LABEL
NEXT I
' not found
Each instruction consists of an opcode byte followed by 0, 1, or 2 operand bytes. The assembler must map mnemonic + addressing mode → opcode byte.
Encoding pattern for most 6502 instructions:
Bits 7-5 (aaa): instruction select within group
Bits 4-2 (bbb): addressing mode
Bits 1-0 (cc): instruction group
Groups (mnemonics listed in order of increasing aaa):
cc=00: BIT, JMP, JMP(), STY, LDY, CPY, CPX (aaa=001 to 111)
cc=01: ORA, AND, EOR, ADC, STA, LDA, CMP, SBC (aaa=000 to 111)
cc=10: ASL, ROL, LSR, ROR, STX, LDX, DEC, INC (aaa=000 to 111)
Addressing mode encoding (for group 01):
000: (zp,X) 001: zp 010: #imm 011: abs
100: (zp),Y 101: zp,X 110: abs,Y 111: abs,X
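As an illustration of how an assembler can use this regularity, here is a minimal Python sketch for the eight instructions of the 01 group. It treats the opcode byte as three fields: a 3-bit instruction field in bits 7-5, the 3-bit addressing-mode field from the table above in bits 4-2, and the 2-bit group field in bits 1-0. The table and function names are illustrative only, and a real assembler still needs a lookup table for the irregular opcodes.

```python
# Sketch: build opcode bytes for the "group 01" instructions from bit fields.
# All names here are illustrative, not taken from any particular assembler.

GROUP_01 = 0b01

# Instruction field (bits 7-5) for the group-01 mnemonics.
INSTRUCTION = {
    "ORA": 0b000, "AND": 0b001, "EOR": 0b010, "ADC": 0b011,
    "STA": 0b100, "LDA": 0b101, "CMP": 0b110, "SBC": 0b111,
}

# Addressing-mode field (bits 4-2), matching the table above.
MODE = {
    "(zp,X)": 0b000, "zp": 0b001, "#imm": 0b010, "abs": 0b011,
    "(zp),Y": 0b100, "zp,X": 0b101, "abs,Y": 0b110, "abs,X": 0b111,
}


def encode_group01(mnemonic: str, mode: str) -> int:
    """Return the opcode byte for a group-01 mnemonic and addressing mode."""
    if mnemonic == "STA" and mode == "#imm":
        # The one hole in the pattern: there is no "store immediate".
        raise ValueError("STA has no immediate form")
    return (INSTRUCTION[mnemonic] << 5) | (MODE[mode] << 2) | GROUP_01
```

For example, encode_group01("LDA", "#imm") gives $A9 and encode_group01("STA", "abs") gives $8D, the familiar opcodes for those two forms.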
An assembler's expression evaluator handles operands like LABEL+2, $C000+OFFSET, >ADDR:
' Operators to support:
' + - * / (arithmetic)
' AND OR EOR NOT (bitwise)
' < (low byte), > (high byte)
' Precedence: NOT > */ > +- > AND > OR/EOR
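A minimal sketch of such an evaluator, written in Python as it might appear in a cross-assembler, could look like the following. It follows the precedence listed above with one function level per operator class. Several choices are assumptions rather than anything fixed by the text: keywords are upper case, / is integer division, NOT is a 16-bit complement, and a leading < or > applies to the whole expression that follows it.

```python
import operator
import re

TOKEN_RE = re.compile(r"\s*(\$[0-9A-Fa-f]+|%[01]+|\d+|[A-Za-z_]\w*|[-+*/()<>])")


def tokenize(text):
    """Split an operand expression into a list of token strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at column {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Evaluator:
    def __init__(self, tokens, symbols):
        self.tokens, self.pos, self.symbols = tokens, 0, symbols

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def binary(self, operand, ops):
        """Parse operand (op operand)* for one left-associative level."""
        value = operand()
        while self.peek() in ops:
            value = ops[self.take()](value, operand())
        return value

    # Lowest precedence first: OR/EOR, then AND, then + -, then * /.
    def or_level(self):
        return self.binary(self.and_level, {"OR": operator.or_, "EOR": operator.xor})

    def and_level(self):
        return self.binary(self.add_level, {"AND": operator.and_})

    def add_level(self):
        return self.binary(self.mul_level, {"+": operator.add, "-": operator.sub})

    def mul_level(self):
        return self.binary(self.unary, {"*": operator.mul, "/": operator.floordiv})

    def unary(self):
        if self.peek() == "NOT":
            self.take()
            return ~self.unary() & 0xFFFF  # assumption: 16-bit complement
        if self.peek() == "-":
            self.take()
            return -self.unary()
        return self.primary()

    def primary(self):
        token = self.take()
        if token is None:
            raise ValueError("unexpected end of expression")
        if token == "(":
            value = self.or_level()
            if self.take() != ")":
                raise ValueError("missing ')'")
            return value
        if token[0] == "$":
            return int(token[1:], 16)
        if token[0] == "%":
            return int(token[1:], 2)
        if token.isdigit():
            return int(token)
        if token in self.symbols:
            return self.symbols[token]
        raise ValueError(f"undefined symbol or unexpected token {token!r}")


def evaluate(text, symbols):
    """Evaluate an operand expression against a dict of symbol values."""
    ev = Evaluator(tokenize(text), symbols)
    selector = ev.take() if ev.peek() in ("<", ">") else None
    value = ev.or_level()
    if ev.peek() is not None:
        raise ValueError(f"unexpected token {ev.peek()!r}")
    if selector == "<":
        return value & 0xFF
    if selector == ">":
        return (value >> 8) & 0xFF
    return value
```

With symbols = {"ADDR": 0xC123}, evaluate(">ADDR", symbols) returns $C1 and evaluate("ADDR+2", symbols) returns $C125.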
When a label is referenced before its definition:
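The two-pass technique handles this case naturally: the first pass only measures instruction sizes and records label addresses, so by the time the second pass emits bytes every label is known. The Python sketch below shows the idea for a deliberately tiny, hypothetical instruction subset (JMP, JSR, NOP, RTS); the table, function names and source syntax are assumptions made for illustration.

```python
# Opcode and total size in bytes for a tiny illustrative instruction subset.
OPCODES = {"JMP": (0x4C, 3), "JSR": (0x20, 3), "NOP": (0xEA, 1), "RTS": (0x60, 1)}


def parse(line):
    """Split a source line into (label, mnemonic, operand); each may be None."""
    line = line.split(";", 1)[0].strip()  # drop any comment
    label = None
    if ":" in line:
        label, line = (part.strip() for part in line.split(":", 1))
    parts = line.split()
    mnemonic = parts[0].upper() if parts else None
    operand = parts[1] if len(parts) > 1 else None
    return label, mnemonic, operand


def assemble(source, origin):
    """Assemble source text at origin; return (machine code, symbol table)."""
    lines = [parse(line) for line in source.splitlines()]

    # Pass 1: walk the source, advancing the location counter and
    # recording the address of every label. No bytes are produced.
    symbols, pc = {}, origin
    for label, mnemonic, _ in lines:
        if label:
            symbols[label] = pc
        if mnemonic:
            pc += OPCODES[mnemonic][1]

    # Pass 2: emit bytes. Forward references now resolve like any other.
    out = bytearray()
    for _, mnemonic, operand in lines:
        if not mnemonic:
            continue
        opcode, size = OPCODES[mnemonic]
        out.append(opcode)
        if size == 3:
            address = symbols[operand]
            out += bytes((address & 0xFF, address >> 8))  # low byte first
    return bytes(out), symbols


SOURCE = """
START: JSR INIT    ; INIT is defined later (forward reference)
       JMP START
INIT:  RTS
"""
code, symbols = assemble(SOURCE, 0xC000)
```

Here INIT resolves to $C006 even though the JSR that uses it comes first. A fuller assembler also has to decide in pass 1 how large an instruction with a not-yet-known operand will be, which this sketch sidesteps by using only fixed-size instructions.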
The first stage of any compiler — breaking source into tokens.
LABEL — identifier followed by ':'
OPCODE — recognized mnemonic (LDA, STA, etc.)
DIRECTIVE — assembler directive (.BYTE, .WORD, .TEXT, *= etc.)
NUMBER — decimal ($nnn hex, %nnn binary, 'c' char literal)
STRING — quoted text "..."
OPERATOR — + - * / < > = ( )
COMMA — ,
NEWLINE — end of logical line
EOF — end of input
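For a cross-assembler running on a modern machine, the token classes above map well onto a regular-expression lexer. The Python sketch below is one possible arrangement, not a reference implementation: the mnemonic set is a small illustrative subset, the SYMBOL class (a name that is neither a mnemonic nor a label definition) and the # operator are additions not in the list above, and comments starting with ; are skipped.

```python
import re

MNEMONICS = {"LDA", "STA", "JMP", "JSR", "RTS"}  # small illustrative subset

# Order matters: earlier patterns win, so "*=" is tried before "*"
# and "NAME:" before a bare name.
TOKEN_SPEC = [
    ("COMMENT",   r";[^\n]*"),
    ("NEWLINE",   r"\n"),
    ("SKIP",      r"[ \t]+"),
    ("DIRECTIVE", r"\.[A-Za-z]+|\*="),
    ("LABEL",     r"[A-Za-z_]\w*:"),
    ("NUMBER",    r"\$[0-9A-Fa-f]+|%[01]+|\d+|'.'"),
    ("STRING",    r'"[^"\n]*"'),
    ("WORD",      r"[A-Za-z_]\w*"),
    ("COMMA",     r","),
    ("OPERATOR",  r"[-+*/<>=()#]"),  # '#' added for immediate operands
]
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Return a list of (kind, text) pairs ending with an EOF token."""
    tokens, pos = [], 0
    while pos < len(source):
        match = MASTER_RE.match(source, pos)
        if not match:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "WORD":
            # A bare name is an opcode if it is a known mnemonic.
            kind = "OPCODE" if text.upper() in MNEMONICS else "SYMBOL"
        tokens.append((kind, text))
    tokens.append(("EOF", ""))
    return tokens
```

A table-driven lexer like this is easy to extend: supporting a new directive or literal form usually means adding or adjusting a single pattern.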
; Simple character classifier for assembler lexer
; Input: A = character
; Output: A = token class (0=whitespace, 1=alpha, 2=digit, 3=operator, 4=EOL)
CLASSIFY:
CMP #$20 ; space
BEQ IS_SPACE
CMP #$0D ; CR
BEQ IS_EOL
CMP #$30 ; '0'
BCC IS_OP ; BCC = unsigned "less than" after CMP
CMP #$3A ; past '9'
BCC IS_DIGIT
CMP #$41 ; 'A'
BCC IS_OP
CMP #$5B ; past 'Z'
BCC IS_ALPHA
; default: operator
IS_OP LDA #3 : RTS
IS_SPACE LDA #0 : RTS
IS_EOL LDA #4 : RTS
IS_DIGIT LDA #2 : RTS
IS_ALPHA LDA #1 : RTS
A recursive-descent parser is ideal for the C64's memory constraints because:
program → statement*
statement → LET var '=' expr NEWLINE
| PRINT expr NEWLINE
| IF expr THEN statement
| GOTO number
| FOR var '=' expr TO expr [STEP expr]
| NEXT [var]
| END
expr → term (('+' | '-') term)*
term → factor (('*' | '/') factor)*
factor → NUMBER | STRING | VAR | '(' expr ')' | '-' factor | NOT factor
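; Parse an expression; result in FAC1 (using BASIC math)
; Returns with carry set on error, clear on success
PARSE_EXPR:
JSR PARSE_TERM ; parse first term
BCS PERR
EXPR_LOOP:
JSR PEEK_TOKEN ; look at next token
CMP #TOK_PLUS
BEQ EXPR_ADD
CMP #TOK_MINUS
BEQ EXPR_SUB
CLC ; CMP may have set carry; clear it to signal success
RTS ; done: no more + or -
EXPR_ADD:
JSR NEXT_TOKEN ; consume '+'
JSR SAVE_FAC1 ; save left side
JSR PARSE_TERM
BCS PERR
JSR FADD ; FAC1 = left + FAC1
JMP EXPR_LOOP ; carry after FADD is not an error flag
EXPR_SUB:
JSR NEXT_TOKEN ; consume '-'
JSR SAVE_FAC1 ; save left side
JSR PARSE_TERM
BCS PERR
JSR FSUB ; FAC1 = left - FAC1 (assumed counterpart of FADD)
JMP EXPR_LOOP
PERR SEC : RTS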
The 6510 has only three 8-bit data registers (the accumulator A and the index registers X and Y), and none of them is truly general-purpose. Effective code generation strategies:
For A + B * C (where A, B, C are zero-page variables):
; Generated code for: A + B * C
LDA B ; load B
STA TEMP ; save
LDA C ; load C
; multiply TEMP * A (need ML multiply routine)
JSR MULTIPLY ; result in A (low byte)
CLC
ADC A_VAR ; add A
STA RESULT
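One simple way to produce code like this mechanically is to walk the expression tree and keep every intermediate result in the accumulator, saving the left operand on the hardware stack while the right one is computed. The Python sketch below illustrates that strategy for 8-bit values; it is not the only approach. The tuple-based tree format and the function names are assumptions, and MULTIPLY is the same assumed routine as above (TEMP times A, low byte returned in A).

```python
def gen(node, out):
    """Append code to out that leaves the 8-bit value of node in A."""
    if isinstance(node, int):
        out.append(f"LDA #{node}")      # constant
    elif isinstance(node, str):
        out.append(f"LDA {node}")       # variable, named by its label
    else:
        op, left, right = node
        gen(left, out)
        out.append("PHA")               # save left operand on the stack
        gen(right, out)
        out.append("STA TEMP")          # right operand -> TEMP
        out.append("PLA")               # left operand -> A
        if op == "+":
            out += ["CLC", "ADC TEMP"]
        elif op == "-":
            out += ["SEC", "SBC TEMP"]
        elif op == "*":
            out.append("JSR MULTIPLY")  # assumed routine: A = TEMP * A
        else:
            raise ValueError(f"unsupported operator {op!r}")


def compile_expr(node):
    """Return the generated assembly lines for an expression tree."""
    out = []
    gen(node, out)
    return out


# A + B * C, with the multiplication nested as the right operand of '+'
listing = compile_expr(("+", "A_VAR", ("*", "B", "C")))
```

The output is longer than the hand-written sequence above because every binary operator goes through PHA, STA TEMP and PLA. TEMP is only live between the STA and the operation that follows, so nested subexpressions can reuse it safely; trimming the extra stack traffic is left to a later optimization pass.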
Common optimizations for 6510 code generation:
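| Pattern | Optimized |
|---|---|
| STA $xx; LDA $xx | STA $xx (remove redundant load) |
| LDA #0 | Keep LDA #0; it already sets N and Z, so AND #0 saves nothing (both are 2 bytes, 2 cycles) |
| TAX; TXA | Remove TXA (A is unchanged); remove TAX as well if X is not used afterwards |
| PHA; PLA | Remove both if A is not changed in between and the flags set by PLA are not needed |
| Branch over branch | Convert to opposite-condition branch |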
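Rewrites like these are usually applied by a peephole pass over the generated instruction list. As a minimal Python sketch, here is the first rule only (a load that immediately follows a store to the same operand). The one-instruction-per-string format and the function name are assumptions for illustration.

```python
def peephole(lines):
    """Drop an LDA that immediately follows an STA to the same operand."""
    out = []
    for line in lines:
        parts = line.split()
        previous = out[-1].split() if out else []
        if (parts[:1] == ["LDA"] and previous[:1] == ["STA"]
                and parts[1:] == previous[1:]):
            continue  # A already holds the value that was just stored
        out.append(line)
    return out
```

Two caveats apply even to this small rule. The removed LDA would have set the N and Z flags, so the rewrite is only safe when no following branch depends on them. It is also unsafe for hardware registers, where reading back a location need not return the value just written. A line that starts with a label is left alone here because its first word is not LDA, which conveniently keeps branch targets intact.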
.PRG File Format
A C64 program file begins with a 2-byte load address (little-endian), followed by raw binary:
; Assembler output format for a file that loads at $C000:
.BYTE $00, $C0 ; load address: $C000 (low byte first)
.BYTE <code> ; raw binary from $C000 onward
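On the cross-development side the same layout takes only a few lines. This Python sketch (the function name is illustrative) prepends the little-endian load address to the assembled bytes:

```python
import struct


def make_prg(load_address: int, code: bytes) -> bytes:
    """Return .prg file contents: 2-byte little-endian address, then code."""
    if not 0 <= load_address <= 0xFFFF:
        raise ValueError("load address must fit in 16 bits")
    return struct.pack("<H", load_address) + code


# A one-byte program (RTS) that loads at $C000.
prg = make_prg(0xC000, bytes([0x60]))
```

The result can be written with open(path, "wb"), and on the C64 a load with secondary address 1, such as LOAD "NAME",8,1, places it at the address stored in the header.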
To generate from BASIC:
' Write PRG file with load address $C000 (49152)
OPEN 1,8,1,"MYPRG,P,W" ' SA=1 for raw PRG output
PRINT#1, CHR$(0);CHR$(192); ' load address low, high (trailing ; stops PRINT# from appending a CR)
' ... write code bytes ...
CLOSE 1
COMPUTE!'s SpeedScript is a complete word processor written in assembly, and a model for structured C64 application design:
Memory layout:
$0200–$02FF: I/O buffer and workspace
$033C–$03FB: Cassette buffer (used for printer spooling)
$C000–$CFFF: Main application code (4KB upper RAM)
$0400–$07FF: Screen display buffer
$D800–$DBFF: Color RAM (controlled directly)
Key architectural patterns:
When building a compiler that targets the C64 (runs on a modern machine, produces C64 code):
; BASIC stub that does SYS 49152 when RUN
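$0801: $0D,$08 ; link to next line = $080D
$0A,$00 ; line 10
$9E ; SYS token
$20,"49152",$00 ; " 49152" as text, then the end-of-line marker
$080D: $00,$00 ; end of BASIC program
; Machine code can follow at $080F (with the SYS target changed to 2063) or be loaded separately at $C000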
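Because the link pointer depends on how many digits the SYS address has, a cross-compiler is better off computing the stub than hard-coding it. A minimal Python sketch follows; the function and parameter names are illustrative, and it assumes the stub is a single line of the form "SYS nnnnn" preceded by one space.

```python
def basic_stub(sys_address: int, line_number: int = 10, start: int = 0x0801) -> bytes:
    """Return the bytes of a one-line BASIC program: <line_number> SYS <sys_address>."""
    # Line body: SYS token, a space, the address as decimal digits, end-of-line 0.
    body = bytes([0x9E, 0x20]) + str(sys_address).encode("ascii") + b"\x00"
    # The link points just past this line: 2 link bytes + 2 line-number bytes + body.
    next_line = start + 2 + 2 + len(body)
    return (
        next_line.to_bytes(2, "little")
        + line_number.to_bytes(2, "little")
        + body
        + b"\x00\x00"  # null link = end of BASIC program
    )


stub = basic_stub(49152)
code_start = 0x0801 + len(stub)  # first free address after the stub
```

For SYS 49152 the stub is 14 bytes long, so machine code appended directly after it would begin at $080F.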
Minimal BASIC stub loader (5-line BASIC):
2018 SYS 2074 ' the line number is arbitrary (2018 here); the SYS target must be the address where the machine code starts