A breakdown and guide to the Commodore .PRG file format, and creating and running prg files on the VIC-20.
		Commodore's .prg file type is a simple executable file format used on the PET,
		VIC-20, C64, and C128. The format itself is simple (BASIC? Ha!), but how a .prg
		file is handled on a particular Commodore system, and what the format can and
		cannot do, require some elaboration.
	
		HEADER + PROGRAM
		The gross anatomy of a Commodore .prg file is a two-byte header followed by 
		the sequence of bytes that comprise the body of the program.
	
HEADER: The two-byte header gives the memory address at which the program is to be loaded. This is almost always the start of the user's BASIC program RAM space: 0x1001 on the unexpanded VIC-20 and 0x0801 on the C64. (On the VIC-20, user RAM starts at 0x1000, but the byte at that address must remain zero, so the program begins one byte later.) The header itself is not copied into memory; only the program bytes that follow it are loaded, starting at the designated address.
PROGRAM: The program body of a .prg is a tokenized BASIC program, but by
		using the SYS command the program can also contain and run 6502 machine code. Thus a .prg
		executable can be a pure BASIC program, a 6502 machine code program (paired with a 1-line
		BASIC launch program), or a mix of the two (using SYS to move from executing BASIC code to
		machine code, and calling the kernel subroutine for parsing tokenized BASIC to move from
		executing machine code back to BASIC code).
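
As a minimal sketch of the header/program split described above, the following Python reads the two-byte little-endian load address and separates it from the program bytes. The function name `split_prg` is made up for illustration and is not part of any Commodore tooling; the sample bytes are the first six bytes of hello-world.prg.

```python
import struct


def split_prg(data: bytes) -> tuple:
    """Return (load_address, body) for the raw contents of a .prg file."""
    if len(data) < 2:
        raise ValueError("a .prg file needs at least a 2-byte header")
    # "<H" = one unsigned 16-bit integer, little-endian (low byte first)
    (load_address,) = struct.unpack("<H", data[:2])
    return load_address, data[2:]


# Header 01 10, followed by the first four bytes of the BASIC line
sample = bytes.fromhex("01100d100a00")
address, body = split_prg(sample)
print(hex(address), body.hex())
```

The body returned here is what ends up in memory, with its first byte at the load address.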
	
		Once loaded, the .prg program can be run by typing RUN into the 
		system BASIC interpreter. 
	
Here is a breakdown of the hello-world.prg program produced by the simple Hello World program on this site.
NOTE: All multi-byte values (e.g., addresses) are stored in little-endian order: low byte first, high byte second.
	
HEX DUMP
Hex dump of our simple hello-world.prg
	
		
00000000: 0110 0d10 0a00 9e28 3431 3131 2900 0000  .......(4111)...
00000010: 4c20 1048 454c 4c4f 2057 4f52 4c44 110d  L .HELLO WORLD..
00000020: 00a2 008a 48bd 1210 f009 20d2 ff68 aae8  ....H..... ..h..
00000030: 4c22 10ea                                L"..           
		
	
	
	
		Header: The first two bytes, 01 10, are the header, giving the memory address at which
		the program will be loaded. Read in little-endian order, they give the address 0x1001.
	
		BASIC Launch Program: The next 14 bytes comprise a short BASIC program (tokenized) that launches
		the machine language portion of the program. Adding these bytes to the beginning of a 6502 assembly
		source code file effectively inserts this program at the start of your .prg file. 
		This 1-line BASIC program breaks down as follows:
 
		BASIC:  10 SYS 4111
		BYTES:  0d10 0a00 9e28 3431 3131 2900 0000
		
		
| Bytes (in file order) | Meaning |
| 0d 10 | Pointer to the address of the next BASIC line (little-endian: 0x100D) |
| 0a 00 | Line number of the current BASIC line (little-endian: 0x000A = 10) |
| 9e | SYS command token |
| 28 34 31 31 31 29 | PETSCII characters for the BASIC command argument, here: (4111) |
| 00 | Zero byte indicates the end of the current BASIC line |
| 00 00 | Two consecutive zero bytes indicate the end of the BASIC program |
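
The field layout in the table above can be checked with a short Python sketch that decodes the 14 bytes of the launch program. The helper name `parse_basic_line` and the dictionary keys are invented for this example; the sketch assumes the bytes sit in memory starting at the load address 0x1001.

```python
import struct

SYS_TOKEN = 0x9E  # BASIC token for SYS, as listed in the table


def parse_basic_line(body: bytes, load_address: int, offset: int = 0) -> dict:
    """Decode one tokenized BASIC line starting at `offset` in the program body."""
    # Two little-endian 16-bit values: next-line pointer, then line number
    next_line, line_number = struct.unpack_from("<HH", body, offset)
    start = offset + 4
    end = body.index(0x00, start)  # a zero byte ends the line
    return {
        "address": load_address + offset,
        "next_line": next_line,
        "line_number": line_number,
        "content": body[start:end],
    }


body = bytes.fromhex("0d100a009e283431313129000000")  # the 14 bytes after the header
line = parse_basic_line(body, 0x1001)

# The digits and parentheses have the same codes in PETSCII and ASCII
argument = line["content"][1:].decode("ascii")
print(line["line_number"], hex(line["next_line"]), argument)

# Following the next-line pointer lands on the end-of-program marker
end_marker = body[line["next_line"] - 0x1001:][:2]
```

Here the next-line pointer 0x100D is 12 bytes past 0x1001, which is exactly where the two closing zero bytes sit.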
10 SYS 4111 tells the VIC-20 to begin executing
		6502 machine code at the address 4111, which is 0x100F. The BASIC
		program begins at the address 0x1001 and comprises 14 bytes, so it ends at address 0x100E.
		The 6502 machine code therefore begins at the next byte, 0x100F, which corresponds exactly
		to our SYS 4111 call. In this way, we switch from executing a
		BASIC program to executing a 6502 machine language program.
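
The address arithmetic above can be turned around to generate the launch program for any load address. The sketch below is one possible way to do it, not the method used to build hello-world.prg; `build_sys_stub` is a hypothetical helper, and it assumes the machine code is placed immediately after the stub.

```python
import struct

SYS_TOKEN = 0x9E  # BASIC token for SYS


def build_sys_stub(load_address: int = 0x1001, line_number: int = 10) -> bytes:
    """Build a one-line tokenized BASIC program: <line_number> SYS(<target>)."""
    # Stub length = pointer(2) + line number(2) + token(1) + "(" + digits + ")"
    # + end-of-line(1) + end-of-program(2) = 10 + number of digits.
    # The digit count depends on the target, so try each plausible width.
    for digits in range(1, 6):
        length = 10 + digits
        target = load_address + length  # first byte after the stub
        if len(str(target)) == digits:
            break
    else:
        raise ValueError("no consistent SYS target found")
    text = b"(" + str(target).encode("ascii") + b")"
    next_line = load_address + length - 2  # address of the final 00 00
    return (struct.pack("<HH", next_line, line_number)
            + bytes([SYS_TOKEN]) + text + b"\x00" + b"\x00\x00")


stub = build_sys_stub(0x1001)
print(stub.hex(), 0x1001 + len(stub))
```

For the load address 0x1001 this reproduces the 14 bytes in the hex dump, and the first free address after the stub is 4111.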
	
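	Machine Language Program: The machine language program is broken into two parts: data and instructions. In the hex dump, the three bytes 4c 20 10 at address 0x100F (a JMP to 0x1020) come first, so that execution skips over the data and starts at the first instruction.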
	
		;== DATA =======================================================================
		; HELLO WORLD + cursor down ($11) + carriage return ($0D) + NUL terminator
		DATA	.BYTE	$48,$45,$4C,$4C,$4F,$20,$57,$4F,$52,$4C,$44,$11,$0D,$00
		;== MAIN =======================================================================
		; PRINT_CHAR is the KERNAL character-output routine (CHROUT) at $FFD2
		; (bytes 20 d2 ff in the hex dump)
		MAIN
			LDX #$00	; use X as offset
		LOOP
			TXA
			PHA		; push X to stack
			LDA DATA, X	; load A with the next character
			BEQ DONE	; if the byte in A is zero, we have reached the end of the string
			JSR PRINT_CHAR
			PLA
			TAX		; pull X off stack
			INX		; increment X (offset into char data)
			JMP LOOP
		DONE
			NOP
			JMP DONE	; loop to keep msg on screen; RESTORE to quit
	
	
	NOTE: The two-section layout here is merely a personal design choice, and it has pros and cons. Because DATA comes first, any change to its size, such as adding or removing a byte, shifts the address of everything after it, so any hardcoded addresses in the instructions that follow must be updated. Hence, this particular layout is a personal preference, not an ideal; in 6502 assembly, moving things around will alter hardcoded addresses whatever the layout, so there is no real way to win. I like the consistency in sources of error that comes from sticking to one particular schema.
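
To see which addresses are actually hardcoded in the assembled program, here is a rough Python sketch of a tiny disassembler run over the machine-code bytes copied from the hex dump (file offset 0x10 onward, i.e. memory address 0x100F onward). It only knows the eleven opcodes this program uses, and the names `OPCODES` and `disassemble` are invented for the example. The data bytes at 0x1012 to 0x101F are skipped explicitly, since they would otherwise be misread as instructions.

```python
# opcode -> (mnemonic, addressing mode, instruction length in bytes)
OPCODES = {
    0x20: ("JSR", "abs", 3), 0x48: ("PHA", "imp", 1), 0x4C: ("JMP", "abs", 3),
    0x68: ("PLA", "imp", 1), 0x8A: ("TXA", "imp", 1), 0xA2: ("LDX", "imm", 2),
    0xAA: ("TAX", "imp", 1), 0xBD: ("LDA", "abx", 3), 0xE8: ("INX", "imp", 1),
    0xEA: ("NOP", "imp", 1), 0xF0: ("BEQ", "rel", 2),
}


def disassemble(code, base, start, end):
    """Yield (address, text, absolute_operand_or_None) for addresses start..end-1."""
    pc = start
    while pc < end:
        i = pc - base
        mnemonic, mode, size = OPCODES[code[i]]
        operand = None
        if mode in ("abs", "abx"):
            operand = code[i + 1] | (code[i + 2] << 8)  # little-endian address
            text = f"{mnemonic} ${operand:04X}" + (",X" if mode == "abx" else "")
        elif mode == "imm":
            text = f"{mnemonic} #${code[i + 1]:02X}"
        elif mode == "rel":
            # signed 8-bit offset, relative to the address after the branch
            offset = code[i + 1] - 256 if code[i + 1] >= 0x80 else code[i + 1]
            text = f"{mnemonic} ${pc + 2 + offset:04X}"
        else:
            text = mnemonic
        yield pc, text, operand
        pc += size


BASE = 0x100F
code = bytes.fromhex(
    "4c2010"                                  # jump over the data
    "48454c4c4f20574f524c44110d00"            # DATA (14 bytes)
    "a2008a48bd1210f00920d2ff68aae84c2210ea"  # instructions, as shown in the dump
)
listing = (list(disassemble(code, BASE, 0x100F, 0x1012))
           + list(disassemble(code, BASE, 0x1020, BASE + len(code))))
for address, text, _ in listing:
    print(f"${address:04X}  {text}")

hardcoded = [operand for _, _, operand in listing if operand is not None]
```

Of the absolute operands found, $FFD2 is a ROM address and never moves. $1020 and $1022 lie after DATA, so they shift whenever DATA grows or shrinks, while $1012 would shift only if something before DATA changed. The BEQ uses a relative offset and is unaffected either way.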
The program itself is fairly straightforward: it loads the elements of an array (the PETSCII codes for the characters in the string) one at a time, using the X register as an offset (addressing mode: Absolute,X). X is pushed to the stack at the top of each loop iteration and pulled back at the bottom to preserve its value across calls to the PRINT_CHAR KERNAL subroutine.
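
For readers more comfortable with a high-level language, the loop can be mirrored in Python roughly as follows. This is only an illustration of the control flow, not an emulator: `run_loop` and `print_char` are stand-in names, and the Python list `stack` plays the role of the 6502 stack.

```python
DATA = bytes([0x48, 0x45, 0x4C, 0x4C, 0x4F, 0x20, 0x57, 0x4F,
              0x52, 0x4C, 0x44, 0x11, 0x0D, 0x00])


def run_loop(data, print_char):
    """Mirror the LOOP section; returns the final value of X."""
    stack = []
    x = 0                    # LDX #$00
    while True:
        stack.append(x)      # TXA / PHA: save X on the stack
        a = data[x]          # LDA DATA,X: absolute address plus X
        if a == 0:           # BEQ DONE: zero byte ends the string
            break            # (the last pushed value is never pulled)
        print_char(a)        # JSR PRINT_CHAR
        x = stack.pop()      # PLA / TAX: restore X
        x = (x + 1) & 0xFF   # INX: X is an 8-bit register
    return x


printed = []
final_x = run_loop(DATA, printed.append)
print(bytes(printed[:11]).decode("ascii"), final_x)
```

The loop hands 13 bytes to the print routine (the 11 characters plus the two control codes) and stops on the 14th, the zero terminator.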
Lastly, the program ends in an infinite loop to keep the message on the screen. Press RESTORE to exit/reset. The reason for not simply ending with an RTS is that when control returns to the BASIC console, the screen is cleared (at least that is what my machine does).
Last updated Feb 2024