This assembler generates code compatible with Microchip's midline microcontrollers, but its source syntax is incompatible with Microchip's own assembler. It should feel familiar to any PC assembly programmer. The instruction mnemonics and operand order are Intel style (i.e. 'right', as opposed to 'wrong'). The code is distributed under version 2.0 of the GPL.
You will also need
ratasm [ options ] [ filename ]

options
    -c  | --console               read input from stdin
                                  write object to stdout
                                  write listing to stderr
    -d  | --device name           specify the target processor
    -l  | --listing filename      override the default listing filename
    -l- | --no-listing            disable listing
    -o  | --object filename       override the default object filename
    -I  | --include directory     specify include directory
    -v  | --version               output assembler version
    -h  | --help                  display help
The input is a sequence of lines, each of which contains one or more of the following fields:
label instruction expression ; comment
The label and comment are optional. The expression required depends on the instruction. Operands in an expression list are given target first and source second.
The assembler is case sensitive, even for instructions. However, both uppercase and lowercase versions of the reserved words are predefined.
Valid operands to instructions include
Hex values can be specified with C-style "0x"[[:xdigit:]]+. Binary values can be specified with "0b"[01]+. Decimal values require no prefix as decimal is the default base. For compatibility with Microchip header files, hex values may also be specified with "H'[[:xdigit:]]+'". For example:
; four copies of the same value
dw 0x1fe, 0b000111111110, 510, H'01FE'
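To make the four notations concrete, here is a minimal Python sketch of a literal parser. It is not the assembler's own lexer: the function name `parse_int` is hypothetical, and accepting only an uppercase `H` prefix is an assumption based on the example above and on the assembler being case sensitive.

```python
import re

def parse_int(text: str) -> int:
    """Parse one integer literal in any of the four notations described above."""
    m = re.fullmatch(r"0x([0-9A-Fa-f]+)", text)      # C-style hex
    if m:
        return int(m.group(1), 16)
    m = re.fullmatch(r"0b([01]+)", text)             # binary
    if m:
        return int(m.group(1), 2)
    m = re.fullmatch(r"H'([0-9A-Fa-f]+)'", text)     # Microchip-style hex (assumed uppercase H only)
    if m:
        return int(m.group(1), 16)
    if re.fullmatch(r"[0-9]+", text):                # decimal is the default base
        return int(text, 10)
    raise ValueError(f"not an integer literal: {text!r}")

values = [parse_int(t) for t in ("0x1fe", "0b000111111110", "510", "H'01FE'")]
print(values)  # [510, 510, 510, 510]
```

All four spellings from the 'dw' line reduce to the same value, 510.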
String constants are specified by enclosing zero or more characters and escaped characters within matching single or double quotes. Valid escape characters are '\n', '\r', '\t', '\b', '\0', '\x'[[:xdigit:]][[:xdigit:]], and '\\'. String constants generate one character constant for each character in the string. There is no trailing zero stored. String constants of one character can be used anywhere an integer can. For example:
mov w, 'h'
add w, "a" - "A"
db "Hello 'world'\n", 0, 'a', 'b', '\r', '\n', '\t'
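The rule "one character constant per character, no trailing zero" can be sketched in Python as follows. The function `string_words` and the table `SIMPLE_ESCAPES` are hypothetical names; the sketch takes the text between the quotes, does no error handling for malformed escapes, and does not model how the assembler stores strings internally.

```python
# Escapes listed in the text, mapped to their character codes.
SIMPLE_ESCAPES = {"n": 10, "r": 13, "t": 9, "b": 8, "0": 0, "\\": 92}

def string_words(body: str) -> list[int]:
    """Expand the text between the quotes into one integer per character."""
    words = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            words.append(ord(ch))                     # ordinary character
            i += 1
        elif body[i + 1] == "x":
            words.append(int(body[i + 2:i + 4], 16))  # '\x' plus two hex digits
            i += 4
        else:
            words.append(SIMPLE_ESCAPES[body[i + 1]]) # '\n', '\r', '\t', ...
            i += 2
    return words

print(string_words(r"hi\n"))                          # [104, 105, 10]
print(string_words("a")[0] - string_words("A")[0])    # 32
```

The second line mirrors `"a" - "A"` in the example: a one-character string behaves like the integer 97 or 65, so the difference is 32.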
A label is a sequence of alphanumeric characters (including underbar and period) that refers to a constant position in the emitted code. If a label is in the first column of a line, then that label will be defined equal to the org position of that line. Every nonlocal label must be defined exactly once. Labels do not have colons.
Labels can appear anywhere in an expression that an integer can. They evaluate to the org of the line that defines them. They can be referenced before their definition, and the assembler will make however many passes are necessary to resolve all the labels. The symbol '$' is a special label that refers to the org position of the current line. It can appear anywhere in an expression that a label can, but cannot appear in the first column of a line.
A label where the first character is a period is a local label. The scope of a local label is from the line with the previous nonlocal label up to, but not including, the line with the next nonlocal label. For example:
        jmp .1
.1      jmp .1  ; first local
foo     jmp .1
.1      jmp .1  ; second local
        jmp .1
bar     jmp .1
.1      jmp .1  ; third local
        jmp .1
In this example, the first local is only visible for the first two lines before the declaration of 'foo'. The second local is visible from the declaration of 'foo' to before the declaration of 'bar'. The third local is visible from the declaration of 'bar' to the end. Locals cannot be referenced outside of their scope.
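The scoping rule can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the assembler's symbol table: `resolve_locals` is a hypothetical name, each line is reduced to a (label, jump target) pair, and every line is assumed to occupy exactly one word so that the line index doubles as the org.

```python
def resolve_locals(lines):
    """lines: list of (label_or_None, target). Returns the org each target resolves to."""
    scope = 0        # id of the current nonlocal scope
    scope_of = []    # scope id of every line
    table = {}       # (scope, name) for locals, (None, name) for nonlocals
    for org, (label, _) in enumerate(lines):
        if label and not label.startswith("."):
            scope += 1                       # a nonlocal label opens a new scope on its own line
            table[(None, label)] = org
        elif label:
            table[(scope, label)] = org      # local label, private to the current scope
        scope_of.append(scope)
    resolved = []
    for org, (_, target) in enumerate(lines):
        key = (scope_of[org], target) if target.startswith(".") else (None, target)
        resolved.append(table[key])
    return resolved

# The eight lines of the example above, as (label, jump target) pairs.
program = [
    (None, ".1"), (".1", ".1"),                  # before 'foo': first local
    ("foo", ".1"), (".1", ".1"), (None, ".1"),   # 'foo' scope: second local
    ("bar", ".1"), (".1", ".1"), (None, ".1"),   # 'bar' scope: third local
]
print(resolve_locals(program))  # [1, 1, 3, 3, 3, 6, 6, 6]
```

Each `jmp .1` lands on the `.1` of its own scope, including the forward references on the 'foo' and 'bar' lines.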
To make the expression syntax easy to learn and remember, the operators and precedence levels were chosen to be as close as possible to C.
Data can be declared. The declarator takes the place of the opcode and is followed by a list of expressions. Each expression corresponds to one word in the output code regardless of the declarator type.
For instance:
db 1,2,3
dw 0x3fff, 0x3ff * 16 + 15, -1
dt 0b001, 0b010, 0b100
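As a quick check of the 'dw' line above, its three expressions can be evaluated in Python. The 14-bit mask is an assumption, based on the 14-bit program word of this microcontroller family; the text does not say how negative or out-of-range values are truncated.

```python
WORD_MASK = 0x3FFF  # assumption: program words are 14 bits wide

# The three expressions from the 'dw' line, truncated to one word each.
words = [value & WORD_MASK for value in (0x3fff, 0x3ff * 16 + 15, -1)]
print([hex(w) for w in words])  # ['0x3fff', '0x3fff', '0x3fff']
```

Under that assumption, all three expressions produce the same all-ones word.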
Equates are named expressions. They can be defined with 'equ'. For example:
led_1   equ 0x100 | 1
led_2   equ 0x100 | 2
combo   equ ( led_1 ) | ( led_2 )
Defines are named sequences of tokens. They can be defined with 'define'. They differ from equates in that an equate must be a valid expression and is evaluated at the point of definition, whereas a define is substituted, token for token, at each place where its name is used and is only evaluated there. For example:
        org 0
msg1    db 'hello', 0
size1   equ $-msg1
msg2    db 'world', 0
size2   define $-msg2
        org 0x100
        dw size1, size2
In this example, 'size1' is evaluated at its point of definition to the integer 6. Every subsequent occurrence of 'size1' will be replaced with the integer 6. But 'size2' is replaced with the text '$-msg2', which is evaluated at the org of the line that uses it, leading to different values stored in the final declaration.
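The difference can be worked through numerically. The Python sketch below is only a model: `evaluate` is a hypothetical helper that handles the single expression form `$ - label`, and it assumes that each character of a string occupies one word and that '$' is the org of the line on which the expression is evaluated.

```python
# Orgs of the two strings: 'hello', 0 occupies words 0..5, so msg2 starts at 6.
labels = {"msg1": 0, "msg2": 6}

def evaluate(tokens, org):
    """Evaluate the tiny expression form '$ - label' at the given org."""
    left, op, right = tokens
    assert left == "$" and op == "-"
    return org - labels[right]

# equ: evaluated once, on the defining line (org 6, right after msg1).
size1 = evaluate(["$", "-", "msg1"], org=6)

# define: the tokens are stored unevaluated and substituted where used.
size2_tokens = ["$", "-", "msg2"]

use_org = 0x100  # org of the final 'dw size1, size2' line
emitted = [size1, evaluate(size2_tokens, use_org)]
print(emitted)  # [6, 250]
```

Had 'size2' been an equate, it would also have been 6; as a define it becomes 0x100 - 6 = 250 on the final line.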
Other directives:
The assembler has a relatively sophisticated pattern-matching macro system. Macros can be defined with the syntax:
name    macro expression_list
        .. body lines ..
        endm
They can be instantiated like regular instructions. They can be overloaded, and can even have the same name as instructions. When defined, unknown symbols in the expression list are taken to be arguments. When instantiated, the defining expression is matched against the instance expression to determine substitutions for the arguments. Nonargument tokens in the defining expression must match tokens exactly in the instance expression. For example:
foo     macro [x], y
        mov w, [y]
        dw x+y
        endm

        foo [100],200

expands to

        mov w, [200]
        dw 100+200
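The matching step can be illustrated with a small Python sketch. It is a simplification and not the assembler's algorithm: `match_macro` and `expand` are hypothetical names, the input is already split into tokens, and each argument is assumed to bind exactly one token, whereas the real matcher presumably binds whole subexpressions.

```python
def match_macro(pattern, instance, known):
    """Match a macro's defining tokens against an instance; return bindings or None."""
    if len(pattern) != len(instance):
        return None
    bindings = {}
    for p, tok in zip(pattern, instance):
        if p.isidentifier() and p not in known:
            # Unknown symbol in the definition: treat it as an argument.
            if bindings.setdefault(p, tok) != tok:
                return None                  # same argument bound to two different tokens
        elif p != tok:
            return None                      # nonargument tokens must match exactly
    return bindings

def expand(body, bindings):
    """Substitute bound arguments into the tokenized body lines."""
    return [" ".join(bindings.get(t, t) for t in line) for line in body]

known = {"w"}                                # symbols already defined when 'foo' is declared
pattern = ["[", "x", "]", ",", "y"]          # foo macro [x], y
body = [["mov", "w", ",", "[", "y", "]"], ["dw", "x", "+", "y"]]

bindings = match_macro(pattern, ["[", "100", "]", ",", "200"], known)
print(bindings)                              # {'x': '100', 'y': '200'}
print(expand(body, bindings))                # ['mov w , [ 200 ]', 'dw 100 + 200']
print(match_macro(pattern, ["100", ",", "200"], known))  # None
```

The last call fails because the brackets in the definition are nonargument tokens and have no counterpart in the instance.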
To make the assembler more comfortable for x86 assembly programmers, many instructions are defined as macros. For instance, there is no instruction in the PIC instruction set to increment the accumulator directly. So there is a macro defined in "include.asm":
inc     macro w
        add w, 1
        endm
When the opcode 'inc' is encountered, the assembler will first match the expression list against the macro since it was declared more recently than the native instruction. Failing that, it will match against the native instruction.
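The lookup order can be pictured as a search from the newest definition to the oldest. The Python sketch below only illustrates that order: `declare`, `lookup`, and `is_memory` are hypothetical names, and the operand patterns are reduced to simple predicates.

```python
definitions = []  # (name, accepts, meaning), oldest first

def declare(name, accepts, meaning):
    definitions.append((name, accepts, meaning))

def lookup(name, operands):
    """Return the newest definition of `name` whose pattern accepts the operands."""
    for candidate, accepts, meaning in reversed(definitions):
        if candidate == name and accepts(operands):
            return meaning
    raise LookupError(f"no form of {name!r} accepts {operands}")

def is_memory(operand):
    return operand.startswith("[") and operand.endswith("]")

# Native forms exist first: inc [f] and inc w, [f].
declare("inc", lambda ops: len(ops) == 1 and is_memory(ops[0]), "native")
declare("inc", lambda ops: len(ops) == 2 and ops[0] == "w" and is_memory(ops[1]), "native")
# The macro is declared later: inc w -> add w, 1.
declare("inc", lambda ops: ops == ["w"], "macro")

print(lookup("inc", ["w"]))        # macro
print(lookup("inc", ["[123]"]))    # native
```

Because the macro only accepts the bare operand `w`, the memory forms of 'inc' fall through to the native instruction.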
Many of the instructions that are implemented as macros require equates defined in a Microchip include file. These files can be found in the gputils package or at Microchip.
Examples:
add w, [123]    ; w += memory[123]
add [123], w    ; memory[123] += w
add w, 123      ; w += 123
Examples:
and w, [123]    ; w &= memory[123]
and [123], w    ; memory[123] &= w
and w, 123      ; w &= 123
Example:
bclr [123], 7 ; memory[123] &= 0x7f
Example:
bset [123], 7 ; memory[123] |= 0x80
Example:
btsz [123], 0 ; if( !( memory[123] & 1 ) ) skip()
Example:
btsnz [123], 7 ; if( memory[123] & 128 ) skip()
Example:
call somewhere ; somewhere()
Example:
clc ; /* macro */
Example:
cli ; /* macro */
Examples:
clr w           ; /* macro */ ; w = 0
clr [123]       ; memory[123] = 0
clr w, [123]    ; not useful
Example:
clrwdt
Example:
cmc ; /* macro */
Examples:
dec w           ; /* macro */ ; --w
dec [123]       ; --memory[123]
dec w, [123]    ; w = memory[123] - 1
Examples:
decsz w         ; /* macro */ ; if( !--w ) skip()
decsz [123]     ; if( !--memory[123] ) skip()
decsz w, [123]  ; if( !( w = memory[123] - 1 ) ) skip()
Examples:
inc w           ; /* macro */ ; ++w
inc [123]       ; ++memory[123]
inc w, [123]    ; w = memory[123] + 1
Examples:
incsz w         ; /* macro */ ; if( !++w ) skip()
incsz [123]     ; if( !++memory[123] ) skip()
incsz w, [123]  ; if( !( w = memory[123] + 1 ) ) skip()
Example:
ja somewhere ; /* macro */
Example:
jae somewhere ; /* macro */
Example:
jb somewhere ; /* macro */
Example:
jbe somewhere ; /* macro */
Example:
je somewhere ; /* macro */
Example:
jne somewhere ; /* macro */
Example:
jmp somewhere ; goto somewhere
Examples:
loop w, somewhere       ; /* macro */ ; if( --w ) goto somewhere
loop [123], somewhere   ; /* macro */ ; if( --memory[123] ) goto somewhere
Examples:
mov w, [123]    ; w = memory[123]
mov [123], w    ; memory[123] = w
mov w, 123      ; w = 123
mov [123], 45   ; /* macro */ ; memory[123] = 45
Example:
nop
Examples:
not w           ; /* macro */ ; w = ~w
not [123]       ; /* macro */ ; memory[123] = ~memory[123]
not w, [123]    ; /* macro */ ; w = ~memory[123]
Examples:
or w, [123]     ; w |= memory[123]
or [123], w     ; memory[123] |= w
or w, 123       ; w |= 123
Examples:
ret             ; return
ret 123         ; w = 123 ; return
Example:
reti
Examples:
rlc w           ; /* macro */
                ; new_carry = ( w & 0x80 ) != 0
                ; w = ( w << 1 ) | carry_flag
                ; carry_flag = new_carry
rlc [123]       ; new_carry = ( memory[123] & 0x80 ) != 0
                ; memory[123] = ( memory[123] << 1 ) | carry_flag
                ; carry_flag = new_carry
rlc w, [123]    ; new_carry = ( memory[123] & 0x80 ) != 0
                ; w = ( memory[123] << 1 ) | carry_flag
                ; carry_flag = new_carry
Examples:
rrc w           ; /* macro */
                ; new_carry = w & 1
                ; w = ( w >> 1 ) | ( carry_flag ? 0x80 : 0 )
                ; carry_flag = new_carry
rrc [123]       ; new_carry = memory[123] & 1
                ; memory[123] = ( memory[123] >> 1 ) | ( carry_flag ? 0x80 : 0 )
                ; carry_flag = new_carry
rrc w, [123]    ; new_carry = memory[123] & 1
                ; w = ( memory[123] >> 1 ) | ( carry_flag ? 0x80 : 0 )
                ; carry_flag = new_carry
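The rotate-through-carry behaviour of 'rlc' and 'rrc' can be checked with a short Python model. This is a sketch of the pseudocode in the comments, not of the hardware or the macros: the function names simply reuse the mnemonics, and the explicit 8-bit mask is an addition that the pseudocode leaves implicit.

```python
def rlc(value, carry):
    """Rotate an 8-bit value left through carry; returns (result, new_carry)."""
    new_carry = (value & 0x80) != 0
    result = ((value << 1) | int(carry)) & 0xFF   # mask keeps the result to 8 bits
    return result, new_carry

def rrc(value, carry):
    """Rotate an 8-bit value right through carry; returns (result, new_carry)."""
    new_carry = (value & 1) != 0
    result = (value >> 1) | (0x80 if carry else 0)
    return result, new_carry

print(rlc(0x81, False))   # (2, True)
print(rrc(0x03, True))    # (129, True)
```

Since the carry flag acts as a ninth bit, nine rotations in the same direction return both the value and the carry to their starting state.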
Examples:
shl w           ; /* macro */ ; w <<= 1
shl [123]       ; /* macro */ ; memory[123] <<= 1
shl w, [123]    ; /* macro */ ; w = memory[123] << 1
Examples:
shr w           ; /* macro */ ; w >>= 1
shr [123]       ; /* macro */ ; memory[123] >>= 1
shr w, [123]    ; /* macro */ ; w = memory[123] >> 1
Example:
sleep
Example:
stc ; /* macro */
Example:
sti ; /* macro */
Examples:
sub w, [123]    ; w = memory[123] - w
sub [123], w    ; memory[123] -= w
sub w, 123      ; w = 123 - w
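The operand order of 'sub' is easy to get wrong when coming from x86: with 'w' as the target, the result is the other operand minus 'w', not the reverse. The Python sketch below models the three forms as described in the comments above. The function names are hypothetical, and the 8-bit wraparound is an assumption about register width that the comments do not spell out.

```python
MASK = 0xFF  # assumption: registers are eight bits wide and wrap on underflow

def sub_w_mem(w, mem):
    """sub w, [f]  ->  new w = memory[f] - w"""
    return (mem - w) & MASK

def sub_mem_w(mem, w):
    """sub [f], w  ->  new memory[f] = memory[f] - w"""
    return (mem - w) & MASK

def sub_w_lit(w, literal):
    """sub w, k  ->  new w = k - w"""
    return (literal - w) & MASK

print(sub_w_lit(3, 123))     # 120
print(sub_w_mem(3, 10))      # 7
print(sub_w_lit(200, 123))   # 179
```

The last result is 123 - 200 wrapped into eight bits.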
Examples:
swap [123]      ; hi = ( memory[123] >> 4 ) & 0x0f
                ; lo = memory[123] & 0x0f
                ; memory[123] = ( lo << 4 ) | hi
swap w, [123]   ; hi = ( memory[123] >> 4 ) & 0x0f
                ; lo = memory[123] & 0x0f
                ; w = ( lo << 4 ) | hi
Example:
test [123] ; /* macro */ ; z_flag = ( memory[123] == 0 )
Examples:
xchg w, [123]   ; /* macro */
                ; int i = memory[123]
                ; memory[123] = w
                ; w = i
xchg [123], w   ; /* macro */
                ; int i = memory[123]
                ; memory[123] = w
                ; w = i
Examples:
xor w, [123]    ; w ^= memory[123]
xor [123], w    ; memory[123] ^= w
xor w, 123      ; w ^= 123
Changing the org in the middle of emitted code in a macro can cause the later emitted code to not appear in the listing file if macro expansion is turned off. The assembler makes an effort to migrate emitted code from nonlisted lines to listed lines, but will get confused if there is more than one noncontiguous section of code. For example:
        org 0
foo     macro
        dw 1
        org 100
        dw 2
        endm
        noexpand
        foo
In this example, '1' will show up in the listing as code for 'foo'. However, '2' will not appear. This bug does not affect the object output.
Strings are stored internally as C-strings, so an embedded nul in a string constant terminates the string. Neither the first nul nor anything after it is stored. This could be fixed by using a separate string data type.
The preprocessor is currently integrated with the parser. I would like to make it a separate layer. Doing so will allow implementation of an 'undef' directive and 'defined' predicate.
Local labels defined inside of a macro are translated to have two leading periods. It is recommended that user-defined local labels not have more than one leading period to avoid confusing the assembler.
There is no capability for structures or other complex data types. I don't think this is a serious drawback for an assembler for microcontrollers.
A 'repeat' directive would be handy and not difficult to implement. I also intend to write a disassembler and instruction generator to automate testing.