80x86 32-bit Disassembler and Assembler

Legal part
Introduction
Brief description of functions
Assemble
Checkcondition
Decodeaddress
Disasm
Disassembleback
Disassembleforward
Isfilling
Printfloat* functions




Legal part

This package includes the source code of a 32-bit Disassembler and a 32-bit single-line Assembler for 80x86-compatible processors. The source is a slightly stripped version of the code used in OllyDbg v1.04 and is well proven by its numerous users. (If you haven't heard of it before, OllyDbg is a 32-bit assembler-level debugger whose powerful analysis capabilities make binary machine code understandable.)

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License (http://www.fsf.org/copyleft/gpl.html) for more details.

You should have received a copy of the GNU General Public License (gpl.txt) along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA.

All brand names and product names used in 80x86 Assembler and Disassembler, accompanying files or in this help file are trademarks, registered trademarks, or trade names of their respective holders.
 



Introduction

The Disassembler understands all standard 80x86 commands, FPU, MMX, AMD's MMX extensions, Athlon/PIII MMX extensions and 3DNow! instructions. It does not decode SSE or SSE2 commands. The Disassembler assumes 32-bit code and data segments but correctly decodes prefixed 16-bit commands. Several decoding modes let you select the amount of returned information (the more information, the slower the decoding): command length only, basic information useful for code analysis, or full decoding with dump and assembler form. Multiple options select the desired output format. Both the Disassembler and the Assembler support MASM and Borland's IDEAL modes.

The Assembler converts a single command from ASCII form to binary code. It can find several possible encodings of the same command, or even create search patterns with undefined operands.

This package includes following files:

The total source size exceeds 3800 lines of dense text (more than 190 KB!). I have used Borland C and do not guarantee that it will work with any other compiler. Please set the default character type to unsigned! Please also place the following statements into the main file of your program, and do not #define MAINPROG in any other file:

    #define MAINPROG // Place all unique variables here
    #include "disasm.h"

(I use this trick to define shared global variables.) Below is a small piece of code disassembled with OllyDbg 1.04 using two different text settings, first MASM mode in uppercase, then IDEAL mode in lowercase:
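The MAINPROG trick is the usual C idiom for sharing globals through a single header. Here is a minimal single-file sketch, assuming the header switches a helper macro between a definition and an `extern` declaration. The macro name `unique` and the names `shared_mode`, `shared_count` and `Bumpcount` are hypothetical; check disasm.h for the real ones.

```c
/* In a real project this #define appears in exactly one .c file. */
#define MAINPROG

/* ---- contents of a shared header (sketch, hypothetical names) ---- */
#ifdef MAINPROG
  #define unique                  /* MAINPROG file: define, allocate storage */
#else
  #define unique extern           /* every other file: declaration only      */
#endif

unique int shared_mode;           /* hypothetical shared option  */
unique int shared_count;          /* hypothetical shared counter */
/* ---- end of header ---- */

/* Any file that includes the header can use the variables. */
void Bumpcount(void) {            /* hypothetical function */
  shared_count++;
}
```

Every other source file includes the same header without MAINPROG, so it sees only `extern` declarations and the linker finds exactly one definition of each variable.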
 
004505B3  A1 DC464B00         MOV EAX,DS:[4B46DC]
004505B8  8B0498              MOV EAX,DS:[EAX+EBX*4]
004505BB  50                  PUSH EAX
004505BC  8D85 E0FBFFFF       LEA EAX,SS:[EBP-420]
004505C2  50                  PUSH EAX
004505C3  E8 141BFCFF         CALL 004120DC 
004505C8  83C4 08             ADD ESP,8
004505CB  43                  INC EBX
004505CC  3B1D D8464B00       CMP EBX,DS:[4B46D8]
004505D2  0F8C AFFEFFFF       JL 00450487 
004505D8  80BD E0FDFFFF 00    CMP BYTE PTR SS:[EBP-220],0
004505DF  75 14               JNZ SHORT 004505F5
004505E1  68 B39E4600         PUSH 469EB3 
004505E6  8D85 E0FDFFFF       LEA EAX,SS:[EBP-220]
004505EC  50                  PUSH EAX
004505ED  E8 521BFCFF         CALL 00412144 

 
004505B3  A1 DC464B00         mov     eax,[dword ds:4B46DC]
004505B8  8B0498              mov     eax,[dword ds:eax+ebx*4]
004505BB  50                  push    eax
004505BC  8D85 E0FBFFFF       lea     eax,[dword ss:ebp-420]
004505C2  50                  push    eax
004505C3  E8 141BFCFF         call    004120DC
004505C8  83C4 08             add     esp,8
004505CB  43                  inc     ebx
004505CC  3B1D D8464B00       cmp     ebx,[dword ds:4B46D8]
004505D2  0F8C AFFEFFFF       jl      00450487
004505D8  80BD E0FDFFFF 00    cmp     [byte ss:ebp-220],0
004505DF  75 14               jnz     short 004505F5
004505E1  68 B39E4600         push    469EB3
004505E6  8D85 E0FDFFFF       lea     eax,[dword ss:ebp-220]
004505EC  50                  push    eax
004505ED  E8 521BFCFF         call    00412144



Brief description of functions

Assemble

Function Assemble(), as expected, converts a command from ASCII form to 32-bit binary code. It shares the command table with Disasm(), so any command that can be disassembled can also be assembled back, with one exception: Assemble() doesn't support 16-bit addresses. With some unimportant exceptions, 16-bit addresses cannot be used in Win32 programs anyway.

Some commands have more than one encoding, and Assemble() allows you to find them all. This is important, for example, if you want to find the shortest possible code or all possible occurrences of a command in the code. Two parameters control this: constsize selects the size of the immediate constant and of the address constant (8 or 32 bits), and attempt selects the occurrence of the command in the command table. To find all variants, call Assemble() with attempt=0,1,2,... and, for each attempt, with constsize=0,1,2,3; continue as long as the function reports success for at least one constsize. Generated codes may repeat. Note that if the command uses memory addresses, only one form is generated in each case: [EAX*2] but not [EAX+EAX]; [EBX+EAX] but not [EAX+EBX]; [EAX] without a SIB byte; no DS: prefix; and so on.
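The enumeration rule above can be sketched as a loop that keeps the shortest encoding. This is a minimal sketch, not code from the package: `Findshortest` is a hypothetical helper, the values of MAXCMDSIZE and TEXTLEN are assumptions (use the ones from disasm.h), and the assembler is passed as a function pointer with the same signature as Assemble() so the loop can be exercised without the library.

```c
#include <string.h>

typedef unsigned long ulong;

#define MAXCMDSIZE 16   /* assumption: use the value from disasm.h */
#define TEXTLEN    256  /* assumption: size of the error text buffer */

typedef struct t_asmmodel {       /* as described in this section */
  char code[MAXCMDSIZE];
  char mask[MAXCMDSIZE];
  int length;
  int jmpsize;
  int jmpoffset;
  int jmppos;
} t_asmmodel;

/* Same signature as Assemble(). */
typedef int (*assemble_fn)(char *cmd, ulong ip, t_asmmodel *model,
                           int attempt, int constsize, char *errtext);

/* Hypothetical helper: tries attempt = 0,1,2,... and constsize = 0..3,
   stops at the first attempt where no constsize succeeds, and keeps
   the shortest encoding. Returns its length, or 0 if none was found. */
int Findshortest(assemble_fn assemble, char *cmd, ulong ip,
                 t_asmmodel *best) {
  t_asmmodel model;
  char errtext[TEXTLEN];
  int attempt, constsize, len, found, bestlen = 0;
  memset(&model, 0, sizeof model);
  for (attempt = 0; ; attempt++) {
    found = 0;
    for (constsize = 0; constsize < 4; constsize++) {
      len = assemble(cmd, ip, &model, attempt, constsize, errtext);
      if (len <= 0) continue;          /* error or no such variant */
      found = 1;
      if (bestlen == 0 || len < bestlen) {
        bestlen = len;
        *best = model;                 /* struct copy */
      }
    }
    if (!found) break;                 /* no more variants in the table */
  }
  return bestlen;
}
```

Because the signatures match, the real Assemble could be passed directly as the first argument.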

Assemble() also compiles imprecise commands that include the following generalized operands:

This makes it possible to generate imprecise search patterns, where the mask contains zero bits at the positions occupied by imprecise operands in the binary code. For example, patterns generated for the command MOV R32,CONST will match both MOV EAX,1 and MOV ECX,12345678h.
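To show how code and mask work together, here is a minimal matcher sketch. `Matchpattern` is a hypothetical helper, and the pattern bytes in the note below are worked out by hand for the one-byte-opcode form B8+r of MOV R32,CONST (register in the low three bits of the opcode, followed by a 32-bit constant); the pattern that Assemble() actually produces may differ.

```c
/* Hypothetical helper: returns 1 if data matches the pattern, i.e. every
   bit selected by mask equals the corresponding bit of code. */
int Matchpattern(const unsigned char *data, const unsigned char *code,
                 const unsigned char *mask, int length) {
  int i;
  for (i = 0; i < length; i++) {
    if ((data[i] ^ code[i]) & mask[i])   /* a significant bit differs */
      return 0;
  }
  return 1;
}
```

With code = B8 00 00 00 00 and mask = F8 00 00 00 00, both MOV EAX,1 (B8 01 00 00 00) and MOV ECX,12345678h (B9 78 56 34 12) match, while PUSH EAX (50) does not.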

The function returns the number of bytes in the assembled code, or a non-positive (zero or negative) number in case of error or when the variant selected by the combination of attempt and constsize doesn't exist. In the error case, this number is the negated position of the error in the input command. If you generate executable code, imprecise commands are usually not allowed; to make sure that a command is precise, check that all significant bytes in mask contain 0xFF.
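The two checks described above can be sketched as follows, assuming `length` is `model->length` from the returned model. `Isprecise` and `Errorposition` are hypothetical helpers, not part of the package.

```c
/* Hypothetical helper: a command is precise if every significant byte of
   its mask is 0xFF, i.e. no bit of the binary code is left undefined. */
int Isprecise(const unsigned char *mask, int length) {
  int i;
  for (i = 0; i < length; i++)
    if (mask[i] != 0xFF) return 0;
  return 1;
}

/* Hypothetical helper: converts a non-positive Assemble() result into a
   position in the input text; returns -1 for a successful result. */
int Errorposition(int result) {
  return result > 0 ? -1 : -result;
}
```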

int Assemble(char *cmd,ulong ip,t_asmmodel *model,int attempt,int constsize,char *errtext);

Parameters:

t_asmmodel: structure that receives assembled code.

typedef struct t_asmmodel {    // Model to search for assembler command
    char code[MAXCMDSIZE];     // Binary code
    char mask[MAXCMDSIZE];     // Mask for binary code (0: bit ignored)
    int length;                // Length of code, bytes (0: empty)
    int jmpsize;               // Offset size if relative jump
    int jmpoffset;             // Offset relative to IP
    int jmppos;                // Position of jump offset in command
} t_asmmodel;

Members:



Checkcondition

Checks whether the 80x86 flags satisfy the condition code of the command. Returns 1 if the condition is met and 0 if not.
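On 80x86, conditional commands encode their condition as a 4-bit code in the low nibble of the Jcc, SETcc and CMOVcc opcodes: bits 3..1 select a flag test and bit 0 negates it. The sketch below evaluates this standard encoding against an EFLAGS value. `Evalcondition` and the `EFL_*` names are hypothetical, and whether Checkcondition() expects `code` in exactly this form is an assumption.

```c
#define EFL_CF 0x0001UL   /* carry    */
#define EFL_PF 0x0004UL   /* parity   */
#define EFL_ZF 0x0040UL   /* zero     */
#define EFL_SF 0x0080UL   /* sign     */
#define EFL_OF 0x0800UL   /* overflow */

/* Hypothetical helper: returns 1 if the condition in the low 4 bits of
   code holds for the given EFLAGS value, 0 otherwise. */
int Evalcondition(int code, unsigned long flags) {
  int cf = (flags & EFL_CF) != 0;
  int pf = (flags & EFL_PF) != 0;
  int zf = (flags & EFL_ZF) != 0;
  int sf = (flags & EFL_SF) != 0;
  int of = (flags & EFL_OF) != 0;
  int r;
  switch ((code >> 1) & 7) {               /* bits 3..1: which test */
    case 0:  r = of;               break;  /* O  / NO */
    case 1:  r = cf;               break;  /* B  / AE */
    case 2:  r = zf;               break;  /* E  / NE */
    case 3:  r = cf || zf;         break;  /* BE / A  */
    case 4:  r = sf;               break;  /* S  / NS */
    case 5:  r = pf;               break;  /* P  / NP */
    case 6:  r = sf != of;         break;  /* L  / GE */
    default: r = zf || sf != of;   break;  /* LE / G  */
  }
  return (code & 1) ? !r : r;              /* bit 0: negate the test */
}
```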

int Checkcondition(int code,ulong flags);

Parameters:



Decodeaddress

A custom, user-supplied function that converts a constant (address) into a symbolic name. Initially, the source code includes a dummy version that always returns 0.

Decodeaddress() decodes a memory address or constant into an ASCII string and optionally adds a comment for this address. It returns the length of the decoded string (not including the terminating 0), or 0 on error or if no symbolic name is available.
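Since Decodeaddress() is meant to be replaced, here is a minimal sketch of a replacement that looks addresses up in a fixed table. The table entries (two call targets from the listing above, with made-up names `proc_a` and `proc_b`) are hypothetical, `ulong` is redefined only to keep the block self-contained (disasm.h provides it), and the sketch leaves `comment` untouched because the size of that buffer is not stated here.

```c
#include <stdio.h>

typedef unsigned long ulong;     /* normally provided by disasm.h */

static const struct {
  ulong addr;
  const char *name;
} symtab[] = {                   /* hypothetical symbol table */
  { 0x004120DCUL, "proc_a" },
  { 0x00412144UL, "proc_b" },
};

int Decodeaddress(ulong addr, char *symb, int nsymb, char *comment) {
  size_t i;
  int n;
  (void)comment;                 /* this sketch adds no comments */
  if (symb == NULL || nsymb <= 0) return 0;
  for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
    if (symtab[i].addr != addr) continue;
    n = snprintf(symb, (size_t)nsymb, "%s", symtab[i].name);
    if (n < 0 || n >= nsymb) {   /* name does not fit: report "no name" */
      symb[0] = '\0';
      return 0;
    }
    return n;                    /* length without the terminating 0 */
  }
  return 0;                      /* unknown address */
}
```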

int Decodeaddress(ulong addr,char *symb,int nsymb,char *comment);

Parameters:



Disasm

The most important (and complex) function in this package. Depending on the specified disasmmode, Disasm() performs one of the four functions:

The function returns the size of the disassembled command. Several global flags influence the behavior of this function; they are described later in this section. All symbolic constants are described in the file disasm.h.

ulong Disasm(char *src,ulong srcsize,ulong srcip,t_disasm *disasm,int disasmmode);

Parameters:

t_disasm:

typedef struct t_disasm {     // Results of disassembling
    ulong ip;                 // Instruction pointer
    char dump[TEXTLEN];       // (*) Hexadecimal dump of the command
    char result[TEXTLEN];     // (*) Disassembled command
    char comment[TEXTLEN];    // (*) Brief comment
    int cmdtype;              // One of C_xxx
    int memtype;              // Type of addressed variable in memory
    int nprefix;              // Number of prefixes
    int indexed;              // Address contains register(s)
    ulong jmpconst;           // Constant jump address
    ulong jmptable;           // Possible address of switch table
    ulong adrconst;           // Constant part of address
    ulong immconst;           // Immediate constant
    int zeroconst;            // Whether contains zero constant
    int fixupoffset;          // Possible offset of 32 bit fixups
    int fixupsize;            // Possible total size of fixups or 0
    int error;                // Error while disassembling command
    int warnings;             // Combination of DAW_xxx
} t_disasm;

Members:

Global flags that influence the text of the disassembled command:

    0 - PUSHA/PUSHAD
    1 - PUSHAW/PUSHAD
    2 - PUSHAW/PUSHA

Global flags that warn of potentially invalid commands:

If Disasm() encounters a potentially invalid command and the corresponding flag is 0, it sets a bit in disasm->warnings and places a warning message in disasm->comment.
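The three values listed under the text flags select how size-sensitive mnemonics are printed. Reading each pair as "16-bit form/32-bit form" (an interpretation, not stated explicitly above), the choice for the PUSHA family can be sketched as a table; `Pushaname` is a hypothetical helper.

```c
/* Hypothetical helper: mnemonic printed for PUSHA (opcode 60h) under each
   setting. is16bit != 0 means the 66h-prefixed, 16-bit form. */
const char *Pushaname(int setting, int is16bit) {
  static const char *const names[3][2] = {
    /* 32-bit    16-bit  */
    { "PUSHAD", "PUSHA"  },   /* 0: PUSHA/PUSHAD  */
    { "PUSHAD", "PUSHAW" },   /* 1: PUSHAW/PUSHAD */
    { "PUSHA",  "PUSHAW" },   /* 2: PUSHAW/PUSHA  */
  };
  if (setting < 0 || setting > 2) setting = 0;
  return names[setting][is16bit ? 1 : 0];
}
```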



Disassembleback

Calculates the address of the assembler instruction that is n instructions (at most 127) back from the instruction at the specified ip. Returns the address of the found instruction. In case of error, it may be fewer than n instructions apart.

80x86 commands have variable length. Disassembleback() uses heuristic methods to separate commands and in some (astoundingly rare!) cases may return an invalid answer.
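To illustrate why walking backward needs heuristics, here is one simple approach, not necessarily the one Disassembleback() uses: try each possible start point, decode forward, and accept a start whose decoding lands exactly on ip after n commands. `Backsketch` and the length-decoder callback `cmdlen` are hypothetical, and MAXCMDSIZE = 16 is an assumption.

```c
typedef unsigned long ulong;

#define MAXCMDSIZE 16   /* assumption: maximal length of one command */

/* Hypothetical length decoder: returns the length of the command at p
   (at most avail bytes), or 0 if it cannot be decoded. */
typedef ulong (*cmdlen_fn)(const unsigned char *p, ulong avail);

/* Tries start points from farthest to nearest and returns the first one
   whose linear decoding reaches ip exactly after n commands. Otherwise
   falls back to the start that reaches ip with the most (< n) commands. */
ulong Backsketch(const unsigned char *block, ulong base, ulong size,
                 ulong ip, int n, cmdlen_fn cmdlen) {
  ulong maxback, back, addr, len, bestaddr = ip;
  int count, bestcount = 0;
  if (n <= 0 || ip < base || ip > base + size) return ip;
  maxback = (ulong)n * MAXCMDSIZE;
  if (maxback > ip - base) maxback = ip - base;
  for (back = maxback; back > 0; back--) {
    addr = ip - back;
    count = 0;
    while (addr < ip) {                      /* linear decode up to ip */
      len = cmdlen(block + (addr - base), base + size - addr);
      if (len == 0) break;                   /* undecodable byte */
      addr += len;
      count++;
    }
    if (addr != ip) continue;                /* out of sync with ip */
    if (count == n) return ip - back;
    if (count < n && count > bestcount) {    /* remember partial match */
      bestcount = count;
      bestaddr = ip - back;
    }
  }
  return bestaddr;
}
```

Several start points can resynchronize on the same ip, so the answer depends on which candidate the heuristic prefers; this is exactly where a wrong result can slip in.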

ulong Disassembleback(char *block,ulong base,ulong size,ulong ip,int n);

Parameters:



Disassembleforward

Calculates the address of the assembler instruction that is n instructions forward from the instruction at the specified address. Returns the address of the found instruction. In case of error, it may be fewer than n instructions apart.
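Walking forward is straightforward because the length of each command is known once it is decoded. A minimal sketch, assuming a hypothetical length-decoder callback `cmdlen` (in practice, Disasm() in its length-only mode could play that role); `Forwardsketch` is also hypothetical.

```c
typedef unsigned long ulong;

/* Hypothetical length decoder: returns the length of the command at p
   (at most avail bytes), or 0 if it cannot be decoded. */
typedef ulong (*cmdlen_fn)(const unsigned char *p, ulong avail);

/* Steps over n commands starting at ip. Stops early (fewer than n
   commands) at the end of the block or at an undecodable command. */
ulong Forwardsketch(const unsigned char *block, ulong base, ulong size,
                    ulong ip, int n, cmdlen_fn cmdlen) {
  int i;
  ulong len;
  if (ip < base || ip >= base + size) return ip;
  for (i = 0; i < n && ip < base + size; i++) {
    len = cmdlen(block + (ip - base), base + size - ip);
    if (len == 0) break;
    ip += len;
  }
  return ip;
}
```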

ulong Disassembleforward(char *block,ulong base,ulong size,ulong ip,int n,int usedec);

Parameters:



Isfilling

Determines whether the instruction at the given address is a no-action command (equivalent to NOP) used by different compilers to fill the gap between procedures or data blocks up to the specified alignment boundary. Returns the length of the filling command in bytes, or 0 if the command is not a recognized filling.
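A minimal sketch of filler recognition by byte patterns. The pattern list is hypothetical and incomplete (the actual set recognized by Isfilling() is not given here), and so is the rule that a filler must not cross the next multiple of align; `Isfilling_sketch` is a hypothetical name.

```c
#include <string.h>

typedef unsigned long ulong;

/* Hypothetical, incomplete list of no-action filler commands. */
static const struct {
  int len;
  unsigned char bytes[4];
} fillers[] = {
  { 1, { 0x90 } },                     /* NOP             */
  { 2, { 0x8B, 0xC0 } },               /* MOV EAX,EAX     */
  { 2, { 0x8B, 0xFF } },               /* MOV EDI,EDI     */
  { 3, { 0x8D, 0x40, 0x00 } },         /* LEA EAX,[EAX+0] */
  { 3, { 0x8D, 0x49, 0x00 } },         /* LEA ECX,[ECX+0] */
  { 4, { 0x8D, 0x64, 0x24, 0x00 } },   /* LEA ESP,[ESP+0] */
};

/* Returns the length of the filler at data, or 0. Assumption: a filler
   must not cross the next multiple of align (checked if align > 1). */
int Isfilling_sketch(ulong addr, const unsigned char *data, ulong size,
                     ulong align) {
  size_t i;
  ulong gap = (align > 1) ? align - addr % align : (ulong)-1;  /* -1: no limit */
  for (i = 0; i < sizeof(fillers) / sizeof(fillers[0]); i++) {
    ulong len = (ulong)fillers[i].len;
    if (len > size || len > gap) continue;   /* truncated or crosses border */
    if (memcmp(data, fillers[i].bytes, len) == 0)
      return fillers[i].len;
  }
  return 0;
}
```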

int Isfilling(ulong addr,char *data,ulong size,ulong align);

Parameters:



Printfloat* functions

These functions decode a 4-, 8- or 10-byte floating-point number or an 8-byte 3DNow! operand into text form in string s. They correctly decode all kinds of NaNs and INFs without triggering floating-point exceptions. If the operand is not a valid floating-point number, the functions print a hexadecimal dump of it. They return the length of the decoded string in bytes, not including the terminating 0.

int Print3dnow(char *s,char *f);
int Printfloat10(char *s,long double ext);
int Printfloat4(char *s,float f);
int Printfloat8(char *s,double d);
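The key to printing NaNs and INFs without floating-point exceptions is to classify the number from its raw bit pattern before doing any floating-point operation on it. A minimal sketch for the 4-byte case, assuming IEEE 754 single precision and C99 `<stdint.h>`; the name `Printfloat4_sketch` and the exact text it prints are hypothetical, not the output format of Printfloat4().

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int Printfloat4_sketch(char *s, float f) {
  uint32_t bits, exponent, mantissa;
  memcpy(&bits, &f, sizeof bits);        /* raw bits, no FP arithmetic */
  exponent = (bits >> 23) & 0xFF;
  mantissa = bits & 0x7FFFFF;
  if (exponent == 0xFF) {                /* INF or NaN */
    if (mantissa == 0)
      return sprintf(s, "%cINF", (bits >> 31) ? '-' : '+');
    return sprintf(s, "%s %08lX",        /* NaN: name plus hex dump */
                   (mantissa & 0x400000) ? "QNAN" : "SNAN",
                   (unsigned long)bits);
  }
  return sprintf(s, "%.7g", (double)f);  /* finite: safe to convert */
}
```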
 
 

Copyleft (C) 2001 Oleh Yuschuk