Sorted data

Many kinds of internal OllyDbg data consist of homogenous items that have start address and non-zero size, and do not overlap with each other. A good example is the table of memory blocks. INT3 breakpoints may be treated as items occupying 1 byte in the memory space of the debugged program. Threads exist in the address space of  thread identifiers and also occupy 1 address of this space. Data items usually can be displayed in some window and sorted using some criterium. A set of such elements is called sorted data.

OllyDbg implements a powerful set of functions that allow easy operations with sorted data, like initilaization, adding or replacing of elements, removing of elements or address ranges, sorting, search and so on. OllyDbg automatically allocates new memory for sorted data if necessary. Sorted data is excellently scalable. In one (rather uncommon) case, while analysing a 70-Mb code section of VB program, OllyDbg gathered 1.5 Gb of register prediction data!

Internally, data is always sorted by address. This allows for simple and extremely fast binary search. Adding new data is, of course, not so easy and may take significant time. Weighted binary trees may look more suitable, but in our case data is read much more frequently than added to the table. If you sort data by method other than increasing addresses, OllyDbg creates additional array of indexes pointing to data items, but their order in memory remains unchanged.

By default, data is kept in a single contiguous memory block. When data overflows, OllyDbg allocates another block, usually twice as large, and copies existing data there. If data items are large or numerous, adding new elements or deleting existing results in moving large chunks of memory. Also, reallocation of large data buffers may lead to the excessive memory fragmentation. To prevent this, use indexed data (SDM_INDEXED). Indexing usually pays only if total data size exceeds 3-5 Mb.

This strategy has one significant drawback. API functions that search for the sorting data items return pointers to the found items in the data buffer. Therefore each API call that adds, removes or replaces items in the same sorted data invalidates all received pointers.

If flag SDM_NOSIZE is not set, elements of sorted data must begin with the standard 12-byte header:

typedef struct t_sorthdr {             // Header of sorted data item
  ulong          addr;                 // Base address of the entry
  ulong          size;                 // Size of the entry
  ulong          type;                 // Type and address extension, TY_xxx
} t_sorthdr;

Please don't mix the size specified in this header and the size of the element of sorted data. They belong to different address spaces! Size in header is the size of piece of virtual address space described by sorted data and usually belongs to debugged program. Physical size of the element is the size of memory ocuppied by element in the OllyDbg's memory. All elements have the same physical size necessary to fit all the characteristics and descriptions of the described object; size in the header is simply one (albeit most important) of the object's characteristics and may be different for each object.

Plugins are not allowed to change addr and size directly. If these parameters must be changed, create temporary copy of the item, delete item and add copy to the data. (If new and old versions overlap, just add new item and it will replace the previous). Keep in mind that in both cases old item will be passed to destructor function.

32-bit address addr can be extended by another 8 least significant bits (for example, when you need many records associated with the single memory address).
They are kept in the field TY_AEXTMASK of the type. If extended addressing is active, size of all items in the sorted data is forced to 1 (that is, only exact address matches count). To create such data, specify SDM_EXTADDR. An example of the sorted data that uses extended addressing is the list of names.

Four other bits in the type are reserved for the internal purposes:

TY_NEW Indicates that item is new. Add new items with TY_NEW set and call Unmarknewsorteddata() to declare all existing items as "old"
TY_CONFIRMED Used to delete multiple scattered data items at once. Mark items that should remain in the data as TY_CONFIRMED and clear this flag in items that must be deleted. Confirmsorteddata() gives you the fast way to clear or set this bit in all data items. Deletenonconfirmedsorteddata() deletes all items where bit is cleared and clears it in remaining items
TY_EXTADDR Extended address (TY_AEXTMASK) is used
TY_SELECTED Reserved for multiple selection (may be implemented in one of the future OllyDbg versions)

Plugins are free to use remaining type bits for any purposes.

If you add new item to the sorted data and it overlaps, fully or partially, with one or more existing items, all overlapping items will be deleted. Note that each time an item is deleted - implicitly or on explicit request - OllyDbg calls an optional destructor function associated with the sorted data.

There is a special kind of sorted data called autoarrangeable. Autoarrangeable data assumes that address of the element is simply its 0-based ordinal number in the data array and size occupied by the element in address space is always 1. In other words, it 
works as a list of items numbered 0, 1, 2... Even in this case, elements must begin with valid t_sorthdr. Data is declared as autoarrangeable if its sorting function is set to AUTOARRANGE. Unlike for standard data, insertion of item with existing address does not replace this item. Instead, all items with address equal or exceeding given will be moved one step up and automatically renumerated. To replace item in the autoarrangeable data, either delete existing and insert new one, or (faster way) get pointer to the item and modify it directly.

If size is always 1 and type is not necessary, you may reduce memory requirements by declaring sorted data as SDM_NOSIZE. In this case, size of the header is reduced to a single doubleword:

typedef struct t_sorthdr_nosize {      // Header of item w/o size and type
  ulong          addr;                 // Base address of the entry
} t_sorthdr_nosize;

Each sorted data has a version. Data version increments by 1 each time the data is changed. This allows for easy implementation of small cache: if version is not 0 and is not changed, previously fetched data is still valid. In any imaginable application, wraparound of 32-bit variable is impossible. Of course, this works only if data is changed using API functions. If you access data item by pointer and use data caching, set version to 0.

There are many sorted data collections defined by OllyDbg that can be accessed by plugins. However, plugins should avoid tampering with this data directly, as this may influence OllyDbg stability:

t_sorted premod - preliminary module data
t_table module - loaded modules (DLLs)
t_sorted aqueue - modules that are not yet analysed
t_table thread - active threads
t_table memory - allocated memory blocks
t_table win  - list of windows
t_table bpoint - INT3 breakpoints
t_table bpmem - memory breakpoints
t_sorted bppage - memory pages with changed attributes
t_table bphard - hardware breakpoints
t_table watch - watch expressions
t_table patch - list of patches from previous runs
t_sorted procdata - descriptions of analyzed procedures
t_table source - list of source files
t_table srccode - source code

To create your own table of sorted data, allocate data descriptor (structure of type t_sorted) and initialize it to 0, then call Createsorteddata() to initialize table and allocate data buffers. After initialization, you can use all sorted data functions to change or retrieve data. Do not modify items of data descriptor directly, this may lead to severe data integrity problems!

In the OllyDbg 1.xx it was possible to extract or walk data items by casting the storage to the type of the element, like in this example:

t_memory *pmem;
pmem=(t_memory *)memory.data.data;
for (i=0; i<memory.data.n; i++,pmem++) {
   // some actions
}


This is no longer allowed in the version 2.xx. Always use Find...() or Get...() instead, these functions are fast!


API functions:

int Createsorteddata(t_sorted *sd,ulong itemsize,int nexp,SORTFUNC *sortfunc,DESTFUNC *destfunc,int mode);
void Destroysorteddata(t_sorted *sd);
void Deletesorteddata(t_sorted *sd,ulong addr,ulong subaddr);
int Deletesorteddatarange(t_sorted *sd,ulong addr0,ulong addr1);
void * Addsorteddata(t_sorted *sd,void *item);
int Replacesorteddatarange(t_sorted *sd,void *data,int n,ulong addr0,ulong addr1);
void Renumeratesorteddata(t_sorted *sd);
int Confirmsorteddata(t_sorted *sd,int confirm);
int Deletenonconfirmedsorteddata(t_sorted *sd);
void Unmarknewsorteddata(t_sorted *sd);
void * Findsorteddata(t_sorted *sd,ulong addr,ulong subaddr);
void * Findsorteddatarange(t_sorted *sd,ulong addr0,ulong addr1);
int Findsortedindexrange(t_sorted *sd,ulong addr0,ulong addr1);
void * Getsortedbyindex(t_sorted *sd,int index);
void * Getsortedbyselection(t_sorted *sd,int index);
int Sortsorteddata(t_sorted *sd,int sort);
int Issortedinit(t_sorted *sd);


Example:

// Layout of the data item.
typedef struct t_custom {
  // Standard sorted data items begin with t_sorthdr.
  ulong          addr;                 // Base address of the entry
  ulong          size;                 // Size of the entry
  ulong          type;                 // Type and address extension, TY_xxx
  // Place your custom fields here.
  int            customdata;           // Custom data
} t_custom;

// Destructor of t_custom, called each time the data item is deleted.
void Customdestfunc(t_sorthdr *sh) {
  t_custom *pcustom;
  if (sh==NULL) return;                // Error in input, must not happen
  pcustom=(t_custom *)sh;
  Addtolist(pcustom->addr,DRAW_HILITE,L"Deleting customdata=%i",pcustom->customdata);
}

int Playwithsorteddata(void) {
  t_sorted sd;
  t_custom custom,*pcustom;
  // Local variable sd is filled with garbage and must be cleared.
  memset(&sd,o,sizeof(t_sorted));
  // Let's assume that usually we need not more than 100 items
  if (
Createsorteddata(&sd,sizeof(t_custom),100,NULL,Customdestfunc,0)!=0)
    return -1;                         // Low memory?

  // Add item.
  memset(&custom,0,sizeof(t_custom));
  custom.addr=1000;
  custom.size=100;
  custom.customdata=1;
  // This call will succeed.
  Addsorteddata(&sd,&custom);
  // Add another item.
  custom.addr=1010;
  custom.size=20;
  custom.customdata=2;
  // This call will succeed. But, as second item overlaps with the first, first
  // item will be passed to the Customdestfunc() and then removed from the data.
  Addsorteddata(&sd,&custom);
  // Search for data item at address 1020 will succeed because item 2 occupies
  // address range 1010..1029:
  pcustom=(t_custom *)Findsorteddata(&sd,1020,0);
  if (pcustom!=NULL)
    Addtolist(pcustom->addr,DRAW_HILITE,L"Found customdata=%i",pcustom->customdata);
  // We no longer need sorted data, free allocated resources.
  Destroysorteddata(&sd);
}


See also:

Tables