A technical description of the file formats used by Ultima 6.
Last updated on 18-Nov-2001

==========================================================================
Terminology
==========================================================================

byte      = 8 bits
word      = 16 bits
dword     = 32 bits
paragraph = a 16-byte block

==========================================================================
Debugging and Disassembly
==========================================================================

The information in this file was discovered by running Ultima 6 in a
debugger, and by disassembling some of its driver files. If you want to do
the same, I suggest you use 386SWAT. This is a free, but very powerful, DOS
debugger. Download it at:

http://www.sudleyplace.com/swat/swat.htm

==========================================================================
Packed DOS Executables
==========================================================================

The file "game.exe" is a packed exe file. To decompress it, you need an exe
unpacker. You can use UX 0.55, which is available from:

http://www.kiarchive.ru:8093/pub/msdos/execomp/inf-ux55.zip

==========================================================================
Offsets in EXE files
==========================================================================

All offsets in EXE files are relative to the start of the code. In other
words, you may skip the EXE-header.

==========================================================================
Files and Compression
==========================================================================

LZW-compressed Files
====================

Many U6 files have been compressed with the LZW encoding algorithm.
Decompression code is located in "u.exe", at offset 0xF20F.

What does a compressed file look like?
======================================

Compressed files have the following structure:

struct compressed_file
{
    unsigned byte size0;
    unsigned byte size1;
    unsigned byte size2;
    unsigned byte size3;
    unsigned byte compressed_data[filesize-4];
};

The first 4 bytes give you the length of the uncompressed file:

uncompressed_size = size0 + (size1 << 8) + (size2 << 16) + (size3 << 24);

The rest of the file contains the LZW-encoded data.

A valid compressed file satisfies the following conditions:

1) size3 == 0
   That's because no compressed file has an uncompressed size greater
   than 16 MB :-)
2) uncompressed_size > (filesize-4)
3) compressed_data[0] + ((compressed_data[1] & 1) << 8) == 0x100
   The first 9 bits of compressed_data[] are equal to 0x100, because
   they're a special command that tells the decompression code to
   re-initialize its dictionary.

Which files are compressed?
===========================

Generally speaking, all files that are loaded into memory as a whole. For
example, all driver files are compressed, as well as "maptiles.vga",
because all MapTiles are kept in memory at all times. The file
"objtiles.vga", on the other hand, is not compressed, because only parts of
it are cached in at any given time.

How can I easily encode a file for U6?
======================================

Here's a simple method that doesn't infringe the Unisys patent:

a) Create a new file.
b) Write the 4 length bytes to the file.
c) Divide the uncompressed file into blocks of, say, 64 bytes
   (last block can be smaller than 64 bytes, of course)
d) For each block, do this:
   - write the 9-bit value 0x100 to the file.
   - write each byte of the 64-byte-block to the file
     (as a 9-bit value with the MSB set to 0).
e) Write the 9-bit value 0x101 to the file.

Collections
===========

A "collection" is a file that contains objects of the same type.

Libraries
=========

A "library" is a collection of objects of the same type, all stored in a
single file.
There are several types of libraries:

lib_16    = set of offset_16, set of object
offset_16 = unsigned word  ; offset of the object within the library file

lib_32    = set of offset_32, set of object
offset_32 = unsigned dword ; offset of the object within the library file

s_lib_16  = file_size, lib_16
s_lib_32  = file_size, lib_32
file_size = unsigned dword ; the length of the entire library file

Some library files are lzw-compressed. Other library files contain blocks
of lzw-compressed data, even though the library files themselves are not
compressed.

The object offsets in a library file are sorted in ascending order. This
means that you can use the first offset to calculate the total number of
offsets, and thus the total number of objects, in the library. The offset
table ends where the first object begins, so in a plain lib_32, for
example, the number of offsets is the first offset divided by 4.

However, the file "converse.a" is a special case. It is a lib_32, but its
first two offsets are 'null pointers.'

==========================================================================
Graphics
==========================================================================

Only the VGA graphics of U6 are discussed here. If you want to know
anything about the other graphics types, you must do your own research.

Tiles
=====

A "tile" is a 16x16 pixel image. U6 has 0x800 tiles. The first 0x200 tiles
are stored in "maptiles.vga", which is LZW-compressed. The other 0x600
tiles are stored in "objtiles.vga", which is not compressed.

Now, if you're looking for a particular tile, how do you find it quickly,
without searching through all of "maptiles.vga" or "objtiles.vga"? That's
what "tileindx.vga" is for. This file (which is LZW-compressed, btw)
contains 0x800 words that give the location of each tile, as follows:

1) Uncompress "maptiles.vga" into a file called "alltiles.vga".
2) Append "objtiles.vga" to "alltiles.vga".
3) Take a pointer from "tileindx.vga" and multiply it by 16. You now have
   the offset of the tile in "alltiles.vga".

U6 uses three formats to store its tiles:

Plain
=====

Plain tiles are always 256 bytes long.
There are no transparent pixels. The first 16 bytes represent the 16
pixels of the first line, and so on.

tile  = line0, ... ,line15
line  = pixel0, ... ,pixel15
pixel = unsigned byte

Transparent Pixels
==================

The next step of sophistication. It's just like the "plain" format, except
that 0xFF represents a completely transparent pixel.

Pixel blocks
============

The tile is represented as a collection of horizontal strips, which I've
chosen to call "pixel blocks." The first word of each pixel block
determines its placement within the tile.

tile         = tile_length, set of pixel_block, padding
tile_length  = unsigned byte ; the tile length in paragraphs
                             ; (a "paragraph" is a 16-byte block)
pixel_block  = displacement, block_length=0 ||
               displacement, block_length, set of pixel
padding      = set of 0xED   ; if (tile_length mod 16) != 0 -->
                             ; append 0xED's at the end of the tile
displacement = unsigned word ; explained below
block_length = unsigned byte ; if (block_length==0) --> end of tile data
                             ; (there may be a few more 0xED's, though)
pixel        = unsigned byte

About the "displacement" field:
For the first pixel block, this field determines its location relative to
the tile's upper left corner. For the other pixel blocks, it determines the
location relative to the pixel directly to the right of the previous pixel
block.

The formula:
    displacement = y*176+x
Where:
      0 <= y <= 15
    -16 <= x <= 15

Where does the value "176" come from?
U6 doesn't draw all of its graphics directly to the screen. For some, it
uses a buffer. The graphics are drawn into this buffer, which is later
copied to the screen. Because U6 updates the screen by regions, the buffer
doesn't need to be 320x200 pixels big. In fact, it's only 176 pixels wide
(I don't know its height).

How do I know which tiles are stored in what format?
====================================================

That information is contained in "masktype.vga", which is LZW-compressed.
The first 0x800 bytes of "masktype.vga" tell you what format each tile is
stored in:

0x0 ==> plain.
0x5 ==> transparent pixels.
0xA ==> pixel blocks.

I don't know what the other (0x600 + 0x180) bytes are for, but I suspect
they have something to do with the U6 tile caching algorithm.

Shapes
======

I chose the name "shape" because this format is very similar to the U7
shape/frame format.

shape             = rightX, leftX, upperY, lowerY, set of shape_pixel_block
rightX            = unsigned word; no. of pixels to the right of the
                                 ; shape's hotspot
leftX             = unsigned word; no. of pixels to the left of the
                                 ; shape's hotspot
upperY            = unsigned word; no. of pixels above the shape's hotspot
lowerY            = unsigned word; no. of pixels below the shape's hotspot
shape_pixel_block = double_length, x_pos, y_pos, set of pixel ||
                    double_length=0
double_length     = unsigned word; (length of pixel block << 1) |
                                 ; compression_flag
                                 ; if (double_length==0) --> end of data
                                 ; compression flag is always 0
x_pos             = signed word; beginning of the pixel block, relative
                               ; to the hotspot
y_pos             = signed word; beginning of the pixel block, relative
                               ; to the hotspot
pixel             = unsigned byte

Trivia:
1) Take a look at the uncompressed "u6mcga.drv", offset 0x30A4. This is the
   start of the function that draws a shape to the screen. You will see
   that the "compression flag" mentioned earlier has a function similar to
   the compression flag in the U7 shape format.
2) I don't know what "intro.ptr" is for. Yet.

Bitmaps
=======

A "bitmap" is an image with the following format:

bitmap = width, height, set of pixel
width  = unsigned word
height = unsigned word
pixel  = unsigned byte

Bitmaps are always compressed. The files "*.bmp" are lzw-compressed
bitmaps.

Font
====

The font is stored in "u6.ch". There are 256 characters. Each is 8x8 pixels
big, uses 2 colors, and can therefore be stored in 8 bytes.

Bit 0 --> color 0x31.
Bit 1 --> color 0x48.

These mappings are 'hard-coded' into "u6mcga.drv".
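
The font description above can be sketched in C as follows. This is only an
illustration, not code taken from the game. The function name
u6_decode_glyph and the constants are made up for this example. It assumes
that "u6.ch" has been read raw into memory (256 * 8 = 2048 bytes), that
each byte holds one pixel row from top to bottom, and that the most
significant bit is the leftmost pixel. None of these layout details are
confirmed by the text. It also reads "Bit 0" and "Bit 1" as the value of a
pixel's bit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the values come from the font description above. */
enum {
    U6_GLYPH_COUNT  = 256,  /* characters in "u6.ch"         */
    U6_GLYPH_SIZE   = 8,    /* 8x8 pixels, 8 bytes per glyph */
    U6_FONT_COLOR_0 = 0x31, /* palette index for a 0 bit     */
    U6_FONT_COLOR_1 = 0x48  /* palette index for a 1 bit     */
};

/*
 * Expand one glyph into an 8x8 array of palette indices.
 * ASSUMPTIONS: one byte per row, top row first, bit 7 = leftmost pixel.
 */
void u6_decode_glyph(const unsigned char *font_data, int glyph_index,
                     unsigned char pixels[U6_GLYPH_SIZE][U6_GLYPH_SIZE])
{
    assert(font_data != NULL);
    assert(glyph_index >= 0 && glyph_index < U6_GLYPH_COUNT);

    const unsigned char *glyph =
        font_data + (size_t)glyph_index * U6_GLYPH_SIZE;

    for (int y = 0; y < U6_GLYPH_SIZE; y++) {
        for (int x = 0; x < U6_GLYPH_SIZE; x++) {
            /* pick the bit for column x, counting from the MSB */
            int bit = (glyph[y] >> (7 - x)) & 1;
            pixels[y][x] = bit ? U6_FONT_COLOR_1 : U6_FONT_COLOR_0;
        }
    }
}
```

If the rendered characters come out mirrored, the bit order assumption is
wrong and (7 - x) should be replaced by x.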

Portraits
=========

The portraits you see during conversations are stored in:

portrait.a
portrait.b
portrait.z

These files are lib_32's of lzw-compressed data blocks. As far as I've been
able to tell, the uncompressed data blocks are always
0xE00 = 3584 = 56*64 bytes long. Every block contains raw pixel data for a
56x64 image (one byte per pixel).

Mouse Pointers
==============

There are 10 mouse pointers: small arrows for the 8 compass directions, a
crosshair and a large arrow. The mouse pointers are stored in "u6mcga.ptr".
The format of this file:

lzw -> s_lib_32 -> shape

Palettes
========

There are two palette types.

In-game palette
===============

The in-game palette is stored in the file "u6pal".

struct u6pal
{
    unsigned byte palette[0x100][3];
    byte no_idea_yet[0x100];
};

The first 0x300 bytes seem to contain the initial palette. The RGB
components are stored in the order red, green, blue. Please note that even
though the RGB components are stored as bytes, they only take up 6 bits
each. That is because the color DAC of the original VGA only supported 6
bits per RGB component. So if you want to use the U6 palette on a modern
computer, you should left-shift each component by 2 bits.

What the other 0x100 bytes are for, I don't know.

Cut-scene palettes
==================

The palettes for the cut-scenes (startup, introduction, character creation)
are stored in "palettes.int". This file is a collection of 8 'packed'
palettes. Every packed palette is 0x240 bytes long. 'Packed' means that
every color component is stored as 6 bits, so that 3 bytes are enough to
store 4 color component entries.
Some code to explain this:

unsigned byte packed_palette[0x240];
unsigned byte unpacked_palette[0x100][3];

for (int i = 0; i < 0x100; i++)
{
    for (int j = 0; j < 3; j++)
    {
        // each color component occupies 6 consecutive bits
        int bit_pos   = (i*3 + j) * 6;
        int byte_pos  = bit_pos / 8;
        int shift_val = bit_pos % 8;

        // a component spills into the next byte only if shift_val > 2;
        // checking this avoids reading past the end of packed_palette[]
        // for the very last component
        int bits = packed_palette[byte_pos];
        if (shift_val > 2)
            bits += packed_palette[byte_pos+1] << 8;

        int color = (bits >> shift_val) & 0x3F;

        // scale from 6 bits to 8 bits per component
        unpacked_palette[i][j] = (unsigned byte) (color << 2);
    }
}

Animation
=========

There are two types of animations.

Palette Cycling
===============

U6 uses this method to animate
... fireplaces
... candles
... braziers
... BluGlo[tm] magic items
... cauldrons

The relevant code is located at offset 0x13D5 in the unpacked "game.exe".
All palette cycling information is 'hard-coded' into "game.exe".

The following palette intervals get cycled:

0xE0 - 0xE7 (fires, braziers, candles)
0xE8 - 0xEF (BluGlo[tm] magical items)
0xF0 - 0xF3 (?)
0xF4 - 0xF7 (kitchen cauldrons)
0xF8 - 0xFB (?)

The 8-entry intervals are cycled twice as fast as the 4-entry intervals.

Pseudo-code for rotating ("cycling") palette intervals
======================================================

// the VGA palette
unsigned byte palette[256][3];

// cycle the 8-entry interval 0xE0 - 0xE7
unsigned byte temp[3];

// save the first entry
for (int i = 0; i < 3; i++)
{
    temp[i] = palette[0xE0][i];
}

// move the other 7 entries down by one position
for (int i = 0; i < 7; i++)
{
    for (int j = 0; j < 3; j++)
    {
        palette[0xE0+i][j] = palette[0xE0+i+1][j];
    }
}

// put the saved entry at the end of the interval
for (int i = 0; i < 3; i++)
{
    palette[0xE0+7][i] = temp[i];
}

// the 4-entry intervals are left as an exercise

Multiple Animation Frames
=========================

U6 uses this method to animate
... water
... fountains
... pennants/flags
... PC's and NPC's
... protection fields

The relevant code is located at offset 0x1F28 in the unpacked "game.exe".
Animation information can be found in the file "animdata", which has the
following structure:

struct animdata
{
    unsigned word number_of_tiles_to_animate = 0x1D;
    unsigned word tile_to_animate[0x20];
    unsigned word first_anim_frame[0x20];
    unsigned byte and_masks[0x20];
    unsigned byte shift_values[0x20];
};

Some of the Ultima 6 tiles do not contain any graphics. They represent
animated tiles. The actual animation frames are stored in other tiles.
Here's some pseudo-code to demonstrate how this works:

// pointers to the tile data
unsigned byte *tile_pointers[0x800];

// the game timer is incremented regularly. I don't know how regularly,
// though :)
unsigned word game_timer;

// temp variable used by the loop
unsigned word current_anim_frame;

for (int i = 0; i < animdata.number_of_tiles_to_animate; i++)
{
    current_anim_frame =
        (game_timer & animdata.and_masks[i]) >> animdata.shift_values[i];
    tile_pointers[animdata.tile_to_animate[i]] =
        tile_pointers[animdata.first_anim_frame[i] + current_anim_frame];
}

Both Animation Methods
======================

There are tiles that use both animation methods:
... protection fields

I don't know if 'multiple frame' animation and palette rotation are
synchronized in any way.

Hybrid Tiles
============

Some tiles are part animated tile and part static tile, such as coast lines
and river banks. Hybrid tiles are animated by copying parts of a regular
animated tile into a static tile.

The relevant code is located at offsets 0x2CFA and 0x2D2F in the
uncompressed "u6mcga.drv".
Here is some pseudo-code to demonstrate how this works:

//
// pointers to the tile data
//
unsigned byte *tile_pointers[0x800];

//
// array of 32 indices into the uncompressed "tileindx.vga"
// located at offset 0x2C00 in the uncompressed "u6mcga.drv"
//
unsigned word sources[0x20];

//
// array of 32 indices into the uncompressed "tileindx.vga"
// located at offset 0x2C40 in the uncompressed "u6mcga.drv"
//
unsigned word dests[0x20];

//
// the uncompressed "animmask.vga" contains
// 32 data blocks (each is 64 bytes long)
// these blocks control what parts of a
// source tile are copied into the corresponding
// destination tile
//
unsigned byte animmask_vga[0x20][0x40];

for (int i = 0; i < 0x20; i++)
{
    //
    // because sources[] and dests[] contain pointers into
    // the uncompressed "tileindx.vga" (which in turn contains
    // word pointers), the values in sources[] and dests[]
    // must be divided by 2 before they can be used in C code.
    //
    unsigned byte *source_tile = tile_pointers[sources[i] / 2];
    unsigned byte *dest_tile   = tile_pointers[dests[i] / 2];

    //
    // copy parts of source_tile* into dest_tile*
    // important: both tiles are assumed to be in "plain" or
    // "transparent pixels" format, i.e. they must be 256
    // bytes long.
    //
    int copy_pos = 0;
    int db_index = 0;
    int displacement;
    int bytes2copy;

    bytes2copy = animmask_vga[i][db_index];
    if (bytes2copy != 0)
    {
        // copy
        for (int j = 0; j < bytes2copy; j++)
        {
            dest_tile[copy_pos] = source_tile[copy_pos];
            copy_pos++;
        }
    }
    db_index++;

    displacement = animmask_vga[i][db_index];
    bytes2copy   = animmask_vga[i][db_index+1];
    db_index += 2;

    while ((displacement != 0) && (bytes2copy != 0))
    {
        copy_pos += displacement;

        // copy
        for (int j = 0; j < bytes2copy; j++)
        {
            dest_tile[copy_pos] = source_tile[copy_pos];
            copy_pos++;
        }

        displacement = animmask_vga[i][db_index];
        bytes2copy   = animmask_vga[i][db_index+1];
        db_index += 2;
    }
}

==========================================================================
Text
==========================================================================

Object Names
============

When you 'look' at an object, the game will tell you its name ("Thou dost
see a mouse.") The object names are stored in "look.lzd".

"look.lzd" is lzw-compressed. After decompression, the file is going to
look like this:

object_names       = set of object_description
object_description = object_number, object_name
object_number      = unsigned word
object_name        = set of character, terminator=0 ||
                     terminator=0
character          = unsigned byte
terminator         = unsigned byte

The objects are sorted in ascending order:
offset(obj1) > offset(obj2) IFF number(obj1) > number(obj2)

The strings may contain special characters:
"/" --> singular word ending follows.
"\" --> plural word ending follows.
Example: "loa/f\ves of bread"
Word ending = [a-z]+

The game translates the object name "Avatar" into the Avatar's name.
The game translates an empty object name to the string "nothing".

The code that extracts strings from the uncompressed "look.lzd" is located
at offset 0xCFE0 in the unpacked "game.exe".

Conversations
=============

I haven't completely decoded the conversation files yet. But here's what I
do know:

The conversations are stored in two files, "converse.a" and "converse.b".
"converse.a" is a lib_32 with the first two entries in the offset table set to 0. Ignore them. "converse.b" is a lib_32. An entry in converse.* looks just like an LZW-compressed file: struct entry { unsigned byte size0; unsigned byte size1; unsigned byte size2; unsigned byte size3; unsigned byte compressed_data[entry_size-4]; }; ========================================================================== To Do ========================================================================== - more info on conversations - books