Reverse engineering Mortal Kombat GRA file format (part 1)

Disclaimer: This post is aimed at retro-gaming preservation and code-archeology. All product names, trademarks and registered trademarks are property of their respective owners.

GRA files are used by the PC DOS version of Mortal Kombat 1 and 2 (available on GOG) to store all kinds of graphics. There are two different types of GRA files:

Compressed static images or animations: a well-defined, self-contained file format that can easily be converted to PNG/APNG/GIF. The only obstacle is the compression, which has to be reverse-engineered first. I’ll refer to it as cGRA and cover that format today.
Uncompressed sprites, fonts, graphic objects and UI elements: this format is kind of a mess, since it contains only encoded pixel data without any metadata. All the necessary information (sprite offsets, width, height, palette) has to be scavenged from the Mortal Kombat executable. I’ll refer to it as uGRA and cover that format in part 2 of this blog post (I still have to figure out a few things).

Reverse engineering toolset

MK1.EXE is compiled with the Watcom compiler and uses the DOS/4GW protected-mode extender. This basically means two things:

the code is well-known 32-bit x86 assembly with no weird segmentation, so all 32-bit disassemblers and decompilers should work, but…
only super old (IDA Free 4.1, maybe Sourcer, but I couldn’t find it at the moment) or super expensive (IDA Pro) tools support the DOS/4GW LE (linear executable) file format.

There is of course the DOSBox debugger, which can handle this type of file, but I prefer static analysis, aided by dynamic tools only when necessary (e.g. when I have no idea what is happening).

Let’s assume that the analysis has to be performed at minimal cost (preferably with freely available tools only). Searching for tools with DOS/4GW or LE support pointed me to some retro-gaming blogs and forums, where I found out that IDA Freeware 4.1 (command line, TurboVision-like interface) contains an LE loader. This old version of IDA can be used to create an IDB with the properly loaded LE executable, and that IDB can later be opened with the newer IDA 5.0 Freeware. It would be great if that scenario also worked with IDA Freeware 7.0 (native Linux and macOS support), but unfortunately IDA Freeware 7.0 refuses to open 32-bit IDBs; only i64 databases are supported.

Another possibility would be to write (or adapt from open source; the Boomerang decompiler has some LE parsing) a minimal loader/mapper for LE files that creates a flat memory dump of the LE file. That dump could then be loaded into IDA Freeware 7.0 as a 32-bit binary file (without any file format). I’ll leave that option as an exercise for the reader ;) (here are the specs)

Little bit of reverse engineering

Since I’m reverse-engineering a file format, the easiest way to find the code responsible for parsing GRA files is to find the places in the code where GRA filenames are referenced. The MK1 developers made that step quite easy: all GRA filenames (both compressed and uncompressed) are referenced just once. The place that references them looks like an array (let’s call it gra_entries) of structures describing each file (some fields are unknown to me, some are not relevant from the file-parsing perspective):

struct FileEntry {
    char *filename;
    int filesize;
    char flags;
    char padding[3];
    int unk2_0;
    char *buffer;
    int unk2_2;
};

dseg02:074208 gra_entries FileEntry <offset aGraphicsStance, 2DA78h, 12h, 0, 0, 0, 0>
dseg02:074208             FileEntry <offset aGraphicsFonts_, 1E5A8h, 12h, 0, 0, 0, 0>
dseg02:074208             FileEntry <offset aGraphicsMisc_g, 197FCh, 12h, 0, 0, 0, 0>
dseg02:074208             FileEntry <offset aGraphicsVictor, 6308h, 12h, 0, 0, 0, 0>
dseg02:074208             FileEntry <offset aGraphicsJcatt_, 0C0FAh, 12h, 0, 0, 0, 0>
dseg02:074208             FileEntry <offset aGraphicsKatt_g, 8110h, 12h, 0, 0, 0, 0>
dseg02:074208             FileEntry <offset aGraphicsLkatt_, 0F3BAh, 12h, 0, 0, 0, 0>
dseg02:074208             FileEntry <offset aGraphicsRdatt_, 0CD4Eh, 12h, 0, 0, 0, 0>
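
For anyone who wants to read these entries straight from a memory dump, here is a minimal sketch of unpacking one FileEntry with Python's struct module. It assumes 32-bit little-endian pointers and no extra alignment beyond the explicit padding field, which gives 24 bytes per entry. The helper name parse_file_entry and the pointer value in the sample are made up for illustration; only the size and flags come from the first row of the listing.

```python
import struct

# Field order follows the FileEntry structure above:
# char *filename; int filesize; char flags; char padding[3];
# int unk2_0; char *buffer; int unk2_2;
# ASSUMPTION: 32-bit little-endian pointers, no hidden alignment.
FILE_ENTRY = struct.Struct("<IiB3xiIi")

def parse_file_entry(raw, offset=0):
    """Unpack a single FileEntry from a bytes object (hypothetical helper)."""
    filename_ptr, filesize, flags, unk2_0, buffer_ptr, unk2_2 = \
        FILE_ENTRY.unpack_from(raw, offset)
    return {
        "filename_ptr": filename_ptr,
        "filesize": filesize,
        "flags": flags,
        "unk2_0": unk2_0,
        "buffer_ptr": buffer_ptr,
        "unk2_2": unk2_2,
    }

# Synthetic entry: 0x12345678 is a placeholder pointer, while 0x2DA78 and 0x12
# are the filesize and flags from the first row of the listing.
sample = FILE_ENTRY.pack(0x12345678, 0x2DA78, 0x12, 0, 0, 0)
entry = parse_file_entry(sample)
print(FILE_ENTRY.size, hex(entry["filesize"]), hex(entry["flags"]))
```

Walking the whole array would then be a matter of stepping through the dump in FILE_ENTRY.size increments, starting at the file offset that corresponds to gra_entries.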

The filesize field is quite important if someone plans to mod the game files. It has to be adjusted in the MK1 executable, otherwise the game will not be able to read the modified file. All further in-game file references are made through an index into that array. A location in a file is identified by a 32-bit value that contains both the file index and the offset within the file, for example:

0x2206CB90:
 -> file index : 0x22
 -> file offset: 0x06CB90
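
The split shown above can be sketched in a few lines of Python. The bit layout (index in the top 8 bits, offset in the low 24 bits) is inferred from this one example and from the constants passed to the loader later in the post, so treat it as an assumption. The names split_file_ref and make_file_ref are mine and do not come from the game.

```python
def split_file_ref(ref):
    """Split a 32-bit file reference into (file_index, file_offset).

    ASSUMPTION: the top 8 bits are the index into gra_entries and the
    low 24 bits are the byte offset inside that file.
    """
    file_index = (ref >> 24) & 0xFF
    file_offset = ref & 0xFFFFFF
    return file_index, file_offset

def make_file_ref(file_index, file_offset):
    """Inverse of split_file_ref, handy when patching references."""
    if not 0 <= file_index <= 0xFF:
        raise ValueError("file index does not fit in 8 bits")
    if not 0 <= file_offset <= 0xFFFFFF:
        raise ValueError("file offset does not fit in 24 bits")
    return (file_index << 24) | file_offset

index, offset = split_file_ref(0x2206CB90)
print(hex(index), hex(offset))
```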

gra_entries array is referenced by three different functions:

0x010818 – quite small, looks like it just resets the buffer, unk2_0, and unk2_2 fields
0x011C90 – a bit bigger; after some analysis I figured out that it is related to the players’ animation (might be useful later, to aid reverse engineering of the uncompressed GRA files)
0x011E04 – this is the function responsible for reading the file from the disk (no file parsing yet). It takes one argument, the previously mentioned 32-bit value that encodes the file index and file offset, and returns a pointer to the file data at the given offset

Going through the places where the 0x011E04 function (which I named getFileBufferAtOffset) is referenced reveals some interesting parts of the code:

cseg01:01E613   mov    eax, 39000000h ; "LEGAL.GRA"
cseg01:01E618   call   getFileBufferAtOffset
cseg01:01E61D   mov    edi, eax
cseg01:01E61F   call   parseCompressedStream
cseg01:01E624   mov    eax, 0C8h
cseg01:01E629   call   sub_2D576

cseg01:01E633   mov    eax, 0D000000h       ; "ACCLAIM.GRA"
cseg01:01E638   call   getFileBufferAtOffset
cseg01:01E63D   mov    edi, eax
cseg01:01E63F   call   parseCompressedStream
cseg01:01E644   mov    eax, 64h
cseg01:01E649   call   sub_2D576

cseg01:01E67D   mov    eax, 0F000000h       ; "PROBE.GRA"
cseg01:01E682   call   getFileBufferAtOffset
cseg01:01E687   mov    edi, eax
cseg01:01E689   mov    dword_74B54, 0
cseg01:01E693   mov    ecx, 8
cseg01:01E698
cseg01:01E698   loc_1E698:
cseg01:01E698   lea    ebp, [ebp-4]
cseg01:01E69B   mov    [ebp+0], ecx
cseg01:01E69E   mov    eax, dword_74B54
cseg01:01E6A3   call   parseCompressedStream_loop

cseg01:01E746   mov    eax, 0C000000h       ; "MKTITLE1.GRA"
cseg01:01E74B   call   getFileBufferAtOffset
cseg01:01E750   mov    edi, eax
cseg01:01E752   call   parseCompressedStream
cseg01:01E757   mov    edi, offset off_5A45C
cseg01:01E75C   call   sub_16698

cseg01:01E7EB   mov    eax, 0B000000h       ; "GORO1.GRA"
cseg01:01E7F0   call   getFileBufferAtOffset
cseg01:01E7F5   mov    edi, eax
cseg01:01E7F7   call   parseCompressedStream
cseg01:01E7FC   mov    eax, 0C0h
cseg01:01E801   call   sub_2D576

“PROBE.GRA” is handled a bit differently since it is an animation: its frames are processed in a loop through parseCompressedStream_loop.

In the other cases, the getFileBufferAtOffset call is immediately followed by a call to the function which I named parseCompressedStream (0x01EC3B). All compressed GRA parsing happens there.

Compressed GRA file format

cGRA files consist of two parts: palette data and per-frame pixel data (the palette is the same for all frames encoded in a given file).

Palette

Palette reading is part of the parseCompressedStream function. In Python (yup, the parser is implemented in Python) it looks like this:

# br is a BitReader object, getWord just reads 16 bits
def getPalette(br):
  palette = []
  record_num = br.getWord()
  for _ in range(0, record_num):
    record_size = br.getWord()
    for _ in range(0, record_size):
      palette.append(br.getWord())
  return palette

Pseudo-C structure for clarity:

struct Palette {
    uint16_t record_num;
    struct {
        uint16_t record_size;
        uint16_t colors[record_size];
    } records[record_num];
};
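
To try the palette layout without the real parser, here is a small sketch that feeds synthetic data through the same loop. WordReader is a hypothetical stand-in for the BitReader used in the post and only provides getWord. It assumes 16-bit little-endian words, which matches how the frame header parser in this post assembles the width from two bytes, but is otherwise my assumption.

```python
import struct

class WordReader:
    """Hypothetical stand-in for the post's BitReader (getWord only)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def getWord(self):
        # ASSUMPTION: words are stored little-endian.
        (value,) = struct.unpack_from("<H", self.data, self.pos)
        self.pos += 2
        return value

def getPalette(br):
    # Same logic as the function above: a record count, then for each
    # record a color count followed by that many 16-bit colors.
    palette = []
    record_num = br.getWord()
    for _ in range(record_num):
        record_size = br.getWord()
        for _ in range(record_size):
            palette.append(br.getWord())
    return palette

# Synthetic palette: two records holding 2 and 1 colors respectively.
sample = struct.pack("<6H", 2, 2, 0x7FFF, 0x0000, 1, 0x001F)
reader = WordReader(sample)
palette = getPalette(reader)
print([hex(c) for c in palette])
```

The records are simply concatenated, so the chunking does not affect the resulting color table.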

I don’t know why the palette is stored in smaller chunks instead of one array of 16-bit values; I guess it might be related to some implementation details that I’m not aware of. MK1 uses the VGA 320×200 256-color graphics mode, in which each of the 256 colors is an 18-bit RGB value (6 bits per channel, a limitation of the VGA graphics mode). The palette stored in cGRA files encodes each color as a 16-bit value (high color), which would suggest a 5:6:5 R:G:B bit split. Careful inspection of the palettes stored in all cGRA files revealed, however, that the most significant bit is never used, so the color is actually stored in 15 bits (5:5:5). I’ve used the function below to convert 15-bit RGB values (three 5-bit channels) to the full 24-bit RGB used by the PNG/APNG Python library:

MULT = 255.0/31
def convert15to24bitRGB(r, g, b):
  return int(round(r*MULT)), int(round(g*MULT)), int(round(b*MULT))
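
The function above takes the three channels already separated. A minimal sketch of the missing step, splitting one 16-bit palette word into its 5-bit channels, could look like the code below. The channel order is an assumption on my side (the common xRRRRRGGGGGBBBBB layout with red in the high bits); the post only establishes that the top bit is unused, so swap the shifts if the colors come out wrong. The name split15bit is hypothetical.

```python
MULT = 255.0 / 31

def convert15to24bitRGB(r, g, b):
    # Same scaling as above: map 0..31 onto 0..255.
    return int(round(r * MULT)), int(round(g * MULT)), int(round(b * MULT))

def split15bit(color):
    """Split a 15-bit color word into (r, g, b), each in 0..31.

    ASSUMPTION: bit layout is xRRRRRGGGGGBBBBB (red in the high bits).
    """
    r = (color >> 10) & 0x1F
    g = (color >> 5) & 0x1F
    b = color & 0x1F
    return r, g, b

white = convert15to24bitRGB(*split15bit(0x7FFF))
black = convert15to24bitRGB(*split15bit(0x0000))
print(white, black)
```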

Frames

Each frame has a header and a list of chunks with compressed pixel data:

struct FrameHeader {
    uint16_t width;
    uint16_t height;
    uint8_t compression_parameter;
    struct {
        uint8_t chunk_size;
        uint8_t chunk_data[chunk_size];
    } chunks[];
};

The chunks array ends when the chunk_size field is zero. The corresponding Python code (annotated with a very professional comment to cover my inability to fully understand where this padding/alignment comes from):

def getCompressedData(br):
  # HACK: skip unknown number of padding 0 bytes
  b = 0
  while b == 0:
    b = br.getBits(8)
  width = b | (br.getBits(8) << 8)
  height = br.getWord()

  # compression_parameter from FrameHeader (bits per pixel for the decoder)
  c = br.getBits(8)

  blocks = []
  while not br.isEnd():
    block_size = br.getBits(8)
    if block_size == 0:
      break
    block = br.getBytes(block_size)
    blocks.append(block)
  return width, height, c, b''.join(blocks)
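
The same layout can be exercised on a plain bytes object, which makes it easy to test on synthetic data. The sketch below is my own variant and not the code from the parser: parse_frame is a hypothetical name, and it assumes little-endian fields just like the function above. It keeps the same zero-skipping hack, so it shares its limitation: a frame whose width has a zero low byte (for example 256) would be misread.

```python
import struct

def parse_frame(data, pos=0):
    """Parse one frame from bytes; returns (width, height, bits, payload, end_pos)."""
    # Same hack as above: skip an unknown number of zero padding bytes.
    while pos < len(data) and data[pos] == 0:
        pos += 1
    width, height, bits = struct.unpack_from("<HHB", data, pos)
    pos += 5
    chunks = []
    while pos < len(data):
        size = data[pos]
        pos += 1
        if size == 0:  # a zero-sized chunk terminates the list
            break
        chunks.append(data[pos:pos + size])
        pos += size
    return width, height, bits, b"".join(chunks), pos

# Synthetic frame: two padding bytes, a 320x200 header with parameter 8,
# two chunks ("abc" and "de"), the terminator, then unrelated trailing data.
sample = (b"\x00\x00" + struct.pack("<HHB", 320, 200, 8)
          + b"\x03abc" + b"\x02de" + b"\x00" + b"tail")
width, height, bits, payload, end_pos = parse_frame(sample)
print(width, height, bits, payload, sample[end_pos:])
```

Returning the end position makes it possible to call the function again for the next frame of an animation.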

Compression

The decompression function is at address cseg01:000102A8. It is quite complicated because, on top of the decompression itself, it also implements a simple bit reader, and the output is stored directly(? +/- cache) in the VGA graphics buffer, so it needs to place the pixels correctly in the 320×200 space. I used the DOSBox debugger to trace it for a bit and, believe it or not, I was able to tell that this code is oddly similar to the LZW implementation that I had been looking at just a few weeks earlier. That knowledge helped with further reverse engineering, even though the exact implementation was different. One of the characteristics of LZW compression is the use of codes reserved for clearing the code table and marking the end of data (Wikipedia):

Further refinements include reserving a code to indicate that the code table should be cleared and restored to its initial state (a “clear code”, typically the first value immediately after the values for the individual alphabet characters), and a code to indicate the end of data (a “stop code”, typically one greater than the clear code). The clear code allows the table to be reinitialized after it fills up, which lets the encoding adapt to changing patterns in the input data.

The compression_parameter field in the FrameHeader specifies the number of bits per pixel (it is always 8 in the case of Mortal Kombat). This value is passed as an argument to the decompression routine and is used to calculate the clear and stop codes for LZW:

; edx is a pointer to the compressed data
cseg01:0102E1   mov    al, [edx]                  ; al = 8
; [...]
cseg01:0102EA   mov    cl, al                     ; cl = 8
; [...]
cseg01:0102ED   mov    ebx, 1
; [...]
cseg01:0102F9   shl    ebx, cl                    ; ebx = 256
; [...]
cseg01:01033B   mov    [esp+14Ch+clear_code], ebx ; clear_code = 256
; [...]
cseg01:010343   add    ebx, 2                     ; ebx = 258
; [...]
cseg01:01034D   mov    eax, [esp+14Ch+clear_code] ; eax = 256
cseg01:010354   mov    [esp+14Ch+next_code], ebx  ; next_code = 258
cseg01:01035B   inc    eax                        ; eax = 257
cseg01:01035C   mov    [esp+14Ch+new_code], ebx   ; new_code = 258
cseg01:010363   mov    [esp+14Ch+eod_code], eax   ; eod_code = 257

which gives the same results as the constants defined in the original LZW implementation:

#define M_CLR	256	/* clear table marker */
#define M_EOD	257	/* end-of-data marker */
#define M_NEW	258	/* new code index */
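
To make the role of these three values concrete, here is a sketch of a textbook LZW decoder that works on a list of already-extracted integer codes. It is not the routine ported from MK1: it skips bit unpacking and code-width growth entirely, and the names lzw_codes and lzw_decode_codes are mine. It only illustrates how the clear code, the end-of-data code and the first free code are derived from the bits-per-pixel value and then used.

```python
def lzw_codes(bits_per_pixel):
    """Derive the special codes the same way the disassembly above does."""
    clear_code = 1 << bits_per_pixel   # 256 for 8 bits per pixel
    eod_code = clear_code + 1          # 257
    first_free = clear_code + 2        # 258
    return clear_code, eod_code, first_free

def lzw_decode_codes(codes, bits_per_pixel=8):
    """Textbook LZW decoding of an iterable of integer codes."""
    clear_code, eod_code, first_free = lzw_codes(bits_per_pixel)
    table = {}      # code -> byte string, for codes >= first_free
    prev = None     # previously decoded string
    out = bytearray()
    for code in codes:
        if code == clear_code:
            table = {}          # reset the table to its initial state
            prev = None
            continue
        if code == eod_code:
            break
        if code < clear_code:
            entry = bytes([code])          # literal value
        elif code in table:
            entry = table[code]
        elif prev is not None and code == first_free + len(table):
            entry = prev + prev[:1]        # code defined by this very step
        else:
            raise ValueError("invalid LZW code: %d" % code)
        out += entry
        if prev is not None:
            table[first_free + len(table)] = prev + entry[:1]
        prev = entry
    return bytes(out)

# "ABABABA" encoded by a textbook LZW encoder, wrapped in clear/end codes.
decoded = lzw_decode_codes([256, 65, 66, 258, 260, 257])
print(decoded)
```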

In the end, I didn’t try to use an off-the-shelf implementation and instead ported the one from MK1 to Python (after all, it’s just 50 lines of code).

cGRA parser

The source code of the cGRA parser, written in Python 3, is available on GitHub: https://github.com/rwfpl/rewolf-mortal-kombat

$ python3 parse.py --help
usage: parse.py [-h] [--apng true/false] [--png true/false] [--raw true/false]
                [--apng_delay int] [--outdir str]
                input_file

Mortal Kombat GRA files parser.
Copyright (c) 2018 ReWolf
All rights reserved.
http://blog.rewolf.pl

positional arguments:
  input_file

optional arguments:
  -h, --help         show this help message and exit
  --apng true/false  enable/disable APNG generation (default: True)
  --png true/false   enable/disable PNG generation (default: True)
  --raw true/false   enable/disable RAW pixel dumps (default: False)
  --apng_delay int   APNG frame delay in miliseconds (default: 100)
  --outdir str       output directory (default: .)
