According to ForensicsWiki (http://www.forensicswiki.org/wiki/SuperFetch):
SuperFetch is a performance enhancement introduced in Microsoft Windows Vista to reduce the time necessary to launch applications (…)
Data for SuperFetch is gathered by the %SystemRoot%\System32\Sysmain.dll, part of the Service Host process, %SystemRoot%\System32\Svchost.exe, and stored in a series of files in the %SystemRoot%\Prefetch directory. These files appear to start with the prefix Ag and have a .db extension. The format of these files is not known…
When I read the above statement I just couldn’t resist, and I decided to take up the challenge. Below you can read what I’ve found; as a bonus, I’ve also prepared a simple dumper for SuperFetch .db files (linked at the end of this post).
COMPRESSED CONTAINER
As stated on ForensicsWiki, the SuperFetch mechanism is handled by sysmain.dll, so that is a good place to start the research. Most Ag*.db files start with the magic value 0x304D454D (“MEM0”), at least on Windows 7. I say most because two files seem to have a different format:
- AgRobust.db – this file will be described later
- AgAppLaunch.db – I didn’t analyse this file (but it shouldn’t be hard)
Searching for the magic value in sysmain.dll reveals only two places where it is used:
- PfSvCompressBuffer()
- PfSvDecompressBuffer()
I decided to look at PfSvDecompressBuffer, since a decompression function is the more convenient source of information when the goal is to decode a given file. Analysing this function gave me the general layout of the file:
| offset | type | size (bytes) | description |
| 0 | DWORD | 4 | Magic value: 0x304D454D (“MEM0”) or 0x4F4D454D (“MEMO”) |
| 4 | DWORD | 4 | Total output size (after decompression) |
| 8 | CHUNK | var_1 | compressed chunk of data |
| 8 + var_1 | CHUNK | var_2 | compressed chunk of data |
| … | … | … | … |
| … | CHUNK | var_n | compressed chunk of data |
| EOF | | | |
CHUNK is defined as follows:
struct CHUNK
{
    DWORD size;        // size of compressed data
    BYTE  data[size];  // compressed data
};
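The container layout above can be walked without decompressing anything, which is handy for sanity-checking a file. The following is only a minimal sketch: it assumes every chunk carries a 32-bit little-endian size field, as in the CHUNK definition, and the names `read_le32` and `count_chunks` are made up for illustration (they are not taken from sysmain.dll or from the dumper).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from an unaligned byte pointer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Walk the CHUNK records that follow the 8-byte container header
 * (magic + total output size) and return how many there are.
 * Returns -1 if the buffer is too short or a chunk runs past its end.
 * Assumption: each chunk starts with a 32-bit size field.
 */
static int count_chunks(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 8)
        return -1;                  /* no room for the two header dwords */

    size_t pos = 8;                 /* skip magic and total output size */
    int chunks = 0;
    while (pos < len) {
        if (len - pos < 4)
            return -1;              /* truncated size field */
        uint32_t size = read_le32(buf + pos);
        pos += 4;
        if (size > len - pos)
            return -1;              /* chunk data runs past end of buffer */
        pos += size;
        chunks++;
    }
    return chunks;
}
```

A file that ends exactly on a chunk boundary gives a non-negative count; anything else is reported as malformed.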
Two different compression algorithms are used, distinguished by the magic value mentioned earlier:
- 0x4F4D454D (“MEMO”) – LZNT1 compression, the standard compression available through the RtlDecompressBuffer() function with the CompressionFormat argument set to COMPRESSION_FORMAT_LZNT1 (http://msdn.microsoft.com/en-us/Library/ff552191(v=VS.85).aspx). For LZNT1, the size field of the CHUNK structure is a 16-bit value (WORD), and all chunks are decompressed at once by the RtlDecompressBuffer function (see the attached source code).
- 0x304D454D (“MEM0”) – Xpress compression, the same compression that is used in WIM files (http://www.coderforlife.com/wim-compression/). Some open source implementations are available, but I wrote my own based on the description from MSDN (http://msdn.microsoft.com/en-us/library/dd644740(v=PROT.13).aspx). Starting with Windows 8, this decompression is also available through the RtlDecompressBuffer API with the CompressionFormat argument set to COMPRESSION_FORMAT_XPRESS or COMPRESSION_FORMAT_XPRESS_HUFF.
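The dispatch on the magic value can be sketched as below. This only illustrates the two cases listed above; the enum and function names are hypothetical, and the sketch deliberately returns "unknown" for anything else (for example an uncompressed file).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PF_MAGIC_MEMO 0x4F4D454Du   /* "MEMO": LZNT1  */
#define PF_MAGIC_MEM0 0x304D454Du   /* "MEM0": Xpress */

/* Hypothetical names, used only for this illustration. */
enum pf_compression { PF_COMP_UNKNOWN = 0, PF_COMP_LZNT1, PF_COMP_XPRESS };

/* Classify a file by its first four bytes (little-endian DWORD). */
static enum pf_compression pf_compression_from_magic(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4)
        return PF_COMP_UNKNOWN;

    uint32_t magic = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                     ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    switch (magic) {
    case PF_MAGIC_MEMO: return PF_COMP_LZNT1;
    case PF_MAGIC_MEM0: return PF_COMP_XPRESS;
    default:            return PF_COMP_UNKNOWN;
    }
}

/* Width in bytes of the CHUNK size field: WORD for LZNT1, DWORD for Xpress. */
static size_t pf_chunk_size_width(enum pf_compression c)
{
    switch (c) {
    case PF_COMP_LZNT1:  return 2;
    case PF_COMP_XPRESS: return 4;
    default:             return 0;
    }
}
```

Keeping the size-field width next to the classification makes it harder to forget that the two formats differ in more than the algorithm.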
Decompression routines from Windows 7:
; int RtlDecompressBufferProcs[]
dd 0
dd 0
dd offset _RtlDecompressBufferLZNT1@20
dd offset _RtlDecompressBufferNS@20
dd offset _RtlDecompressBufferNS@20
dd offset _RtlDecompressBufferNS@20
dd offset _RtlDecompressBufferNS@20
dd offset _RtlDecompressBufferNS@20
Decompression routines from Windows 8:
; int RtlDecompressBufferProcs[]
dd 0
dd 0
dd offset _RtlDecompressBufferLZNT1@24
dd offset _RtlDecompressBufferXpressLz@24
dd offset _RtlDecompressBufferXpressHuff@24
dd offset _RtlDecompressBufferNS@24
dd offset _RtlDecompressBufferNS@24
dd offset _RtlDecompressBufferNS@24
I gathered some .db files from Windows 7 x86 and x64 editions, and all of them appear to be compressed with Xpress. Files from Vista x86 are compressed with LZNT1.
PROPER STRUCTURE
After decompression, the structure of the file can easily be analysed in any hex editor. The previously mentioned AgRobust.db has the same structure; the only difference is that it is not compressed. A quick look shows a header at the beginning, followed by file paths with some additional binary data in the rest of the file. The file header can be described by the structure below:
struct PfFileHeader
{
    DWORD magic;                  // = 0xE; magic value
    DWORD fileSize;
    DWORD headerSize;             // align this value to 8 after read
    DWORD fileType;               // index to PfDbDatabaseParamsForFileType table
    PfFileParams fileParams;      // 9 dwords
    DWORD volumesCounter;         // number of volumes in file
    DWORD totalEntriesInVolumes;  // ??
    // rest of the header is unknown at this moment
};

struct PfFileParams
{
    DWORD sizes[9];
};
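Reading the known part of the header from a decompressed buffer could look like the sketch below. It assumes the fields are packed little-endian dwords in exactly the order listed above (so the known part is 15 dwords, 60 bytes), and it applies the "align to 8" rule to headerSize while reading. The type and function names are hypothetical and are not the ones used in the dumper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PF_HEADER_KNOWN_BYTES 60u   /* 4 + 9 + 2 dwords */

/* Hypothetical in-memory view of the known header fields. */
typedef struct {
    uint32_t magic;
    uint32_t file_size;
    uint32_t header_size;           /* already rounded up to a multiple of 8 */
    uint32_t file_type;
    uint32_t params[9];
    uint32_t volumes_counter;
    uint32_t total_entries_in_volumes;
} pf_header_view;

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Round up to the next multiple of 8 (values that are already aligned stay as-is). */
static uint32_t align8(uint32_t v)
{
    return (v + 7u) & ~(uint32_t)7u;
}

/* Returns 0 on success, -1 if the buffer is too short or the magic is not 0xE. */
static int pf_read_header(const uint8_t *buf, size_t len, pf_header_view *out)
{
    assert(buf != NULL && out != NULL);
    if (len < PF_HEADER_KNOWN_BYTES)
        return -1;

    out->magic = rd32(buf + 0);
    if (out->magic != 0xE)
        return -1;
    out->file_size   = rd32(buf + 4);
    out->header_size = align8(rd32(buf + 8));
    out->file_type   = rd32(buf + 12);
    for (int i = 0; i < 9; i++)
        out->params[i] = rd32(buf + 16 + 4 * i);
    out->volumes_counter          = rd32(buf + 52);
    out->total_entries_in_volumes = rd32(buf + 56);
    return 0;
}
```

Reading byte by byte avoids depending on the host's endianness or on struct padding rules.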
The fileType field is an index into the PfDbDatabaseParamsForFileType table located in sysmain.dll (dump from Windows 7 x86):
;PfFileParams PfDbDatabaseParamsForFileType[]
00: PfFileParams < 38h, 24h, 3Ch, 8, 8, 8, 8, 0, 0>
01: PfFileParams < 38h, 34h, 44h, 10h, 14h, 8, 8, 0, 0>
02: PfFileParams < 38h, 2Ch, 44h, 10h, 8, 8, 8, 0, 0>
03: PfFileParams < 38h, 24h, 3Ch, 8, 8, 14h, 8, 0, 0>
XX: PfFileParams 6 dup(<0, 0, 0, 0, 0, 0, 0, 0, 0>)
0A: PfFileParams < 38h, 24h, 3Ch, 8, 8, 0Ch, 8, 0, 0>
0B: PfFileParams < 38h, 24h, 3Ch, 10h, 10h, 10h, 10h, 0, 0>
0C: PfFileParams < 38h, 24h, 3Ch, 0Ch, 8, 8, 8, 0, 0>
0D: PfFileParams <0, 0, 0, 0, 0, 0, 0, 0, 0>
0E: PfFileParams < 38h, 48h, 64h, 8, 8, 8, 8, 0, 0>
0F: PfFileParams < 40h, 28h, 3Ch, 8, 8, 14h, 8, 0, 0>
10: PfFileParams < 38h, 2Ch, 68h, 10h, 18h, 14h, 1Ch, 0, 0>
11: PfFileParams <0, 0, 0, 0, 0, 0, 0, 0, 0>
12: PfFileParams < 48h, 2Ch, 3Ch, 8, 8, 8, 8, 0, 0>
The fileParams field is a table of nine dwords; each dword describes the size of a different structure used by the current file. What is the purpose of such a table? The only reason that comes to my mind is to differentiate structure versions and types. Sample output of a dumped header looks like this:
magic          : 0000000E
file size      : 0008B944
header size    : 000000F0
file type      : 0000000B
volumes counter: 00000001
unknown        : 0000016F
param 00: 00000038
param 01: 00000024
param 02: 0000003C
param 03: 00000010
param 04: 00000010
param 05: 00000010
param 06: 00000010
param 07: 00000000
param 08: 00000000
As you may notice, the file type is 0x0B, and if you compare PfDbDatabaseParamsForFileType[0x0B] with the dumped fileParams, you will see that they are equal.
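That comparison is easy to automate as a consistency check on a parsed header. The sketch below transcribes the Windows 7 x86 table from this post; whether other Windows versions use the same values is not something I have checked, and the identifiers `pf_params_win7_x86` and `pf_params_match` are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PF_FILE_TYPE_COUNT 0x13

/* Values transcribed from the Windows 7 x86 dump; unlisted slots stay zero. */
static const uint32_t pf_params_win7_x86[PF_FILE_TYPE_COUNT][9] = {
    [0x00] = {0x38, 0x24, 0x3C, 0x08, 0x08, 0x08, 0x08, 0, 0},
    [0x01] = {0x38, 0x34, 0x44, 0x10, 0x14, 0x08, 0x08, 0, 0},
    [0x02] = {0x38, 0x2C, 0x44, 0x10, 0x08, 0x08, 0x08, 0, 0},
    [0x03] = {0x38, 0x24, 0x3C, 0x08, 0x08, 0x14, 0x08, 0, 0},
    [0x0A] = {0x38, 0x24, 0x3C, 0x08, 0x08, 0x0C, 0x08, 0, 0},
    [0x0B] = {0x38, 0x24, 0x3C, 0x10, 0x10, 0x10, 0x10, 0, 0},
    [0x0C] = {0x38, 0x24, 0x3C, 0x0C, 0x08, 0x08, 0x08, 0, 0},
    [0x0E] = {0x38, 0x48, 0x64, 0x08, 0x08, 0x08, 0x08, 0, 0},
    [0x0F] = {0x40, 0x28, 0x3C, 0x08, 0x08, 0x14, 0x08, 0, 0},
    [0x10] = {0x38, 0x2C, 0x68, 0x10, 0x18, 0x14, 0x1C, 0, 0},
    [0x12] = {0x48, 0x2C, 0x3C, 0x08, 0x08, 0x08, 0x08, 0, 0},
};

/*
 * Return 1 if the nine dwords read from a file header equal the table
 * entry for file_type, 0 otherwise (including out-of-range types and
 * the all-zero, apparently unused, table slots).
 */
static int pf_params_match(uint32_t file_type, const uint32_t params[9])
{
    assert(params != NULL);
    if (file_type >= PF_FILE_TYPE_COUNT)
        return 0;
    const uint32_t *expected = pf_params_win7_x86[file_type];
    if (expected[0] == 0)
        return 0;
    return memcmp(expected, params, 9 * sizeof(uint32_t)) == 0;
}
```

A mismatch would suggest either a different structure version or a parsing mistake, so it is a cheap check to run before walking the rest of the file.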
The structure that follows the main header is basically a 3-level tree. At the first level there is a volume description:
Volume: (BC1D1716) (00000017) \DEVICE\HARDDISKVOLUME1
Volume ID: XXXX-XXXX
Timestamp: 2011-07-02, 01:40:26 (328)
019EA7B0 019EA7B0 00000C79 00020000 ........ 00000000 045BF262 01CC3859 XXXXXXXX 00000000 ........ 00010017 ........ 00000000
I don’t know the meaning of all the values, but some of them appear to be addresses (?!?). For the exact field names you can check the attached source code (search for the PfVolumeHeader_38 or PfVolumeHeader_48 structures). Known fields are:
- Volume ID
- Timestamp
- Number of file entries (on the second level of the tree)
- Length of volume name
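As a worked example for the timestamp field: the two dwords 045BF262 01CC3859 in the sample volume dump, read as the low and high halves of a 64-bit Windows FILETIME (100 ns ticks since 1601-01-01), give 2011-07-02 01:40:26 and 328 ms, which matches the timestamp printed in the dump. That the field is a FILETIME is my reading of this one sample, not something confirmed elsewhere in this post, and the function names below are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FILETIME_TICKS_PER_SEC   10000000ULL     /* 100 ns ticks per second */
#define FILETIME_TICKS_PER_MS    10000ULL
#define FILETIME_UNIX_EPOCH_DIFF 11644473600ULL  /* seconds from 1601-01-01 to 1970-01-01 */

/* Combine the two dwords as they appear in the dump (low dword first). */
static uint64_t filetime_from_dwords(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

/*
 * Convert a FILETIME to Unix seconds plus a millisecond remainder.
 * Returns 0 on success, -1 for values before the Unix epoch.
 */
static int filetime_to_unix(uint64_t ft, int64_t *secs, uint32_t *millis)
{
    assert(secs != NULL && millis != NULL);
    uint64_t total_secs = ft / FILETIME_TICKS_PER_SEC;
    if (total_secs < FILETIME_UNIX_EPOCH_DIFF)
        return -1;
    *secs   = (int64_t)(total_secs - FILETIME_UNIX_EPOCH_DIFF);
    *millis = (uint32_t)((ft % FILETIME_TICKS_PER_SEC) / FILETIME_TICKS_PER_MS);
    return 0;
}
```

For the sample values this yields Unix time 1309570826 with 328 ms, i.e. 2011-07-02 01:40:26.328 UTC.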
At the second level there are file descriptors. The third level describes some chunks of each file (probably related to the memory mapping of each part of the file), but I have no idea what the exact meaning of those values is:
File: (4D5FA4DE) (0000001D) \WINDOWS\SYSTEM32\BCDPROV.DLL
019BA611 4D5FA4DE 00000006 000000880269D350 00000000 00800000 00000074 019BA608
035D4DA0 0000D100 00000060 00000000
035D4D90 00000500 00000060 00000000
035D4D70 0000B500 00000060 00000000
03AEBCE1 00001500 00000060 00000000
035D4D80 0000C700 00000060 00000000
03AEBCE1 0000CB00 00000060 00000000
File records are described by structures called PfRecordHeader (in the source code: PfRecordHeader_24, PfRecordHeader_34, PfRecordHeader_40, PfRecordHeader_48, PfRecordHeader_58, PfRecordHeader_70). Known fields are:
- 32-bit hash of filename (implementation of the hash can be found in source code, function hashStr())
- Number of chunks (on the third level of tree)
- Length of filename
There are also some nuances regarding structure alignment inside decompressed .db files, but anyone can check them in the sources (of course!).
END
The above specification doesn’t exhaust the SuperFetch topic. I don’t know if I’ll continue this research (probably not), so if anyone is interested, here are the sources of my SuperFetch dumper:
http://code.google.com/p/rewolf-superfetch-dumper/
Sources are published under the GNU GPL v3 license. Enjoy!
Nice finds. Data reversing is fun. ;)
Thanks
Hi, I ran the program and found that it does not work on Windows 10.
Magic signature: 0x844D414D (MAM). I think Windows 10 compresses SuperFetch files with the Xpress Huffman algorithm (same as Prefetch files). Can you update your program to work on Windows 10?
Thanks in advance!
Hi, I’ll probably look into this issue at some point, but I can’t promise anything.