Windows SuperFetch file format – partial specification

According to ForensicsWiki (http://www.forensicswiki.org/wiki/SuperFetch):

SuperFetch is a performance enhancement introduced in Microsoft Windows Vista to reduce the time necessary to launch applications (…)
Data for SuperFetch is gathered by the %SystemRoot%\System32\Sysmain.dll, part of the Service Host process, %SystemRoot%\System32\Svchost.exe, and stored in a series of files in the %SystemRoot%\Prefetch directory. These files appear to start with the prefix Ag and have a .db extension. The format of these files is not known…

When I read the above statement I just couldn’t resist, and I decided to take up the challenge. Below you can read what I’ve found; as a bonus, I’ve also prepared a simple dumper for SuperFetch .db files (linked at the end of this post).

COMPRESSED CONTAINER

As stated on ForensicsWiki, the SuperFetch mechanism is handled by sysmain.dll, so it is a good place to start the research. Most of the Ag*.db files start with the magic value 0x304D454D (“MEM0”) (at least on Windows 7). I say “most” because there are two files that seem to have a different format:

  • AgRobust.db: this file is described later
  • AgAppLaunch.db: I didn’t analyse this file (but it shouldn’t be hard)

Searching sysmain.dll for the magic value reveals only two places where it is used:

  • PfSvCompressBuffer()
  • PfSvDecompressBuffer()

I decided to take a look at PfSvDecompressBuffer, as a decompression routine is probably the more convenient source of information, especially when the goal is to decode a given file. Analysis of this function gave me general information about the initial file structure:

offset      type   size    description
0           DWORD  4       magic value: 0x304D454D (“MEM0”) or 0x4F4D454D (“MEMO”)
4           DWORD  4       total output size (after decompression)
8           CHUNK  var_1   compressed chunk of data
8 + var_1   CHUNK  var_2   compressed chunk of data
…           CHUNK  var_n   compressed chunk of data
EOF

CHUNK is defined as follows:

	struct CHUNK
	{
		DWORD size;         //size of the compressed data
		BYTE data[];        //'size' bytes of compressed data (flexible array member)
	};

The two compression algorithms are distinguished by the previously mentioned magic value:

  • 0x4F4D454D (“MEMO”): LZNT1 compression, the standard compression available through the RtlDecompressBuffer() function with the CompressionFormat argument set to COMPRESSION_FORMAT_LZNT1 (http://msdn.microsoft.com/en-us/Library/ff552191(v=VS.85).aspx). For the LZNT1 algorithm the size field of the CHUNK structure is a 16-bit value (WORD), and all chunks are decompressed at once by RtlDecompressBuffer (see the attached source code).
  • 0x304D454D (“MEM0”): Xpress compression, the same compression that is used in WIM files (http://www.coderforlife.com/wim-compression/). There are some open source implementations available, but I wrote my own based on the description from MSDN (http://msdn.microsoft.com/en-us/library/dd644740(v=PROT.13).aspx). Starting with Windows 8, this decompression will also be available through the RtlDecompressBuffer API with the CompressionFormat argument set to COMPRESSION_FORMAT_XPRESS or COMPRESSION_FORMAT_XPRESS_HUFF.

Decompression routines from Windows 7:

; int RtlDecompressBufferProcs[]
	dd 0
	dd 0
	dd offset _RtlDecompressBufferLZNT1@20
	dd offset _RtlDecompressBufferNS@20
	dd offset _RtlDecompressBufferNS@20
	dd offset _RtlDecompressBufferNS@20
	dd offset _RtlDecompressBufferNS@20
	dd offset _RtlDecompressBufferNS@20

Decompression routines from Windows 8:

; int RtlDecompressBufferProcs[]
	dd 0
	dd 0
	dd offset _RtlDecompressBufferLZNT1@24
	dd offset _RtlDecompressBufferXpressLz@24
	dd offset _RtlDecompressBufferXpressHuff@24
	dd offset _RtlDecompressBufferNS@24
	dd offset _RtlDecompressBufferNS@24
	dd offset _RtlDecompressBufferNS@24

I gathered some .db files from Windows 7 x86 and x64 editions, and it appears that all of them are compressed with Xpress. Files from Vista x86 are packed with LZNT1.

PROPER STRUCTURE

After decompression, the structure of the file can easily be analysed in any hex editor. The previously mentioned AgRobust.db has the same structure; the only difference is that it is not compressed. A quick look shows that there is a header at the beginning, followed by file paths with some additional binary data in the rest of the file. The file header can be described by the structures below:

	struct PfFileParams
	{
		DWORD sizes[9];                // sizes of the structures used by this file type
	};

	struct PfFileHeader
	{
		DWORD magic;                   // = 0xE; magic value
		DWORD fileSize;
		DWORD headerSize;              // align this value to 8 after reading
		DWORD fileType;                // index into the PfDbDatabaseParamsForFileType table
		struct PfFileParams fileParams; // 9 DWORDs
		DWORD volumesCounter;          // number of volumes in the file
		DWORD totalEntriesInVolumes;   // ??
		//the rest of the header is unknown at this moment
	};

The fileType field is an index into the PfDbDatabaseParamsForFileType table located in sysmain.dll (dump from Windows 7 x86):

;PfFileParams PfDbDatabaseParamsForFileType[]

00: PfFileParams < 38h,  24h,  3Ch,    8,    8,    8,    8, 0, 0>
01: PfFileParams < 38h,  34h,  44h,  10h,  14h,    8,    8, 0, 0>
02: PfFileParams < 38h,  2Ch,  44h,  10h,    8,    8,    8, 0, 0>
03: PfFileParams < 38h,  24h,  3Ch,    8,    8,  14h,    8, 0, 0>
04-09: PfFileParams 6 dup(<0, 0, 0, 0, 0, 0, 0, 0, 0>)
0A: PfFileParams < 38h,  24h,  3Ch,    8,    8,  0Ch,    8, 0, 0>
0B: PfFileParams < 38h,  24h,  3Ch,  10h,  10h,  10h,  10h, 0, 0>
0C: PfFileParams < 38h,  24h,  3Ch,  0Ch,    8,    8,    8, 0, 0>
0D: PfFileParams <0, 0, 0, 0, 0, 0, 0, 0, 0>
0E: PfFileParams < 38h,  48h,  64h,    8,    8,    8,    8, 0, 0>
0F: PfFileParams < 40h,  28h,  3Ch,    8,    8,  14h,    8, 0, 0>
10: PfFileParams < 38h,  2Ch,  68h,  10h,  18h,  14h,  1Ch, 0, 0>
11: PfFileParams <0, 0, 0, 0, 0, 0, 0, 0, 0>
12: PfFileParams < 48h,  2Ch,  3Ch,    8,    8,    8,    8, 0, 0>

The fileParams field is a table of nine DWORDs; each DWORD describes the size of a different structure used by the current file. What is the purpose of such a table? The only reason that comes to my mind is to differentiate the structure version and type. Sample output of a dumped header looks like this:

magic          : 0000000E
file size      : 0008B944
header size    : 000000F0
file type      : 0000000B
volumes counter: 00000001
unknown        : 0000016F
	param 00: 00000038
	param 01: 00000024
	param 02: 0000003C
	param 03: 00000010
	param 04: 00000010
	param 05: 00000010
	param 06: 00000010
	param 07: 00000000
	param 08: 00000000

As you may notice, the file type is 0x0B, and if you compare PfDbDatabaseParamsForFileType[0x0B] with the dumped fileParams, you will see that they are equal.

The structure that follows the main header is basically a 3-level tree. At the first level there is a volume description:

Volume: (BC1D1716) (00000017) \DEVICE\HARDDISKVOLUME1
Volume ID: XXXX-XXXX
Timestamp: 2011-07-02, 01:40:26 (328)

019EA7B0 019EA7B0 00000C79 00020000 ........ 00000000 045BF262
01CC3859 XXXXXXXX 00000000 ........ 00010017 ........ 00000000

I don’t know the meaning of all the values, but some of them look like addresses (?!?). For the exact field names, check the attached source code (search for the PfVolumeHeader_38 or PfVolumeHeader_48 structures). Known fields are:

  • Volume ID
  • Timestamp
  • Number of file entries (on the second level of the tree)
  • Length of the volume name

At the second level there are file descriptors, and the third level describes some chunks of each file (probably related to the memory mapping of each part of the file), but I have no idea what the exact meaning of those values is:

File: (4D5FA4DE) (0000001D) \WINDOWS\SYSTEM32\BCDPROV.DLL
019BA611 4D5FA4DE 00000006 000000880269D350 00000000 00800000 00000074 019BA608

	035D4DA0 0000D100 00000060 00000000
	035D4D90 00000500 00000060 00000000
	035D4D70 0000B500 00000060 00000000
	03AEBCE1 00001500 00000060 00000000
	035D4D80 0000C700 00000060 00000000
	03AEBCE1 0000CB00 00000060 00000000

File records are described by structures called PfRecordHeader (in the source code: PfRecordHeader_24, PfRecordHeader_34, PfRecordHeader_40, PfRecordHeader_48, PfRecordHeader_58, PfRecordHeader_70). Known fields are:

  • 32-bit hash of the filename (the implementation of the hash can be found in the source code, function hashStr())
  • Number of chunks (on the third level of the tree)
  • Length of the filename

There are also some nuances regarding structure alignment inside the decompressed .db files, but everyone can check them in the sources (of course!).

END

The specification above doesn’t cover the SuperFetch topic completely, and I don’t know if I’ll continue this research (probably not), so if anyone is interested, here are the sources of my SuperFetch dumper:

http://code.google.com/p/rewolf-superfetch-dumper/

Sources are published under GNU GPL v3 license. Enjoy!

Comments (2)

  1. omeg, October 5, 2011, 10:08

    Nice finds. Data reversing is fun. ;)

  2. André, October 5, 2011, 18:25

    Thanks
