Debugging the ring 3 part of the PE/PE+ loader

Someone may ask what the purpose of debugging the PE loader is. Here are a few reasons:

  • checking why an executable is not loaded properly (imports, TLS, other initialization-related issues)
  • looking for hidden features (e.g. LdrpCheckNXCompatibility)
  • plain curiosity

Of course, debugging the ring 3 part of the PE/PE+ loader reveals only part of the truth. The other part (or rather the first part, strictly speaking) is the MiCreateImageFileMap function inside ntoskrnl. Its source code can be found in the Windows Research Kernel (\base\ntos\mm\creasect.c); that code is a bit old, but most of it hasn't changed much. In this short article I'll cover only the x86 and x64 variants of the ring 3 part.

The ring 3 entry point for a new process (and also for every new thread) is located in NTDLL and exported as LdrInitializeThunk. More information about this callback can be found on Skywing's blog: http://www.nynaeve.net/?p=205. That post inspired me to think about another method of debugging process initialization. It was a few years ago, and I came up with a very simple idea (a flawed one, as it turned out later when I got back to this project). The initial concept looked like this:

  • Create the process with dwCreationFlags set to CREATE_SUSPENDED
  • Allocate one temporary page in the new process (VirtualAllocEx)
  • Inject a small shellcode that polls the PEB.BeingDebugged field in a loop; once a debugger is detected, the loop ends and int3 is executed
  • Redirect LdrInitializeThunk to the shellcode
  • Resume the process
  • Attach your favourite debugger
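
The post doesn't show how LdrInitializeThunk is redirected, but the restore code at the end of the shellcodes later in this post rewrites 6 bytes on x86 and 12 bytes on x64. One encoding that fits exactly those sizes is push imm32 / ret on x86 and mov rax, imm64 / jmp rax on x64. The C sketch below only builds such stub bytes; the encodings and the function names are my assumptions, not necessarily what LdrDebug does. In a real injector the original bytes would first be saved with ReadProcessMemory (and baked into the shellcode's restore instructions), and the stub would then be written over LdrInitializeThunk with WriteProcessMemory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: build a jump stub that sends LdrInitializeThunk to
   the shellcode. The encodings are an assumption chosen to match the 6- and
   12-byte restore sequences in the shellcodes. */

/* x86: push imm32 ; ret  (68 id C3) -> 6 bytes */
static size_t build_redirect_x86(uint8_t out[6], uint32_t shellcode)
{
    assert(out != NULL);
    out[0] = 0x68;                                    /* push imm32 */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(shellcode >> (8 * i)); /* little-endian */
    out[5] = 0xC3;                                    /* ret */
    return 6;
}

/* x64: mov rax, imm64 ; jmp rax  (48 B8 iq FF E0) -> 12 bytes */
static size_t build_redirect_x64(uint8_t out[12], uint64_t shellcode)
{
    assert(out != NULL);
    out[0] = 0x48;                                    /* REX.W */
    out[1] = 0xB8;                                    /* mov rax, imm64 */
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(shellcode >> (8 * i)); /* little-endian */
    out[10] = 0xFF;                                   /* jmp rax */
    out[11] = 0xE0;
    return 12;
}
```

Clobbering eax/rax in the stub is harmless here, because the shellcode itself uses that register freely before jumping back to the restored LdrInitializeThunk.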

I used this scenario and it was sufficient at the time, but it sometimes failed. Recently I got back to it and finally found the reason: there is a race condition, because during debugger attachment the system creates an additional thread that should execute DbgBreakPoint. So in my case, after the application was resumed, one of the threads reached my shellcode while the second one was waiting; when I hit 'step over' instead of 'step into', the second thread sometimes performed the process initialization first, leaving me with an already initialized application. Here is the new version of the x86 shellcode:

	BITS 32
_begin:
	jmp	_skip
	push	0
	push	0
	mov	eax, 12345678h                  ; NtTerminateThread
	call	eax
_skip:
	call	$+5
	pop	eax
	mov	word [eax - ($ - _begin - 1)], 9090h
 
	mov	eax, [fs:18h]                   ; TEB
	mov	eax, [eax + 30h]                ; PEB
_loop:	pause
	cmp	byte [eax + 2], 0               ; PEB.BeingDebugged
	je	_loop
	int3
 
 
	mov	eax, 12345678h                  ; LdrInitializeThunk
	mov	dword [eax], 12345678h          ; restore original
	mov	word [eax + 4], 1234h           ; code
	jmp	eax
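
The magic numbers used to reach PEB.BeingDebugged (fs:[18h], +30h and +2 here; gs:[30h], +60h and +2 in the x64 version) follow from the leading fields of NT_TIB/TEB and PEB. The sketch below models just those prefixes, with fixed-width integers standing in for pointers so that both layouts can be checked on any host. The struct names are made up for illustration, and only the fields the shellcode needs are included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading fields of NT_TIB followed by the start of TEB, parameterised by
   pointer width (uint32_t for x86, uint64_t for x64). Illustrative only. */
#define DEFINE_TEB_PREFIX(name, ptr_t)                                    \
    struct name {                                                        \
        ptr_t ExceptionList, StackBase, StackLimit, SubSystemTib;        \
        ptr_t FiberData, ArbitraryUserPointer;                           \
        ptr_t Self;                    /* last NT_TIB field */           \
        ptr_t EnvironmentPointer;                                        \
        ptr_t ClientId[2];             /* UniqueProcess, UniqueThread */ \
        ptr_t ActiveRpcHandle, ThreadLocalStoragePointer;                \
        ptr_t ProcessEnvironmentBlock;                                   \
    }

DEFINE_TEB_PREFIX(teb_prefix32, uint32_t);
DEFINE_TEB_PREFIX(teb_prefix64, uint64_t);

/* First bytes of PEB; identical on x86 and x64. */
struct peb_prefix {
    uint8_t InheritedAddressSpace;
    uint8_t ReadImageFileExecOptions;
    uint8_t BeingDebugged;
    uint8_t BitField;
};

/* The constants hard-coded in the shellcodes. */
static_assert(offsetof(struct teb_prefix32, Self) == 0x18, "x86: fs:[18h]");
static_assert(offsetof(struct teb_prefix32, ProcessEnvironmentBlock) == 0x30, "x86: TEB+30h");
static_assert(offsetof(struct teb_prefix64, Self) == 0x30, "x64: gs:[30h]");
static_assert(offsetof(struct teb_prefix64, ProcessEnvironmentBlock) == 0x60, "x64: TEB+60h");
static_assert(offsetof(struct peb_prefix, BeingDebugged) == 2, "PEB+2");
```

On x64 the PEB could also be read directly from gs:[60h]; going through the TEB self pointer simply mirrors the x86 code.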

And the x64 version:

	BITS 64
	default rel 
 
_begin:
	jmp	_skip
	xor	rcx, rcx
	xor	rdx, rdx
	mov	rax, 1234567890abcdefh          ; NtTerminateThread
	call	rax
_skip:
	mov	word [_begin], 9090h
 
	mov	rax, [gs:30h]                   ; TEB
	mov	rax, [rax + 60h]                ; PEB
_loop:	pause
	cmp	byte [rax + 2], 0               ; PEB.BeingDebugged
	je	_loop
	int3
 
 
	mov	rax, 1234567890abcdefh          ; LdrInitializeThunk
	mov	dword [rax], 12345678h          ;\
	mov	dword [rax + 4], 12345678h      ;| restore original code
	mov	dword [rax + 8], 12345678h      ;/
	jmp	rax

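The code above takes care of the second thread created during debugger attachment: before entering the loop, the first thread overwrites the first two bytes of the shellcode (jmp _skip) with NOPs, so the second thread falls straight through to NtTerminateThread. The x86 version has to locate _begin with the call $+5 / pop eax trick because 32-bit code has no RIP-relative addressing, while the x64 version can simply rely on default rel.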
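
The ordering argument can be modelled in a few lines of C. This is a single-threaded simplification: the real threads run inside the target process, and the model only tracks the first two bytes of the shellcode. The function name run_shellcode_prologue is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum path { PATH_WAIT_FOR_DEBUGGER, PATH_TERMINATE_THREAD };

/* Model of the first two bytes of the shellcode: EB xx (jmp short _skip)
   before the first thread arrives, 90 90 afterwards. */
static enum path run_shellcode_prologue(uint8_t prologue[2])
{
    assert(prologue[0] == 0xEB ||
           (prologue[0] == 0x90 && prologue[1] == 0x90));
    if (prologue[0] == 0xEB) {
        /* jmp _skip taken: patch the jump out (mov word [_begin], 9090h) */
        prologue[0] = 0x90;
        prologue[1] = 0x90;
        return PATH_WAIT_FOR_DEBUGGER;   /* spin on PEB.BeingDebugged, then int3 */
    }
    /* NOPs execute and fall into push 0 / push 0 / call NtTerminateThread */
    return PATH_TERMINATE_THREAD;
}
```

The first call corresponds to the thread that starts the process: it removes jmp _skip and goes on to wait for the debugger. Every later call, such as the break-in thread created by the debugger, sees the NOPs and ends in NtTerminateThread.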

To make life easier I've created a small application called LdrDebug that uses the above method. It detects the format of the executable (PE or PE+), injects the proper version of the shellcode and prints the PID of the created process:

e:\...\LdrDebug\Release>LdrDebug.exe notepad64.exe
Creating process: notepad64.exe
Arguments       : (null)
Type            : x64
PID             : 6216 (00001848)

e:\...\LdrDebug\Release>LdrDebug.exe notepad.exe
Creating process: notepad.exe
Arguments       : (null)
Type            : x86
PID             : 6988 (00001B4C)

e:\...\LdrDebug\Release>LdrDebug.exe /x64 notepad.exe
Creating process: notepad.exe
Arguments       : (null)
Type            : x86
PID             : 4240 (00001090)
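
LdrDebug's source isn't reproduced here, so the following is only a sketch of how the PE/PE+ check could be done: read e_lfanew from the DOS header, verify the "PE\0\0" signature and look at OptionalHeader.Magic (0x10B for PE32, 0x20B for PE32+). Checking FileHeader.Machine would be an alternative. The use_x64_shellcode helper reflects my reading of the three runs above, where '/x64' makes an x86 image get the x64 shellcode for its 64-bit NTDLL while the Type line still reports x86; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum image_type { IMAGE_INVALID, IMAGE_PE32, IMAGE_PE32_PLUS };

/* Little-endian readers for the on-disk header fields. */
static uint32_t rd16(const uint8_t *p) { return (uint32_t)p[0] | ((uint32_t)p[1] << 8); }
static uint32_t rd32(const uint8_t *p) { return rd16(p) | (rd16(p + 2) << 16); }

/* Hypothetical: classify a file image already read into memory. */
static enum image_type detect_image_type(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')  /* IMAGE_DOS_HEADER.e_magic */
        return IMAGE_INVALID;
    uint32_t nt = rd32(buf + 0x3C);                     /* e_lfanew */
    /* need "PE\0\0" (4) + IMAGE_FILE_HEADER (20) + OptionalHeader.Magic (2) */
    if (nt > len || len - nt < 4 + 20 + 2)
        return IMAGE_INVALID;
    if (rd32(buf + nt) != 0x00004550u)                  /* "PE\0\0" */
        return IMAGE_INVALID;
    switch (rd16(buf + nt + 4 + 20)) {
    case 0x10B: return IMAGE_PE32;       /* IMAGE_NT_OPTIONAL_HDR32_MAGIC */
    case 0x20B: return IMAGE_PE32_PLUS;  /* IMAGE_NT_OPTIONAL_HDR64_MAGIC */
    default:    return IMAGE_INVALID;
    }
}

/* Hypothetical: PE+ images always get the x64 shellcode; x86 images get it
   only with '/x64' (targeting the 64-bit NTDLL of the WOW64 process). */
static int use_x64_shellcode(enum image_type t, int x64_switch)
{
    return t == IMAGE_PE32_PLUS || (t == IMAGE_PE32 && x64_switch);
}
```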

There is an additional switch, '/x64', that can be used to debug the x64 part of an x86 process running under the WOW64 subsystem. The application was tested on Windows 7, so I can't guarantee that it will work on any other system. It might not work on Windows 8, as it uses the wow64ext library and I have had some reports that this library does not work on that system.

Link to binary package: http://rewolf-ldrdebug.googlecode.com/files/rewolf.ldrdebug.zip
Link to google code page: http://code.google.com/p/rewolf-ldrdebug/

Enjoy!

4 Comments

    1. Yup, that should do the trick too, but mine works with every debugger :) Anyway, from the debugger's point of view it's even easier, because setting int3 on LdrInitializeThunk would be the ultimate solution (I saw that PEBrowse x64 has such a feature).

