{"id":463,"date":"2013-01-31T20:34:14","date_gmt":"2013-01-31T19:34:14","guid":{"rendered":"http:\/\/blog.rewolf.pl\/blog\/?p=463"},"modified":"2013-06-28T16:55:19","modified_gmt":"2013-06-28T14:55:19","slug":"debugging-ring-3-part-of-pepe-loader","status":"publish","type":"post","link":"http:\/\/blog.rewolf.pl\/blog\/?p=463","title":{"rendered":"Debugging ring 3 part of PE\/PE+ loader"},"content":{"rendered":"<p style=\"text-align: justify;\">Someone may ask what the purpose of debugging the <strong>PE<\/strong> loader is. Here are a few reasons:<\/p>\n<ul>\n<li>checking why an executable is not loaded properly (imports, <strong>TLS<\/strong>, other initialization-related issues)<\/li>\n<li>looking for some hidden features (e.g. <strong>LdrpCheckNXCompatibility<\/strong>)<\/li>\n<li>plain curiosity<\/li>\n<\/ul>\n<p style=\"text-align: justify;\">Of course, debugging the ring 3 part of the <strong>PE<\/strong>\/<strong>PE+<\/strong> loader can reveal only part of the truth. The second part (or rather the first part, to be strict) is the <strong>MiCreateImageFileMap<\/strong> function inside <strong>ntoskrnl<\/strong>. Its source code can be found in the <strong>Windows Research Kernel<\/strong> (\\base\\ntos\\mm\\creasect.c); it is a bit old, but most of the stuff hasn&#8217;t changed much. In this short article I&#8217;ll cover only the <strong>x86<\/strong> and <strong>x64<\/strong> variants of the ring 3 part.<\/p>\n<p><!--more--><\/p>\n<p style=\"text-align: justify;\">The ring 3 entry point for a new process (and also for a new thread) is located in <strong>NTDLL<\/strong> and exported as <strong>LdrInitializeThunk<\/strong>. More information about this callback can be found on <strong>Skywing&#8217;s blog<\/strong>: <a href=\"http:\/\/www.nynaeve.net\/?p=205\" title=\"Skywing's blog\" target=\"_blank\">http:\/\/www.nynaeve.net\/?p=205<\/a>. Basically, that post inspired me to think about another method of debugging process initialization. 
It was a few years ago, and I came up with a very simple idea (flawed, as it turned out later when I got back to this project). The initial concept looked like this:<\/p>\n<ul>\n<li>Create the process with <strong>dwCreationFlags<\/strong> set to <strong>CREATE_SUSPENDED<\/strong><\/li>\n<li>Allocate one temporary page in the new process (<strong>VirtualAllocEx<\/strong>)<\/li>\n<li>Inject a small shellcode that checks the <strong>PEB.BeingDebugged<\/strong> field in a loop; once a debugger is detected, the loop ends and <strong>int3<\/strong> is executed<\/li>\n<li>Redirect <strong>LdrInitializeThunk<\/strong> to the shellcode<\/li>\n<li>Resume the process<\/li>\n<li>Attach your favourite debugger<\/li>\n<\/ul>\n<p style=\"text-align: justify;\">I was using this scenario and it was sufficient at that time; however, it sometimes failed. Recently I got back to it and finally found the reason: a race condition. During debugger attachment, the system creates an additional thread that is supposed to call <strong>DbgBreakPoint<\/strong>. So in my case, after resuming the application, one of the threads reached my shellcode while the second one was waiting, and when I hit &#8216;step over&#8217; instead of &#8216;step into&#8217;, the second thread in some cases took over the initialization process first, leaving me with an already initialized application. 
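<\/p>\n<p style=\"text-align: justify;\">The list above does not say how <strong>LdrInitializeThunk<\/strong> is redirected. The shellcode shown below restores 6 bytes on <strong>x86<\/strong> and 12 bytes on <strong>x64<\/strong>, which would fit a <strong>push imm32; ret<\/strong> stub and a <strong>mov rax, imm64; jmp rax<\/strong> stub respectively. That choice of stub, and the helper name <strong>build_redirect<\/strong>, are assumptions made for illustration, not taken from <strong>LdrDebug<\/strong>. Here is a minimal C sketch of building such a patch:<\/p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of LdrDebug): writes a stub that jumps
 * to target into out and returns its length in bytes.
 * ASSUMPTION: x86 uses push imm32 / ret and x64 uses
 * mov rax, imm64 / jmp rax; only the 6- and 12-byte sizes come from
 * the restore code in the shellcode. out must hold at least 12 bytes. */
static size_t build_redirect(uint8_t *out, uint64_t target, int is_x64)
{
    size_t n = 0;
    assert(out != NULL);
    if (is_x64) {
        out[n++] = 0x48;                      /* REX.W prefix          */
        out[n++] = 0xB8;                      /* mov rax, imm64        */
        for (int i = 0; i < 8; i++)           /* imm64, little-endian  */
            out[n++] = (uint8_t)(target >> (8 * i));
        out[n++] = 0xFF;                      /* jmp rax (FF /4)       */
        out[n++] = 0xE0;
    } else {
        out[n++] = 0x68;                      /* push imm32            */
        for (int i = 0; i < 4; i++)           /* imm32, little-endian  */
            out[n++] = (uint8_t)(target >> (8 * i));
        out[n++] = 0xC3;                      /* ret                   */
    }
    return n;
}
```

<p style=\"text-align: justify;\">Before such a stub is written into the suspended process, the bytes it overwrites presumably have to be saved first (e.g. with <strong>ReadProcessMemory<\/strong>), since the restore part of the shellcode needs them in place of its placeholder immediates.<\/p>\n<p style=\"text-align: justify;\">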
Here is the new version of the <strong>x86<\/strong> shellcode:<\/p>\n<pre lang=\"asm\">\tBITS 32\r\n_begin:\r\n\tjmp\t_skip\r\n\tpush\t0\r\n\tpush\t0\r\n\tmov\teax, 12345678h                  ; NtTerminateThread\r\n\tcall\teax\r\n_skip:\r\n\tcall\t$+5\r\n\tpop\teax\r\n\tmov\tword [eax - ($ - _begin - 1)], 9090h  ; NOP out jmp _skip\r\n\r\n\tmov\teax, [fs:18h]                   ; TEB\r\n\tmov\teax, [eax + 30h]                ; PEB\r\n_loop:\tpause\r\n\tcmp\tbyte [eax + 2], 0               ; PEB.BeingDebugged\r\n\tje\t_loop\r\n\tint3\r\n\r\n\r\n\tmov\teax, 12345678h                  ; LdrInitializeThunk\r\n\tmov\tdword [eax], 12345678h          ; restore original\r\n\tmov\tword [eax + 4], 1234h           ; code\r\n\tjmp\teax\r\n<\/pre>\n<p style=\"text-align: justify;\">And the <strong>x64<\/strong> version:<\/p>\n<pre lang=\"asm\">\tBITS 64\r\n\tdefault rel\r\n\r\n_begin:\r\n\tjmp\t_skip\r\n\txor\trcx, rcx\r\n\txor\trdx, rdx\r\n\tmov\trax, 1234567890abcdefh          ; NtTerminateThread\r\n\tcall\trax\r\n_skip:\r\n\tmov\tword [_begin], 9090h            ; NOP out jmp _skip\r\n\r\n\tmov\trax, [gs:30h]                   ; TEB\r\n\tmov\trax, [rax + 60h]                ; PEB\r\n_loop:\tpause\r\n\tcmp\tbyte [rax + 2], 0               ; PEB.BeingDebugged\r\n\tje\t_loop\r\n\tint3\r\n\r\n\r\n\tmov\trax, 1234567890abcdefh          ; LdrInitializeThunk\r\n\tmov\tdword [rax], 12345678h          ;\\\r\n\tmov\tdword [rax + 4], 12345678h      ;| restore original code\r\n\tmov\tdword [rax + 8], 12345678h      ;\/\r\n\tjmp\trax\r\n<\/pre>\n<p style=\"text-align: justify;\">The above code takes care of the second thread created during debugger attachment: before entering the loop, it overwrites the first two bytes of the shellcode (jmp _skip) with <strong>NOPs<\/strong>, so the second thread goes directly to <strong>NtTerminateThread<\/strong>.<\/p>\n<p style=\"text-align: justify;\">To make life easier, I&#8217;ve created a small application called <strong>LdrDebug<\/strong> that uses the above method. 
It detects the format of the executable (<strong>PE<\/strong> or <strong>PE+<\/strong>), injects the proper version of the shellcode and prints the <strong>PID<\/strong> of the created process:<\/p>\n<pre>e:\\...\\LdrDebug\\Release>LdrDebug.exe notepad64.exe\r\nCreating process: notepad64.exe\r\nArguments       : (null)\r\nType            : x64\r\nPID             : 6216 (00001848)\r\n\r\ne:\\...\\LdrDebug\\Release>LdrDebug.exe notepad.exe\r\nCreating process: notepad.exe\r\nArguments       : (null)\r\nType            : x86\r\nPID             : 6988 (00001B4C)\r\n\r\ne:\\...\\LdrDebug\\Release>LdrDebug.exe \/x64 notepad.exe\r\nCreating process: notepad.exe\r\nArguments       : (null)\r\nType            : x86\r\nPID             : 4240 (00001090)\r\n<\/pre>\n<p style=\"text-align: justify;\">There is an additional switch, &#8216;\/x64&#8217;, that can be used to debug the <strong>x64<\/strong> part of an <strong>x86<\/strong> process running under the <strong>WOW64<\/strong> subsystem. The application was tested on <strong>Windows 7<\/strong>, so I can&#8217;t guarantee that it will work on any other system. It might not work on <strong>Windows 8<\/strong>, as it uses the <strong>wow64ext<\/strong> library, and I have had some reports that this library does not work on that system.<\/p>\n<p>Link to binary package: <a href=\"http:\/\/rewolf-ldrdebug.googlecode.com\/files\/rewolf.ldrdebug.zip\" title=\"Binary package\" target=\"_blank\">http:\/\/rewolf-ldrdebug.googlecode.com\/files\/rewolf.ldrdebug.zip<\/a><br \/>\nLink to Google Code page: <a href=\"http:\/\/code.google.com\/p\/rewolf-ldrdebug\/\" title=\"Google code\" target=\"_blank\">http:\/\/code.google.com\/p\/rewolf-ldrdebug\/<\/a><\/p>\n<p>Enjoy!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Someone may ask what is the purpose of debugging PE loader, here are a few reasons: checking why executable is not loaded properly (imports, TLS, other initialization related issues) looking for some hidden features (e.g. 
LdrpCheckNXCompatibility) plain curiosity Of course debugging ring 3 part of PE\/PE+ loader can reveal only part of the truth, for [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[12,10,3,5,16,11],"tags":[],"_links":{"self":[{"href":"http:\/\/blog.rewolf.pl\/blog\/index.php?rest_route=\/wp\/v2\/posts\/463"}],"collection":[{"href":"http:\/\/blog.rewolf.pl\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/blog.rewolf.pl\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/blog.rewolf.pl\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/blog.rewolf.pl\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=463"}],"version-history":[{"count":36,"href":"http:\/\/blog.rewolf.pl\/blog\/index.php?rest_route=\/wp\/v2\/posts\/463\/revisions"}],"predecessor-version":[{"id":667,"href":"http:\/\/blog.rewolf.pl\/blog\/index.php?rest_route=\/wp\/v2\/posts\/463\/revisions\/667"}],"wp:attachment":[{"href":"http:\/\/blog.rewolf.pl\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=463"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/blog.rewolf.pl\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=463"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/blog.rewolf.pl\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=463"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}