Software Packers - Reverse Engineering Protection and AV Evasion

What even is a packer

Once upon a time, we didn’t have high-speed network connections, which meant downloading software could take a long time. This led to the creation of so-called “software packers”: programs that take an executable and compress it into a self-extracting executable.

Probably the most famous one, UPX, is open source and available on GitHub.

Nowadays packers are mostly used for a different purpose, namely making code harder to reverse engineer statically. They will often encrypt the code as well as, or instead of, compressing it. Naturally, they are popular for malware, as they can help it evade static detection mechanisms. A packer that encrypts code rather than compressing it is often called a “crypter”.

Commercial protectors will also optionally use packing to further protect the code.

Components of a packer

Packers have two components:

Packer

This is the software that receives an executable as input and performs some operation on it, such as compression or encryption. It then takes the “packed” form of the original executable, joins it with the stub (explained next) and creates the final self-unpacking executable.

Stub

This is pre-written code that will be joined with the compressed/encrypted payload to create the final packed executable.

The stub can do several things:

  • Perform the unpacking process
  • Anti debugging functionality
  • Run the payload via unpacking to disk or (more common for malware) performing process injection

For packers that are intended to prevent malware detection, there are even services that will provide custom stubs on a subscription or per stub basis. This is because if a stub is in common use by malware, it will be detected sooner rather than later.

Designing a packer

No better way to learn than by doing. So today we will write our very own software packer. We won’t do anything too complex but you will need C knowledge and some Win32 API knowledge.

For writing the stub in particular, we don’t want to have any external dependencies as this needs to be as small as possible.

Here is how it will be designed:

  • Only Windows API and MSVCRT will be used to keep size small
  • The packer will receive an executable and compress it using the LZMS algorithm
  • The payload will be embedded into a copy of the stub in a new PE section called “.packed”
  • The payload will be decompressed and run using a manual PE loading technique (more on this in a bit)

A packer can be designed in many different ways; I chose this design because it is simple while remaining interesting.

Writing a stub

Decompressing executable stored in .packed section

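This is the simplest part: we walk the executable’s own section headers to find the one named “.packed” (in the code below, simply the last section, since that is where the packer adds it), copy its contents into a buffer we can modify, and decompress the data.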

For simplicity’s sake I will omit the “CompressionHelper” class I wrote but it’s available on the repository.

	HANDLE base = GetModuleHandle(nullptr);
	auto dosHeader = (PIMAGE_DOS_HEADER)base;
	auto ntHeaders = (PIMAGE_NT_HEADERS)((char*)dosHeader + dosHeader->e_lfanew);
	auto sections = (PIMAGE_SECTION_HEADER)(ntHeaders + 1);
	auto lastSection = sections[ntHeaders->FileHeader.NumberOfSections - 1];
	auto sectionSize = lastSection.SizeOfRawData;
	auto sectionPointer = (char*)base + lastSection.VirtualAddress;
	char* data = new char[sectionSize];
	::RtlCopyMemory(data, sectionPointer, sectionSize);
	CompressionHelper compression(COMPRESS_ALGORITHM_LZMS);
	if (compression.DecompressData((BYTE*)data, sectionSize))
	{
		Loader loader((char*)compression.DecompressedBuffer);
		loader.Prepare();
		loader.Run();
	}

None of this should be new if you’ve investigated the PE format before. The section containing our packed executable is the last one that the packer added to a copy of the stub.
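The stub above simply takes the last section. As a rough sketch of the alternative, looking the section up by name, the lookup could look like the following. SectionInfo and FindSection are hypothetical names introduced for illustration; SectionInfo stands in for the fields of IMAGE_SECTION_HEADER that matter here, so the example builds without Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for IMAGE_SECTION_HEADER: only the fields used here.
struct SectionInfo {
    char          name[8];        // not null-terminated when all 8 bytes are used
    std::uint32_t virtualAddress; // RVA of the section once mapped
    std::uint32_t sizeOfRawData;  // number of bytes stored in the file
};

// Returns the first section whose 8-byte name field matches `wanted`, or nullptr.
const SectionInfo* FindSection(const SectionInfo* sections, std::size_t count, const char* wanted)
{
    const std::size_t length = std::strlen(wanted);
    assert(length <= 8);                 // section names are at most 8 bytes
    char padded[8] = {};                 // short names are padded with zero bytes
    std::memcpy(padded, wanted, length);
    for (std::size_t i = 0; i < count; ++i)
    {
        if (std::memcmp(sections[i].name, padded, sizeof(padded)) == 0)
            return &sections[i];
    }
    return nullptr;
}
```

Comparing all eight bytes, rather than calling strcmp, avoids reading past the name field when a section name uses every byte.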

We then pass that data to our loader which is the most complex part.

PE loading

The way an executable looks on disk and in memory is different. This means you can’t simply copy an executable into a buffer and then jump to its entry point.

To make an executable ready to run, the Windows PE loader performs several tasks. But since we want to run the executable ourselves without it ever touching the disk, we need to replicate what the PE loader normally does, entirely in memory.

The bare minimum we need to do to load an executable entirely from memory is:

  • Initialize the memory where we will load the executable
  • Copy the PE sections into memory
  • Resolve the program’s imports
  • Perform relocations

Let’s walk through each of those parts. I’ve removed some of the code such as error handling for simplicity, but it can be found on GitHub.

Initializing memory

void Loader::InitMemory()
{
	DosHeader = (PIMAGE_DOS_HEADER)Data;

	NtHeaders = (PIMAGE_NT_HEADERS)(Data + DosHeader->e_lfanew);

	Base = (char*)::VirtualAlloc(nullptr, NtHeaders->OptionalHeader.SizeOfImage, MEM_RESERVE | MEM_COMMIT,
		PAGE_READWRITE);

	OptionalHeader = NtHeaders->OptionalHeader;

	::RtlCopyMemory(Base, Data, OptionalHeader.SizeOfHeaders);

	Sections = (PIMAGE_SECTION_HEADER)(NtHeaders + 1);
}

We look at SizeOfImage to know how much memory to allocate. The PE format documentation describes it as:

“The size (in bytes) of the image, including all headers, as the image is loaded in memory. It must be a multiple of SectionAlignment.”

We start by copying the first SizeOfHeaders bytes of the decompressed data, which the documentation describes as:

“The combined size of an MS-DOS stub, PE header, and section headers rounded up to a multiple of FileAlignment.”

We also set some member variables for the Loader class so we can reference them easily.
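InitMemory trusts that the decompressed buffer really is a PE image. Part of the error handling left out of this article could be a signature check along these lines. This is only a sketch: LooksLikePe and ReadAt are hypothetical helpers, not taken from the repository, and the code assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a value of type T from `data` at `offset` (little-endian host assumed).
template <typename T>
T ReadAt(const unsigned char* data, std::size_t offset)
{
    T value;
    std::memcpy(&value, data + offset, sizeof(T));
    return value;
}

// Hypothetical sanity check: does the buffer start with a DOS header whose
// e_lfanew field points at a "PE\0\0" signature inside the buffer?
bool LooksLikePe(const unsigned char* data, std::size_t size)
{
    constexpr std::size_t kDosHeaderSize = 64;   // sizeof(IMAGE_DOS_HEADER)
    constexpr std::size_t kLfanewOffset = 0x3C;  // offset of e_lfanew
    if (size < kDosHeaderSize)
        return false;
    if (ReadAt<std::uint16_t>(data, 0) != 0x5A4D)            // "MZ"
        return false;
    const auto lfanew = ReadAt<std::uint32_t>(data, kLfanewOffset);
    if (lfanew > size || size - lfanew < 4)
        return false;
    return ReadAt<std::uint32_t>(data, lfanew) == 0x00004550; // "PE\0\0"
}
```

A real loader would go on to check the machine type and the optional header magic before touching any other field.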

Copying sections

void Loader::CopySections()
{
	for (auto i = 0; i < NtHeaders->FileHeader.NumberOfSections; i++)
	{
		char* address = GetVA(Sections[i].VirtualAddress);
		::RtlCopyMemory(address, Data + Sections[i].PointerToRawData, Sections[i].SizeOfRawData);
		const DWORD newProtect = GetMemoryProtectionForSection(Sections[i].Characteristics);
		DWORD oldProtect;
		VirtualProtect(address, Sections[i].SizeOfRawData, newProtect, &oldProtect);
	}
	DataDirectory = OptionalHeader.DataDirectory;
	ImportDescriptor = (PIMAGE_IMPORT_DESCRIPTOR)(Base + DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress);
}

In a PE file, many fields hold a “relative virtual address” (RVA): an offset from the address at which the image is loaded in memory. I am using the utility function GetVA to convert an RVA into an address in our allocated memory.
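To make the difference between the two layouts concrete, here is a small sketch of both conversions. RvaToVa shows what a helper like GetVA presumably does (its real definition is in the repository, so treat this as an assumption), and RvaToFileOffset shows the extra lookup needed when the same RVA has to be found in the file on disk. SectionRange is a hypothetical subset of IMAGE_SECTION_HEADER.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical subset of IMAGE_SECTION_HEADER.
struct SectionRange {
    std::uint32_t virtualAddress;   // RVA where the section is mapped
    std::uint32_t virtualSize;      // size of the section in memory
    std::uint32_t pointerToRawData; // offset of the section's data in the file
};

// In memory, an RVA is just an offset from the base of the loaded image.
inline char* RvaToVa(char* base, std::uint32_t rva)
{
    return base + rva;
}

// On disk, the RVA must first be matched to the section that contains it.
// Returns -1 if no section covers the RVA.
long long RvaToFileOffset(const SectionRange* sections, std::size_t count, std::uint32_t rva)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        const SectionRange& s = sections[i];
        if (rva >= s.virtualAddress && rva - s.virtualAddress < s.virtualSize)
            return static_cast<long long>(s.pointerToRawData) + (rva - s.virtualAddress);
    }
    return -1;
}
```

Note that an RVA inside a section’s zero-filled tail (beyond SizeOfRawData) has no bytes on disk; this sketch does not check for that case.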

We iterate through every section in the decompressed data, copy the section’s data into our newly allocated memory and then set the appropriate memory protection for that section. The memory protection can be derived from the section’s Characteristics field.
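GetMemoryProtectionForSection is not shown in this article. One plausible way to write such a mapping is sketched below; ProtectionFor is a hypothetical name, and the constants repeat the values from winnt.h so the example builds without Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Section characteristic flags (values as defined in winnt.h).
constexpr std::uint32_t kScnMemExecute = 0x20000000; // IMAGE_SCN_MEM_EXECUTE
constexpr std::uint32_t kScnMemRead    = 0x40000000; // IMAGE_SCN_MEM_READ
constexpr std::uint32_t kScnMemWrite   = 0x80000000; // IMAGE_SCN_MEM_WRITE

// Page protection constants (values as defined in winnt.h).
constexpr std::uint32_t kPageNoAccess         = 0x01; // PAGE_NOACCESS
constexpr std::uint32_t kPageReadOnly         = 0x02; // PAGE_READONLY
constexpr std::uint32_t kPageReadWrite        = 0x04; // PAGE_READWRITE
constexpr std::uint32_t kPageExecute          = 0x10; // PAGE_EXECUTE
constexpr std::uint32_t kPageExecuteRead      = 0x20; // PAGE_EXECUTE_READ
constexpr std::uint32_t kPageExecuteReadWrite = 0x40; // PAGE_EXECUTE_READWRITE

// Maps the execute/read/write bits of Characteristics to a page protection.
// Windows has no write-only protection, so "write" always implies read-write.
std::uint32_t ProtectionFor(std::uint32_t characteristics)
{
    const bool execute = (characteristics & kScnMemExecute) != 0;
    const bool read    = (characteristics & kScnMemRead) != 0;
    const bool write   = (characteristics & kScnMemWrite) != 0;

    if (execute)
        return write ? kPageExecuteReadWrite : (read ? kPageExecuteRead : kPageExecute);
    return write ? kPageReadWrite : (read ? kPageReadOnly : kPageNoAccess);
}
```

For a typical .text section (execute and read) this yields PAGE_EXECUTE_READ, and for .data (read and write) it yields PAGE_READWRITE.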

We set some more member variables for later usage by our loader. All of these fields will either be explained later or are explained in the MSDN page I put in references at the end of the article.

Resolving imports

void Loader::ResolveImports()
{
	auto importDescriptor = ImportDescriptor;
	while (importDescriptor->OriginalFirstThunk)
	{
		const auto libraryName = Base + importDescriptor->Name;

		auto library = ::LoadLibraryA(libraryName);

		auto originalThunk = (PIMAGE_THUNK_DATA)(GetVA(importDescriptor->OriginalFirstThunk));
		auto thunk = (PIMAGE_THUNK_DATA)(GetVA(importDescriptor->FirstThunk));
		while (originalThunk->u1.AddressOfData)
		{
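			DWORD_PTR resolvedFunction;

			const auto address = originalThunk->u1.AddressOfData;

			// The high bit marks an import by ordinal; the ordinal itself is in the low 16 bits.
			if (IMAGE_SNAP_BY_ORDINAL(originalThunk->u1.Ordinal))
			{
				resolvedFunction = (DWORD_PTR)::GetProcAddress(library, (LPCSTR)IMAGE_ORDINAL(address));
			}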
			else
			{
				auto importName = (PIMAGE_IMPORT_BY_NAME)(GetVA(address));
				resolvedFunction = (DWORD_PTR)::GetProcAddress(library, importName->Name);
			}
			
			DWORD oldProtect;
			::VirtualProtect(&thunk->u1.Function, sizeof(char*), PAGE_READWRITE, &oldProtect);
			thunk->u1.Function = resolvedFunction;
			::VirtualProtect(&thunk->u1.Function, sizeof(char*), oldProtect, &oldProtect);
			originalThunk++;
			thunk++;
		}
		importDescriptor++;
	}
}

Almost every executable makes use of functions located in DLLs. These functions are called by means of an Import Address Table (IAT). This table is filled in by the PE loader when the program starts. Since we are loading the executable manually, we have to do that ourselves.

Resolving imports is not a very simple process, and I am skipping some of the edge cases of import loading in this code.

The import directory is an array of these structures:

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD   Characteristics;            // 0 for terminating null import descriptor
        DWORD   OriginalFirstThunk;         // RVA to original unbound IAT (PIMAGE_THUNK_DATA)
    } DUMMYUNIONNAME;
    DWORD   TimeDateStamp;                  // 0 if not bound,
                                            // -1 if bound, and real date\time stamp
                                            //     in IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT (new BIND)
                                            // O.W. date/time stamp of DLL bound to (Old BIND)

    DWORD   ForwarderChain;                 // -1 if no forwarders
    DWORD   Name;
    DWORD   FirstThunk;                     // RVA to IAT (if bound this IAT has actual addresses)
} IMAGE_IMPORT_DESCRIPTOR;

Name is an RVA to the library’s name. We can use the Windows API function LoadLibrary to retrieve a handle to that library.

Each IMAGE_IMPORT_DESCRIPTOR refers to two parallel arrays of IMAGE_THUNK_DATA, one through OriginalFirstThunk and one through FirstThunk.

FirstThunk points to our IAT, which will be filled with function pointers. OriginalFirstThunk points to an Import Lookup Table, whose entries in turn point into a hint/name table.

typedef struct _IMAGE_THUNK_DATA32 {
    union {
        DWORD ForwarderString;      // PBYTE 
        DWORD Function;             // PDWORD
        DWORD Ordinal;
        DWORD AddressOfData;        // PIMAGE_IMPORT_BY_NAME
    } u1;
} IMAGE_THUNK_DATA32;
typedef IMAGE_THUNK_DATA32 * PIMAGE_THUNK_DATA32;

We check the u1 union for the IMAGE_ORDINAL_FLAG bit. This tells us whether the function is imported by name or by ordinal. GetProcAddress can be used to retrieve the address of each function, which we then write into the IAT.

For an import by ordinal, the ordinal is stored in the lookup table entry itself; for an import by name, the entry points to a hint/name table entry containing the function’s name. All functions for a module have been resolved when AddressOfData is 0, and we are done with the imports when OriginalFirstThunk is 0.
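As an illustration of how a single 32-bit lookup table entry is interpreted, here is a small decoding sketch. ImportRef and DecodeThunk32 are hypothetical names used only for this example; the flag value is that of IMAGE_ORDINAL_FLAG32 in winnt.h.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kOrdinalFlag32 = 0x80000000; // IMAGE_ORDINAL_FLAG32

// Hypothetical decoded form of one 32-bit import lookup table entry.
struct ImportRef {
    bool          byOrdinal;
    std::uint16_t ordinal;  // meaningful only when byOrdinal is true
    std::uint32_t nameRva;  // RVA of an IMAGE_IMPORT_BY_NAME when byOrdinal is false
};

// Decodes a non-zero entry; a zero entry terminates the table and is not decoded.
ImportRef DecodeThunk32(std::uint32_t thunk)
{
    assert(thunk != 0);
    if (thunk & kOrdinalFlag32)
        return { true, static_cast<std::uint16_t>(thunk & 0xFFFF), 0 };
    return { false, 0, thunk }; // with the flag clear, the entry is the name RVA
}
```

The same idea applies to 64-bit images, except that entries are 64 bits wide and the flag is the top bit of the 64-bit value.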

Relocations

void Loader::DoRelocations()
{
	auto relocationTable = (PIMAGE_BASE_RELOCATION)GetVA(DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress);
	auto delta = (DWORD_PTR)Base - OptionalHeader.ImageBase;

	while (relocationTable->VirtualAddress)
	{
		auto size = (relocationTable->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION)) / 2;
		auto entry = (WORD*)(relocationTable + 1);
		while (size--)
		{
			int type = *entry >> 12;
			int offset = *entry & 0x0fff;
			const auto relocationPtr = (DWORD_PTR*)(Base + relocationTable->VirtualAddress + offset);
			if (type == IMAGE_REL_BASED_HIGHLOW)
			{
				DWORD oldProtect;
				::VirtualProtect(relocationPtr, sizeof(char*), PAGE_READWRITE, &oldProtect);
				*relocationPtr += delta;
				::VirtualProtect(relocationPtr, sizeof(char*), oldProtect, &oldProtect);
			}
			entry++;
		}
		relocationTable = (IMAGE_BASE_RELOCATION*)((DWORD_PTR)relocationTable + relocationTable->SizeOfBlock);
	}
}

PE files usually contain a .reloc section that is used to adjust absolute addresses when the module is loaded at a base address other than its preferred one.

Here is what it looks like:

typedef struct _IMAGE_BASE_RELOCATION {
    DWORD   VirtualAddress;
    DWORD   SizeOfBlock;
//  WORD    TypeOffset[1];
} IMAGE_BASE_RELOCATION;
typedef IMAGE_BASE_RELOCATION UNALIGNED * PIMAGE_BASE_RELOCATION;

To each relocated address we add the delta between the module’s actual base address and its ImageBase. This fixes up the addresses that were computed with ImageBase in mind. Note that the code above only handles IMAGE_REL_BASED_HIGHLOW entries, the 32-bit fixup type. The PE format documentation describes ImageBase as:

“The preferred address of the first byte of image when loaded into memory; must be a multiple of 64 K. The default for DLLs is 0x10000000. The default for Windows CE EXEs is 0x00010000. The default for Windows NT, Windows 2000, Windows XP, Windows 95, Windows 98, and Windows Me is 0x00400000.”
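To see the fixup arithmetic in isolation, the sketch below applies the entries of one relocation block to a plain byte buffer. ApplyBlock32 is a hypothetical helper, not the loader’s code; the type constants carry the values of IMAGE_REL_BASED_ABSOLUTE and IMAGE_REL_BASED_HIGHLOW from winnt.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr int kRelBasedAbsolute = 0; // IMAGE_REL_BASED_ABSOLUTE: padding entry, skipped
constexpr int kRelBasedHighLow  = 3; // IMAGE_REL_BASED_HIGHLOW: 32-bit fixup

// Applies one relocation block to `image`. `pageRva` is the block's
// VirtualAddress; each entry packs a 4-bit type and a 12-bit page offset.
void ApplyBlock32(unsigned char* image, std::uint32_t pageRva,
                  const std::uint16_t* entries, std::size_t count, std::uint32_t delta)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        const int type = entries[i] >> 12;
        const std::uint32_t offset = entries[i] & 0x0FFF;
        if (type != kRelBasedHighLow)
            continue; // kRelBasedAbsolute and unsupported types are ignored here

        std::uint32_t value;
        std::memcpy(&value, image + pageRva + offset, sizeof(value));
        value += delta; // actual base minus preferred ImageBase
        std::memcpy(image + pageRva + offset, &value, sizeof(value));
    }
}
```

For example, an image built for 0x00400000 but loaded at 0x00600000 has a delta of 0x00200000, so a stored pointer 0x00401000 becomes 0x00601000.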

Running the payload

void Loader::Run()
{
	void* entry = GetVA(OptionalHeader.AddressOfEntryPoint);
	HANDLE hThread = ::CreateThread(nullptr, 0, (LPTHREAD_START_ROUTINE)entry, nullptr, 0, nullptr);
	if (hThread)
	{
		::WaitForSingleObject(hThread, INFINITE);
	}
}

We can read the executable’s entry point at AddressOfEntryPoint and use CreateThread with that. That is really all we need now that we have manually loaded it into memory.

We wait for that thread to finish since we don’t want our stub to exit before the payload finishes executing.

Writing the packer

originalFileData = new char[originalFileSize];

::ReadFile(hOriginalFile, originalFileData, originalFileSize, &bytesRead, nullptr);

CompressionHelper Compressor{ COMPRESS_ALGORITHM_LZMS };
Compressor.CompressData((BYTE*)originalFileData, originalFileSize);

Not much to see here: we read the executable we want to pack and then compress it using our helper class.

const auto newSectionSize = Compressor.CompressedDataSize;
const auto packedFileSize = stubFileSize + newSectionSize;

packedFileData = new char[packedFileSize];

::RtlCopyMemory(packedFileData, stubFileData, stubFileSize);

const auto dosHeader = (PIMAGE_DOS_HEADER)packedFileData;

const auto ntHeaders = (PIMAGE_NT_HEADERS)(packedFileData + dosHeader->e_lfanew);
const auto sections = (PIMAGE_SECTION_HEADER)(ntHeaders + 1);

const auto lastSection = &sections[ntHeaders->FileHeader.NumberOfSections - 1];
const auto newSection = &sections[ntHeaders->FileHeader.NumberOfSections];

const auto fileAlignment = ntHeaders->OptionalHeader.FileAlignment;
const auto sectionAlignment = ntHeaders->OptionalHeader.SectionAlignment;

::RtlZeroMemory(newSection, sizeof(IMAGE_SECTION_HEADER));

::RtlCopyMemory(newSection->Name, ".packed", 7);

newSection->Misc.VirtualSize = align(newSectionSize, sectionAlignment, 0);
newSection->VirtualAddress = align(lastSection->Misc.VirtualSize, sectionAlignment, lastSection->VirtualAddress);
newSection->SizeOfRawData = newSectionSize;
newSection->PointerToRawData = align(lastSection->SizeOfRawData, fileAlignment, lastSection->PointerToRawData);
newSection->Characteristics = IMAGE_SCN_MEM_READ;

ntHeaders->OptionalHeader.SizeOfImage = newSection->VirtualAddress + newSection->Misc.VirtualSize;
ntHeaders->FileHeader.NumberOfSections++;

::RtlZeroMemory(packedFileData + newSection->PointerToRawData, newSection->SizeOfRawData);
::RtlCopyMemory(packedFileData + newSection->PointerToRawData, Compressor.CompressedBuffer, Compressor.CompressedDataSize);

Now that we have the original executable compressed, we want to place it in a new section of a copy of the stub. Remember, the stub finds its payload by looking for a section .packed.

If you are wondering about the alignment: the PE format requires a section’s raw data to start at a file offset that is a multiple of FileAlignment, and its virtual address to be a multiple of SectionAlignment. This is done for performance reasons.
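The three-argument align helper used above is not shown in this article. One plausible definition, written here under the hypothetical name AlignUp and purely as an assumption about its behaviour, rounds the size up to the next multiple of the alignment and adds it to a starting address:

```cpp
#include <cassert>
#include <cstdint>

// Assumed behaviour of the packer's align helper: round `size` up to a
// multiple of `alignment`, then add the result to `address`.
std::uint32_t AlignUp(std::uint32_t size, std::uint32_t alignment, std::uint32_t address)
{
    assert(alignment != 0);
    const std::uint32_t remainder = size % alignment;
    const std::uint32_t rounded = (remainder == 0) ? size : size + (alignment - remainder);
    return address + rounded;
}
```

With this definition, a last section at virtual address 0x3000 with a virtual size of 0x1234 and a SectionAlignment of 0x1000 places the new section at 0x3000 + 0x2000 = 0x5000.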

We copy the payload data to the file offset given by the new section’s PointerToRawData and increase NumberOfSections to account for the new section.

Conclusion

If you are a red teamer or a malware analyst, you will eventually have to either write packers or reverse them, so I hope this will come in useful. Be sure to check the references for more information on software packing.

In addition, this only shows a very basic packer with compression. The payload is run using manual loading, but it could use RunPE or some other technique. I also don’t have any sort of anti-analysis features, which means that if I were trying to prevent reverse engineering, an analyst could easily place a breakpoint after the data is decompressed and dump the payload to disk.

A packer used to prevent reverse engineering would instead encrypt the code and strings and have anti analysis features, but it’s left as an exercise to the reader :)

References

PE Format

Writing a PE packer – Part 2 : imports and relocations

Add a new PE section & Code inside of it

Updated 2023-05-12