What even is a packer
Once upon a time, we didn’t have high-speed network connections, which meant downloading software could take a long time. This led to the creation of so-called “software packers”: software that takes an executable and compresses it into a self-extracting executable.
Probably the most famous one, UPX, is open source and available on GitHub.
Nowadays packers are mostly used for a different purpose, namely making it harder to statically reverse engineer some code. They will often encrypt the code as well as, or instead of, compressing it. Naturally, they are popular for malware as they can help evade static detection mechanisms. When a packer encrypts code rather than compressing it, it’s often called a “crypter”.
Commercial protectors will also optionally use packing to further protect the code.
Components of a packer
Packers have two components:
Packer
This is the software that receives an executable as input and performs some operation on it, such as compression or encryption. It then takes the “packed” form of the original executable, joins it with the stub (explained next) and creates the final self-unpacking executable.
Stub
This is pre-written code that will be joined with the compressed/encrypted payload to create the final packed executable.
The stub can do several things:
- Perform the unpacking process
- Provide anti-debugging functionality
- Run the payload, either by unpacking it to disk or (more common for malware) by performing process injection
For packers that are intended to prevent malware detection, there are even services that will provide custom stubs on a subscription or per-stub basis. This is because a stub that is in common use by malware will be detected sooner rather than later.
Designing a packer
No better way to learn than by doing. So today we will write our very own software packer. We won’t do anything too complex but you will need C knowledge and some Win32 API knowledge.
For writing the stub in particular, we don’t want to have any external dependencies as this needs to be as small as possible.
Here is how it will be designed:
- Only Windows API and MSVCRT will be used to keep size small
- The packer will receive an executable and compress it using the LZMS algorithm
- The payload will be embedded into a copy of the stub in a new PE section called “.packed”
- The payload will be decompressed and run using a manual PE loading technique (more on this in a bit)
We can design a packer in many different ways; I chose to do it like this since it’s simple while remaining interesting.
Writing a stub
Decompressing executable stored in .packed section
This is the simplest part: we search the executable’s own sections to find one named “.packed”, copy its contents into a buffer we can modify, and decompress the data.
For simplicity’s sake I will omit the “CompressionHelper” class I wrote but it’s available on the repository.
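As a rough illustration of the section lookup, here is a minimal sketch in portable C. It assumes the section table has already been located through the DOS and NT headers. SectionHeader is a stand-in declared for this example that mirrors the layout of IMAGE_SECTION_HEADER from winnt.h, and find_section is a hypothetical helper name, not the code from the repository.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for IMAGE_SECTION_HEADER (winnt.h); 40 bytes per entry. */
typedef struct {
    uint8_t  Name[8];              /* zero-padded, NOT always NUL-terminated */
    uint32_t VirtualSize;          /* part of the Misc union in winnt.h */
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
    uint32_t PointerToRelocations;
    uint32_t PointerToLinenumbers;
    uint16_t NumberOfRelocations;
    uint16_t NumberOfLinenumbers;
    uint32_t Characteristics;
} SectionHeader;

/* Hypothetical helper: returns the first section with a matching name,
 * or NULL if there is none (or the name is longer than 8 bytes). */
const SectionHeader *find_section(const SectionHeader *sections,
                                  uint16_t count, const char *name)
{
    uint8_t wanted[8] = {0};       /* build the zero-padded 8-byte form */
    size_t len = strlen(name);

    if (len > sizeof wanted)
        return NULL;
    memcpy(wanted, name, len);

    for (uint16_t i = 0; i < count; i++) {
        /* Compare all 8 bytes so ".pack" does not match ".packed". */
        if (memcmp(sections[i].Name, wanted, sizeof wanted) == 0)
            return &sections[i];
    }
    return NULL;
}
```

In a stub, the VirtualAddress and VirtualSize of the returned header would then describe where the compressed payload sits in the loaded image.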
None of this should be new if you’ve investigated the PE format before. The section containing our packed executable is the last one that the packer added to a copy of the stub.
We then pass that data to our loader which is the most complex part.
PE loading
The way an executable looks on disk and in memory is different. This means you can’t simply copy an executable into a buffer and then jump to its entry point.
To make an executable ready to run, the Windows PE loader performs several tasks. But since we want to run the executable ourselves without it ever touching the disk, we will need to replicate what the PE loader normally does, entirely in memory.
The bare minimum we need to do to load an executable entirely from memory is:
- Initialize the memory where we will load the executable
- Copy the PE sections into memory
- Resolve the program’s imports
- Perform relocations
Let’s walk through each of those parts. I’ve removed some of the code such as error handling for simplicity, but it can be found on GitHub.
Initializing memory
We look at SizeOfImage to know how much memory to allocate.
The size (in bytes) of the image, including all headers, as the image is loaded in memory. It must be a multiple of SectionAlignment.
We start by copying the first SizeOfHeaders bytes of the uncompressed data into the new allocation.
The combined size of an MS-DOS stub, PE header, and section headers rounded up to a multiple of FileAlignment.
We also set some member variables for the Loader class so we can reference them easily.
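The allocation step can be sketched as follows. This is a portable illustration under stated assumptions, not the article's Loader code: calloc stands in for VirtualAlloc (which a real stub would call with MEM_COMMIT | MEM_RESERVE), alloc_image is a hypothetical name, and the two sizes are assumed to have been read from the optional header already.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reserves size_of_image zeroed bytes and copies the
 * PE headers into the start of that region. Returns NULL on bad input. */
uint8_t *alloc_image(const uint8_t *file, size_t file_size,
                     uint32_t size_of_image, uint32_t size_of_headers)
{
    /* The headers must fit both in the source file and in the new image. */
    if (size_of_image == 0 ||
        size_of_headers > size_of_image ||
        size_of_headers > file_size)
        return NULL;

    /* calloc is only a stand-in for VirtualAlloc so the sketch runs anywhere. */
    uint8_t *image = calloc(1, size_of_image);
    if (image == NULL)
        return NULL;

    memcpy(image, file, size_of_headers);
    return image;
}
```

Zero-filling matters here: any part of a section that is larger in memory than on disk is expected to start out as zeros.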
Copying sections
A PE file uses what is called a “relative virtual address” (RVA). An RVA is an offset from the address at which the image is loaded in memory. I am using the utility function GetVA to convert RVAs into addresses in our allocated memory.
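A conversion of this kind might look like the sketch below. The name get_va and its bounds check are assumptions made for illustration; the GetVA function in the repository may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RVA-to-address helper: adds the RVA to the base of our
 * allocation and rejects RVAs that fall outside the image. */
uint8_t *get_va(uint8_t *image_base, uint32_t size_of_image, uint32_t rva)
{
    if (rva >= size_of_image)
        return NULL;
    return image_base + rva;
}
```

Returning NULL for an out-of-range RVA is a design choice for this sketch; it keeps a malformed header from sending the loader outside its own allocation.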
We iterate through every section in the uncompressed data, copy the section’s data into our newly allocated memory and then set the appropriate memory protection for that section. We can derive the memory protection from the section’s Characteristics field.
We set some more member variables for later usage by our loader. All of these fields will either be explained later or are explained in the MSDN page I put in references at the end of the article.
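One way to map Characteristics to a page protection is a small lookup table, sketched below. The constant values are the ones winnt.h defines, redefined here so the example compiles without windows.h; section_protection is a hypothetical name, and the mapping for the write-only case is an assumption (Windows has no write-only protection, so it is rounded up to read/write).

```c
#include <stdint.h>

/* Section flags, values as in winnt.h. */
#define IMAGE_SCN_MEM_EXECUTE 0x20000000u
#define IMAGE_SCN_MEM_READ    0x40000000u
#define IMAGE_SCN_MEM_WRITE   0x80000000u

/* Page protection constants, values as in winnt.h. */
#define PAGE_NOACCESS          0x01u
#define PAGE_READONLY          0x02u
#define PAGE_READWRITE         0x04u
#define PAGE_EXECUTE           0x10u
#define PAGE_EXECUTE_READ      0x20u
#define PAGE_EXECUTE_READWRITE 0x40u

/* Hypothetical helper: picks a PAGE_* value from the R/W/X section flags. */
uint32_t section_protection(uint32_t characteristics)
{
    /* Index bits: 1 = read, 2 = write, 4 = execute. */
    static const uint32_t table[8] = {
        PAGE_NOACCESS,           /* none          */
        PAGE_READONLY,           /* R             */
        PAGE_READWRITE,          /* W (rounded up) */
        PAGE_READWRITE,          /* R + W         */
        PAGE_EXECUTE,            /* X             */
        PAGE_EXECUTE_READ,       /* X + R         */
        PAGE_EXECUTE_READWRITE,  /* X + W (rounded up) */
        PAGE_EXECUTE_READWRITE   /* X + R + W     */
    };
    unsigned index = 0;

    if (characteristics & IMAGE_SCN_MEM_READ)    index |= 1u;
    if (characteristics & IMAGE_SCN_MEM_WRITE)   index |= 2u;
    if (characteristics & IMAGE_SCN_MEM_EXECUTE) index |= 4u;
    return table[index];
}
```

The returned value would then be passed to VirtualProtect for the address range of the section.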
Resolving imports
Almost every executable makes use of functions located in DLLs. These functions are called by means of an Import Address Table (IAT). This table is filled in by the PE loader when the program is loaded. Since we are manually loading the executable, we will do that ourselves.
Resolving imports is not a very simple process, and I am skipping some of the edge cases of import loading in this code.
At the import directory we have this structure:
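As a sketch under stated assumptions, the descriptor layout can be written with fixed-width types as below. ImportDescriptor is a stand-in declared for this example that mirrors IMAGE_IMPORT_DESCRIPTOR from winnt.h (where OriginalFirstThunk shares a union with Characteristics), and count_import_descriptors is a hypothetical helper that follows the article's convention of stopping at an entry whose OriginalFirstThunk is 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for IMAGE_IMPORT_DESCRIPTOR (winnt.h); one entry per DLL.
 * Every address-like field is an RVA, not a pointer. */
typedef struct {
    uint32_t OriginalFirstThunk; /* RVA of the Import Lookup Table */
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;               /* RVA of the NUL-terminated DLL name */
    uint32_t FirstThunk;         /* RVA of the Import Address Table */
} ImportDescriptor;

/* Hypothetical helper: counts descriptors before the terminating entry.
 * max_entries bounds the walk in case the terminator is missing. */
size_t count_import_descriptors(const ImportDescriptor *table,
                                size_t max_entries)
{
    size_t i = 0;

    while (i < max_entries && table[i].OriginalFirstThunk != 0)
        i++;
    return i;
}
```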
Name is the RVA of the library’s name. We can make use of the Windows API LoadLibrary function to retrieve a handle to said library.
Each IMAGE_IMPORT_DESCRIPTOR entry has two fields, OriginalFirstThunk and FirstThunk, and each of them is the RVA of an array of IMAGE_THUNK_DATA entries.
FirstThunk points to our IAT, which will be filled with function pointers. OriginalFirstThunk points to an Import Lookup Table, whose entries in turn point into a hint/name table.
typedef struct _IMAGE_THUNK_DATA32 {
    union {
        DWORD ForwarderString; // PBYTE
        DWORD Function;        // PDWORD
        DWORD Ordinal;
        DWORD AddressOfData;   // PIMAGE_IMPORT_BY_NAME
    } u1;
} IMAGE_THUNK_DATA32;
typedef IMAGE_THUNK_DATA32 * PIMAGE_THUNK_DATA32;
We check the u1 union for the IMAGE_ORDINAL_FLAG. This lets us know whether to import the function by name or by ordinal. GetProcAddress can be used to retrieve the address of each function, which we then write into the IAT.
When importing by name, AddressOfData points to a hint/name table entry containing the function’s name; when importing by ordinal, the ordinal is stored in the thunk itself. All functions for a module have been loaded when AddressOfData is 0, and we are done with the imports when OriginalFirstThunk is 0.
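The name-versus-ordinal decision for a 32-bit thunk can be sketched like this. ImportRef, decode_thunk32 and count_thunks32 are hypothetical names introduced for the example; only the flag value and the thunk layout come from the PE format. The actual lookup with GetProcAddress is left out so the sketch stays portable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_ORDINAL_FLAG32 0x80000000u  /* value as in winnt.h */

/* Hypothetical result type: how one lookup-table entry names its function. */
typedef struct {
    bool     by_ordinal;
    uint16_t ordinal;   /* valid when by_ordinal is true */
    uint32_t name_rva;  /* RVA of IMAGE_IMPORT_BY_NAME when by_ordinal is false;
                           the name string starts 2 bytes in, after the hint */
} ImportRef;

/* Decodes a single 32-bit Import Lookup Table entry. */
ImportRef decode_thunk32(uint32_t thunk)
{
    ImportRef ref = { false, 0, 0 };

    if (thunk & IMAGE_ORDINAL_FLAG32) {
        ref.by_ordinal = true;
        ref.ordinal = (uint16_t)(thunk & 0xFFFFu);  /* low 16 bits */
    } else {
        ref.name_rva = thunk;
    }
    return ref;
}

/* Counts entries before the zero terminator that ends one DLL's list. */
size_t count_thunks32(const uint32_t *lookup_table, size_t max_entries)
{
    size_t i = 0;

    while (i < max_entries && lookup_table[i] != 0)
        i++;
    return i;
}
```

On Windows, an ordinal import would be resolved by passing the ordinal in the low word of the name argument of GetProcAddress, while a name import passes the string found at name_rva + 2.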
Relocations
PE files can contain a .reloc section that is used for adjusting addresses depending on the base address at which the module is actually loaded.
Here is what it looks like:
For each entry, we add the difference between the current base address of the module and the ImageBase to the address stored at that location. This fixes the addresses that were set with the ImageBase in mind.
The preferred address of the first byte of image when loaded into memory; must be a multiple of 64 K. The default for DLLs is 0x10000000. The default for Windows CE EXEs is 0x00010000. The default for Windows NT, Windows 2000, Windows XP, Windows 95, Windows 98, and Windows Me is 0x00400000.
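The fix-up loop can be sketched as follows for a 32-bit image. This is an illustration under stated assumptions rather than the article's code: apply_relocations32 is a hypothetical name, only the HIGHLOW and ABSOLUTE entry types are handled, and a little-endian host is assumed. Each block starts with an 8-byte header (page RVA, then block size in bytes) followed by 16-bit entries whose top 4 bits are the type and low 12 bits the offset within the page.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_REL_BASED_ABSOLUTE 0u  /* padding entry, nothing to patch */
#define IMAGE_REL_BASED_HIGHLOW  3u  /* patch a full 32-bit address */

/* Hypothetical helper. delta = actual base - ImageBase (mod 2^32).
 * Returns 0 on success, -1 on a malformed table or unsupported type. */
int apply_relocations32(uint8_t *image, uint32_t image_size,
                        const uint8_t *reloc, uint32_t reloc_size,
                        uint32_t delta)
{
    uint32_t pos = 0;

    while (reloc_size - pos >= 8) {
        uint32_t page_rva, block_size;

        /* memcpy avoids unaligned reads; assumes a little-endian host. */
        memcpy(&page_rva, reloc + pos, 4);
        memcpy(&block_size, reloc + pos + 4, 4);
        if (block_size < 8 || block_size > reloc_size - pos)
            return -1;

        uint32_t entries = (block_size - 8) / 2;
        for (uint32_t i = 0; i < entries; i++) {
            uint16_t entry;
            memcpy(&entry, reloc + pos + 8 + i * 2, 2);

            unsigned type = entry >> 12;
            uint32_t target = page_rva + (entry & 0x0FFFu);

            if (type == IMAGE_REL_BASED_ABSOLUTE)
                continue;
            if (type != IMAGE_REL_BASED_HIGHLOW)
                return -1;
            if (target > image_size || image_size - target < 4)
                return -1;

            uint32_t value;
            memcpy(&value, image + target, 4);
            value += delta;                    /* rebase the stored address */
            memcpy(image + target, &value, 4);
        }
        pos += block_size;
    }
    return 0;
}
```

A 64-bit image would use the DIR64 entry type and patch 8 bytes instead of 4; that case is deliberately left out here.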
Running the payload
We can read the executable’s entry point from AddressOfEntryPoint, which is an RVA, convert it to an address and pass it to CreateThread. That is really all we need now that we have manually loaded it into memory.
We wait for that thread to finish since we don’t want our stub to exit before the payload finishes executing.
Writing the packer
Not much to see here: we read the executable we want to pack and then compress it using our helper class.
Now that we have the original executable compressed, we want to place it in a new section of a copy of the stub. Remember, the stub finds its payload by looking for a section named .packed.
If you are wondering about the alignment, the PE format requires section data to be aligned: to FileAlignment on disk and to SectionAlignment in memory. This is done for performance reasons.
We copy the payload data to the end of the new section header and increase NumberOfSections to account for the new section.
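The alignment arithmetic for the new section can be sketched as below. SectionLayout and place_after are hypothetical names holding only the four fields this calculation needs, and the sketch assumes the new section goes directly after the last existing one, both on disk and in memory.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds value up to the next multiple of alignment (a power of two). */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical subset of the section header fields used for placement. */
typedef struct {
    uint32_t VirtualSize;
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
} SectionLayout;

/* Computes where a payload_size-byte section lands after the last section. */
SectionLayout place_after(const SectionLayout *last, uint32_t payload_size,
                          uint32_t section_alignment, uint32_t file_alignment)
{
    SectionLayout next;

    /* In memory: start at the next SectionAlignment boundary. */
    next.VirtualSize = payload_size;
    next.VirtualAddress =
        align_up(last->VirtualAddress + last->VirtualSize, section_alignment);

    /* On disk: start and size are both multiples of FileAlignment. */
    next.SizeOfRawData = align_up(payload_size, file_alignment);
    next.PointerToRawData =
        align_up(last->PointerToRawData + last->SizeOfRawData, file_alignment);
    return next;
}
```

After adding the section, SizeOfImage would also have to grow to align_up(next.VirtualAddress + next.VirtualSize, section_alignment) so that it still covers the whole image.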
Conclusion
If you are a red teamer or a malware analyst, you will eventually have to either write packers or reverse them, so I hope this will come in useful. Be sure to check the references for more information on software packing.
In addition, this only shows a very basic packer with compression. The payload is run using manual loading but it could use RunPE or some other technique. I also don’t have any sort of anti-analysis features, which means that if I were trying to prevent people from reverse engineering, they could easily place a breakpoint after the data is decompressed and dump the decompressed payload to disk.
A packer used to prevent reverse engineering would instead encrypt the code and strings and have anti-analysis features, but that is left as an exercise for the reader :)