Reverse Engineering Emotet

__fastcall
Jan 2, 2021

During October 2020, Greece was targeted by an Emotet malware campaign. A significant number of emails containing Emotet malware were received by almost every organization in Greece. Based on ESET’s report for October 2020, Greece was the country with the largest contribution to the Emotet botnet, at approximately 18%. What caught our attention was the infection rate and the level of credibility of the related Emotet phishing emails.

Figure 1 - Countries most targeted by Emotet during October 2020

We considered it a great opportunity to enhance our reverse engineering skills by trying to fully analyze Emotet and also to identify the operational activities of this specific campaign. In this post we try to provide sufficient information to help the reader better understand the mechanics and, most importantly, to provide a tutorial that can be used by other analysts. The findings and the assumptions are not always conclusive, so we are always happy to receive recommendations and comments. It should also be noted that most of this information is not new. The reader can find in this document several references describing the same observations; however, this is an attempt to consolidate all the information and to evaluate our findings against these reports.

Dropper

In this section the initial dropper will be briefly analyzed without going into many details. We consider the analysis of the dropper trivial, as the core payload (Emotet) can be easily extracted with the use of any online sandbox.

The initial attack vector is an email, usually originating from a seemingly trusted sender (spoofed address), containing a URL pointing to a malicious macro-enabled Word document. The macro runs a simple PowerShell script that downloads a binary and executes it via WMI:

Figure 2 — Dropper PowerShell

In our case, the downloaded binary is a Visual Basic binary (annoying from an analyst’s point of view); however, with some tools (e.g. https://www.unpac.me/) the dropped malware can be easily unpacked. The unpacked malware is the actual Emotet payload.

General Characteristics of Emotet

Before presenting Emotet operational capabilities and functions we will present some general characteristics of these campaigns.

1. Control Flow Flattening

Emotet uses an anti-analysis technique called Control Flow Flattening in order to randomize the order of code blocks. The flattened code can be viewed as a state machine, where each state represents a code block. Control Flow Flattening can be easily recognized by the numerous “switch” cases, as well as by the concentration of different flow paths. Usually, such protections are compiler based, mostly using LLVM.

Figure 3 — Control Flow Flattening of initial function
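To make the state-machine idea concrete, here is a minimal sketch in C of what flattening does to a trivial function. The function itself, the names sum_plain and sum_flattened, and the state constants are invented for illustration; none of them come from the Emotet sample.

```c
#include <assert.h>
#include <stdint.h>

/* Original, readable logic: sum the integers 1..n. */
uint32_t sum_plain(uint32_t n)
{
    uint32_t total = 0;
    for (uint32_t i = 1; i <= n; i++)
        total += i;
    return total;
}

/* The same logic after flattening: every basic block becomes a case of one
 * dispatcher switch, and the order of execution is driven only by `state`.
 * The state values are arbitrary (hypothetical) constants, which is why the
 * blocks can be laid out in any order in the binary. */
uint32_t sum_flattened(uint32_t n)
{
    uint32_t total = 0, i = 0;
    uint32_t state = 0x2F1A;              /* entry state */

    for (;;) {
        switch (state) {
        case 0x77C3:                      /* loop body */
            total += i;
            i++;
            state = 0x0B5E;
            break;
        case 0x2F1A:                      /* initialisation */
            total = 0;
            i = 1;
            state = 0x0B5E;
            break;
        case 0x9D40:                      /* exit block */
            return total;
        case 0x0B5E:                      /* loop condition */
            state = (i <= n) ? 0x77C3 : 0x9D40;
            break;
        default:                          /* unreachable in this sketch */
            assert(0 && "unknown state");
            return 0;
        }
    }
}
```

Following the code by hand, as we did, amounts to tracking which value is assigned to the state variable at the end of each case and jumping to the matching case.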

There are many articles describing the de-obfuscation process, called “Control Flow Unflattening”, like this and this, and some of them are specifically related to IDA Pro and Hex-Rays microcode. However, the estimated effort of researching and developing an unflattening tool was significantly larger than that of simply following the code blocks.

Figure 4 — Decompiled Control Flow Flattened Code

It should be noted that the protector in this specific sample applies additional techniques in order to hinder the analysis. Some examples are junk code and opaque predicates. Opaque predicates are conditions whose results are known in advance and do not depend on runtime input, therefore the control flow is predetermined. Thankfully many decompilers can discard such tricks and provide the clear code. An example of an opaque predicate:

int a = 5;
int b = 10;
if ( a * 1 == b ) /* always false, so CODE1 is never reached */
CODE1;
else
CODE2;

2. Encrypted Strings

As it usually happens with malware, Emotet encrypts all strings. In this case the encryption / decryption routine is a simple XOR based algorithm. For each encrypted block, the first pair of 4-bytes are XORed together, to produce the size of the decrypted block. Then the buffer is dynamically allocated based on that size, decrypted by the XOR loop and finally is deallocated after its use.

Figure 5- XOR String Decryption Loop
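The layout described above can be sketched in C as follows. This is a minimal sketch under stated assumptions: the text only says that the first two 4-byte values are XORed to obtain the size, so treating the first value as the XOR key for the payload, the little-endian byte order, and the names read_le32 and decrypt_block are our assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Read a 32-bit little-endian value without relying on alignment. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decrypt one block laid out as: [dword0][dword1][payload...].
 * dword0 ^ dword1 gives the payload size (from the article).
 * ASSUMPTION: dword0 also serves as the XOR key for the payload.
 * Returns a malloc'ed, NUL-terminated buffer (caller frees) or NULL. */
char *decrypt_block(const uint8_t *blob, size_t blob_len, size_t *out_len)
{
    assert(blob != NULL && out_len != NULL);
    if (blob_len < 8)
        return NULL;

    uint32_t key  = read_le32(blob);
    uint32_t size = key ^ read_le32(blob + 4);
    if (size > blob_len - 8)
        return NULL;                       /* size exceeds the blob */

    char *out = malloc((size_t)size + 1);  /* dynamically sized buffer */
    if (out == NULL)
        return NULL;

    for (uint32_t i = 0; i < size; i++) {
        /* XOR each byte with the matching byte of the 4-byte key. */
        uint8_t k = (uint8_t)(key >> (8 * (i % 4)));
        out[i] = (char)(blob[8 + i] ^ k);
    }
    out[size] = '\0';
    *out_len = size;
    return out;
}
```

Freeing the returned buffer right after use mirrors the allocate, decrypt, use, deallocate pattern described above, which is also why decrypted strings rarely show up in a memory dump.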

3. Dynamic Import Resolution

It is normal for malware to dynamically resolve its imports to avoid straightforward detection. Emotet uses a hash-like function to generate an output which is then XORed with a specific value. It should be noted that the dynamic import routine is the same for all further downloaded modules; however, each time the hash is XORed with a different value.

Figure 6 — Import Hash Function
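The lookup logic can be sketched in C as follows. The actual hash routine is only shown in the screenshot, so the rolling hash used here is a stand-in, and the names name_hash and find_export as well as any key values are hypothetical. What the sketch does show is the structure: hash every export name, XOR it with the per-binary constant, and compare against the value stored in the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash (an sdbm-style rolling hash). ASSUMPTION: illustrative only,
 * not recovered from the sample. */
uint32_t name_hash(const char *name)
{
    assert(name != NULL);
    uint32_t h = 0;
    while (*name != '\0')
        h = (uint32_t)(unsigned char)*name++ + (h << 16) + (h << 6) - h;
    return h;
}

/* Walk a list of export names and return the index of the first name whose
 * masked hash equals `wanted`, or -1 if none matches. `xor_key` is the
 * constant that differs between the core binary and each module. */
int find_export(const char *const *names, size_t count,
                uint32_t wanted, uint32_t xor_key)
{
    for (size_t i = 0; i < count; i++) {
        if ((name_hash(names[i]) ^ xor_key) == wanted)
            return (int)i;
    }
    return -1;
}
```

Because only the XOR constant changes between modules, an analyst who has recovered the hash function once can precompute the hashes of all common API names and simply re-apply the new constant for each module.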

Initial execution

In this section the operations of Emotet will be described. In general, the Greek Emotet case is very similar to other cases of Emotet recently described in detail by Symantec, Deutsche Telekom and Luca Nagi in his article in Virus Bulletin.

One of the first things the malware does is collect all the DLL file names under the System32 folder. From that list, it randomly selects two names from which it constructs a path …\dllName1\dllName2.exe (e.g. …\comdlg32\das.exe). This behavior has not been described in other articles; in those cases, Emotet had a hardcoded list of keywords to generate the final path. It is assumed that in both cases, given that the name is predictable, it is easier for Emotet to delete older / existing versions and update accordingly. After constructing the path, Emotet copies itself to its destination.

It should be noted that the parent path can either be System32 or Local AppData. This is decided after checking whether the user is running with Administrator rights or not. The check takes place by trying to access Service Manager with full access rights; if it fails that means that the current user does not have Administrator rights.

Figure 7 — Target Path based on permissions

In both cases, Emotet changes the file modification time to a point in the past, going back 0x534E0000 ticks, to hinder disk forensics. However, this is not the only reason. After copying itself to the final destination, it executes a new instance of itself while the first one exits. An important branch in the code depends on the 0x534E0000 offset: if the file modification date is less than that, Emotet follows the “preparation” path, otherwise it follows the “detonation” path. If the user has administrator rights, a new service is created whose executable path is set by the procedure described above.

C&C Communication Preparation

The most important step for the Command and Control communication establishment is the communication key generation. This knowledge is extremely important because it can be used by an analyst for further investigation of the exchanged traffic. Emotet has an RSA public key hardcoded in the binary (offset 0xD2D0) but in an encrypted form. The algorithm used for decrypting the key is again XOR based. It should be noted that the same decryption algorithm is also used for decrypting the additional downloaded modules which will be described later.

Figure 8 — Decryption routine

Initially, the algorithm splits the first 8 bytes into two 4-byte values and XORs them in order to get the required size of the decryption buffer. For enhanced performance Emotet uses MMX registers for the XOR operations, thus processing 16 bytes at once.

Figure 9- Encrypted buffer
Figure 10 — Decrypted RSA key

The key is decoded and then imported into the crypto context. Then, an AES128 key is randomly generated. This will serve as a session key for the encryption of all future communications between the victim and the C&C. The initial data exchange with the C&C will contain the AES128 key encrypted with the RSA1024 public key.

Figure 11 — Session Key Generation

Another important factor is the IP address list which is hardcoded in the binary without any protection. The extraction of the IP address list can be easily done with the use of a Proxy software such as Fiddler, or any other connection list monitoring application. The only thing required is to disallow all network connections thus forcing the malware to loop through the IP address list. The structure is very simple. For each sequence there are 4 bytes for the IP and two bytes for the port. In between there are 2 bytes of garbage.

Figure 12 - IP / Port list data
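As an alternative to the proxy approach, the list can be decoded statically once its offset is known. The following C sketch parses the 8-byte entries described above (4 bytes IP, 2 bytes port, 2 bytes of garbage). The byte order is an assumption on our side: the sketch treats both fields as little-endian values, so if the decoded addresses look reversed, the octet order should be flipped. The names C2Endpoint, parse_endpoint and format_endpoint are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define ENTRY_SIZE 8   /* 4 bytes IP + 2 bytes port + 2 bytes of garbage */

typedef struct {
    uint8_t  octets[4];  /* in printing order: a.b.c.d */
    uint16_t port;
} C2Endpoint;

/* Decode entry `index` from the raw list; returns 0 on success, -1 if the
 * index is past the end. ASSUMPTION: IP and port are stored little-endian. */
int parse_endpoint(const uint8_t *list, size_t list_len, size_t index,
                   C2Endpoint *out)
{
    assert(list != NULL && out != NULL);
    if (index >= list_len / ENTRY_SIZE)
        return -1;

    const uint8_t *entry = list + index * ENTRY_SIZE;
    for (int i = 0; i < 4; i++)
        out->octets[i] = entry[3 - i];   /* little-endian dword -> a.b.c.d */
    out->port = (uint16_t)(entry[4] | (entry[5] << 8));
    /* entry[6] and entry[7] are the two garbage bytes and are ignored. */
    return 0;
}

/* Format as "a.b.c.d:port"; returns the value returned by snprintf. */
int format_endpoint(const C2Endpoint *ep, char *buf, size_t buf_len)
{
    assert(ep != NULL && buf != NULL);
    return snprintf(buf, buf_len, "%u.%u.%u.%u:%u",
                    (unsigned)ep->octets[0], (unsigned)ep->octets[1],
                    (unsigned)ep->octets[2], (unsigned)ep->octets[3],
                    (unsigned)ep->port);
}
```

Calling parse_endpoint in a loop with an increasing index until it returns -1 walks the whole list.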

Now that the endpoints are defined, Emotet has to define the structure of the HTTP request. First, the URL is constructed randomly, based on the following algorithm. A 64-byte length array is generated, containing a sequence of characters using the charset range [a-z][A-Z][0–9]. The URL path contains two parts and follows the format http://ip:port/part1/part2. Each part can have up to 15 characters. A random generator is used to randomly select the characters from the array based on the GetTickCount as seed. An example URL is qvgu/Hev4vYg5c/avTD9Ic7FPO5wAu/. It should be noted that the output is WCHAR and not CHAR explaining the multiplications by 2.

Figure 13- URL Random Generator
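The generation scheme can be sketched in C as follows. This is an illustration and not the recovered routine: the linear congruential generator and its constants are a stand-in for the sample's random generator, the minimum part length of one character is our assumption, and the sketch produces CHAR instead of WCHAR output for brevity. The names next_random, make_part and make_path are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PART_LEN 15

static const char CHARSET[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

/* Tiny LCG standing in for the sample's generator. The constants are the
 * well-known ones from the C standard's example rand(), NOT from Emotet. */
uint32_t next_random(uint32_t *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7FFFu;
}

/* Fill `out` with one random path part of 1..MAX_PART_LEN characters and
 * return its length. `out` must hold at least MAX_PART_LEN + 1 bytes. */
size_t make_part(uint32_t *state, char *out)
{
    size_t len = 1 + next_random(state) % MAX_PART_LEN;
    for (size_t i = 0; i < len; i++)
        out[i] = CHARSET[next_random(state) % (sizeof CHARSET - 1)];
    out[len] = '\0';
    return len;
}

/* Build "part1/part2/" into `out`, which must hold at least
 * 2 * MAX_PART_LEN + 3 bytes. Returns the length of the path. */
size_t make_path(uint32_t seed, char *out)
{
    assert(out != NULL);
    uint32_t state = seed;        /* the sample seeds with GetTickCount */
    size_t n = make_part(&state, out);
    out[n++] = '/';
    n += make_part(&state, out + n);
    out[n++] = '/';
    out[n] = '\0';
    return n;
}
```

Since the seed is a tick count, the same seed always yields the same path, so the URLs carry no information that a defender could use as a stable signature beyond the character set and the part lengths.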

Following the construction of the URL, Emotet constructs the rest of the HTTP headers. Most of the parts are hardcoded in the binary as encrypted strings, except for the Referer, which is based on the C&C IP, and the boundary, which is randomly constructed using the previously mentioned algorithm (randomized characters).

Figure 14 — HTTP Headers

Communication Protocol

In this section we describe the communication protocol and the kind of data being sent and received. In general, based on our observations, the C&C sends commands to the victim which are solely related to the download and execution of modules. The modules can be either DLLs or EXEs. Another thing to note is that the protocol changed somewhere around late 2019 or early 2020 from Google Protobuf to a custom implementation.

Request Packet

The request packet consists of three parts: the actual data, the intermediate layer and the core protocol data. Let’s begin with the actual data. Most of the information comes from our observations, therefore some of it might not be accurate. Initially, Emotet generates a packet with the following information:

struct DataToBeSent
{
int SizeOfVictimID;
char* VictimID;
int SystemDetails;
int SessionID;
int CampaignID;
int Return1000;
int ProcessListLength;
char* ProcessList;
int Unknown;
int PreviousModule;
int HashCurrentFileName;
};
  • SizeOfVictimID: The length of the victim identifier
  • VictimID: The victim identifier is a concatenation of the Computer Name with the Volume Serial Number of the victim’s drive. We have observed that the C&C can blacklist specific victim identifiers
  • SystemDetails: The collected system details are encoded in an integer based on the following equation: 100000 * SystemType (1 = Workstation, 2 = Domain Controller, 3 = Server) + 1000 * MajorVersion + 100 * MinorVersion + CPUArchitecture (0 = x86, 9 = x64, 5 = ARM, 12 = ARM64, 6 = ITANIUM)
  • SessionID: The current user Session ID number
  • CampaignID: The current campaign ID. In our case it is 0x1343BE0 (20200416). As stated in the Symantec article, this could be the campaign date 16-04-2020
  • Return1000: A static number 1000 the purpose of which is unknown to us
  • ProcessListLength: The length of the list containing the currently running processes
  • ProcessList: The list of currently running processes. It should be noted that we did not observe that the C&C checks processes in order to detect tools and environments that can be used for malware analysis.
  • Unknown: An unknown value usually set to 4 in our case
  • PreviousModule: The id of the previously executed module
  • HashCurrentFileName: The Hash of the file name of the currently running Emotet instance
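As a worked example of the SystemDetails equation, the following C sketch encodes the value directly from the formula given in the list above. The enum and function names are ours and are only for illustration. For a workstation running Windows 10.0 on x64 the result is 100000 * 1 + 1000 * 10 + 100 * 0 + 9 = 110009.

```c
#include <assert.h>
#include <stdint.h>

enum SystemType {
    SYS_WORKSTATION = 1,
    SYS_DOMAIN_CONTROLLER = 2,
    SYS_SERVER = 3
};

enum CpuArch {
    ARCH_X86 = 0,
    ARCH_ARM = 5,
    ARCH_ITANIUM = 6,
    ARCH_X64 = 9,
    ARCH_ARM64 = 12
};

/* Encode the system details as described in the article:
 * 100000 * type + 1000 * major + 100 * minor + architecture. */
uint32_t encode_system_details(enum SystemType type, uint32_t major,
                               uint32_t minor, enum CpuArch arch)
{
    /* Larger values would spill into the neighbouring decimal field. */
    assert(major < 100 && minor < 10);
    return 100000u * (uint32_t)type + 1000u * major + 100u * minor +
           (uint32_t)arch;
}
```

The same decimal reading applies to the campaign ID: the hexadecimal constant 0x1343BE0 is simply the number 20200416 when printed in base 10.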

After compiling this data, Emotet embeds it in the intermediate packet, which contains additional information. The data is compressed with an LZ algorithm and then encrypted with the randomly generated AES key.

The structure of the intermediate packet is the following:

struct IntermediatePacket
{
int Type;
int DataLen;
int Data;
};
  • Type: is the request type sent to the C&C. In our cases it was always set to 1 (Welcome / Registration)
  • DataLen: the length of the Data
  • Data: the previously compressed data

Finally, Emotet constructs the final packet which is eventually encrypted with the session key and sent to the C&C. The structure is the following:

struct FinalPacket
{
char SessionKey[0x60];
char Hash[0x14];
int EncryptedData;
};

Response Packet

The response packet follows almost the same pattern as the request packet. The major difference is that the response packet contains the signature instead of the session key.

struct ReceivedPackage
{
char Signature[0x60];
char Hash[0x14];
char* EncryptedData;
};
  • Signature: The signature of the packet, used for integrity verification
  • Hash: The SHA1 hash of the decrypted data
  • EncryptedData: The encrypted data buffer
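Splitting a raw response into these three fields can be sketched in C as follows. The field sizes come from the structure above; that the encrypted data follows the hash inline in the same buffer is our assumption, and the names ResponseView and split_response are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIGNATURE_LEN 0x60
#define HASH_LEN      0x14

/* Non-owning view into a received buffer. */
typedef struct {
    const uint8_t *signature;      /* SIGNATURE_LEN bytes */
    const uint8_t *hash;           /* HASH_LEN bytes */
    const uint8_t *encrypted;      /* remaining bytes */
    size_t         encrypted_len;
} ResponseView;

/* Split `buf` into its three fields without copying.
 * ASSUMPTION: the encrypted data is stored inline after the hash.
 * Returns 0 on success, -1 if the buffer is too short. */
int split_response(const uint8_t *buf, size_t len, ResponseView *out)
{
    assert(buf != NULL && out != NULL);
    if (len < SIGNATURE_LEN + HASH_LEN)
        return -1;

    out->signature     = buf;
    out->hash          = buf + SIGNATURE_LEN;
    out->encrypted     = buf + SIGNATURE_LEN + HASH_LEN;
    out->encrypted_len = len - (SIGNATURE_LEN + HASH_LEN);
    return 0;
}
```

The view only points into the original buffer, so the buffer has to stay alive for as long as the fields are used.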

The parsing of ReceivedPackage is very straightforward. Emotet allocates the required buffer for the decrypted data, then duplicates the global HashObject, inheriting its settings, and passes the copy to the decryption functions. By using CryptDecrypt with a HashObject it is possible to transparently calculate the hash of the decrypted data. The hash is then verified by CryptVerifySignature against the signature that came with the packet.

Figure 15 — Decryption Pseudocode

As in the request packet, the received packet has an intermediate layer before reaching the actual data. Based on the code and our observations, the responses of the C&C can be single modules, either EXEs or DLLs. However, it seems that the C&C is capable of responding with a list of modules instead of sending only one at a time. Thus, we describe the intermediate layer with some uncertainty, because our sample always received one module. However, our observations seem to agree with what is described by Symantec.

struct DecryptedReceivedPacket
{
int TotalPacketSize;
int Packet1Size;
char* Packet1;
int Packet2Size;
char* Packet2;
/* ... */
int PacketNSize;
char* PacketN;
};

Therefore, the initial information is the total size of the packet, which can contain numerous chunks. Each chunk contains the size of the chunk and the chunk data. The chunks are of the following structure:

struct ChunkData
{
int ID;
int ExecutionType;
int Length;
char* Data;
};
  • ID: The ID of the module is a unique number that identifies the module to be loaded or executed. We will list the modules that we received from our sample
  • ExecutionType: The execution type of the downloaded module. Values range from 1 to 4. More details will be provided later
  • Length: the length of the buffer
  • Data: the actual data which are MZ/PE binaries.
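Walking the decrypted buffer can be sketched in C as follows. Given the uncertainty stated above, this is a sketch under several assumptions that are ours: all integers are 32-bit little-endian, the data is stored inline rather than through pointers, TotalPacketSize counts the bytes that follow it, and each chunk size covers the ID, ExecutionType and Length fields plus the data. The names Chunk and parse_modules are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t       id;
    uint32_t       execution_type;
    uint32_t       length;
    const uint8_t *data;           /* points into the input buffer */
} Chunk;

/* Read a 32-bit little-endian value without relying on alignment. */
uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse up to `max_chunks` chunks from `buf`. Returns the number of chunks
 * parsed, or -1 if the buffer is malformed under the stated assumptions. */
int parse_modules(const uint8_t *buf, size_t len, Chunk *out, int max_chunks)
{
    assert(buf != NULL && out != NULL);
    if (len < 4)
        return -1;

    uint32_t total = le32(buf);                /* TotalPacketSize */
    if (total > len - 4)
        return -1;

    size_t pos = 4, end = 4 + (size_t)total;
    int count = 0;
    while (pos < end && count < max_chunks) {
        if (end - pos < 4)
            return -1;
        uint32_t chunk_size = le32(buf + pos); /* PacketNSize */
        pos += 4;
        if (chunk_size < 12 || chunk_size > end - pos)
            return -1;

        Chunk *c = &out[count];
        c->id             = le32(buf + pos);
        c->execution_type = le32(buf + pos + 4);
        c->length         = le32(buf + pos + 8);
        if (c->length > chunk_size - 12)
            return -1;                         /* data overruns the chunk */
        c->data = buf + pos + 12;

        pos += chunk_size;
        count++;
    }
    return count;
}
```

Every length is checked against the remaining buffer before it is used, which matters here because the input comes from an attacker-controlled server.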

Module Loading

As described before, each received command specifies one of four types of execution:

  1. Download file, store it on disk and directly execute it
  2. Download file, store it on disk. Check if the current user session running the Emotet instance is the active one and only then execute the executable.
  3. Instead of an executable, this time the C&C sends a DLL which will be loaded by the victim. The victim calls the entry point of the DLL, which is actually DllMain. It should be noted that the DllMain function has an argument called Reason indicating why the DLL is called. The standard reason values range from 0 to 3 (0: DLL_PROCESS_DETACH, 1: DLL_PROCESS_ATTACH, 2: DLL_THREAD_ATTACH, 3: DLL_THREAD_DETACH). Emotet passes the value 0xA to DllMain, so traditional DLL loading cannot take place. Another argument is passed to DllMain, again in a nonstandard way (in the argument marked as reserved), which is used as a temporary file for storing the output of each DLL.
  4. This is an additional command which has not been seen in any of the referenced articles. This command is a variant of command #1, which performs the same steps with the addition of passing a command line argument on execution which is the current path of Emotet encoded with Base64.

Heaven’s Gate

A special reference to Heaven’s Gate should be made, as it is a very interesting technique. Emotet is a 32bit binary and therefore loaded DLLs or injected code can only be 32bit. Loading 64bit modules from a 32bit process cannot be done directly. A technique called Heaven’s Gate (more information here and here) can solve the problem. The technique is based on code segments; by switching code segments it is possible to switch from 32bit to 64bit. Code segment 0x23 is for 32bit and 0x33 is for 64bit.

The picture below is an example of the technique. One can identify Heaven’s Gate by the call $+5 instruction and the retf. The code below the red line cannot be interpreted correctly by IDA because it is actually 64bit code while the rest of the code is 32bit. This is not solely an issue with IDA but also with debuggers. Only WinDbg is capable of dynamically switching between 32bit and 64bit code. As such, Heaven’s Gate is an excellent anti-analysis technique.

Figure 16 — Heaven’s Gate

Modules Encryption

Another thing that should be mentioned is that the downloaded binaries can be either protected or not. Protected binaries are decoded and loaded in memory by dynamic loading (aka Process Hollowing). In such cases Emotet starts an instance of itself, unmaps its memory, reallocates it and resumes execution after setting up the registers (SetThreadContext). All imports and relocations are resolved dynamically by the code. All the downloaded modules use the same techniques as Emotet (i.e. string encryption and dynamic import loading). Modules are decrypted using an MMX XOR decryption routine (the same one described previously for the RSA key decryption).

Figure 17 — Module decryption loop

Modules used

In this section the eight modules used in the campaign are presented. We did not observe any module related to a second-stage malware such as Trickbot, Qakbot, Ryuk etc. Due to time limitations, the analysis of the modules was limited. Our aim was to identify whether Emotet spreads a secondary malware. The following modules were captured:

Figure 18 — Loaded modules overview
  • 0xA51 Updated Version of Emotet, Execution Type 4

This module was extremely interesting for several reasons. First of all, in contrast with the other ones, it is an executable and not a DLL. In all of the referenced articles describing recent campaigns, this behavior was not mentioned. Secondly, it contains multiple layers of encrypted executable code before the actual code runs. The module relies heavily on MFC and ATL function calls, most probably to hide itself.

Figure 18 — MFC & ATL calls
Figure 19 — Loading, Decrypt, Execute Shellcode

Since the module had no interesting strings or imports, we performed a dynamic analysis. VirusTotal and APP.ANY.RUN indicated that this is actually Emotet. Since there were no suspicious imports, we suspected that there should be an encrypted resource that is dynamically loaded by the module. Using PE-bear we identified a suspicious resource with significantly high entropy (7.9) and a high file ratio (14.89%):

Figure 20— Suspicious resource

Based on the code below, the module loads the resources and allocates the necessary size to store the decryption of the resource. The decryption is a simple XOR based loop. After the decryption, the code is directly called.

Figure 21 — Load, Decrypt, Execute resource

Interestingly, the decrypted shellcode contained not one but two embedded MZ (EXE) files. The first executable executes the second. Once we loaded the dumped shellcode into IDA Pro, it could be clearly seen that the shellcode dynamically loads and executes the first executable. The first executable has a parsing function that looks for MZ/PE headers and then dynamically loads the second executable, which is the updated Emotet version.

Figure 22 — Parsing routine of second executable
Figure 23 — Offset of two executables
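The same kind of scan is useful for carving the embedded executables out of a dumped buffer. The following C sketch is our own helper and not the sample's routine: it looks for the "MZ" magic, reads the e_lfanew field at offset 0x3C of the DOS header and checks that it points to the "PE\0\0" signature. The name find_pe is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value without relying on alignment. */
uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the offset of the first plausible PE image at or after `start`,
 * or -1 if none is found. Only the two signatures are validated. */
long find_pe(const uint8_t *buf, size_t len, size_t start)
{
    assert(buf != NULL);
    /* A DOS header is 0x40 bytes long, so stop when fewer bytes remain. */
    for (size_t off = start; off + 0x40 <= len; off++) {
        if (buf[off] != 'M' || buf[off + 1] != 'Z')
            continue;

        uint32_t e_lfanew = le32(buf + off + 0x3C);
        if (e_lfanew > len - off - 4)
            continue;                 /* PE header would be out of bounds */

        const uint8_t *pe = buf + off + e_lfanew;
        if (pe[0] == 'P' && pe[1] == 'E' && pe[2] == 0 && pe[3] == 0)
            return (long)off;
    }
    return -1;
}
```

Calling find_pe again with the previous result plus one as the start offset locates the next embedded image, which is how two executables inside one blob can be separated.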

Figure 23 — Comparison of Graph Overviews

Now it is possible to analyze the core executable. Upon loading it into IDA Pro, it was clearly visible that the binary implemented control flow flattening. Looking at the graph overview, we found a very close resemblance to the graph overview of our initial Emotet binary. In order to confirm our suspicion that this is also an Emotet binary, we loaded it in a sandbox as well as in VirusTotal. Both indicated that it was indeed Emotet.

Figure 24 — Comparison of Graph Overviews

To reach a concrete result we used Diaphora (a binary diffing plugin) to confirm whether the binaries have similarities. 62 out of 105 functions were identical and 8 were partially matched, which means 70 out of 105 functions were related to our first sample. Finally, we identified that this sample has the same hardcoded campaign ID 0x1343BE0 (20200416).

Figure 26 — Partially matched functions
Figure 27 — Snippet of matched functions
  • 0xA53 Nirsoft MailPassView, Execution Type 3

This is a well-known tool by Nirsoft which is used for password recovery from email clients. The module is protected by XOR encryption. The decrypted module can be identified by the PDB reference.

Figure 28— Mail Pass View PDB
  • 0xA55 Nirsoft WebPassView, Execution Type 3

This is a well-known tool by Nirsoft which is used for password recovery from browsers. The module is protected by XOR encryption. The decrypted module can be identified by the PDB reference.

Figure 29— Web Pass View PDB
  • 0xAC7 Network Spreader — Username Enumeration, Execution Type 3

The module is responsible for spreading itself via admin network shares. First, Emotet decrypts a list of passwords (exactly 10,000 in total). This list is used to brute force the targeted accounts.

Figure 30— Stored Passwords

Then it looks for shared resources by checking whether the found resource (WNetEnumResource) has a different computer name than its own (remote host). After collecting the network resources, Emotet tries to connect to \\IPC$ with a null session in order to get the list of users at the selected endpoint.

Once the list of users is gathered, a connection to ADMIN$ or C$ is attempted in order to get access (using the aforementioned passwords) and copy the module from the local victim PC to the remote one. If the operation is successful, it means that there is administrative access to the remote PC. Therefore, Emotet accesses the Service Manager, creates a new service on the remote system and then starts the service.

Figure 31 — Spreading Flow
  • 0xA57 Network Spreader — Username Brute Force, Execution Type 3

This module is the same as the previously mentioned 0xAC7, with the main difference that it does not enumerate the user list via a null session on IPC$. Instead, it uses a hardcoded encrypted list of usernames which it then brute-forces in order to get administrative access to the remote PC.

Figure 32— Username List
  • 0xAAC UPNP — C&C Proxy, Execution Type 3

This module is actually a UPNP implementation. Initially it was assumed that this module is responsible for opening ports for additional communication with the C&C server. However, further reading indicated that this module can also be used for making the victim a proxy to the core C&C network. This trick adds an additional tier level in the botnet increasing its resiliency. No further analysis on this module took place. The module can be easily identified by the following strings.

Figure 33 — UPNP Strings
  • 0xA5C Outlook Email Harvester, Execution Type 3

This email harvester is also a very interesting module in our opinion, due to the fact that it is responsible for having so many victims in the current campaign. The module is responsible for stealing emails from a predefined time period. Usually, spam and phishing emails generate increased suspicion when written in English. On the contrary, emails in Greek language which are based on a legitimate email chain and being sent (or appear to be sent) by seemingly (spoofed) trusted persons decrease the suspicion tremendously.

The module contains two encrypted DLLs, one being the 32bit version of the harvester and the other being the 64bit version. In order to utilize the 64bit version of the module and inject it into 64bit processes, the Heaven’s Gate technique is used as previously described. The harvester can be identified by its access to HKLM\Software\Clients\Mail\Microsoft Outlook, where it queries the value DllPathEx to find msmapi32.dll. Additional details can be found in the excellent article by Kryptos Logic.

  • 0xA5B Outlook Contact Harvester, Execution Type 3

An additional protected module was observed, again targeting Outlook which is used for email contact list harvesting. The module was not further analyzed. It does however access the HKLM\Software\Clients\Mail\Microsoft Outlook and queries the value DllPathEx to find the msmapi32.dll, therefore its functionality is an assumption.

IOCs

Emotet Core Malware

  • SHA256: 439ce9da581f53e475d5d74cdb9c487068fde473073f96f1f978b949e3fcfc86

Nirsoft MailPass View

  • SSDeep Hash: 3072:0X940Z4hbL8wCyQOhIhggS/FBqALovD0NSxD7I333333p7vLp8Fbs:0XbZ+ZnKhggSdBqJL0k70v84
  • SHA256: 8cbc23db751d362a76d3b14d7bb60daa15c29e0e885824e5637b17147f2d012b

Nirsoft Web Browser Pass View

  • SSDeep Hash: 6144:ogvurXVCkt/HskgK+SVYzoVJehG9BqLd+lcQIyfE4Qy+eHhKpvQ3JjLQVkIDiUp:ogvuuLK+SVYExqLMIyfuUHLBdIOUp
  • SHA256: daf537aedc95a282ddcb9d01c8cfe7be560e30f4523fea5bbf1468e7eecd4de2

Network Spreader — Username Enumeration

  • SSDeep Hash: 3072:YHFLJF6boQT1fXA6BYwJadO8iseccMVY5MXW:uOboiRXAiENVi
  • SHA256: f8dd847ab1565aa460875c782f44a003a5b2c20b0e76a6672cfe3cd952a38727

Network Spreader — Username Brute Force

  • SSDeep Hash: 768:cRbk3TZDIuDoV0vyD2qscmszWUC9mt/PaWDZiMM2:c5AgV0wAoqUCGPDZtJ
  • SHA256: d714eca8a485b86cfa40dacfdedcd164877bf9d0e0d704ea325718b14498ba02

UPNP — C&C Proxy

  • SSDeep Hash: 6144:rmy8mkTtkzVj20+uoSZfwpkoRkTklFf1mD+ScWNcVjF:rmy8m4wj20+uPZf3oR1lR1mD+FWNyjF
  • SHA256: 6588f6f29061cd324f0ce6edf2bce7343da33bb61e119464c5b54bbb9fe009db

Outlook Mail Harvester

  • SSDeep Hash: 3072:wV3gkWjOBhDwowHLqfkjyn4e2Hkblr7sZ+U2tEzH6UviZeEU3vS4s:2vDNwHLqfkjm4e2er7Q2tEjhaZeEIvS
  • SHA256: cee571c9a8fe2761f8aaae53486ad0df3aa2e4f27dad84860ab8e96abe84ea26

Outlook Contact Stealer

  • SSDeep Hash: 6144:TZ3ZLDE/sA8q4C+POFSXczseQXpztygndzj2TukOn:qYFSezKTX
  • SHA256: ccdcaa285a390e9836868b11e1754edd7d4be85ca7a80c0ec83b34845cbc7db3

Emotet Updated Version

  • SSDeep Hash: 6144:m2GhItcEo6vJE63bvxNsMBUEaBcM5aOXq2pM1v+W++ScHEEPDG+XM3IcGYm0:QycE1vxIWmaOXq2pa+W++GEPDQGYm0
  • SHA256: 450a39df634cc102c3319bf3be1b579b3c1c2d5818d24b603bae72ecdc66f836
