Reverse Engineering Emotet

During October 2020, Greece was targeted by an Emotet malware campaign. A significant number of emails carrying Emotet were received by almost every organization in the country. According to ESET’s report for October 2020, Greece was the country with the largest share of Emotet activity, at approximately 18%. What caught our attention was the infection rate and the level of credibility of the related Emotet phishing emails.

Figure 1 — Countries most targeted by Emotet during October 2020

We saw this as a great opportunity to enhance our reverse engineering skills by trying to fully analyze Emotet and, at the same time, to identify the operational activities of this specific campaign. In this post we try to provide enough information to help the reader understand the mechanics and, most importantly, to offer a tutorial that other analysts can use. Our findings and assumptions are not always conclusive, so we are always happy to receive recommendations and comments. It should also be noted that most of this information is not new: the reader will find several references in this document describing the same observations. This post is an attempt to consolidate that information and to evaluate our findings against those reports.

Dropper

The initial attack vector is an email, usually originating from a seemingly trusted sender (spoofed address), containing a URL that points to a malicious macro-enabled Word document. The macro runs a simple PowerShell script that downloads a binary and executes it via WMI:

Figure 2 — Dropper PowerShell

In our case, the downloaded binary is a Visual Basic binary (annoying from an analyst’s point of view); however, with some tools (e.g. https://www.unpac.me/) the dropped malware can be easily unpacked. The unpacked malware is the actual Emotet payload.

General Characteristics of Emotet

1. Control Flow Flattening

Figure 3 — Control Flow Flattening of initial function

Many articles describe the de-obfuscation process known as “Control Flow Unflattening”, and some of them are specifically related to IDA Pro and Hex-Rays microcode. However, the estimated effort to research and develop an unflattening tool was significantly larger than the effort of simply following the code blocks by hand.

Figure 4 — Decompiled Control Flow Flattened Code
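To make the idea concrete, the following is a minimal Python sketch of flattening applied to a toy function. It is not Emotet code: the function and the state constants are made up for illustration. It only shows why a flattened function looks like one big dispatcher loop in the graph view, and why following the state variable by hand recovers the original order of the blocks.

```python
def sum_to_n(n):
    """Straightforward version: sum the integers 1..n."""
    total = 0
    i = 1
    while i <= n:
        total += i
        i += 1
    return total


def sum_to_n_flattened(n):
    """Same logic after flattening: every basic block becomes one case of a
    dispatcher loop, and a state variable decides which block runs next."""
    state = 0x1A2B3C4D  # arbitrary constants, similar to what a protector emits
    total = i = 0
    while True:
        if state == 0x1A2B3C4D:      # block: initialisation
            total, i = 0, 1
            state = 0x0BADF00D
        elif state == 0x0BADF00D:    # block: loop condition
            state = 0x5EED1234 if i <= n else 0x7E57C0DE
        elif state == 0x5EED1234:    # block: loop body
            total += i
            i += 1
            state = 0x0BADF00D
        elif state == 0x7E57C0DE:    # block: exit
            return total
```

Tracing the values assigned to state (initialisation, condition, body, condition, ..., exit) is the manual equivalent of what an unflattening tool automates.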

It should be noted that the protector in this specific sample applies additional techniques to hinder the analysis, such as junk code and opaque predicates. Opaque predicates are conditions whose outcome is known in advance because it does not depend on runtime input, so the control flow is predetermined. Thankfully, many decompilers can discard such tricks and show the clean code. For example, the following condition is always false, so only CODE2 can ever run:

int a = 5;
int b = 10;
if (a * 1 == b)   /* always false: 5 != 10 */
    CODE1;        /* dead code, never reached */
else
    CODE2;        /* always executed */

2. Encrypted Strings

Figure 5- XOR String Decryption Loop

3. Dynamic Import Resolution

Figure 6 — Import Hash Function

Initial execution

One of the first things the malware does is collect all the DLL file names under the System32 folder. From that list it randomly selects two names, from which it constructs a path …\dllName1\dllName2.exe (e.g. …\comdlg32\das.exe). This behavior is not described in the other articles we reviewed; in those cases, Emotet used a hardcoded list of keywords to generate the final path. We assume that in both cases the name is predictable to Emotet, which makes it easier to delete older or existing versions and update accordingly. After constructing the path, Emotet copies itself to that destination.

It should be noted that the parent path can be either System32 or Local AppData. The choice depends on whether the user is running with Administrator rights. Emotet checks this by trying to open the Service Control Manager with full access rights; if the call fails, the current user does not have Administrator rights.

Figure 7 — Target Path based on permissions
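The path construction described above can be sketched as follows. This is a minimal illustration and not the malware's implementation: the function name, its parameters and the use of Python's random module are our own assumptions, and the two parent directories are passed in by the caller rather than resolved from the system.

```python
import random
from pathlib import PureWindowsPath


def build_install_path(dll_names, is_admin, system_dir, local_appdata_dir, rng=None):
    """Pick two distinct DLL base names and build parent\\name1\\name2.exe.

    dll_names         -- base names collected from System32 (without ".dll")
    is_admin          -- result of the Service Control Manager access check
    system_dir        -- parent used when running with Administrator rights
    local_appdata_dir -- parent used otherwise
    rng               -- hypothetical stand-in for the malware's own PRNG
    """
    rng = rng or random.Random()
    first, second = rng.sample(sorted(dll_names), 2)
    parent = system_dir if is_admin else local_appdata_dir
    return PureWindowsPath(parent) / first / (second + ".exe")
```

For example, build_install_path(["comdlg32", "das", "kernel32"], False, r"C:\Sys", r"C:\Local") returns a path under C:\Local whose folder and file name are both taken from the list.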

In both cases, Emotet sets the file modification time to a point in the past, going back 0x534E0000 ticks, to hinder disk forensics. However, this is not the only reason. After copying itself to the final destination, Emotet launches a new instance of itself and the first instance exits. The 0x534E0000 offset then drives an important branch: if the file modification date is less than that, Emotet follows the “preparation” path; otherwise it follows the detonation path. If the user has Administrator rights, a new service is also created, with the service executable path set according to the procedure described above.

C&C Communication Preparation

Figure 8 — Decryption routine

Initially, the algorithm splits the first 8 bytes into two 4-byte values and XORs them to obtain the required size of the decryption buffer. For better performance, Emotet performs the XOR operations in SIMD registers, processing 16 bytes at once (a width that corresponds to the 128-bit SSE XMM registers rather than the 64-bit MMX ones).

Figure 9- Encrypted buffer
Figure 10 — Decrypted RSA key
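A minimal Python sketch of this routine is shown below. The text only establishes that the XOR of the two leading 4-byte values gives the buffer size; that the first value also serves as the XOR key for the payload, and that the values are little-endian, are assumptions on our side. The encrypt_blob helper is hypothetical and exists only to produce test input.

```python
import struct


def decrypt_blob(blob: bytes) -> bytes:
    """Assumed layout: [key:4][key ^ length:4][payload XORed with key]."""
    key, masked_len = struct.unpack_from("<II", blob, 0)
    length = key ^ masked_len                 # size of the decryption buffer
    body = blob[8:8 + length]
    if len(body) != length:
        raise ValueError("blob is shorter than its decoded length")
    key_bytes = struct.pack("<I", key)
    # Byte-wise XOR; the malware does the same work 16 bytes at a time.
    return bytes(b ^ key_bytes[i % 4] for i, b in enumerate(body))


def encrypt_blob(plain: bytes, key: int) -> bytes:
    """Hypothetical inverse, used only to build sample data."""
    key_bytes = struct.pack("<I", key)
    body = bytes(b ^ key_bytes[i % 4] for i, b in enumerate(plain))
    return struct.pack("<II", key, key ^ len(plain)) + body
```

Because the operation is a plain XOR, running the same transformation twice returns the original bytes, which makes the sketch easy to check against data dumped from a debugger.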

The key is decoded and then imported into the crypto context. Then, an AES128 key is randomly generated. It serves as the session key for encrypting all future communication between the victim and the C&C. The initial data exchange with the C&C contains the AES128 key encrypted with the RSA1024 public key.

Figure 11 — Session Key Generation

Another important element is the IP address list, which is hardcoded in the binary without any protection. The list can also be extracted easily with proxy software such as Fiddler, or with any other connection monitoring application: it is enough to block all network connections, which forces the malware to loop through the whole list. The structure is very simple. Each entry consists of 4 bytes for the IP and 2 bytes for the port, with 2 bytes of garbage between entries.

Figure 12 — IP / Port list data
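Given that layout, a dumped list can be parsed with a few lines of Python. This is a sketch under assumptions: we treat every entry as 8 bytes (IP, port, 2 ignored bytes), read the IP as a little-endian 32-bit value whose most significant byte is the first octet, and stop at an all-zero IP. The byte order and the terminator should be verified against the actual dump.

```python
import ipaddress
import struct


def parse_c2_list(data: bytes):
    """Parse 8-byte entries: 4 bytes IP, 2 bytes port, 2 bytes ignored."""
    endpoints = []
    usable = len(data) - len(data) % 8        # ignore a trailing partial entry
    for offset in range(0, usable, 8):
        ip_raw, port, _garbage = struct.unpack_from("<IHH", data, offset)
        if ip_raw == 0:                       # assumed end-of-list marker
            break
        endpoints.append((str(ipaddress.IPv4Address(ip_raw)), port))
    return endpoints
```

The result is a list of (ip, port) tuples that can be fed directly into blocklists or compared with published C&C lists.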

Now that the endpoints are defined, Emotet has to define the structure of the HTTP request. First, the URL is constructed randomly with the following algorithm. A 64-byte array is generated containing characters from the charset [a-z][A-Z][0-9]. The URL path contains two parts and follows the format http://ip:port/part1/part2. Each part can have up to 15 characters. A random generator seeded with GetTickCount selects the characters from the array. An example URL is qvgu/Hev4vYg5c/avTD9Ic7FPO5wAu/. It should be noted that the output is WCHAR and not CHAR, which explains the multiplications by 2 in the code.

Figure 13- URL Random Generator
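The generator can be approximated with the Python sketch below, for example to produce test traffic for detection rules. It does not reproduce Emotet's actual PRNG: Python's random.Random stands in for it, the seed argument plays the role of GetTickCount, and the minimum part length of 1 is an assumption.

```python
import random
import string

# [a-z][A-Z][0-9], the character set described in the text
CHARSET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def random_url_path(seed, parts=2, max_len=15):
    """Build 'part1/part2/' from randomly chosen characters."""
    rng = random.Random(seed)                 # stand-in for the tick-count seed
    segments = []
    for _ in range(parts):
        length = rng.randint(1, max_len)      # each part has up to max_len chars
        segments.append("".join(rng.choice(CHARSET) for _ in range(length)))
    return "/".join(segments) + "/"
```

The same seed always yields the same path, which mirrors how a tick-count seed fully determines the generated URL.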

After constructing the URL, Emotet builds the rest of the HTTP headers. Most of the parts are hardcoded in the binary as encrypted strings. The exceptions are the Referer, which is based on the C&C IP, and the Boundary, which is built from random characters with the algorithm described above.

Figure 14 — HTTP Headers

Communication Protocol

Request Packet

The request packet consists of three parts: the actual data, the intermediate layer and the core protocol data. Let’s begin with the actual data. Most of this information comes from our observations, so some of it might not be accurate. Initially, Emotet generates a packet with the following information:

struct DataToBeSent
{
    int SizeOfVictimID;
    char* VictimID;
    int SystemDetails;
    int SessionID;
    int CampaignID;
    int Return1000;
    int ProcessListLength;
    char* ProcessList;
    int Unknown;
    int PreviousModule;
    int HashCurrentFileName;
};
  • SizeOfVictimID: The length of the victim identifier
  • VictimID: The victim identifier is a concatenation of the Computer Name and the Volume Serial Number of the victim’s drive. We have observed that the C&C can blacklist specific victim identifiers
  • SystemDetails: The collected system details are encoded in a single integer using the formula 100000 * SystemType (1 = Workstation, 2 = Domain Controller, 3 = Server) + 1000 * MajorVersion + 100 * MinorVersion + CPUArchitecture (0 = x86, 9 = x64, 5 = ARM, 12 = ARM64, 6 = ITANIUM)
  • SessionID: The current user Session ID number
  • CampaignID: The current campaign ID. In our case it is 0x1343BE0 (20200416). As stated in the Symantec article, this could be the campaign date 16-04-2020
  • Return1000: A static number 1000, the purpose of which is unknown to us
  • ProcessListLength: The length of the list containing the currently running processes
  • ProcessList: The list of currently running processes. It should be noted that we did not observe the C&C checking the processes in order to detect tools and environments used for malware analysis
  • Unknown: An unknown value, usually set to 4 in our case
  • PreviousModule: The ID of the previously executed module
  • HashCurrentFileName: The hash of the file name of the currently running Emotet instance
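The SystemDetails formula and the date reading of CampaignID can be checked with a short Python sketch. The function names are ours; the lookup tables simply restate the values listed above, and the decoder assumes that the minor version is a single digit and the architecture code is below 100.

```python
from datetime import date

SYSTEM_TYPES = {1: "Workstation", 2: "Domain Controller", 3: "Server"}
CPU_ARCH = {0: "x86", 5: "ARM", 6: "ITANIUM", 9: "x64", 12: "ARM64"}


def encode_system_details(system_type, major, minor, arch):
    """100000 * SystemType + 1000 * Major + 100 * Minor + CPUArchitecture."""
    return 100000 * system_type + 1000 * major + 100 * minor + arch


def decode_system_details(value):
    """Split the integer back into its four components."""
    return {
        "system_type": SYSTEM_TYPES.get(value // 100000, "unknown"),
        "major": (value % 100000) // 1000,
        "minor": (value % 1000) // 100,
        "arch": CPU_ARCH.get(value % 100, "unknown"),
    }


def campaign_id_to_date(campaign_id):
    """Read the decimal digits of the ID as YYYYMMDD."""
    digits = str(campaign_id)
    return date(int(digits[:4]), int(digits[4:6]), int(digits[6:8]))
```

With these helpers, a 64-bit workstation reporting version 10.0 encodes to 110009, and campaign_id_to_date(0x1343BE0) gives 16 April 2020, matching the interpretation above.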

After compiling the list, Emotet embeds the data in the intermediate packet, which contains additional information. The data is compressed with an LZ algorithm and then encrypted with the randomly generated AES key.

The structure of the intermediate packet is the following:

struct IntermediatePacket
{
    int Type;
    int DataLen;
    char* Data;
};
  • Type: the request type sent to the C&C. In our case it was always set to 1 (Welcome / Registration)
  • DataLen: the length of the Data
  • Data: the previously compressed data
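As an illustration, the intermediate packet can be built and parsed with Python's struct module. This is a sketch assuming 32-bit little-endian integers and data that directly follows the two header fields; the LZ compression step is out of scope here, so the compressed bytes are treated as an opaque input.

```python
import struct

REQUEST_TYPE_REGISTRATION = 1  # the only type we observed


def build_intermediate_packet(request_type: int, compressed: bytes) -> bytes:
    """Type (4 bytes) + DataLen (4 bytes) + Data."""
    return struct.pack("<II", request_type, len(compressed)) + compressed


def parse_intermediate_packet(packet: bytes):
    """Return (request_type, data) and reject truncated packets."""
    request_type, data_len = struct.unpack_from("<II", packet, 0)
    data = packet[8:8 + data_len]
    if len(data) != data_len:
        raise ValueError("packet is shorter than DataLen")
    return request_type, data
```

A parser like this is mainly useful on traffic that has already been decrypted in a debugger or emulator.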

Finally, Emotet constructs the final packet which is eventually encrypted with the session key and sent to the C&C. The structure is the following:

struct FinalPacket
{
    char SessionKey[0x60];
    char Hash[0x14];
    char* EncryptedData;
};

Response Packet

The response packet follows almost the same pattern as the request packet. The major difference is that the response packet contains the signature instead of the session key.

struct ReceivedPackage
{
    char Signature[0x60];
    char Hash[0x14];
    char* EncryptedData;
};
  • Signature: The signature of the packet, used for integrity verification
  • Hash: The SHA1 hash of the decrypted data
  • EncryptedData: The encrypted data buffer

Parsing the ReceivedPackage is straightforward. Emotet allocates the buffer required for the decrypted data, then duplicates the global HashObject (inheriting its settings) and passes the duplicate to the decryption functions. Calling CryptDecrypt with a hash object makes it possible to calculate the hash of the decrypted data transparently. CryptVerifySignature then verifies that hash against the signature that came with the packet.

Figure 15 — Decryption Pseudocode
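The fixed-size header makes it easy to split a captured packet offline. The Python sketch below only separates the three fields and compares a SHA1 digest against the Hash field once the data has been decrypted by other means; AES decryption and RSA signature verification are deliberately left out, and the function names are our own.

```python
import hashlib

SIGNATURE_LEN = 0x60   # Signature (response) or SessionKey (request)
HASH_LEN = 0x14        # 20 bytes, the size of a SHA1 digest


def split_outer_packet(packet: bytes):
    """Return (signature_or_session_key, hash, encrypted_data)."""
    header_len = SIGNATURE_LEN + HASH_LEN
    if len(packet) < header_len:
        raise ValueError("packet is shorter than the fixed header")
    return (packet[:SIGNATURE_LEN],
            packet[SIGNATURE_LEN:header_len],
            packet[header_len:])


def hash_matches(decrypted: bytes, expected_hash: bytes) -> bool:
    """Check already-decrypted data against the Hash field."""
    return hashlib.sha1(decrypted).digest() == expected_hash
```

Since the request and response packets share the same 0x60 + 0x14 byte header layout, the same splitter works in both directions.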

As with the request packet, the received packet has an intermediate layer that must be processed before reaching the actual data. Based on the code and our observations, the responses of the C&C can be single modules, either EXEs or DLLs. However, it seems that the C&C is also capable of responding with a list of modules instead of sending only one at a time. We therefore describe the intermediate layer with some uncertainty, because our sample always received one module. Our observations do, however, seem to agree with what is described by Symantec.

struct DecryptedReceivedPacket
{
    int TotalPacketSize;
    int Packet1Size;
    char* Packet1;
    int Packet2Size;
    char* Packet2;
    /* ... */
    int PacketNSize;
    char* PacketN;
};

Therefore, the first field is the total size of the packet, which can contain numerous chunks. Each entry holds the size of the chunk followed by the chunk data. The chunks have the following structure:

struct ChunkData
{
    int ID;
    int ExecutionType;
    int Length;
    char* Data;
};
  • ID: The ID of the module is a unique number that identifies the module to be loaded or executed. We list the modules that our sample received further below
  • ExecutionType: The execution type of the downloaded module. Values range from 1 to 4 and are described under Module Loading
  • Length: the length of the buffer
  • Data: the actual data, which are MZ/PE binaries
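Putting the two structures together, a decrypted response can be walked with the Python sketch below. It inherits the uncertainty described above, since our sample only ever received one module. The field widths (32-bit, little-endian) and the assumption that TotalPacketSize counts the bytes following it are ours.

```python
import struct


def parse_modules(decrypted: bytes):
    """Walk [TotalPacketSize][Size1][Chunk1][Size2][Chunk2]... and decode chunks."""
    (total_size,) = struct.unpack_from("<I", decrypted, 0)
    end = min(4 + total_size, len(decrypted))
    offset = 4
    modules = []
    while offset + 4 <= end:
        (chunk_size,) = struct.unpack_from("<I", decrypted, offset)
        offset += 4
        chunk = decrypted[offset:offset + chunk_size]
        if chunk_size < 12 or len(chunk) != chunk_size:
            raise ValueError("malformed or truncated chunk")
        # ChunkData header: ID, ExecutionType, Length
        module_id, execution_type, length = struct.unpack_from("<III", chunk, 0)
        modules.append({
            "id": module_id,
            "execution_type": execution_type,
            "data": chunk[12:12 + length],
        })
        offset += chunk_size
    return modules
```

Each returned entry can then be written to disk and checked for the MZ header before further analysis.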

Module Loading

The ExecutionType value of each chunk selects one of four loading methods:

  1. Download the file, store it on disk and execute it directly.
  2. Download the file and store it on disk. Check whether the user session running the Emotet instance is the active one, and only then run the executable.
  3. Instead of an executable, the C&C sends a DLL that is loaded by the victim. The victim calls the entry point of the DLL, which is actually DllMain. It should be noted that DllMain has an argument called Reason indicating why the DLL is called. The standard reason values range from 0 to 3 (0: DLL_PROCESS_DETACH, 1: DLL_PROCESS_ATTACH, 2: DLL_THREAD_ATTACH, 3: DLL_THREAD_DETACH). Emotet passes the value 0xA to DllMain, so traditional DLL loading cannot take place. Another argument is passed to DllMain, again in a nonstandard way (in the argument marked as reserved), and is used as a temporary file for storing the output of each DLL.
  4. This is an additional command that is not described in any of the referenced articles. It is a variant of command #1 that performs the same steps and additionally passes a command line argument on execution: the current path of Emotet, encoded with Base64.
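For analysts who want to label observed process launches, the four cases can be summarized in a small Python helper. This is only a descriptive model of the behavior listed above, not recovered code. The function name and parameters are hypothetical, and encoding the path as UTF-8 before Base64 is an assumption, since the sample may well encode a wide-character string instead.

```python
import base64


def plan_module_execution(execution_type, module_path, emotet_path,
                          session_is_active=True):
    """Return the argv a new process would get, or None if none is started."""
    if execution_type == 1:
        return [module_path]                      # run the dropped file directly
    if execution_type == 2:
        # run only when Emotet's user session is the active one
        return [module_path] if session_is_active else None
    if execution_type == 3:
        return None                               # DLL loaded in-process, reason 0xA
    if execution_type == 4:
        # like type 1, plus Emotet's own path as a Base64 argument
        encoded = base64.b64encode(emotet_path.encode("utf-8")).decode("ascii")
        return [module_path, encoded]
    raise ValueError("unknown execution type: %r" % (execution_type,))
```

A Base64 argument on the command line of a freshly dropped executable that decodes to the path of the parent binary is therefore a hint that execution type 4 was used.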

Heaven’s Gate

The picture below is an example of the technique. One can identify Heaven’s Gate by the call $+5 instruction and the retf. IDA cannot interpret the code below the red line correctly, because it is actually 64-bit code while the rest of the code is 32-bit. This is not only an issue for IDA but also for other debuggers. Only WinDbg is capable of dynamically switching from 32-bit code to 64-bit code and vice versa. As such, Heaven’s Gate is an excellent anti-analysis technique.

Figure 16 — Heaven’s Gate
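The two markers mentioned above translate into a simple byte search. The Python sketch below looks for a call with a zero displacement (E8 00 00 00 00, i.e. call $+5) that is followed by a far return (CB) within a few bytes. The window size is an arbitrary choice of ours and the heuristic can produce false positives, so every hit still needs to be checked in a disassembler.

```python
CALL_NEXT = b"\xe8\x00\x00\x00\x00"   # call $+5: pushes the next address
RETF = 0xCB                           # far return: pops EIP and CS


def find_heavens_gate(code: bytes, window: int = 16):
    """Return offsets of call $+5 instructions followed closely by retf."""
    hits = []
    pos = code.find(CALL_NEXT)
    while pos != -1:
        tail = code[pos + 5:pos + 5 + window]
        if RETF in tail:
            hits.append(pos)
        pos = code.find(CALL_NEXT, pos + 1)
    return hits
```

A commonly seen form of the stub is push 0x33; call $+5; add dword ptr [esp], 5; retf, where 0x33 is the 64-bit code segment selector on WoW64, so a push 0x33 (6A 33) right before a hit is a strong confirmation.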

Modules Encryption

Figure 17 — Module decryption loop

Modules used

Figure 18 — Loaded modules overview
  • 0xA51 Updated Version of Emotet, Execution Type 4

This module was extremely interesting for several reasons. First of all, in contrast with the other ones, it is an executable and not a DLL. None of the referenced articles describing recent campaigns mention this behavior. Secondly, it contains multiple layers of encrypted executable code that are unpacked before the actual code runs. The module relies heavily on MFC and ATL function calls, most probably to hide itself.

Figure 18 — MFC & ATL calls
Figure 19 — Loading, Decrypt, Execute Shellcode

Since the module had no interesting strings or imports, we analyzed it dynamically. VirusTotal and APP.ANY.RUN indicated that this is actually Emotet. Because there were no suspicious imports, we suspected an encrypted resource that is dynamically loaded by the module. Using PE-bear we identified a suspicious resource with very high entropy (7.9) and a high file ratio (14.89%):

Figure 20— Suspicious resource
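The entropy figure reported by such tools is the Shannon entropy of the byte distribution, measured in bits per byte, where 8.0 is the maximum. A minimal Python sketch for computing it over an extracted resource follows; the function name is ours.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for constant data, up to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Values close to 8, such as the 7.9 seen here, are typical of encrypted or compressed content, while plain code and text usually score noticeably lower.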

Based on the code below, the module loads the resources and allocates the necessary size to store the decryption of the resource. The decryption is a simple XOR based loop. After the decryption, the code is directly called.

Figure 21 — Load, Decrypt, Execute resource

Interestingly, the decrypted shellcode contained not one but two embedded MZ (EXE) files. The first executable executes the second. Once we loaded the dumped shellcode into IDA Pro and followed it from its entry point, it could clearly be seen that the shellcode dynamically loads and executes the first executable. The first executable has a parsing function that looks for MZ/PE headers and then dynamically loads the second executable, which is the updated Emotet version.

Figure 22 — Parsing routine of second executable
Figure 23 — Offset of two executables
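The same kind of header search can be reproduced offline to locate the embedded executables in the dumped shellcode. The Python sketch below is our own helper, not the routine from the sample. It looks for the MZ magic, reads the e_lfanew field at offset 0x3C of the DOS header and checks that it points to a PE\0\0 signature.

```python
import struct


def find_embedded_pe(blob: bytes):
    """Return offsets of MZ headers whose e_lfanew points at 'PE\\0\\0'."""
    offsets = []
    pos = blob.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(blob):               # full DOS header present
            (e_lfanew,) = struct.unpack_from("<I", blob, pos + 0x3C)
            pe_offset = pos + e_lfanew
            if blob[pe_offset:pe_offset + 4] == b"PE\x00\x00":
                offsets.append(pos)
        pos = blob.find(b"MZ", pos + 1)
    return offsets
```

Carving from each returned offset to the next one (or to the end of the buffer) gives files that can be opened directly in IDA Pro or PE-bear.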

Now it is possible to analyze the core executable. Upon loading it into IDA Pro, it was clearly visible that the binary implements control flow flattening. Its graph overview closely resembled the graph overview of our initial Emotet binary. To confirm our suspicion that this is also an Emotet binary, we ran it in a sandbox and submitted it to VirusTotal. Both indicated that it was indeed Emotet.

Figure 24 — Comparison of Graph Overviews

To reach a concrete result, we used Diaphora (a binary diffing plugin) to confirm whether the binaries are similar. 62 out of 105 functions were identical and 8 were partially matched, which means that 70 out of 105 functions were related to our first sample. Finally, we identified that this sample has the same hardcoded campaign ID 0x1343BE0 (20200416).

Figure 26 — Partially matched functions
Figure 27 — Snippet of matched functions
  • 0xA53 Nirsoft MailPassView, Execution Type 3

This is a well-known tool by Nirsoft that is used for recovering passwords from email clients. The module is protected by XOR encryption. The decrypted module can be identified by the PDB reference.

Figure 28— Mail Pass View PDB
  • 0xA55 Nirsoft WebPassView, Execution Type 3

This is a well-known tool by Nirsoft that is used for recovering passwords from browsers. The module is protected by XOR encryption. The decrypted module can be identified by the PDB reference.

Figure 29— Web Pass View PDB
  • 0xAC7 Network Spreader — Username Enumeration, Execution Type 3

The module is responsible for spreading Emotet via administrative network shares. First, Emotet decrypts a list of passwords (exactly 10,000 in total). This list is used to brute-force the targeted accounts.

Figure 30— Stored Passwords

Then it looks for shared resources by checking whether each resource found (via WNetEnumResource) has a different computer name than the local machine, i.e. whether it is a remote host. After collecting the network resources, Emotet tries to connect to \\IPC$ with a null session in order to get the list of users on the selected endpoint.

Once the list of users is gathered, Emotet attempts a connection to ADMIN$ or C$ (using the passwords mentioned above) in order to gain access and copy the module from the local victim PC to the remote one. If the operation succeeds, there is administrative access to the remote PC. Emotet then accesses the Service Control Manager, creates a new service on the remote system and starts the service.

Figure 31 — Spreading Flow
  • 0xA57 Network Spreader — Username Brute Force, Execution Type 3

This module is the same as the previously mentioned 0xAC7, with the main difference that it does not enumerate the user list via a null session on IPC$. Instead, it uses a hardcoded, encrypted list of usernames, which it brute-forces in order to get administrative access to the remote PC.

Figure 32— Username List
  • 0xAAC UPNP — C&C Proxy, Execution Type 3

This module is actually a UPNP implementation. Initially we assumed that it is responsible for opening ports for additional communication with the C&C server. However, further reading indicated that this module can also be used to turn the victim into a proxy to the core C&C network. This trick adds an additional tier to the botnet, increasing its resiliency. We did not analyze this module further. The module can be easily identified by the following strings.

Figure 33 — UPNP Strings
  • 0xA5C Outlook Email Harvester, Execution Type 3

In our opinion, this email harvester is also a very interesting module, because it is responsible for the large number of victims in the current campaign. The module steals emails from a predefined time period. Usually, spam and phishing emails written in English generate increased suspicion. In contrast, emails in Greek that build on a legitimate email chain and are sent (or appear to be sent) by seemingly trusted (spoofed) persons reduce suspicion tremendously.

The module contains two encrypted DLLs, one being the 32-bit version of the harvester and the other the 64-bit version. In order to use the 64-bit version of the module and inject it into 64-bit processes, the Heaven’s Gate technique is used as previously described. The harvester can be identified by its access to HKLM\Software\Clients\Mail\Microsoft Outlook, where it queries the value DllPathEx to find msmapi32.dll. Additional details can be found in the excellent article by Kryptos Logic.

  • 0xA5B Outlook Contact Harvester, Execution Type 3

An additional protected module was observed, again targeting Outlook, which is used for harvesting the email contact list. The module was not analyzed further. It does, however, access HKLM\Software\Clients\Mail\Microsoft Outlook and query the value DllPathEx to find msmapi32.dll; its functionality is therefore an assumption.

IOCs

  • SHA256: 439ce9da581f53e475d5d74cdb9c487068fde473073f96f1f978b949e3fcfc86

Nirsoft MailPass View

  • SSDeep Hash: 3072:0X940Z4hbL8wCyQOhIhggS/FBqALovD0NSxD7I333333p7vLp8Fbs:0XbZ+ZnKhggSdBqJL0k70v84
  • SHA256: 8cbc23db751d362a76d3b14d7bb60daa15c29e0e885824e5637b17147f2d012b

Nirsoft Web Browser Pass View

  • SSDeep Hash: 6144:ogvurXVCkt/HskgK+SVYzoVJehG9BqLd+lcQIyfE4Qy+eHhKpvQ3JjLQVkIDiUp:ogvuuLK+SVYExqLMIyfuUHLBdIOUp
  • SHA256: daf537aedc95a282ddcb9d01c8cfe7be560e30f4523fea5bbf1468e7eecd4de2

Network Spreader — Username Enumeration

  • SSDeep Hash: 3072:YHFLJF6boQT1fXA6BYwJadO8iseccMVY5MXW:uOboiRXAiENVi
  • SHA256: f8dd847ab1565aa460875c782f44a003a5b2c20b0e76a6672cfe3cd952a38727

Network Spreader — Username Brute Force

  • SSDeep Hash: 768:cRbk3TZDIuDoV0vyD2qscmszWUC9mt/PaWDZiMM2:c5AgV0wAoqUCGPDZtJ
  • SHA256: d714eca8a485b86cfa40dacfdedcd164877bf9d0e0d704ea325718b14498ba02

UPNP — C&C Proxy

  • SSDeep Hash: 6144:rmy8mkTtkzVj20+uoSZfwpkoRkTklFf1mD+ScWNcVjF:rmy8m4wj20+uPZf3oR1lR1mD+FWNyjF
  • SHA256: 6588f6f29061cd324f0ce6edf2bce7343da33bb61e119464c5b54bbb9fe009db

Outlook Mail Harvester

  • SSDeep Hash: 3072:wV3gkWjOBhDwowHLqfkjyn4e2Hkblr7sZ+U2tEzH6UviZeEU3vS4s:2vDNwHLqfkjm4e2er7Q2tEjhaZeEIvS
  • SHA256: cee571c9a8fe2761f8aaae53486ad0df3aa2e4f27dad84860ab8e96abe84ea26

Outlook Contact Stealer

  • SSDeep Hash: 6144:TZ3ZLDE/sA8q4C+POFSXczseQXpztygndzj2TukOn:qYFSezKTX
  • SHA256: ccdcaa285a390e9836868b11e1754edd7d4be85ca7a80c0ec83b34845cbc7db3

Emotet Updated Version

  • SSDeep Hash: 6144:m2GhItcEo6vJE63bvxNsMBUEaBcM5aOXq2pM1v+W++ScHEEPDG+XM3IcGYm0:QycE1vxIWmaOXq2pa+W++GEPDQGYm0
  • SHA256: 450a39df634cc102c3319bf3be1b579b3c1c2d5818d24b603bae72ecdc66f836