Every now and then, some group brings something new to the malware market, and the BabbleLoader developers appear to be doing exactly that: not by discovering new evasion techniques, but by knowing how to combine existing ones to evade detection products that rely on Machine Learning (AI).

This research will cover the following topics:

Threat Intelligence information (up to the time of publication of this research), that is, which Malware is responsible for delivering this Loader, and which family of Malware it is loading into memory;
Analysis of how BabbleLoader implements a certain technique, with the purpose of Evading Endpoint Protection Software with Machine Learning (AI);
Analysis of String Decryption and Hashing Algorithm;
Analysis of Techniques to Evade Endpoint Detection and Response Software Hooks;
Yara Rules for BabbleLoader.

Below is the SHA256 of the sample that will be analyzed in this research.

{
    "SHA256": "a08db4c7b7bacc2bacd1e9a0ac7fbb91306bf83c279582f5ac3570a90e8b0f87"
}
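
Before starting any analysis, it is worth confirming that the file in hand really is this sample. The snippet below is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `is_expected_sample` are mine and the file path in the usage note is hypothetical.

```python
import hashlib

# SHA256 of the sample analyzed in this research (taken from the text above).
EXPECTED_SHA256 = "a08db4c7b7bacc2bacd1e9a0ac7fbb91306bf83c279582f5ac3570a90e8b0f87"

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large samples do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_expected_sample(path):
    """Return True only if the file matches the SHA256 listed above."""
    return sha256_of_file(path) == EXPECTED_SHA256
```

For example, `is_expected_sample("sample.bin")` would return `True` only for the exact file analyzed here.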

First, let’s try to understand who might be handing out BabbleLoader out there!

Threat Intelligence Information – Possible Attributions to Threat Actors

During the intelligence gathering process on the BabbleLoader threat, it was identified that the samples (SHA256 above) were delivered to victims through a C&C infrastructure, which is also used by the operators of Amadey. On Unpac.me, you can see Intelligence sources that indicate URLs where this sample was delivered.

Below, we can see the output of VirusTotal’s analysis of the IP address 185[.]215[.]113[.]117, which allows us to identify the country (Seychelles, located in East Africa) and the Autonomous System, identified by ID 51381 and named 1337team Limited.

Continuing with the analysis of this IP address in VirusTotal, it is possible to observe several samples identified as malicious by VirusTotal, which carry out communications with this same IP address.

When analyzing one of these samples, we identified that it was a sample from Amadey.

Through Unpac.me, it was possible to create a visualization where we can observe the attribution of several samples, containing (almost all) the same Imphash and assigning them the same signature of the Amadey family, having as C&C IP address the same IP address that delivers BabbleLoader. This Pivot view I built is available on Unpac.me.

Through ThreatFox, it is possible to observe that the Autonomous System has been categorized as malicious and that Amadey campaigns attributed to this infrastructure are being monitored.

And finally, through Shodan, we can identify that another IP address that is part of the same Autonomous System 1337TEAM LIMITED, is located in Russia.

Based on the information gathered above, namely that BabbleLoader is delivered through Amadey's infrastructure and that this infrastructure is attributed to that malware family, we can state that BabbleLoader most likely has its origins in Russian Threat Actors.

Reverse Engineering BabbleLoader’s Evasion Capabilities

Starting in this section, we will look at BabbleLoader’s Defense Evasion capabilities.

When we used Capa to collect triage information from the sample, a large number of capabilities matched the Capa rules, producing the image below, in which we can observe the following capabilities:

Use of XOR operations for possible decoding of data, or strings;
Parsing PE files;
Stack Strings, possibly encrypted.

And believe me, the vast majority of the capabilities present in the image but not mentioned above come from the large amount of Junk Code that BabbleLoader contains: meaningless control flows and unused strings whose purpose is to make analysis difficult for researchers and for Endpoint Protection Software based on Machine Learning.

Anti-Analysis Techniques – The Diabolical use of junk code

The big innovation in the development of this sample seems to be the ability of each sample to have partially unique Junk Code blocks, according to Intezer’s post.

This is quite impressive, as it means that protections based on Machine Learning, that is, on learning the patterns of a given threat, can be evaded because the Junk Code changes the pattern from one sample to the next. Below, it is possible to see strings that will never be used, and that (according to the Intezer post) are partially unique for each sample.

Another major impact of this capability is the difficulty researchers have in analyzing the samples. Below, we can see that IDA Freeware was unable to produce pseudocode for the ‘main’ function, identified after prior analysis by IDA.

And even using only the IDA Freeware Disassembler, some nodes are not resolved, making it difficult to understand what is happening, as we can see below.

Binary Ninja also has difficulty analyzing the same functions. However, it is possible to force the analysis and obtain the code through its Disassembler and Decompiler.

To give an idea of how much Junk Code is in the sample, below is the macro view of the main function, which is practically useless: nothing happens most of the time, just a large flow of meaningless operations.

String Decryption, NtDLL Analysis and Manual Collection of API Function Addresses

From the very beginning, BabbleLoader executes a long, looping stream of Junk Code. This stream consists of moving data to memory addresses and performing XOR operations whose result is always zero. Below, you can see an example of this Junk Code flow.

Above we can see a large sequence of MOVs to a specific address, which will never be used, and below we can see the sequence of MOVs followed by an XOR operation in which the result will always be zero. Basically this is the Junk Code pattern present in this BabbleLoader sample.

Below you can see one of the implemented loops, which performs no operations other than the pattern mentioned above.

After this sequence of Junk Codes, BabbleLoader finally begins its true execution, through the two functions highlighted below.

The first function has the following pattern:

Declaration of an array (implemented via Stack String) with encoded bytes;
Decoding of the array bytes through an XOR operation followed by a rotate right (ROR), using the initial XOR key 0x375b879a;
Collection of the Handle of the DLL whose name was decoded above;
Manual PE Parsing.

In the Decompiler below, it is possible to observe the flow mentioned in a summarized manner above.

I made a diagram to make the string decoding algorithm easier to understand. Each byte is XORed with the low byte of the XOR key and then rotated right by that same value, and the key is multiplied by 0x4f on each iteration of the loop. That is, each byte in the encoded array is decoded using a different key.

I implemented this simple algorithm in Python to get the decoded string. Below is my implementation of the algorithm in Python.

def rorb(value, shift, bits=8):
    # Rotate an unsigned `bits`-wide value right by `shift` positions.
    shift %= bits
    return ((value >> shift) | (value << (bits - shift))) & ((1 << bits) - 1)

def str_decryption(encrypted_data, xor_key):
    str_decrypted = []
    for encrypted_byte in encrypted_data:
        key_byte = xor_key & 0xFF
        # XOR with the low byte of the key, then rotate right by that same byte.
        str_decrypted.append(rorb(encrypted_byte ^ key_byte, key_byte, bits=8))
        # The key is multiplied by 0x4F (with 32-bit wraparound) on every iteration.
        xor_key = (xor_key * 0x4F) & 0xFFFFFFFF
    return str_decrypted

encrypted_data_array = [0x23, 0x9b, 0xcb, 0xdd, 0xab, 0x8d, 0x4b, 0x5d, 0x2b, 0x86]
xor_key = 0x375b879a

str_decrypted = str_decryption(encrypted_data_array, xor_key)
# The last decoded byte is the NUL terminator, so strip it before printing.
decrypted_string = ''.join(chr(byte) for byte in str_decrypted).rstrip('\x00')
print("\nString Decrypted:", decrypted_string)

When executed, the script output returned the string ntdll.dll.
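
A quick way to gain confidence in a reimplemented decoder is to write its inverse and check that the two round-trip. The sketch below assumes the scheme described above (XOR with the low byte of the key, rotate by that same byte, key multiplied by 0x4f per byte); the `encode` function is my own addition for testing and is not something recovered from the sample.

```python
def rotr8(value, shift):
    """Rotate an 8-bit value right."""
    shift %= 8
    return ((value >> shift) | (value << (8 - shift))) & 0xFF

def rotl8(value, shift):
    """Rotate an 8-bit value left (inverse of rotr8)."""
    shift %= 8
    return ((value << shift) | (value >> (8 - shift))) & 0xFF

def decode(data, key):
    out = bytearray()
    for byte in data:
        k = key & 0xFF
        out.append(rotr8(byte ^ k, k))    # XOR first, then rotate right
        key = (key * 0x4F) & 0xFFFFFFFF   # per-byte key update
    return bytes(out)

def encode(data, key):
    # Hypothetical inverse used only for testing: rotate left, then XOR.
    out = bytearray()
    for byte in data:
        k = key & 0xFF
        out.append(rotl8(byte, k) ^ k)
        key = (key * 0x4F) & 0xFFFFFFFF
    return bytes(out)

KEY = 0x375B879A
sample = bytes([0x23, 0x9B, 0xCB, 0xDD, 0xAB, 0x8D, 0x4B, 0x5D, 0x2B, 0x86])

decoded = decode(sample, KEY)
print(decoded.rstrip(b"\x00").decode("ascii"))  # ntdll.dll
print(encode(decoded, KEY) == sample)           # True
```

With an encoder at hand, it also becomes easy to generate test vectors for other strings when validating detection logic.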

Now let's move on to the second part of the function. So that you don't have to scroll back up, review it and memorize it, and so that I don't have to repeat the screenshot here, below is the second half of the pseudocode of the function we are currently analyzing (sub_1400017b0). Let's analyze it next.

HMODULE rax_8 = GetModuleHandleA(&lpModuleName);

if (rax_8 == 0)
    return 0;

if (zx.d(rax_8->unused.w) != 0x5a4d)
    return 0;

void* rcx_5 = rax_8 + sx.q(rax_8->__offset(0x3c).d);

if (*rcx_5 != 0x4550)
    return 0;

void* rcx_8 = rax_8 + zx.q(*(rcx_5 + 0x88));

if (rcx_8 == 0)
    return 0;

arg1[4] = rax_8;
arg1[3].d = *(rcx_8 + 0x18);
arg1[1] = rax_8 + zx.q(*(rcx_8 + 0x20));
*arg1 = rax_8 + zx.q(*(rcx_8 + 0x1c));
arg1[2] = rax_8 + zx.q(*(rcx_8 + 0x24));

if (arg1[4] != 0 && arg1[3].d != 0 && arg1[1] != 0 && *arg1 != 0 && arg1[2] != 0)
    return 1;

The second half of the sub_1400017b0 function performs the NtDLL parsing process and stores some information in a specific Struct in memory, which will be used later. First, the function checks for the presence of the DOS Header and the NT Header, manually accessing the _IMAGE_DOS_HEADER and _IMAGE_NT_HEADERS64 structures, in addition to other structures that we will observe in detail. Because of the compilation, disassembly and decompilation process, these structures can get lost, resulting in code that is confusing at first. But just follow the process of adding addresses, as we will do next.

Below we can see the result of accessing the MZ Header and PE Header. The MZ Header is identified by reading the first WORD, 0x5a4d (MZ), at the beginning of the NtDLL image, whose Handle (the base memory address) was obtained through the GetModuleHandleA API. At 0x3c bytes from that base sits the e_lfanew field, which holds the offset of the PE Header; in this case it is 0xe8.

Below we can validate exactly the flow of the pseudocode logic of this second half of the sub_1400017b0 function, where we can observe exactly where the PE Header is located.
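
The same two checks can be reproduced offline. The sketch below is a minimal illustration with the standard `struct` module; the function name `find_pe_header` and the synthetic buffer are my own, and the buffer only contains the fields the checks read, not a real NtDLL image.

```python
import struct

def find_pe_header(image: bytes) -> int:
    """Return the PE header offset if the MZ and PE signatures are present."""
    (e_magic,) = struct.unpack_from("<H", image, 0)         # WORD at offset 0
    if e_magic != 0x5A4D:                                    # "MZ"
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<i", image, 0x3C)      # signed LONG at 0x3c
    (signature,) = struct.unpack_from("<I", image, e_lfanew)
    if signature != 0x4550:                                  # "PE\0\0"
        raise ValueError("missing PE signature")
    return e_lfanew

# Synthetic buffer that mimics the layout described above (PE header at 0xe8).
image = bytearray(0x200)
image[0:2] = b"MZ"
struct.pack_into("<i", image, 0x3C, 0xE8)
image[0xE8:0xEC] = b"PE\x00\x00"

print(hex(find_pe_header(bytes(image))))  # 0xe8
```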

After validating the existence of the MZ and PE headers, the function continues its NtDLL parsing process, this time collecting the VirtualAddress field of the first IMAGE_DATA_DIRECTORY entry, reached through the _IMAGE_OPTIONAL_HEADER64 structure. VirtualAddress is a DWORD holding the relative address (RVA) of the NtDLL Export Table, that is, the list of APIs. This entire process can be observed in the pseudocode through the operation rcx_5 + 0x88, where rcx_5 is the address of the PE header. In terms of offsets, the real operation is 0xe8 + 0x88, which results in 0x170, the exact offset of the VirtualAddress field, represented in the image below by PE-Bear as Export Directory.
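
The constant 0x88 is not arbitrary: it is the sum of the sizes of the structures that sit between the PE signature and the first data directory entry in a 64-bit image. The worked arithmetic below is a small sketch; the constant names are mine, while the sizes come from the documented PE32+ layout.

```python
SIZEOF_PE_SIGNATURE = 4                   # "PE\0\0"
SIZEOF_IMAGE_FILE_HEADER = 20             # IMAGE_FILE_HEADER
OFFSET_DATA_DIRECTORY_OPTIONAL64 = 0x70   # DataDirectory inside IMAGE_OPTIONAL_HEADER64
SIZEOF_IMAGE_DATA_DIRECTORY = 8           # VirtualAddress (DWORD) + Size (DWORD)
IMAGE_DIRECTORY_ENTRY_EXPORT = 0          # the export directory is entry 0

def export_virtual_address_offset(e_lfanew: int) -> int:
    """Offset of DataDirectory[EXPORT].VirtualAddress in a PE32+ image."""
    return (
        e_lfanew
        + SIZEOF_PE_SIGNATURE
        + SIZEOF_IMAGE_FILE_HEADER
        + OFFSET_DATA_DIRECTORY_OPTIONAL64
        + IMAGE_DIRECTORY_ENTRY_EXPORT * SIZEOF_IMAGE_DATA_DIRECTORY
    )

print(hex(export_virtual_address_offset(0) ))    # 0x88
print(hex(export_virtual_address_offset(0xE8)))  # 0x170
```

Note that this only holds for 64-bit images; in a 32-bit optional header the data directory starts at a different offset.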

Upon reaching the NtDLL Export Table, the function will collect some information that will be stored in memory and used later as its own structure. This information is collected from the sequence of calculations present at the end of the function, and illustrated in the following image.

The structure that BabbleLoader assembles contains the NtDLL Handle and the fields describing the functions (APIs) of the NtDLL Export Table. Below is a prototype of the structure.


struct _BabbleLoader_NtDLL_Parse
{
    DWORD** NtDLL_AddressOfFuntions;
    DWORD* NtDLL_AddressOfNames;
    DWORD* NtDLL_AddressOfNamesOrdinals;
    DWORD* NtDLL_NumberOfNames;
    HMODULE* NtDLL_Handler;
};
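
To make the mapping between the pseudocode offsets and this structure concrete, the sketch below reads the same four export directory fields (+0x18, +0x1c, +0x20, +0x24) from a buffer and rebases the three table RVAs. It assumes the buffer represents the module as mapped in memory; the class and function names are hypothetical, and the values in the example are made up.

```python
import struct
from dataclasses import dataclass

@dataclass
class ExportTableView:
    address_of_functions: int      # *arg1   (export directory + 0x1c)
    address_of_names: int          # arg1[1] (export directory + 0x20)
    address_of_name_ordinals: int  # arg1[2] (export directory + 0x24)
    number_of_names: int           # arg1[3] (export directory + 0x18)
    module_base: int               # arg1[4]

def read_export_view(image: bytes, export_dir_offset: int, module_base: int) -> ExportTableView:
    # NumberOfNames, AddressOfFunctions, AddressOfNames and AddressOfNameOrdinals
    # are four consecutive DWORDs starting at offset 0x18 of IMAGE_EXPORT_DIRECTORY.
    number_of_names, funcs_rva, names_rva, ordinals_rva = struct.unpack_from(
        "<4I", image, export_dir_offset + 0x18
    )
    return ExportTableView(
        address_of_functions=module_base + funcs_rva,
        address_of_names=module_base + names_rva,
        address_of_name_ordinals=module_base + ordinals_rva,
        number_of_names=number_of_names,
        module_base=module_base,
    )

# Synthetic export directory placed at offset 0x100 of a fake image.
fake_image = bytearray(0x200)
struct.pack_into("<4I", fake_image, 0x100 + 0x18, 3, 0x1000, 0x2000, 0x3000)
view = read_export_view(bytes(fake_image), 0x100, module_base=0x7FF000000000)
print(view.number_of_names, hex(view.address_of_names))
```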

With all this information, we can restructure the pseudocode so that it more faithfully represents the way the developer implemented this function.

A Custom Hash Algorithm Implementation

Now that we have analyzed and understood the purpose of this function, let's move on to the next function, which receives as an argument the NtDLL structure that BabbleLoader creates with information regarding the NtDLL Export Table.

When we enter the sub_1400019b0 function, we can identify that there are seven calls to the sub_140001080 function, which receives four arguments, the first being the hash of an NtDLL API to resolve and the second a pointer to the previously created structure.

When we enter the sub_140001080 function, we can see that it is long and possibly performs some type of manipulation on structures and APIs manually, similar to what we saw in the analysis of the NtDLL export table collection function.

With the help of the structure we identified and created previously, it is possible to quickly identify that this first part of the sub_140001080 function loops through the entire NtDLL Export Table and checks whether the hash of the current API name, computed by the sub_140001010 function, is equal to the Hash passed as an argument.

When we enter the sub_140001010 function, we can identify that it is a custom hash algorithm.

The Python implementation of this custom hash algorithm is as follows.

def calculate_api_hash(api_name: str) -> str:
    final_hash = 0
    for char in api_name:
        char_orded = ord(char)
        final_hash = (final_hash + char_orded) * (char_orded + 0x4af1e366)
        final_hash &= 0xFFFFFFFF

    return hex(final_hash)

So, at this stage BabbleLoader loops through the entire export table, collects the name of each API, submits it to its custom hash algorithm, and checks whether the resulting hash matches the one it is looking for. I did the same thing through Python scripts. First, I extracted all the APIs from NtDLL and dumped them into a file, using the Python script below.

import pefile

def list_exported_apis(dll_path, output_file):
    try:
        pe = pefile.PE(dll_path)

        if not hasattr(pe, 'DIRECTORY_ENTRY_EXPORT'):
            print("The DLL does not have an export table.")
            return

        with open(output_file, 'w') as f:
            print(f"Exported APIs from DLL '{dll_path}':")

            for export in pe.DIRECTORY_ENTRY_EXPORT.symbols:
                if export.name:
                    # Write one API name per line so the file can be hashed as-is.
                    api_name = export.name.decode('utf-8')
                    f.write(f"{api_name}\n")
                    print(api_name)
                else:
                    # Exports without a name cannot be hashed by name; report only.
                    print(f"Unnamed API (ordinal: {export.ordinal})")

        print(f"\nThe API names have been saved to the file: {output_file}")

    except FileNotFoundError:
        print(f"File '{dll_path}' not found.")
    except pefile.PEFormatError:
        print(f"The file '{dll_path}' is not a valid DLL or is corrupted.")
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    dll_path = r"C:\Windows\System32\ntdll.dll"
    # This file is the input of the hashing script shown next.
    output_file = "ntdll_exports.txt"
    list_exported_apis(dll_path, output_file)

After that, I created another Python script that reads each API from the file, submits it to the hashing algorithm I implemented in Python, and writes all the results to a single file.

import chardet

def calculate_api_hash(api_name: str) -> str:
    final_hash = 0
    for char in api_name:
        char_orded = ord(char)
        final_hash = (final_hash + char_orded) * (char_orded + 0x4af1e366)
        final_hash &= 0xFFFFFFFF

    return hex(final_hash)

def process_api_list_hashing(input_file: str, output_file: str) -> None:
    try:
        with open(input_file, 'rb') as infile:
            raw_data = infile.read()
            detected = chardet.detect(raw_data)
            encoding = detected['encoding']

        if not encoding:
            raise ValueError("Could not detect the file encoding.")

        with open(input_file, 'r', encoding=encoding) as infile:
            api_list = [line.strip() for line in infile if line.strip()]

        results = [f"'{api}': {calculate_api_hash(api)}" for api in api_list]

        with open(output_file, 'w', encoding='utf-8') as outfile:
            outfile.write('\n'.join(results) + '\n')

        print(f"Hashes calculated and saved to: {output_file}")

    except FileNotFoundError:
        print(f"Error: File {input_file} not found.")
    except Exception as e:
        print(f"Unexpected error: {e}")

if __name__ == "__main__":
    input_file = "C:\\Users\\0x0d4y\\Desktop\\ntdll_exports.txt"
    output_file = "C:\\Users\\0x0d4y\\Desktop\\api_hashes.txt"

    process_api_list_hashing(input_file, output_file)

Below is the initial piece of the created file, containing the 'API_Name': Hash.

Then I copied one of the hashes passed as arguments in the sub_1400019b0 function, searched for it in the file, and identified that it refers to the NtCreateSection API.
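
The manual search can also be automated with a reverse lookup table keyed by hash. The sketch below reuses the hashing logic shown earlier but returns integers; the names `babble_hash`, `build_lookup` and `resolve`, as well as the short list of API names, are illustrative assumptions (in practice the list would be the full set of NtDLL exports).

```python
def babble_hash(name: str) -> int:
    """Same arithmetic as the hashing script above, returned as an integer."""
    value = 0
    for char in name:
        code = ord(char)
        value = ((value + code) * (code + 0x4AF1E366)) & 0xFFFFFFFF
    return value

def build_lookup(api_names):
    """Map hash -> API name. On a collision the last name wins."""
    return {babble_hash(name): name for name in api_names}

def resolve(hash_value: int, lookup) -> str:
    return lookup.get(hash_value, "<unknown>")

# Illustrative subset; replace with the names dumped from ntdll.dll.
names = ["NtCreateSection", "NtClose", "RtlAllocateHeap"]
lookup = build_lookup(names)

print(resolve(babble_hash("NtClose"), lookup))  # NtClose
print(resolve(0xDEADBEEF, lookup))
```

A dictionary makes each lookup constant time, which is convenient when a sample references many hashes.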

So with this process done, the hashes that BabbleLoader resolves at runtime and collects manually are as follows.

{
    "0x1abec790": "NtCreateSection",
    "0x993c0058": "NtMapViewOfSection",
    "0x92263458": "NtUnmapViewOfSection",
    "0x9da1d253": "NtClose",
    "0x6af3f390": "NtQuerySystemInformation",
    "0xa96ab0e4": "RtlAllocateHeap",
    "0x8a21a480": "RtlFreeHeap"
}

After this discovery, I sent a Pull Request to HashDB, and now this hashing algorithm is part of their database, being available for the HashDB Plugins for Binary Ninja, IDA and Ghidra.

Evasion of Endpoint Detection and Response Software Through Halo's Gate

After finding the API that matches a given hash, the sub_140001080 function starts a long checking process whose pseudocode does not fit in a single screenshot. Therefore, we will analyze it in parts below.

First, it is important to note how the arg3 variable is used as a custom structure in which information is collected and stored. For example, in the code before the hash algorithm function call, it stores the Hash that will be tested in position arg3[1], and the address of the API function (rax_15) at arg3 + 8. In other words, the hash is the second field, a DWORD at offset 4, and the address of the API function is the next field, a pointer at offset 8.

arg3[1] = api_hash

for (int64_t i = 0; i u< zx.q(ntdll_module->NtDLL_NumberOfNames.d); i += 1)
    void* api_name =
        zx.q(ntdll_module->NtDLL_AddressOfNames[i]) + ntdll_module->NtDLL_Handler
    void* rax_15 = zx.q(*(ntdll_module->NtDLL_AddressOfFuntions
        + (zx.q(*(ntdll_module->NtDLL_AddressOfNamesOrdinals + (i << 1))) << 2)))
        + ntdll_module->NtDLL_Handler
    *(arg3 + 8) = rax_15

After executing the hash algorithm function, if the fourth argument is different from 0, the code checks to see if these two positions in the structure have content.

if (babbleloader_hashing_algorithm(api_name) == api_hash)
    if (arg4 != 0)
        if (*(arg3 + 8) != 0 && zx.q(arg3[1]) != 0)
            return 1

Following the flow, the next piece of code may seem confusing, with lots of calculations and hexadecimal numbers, but it is the implementation of Halo's Gate, with the goal of evading EDRs and other types of Endpoint Protection Software.

if (zx.d(*rax_15) != 0x4c || zx.d(*(rax_15 + 1)) != 0x8b
        || zx.d(*(rax_15 + 2)) != 0xd1 || zx.d(*(rax_15 + 3)) != 0xb8 || zx.d(*(rax_15 + 6)) != 0 || zx.d(*(rax_15 + 7)) != 0)
    if (zx.d(*rax_15) == 0xe9)
        int16_t var_48_1 = 1
        
        while (zx.d(var_48_1) s<= 0x1f4)
            if (zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0x20))) == 0x4c && zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0x20) + 1))
                    == 0x8b && zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0x20) + 2)) == 0xd1 && zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0x20) + 3))
                    == 0xb8 && zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0x20) + 6)) == 0 && zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0x20) + 7)) == 0)
                *arg3 = zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0x20) + 5)) << 8 | (zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0x20) + 4)) - zx.d(var_48_1))
                break
            
            if (zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0xffffffe0))) == 0x4c &&  zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0xffffffe0) + 1)) == 0x8b
                    && zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0xffffffe0) + 2)) == 0xd1 && zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0xffffffe0) + 3)) == 0xb8
                    && zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0xffffffe0) + 6)) == 0 && zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0xffffffe0) + 7)) == 0)
                *arg3 = zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0xffffffe0) + 5)) << 8 | (zx.d(*(rax_15 + sx.q(zx.d(var_48_1) * 0xffffffe0) + 4))
                    + zx.d(var_48_1))
                break
            var_48_1 += 1
    
    if (zx.d(*(rax_15 + 3)) == 0xe9)
        int16_t var_44_1 = 1
        
        while (zx.d(var_44_1) s<= 0x1f4)
            if (zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0x20))) == 0x4c && zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0x20) + 1)) == 0x8b
                    && zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0x20) + 2)) == 0xd1 && zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0x20) + 3)) == 0xb8
                    && zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0x20) + 6)) == 0 && zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0x20) + 7)) == 0)
                *arg3 = zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0x20) + 5)) << 8 | (zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0x20) + 4)) - zx.d(var_44_1))
                break
            
            if (zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0xffffffe0))) == 0x4c && zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0xffffffe0) + 1)) == 0x8b
                    && zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0xffffffe0) + 2)) == 0xd1 && zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0xffffffe0) + 3)) == 0xb8
                    && zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0xffffffe0) + 6)) == 0 && zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0xffffffe0) + 7)) == 0)
                *arg3 = zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0xffffffe0) + 5)) << 8 | ( zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0xffffffe0) + 4))
                    + zx.d(var_44_1))
                break
            var_44_1 += 1
else
    *arg3 = zx.d(*(rax_15 + 5)) << 8 | zx.d(*(rax_15 + 4))
break

I won't go into detail about how Halo's Gate works, as there are excellent and comprehensive materials online that have already done this work, such as Alice Climent-Pommeret's. I will just give a basic overview, enough to show that this is in fact an implementation of Halo's Gate.

Halo's Gate is an evolution of the Hell's Gate technique. Both aim to identify a Syscall Stub that is not Hooked by checking for the standard opcode bytes of the stub, which are:

0x4c 0x8b 0xd1 0xb8 <SSN low byte> <SSN high byte> 0x00 0x00
// In other words:
mov r10, rcx        // 0x4c 0x8b 0xd1
mov eax, SyscallID  // 0xb8 <SSN low byte> <SSN high byte> 0x00 0x00

And this check is exactly what we see in the previous pseudocode, where there is a large loop that checks for the existence of these bytes in this position. Why? Because if they are not exactly in the position indicated in the pseudocode, and in their place there is 0xe9 (opcode jmp, that is, an unconditional jump), it means that the function is Hooked.
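
The byte comparison described above can be expressed compactly. The sketch below is a simplified illustration over a `bytes` object rather than live process memory; the function names are mine and the example SSN value 0x4a is arbitrary (real values vary between Windows builds).

```python
CLEAN_PROLOGUE = b"\x4c\x8b\xd1\xb8"  # mov r10, rcx ; mov eax, imm32

def read_ssn(stub: bytes):
    """Return the syscall number if the stub looks untouched, else None."""
    if len(stub) < 8:
        return None
    # Bytes 0..3 must be the standard prologue and bytes 6..7 must be zero,
    # mirroring the checks at offsets 0, 1, 2, 3, 6 and 7 in the pseudocode.
    if stub[:4] != CLEAN_PROLOGUE or stub[6] != 0 or stub[7] != 0:
        return None
    return (stub[5] << 8) | stub[4]

def looks_hooked(stub: bytes) -> bool:
    """A jmp (0xe9) at offset 0 or 3 is what the loader treats as a hook."""
    return len(stub) >= 4 and (stub[0] == 0xE9 or stub[3] == 0xE9)

clean_stub = bytes.fromhex("4c 8b d1 b8 4a 00 00 00")
hooked_stub = bytes.fromhex("e9 11 22 33 44 00 00 00")

print(read_ssn(clean_stub), looks_hooked(clean_stub))    # 74 False
print(read_ssn(hooked_stub), looks_hooked(hooked_stub))  # None True
```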

What Halo's Gate does, unlike Hell's Gate, is implement an algorithm that checks the Syscall IDs (System Service Numbers - SSN) of APIs that are not Hooked in the neighborhood of the target API. Why? Because the Syscall IDs are assigned in order, identifying the Syscall ID of a neighboring non-Hooked stub makes it possible to calculate the Syscall ID of the target API and, therefore, execute it without falling into the unconditional jump (0xe9) placed by the EDR. We were able to identify this in the previous snippet of pseudocode.

Below we can see a practical example, where we can see the incremental order of the Syscalls.

That is, because the Syscalls are ordered, the Halo's Gate algorithm can search for Syscalls with an intact Stub above and below the Hooked Syscall.
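
The neighbor search can be simulated on a synthetic buffer. The sketch below is a simplified model, not the loader's exact code: it assumes stubs are 0x20 bytes apart and bounds the search at 0x1f4 neighbors, as in the pseudocode, but it adjusts the full SSN by the distance instead of only the low byte. All function names and the fake stubs are my own.

```python
STUB_SIZE = 0x20       # distance between neighboring stubs in the pseudocode
MAX_NEIGHBORS = 0x1F4  # upper bound of the search loop in the pseudocode
CLEAN_PROLOGUE = b"\x4c\x8b\xd1\xb8"

def read_ssn(memory, offset):
    """SSN of the stub at `offset`, or None if it is hooked or out of range."""
    if offset < 0 or offset + 8 > len(memory):
        return None
    stub = bytes(memory[offset:offset + 8])
    if stub[:4] != CLEAN_PROLOGUE or stub[6] != 0 or stub[7] != 0:
        return None
    return (stub[5] << 8) | stub[4]

def halos_gate_ssn(memory, target):
    direct = read_ssn(memory, target)
    if direct is not None:
        return direct                      # stub is intact, read it directly
    if memory[target] != 0xE9 and memory[target + 3] != 0xE9:
        return None                        # not the hook pattern the loader expects
    for distance in range(1, MAX_NEIGHBORS + 1):
        following = read_ssn(memory, target + distance * STUB_SIZE)
        if following is not None:
            return following - distance    # neighbor at a higher address
        preceding = read_ssn(memory, target - distance * STUB_SIZE)
        if preceding is not None:
            return preceding + distance    # neighbor at a lower address
    return None

def make_stub(ssn):
    # Fake stub: prologue + SSN, padded with NOPs (real stubs contain more code).
    return (CLEAN_PROLOGUE + ssn.to_bytes(4, "little")).ljust(STUB_SIZE, b"\x90")

memory = bytearray(b"".join(make_stub(0x10 + i) for i in range(5)))
memory[2 * STUB_SIZE:2 * STUB_SIZE + 5] = b"\xe9\x00\x10\x00\x00"  # simulate a hook

print(hex(halos_gate_ssn(memory, 2 * STUB_SIZE)))  # 0x12
```

Here the third stub is hooked, its following neighbor reports 0x13, and subtracting the distance of 1 recovers the original 0x12.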

The entire loop implemented by the Halo's Gate algorithm can be illustrated as follows.

It is also interesting to note that arg3 is again used here to store the Syscall IDs. However, the decompiler does not show the write as going to any named position in the structure, which leads us to conclude that the Syscall ID is stored at position arg3[0].

*arg3 = zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0x20) + 5)) << 8 | (zx.d(*(rax_15 + sx.q(zx.d(var_44_1) * 0x20) + 4)) - zx.d(var_44_1))

After that, the function scans forward from the API address plus 0xff, looking for the bytes 0x0f 0x05 (the syscall instruction), and stores the address where they are found in a new position in the structure, arg3 + 0x10. It ends by checking that every field of the structure is filled and not zero.

if (*(arg3 + 8) == 0)
    return 0
int64_t rax_191 = *(arg3 + 8) + 0xff
int32_t i_1 = 0
int32_t var_28_1 = 1

while (i_1 u<= 0x1f4)
    if (zx.d(*(rax_191 + zx.q(i_1))) == 0xf && zx.d(*(rax_191 + zx.q(var_28_1))) == 5)
        *(arg3 + 0x10) = rax_191 + zx.q(i_1)
        break
    i_1 += 1
    var_28_1 += 1

if (zx.q(*arg3) != 0 && *(arg3 + 8) != 0 && zx.q(arg3[1]) != 0 && *(arg3 + 0x10) != 0)
    return 1
return 0
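
The scan performed by this loop can be sketched in Python as a search for the two bytes 0x0f 0x05 (the encoding of the syscall instruction) starting 0xff bytes past the API address. This is a simplified model over a buffer; the function name, parameters and the synthetic memory are assumptions made for illustration.

```python
def find_syscall_instruction(memory, api_offset, start_skip=0xFF, max_scan=0x1F4):
    """Return the offset of the first 0x0f 0x05 pair in the scanned window, or None."""
    base = api_offset + start_skip
    for i in range(max_scan + 1):          # the loop runs while i <= 0x1f4
        position = base + i
        if position + 1 >= len(memory):
            break                          # stay inside the buffer
        if memory[position] == 0x0F and memory[position + 1] == 0x05:
            return position
    return None

# Synthetic memory: one syscall instruction before the window, one inside it.
memory = bytearray(0x400)
memory[0x30:0x32] = b"\x0f\x05"    # too close to the API address, skipped
memory[0x150:0x152] = b"\x0f\x05"  # first match inside the scanned window

print(hex(find_syscall_instruction(memory, api_offset=0x20)))  # 0x150
```

Because the search starts well past the target stub, the instruction it finds belongs to a different function than the one being resolved.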

Dynamically through x64dbg, I identified that the last position is occupied by the address of the ZwResumeThread Syscall. Below is how the structure is stacked in memory.

In other words, the structure created to store this information is as follows:

struct _BabbleLoader_Table_Entry_SyscallID
{
    DWORD API_Syscall_ID;
    DWORD API_Hash;
    PVOID API_Address;
    DWORD NtResumeThread_Syscall_ID;
};
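
The offsets used in the pseudocode (arg3, arg3[1], arg3 + 8 and arg3 + 0x10) can be checked against this layout with `ctypes`. In the sketch below the last field is modeled as 8 bytes wide because the pseudocode writes an address to arg3 + 0x10; that width, like the Python field names, is my assumption rather than something stated by the prototype above.

```python
import ctypes

class TableEntry(ctypes.Structure):
    _fields_ = [
        ("api_syscall_id", ctypes.c_uint32),               # *arg3
        ("api_hash", ctypes.c_uint32),                     # arg3[1]
        ("api_address", ctypes.c_uint64),                  # *(arg3 + 8)
        ("syscall_instruction_address", ctypes.c_uint64),  # *(arg3 + 0x10), assumed 8 bytes
    ]

# Print each field with the offset ctypes computed for it.
for field_name, _ in TableEntry._fields_:
    print(f"{field_name}: +{getattr(TableEntry, field_name).offset:#x}")
print("size:", hex(ctypes.sizeof(TableEntry)))
```

The computed offsets (0x0, 0x4, 0x8, 0x10) line up with the positions the function reads and writes.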

And finally, below is all the restructured pseudocode, with all the information we were able to acquire.

babbleloader_custom_halos_gate(int32_t api_hash, BabbleLoader_NtDLL_Parse* ntdll_module_structure, 
  PBabbleLoader_Table_Entry_SyscallID bloader_table, int32_t flag_zero_one)

// This function has a custom Halo's Gate implementation

    if (ntdll_module_structure->NtDLL_Handler == 0)
        return 0
    
    if (zx.q(api_hash) == 0)
        return 0
    
    bloader_table->API_Hash = api_hash
    
    for (int64_t counter_exports = 0; 
            counter_exports u< zx.q(ntdll_module_structure->NtDLL_NumberOfNames.d); counter_exports += 1)
        void* ntdll_addr_apis_names = zx.q(ntdll_module_structure->NtDLL_AddressOfNames[counter_exports]) + ntdll_module_structure->NtDLL_Handler
        void* api_addr = zx.q(*(ntdll_module_structure->NtDLL_AddressOfFuntions + (zx.q(*(ntdll_module_structure->NtDLL_AddressOfNamesOrdinals
            + (counter_exports << 1))) << 2))) + ntdll_module_structure->NtDLL_Handler
        bloader_table->API_Address = api_addr
        
        if (babbleloader_hashing_algorithm(ntdll_addr_apis_names) == api_hash)
            if (flag_zero_one != 0)
                if (bloader_table->API_Address != 0 && zx.q(bloader_table->API_Hash) != 0)
                    return 1
                
                return 0
            
            // Below, checks for the presence of the Syscall Stub
            // 0x4c 0x8b 0xd1
            // 0xb8 eax syscall_id 0x00 0x00

            // mov r10, rcx
            // mov eax, SyscallNumber


            if (zx.d(*api_addr) != 0x4c || zx.d(*(api_addr + 1)) != 0x8b || zx.d(*(api_addr + 2)) != 0xd1 || zx.d(*(api_addr + 3)) != 0xb8
                    || zx.d(*(api_addr + 6)) != 0 || zx.d(*(api_addr + 7)) != 0)
                
                 // If it identifies that the Syscall Stub is
                 // Hooked, it starts looking for Syscall Stubs from
                 // neighbors that are not Hooked.

                if (zx.d(*api_addr) == 0xe9)
                    int16_t idx_id_syscall_UP = 1
                    
                    while (zx.d(idx_id_syscall_UP) s<= 0x1f4)
                        if (zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0x20)))
                                == 0x4c && zx.d(
                                *(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0x20) + 1)) == 0x8b
                                && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0x20) + 2)) == 0xd1
                                && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0x20) + 3)) == 0xb8
                                && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0x20) + 6)) == 0
                                && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0x20) + 7)) == 0)
                            
                            // Collect High or Low Syscall ID from UP neighbors

                            *bloader_table = zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0x20) + 5)) << 8 | (zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0x20) + 4)) - zx.d(idx_id_syscall_UP))
                            break
                        
                        if (zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0xffffffe0))) == 0x4c && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0xffffffe0) + 1)) == 0x8b
                                && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0xffffffe0) + 2)) == 0xd1 && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0xffffffe0) + 3)) == 0xb8
                                && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0xffffffe0) + 6)) == 0 && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0xffffffe0) + 7)) == 0)
                        	
                            // Collect High or Low Syscall ID from UP neighbors

                            *bloader_table = zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0xffffffe0) + 5)) << 8 | (zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_UP) * 0xffffffe0) + 4)) + zx.d(idx_id_syscall_UP))
                            break
                        
                        idx_id_syscall_UP += 1
                
                if (zx.d(*(api_addr + 3)) == 0xe9)
                    int16_t idx_id_syscall_DOWN = 1
                    
                    while (zx.d(idx_id_syscall_DOWN) s<= 0x1f4)
                        if (zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0x20))) == 0x4c && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0x20) + 1)) == 0x8b
                                && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0x20) + 2)) == 0xd1 && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0x20) + 3)) == 0xb8
                                && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0x20) + 6)) == 0 && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0x20) + 7)) == 0)
                            
                            // Collect High or Low Syscall ID from DOWN neighbors

                            *bloader_table = zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0x20) + 5)) << 8 | (zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0x20) + 4)) - zx.d(idx_id_syscall_DOWN))
                            break
                        
                        if (zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0xffffffe0))) == 0x4c && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0xffffffe0) + 1)) == 0x8b
                                && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0xffffffe0) + 2)) == 0xd1 && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0xffffffe0) + 3)) == 0xb8
                                && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0xffffffe0) + 6)) == 0 && zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0xffffffe0) + 7)) == 0)
                            
                            // Collect High or Low Syscall ID from DOWN neighbors

                            *bloader_table = zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0xffffffe0) + 5)) << 8 | (zx.d(*(api_addr + sx.q(zx.d(idx_id_syscall_DOWN) * 0xffffffe0) + 4)) + zx.d(idx_id_syscall_DOWN))
                            break
                        
                        idx_id_syscall_DOWN += 1
            else
                *bloader_table = zx.d(*(api_addr + 5)) << 8 | zx.d(*(api_addr + 4))
            
            break
    
    if (bloader_table->API_Address == 0)
        return 0
    
    int64_t rax_190 = bloader_table->API_Address + 0xff
    int32_t counter_I = 0
    int32_t counter_II = 1
    
    while (counter_I u<= 0x1f4)
        if (zx.d(*(rax_190 + zx.q(counter_I))) == 0xf && zx.d(*(rax_190 + zx.q(counter_II))) == 5)
            bloader_table->NtResumeThread_Syscall_ID.q = rax_190 + zx.q(counter_I)
            break
        
        counter_I += 1
        counter_II += 1
    
    if (zx.q(bloader_table->API_Syscall_ID.d) != 0 && bloader_table->API_Address != 0 && zx.q(bloader_table->API_Hash) != 0 && bloader_table->NtResumeThread_Syscall_ID.q != 0)
        return 1
    
    return 0

This way, BabbleLoader bypasses the most common method of dynamic EDR inspection, user-mode Hooks, because it never follows the jumps they install.

Syscall Offset Collection and Direct Syscall Execution

Still with the goal of evading defenses, BabbleLoader implements the direct execution of Syscalls, executing them simply by jumping to the Syscall's Offset. To do this, BabbleLoader implements two functions.

The first one collects the Offset stored in the fourth field of the structure it created (which we discussed earlier).

And the other function simply jumps to execute the Syscall.

Below, we can observe in practice that the jump of the second function takes the BabbleLoader flow directly to the NtTerminateThread Syscall.

This way, BabbleLoader can execute certain Syscalls without the need to call a low-level API.

YARA Rule for BabbleLoader

For the Yara rule below, I selected the custom algorithms that may be unique to this family, in addition to the evasion technique algorithms that BabbleLoader implements.

rule babbleloader_112024 {
  meta:
      author = "0x0d4y"
      description = "This rule detects intrinsic patterns of BabbleLoader."
      date = "2025-01-27"
      score = 100
      reference = "https://0x0d4y.blog/babbleloader-technical-malware-analysis/"
      yarahub_reference_md5 = "fa3d03c319a7597712eeff1338dabf92"
      yarahub_uuid = "b2f18ab3-b4df-4e2f-aa23-de8694beb221"
      yarahub_license = "CC BY 4.0"
      yarahub_rule_matching_tlp = "TLP:WHITE"
      yarahub_rule_sharing_tlp = "TLP:WHITE"
    strings:
    $str_decryption_algorithm = { 48 63 44 24 ?? 48 8b 4c 24 ?? 0f b6 04 ?? 33 44 ?? ?? 0f b6 4c ?? ?? d2 c8 48 63 4c ?? ?? 48 8b 54 ?? ?? 88 04 0a 6b 44 24 ?? ?? 89 44 ?? ?? 8b 44 24 ?? ff c0 89 44 24 }
    $hashing_algorithm = { 48 8b 44 24 ?? 0f be ?? 89 44 24 ?? 8b 44 24 ?? 89 44 24 ?? 48 8b 44 24 ?? 48 ff c0 48 89 44 24 ?? 83 7c 24 08 ?? ?? ?? 8b 44 24 ?? 8b 0c ?? 03 c8 8b c1 89 04 24 8b 44 24 ?? 05 ?? ?? ?? ?? 8b 0c 24 0f af c8 8b c1 89 04 }
    $halos_gate = { 48 8b 44 24 ?? 0f b6 ?? 83 f8 4c 0f ?? ?? ?? ?? ?? 48 8b 44 ?? ?? 0f b6 ?? ?? 3d 8b ?? ?? ?? 75 ?? 48 8b 44 ?? ?? 0f b6 40 ?? 3d d1 ?? ?? ?? 75 ?? 48 8b 44 ?? ?? 0f b6 40 ?? 3d b8 ?? ?? ?? 75 ?? 48 8b 44 ?? ?? 0f b6 40 ?? 85 c0 75 ?? 48 8b 44 ?? ?? 0f b6 40 ?? 85 c0 75 ?? 48 8b 44 ?? ?? 0f b6 40 ?? 88 44 ?? ?? 48 8b 44 24 ?? 0f b6 40 ?? 88 44 ?? ?? 0f b6 44 ?? ?? c1 e0 08 0f b6 4c ?? ?? 0b c1 48 8b 8c ?? ?? ?? ?? ?? 89 01 ?? ?? ?? ?? ?? 48 8b 44 ?? ?? 0f b6 00 3d e9 }
    $get_syscall_offset = { 4d 33 db 4c 8b d9 c3 }
    $jump_syscall_offset = { 4c 8b d1 41 8b 03 41 ff 63 ?? }
    condition:
        uint16(0) == 0x5a4d and
        $str_decryption_algorithm and $hashing_algorithm and (1 of ($halos_gate, $get_syscall_offset, $jump_syscall_offset))
}

This and other Yara rules are available on my Github.

With this detection rule, it was possible to detect three more samples, through the Yara Hunt feature of Unpac.me. Here you can access the Shared Yara Hunt.
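
When a YARA engine is not at hand, the two shortest patterns of the rule ($get_syscall_offset and $jump_syscall_offset) can be searched with the Python standard library as a quick triage step. The sketch below is not a replacement for the rule: it ignores the other three strings and the rule's condition, and the function name `triage` and the test buffer are hypothetical.

```python
import re

# $get_syscall_offset = { 4d 33 db 4c 8b d9 c3 }
GET_SYSCALL_OFFSET = re.compile(re.escape(bytes.fromhex("4d33db4c8bd9c3")))

# $jump_syscall_offset = { 4c 8b d1 41 8b 03 41 ff 63 ?? }
# The trailing wildcard byte becomes "." and DOTALL lets it match any value.
JUMP_SYSCALL_OFFSET = re.compile(
    re.escape(bytes.fromhex("4c8bd1418b0341ff63")) + b".", re.DOTALL
)

def triage(data: bytes) -> dict:
    """Return the offsets at which each of the two short patterns occurs."""
    return {
        "get_syscall_offset": [m.start() for m in GET_SYSCALL_OFFSET.finditer(data)],
        "jump_syscall_offset": [m.start() for m in JUMP_SYSCALL_OFFSET.finditer(data)],
    }

# Synthetic buffer containing both patterns, separated by NOP padding.
buffer = (
    b"\x90" * 4
    + bytes.fromhex("4d33db4c8bd9c3")
    + b"\x90"
    + bytes.fromhex("4c8bd1418b0341ff630a")
)
print(triage(buffer))  # {'get_syscall_offset': [4], 'jump_syscall_offset': [12]}
```

Since these two byte sequences are short, hits on their own are weak evidence and are best confirmed with the full rule.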

Conclusion

I hope you enjoyed reading this and that I have contributed in some way to your journey! Until next time.

References

I would not have been able to do this research without standing on the shoulders of giants.

[BabbleLoader] A Deep Dive into EDR and Machine Learning-Based Endpoint Protection Evasion