October 6, 2016 (last edited March 27, 2020) · 5 minute read

XORing Strings in the Twenty-First Century

Statically encoding strings with runtime decryption

When attempting to remain out of sight from string-reliant detection systems, whether it be anti-cheat engines or even just the novice reverse engineer, XORing is a commonly used technique. This is by no means a new-fangled technique, and it truly only serves to defeat naïve static analysis: not only can the ciphertext be trivially decoded by hand, but the plaintext sits as clear as day in memory after first use. Rather, this is a way to escape signature scans of, e.g., .rodata.

Such techniques have been employed for quite a while, mainly in unsavory programs such as malware and game cheats. Traditionally, this involved a pseudo-preprocessor build step that would replace wrapped strings with encoded equivalents and include a runtime decoding function. With the advent of C++11 and modern template intricacies, however, it is now possible to carry out this encoding process at compile-time without any external tooling.

First, we can generate ourselves a single byte key to XOR against. Here, it would be best to rely on a variable, somewhat random source. I have encountered implementations that seed using bytes of the __TIME__ preprocessor macro, but for a little more work we can do a bit better. Using CMake, we are able to generate a string of two random hexadecimal digits and feed it into the preprocessor. We now have, in our header file,

const uint8_t xor_key = _XOR_KEY_;

And in CMakeLists.txt,

string(RANDOM LENGTH 2 ALPHABET 0123456789ABCDEF XOR_KEY)
add_definitions(-D_XOR_KEY_=0x${XOR_KEY})
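For comparison, the __TIME__-seeded variant mentioned above might look roughly like the following. This is only a sketch: the helper names digit and key_from_time, and the choice to fold the time into "seconds since midnight modulo 256", are assumptions made for illustration and are not taken from any particular implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert one ASCII digit to its numeric value.
constexpr int digit(char c) { return c - '0'; }

// Hypothetical helper: fold an "hh:mm:ss" string (the format __TIME__
// expands to) into a single byte by taking seconds-since-midnight mod 256.
// Written as a single return statement so it is also valid C++11 constexpr.
constexpr std::uint8_t key_from_time(const char (&t)[9]) {
    return static_cast<std::uint8_t>(
        ((digit(t[0]) * 10 + digit(t[1])) * 3600 +
         (digit(t[3]) * 10 + digit(t[4])) * 60 +
         (digit(t[6]) * 10 + digit(t[7]))) % 256);
}

// 12:34:56 is 45296 seconds after midnight, and 45296 % 256 == 240 (0xF0).
static_assert(key_from_time("12:34:56") == 0xF0, "unexpected key");

// The key for this translation unit, fixed when the file is compiled.
constexpr std::uint8_t time_xor_key = key_from_time(__TIME__);
```

A key derived this way only changes when the translation unit is recompiled, and it can be guessed from the build time, which is part of why a build-system-supplied random value is the nicer option.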

Regardless of how the key is chosen, we must craft a container for our encoded data. In order to iterate over a string and apply a XOR operation to each character at compile-time, we need some sort of incrementing index. This can be achieved by emulating a loop with C++11 parameter packs: for a null-terminated string occupying n+1 bytes, we pass n integer template parameters, starting at 0 and incrementing to n-1. Prior to C++14, this had to be done in an even more ridiculous manner, but the index_sequence template eases our pain. We can use make_index_sequence to produce a sequence matching the length of our string.
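To see the "loop" in isolation, here is a minimal sketch that uses an index pack to copy characters into an array at compile time. The function copy_chars is a hypothetical helper made up for this illustration. The expansion s[Is]... stands in for iterating i from 0 to n-1.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: copy the first sizeof...(Is) characters of s.
// The pack expansion s[Is]... becomes s[0], s[1], ..., s[n-1].
template <std::size_t... Is>
constexpr std::array<char, sizeof...(Is)>
copy_chars(const char* s, std::index_sequence<Is...>) {
    return {{ s[Is]... }};
}

// make_index_sequence<3> yields index_sequence<0, 1, 2>.
constexpr auto abc = copy_chars("abc", std::make_index_sequence<3>{});

static_assert(abc.size() == 3, "one element per index");
static_assert(abc[0] == 'a' && abc[2] == 'c', "characters copied in order");
```

Swapping s[Is] for an expression that also mixes in a key is all the encoding step needs.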

template <typename Is>
class xor_string;

template <size_t... Is>
class xor_string<std::index_sequence<Is...>> {

For efficiency, we can store the ciphertext directly in a field and perform the decoding process right in the very same memory. Not only does this speed up future access attempts, but no excess memory is allocated either.

    bool decrypted = false;
    char ciphertext[sizeof...(Is)+1];

The class constructor is the magic responsible for actually XORing the string at compilation time. This is once again made possible by a parameter pack, this time expanded over the characters of the string. Together with list initialization, ciphertext can be populated with the “encrypted” bytes as if done at runtime inside of an explicit loop. Static initialization only introduces a minor binary overhead compared to traditional string literal storage.

public:
    constexpr xor_string(char const * const str) noexcept
        : ciphertext{ static_cast<char>(str[Is] ^ (xor_key + Is))... } {}

Finally, we have the meat of the approach: the decoding/decryption process. There’s nothing underhanded about this, and if anything, the lack of dense modern C++ functionality makes this routine look out of place. If the message has already been decoded, it is returned straight from memory. Otherwise, the XOR is carried out in place, directly over the ciphertext.

    char const *decrypt() {
        if (decrypted) {
            return ciphertext;
        }

        for (size_t i = 0; i < sizeof...(Is); i++) {
            ciphertext[i] = static_cast<char>(ciphertext[i] ^ (xor_key + i));
        }
        ciphertext[sizeof...(Is)] = '\0';

        decrypted = true;
        return ciphertext;
    }
};

Unfortunately, this all leaves us with a snake of an expression we have to wrangle every time we use a string. Not only does an instance of xor_string have to be created, but we must also create an std::index_sequence to match the length of the unterminated string. To keep all of that nasty template hacking out of sight, we can rely on the good ol' preprocessor.

#define $(str) xor_string<std::make_index_sequence<sizeof(str) - 1>>(str).decrypt()
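Putting the fragments so far into one translation unit gives something like the sketch below. It rests on a few assumptions: the key is hard-coded to 0x55 instead of coming from _XOR_KEY_, the macro is named XOR_STR (a hypothetical name) because `$` in identifiers is a compiler extension rather than standard C++, and the helper decoded_greeting exists only for this example. Since `+` binds tighter than `^`, the parentheses around the key expression only spell out the grouping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Assumption: a fixed key stands in for the build-system-provided value.
constexpr std::uint8_t xor_key = 0x55;

template <typename Is>
class xor_string;

template <std::size_t... Is>
class xor_string<std::index_sequence<Is...>> {
    bool decrypted = false;
    char ciphertext[sizeof...(Is) + 1];  // unlisted last element is zeroed

public:
    // Encode each character with (key + index) while initializing the array.
    constexpr xor_string(char const* const str) noexcept
        : ciphertext{ static_cast<char>(str[Is] ^ (xor_key + Is))... } {}

    // Decode in place on first use; later calls return the cached plaintext.
    char const* decrypt() {
        if (!decrypted) {
            for (std::size_t i = 0; i < sizeof...(Is); i++) {
                ciphertext[i] = static_cast<char>(ciphertext[i] ^ (xor_key + i));
            }
            ciphertext[sizeof...(Is)] = '\0';
            decrypted = true;
        }
        return ciphertext;
    }
};

// Hypothetical macro name; mirrors the $ macro from the text.
#define XOR_STR(str) \
    xor_string<std::make_index_sequence<sizeof(str) - 1>>(str).decrypt()

// Hypothetical helper: the returned pointer refers into a temporary object,
// so copy the text before the enclosing full-expression ends.
std::string decoded_greeting() {
    return std::string(XOR_STR("Hello, world!"));
}
```

One thing to watch: the macro hands back a pointer into a temporary xor_string, which is destroyed at the end of the full-expression. Streaming it or passing it straight to a function is fine. Storing the raw char const * for later use is not.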

It can also prove useful to combine this macro with a string formatting utility such as fmtlib. Squint a little bit, and we have ourselves one of those fancy .NET features.

#define $(str, ...) fmt::format(xor_string<std::make_index_sequence<sizeof(str) - 1>>(str).decrypt(), __VA_ARGS__)

// ...

std::cout << $("{} is {}", "foo", 12) << std::endl; // "foo is 12"

So, what’s the output look like? After all, our efforts would be in vain if the compiler still sneaks a copy of our string inside somewhere. Let’s compile a snippet under MSVC to verify:

std::cout << $("Hello, world!") << std::endl;

Luckily, our Release mode binary¹ is clear of any signs of our original message, holding only our compile-time encoded string and some inlined code from our decrypt method. Keeping in mind that we weren’t intending to fool any human adversaries, the simplicity of the decryption routine should add only negligible overhead, especially when strings are utilized only once.

mov	DWORD PTR [ebp],   0x3b331d00  ; load the decrypted flag (low byte, 0x00) and our encoded string
mov	DWORD PTR [ebp+4], 0x7b763634
mov	DWORD PTR [ebp+8], 0x332c322b
mov	WORD PTR [ebp+12], 0x4004
mov	BYTE PTR [ebp+14], 0x00

xor	eax, eax                     ; eax ← 0
loop:
lea	ecx, DWORD PTR [eax+0x55]    ; ecx ← 0x55 + eax
xor	BYTE PTR [ebp+eax+001], cl   ; ebp[eax+1] ← ebp[eax+1] ^ cl
inc	eax                          ; eax ← eax + 1
cmp	eax, 0xD                     ; if eax < 13:
jb	SHORT loop                   ;     goto loop
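As a sanity check, the immediates in the listing can be decoded by hand. The sketch below lays the stored bytes out in memory order (x86 is little-endian) and replays the loop with the key 0x55 taken from the lea instruction. The names stack_bytes and decode_listing are hypothetical, and treating offset 0 as the decrypted flag is an inference from the class layout and from the +1 in the xor operand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bytes written by the mov instructions, lowest address first.
// Offset 0 is presumably the `decrypted` flag; ciphertext starts at offset 1.
const std::uint8_t stack_bytes[15] = {
    0x00, 0x1d, 0x33, 0x3b,  // DWORD 0x3b331d00
    0x34, 0x36, 0x76, 0x7b,  // DWORD 0x7b763634
    0x2b, 0x32, 0x2c, 0x33,  // DWORD 0x332c322b
    0x04, 0x40,              // WORD  0x4004
    0x00                     // BYTE  0x00 (terminator)
};

// Replay the loop: for eax in [0, 0xD), xor byte [ebp+eax+1] with 0x55+eax.
std::string decode_listing() {
    std::string out;
    for (std::size_t i = 0; i < 0xD; i++) {
        out += static_cast<char>(stack_bytes[i + 1] ^ (0x55 + i));
    }
    return out;
}
```

Calling decode_listing() yields "Hello, world!", so the listing does hold the encoded message, with the key starting at 0x55 and incrementing per byte.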

¹ This is a must: compiling with debug symbols attached may leave our original string intact to ease the debugging process.