__sse2_available

Today one of my colleagues noticed that his C++ project, compiled with Visual Studio, uses SSE instructions even with Enable Enhanced Instruction Set set to Not Set in the compiler’s Code Generation options. That was strange enough to spend a few minutes figuring out what’s happening under the hood. It is a well-known fact that the CRT can use SSE2 to speed up some memory-related functions like memcpy or memset, but I always thought that this optimization was only used when the /arch:SSE or /arch:SSE2 compiler options are set. Well… I was wrong. There is one magic variable that is checked at runtime, and if it is set to true, memcpy, memset and memmove will use SSE2 optimizations. This magic variable is called __sse2_available, and it is set during C runtime initialization:

int __cdecl __sse2_available_init()
{
    __sse2_available = _get_sse2_info();
    return 0;
}

_get_sse2_info() uses the plain CPUID instruction to check whether the SSE2 instruction set is available. Newer versions of Visual Studio (e.g. 2010) call the WinAPI function IsProcessorFeaturePresent from Kernel32.dll instead:

int __cdecl __sse2_available_init()
{
    __sse2_available = IsProcessorFeaturePresent(PF_XMMI64_INSTRUCTIONS_AVAILABLE);
    return 0;
}
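For illustration, the CPUID-based check can be sketched roughly as below. This is not the CRT's actual _get_sse2_info(); the helper name cpu_has_sse2 is made up for this example. It assumes only that CPUID leaf 1 reports SSE2 in bit 26 of EDX, and it uses the __cpuid intrinsic on MSVC or __get_cpuid from <cpuid.h> on GCC/Clang.

```cpp
#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid intrinsic
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>    // __get_cpuid helper (GCC/Clang)
#endif

// Hypothetical helper, not part of the CRT.
// Returns true when CPUID leaf 1 reports SSE2 (EDX bit 26).
bool cpu_has_sse2()
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};          // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return (regs[3] & (1 << 26)) != 0;
#elif defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                    // leaf 1 not supported
    return (edx & (1u << 26)) != 0;
#else
    return false;                        // not an x86 target
#endif
}
```

On any x86-64 processor this should report true, since SSE2 is part of the baseline x86-64 instruction set.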

It doesn’t matter which version of the CRT is used (dynamic or static); memcpy/memset/memmove will take advantage of CPU features either way.

Countermeasure

There is a trick that can be used to avoid the SSE2 optimizations inside memcpy/memset/memmove; however, it only works for projects that statically link the CRT:

extern "C" int __sse2_available;
 
int main()
{
    __sse2_available = 0;
    return 0;
}

Setting __sse2_available to 0 at the beginning of the program prevents the SSE2 optimizations.
