rep movsb isn’t memcpy()

Some of you have probably noticed that HexRays translates the rep movsb instruction into a call to memcpy() from the standard C library. In most cases this is perfectly correct, but there is at least one situation where the translation does not behave as it should.

Case 1 (correct):

Two different, non-overlapping buffers. This is the simplest situation, and rep movsb can be translated to memcpy() without any trouble.

Case 2 (incorrect):

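The behaviour of memcpy() for overlapping regions is undefined (according to the C standard library specification), and the documentation suggests using memmove(), which handles overlapping buffers correctly. However, for overlapping buffers neither memcpy() nor memmove() can be used as a replacement for rep movsb. With the direction flag cleared, rep movsb copies strictly one byte at a time from lower to higher addresses, so it may read bytes it has already written, whereas memmove() behaves as if the source were first copied into a temporary buffer. The example below shows exactly why this matters: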

    char tab[] = "qwertyuiopasdfghjklzxcvbnm";
    char* ptr_src = tab;
    char* ptr_dst = tab + 1;
    size_t sz = 5;

Now we want to execute rep movsb:

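    cld                    ; DF = 0: copy from lower to higher addresses
    mov    esi, ptr_src
    mov    edi, ptr_dst
    mov    ecx, sz         ; number of bytes to copy
    rep    movsb           ; repeat movsb ECX times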

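After execution of this snippet, tab[] will contain “qqqqqquiopasdfghjklzxcvbnm”: every byte written to the destination is read again as a source byte in the next iteration, so the first character is replicated. This situation is very common in decompression algorithms, for example when an LZ77-style back-reference overlaps the data it is producing.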

Let’s check what we get with the memcpy() function:

    memcpy(ptr_dst, ptr_src, sz);

http://codepad.org/194EP8Oa

After execution under GCC 4.1.2, tab[] contains “qqwerruiopasdfghjklzxcvbnm”, which means that the memory copy was optimized: first 4 bytes were copied at once with movsd, and then the 5th byte was copied with movsb. Since the 4-byte block is read completely before it is written, the 5th byte is taken from the already modified buffer, which is why “r” appears twice. A different compiler may produce different results because, as mentioned earlier, the behaviour of memcpy() is undefined for overlapping buffers.

The last check uses the memmove() function:

    memmove(ptr_dst, ptr_src, sz);

http://codepad.org/ZYBD474T

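After execution under GCC 4.1.2, tab[] contains “qqwertuiopasdfghjklzxcvbnm”, which is the expected result for memmove(): “qwert” has been shifted by one position, exactly as if it had been copied through a temporary buffer. It is still different from the rep movsb result, though.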

To sum up: if you decompile some code with HexRays and the result still doesn’t work correctly, check all occurrences of memcpy(), and wherever the buffers may overlap, replace memcpy() with a plain loop (for, while) that copies one byte at a time. Of course, similar problems may appear for the rep movsw and rep movsd instructions; there the loop should copy words or dwords respectively, because a byte-by-byte loop gives a different result for overlapping buffers.

2 Comments

  1. When describing the memmove() behaviour, you’ve written:

    After execution under GCC 4.1.2, tab[] contains “qqwertuiopasdfghjklzxcvbnm”, which was expected.

    This is the same as the memcpy() case. Did you mean to write: “qqqqqquiopasdfghjklzxcvbnm”?


    1. @2of1
      It isn’t the same as in memcpy(), look:

      “qqwer r uiopasdfghjklzxcvbnm” - memcpy()
      “qqwer t uiopasdfghjklzxcvbnm” - memmove()

      So, memmove() properly moved “qwert” part and memcpy() failed because it doesn’t support overlapping buffers (it basically doubled the “r” letter due to described optimization).

      I wrote that it was expected, because in a normal situation the source buffer should be properly moved to the destination buffer, and memmove() fulfils this task perfectly.

      The main problem here was that neither memcpy() nor memmove() can be used as a replacement for rep movsb for overlapping buffers.

