Monday, June 01, 2009

Duplication

It's been a while since I've encountered two weird C++ bugs in one week.

The first was on OS X, where the compiler generated two copies of the same class, both containing a union but with different layouts.


template <typename _Kind_>
class Object
{
public:

enum ObjectType
{
TYPE_VOID,
TYPE_KIND,
};

union ObjectData
{
void * Void;
_Kind_ * Kind;
};

ObjectType Type;
ObjectData Data;

void * operator & () const
{
switch(Type)
{
case TYPE_VOID: return (void*)Data.Void;
case TYPE_KIND: return (void*)Data.Kind;
}

return 0;
}
};


The resulting bug was exposed when a non-const instance of the object was used to call the const operator.


int value = 8;

Object<int> obj;
obj.Type = Object<int>::TYPE_VOID;
obj.Data.Kind = &value;

// This calls the const operator which may return an invalid
// memory location. When dereferenced this will cause an error.

int * result = (int*)&obj; // operator & returns void*, so a cast is needed

if (*result == 4)
{
// ...
}


Strange but true. Obviously the code producing the bug was a little more complicated, but it was semantically exactly the same as the code above.

The second bug was not quite so serious, but still very confusing, as it relates to one of those obscure C++ standard "undefined behaviors". Sometimes I wish the standards body would realise that the language may actually benefit from having some consistency and predictability.

This time the code decoded ASCII hex into binary values on Windows, using a lookup table to convert 2 ASCII bytes into 1 binary byte.


char input[] = {0x01,0x02,0x03,0x04,0x05};
char * entry = input; // a pointer, since an array name cannot be incremented
int table[] = {0x01,0x02,0x03,0x04,0x05};

int decode[1];
decode[0] = (table[*entry++]<<4)|(table[*entry++]);



However, the result was not what you would expect. The generated assembly looked something like this, with the result that the two post-increments did not occur until after the assignment, so both lookups read the same byte. Weird...?


009603A7 mov eax,dword ptr [entry]
009603AA movsx ecx,byte ptr [eax]
009603AD movsx edx,byte ptr table (0BC4A60h)[ecx]
009603B4 shl edx,4
009603B7 mov eax,dword ptr [entry]
009603BA movsx ecx,byte ptr [eax]
009603BD movsx eax,byte ptr table (0BC4A60h)[ecx]
009603C4 or edx,eax
009603C6 mov byte ptr [decode],dl
009603C9 mov ecx,dword ptr [entry]
009603CC add ecx,1
009603CF mov dword ptr [entry],ecx
009603D2 mov edx,dword ptr [entry]
009603D5 add edx,1
009603D8 mov dword ptr [entry],edx


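But it turns out there is an obscure C++ rule (see section 5, paragraph 4 of the ISO C++ standard) whereby you are not allowed to modify the same variable more than once between two sequence points, which in practice rules out doing it twice in a single expression like this one. Doing so results in "undefined behavior", and the output above is just one possible outcome. Great.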

Why this needs to be so I don't really understand.
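
For what it's worth, the usual way around this is to split the two post-increments into separate statements, so that each one completes before the next read. The following is only a minimal sketch of that idea, not the actual fix from the real code: the helper name decode_pair and its signature are made up for illustration, and it assumes the table maps each input byte to a 4-bit value.

```cpp
#include <cassert>

// Hypothetical helper (not from the original code): combines two
// table lookups into one byte. Each post-increment sits in its own
// statement, so the second read is guaranteed to see the advanced
// pointer.
int decode_pair(const unsigned char *& entry, const int * table)
{
    assert(entry != 0 && table != 0);

    int high = table[*entry++]; // first byte, then advance
    int low  = table[*entry++]; // second byte, then advance

    return (high << 4) | low;
}
```

With an identity table and input bytes 0x01, 0x02, a call returns 0x12 and leaves the pointer two bytes further on, whatever the compiler.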

So there you go, two annoying bugs in one week. Both required some pretty heavy problem solving to see.

C++ is a hard language to love.