poniedziałek, 3 lutego 2014

Fundamental types initialization

Consider something as complicated as:

struct A
{
   int f; 
};

the problem is value of member 'f'.
Structure A is of course POD therefore rules for member initialization can be really tricky.
Differences comes from three sources:
- way of initalization,
- C++ standard version,
- compiler vendor.

Possible ways of initialization are:
A;
A();
A{};
which should be same to version with dynamic allocation:
new A;
new A();
new A{};

Note two things:
- it is impossible to use second case directly i.e.:
A a();
because in C++ this is function declaration (declaration can appear also inside function body).
To use second initialization method, syntax can be:
A a = A();
- third way of initialization is from C++11 and allows for:
A a{};

Initialization of fundamental type variables and PODs is subject of constant changes and obscurity between C++ standards.
In C++98 initialization of non-POD object without constructor (e.g. struct with desctructor only or inherited) does not initialize fundamental type fields to 0 (but as it is visible below it is not truth for current compilers).
This was changed in C++03.
Also compilers can introduce surprising behaviours. VSC++ (before version 2013) was famous of not initializing PODs with A() syntax (still visible for some cases).

Below are test results of initialization results for different types and compilers.
Cases are
  1. fundamental type,
  2. simplest POD struct,
  3. simplest non-POD struct (destructor added),
  4. non-POD struct with field default initialization in constructor,
  5. non-POD struct without field initialization in constructor,
  6. non-POD struct with inheritance and no constructors,
  7. non-POD struct with inheritance and field default initialization in base class constructor,
  8. non-POD struct with inheritance and field default initialization in derived class constructor.

Compilers used
  • gcc 4.8.2
  • clang 3.3
  • VisualStudio C++ 2013

1. fundamental type
init method g++ std=c++11 g++ std=c++03 g++ std=c++98 clang++ std=c++11 clang++ std=c++03 clang++ std=c++98 VSC++2013
int a; - - - - - - -
int a = int(); 0 0 0 0 0 0 0
int a{}; 0 n.a. n.a. 0 n.a. n.a. 0
int *a = new int; - - - - - - -
int *a = new int(); 0 0 0 0 0 0 0
int *a = new int{}; 0 n.a. n.a. 0 n.a. n.a. 0


2. simplest POD struct
struct A
{
   int f; 
};
init method g++ std=c++11 g++ std=c++03 g++ std=c++98 clang++ std=c++11 clang++ std=c++03 clang++ std=c++98 VSC++2013
A a; - - - - - - -
A a = A(); 0 0 0 0 0 0 0
A a{}; 0 n.a. n.a. 0 n.a. n.a. 0
A *a = new A; - - - - - - -
A *a = new A(); 0 0 0 0 0 0 0
A *a = new A{}; 0 n.a. n.a. 0 n.a. n.a. 0


3. simplest non-POD struct (destructor added)
struct A
{
   ~A() {}
   int f; 
};
init method g++ std=c++11 g++ std=c++03 g++ std=c++98 clang++ std=c++11 clang++ std=c++03 clang++ std=c++98 VSC++2013
A a; - - - - - - -
A a = A(); 0 0 0 0 0 0 -
A a{}; 0 n.a. n.a. 0 n.a. n.a. 0
A *a = new A; - - - - - - -
A *a = new A(); 0 0 0 0 0 0 -
A *a = new A{}; 0 n.a. n.a. 0 n.a. n.a. 0


4. non-POD struct with field default initialization in constructor
struct A
{
   A() : f() {}
   int f; 
};
init method g++ std=c++11 g++ std=c++03 g++ std=c++98 clang++ std=c++11 clang++ std=c++03 clang++ std=c++98 VSC++2013
A a; 0 0 0 0 0 0 0
A a = A(); 0 0 0 0 0 0 0
A a{}; 0 n.a. n.a. 0 n.a. n.a. 0
A *a = new A; 0 0 0 0 0 0 0
A *a = new A(); 0 0 0 0 0 0 0
A *a = new A{}; 0 n.a. n.a. 0 n.a. n.a. 0


5. non-POD struct without field initialization in constructor
struct A
{
   A() {}
   int f; 
};
init method g++ std=c++11 g++ std=c++03 g++ std=c++98 clang++ std=c++11 clang++ std=c++03 clang++ std=c++98 VSC++2013
A a; - - - - - - -
A a = A(); - - - - - - -
A a{}; - n.a. n.a. - n.a. n.a. -
A *a = new A; - - - - - - -
A *a = new A(); - - - - - - -
A *a = new A{}; - n.a. n.a. - n.a. n.a. -


6. non-POD struct with inheritance and no constructors
struct P
{
   int g;
};
struct A : public P
{
   int f;
};
init method g++ std=c++11 g++ std=c++03 g++ std=c++98 clang++ std=c++11 clang++ std=c++03 clang++ std=c++98 VSC++2013
A a; f:-, g:- f:-, g:- f:-, g:- f:-, g:- f:-, g:- f:-, g:- f:-, g:-
A a = A(); f:0, g:0 f:0, g:0 f:0, g:0 f:0, g:0 f:0, g:0 f:0, g:0 f:-, g:-
A a{}; f:0, g:0 n.a. n.a. f:0, g:0 n.a. n.a. f:-, g:-
A *a = new A; f:-, g:- f:-, g:- f:-, g:- f:-, g:- f:-, g:- f:-, g:- f:-, g:-
A *a = new A(); f:0, g:0 f:0, g:0 f:0, g:0 f:0, g:0 f:0, g:0 f:0, g:0 f:0, g:0
A *a = new A{}; f:0, g:0 n.a. n.a. f:0, g:0 n.a. n.a. f:-, g:-


7. non-POD struct with inheritance and field default initialization in base class constructor
struct P
{
   P() : g() {}
   int g;
};
struct A : public P
{
   int f;
};
init method g++ std=c++11 g++ std=c++03 g++ std=c++98 clang++ std=c++11 clang++ std=c++03 clang++ std=c++98 VSC++2013
A a; f:-, g:0 f:-, g:0 f:-, g:0 f:-, g:0 f:-, g:0 f:-, g:0 f:-, g:0
A a = A(); f:0, g:0 f:0, g:0 f:0, g:0 f:0, g:0 f:0, g:0 f:0, g:0 f:-, g:0
A a{}; f:0, g:0 n.a. n.a. f:0, g:0 n.a. n.a. f:-, g:0
A *a = new A; f:-, g:0 f:-, g:0 f:-, g:0 f:-, g:0 f:-, g:0 f:-, g:0 f:-, g:0
A *a = new A(); f:0, g:0 f:0, g:0 f:0, g:0 f:0, g:0 f:0, g:0 f:0, g:0 f:-, g:0
A *a = new A{}; f:0, g:0 n.a. n.a. f:0, g:0 n.a. n.a. f:-, g:0


8. non-POD struct with inheritance and field default initialization in derived class constructor
struct P
{
   int g;
};
struct A : public P
{
   A() : f() {}
   int f;
};
init method g++ std=c++11 g++ std=c++03 g++ std=c++98 clang++ std=c++11 clang++ std=c++03 clang++ std=c++98 VSC++2013
A a; f:0, g:- f:0, g:- f:0, g:- f:0, g:- f:0, g:- f:0, g:- f:0, g:-
A a = A(); f:0, g:- f:0, g:- f:0, g:- f:0, g:- f:0, g:- f:0, g:- f:0, g:-
A a{}; f:0, g:- n.a. n.a. f:0, g:- n.a. n.a. f:0, g:-
A *a = new A; f:0, g:- f:0, g:- f:0, g:- f:0, g:- f:0, g:- f:0, g:- f:0, g:-
A *a = new A(); f:0, g:- f:0, g:- f:0, g:- f:0, g:- f:0, g:- f:0, g:- f:0, g:-
A *a = new A{}; f:0, g:- n.a. n.a. f:0, g:- n.a. n.a. f:0, g:-


Legend
  • "0" - initialized to 0
  • "-" - not initialized
  • "n.a." - available only in C++11

And what lesson comes form data above - to be really sure field is initialized, you have to explicitly use initialization construct for the field (in constructor initialization list).

Test application available for downlowad here. Please note placement new used to avoid false 0-positive answers. For stack allocated items it is not that easy - there is try with recursive function (to pollute stack with non 0 values) but it is not always enough (false 0 answers can happen).

Note that in VSC++2013 "__cplusplus" is defined as 199711 (like for C++98 standard) but compiler allows for C++11 usage (perhaps not complete and therefore "__cplusplus" is not defined as 201103).

niedziela, 5 stycznia 2014

Switch-case statement

Lets consider long switch-case statement like:

switch(value)
{
  case VALUE_0:
    do_sth_0();
  break;
  case VALUE_1:
    do_sth_1();
  break;
  case VALUE_2:
    do_sth_2();
  break;

  ...
  
  case VALUE_N:
    do_sth_N();
  break;
}

it may be tempting to change long list of cases into loop performing linear search for value e.g.

static const ValueFunPtrPair tab[] =
{
  {VALUE_0, do_sth_0},
  {VALUE_1, do_sth_1},
  {VALUE_2, do_sth_2},
  ...
  {VALUE_N, do_sth_N},
};

for (i = 0; i < sizeof(tab)/sizeof(tab[0]); ++i)
{
  if (tab[i].value == value)
  {
    tab[i].do_sth_ptr();
    break;
  }
}
but is this better? And what better means - faster or maybe shorter source code or maybe output code or maybe less error prone? It is of course possible to change linear-complexity implementation into binary search (log2(N) time) or jump table/perfect hash (constant time), but in this case Open-Closed Principle is violated - change in enum definition requires changes in another place(s). Such approaches are just harder to maintain. Question is - do we really need to optimize? Good old rule says "premature optimization is the root of all evil". First we have to know our enemy - the compiler generated code. It sometimes appear that compiler can be better on providing faster code than programmer.
Let's look at example - enumeration type of 1000+1 values, all values stated as cases in test function. If we stay with standard switch-case implementation - see what is compiler output.
Test function looks like:
void f(int value)
{
  switch(value)
  {
  case VALUE_0:
    printf("value: %d\n", VALUE_0);
  break;
  case VALUE_1:
    printf("value: %d\n", VALUE_1);
  break;
  case VALUE_2:
    printf("value: %d\n", VALUE_2);
  break;

  ...
  
  case VALUE_1000:
    printf("value: %d\n", VALUE_1000);
  break;
  }
}
First case is when enum values are subsequent values from 0 to 1000.
Below is gcc 4.8.2 output (without optimization) for x86-64. Even without optimization compiler generates code that contains only one jump to find proper case (constant time).
0000000000400550 <f>:

  400550: 55                    push   %rbp
  400551: 48 89 e5              mov    %rsp,%rbp
  400554: 48 83 ec 10           sub    $0x10,%rsp
  400558: 89 7d fc              mov    %edi,-0x4(%rbp)
  40055b: 81 7d fc e8 03 00 00  cmpl   $0x3e8,-0x4(%rbp)
  400562: 0f 87 bb 61 00 00     ja     406723 <f x61d3="">
  400568: 8b 45 fc              mov    -0x4(%rbp),%eax
  40056b: 48 8b 04 c5 d0 67 40  mov    0x4067d0(,%rax,8),%rax
  400572: 00 
  400573: ff e0                 jmpq   *%rax
  400575: be 00 00 00 00        mov    $0x0,%esi
  40057a: bf c0 67 40 00        mov    $0x4067c0,%edi
  40057f: b8 00 00 00 00        mov    $0x0,%eax
  400584: e8 87 fe ff ff        callq  400410 <printf plt="">
  400589: e9 95 61 00 00        jmpq   406723 <f x61d3="">
  40058e: be 01 00 00 00        mov    $0x1,%esi
  400593: bf c0 67 40 00        mov    $0x4067c0,%edi
  400598: b8 00 00 00 00        mov    $0x0,%eax
  40059d: e8 6e fe ff ff        callq  400410 <printf plt="">
  4005a2: e9 7c 61 00 00        jmpq   406723 <f x61d3="">
  4005a7: be 02 00 00 00        mov    $0x2,%esi
  4005ac: bf c0 67 40 00        mov    $0x4067c0,%edi
  4005b1: b8 00 00 00 00        mov    $0x0,%eax
  4005b6: e8 55 fe ff ff        callq  400410 <printf plt="">
  4005bb: e9 63 61 00 00        jmpq   406723 <f x61d3="">
  ...

Second case - reversed enum values (first 1000, last 0) - output code generated by compiler is almost the same, only cases are reversed (listing omitted).

Third case - random enum values - compiler generates code that realizes binary search (without loop - unrolled):
0000000000400550 <f>:
  400550: 55                    push   %rbp
  400551: 48 89 e5              mov    %rsp,%rbp
  400554: 48 83 ec 10           sub    $0x10,%rsp
  400558: 89 7d fc              mov    %edi,-0x4(%rbp)
  40055b: 8b 45 fc              mov    -0x4(%rbp),%eax
  40055e: 3d a0 40 00 00        cmp    $0x40a0,%eax
  400563: 0f 84 46 3c 00 00     je     4041af <f x3c5f="">
  400569: 3d a0 40 00 00        cmp    $0x40a0,%eax
  40056e: 0f 8f e5 1b 00 00     jg     402159 <f x1c09="">
  400574: 3d 3e 20 00 00        cmp    $0x203e,%eax
  400579: 0f 84 e8 60 00 00     je     406667 <f x6117="">
  40057f: 3d 3e 20 00 00        cmp    $0x203e,%eax
  400584: 0f 8f dc 0d 00 00     jg     401366 <f xe16="">
  40058a: 3d 45 10 00 00        cmp    $0x1045,%eax
  40058f: 0f 84 e0 98 00 00     je     409e75 <f x9925="">
  400595: 3d 45 10 00 00        cmp    $0x1045,%eax
  40059a: 0f 8f dd 06 00 00     jg     400c7d <f x72d="">
  ...

Above examples show that for the cases it has not much sens to "optimize" implementation.
Of course there are cases when compiler generated code is slower compared to hand-crafted, but implementation has to be carefully checked.
Additionally please note that compiler generated binary code is unpredictable (language standard does not define switch-case implementation rules). It is up to compiler developers to provide good solutions to different cases of values included in switch cases.
It is obvious output code depends on compiler, compiler version, optimization level, therefore in time-critical fragments output code shall be checked.

There is number of different techniques can be incorporated by compiler developers (see [1]): jump tables, range tests, balanced binary trees, bit tests, tabular methods, index mapping, condition codes, imperfect hashing, perfect hashing, safe hashing, conditional moves and superoptimization. Uff, still sure you can make it better than compiler?

Reference

[1] ols.fedoraproject.org/GCC/Reprints-2008/sayle-reprint.pdf

piątek, 3 stycznia 2014

Short-circuit evaluation

Something to start with - short-circuit evaluation.

Feature present in many programming languages (esp. "die" invocation in Perl), C and C++ are no exception to this.

Let's take first example:

if ((1 == a) && (2 == b))
{
   // do sth.
}

The point here is - when 'a' is not equal to 1 then second condition is not checked at all.
Therefore first outcome of SCE is minimization of operation and there is no need for explicit constructions like:

if (1 == a)
{
   if (2 == b)
   {
      // do sth.
   }
}

Similarly for "or" operation e.g.:
if ((1 == a) || (2 == b))
{
   // do sth.
}

When 'a' is equal to 1 second is not checked.
Above examples shows only minor effect of run-time optimization.

More difficult situation can appear when condition contains function invocation (or expression) that has side effect, e.g.:
if ((1 == a) || change_value(&b))
{
   // do sth.
}
bool change_value(int *b)
{
   if (*b > 0)
   {
      --*b;
   }
   return *b > 0;
}

When 'a' is equal to 1 then there will be no function invocation and consequently no change to 'b' value.



SCE behavior is a must for C and C++ compilers as it is stated explicitly in standard:
  • for C
  1. "Logical AND operator" - "If the first operand compares equal to 0, the second operand is not evaluated."
  2. "Logical OR operator" - "If the first operand compares unequal to 0, the second operand is not evaluated."
  • for C++
  1. "Logical AND operator" - "Unlike &, && guarantees left-to-right evaluation: the second operand is not evaluated if the first operand is false."
  2. "Logical OR operator" - "Unlike |, || guarantees left-to-right evaluation; moreover, the second operand is not evaluated if the first operand evaluates to true."

References


People

Some most influential (for me) programming names (and reasons for that):