niedziela, 5 stycznia 2014

Switch-case statement

Lets consider long switch-case statement like:

switch(value)
{
  case VALUE_0:
    do_sth_0();
  break;
  case VALUE_1:
    do_sth_1();
  break;
  case VALUE_2:
    do_sth_2();
  break;

  ...
  
  case VALUE_N:
    do_sth_N();
  break;
}

it may be tempting to change long list of cases into loop performing linear search for value e.g.

static const ValueFunPtrPair tab[] =
{
  {VALUE_0, do_sth_0},
  {VALUE_1, do_sth_1},
  {VALUE_2, do_sth_2},
  ...
  {VALUE_N, do_sth_N},
};

for (i = 0; i < sizeof(tab)/sizeof(tab[0]); ++i)
{
  if (tab[i].value == value)
  {
    tab[i].do_sth_ptr();
    break;
  }
}
but is this better? And what better means - faster or maybe shorter source code or maybe output code or maybe less error prone? It is of course possible to change linear-complexity implementation into binary search (log2(N) time) or jump table/perfect hash (constant time), but in this case Open-Closed Principle is violated - change in enum definition requires changes in another place(s). Such approaches are just harder to maintain. Question is - do we really need to optimize? Good old rule says "premature optimization is the root of all evil". First we have to know our enemy - the compiler generated code. It sometimes appear that compiler can be better on providing faster code than programmer.
Let's look at example - enumeration type of 1000+1 values, all values stated as cases in test function. If we stay with standard switch-case implementation - see what is compiler output.
Test function looks like:
void f(int value)
{
  switch(value)
  {
  case VALUE_0:
    printf("value: %d\n", VALUE_0);
  break;
  case VALUE_1:
    printf("value: %d\n", VALUE_1);
  break;
  case VALUE_2:
    printf("value: %d\n", VALUE_2);
  break;

  ...
  
  case VALUE_1000:
    printf("value: %d\n", VALUE_1000);
  break;
  }
}
First case is when enum values are subsequent values from 0 to 1000.
Below is gcc 4.8.2 output (without optimization) for x86-64. Even without optimization compiler generates code that contains only one jump to find proper case (constant time).
0000000000400550 <f>:

  400550: 55                    push   %rbp
  400551: 48 89 e5              mov    %rsp,%rbp
  400554: 48 83 ec 10           sub    $0x10,%rsp
  400558: 89 7d fc              mov    %edi,-0x4(%rbp)
  40055b: 81 7d fc e8 03 00 00  cmpl   $0x3e8,-0x4(%rbp)
  400562: 0f 87 bb 61 00 00     ja     406723 <f x61d3="">
  400568: 8b 45 fc              mov    -0x4(%rbp),%eax
  40056b: 48 8b 04 c5 d0 67 40  mov    0x4067d0(,%rax,8),%rax
  400572: 00 
  400573: ff e0                 jmpq   *%rax
  400575: be 00 00 00 00        mov    $0x0,%esi
  40057a: bf c0 67 40 00        mov    $0x4067c0,%edi
  40057f: b8 00 00 00 00        mov    $0x0,%eax
  400584: e8 87 fe ff ff        callq  400410 <printf plt="">
  400589: e9 95 61 00 00        jmpq   406723 <f x61d3="">
  40058e: be 01 00 00 00        mov    $0x1,%esi
  400593: bf c0 67 40 00        mov    $0x4067c0,%edi
  400598: b8 00 00 00 00        mov    $0x0,%eax
  40059d: e8 6e fe ff ff        callq  400410 <printf plt="">
  4005a2: e9 7c 61 00 00        jmpq   406723 <f x61d3="">
  4005a7: be 02 00 00 00        mov    $0x2,%esi
  4005ac: bf c0 67 40 00        mov    $0x4067c0,%edi
  4005b1: b8 00 00 00 00        mov    $0x0,%eax
  4005b6: e8 55 fe ff ff        callq  400410 <printf plt="">
  4005bb: e9 63 61 00 00        jmpq   406723 <f x61d3="">
  ...

Second case - reversed enum values (first 1000, last 0) - output code generated by compiler is almost the same, only cases are reversed (listing omitted).

Third case - random enum values - compiler generates code that realizes binary search (without loop - unrolled):
0000000000400550 <f>:
  400550: 55                    push   %rbp
  400551: 48 89 e5              mov    %rsp,%rbp
  400554: 48 83 ec 10           sub    $0x10,%rsp
  400558: 89 7d fc              mov    %edi,-0x4(%rbp)
  40055b: 8b 45 fc              mov    -0x4(%rbp),%eax
  40055e: 3d a0 40 00 00        cmp    $0x40a0,%eax
  400563: 0f 84 46 3c 00 00     je     4041af <f x3c5f="">
  400569: 3d a0 40 00 00        cmp    $0x40a0,%eax
  40056e: 0f 8f e5 1b 00 00     jg     402159 <f x1c09="">
  400574: 3d 3e 20 00 00        cmp    $0x203e,%eax
  400579: 0f 84 e8 60 00 00     je     406667 <f x6117="">
  40057f: 3d 3e 20 00 00        cmp    $0x203e,%eax
  400584: 0f 8f dc 0d 00 00     jg     401366 <f xe16="">
  40058a: 3d 45 10 00 00        cmp    $0x1045,%eax
  40058f: 0f 84 e0 98 00 00     je     409e75 <f x9925="">
  400595: 3d 45 10 00 00        cmp    $0x1045,%eax
  40059a: 0f 8f dd 06 00 00     jg     400c7d <f x72d="">
  ...

Above examples show that for the cases it has not much sens to "optimize" implementation.
Of course there are cases when compiler generated code is slower compared to hand-crafted, but implementation has to be carefully checked.
Additionally please note that compiler generated binary code is unpredictable (language standard does not define switch-case implementation rules). It is up to compiler developers to provide good solutions to different cases of values included in switch cases.
It is obvious output code depends on compiler, compiler version, optimization level, therefore in time-critical fragments output code shall be checked.

There is number of different techniques can be incorporated by compiler developers (see [1]): jump tables, range tests, balanced binary trees, bit tests, tabular methods, index mapping, condition codes, imperfect hashing, perfect hashing, safe hashing, conditional moves and superoptimization. Uff, still sure you can make it better than compiler?

Reference

[1] ols.fedoraproject.org/GCC/Reprints-2008/sayle-reprint.pdf

piątek, 3 stycznia 2014

Short-circuit evaluation

Something to start with - short-circuit evaluation.

Feature present in many programming languages (esp. "die" invocation in Perl), C and C++ are no exception to this.

Let's take first example:

if ((1 == a) && (2 == b))
{
   // do sth.
}

The point here is - when 'a' is not equal to 1 then second condition is not checked at all.
Therefore first outcome of SCE is minimization of operation and there is no need for explicit constructions like:

if (1 == a)
{
   if (2 == b)
   {
      // do sth.
   }
}

Similarly for "or" operation e.g.:
if ((1 == a) || (2 == b))
{
   // do sth.
}

When 'a' is equal to 1 second is not checked.
Above examples shows only minor effect of run-time optimization.

More difficult situation can appear when condition contains function invocation (or expression) that has side effect, e.g.:
if ((1 == a) || change_value(&b))
{
   // do sth.
}
bool change_value(int *b)
{
   if (*b > 0)
   {
      --*b;
   }
   return *b > 0;
}

When 'a' is equal to 1 then there will be no function invocation and consequently no change to 'b' value.



SCE behavior is a must for C and C++ compilers as it is stated explicitly in standard:
  • for C
  1. "Logical AND operator" - "If the first operand compares equal to 0, the second operand is not evaluated."
  2. "Logical OR operator" - "If the first operand compares unequal to 0, the second operand is not evaluated."
  • for C++
  1. "Logical AND operator" - "Unlike &, && guarantees left-to-right evaluation: the second operand is not evaluated if the first operand is false."
  2. "Logical OR operator" - "Unlike |, || guarantees left-to-right evaluation; moreover, the second operand is not evaluated if the first operand evaluates to true."

References


People

Some most influential (for me) programming names (and reasons for that):