switch (value) {
case VALUE_0: do_sth_0(); break;
case VALUE_1: do_sth_1(); break;
case VALUE_2: do_sth_2(); break;
...
case VALUE_N: do_sth_N(); break;
}
It may be tempting to replace a long list of cases with a loop performing a linear search for the value, e.g.:
static const ValueFunPtrPair tab[] = {
    {VALUE_0, do_sth_0},
    {VALUE_1, do_sth_1},
    {VALUE_2, do_sth_2},
    ...
    {VALUE_N, do_sth_N},
};

for (i = 0; i < sizeof(tab)/sizeof(tab[0]); ++i) {
    if (tab[i].value == value) {
        tab[i].do_sth_ptr();
        break;
    }
}

But is this better? And what does "better" mean: faster, shorter source code, smaller output code, or less error-prone? It is of course possible to replace the linear-complexity implementation with a binary search (O(log2(N)) time) or a jump table/perfect hash (constant time), but then the Open-Closed Principle is violated: a change in the enum definition requires changes in other places. Such approaches are simply harder to maintain. The question is: do we really need to optimize at all? The good old rule says "premature optimization is the root of all evil". First we have to know our enemy: the compiler-generated code. It often turns out that the compiler can produce faster code than the programmer would by hand.
Let's look at an example: an enumeration type with 1000+1 values, all of them listed as cases in a test function. If we stay with the standard switch-case implementation, let's see what the compiler produces.
The test function looks like this:
void f(int value)
{
    switch (value) {
    case VALUE_0: printf("value: %d\n", VALUE_0); break;
    case VALUE_1: printf("value: %d\n", VALUE_1); break;
    case VALUE_2: printf("value: %d\n", VALUE_2); break;
    ...
    case VALUE_1000: printf("value: %d\n", VALUE_1000); break;
    }
}

The first case is when the enum values are consecutive, from 0 to 1000.
Below is the gcc 4.8.2 output (without optimization) for x86-64. Even without optimization, the compiler generates code that needs only a single indirect jump through a table to reach the proper case (constant time).
0000000000400550 <f>:
  400550: 55                      push   %rbp
  400551: 48 89 e5                mov    %rsp,%rbp
  400554: 48 83 ec 10             sub    $0x10,%rsp
  400558: 89 7d fc                mov    %edi,-0x4(%rbp)
  40055b: 81 7d fc e8 03 00 00    cmpl   $0x3e8,-0x4(%rbp)
  400562: 0f 87 bb 61 00 00       ja     406723 <f+0x61d3>
  400568: 8b 45 fc                mov    -0x4(%rbp),%eax
  40056b: 48 8b 04 c5 d0 67 40    mov    0x4067d0(,%rax,8),%rax
  400572: 00
  400573: ff e0                   jmpq   *%rax
  400575: be 00 00 00 00          mov    $0x0,%esi
  40057a: bf c0 67 40 00          mov    $0x4067c0,%edi
  40057f: b8 00 00 00 00          mov    $0x0,%eax
  400584: e8 87 fe ff ff          callq  400410 <printf@plt>
  400589: e9 95 61 00 00          jmpq   406723 <f+0x61d3>
  40058e: be 01 00 00 00          mov    $0x1,%esi
  400593: bf c0 67 40 00          mov    $0x4067c0,%edi
  400598: b8 00 00 00 00          mov    $0x0,%eax
  40059d: e8 6e fe ff ff          callq  400410 <printf@plt>
  4005a2: e9 7c 61 00 00          jmpq   406723 <f+0x61d3>
  4005a7: be 02 00 00 00          mov    $0x2,%esi
  4005ac: bf c0 67 40 00          mov    $0x4067c0,%edi
  4005b1: b8 00 00 00 00          mov    $0x0,%eax
  4005b6: e8 55 fe ff ff          callq  400410 <printf@plt>
  4005bb: e9 63 61 00 00          jmpq   406723 <f+0x61d3>
  ...
The second case: reversed enum values (first 1000, last 0). The output code generated by the compiler is almost the same, only the cases are reversed (listing omitted).
The third case: random enum values. The compiler generates code that performs a binary search (unrolled, without a loop):
0000000000400550 <f>:
  400550: 55                      push   %rbp
  400551: 48 89 e5                mov    %rsp,%rbp
  400554: 48 83 ec 10             sub    $0x10,%rsp
  400558: 89 7d fc                mov    %edi,-0x4(%rbp)
  40055b: 8b 45 fc                mov    -0x4(%rbp),%eax
  40055e: 3d a0 40 00 00          cmp    $0x40a0,%eax
  400563: 0f 84 46 3c 00 00       je     4041af <f+0x3c5f>
  400569: 3d a0 40 00 00          cmp    $0x40a0,%eax
  40056e: 0f 8f e5 1b 00 00       jg     402159 <f+0x1c09>
  400574: 3d 3e 20 00 00          cmp    $0x203e,%eax
  400579: 0f 84 e8 60 00 00       je     406667 <f+0x6117>
  40057f: 3d 3e 20 00 00          cmp    $0x203e,%eax
  400584: 0f 8f dc 0d 00 00       jg     401366 <f+0xe16>
  40058a: 3d 45 10 00 00          cmp    $0x1045,%eax
  40058f: 0f 84 e0 98 00 00       je     409e75 <f+0x9925>
  400595: 3d 45 10 00 00          cmp    $0x1045,%eax
  40059a: 0f 8f dd 06 00 00       jg     400c7d <f+0x72d>
  ...
The examples above show that in these cases there is not much sense in "optimizing" the implementation by hand.
Of course, there are cases when compiler-generated code is slower than hand-crafted code, but every such hand-crafted implementation has to be checked carefully.
Additionally, please note that the compiler-generated binary code is unpredictable: the language standard does not define how switch-case must be implemented. It is up to compiler developers to provide good solutions for the different patterns of values that appear in switch cases.
Obviously, the output code depends on the compiler, the compiler version, and the optimization level; therefore, in time-critical fragments, the output code should be inspected.
There is a number of different techniques that compiler developers can use (see [1]): jump tables, range tests, balanced binary trees, bit tests, tabular methods, index mapping, condition codes, imperfect hashing, perfect hashing, safe hashing, conditional moves and superoptimization. Phew! Still sure you can do better than the compiler?