Reversing the Interview Process

As you may know, we recently brought Rolf Rolles on board the team here at Exodus. We all met at our Austin office and Rolf spent a week working alongside us. Our interview process doesn’t consist of contrived questions intended to observe the interviewee’s capacity for mental acrobatics. Traditionally, when we bring someone in for consideration we are already familiar with their past work and skillset. What we are more interested in is evaluating their capacity to work as part of our team. So, Rolf spent his time auditing code and writing some instrumentation tools for some of the problems we were facing at the time. It went very well, and we’re thrilled that he decided to join us.

One night during that week we were chatting with Rolf about random programming problems and he recalled the story of a past interview whereby he was asked to implement a strlen() function in C that, when compiled, would not contain any conditional branches. He didn’t pose the problem as a challenge but Brandon, Zef, and I all found it intriguing and took a shot at solving it. Leave it to Rolf Rolles to reverse the interview process itself…

Spoiler alert: what follows are our independently created solutions.


Brandon’s Solution:


#include <stdio.h>
#define f(b) ((-b)>>31)&1
typedef unsigned int (*funcptr)(unsigned int x);
funcptr functable[2];
unsigned char *p;
unsigned int done(unsigned int x)
{
    return x;
}
unsigned int counter(unsigned int x)
{
    return(functable[f(*(p+x+1))](x+1));
}
int  main(int argc, char *argv[])
    unsigned int len;
    p = (unsigned char *)argv[argc-1];
    functable[0] = (funcptr)&done;
    functable[1] = counter;
    len = functable[f(*p)](0);
    printf("len is %u\n", len);
    return 0;
}


Zef’s Solution:


/*
*
* strlen without conditional branch
* compiles with -Wall -ansi
*/

#include <stdio.h>

int _gtfo(char *s);
int _str_len(char *s);
int (*f[])(char *s) = {_gtfo, _str_len};

int _gtfo(char *s)
{
    return -1; /* set to '0' to include trailing null */
}

int _str_len(char *s){
    char c = *s;
    return f[((c & 0x01))|
    ((c & 0x02) >> 1)|
    ((c & 0x04) >> 2)|
    ((c & 0x08) >> 3)|
    ((c & 0x10) >> 4)|
    ((c & 0x20) >> 5)|
    ((c & 0x40) >> 6)|
    ((c & 0x80) >> 7)](++s) +1 ;

}

int main(int argc, char *argv[])
{
    if(argc > 1 ) printf("strlen(\"%s\") = %d\n", argv[1], _str_len(argv[1]));
    return 0;
}

Zef’s description:

“So, my immediate thought was to use function pointers to ‘conditionally’ execute code without a conditional branch. There are two possible states for each member of a string when performing a ‘strlen’-type operation. ‘Terminator’ and ‘Not Terminator’. In this case the ‘Terminator’ for a C-string is ‘NULL’ (0×00). This of course is the only value with 0 bits set; by masking each bit in the 8 bit value and shifting to the lsb then combining the values with a ‘|’ operation, a binary state is created allowing for the indepedent execution of the two defined states ‘Terminator’ and ‘Not Terminator’”.


Aaron’s Solution:

As I admittedly suck at C, I approached the problem in straight assembly (I know, that’s cheating. And yes, this could be achieved with a rep scasb, but that’s just too easy). However, I was able to solve the problem in 27 bytes:


[BITS 32]

section .text

global _start

_start:
    pop eax
    pop eax
    xor eax, eax
    xor ebx, ebx
    pop esi

_continue:
    mov al, [esi]
    add al, 0xFF
    salc
    inc al
    lea ecx, [0x8048097+eax*4]
    jmp ecx
inc ebx
inc esi
jmp _continue
int 0x80

The three pops that occur within _start are to get access to argv[1] (the string to be measured, provided on the command line). The last pop esi puts a pointer to the string into the esi register.

The mov al, [esi] grabs a single byte off the string. Then, the add al, 0xFF is used to determine whether the byte is NULL or not. If the value is non-NULL, the add to the 8-bit register al will set the Carry flag. If it is NULL, it will not set the CF.

The next instruction is actually considered undocumented (even objdump shows the mnemonic as ‘bad’). What the salc instruction does is sets the al register to 0xFF if the Carry flag is set, otherwise it sets it to 0×00. This is the method I used to implement a binary state to determine if the character is NULL or not.

The inc al instruction then increments al, which was either 0xFF or 0×00. After the inc it will either be 0×00 or 0×01.

The lea ecx, [0x8048097+eax*4] instruction loads into ecx either the address 0×8048097 or 0x804809b. These addresses are significant and can be observed by objdump’ing the assembled binary:


$ objdump -d strlen_no_conditionals -M intel

strlen_no_conditionals: file format elf32-i386

Disassembly of section .text:

08048080 :
 8048080:       58                      pop    eax
 8048081:       58                      pop    eax
 8048082:       31 c0                   xor    eax,eax
 8048084:       31 db                   xor    ebx,ebx
 8048086:       5e                      pop    esi

08048087 :
 8048087:       8a 06                   mov    al,BYTE PTR [esi]
 8048089:       04 ff                   add    al,0xff
 804808b:       d6                      (bad)
 804808c:       fe c0                   inc    al
 804808e:       8d 0c 85 97 80 04 08    lea    ecx,[eax*4+0x8048097]
 8048095:       ff e1                   jmp    ecx
 8048097:       43                      inc    ebx
 8048098:       46                      inc    esi
 8048099:       eb ec                   jmp    8048087 
 804809b:       cd 80                   int    0x80
$

So, if the character is not NULL, the code will jmp ecx to 0×8048097 which increments the string length counter (ebx) and increments the string pointer (esi) and then branches unconditionally to _continue.

If the value was NULL, the jmp ecx will land directly at the int 0×80. As the size of the inc ebx and inc esi and jmp _continue is exactly 4 bytes, the lea instruction very conveniently can load either the address of the inc ebx or directly at the int 0×80, thus removing the need for any NOP-like instructions.

The last convenient optimization to note is that the int 0×80 will execute the syscall specified by the eax register. Well, because the result of the add/salc/inc condition will set eax to 1 only when a NULL is found, the int 0×80 will execute syscall #1 which on Linux is exit(). Additionally, the exit code is specified by the ebx register. That is why I used the ebx register as my counter to hold the string length. So, upon execution of the interrupt, the exit code will contain the length of the string as can be observed by running the assembled binary and inspecting the return value:


$ nasm strlen_no_conditionals.asm -f elf -o a.o
$ ld -o strlen_no_conditionals a.o
$ ./strlen_no_conditionals "ExodusIntel" ; echo $?
11
$ ./strlen_no_conditionals "should return 16" ; echo $?
16
$


Rolf’s Solution:

“Basically, the fundamental problem to overcome with this challenge is to ‘make a decision’ — that is to say, decide when to terminate the iteration upon reaching a NULL character — without using an explicit jcc-style conditional branch. A few minutes’ reflection upon this problem yields that we could use recursion into a function pointer table with 256 entries, where 255 of the entries increased some counter variable, and the entry at 0 terminates the procedure and returns the counter. In doing so, we have replaced all conditional jumps with one indexed, switch jump. Some further reflection provides the reduction of the table size from 256 entries down to two.”


typedef int (*ctr)(char *);
int func(char *);
int func_x(char *c) { return 1+func(c); }
int func_0(char *c) { return 0; }
ctr table[2] = { &func_0, &func_x };
int func(char *c) { return table[!!*c](c+1); }


If you’ve come up with an interesting approach, we’d love to see it. Feel free to leave a comment or some such.


Aaron Portnoy
@aaronportnoy

36 thoughts on “Reversing the Interview Process

  1. If a rep scasb is not considered as a branch instruction and we can cheat like Aaron by submitting an assembly solution (yes that’s a lot of conditions but gcc can’t produce rep scasb instruction even with th -Os option) then I’ve a solution in 18 bytes :P

    [BITS 32]

    section .text

    global _start

    _start:
    pop eax
    pop eax
    xor eax, eax
    xor ecx, ecx
    dec ecx
    pop edi
    repne scasb
    inc eax
    mov ebx, ecx
    not ebx
    dec ebx
    int 0×80

  2. Well that is in fact kind of cheating! :P

    And since you got to cheat, I get to be picky :D You don’t set the direction register, so your code could return something that’s not the length of the string!

    Jokes aside, I like your solution. It’s the simplest strlen possible really.

  3. 2 other **UGLY** solutions, first one use gcc variable pointers (not standard) second one is pure standard C and use a big switch and force the compiler to use an table.
    Both compiled with -O3 flag produce function without conditional branches (the first one use conditional mov but it could be replaced by non conditional instructions …)

    code : http://pastebin.com/WnjwbDzQ

    Ok this is ugly hacks that may not work with other compiler / architecture but it doesn’t use asm inline :P.

  4. 25 bytes with cld, so 24 bytes without cld. AFAIR windows clears the direction flag before the entry point, dunno about linux :p

    [BITS 32]

    section .text

    global _start

    _start:
    pop eax
    pop ebx
    pop esi
    xor ebx, ebx
    mov edx, _continue
    lea ecx, [edx+_gtfo-_continue]
    cld
    _continue:
    inc ebx
    lodsb
    test al, al
    cmovz edx, ecx
    jmp edx
    _gtfo:
    inc eax
    dec ebx
    int 0×80

    • Looks like 30 bytes by my count. What does objdump display on that?

      08048080 :
      8048080: 58 pop eax
      8048081: 5b pop ebx
      8048082: 5e pop esi
      8048083: 31 db xor ebx,ebx
      8048085: ba 91 80 04 08 mov edx,0×8048091
      804808a: 8d 8a 09 00 00 00 lea ecx,[edx+0x9]
      8048090: fc cld

      08048091 :
      8048091: 43 inc ebx
      8048092: ac lods al,BYTE PTR ds:[esi]
      8048093: 84 c0 test al,al
      8048095: 0f 44 d1 cmove edx,ecx
      8048098: ff e2 jmp edx

      0804809a :
      804809a: 40 inc eax
      804809b: 4b dec ebx
      804809c: cd 80 int 0×80

      • p@ubuntu:~/strlen$ objdump -d -M intel cmov

        cmov: file format elf32-i386

        Disassembly of section .text:

        08048060 :
        8048060: 58 pop eax
        8048061: 5b pop ebx
        8048062: 5e pop esi
        8048063: 31 db xor ebx,ebx
        8048065: ba 6e 80 04 08 mov edx,0x804806e
        804806a: 8d 4a 09 lea ecx,[edx+0x9]
        804806d: fc cld

        0804806e :
        804806e: 43 inc ebx
        804806f: ac lods al,BYTE PTR ds:[esi]
        8048070: 84 c0 test al,al
        8048072: 0f 44 d1 cmove edx,ecx
        8048075: ff e2 jmp edx

        08048077 :
        8048077: 40 inc eax
        8048078: 4b dec ebx
        8048079: cd 80 int 0×80

        so it’s 27, (or 26 without cld) — forgot to add 2 last bytes :o

  5. I decided to cheat in a different way: by reimplimenting jz! Although that’s more being a smartass than actually solving the problem, but still! :)

    Sorry about the formatting and stuff. Here’s a pastebin: http://pastebin.com/Z6MTgLTr

    Thanks for the awesome mental exercise! It was really fun trying to figure out how to do this without traditional methods. :)

    rude_strlen:
    %define JMP_TABLE dword [ebp-8]
    %define JMP_REG edi
    %define JMP_ONE dword [JMP_REG]
    %define JMP_ZERO dword [JMP_REG+4]

    %macro jayz 1
    test %1,%1 ; set/don’t set the zero flag
    pushf ; push EFLAGS onto the stack
    shr dword [esp],6 ; shift until the zeroflag is the end bit
    and dword [esp],1 ; get the zero flag
    pop %1 ; now it’s an offset to our jumptable!
    jmp dword [edi+%1*4] ; and now we have a jz instruction! :D
    %endmacro

    push ebp
    mov ebp,esp
    sub esp,8 ; for our jumptable. hee hee.

    push JMP_REG
    push esi

    xor eax,eax
    lea JMP_REG,JMP_TABLE
    mov JMP_ONE,str_one
    mov JMP_ZERO,str_zero
    add JMP_ONE,0×401000 ; I’m a windows person… sorry! :D
    add JMP_ZERO,0×401000
    mov esi,[ebp+8] ; string pointer!

    str_loop:
    movzx ecx,byte [esi]
    jayz ecx ; THIS JUMP CRAY

    str_one:
    inc eax
    inc esi
    jmp str_loop

    str_zero:
    pop esi
    pop edi
    add esp,8
    pop ebp
    retn 4

  6. In many C compilers your can store the address of some labels in an array, do this in a for/while loop, where you add one to an accumulator, and use goto to jump out of the loop when you are done. See the js0n JSO parser which gave me the inspiration. (it’s on github). This avoids stack consumption with recursive calls, though with optimization it may go away anyhow.

    • #include
      int main(int argc, char** argv) {
      char * string = argv[argc-1];
      void *states[] = {&&done, &&cont};
      int size = 0;

      while(1) {
      goto *states[!!*(string+size)];
      cont: ++size;
      }
      done:
      printf(“%s is %d bytes\n”, string, size);

      return 0;
      }

  7. Sleazy cheating in assembly language:

    strlen proc near
    mov esi, [esp+4]
    xor ecx, ecx
    @loop:
    lodsb
    inc ecx
    test al, al
    pushf
    pop ebx
    movzx ebx, bl
    and bl, 40h
    shr bl, 6
    jmp dword ptr tbl[ebx*4]
    @ret:
    mov eax, ecx
    dec eax
    retn 4
    tbl dd offset @loop, offset @ret
    strlen endp

  8. Can do it without recursion if you don’t mind breaking a few taboos…

    int mystrlen(const char *s) {
    void *my_labels[] = {&&loop, &&done};

    int result = -1;

    loop:
    ++result;
    goto *my_labels[*(s++) == ''];

    done:
    return result;
    }

    Works on GCC 4.6.3 on Ubuntu on x86-64. Not sure how portable that bool->int work is, but it consistently works as expected on Intel gear.

  9. My last comment was messed up by the blog software. The boolean test was written against a NULL byte, but the escaped zero got filtered somehow.

  10. From an anonymous contributor:

    “My personal goal here was to do it without an additional
    function call and not in asm.

    #include //for printf

    unsigned int no_cond_branch_strlen(char *s) {
    void *jmp_table[] = {&&end, &&increment};
    unsigned int length = 0;
    char num_ones;

    while(1){
    // calculate the number of ones in a byte
    num_ones = *(s+length) – ((*(s+length) >> 1) & ‘\x55′);
    num_ones = (num_ones & ‘\x33′) + ((num_ones >> 2) & ‘\x33′);
    num_ones = (num_ones + (num_ones >> 4) & ‘\x0f’);
    goto *jmp_table[ num_ones > 0 ];
    increment:
    length++;
    }
    end:
    return length;
    }

    int
    main(int argc, char **argv)
    {
    printf(“length of %s is %d\n”, argv[1], no_cond_branch_strlen(argv[1]));
    return 0;
    }

    This will only work for gcc. To make this work in msvc you’ll need to
    use inline asm to get the address of the labels via something like
    http://stackoverflow.com/questions/6421433/address-of-labels-msvc

  11. This is what I came up with. Written before reading any of your solutions. Fun challenge!

    int finish(const char *s) { return -1; }
    int length(const char *s) {
    int (*functions[])(const char*) = { length, finish };
    return 1 + functions[!*s](s + 1);
    }

  12. Pingback: strlen without conditionals « mlindgren.ca
  13. Here’s mine. It’s pretty much the same as the ones in the original post.

    #include

    int mystrlen(char *str, int count);
    int mystrlenret(char *str, int count);

    int (*funcs[2])(char *, int) = {mystrlen, mystrlenret};

    int mystrlen(char *str, int count)
    {
    return funcs[(int)(!*str)](++str, ++count);
    }

    int mystrlenret(char *str, int count)
    {
    return count – 1;
    }

    int main(int argc, char *argv[])
    {
    printf(“%i\n”, mystrlen(argv[argc - 1], 0));
    }

  14. It seems Rofl’s solution can be improved upon slightly: you can eliminate the need to negate *c twice by reversing the function pointer table:

    ctr table[2] = { &func_x, &func_0 };
    int func(char *c) { return table[!*c](c+1); }

    Nitpicky,I know, but I just thought I’d point it out anyway. :)

  15. This one is ugly, convoluted, inefficient, non-threadsafe, and only works on POSIX systems :-)

    #include
    #include
    #include
    #include

    static sigjmp_buf fpe_env;
    static void _fpe_handler(int signal, siginfo_t *w, void *uap) {
    siglongjmp(fpe_env, w->si_code);
    }

    int _done(const char * buffer, int * ptr_length) {
    int length = *ptr_length;
    free(ptr_length);
    return length;
    }

    int _not_done(const char * buffer, int * ptr_length) {
    struct sigaction act;
    act.sa_sigaction = _fpe_handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_SIGINFO;
    sigaction(SIGFPE, &act, NULL);
    int accum = 0;
    *ptr_length = 0;
    startloop:
    accum += 255 / *buffer;
    buffer++;
    (*ptr_length)++;
    goto startloop;
    }

    int strlen0(const char * buffer) {
    int * ptr_length = (int*)malloc(sizeof(int));
    int (*workers[])(const char *, int * ptr_length) = { _not_done, _done };
    return workers[sigsetjmp(fpe_env, 1)](buffer, ptr_length);
    }

    int main(int argc, char **argv) {
    if (argc != 2) {
    fprintf(stdout, “usage: %s STRING\n”, argv[0]);
    } else {
    const char * buffer = (const char *)argv[1];
    int length = strlen0(buffer);
    fprintf(stdout, “strlen(%s) = %d\n”, buffer, length);
    }
    return 0;
    }

  16. Looks like someone else thought to count by fail as well… Cheers.

    #include
    #include
    #include

    int count = 0;

    void weee(int signal)
    {
    printf(“count %i\n”,count);
    exit(0);
    }

    int main()
    {
    char* string =”This should be 17″;

    signal(SIGFPE,weee);

    bad:
    *string = *string/(*string++);
    count++;
    goto bad;

    return 42;
    }

  17. Tail-recursive version (i.e. the compiler replaces the call with an unconditional jump):

    typedef int (*ctr)(char *, int);
    static int func_0(char *c, int len) { return len-1; }

    static int func(char *c, int len)
    {
    static ctr table[2] = { &func_0, &func };
    return table[!!*c](c+1, len+1);
    }

    int str_len(char *c) { return func(c, 0); }

  18. This one involves C++ exception handling and compiled with MSVC.

    int div0_strlen(char* str)
    {
    char* str_end = str;

    __try {
    for(int trash;;*str_end++, trash = *str_end / *str_end) {}
    } __except (EXCEPTION_EXECUTE_HANDLER) {}

    return str_end – str;
    }

    int main(int argc, char* argv[])
    {
    printf(“len = %d\n”, div0_strlen(“test string”));

    return 0;
    }

  19. I think everyone has missed the point of the question. Every one of these solutions just find a way to trick the compiler (or CPU) to make the decision for you. Tail recursion and computed jumps are technicalities to avoid jcc or equivalent, as is evident when the compiler transforms your beautiful Ocaml style tail recursion back into a loop Rolf!

    The only real way to solve this without making a decision at any point is something like:

    int crazy_strlen(char *str) {
    int non_null = 1;
    int count = 0;

    non_null &= !!str[0]; count += non_null;
    non_null &= !!str[1]; count += non_null;
    non_null &= !!str[2]; count += non_null;
    non_null &= !!str[3]; count += non_null;

    non_null &= !!str[MAX_LEN]; count += non_null;

    return count;
    }

    Of course this is just because all the decisions (ie. max string length) have been made before compilation.

    • This will almost certainly segfault. Your code doesn’t stop counting at the NUL terminator; it will merrily trample onwards through the rest of the machine’s address space until it hits something it’s not allowed to read.

      • Correct. I’ve been spending most of my time in environments without memory protection hardware but my point stands. Besides there are ways around this eg. allocate a separate page for each string per page and set MAX_LEN to the page size.

        Every other solution has an explicit loop, just not actually using the branch instructions. If this is the criterion then the simplest solution on x86 is just “rep scasb” since there is no explicit branch or loop here, the loop is performed in microcode.

  20. Of course although the previous version is data storage optimal, it is not time optimal unless string length happens to equal MAX_LEN!

    If we take no “conditional branches” to mean no loops we can build some logic to perform this operation optimally (and introduce it as an instruction in our soft-core processor):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    – Calculate string length, fetching string from asynchronous memory
    entity strlen is
    port
    (
    clk : in std_logic;

    — Start strlen calculation
    start : in std_logic;
    — String memory address
    str_addr : in std_logic_vector(31 downto 0);

    – Memory
    mem_addr : out std_logic_vector(31 downto 0);
    mem_data : in std_logic_vector(7 downto 0);
    mem_read : out std_logic;

    — Result
    count : out std_logic_vector(31 downto 0);
    done : out std_logic
    );

    end entity strlen;

    architecture SYN of strlen is
    signal incr : unsigned(count’high downto count’low);
    begin

    count_proc : process (clk, start)
    variable count_u : unsigned(count’high downto count’low);
    variable mem_addr_u : unsigned(mem_addr’high downto mem_addr’low);
    begin
    – Note for timing reasons start should be synchronous with clk even though
    – it is used as async reset
    if start = ’1′ then
    count_u := (others => ’0′);
    done <= '0';
    incr <= to_unsigned(1, incr'length);
    mem_addr_u <= unsigned(str_addr);
    elsif rising_edge(clk) then
    incr(0) <= incr(0) and (mem_data(0) or mem_data(1) or mem_data(2) or mem_data(3) or
    mem_data(4) or mem_data(5) or mem_data(6) or mem_data(7));
    count_u := count_u + incr;
    end if;

    count <= std_logic_vector(count_u);
    mem_addr <= std_logic_vector(mem_addr_u);
    end process;

    done <= not incr(0);

    end architecture SYN;

  21. #include

    int return_cnt(char *s);
    int str_len1(char *s);
    int str_len(char *s);

    int (*fn[])() = {
    return_cnt,
    str_len1
    };
    int cnt = 0;

    int
    str_len1(char *s)
    {
    int c = !!*s;

    cnt += c;
    return fn[c](s+1);
    }

    int
    return_cnt(char *s)
    {
    return cnt;
    }

    int
    str_len(char *s)
    {
    int c = !!*s;
    cnt = c;

    return fn[c](s+1);
    }

    int main(int argc, char **argv)
    {
    if (argc > 1)
    argv++;

    printf(“%i\n”, str_len(*argv));
    exit(1);
    }

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s