As you may know, we recently brought Rolf Rolles on board the team here at Exodus. We all met at our Austin office and Rolf spent a week working alongside us. Our interview process doesn’t consist of contrived questions intended to observe the interviewee’s capacity for mental acrobatics. Traditionally, when we bring someone in for consideration we are already familiar with their past work and skillset. What we are more interested in is evaluating their capacity to work as part of our team. So, Rolf spent his time auditing code and writing some instrumentation tools for some of the problems we were facing at the time. It went very well, and we’re thrilled that he decided to join us.
One night during that week we were chatting with Rolf about random programming problems and he recalled the story of a past interview whereby he was asked to implement a strlen() function in C that, when compiled, would not contain any conditional branches. He didn’t pose the problem as a challenge but Brandon, Zef, and I all found it intriguing and took a shot at solving it. Leave it to Rolf Rolles to reverse the interview process itself…
Spoiler alert: what follows are our independently created solutions.
Brandon’s Solution:
#include <stdio.h>
#define f(b) ((-b)>>31)&1
typedef unsigned int (*funcptr)(unsigned int x);
funcptr functable[2];
unsigned char *p;
unsigned int done(unsigned int x)
{
return x;
}
unsigned int counter(unsigned int x)
{
return(functable[f(*(p+x+1))](x+1));
}
int main(int argc, char *argv[])
unsigned int len;
p = (unsigned char *)argv[argc-1];
functable[0] = (funcptr)&done;
functable[1] = counter;
len = functable[f(*p)](0);
printf("len is %u\n", len);
return 0;
}
Zef’s Solution:
/*
*
* strlen without conditional branch
* compiles with -Wall -ansi
*/
#include <stdio.h>
int _gtfo(char *s);
int _str_len(char *s);
int (*f[])(char *s) = {_gtfo, _str_len};
int _gtfo(char *s)
{
return -1; /* set to '0' to include trailing null */
}
int _str_len(char *s){
char c = *s;
return f[((c & 0x01))|
((c & 0x02) >> 1)|
((c & 0x04) >> 2)|
((c & 0x08) >> 3)|
((c & 0x10) >> 4)|
((c & 0x20) >> 5)|
((c & 0x40) >> 6)|
((c & 0x80) >> 7)](++s) +1 ;
}
int main(int argc, char *argv[])
{
if(argc > 1 ) printf("strlen(\"%s\") = %d\n", argv[1], _str_len(argv[1]));
return 0;
}
Zef’s description:
“So, my immediate thought was to use function pointers to ‘conditionally’ execute code without a conditional branch. There are two possible states for each member of a string when performing a ‘strlen’-type operation. ‘Terminator’ and ‘Not Terminator’. In this case the ‘Terminator’ for a C-string is ‘NULL’ (0×00). This of course is the only value with 0 bits set; by masking each bit in the 8 bit value and shifting to the lsb then combining the values with a ‘|’ operation, a binary state is created allowing for the indepedent execution of the two defined states ‘Terminator’ and ‘Not Terminator’”.
Aaron’s Solution:
As I admittedly suck at C, I approached the problem in straight assembly (I know, that’s cheating. And yes, this could be achieved with a rep scasb, but that’s just too easy). However, I was able to solve the problem in 27 bytes:
[BITS 32]
section .text
global _start
_start:
pop eax
pop eax
xor eax, eax
xor ebx, ebx
pop esi
_continue:
mov al, [esi]
add al, 0xFF
salc
inc al
lea ecx, [0x8048097+eax*4]
jmp ecx
inc ebx
inc esi
jmp _continue
int 0x80
The three pops that occur within _start are to get access to argv[1] (the string to be measured, provided on the command line). The last pop esi puts a pointer to the string into the esi register.
The mov al, [esi] grabs a single byte off the string. Then, the add al, 0xFF is used to determine whether the byte is NULL or not. If the value is non-NULL, the add to the 8-bit register al will set the Carry flag. If it is NULL, it will not set the CF.
The next instruction is actually considered undocumented (even objdump shows the mnemonic as ‘bad’). What the salc instruction does is sets the al register to 0xFF if the Carry flag is set, otherwise it sets it to 0×00. This is the method I used to implement a binary state to determine if the character is NULL or not.
The inc al instruction then increments al, which was either 0xFF or 0×00. After the inc it will either be 0×00 or 0×01.
The lea ecx, [0x8048097+eax*4] instruction loads into ecx either the address 0×8048097 or 0x804809b. These addresses are significant and can be observed by objdump’ing the assembled binary:
$ objdump -d strlen_no_conditionals -M intel
strlen_no_conditionals: file format elf32-i386
Disassembly of section .text:
08048080 :
8048080: 58 pop eax
8048081: 58 pop eax
8048082: 31 c0 xor eax,eax
8048084: 31 db xor ebx,ebx
8048086: 5e pop esi
08048087 :
8048087: 8a 06 mov al,BYTE PTR [esi]
8048089: 04 ff add al,0xff
804808b: d6 (bad)
804808c: fe c0 inc al
804808e: 8d 0c 85 97 80 04 08 lea ecx,[eax*4+0x8048097]
8048095: ff e1 jmp ecx
8048097: 43 inc ebx
8048098: 46 inc esi
8048099: eb ec jmp 8048087
804809b: cd 80 int 0x80
$
So, if the character is not NULL, the code will jmp ecx to 0×8048097 which increments the string length counter (ebx) and increments the string pointer (esi) and then branches unconditionally to _continue.
If the value was NULL, the jmp ecx will land directly at the int 0×80. As the size of the inc ebx and inc esi and jmp _continue is exactly 4 bytes, the lea instruction very conveniently can load either the address of the inc ebx or directly at the int 0×80, thus removing the need for any NOP-like instructions.
The last convenient optimization to note is that the int 0×80 will execute the syscall specified by the eax register. Well, because the result of the add/salc/inc condition will set eax to 1 only when a NULL is found, the int 0×80 will execute syscall #1 which on Linux is exit(). Additionally, the exit code is specified by the ebx register. That is why I used the ebx register as my counter to hold the string length. So, upon execution of the interrupt, the exit code will contain the length of the string as can be observed by running the assembled binary and inspecting the return value:
$ nasm strlen_no_conditionals.asm -f elf -o a.o
$ ld -o strlen_no_conditionals a.o
$ ./strlen_no_conditionals "ExodusIntel" ; echo $?
11
$ ./strlen_no_conditionals "should return 16" ; echo $?
16
$
Rolf’s Solution:
“Basically, the fundamental problem to overcome with this challenge is to ‘make a decision’ — that is to say, decide when to terminate the iteration upon reaching a NULL character — without using an explicit jcc-style conditional branch. A few minutes’ reflection upon this problem yields that we could use recursion into a function pointer table with 256 entries, where 255 of the entries increased some counter variable, and the entry at 0 terminates the procedure and returns the counter. In doing so, we have replaced all conditional jumps with one indexed, switch jump. Some further reflection provides the reduction of the table size from 256 entries down to two.”
typedef int (*ctr)(char *);
int func(char *);
int func_x(char *c) { return 1+func(c); }
int func_0(char *c) { return 0; }
ctr table[2] = { &func_0, &func_x };
int func(char *c) { return table[!!*c](c+1); }
If you’ve come up with an interesting approach, we’d love to see it. Feel free to leave a comment or some such.
–
Aaron Portnoy
@aaronportnoy
If a rep scasb is not considered as a branch instruction and we can cheat like Aaron by submitting an assembly solution (yes that’s a lot of conditions but gcc can’t produce rep scasb instruction even with th -Os option) then I’ve a solution in 18 bytes :P
[BITS 32]
section .text
global _start
_start:
pop eax
pop eax
xor eax, eax
xor ecx, ecx
dec ecx
pop edi
repne scasb
inc eax
mov ebx, ecx
not ebx
dec ebx
int 0×80
Well that is in fact kind of cheating! :P
And since you got to cheat, I get to be picky :D You don’t set the direction register, so your code could return something that’s not the length of the string!
Jokes aside, I like your solution. It’s the simplest strlen possible really.
I had a… different sort of idea. Forgive my terrible rusty C. https://gist.github.com/3746130
Only conditional jump generated is the assert() in main. Pretty sure this only has a hope of compiling on unix-likes with gcc, but otherwise it should work. :)
2 other **UGLY** solutions, first one use gcc variable pointers (not standard) second one is pure standard C and use a big switch and force the compiler to use an table.
Both compiled with -O3 flag produce function without conditional branches (the first one use conditional mov but it could be replaced by non conditional instructions …)
code : http://pastebin.com/WnjwbDzQ
Ok this is ugly hacks that may not work with other compiler / architecture but it doesn’t use asm inline :P.
25 bytes with cld, so 24 bytes without cld. AFAIR windows clears the direction flag before the entry point, dunno about linux :p
[BITS 32]
section .text
global _start
_start:
pop eax
pop ebx
pop esi
xor ebx, ebx
mov edx, _continue
lea ecx, [edx+_gtfo-_continue]
cld
_continue:
inc ebx
lodsb
test al, al
cmovz edx, ecx
jmp edx
_gtfo:
inc eax
dec ebx
int 0×80
Looks like 30 bytes by my count. What does objdump display on that?
08048080 :
8048080: 58 pop eax
8048081: 5b pop ebx
8048082: 5e pop esi
8048083: 31 db xor ebx,ebx
8048085: ba 91 80 04 08 mov edx,0×8048091
804808a: 8d 8a 09 00 00 00 lea ecx,[edx+0x9]
8048090: fc cld
08048091 :
8048091: 43 inc ebx
8048092: ac lods al,BYTE PTR ds:[esi]
8048093: 84 c0 test al,al
8048095: 0f 44 d1 cmove edx,ecx
8048098: ff e2 jmp edx
0804809a :
804809a: 40 inc eax
804809b: 4b dec ebx
804809c: cd 80 int 0×80
p@ubuntu:~/strlen$ objdump -d -M intel cmov
cmov: file format elf32-i386
Disassembly of section .text:
08048060 :
8048060: 58 pop eax
8048061: 5b pop ebx
8048062: 5e pop esi
8048063: 31 db xor ebx,ebx
8048065: ba 6e 80 04 08 mov edx,0x804806e
804806a: 8d 4a 09 lea ecx,[edx+0x9]
804806d: fc cld
0804806e :
804806e: 43 inc ebx
804806f: ac lods al,BYTE PTR ds:[esi]
8048070: 84 c0 test al,al
8048072: 0f 44 d1 cmove edx,ecx
8048075: ff e2 jmp edx
08048077 :
8048077: 40 inc eax
8048078: 4b dec ebx
8048079: cd 80 int 0×80
so it’s 27, (or 26 without cld) — forgot to add 2 last bytes :o
I think Aaron forgot to add the last two bytes too, so it’s 29 instead of 27.
What version of nasm do you have? NASM version 2.09.08 compiled on Apr 30 2011
I decided to cheat in a different way: by reimplimenting jz! Although that’s more being a smartass than actually solving the problem, but still! :)
Sorry about the formatting and stuff. Here’s a pastebin: http://pastebin.com/Z6MTgLTr
Thanks for the awesome mental exercise! It was really fun trying to figure out how to do this without traditional methods. :)
rude_strlen:
%define JMP_TABLE dword [ebp-8]
%define JMP_REG edi
%define JMP_ONE dword [JMP_REG]
%define JMP_ZERO dword [JMP_REG+4]
%macro jayz 1
test %1,%1 ; set/don’t set the zero flag
pushf ; push EFLAGS onto the stack
shr dword [esp],6 ; shift until the zeroflag is the end bit
and dword [esp],1 ; get the zero flag
pop %1 ; now it’s an offset to our jumptable!
jmp dword [edi+%1*4] ; and now we have a jz instruction! :D
%endmacro
push ebp
mov ebp,esp
sub esp,8 ; for our jumptable. hee hee.
push JMP_REG
push esi
xor eax,eax
lea JMP_REG,JMP_TABLE
mov JMP_ONE,str_one
mov JMP_ZERO,str_zero
add JMP_ONE,0×401000 ; I’m a windows person… sorry! :D
add JMP_ZERO,0×401000
mov esi,[ebp+8] ; string pointer!
str_loop:
movzx ecx,byte [esi]
jayz ecx ; THIS JUMP CRAY
str_one:
inc eax
inc esi
jmp str_loop
str_zero:
pop esi
pop edi
add esp,8
pop ebp
retn 4
In many C compilers your can store the address of some labels in an array, do this in a for/while loop, where you add one to an accumulator, and use goto to jump out of the loop when you are done. See the js0n JSO parser which gave me the inspiration. (it’s on github). This avoids stack consumption with recursive calls, though with optimization it may go away anyhow.
#include
int main(int argc, char** argv) {
char * string = argv[argc-1];
void *states[] = {&&done, &&cont};
int size = 0;
while(1) {
goto *states[!!*(string+size)];
cont: ++size;
}
done:
printf(“%s is %d bytes\n”, string, size);
return 0;
}
Sleazy cheating in assembly language:
strlen proc near
mov esi, [esp+4]
xor ecx, ecx
@loop:
lodsb
inc ecx
test al, al
pushf
pop ebx
movzx ebx, bl
and bl, 40h
shr bl, 6
jmp dword ptr tbl[ebx*4]
@ret:
mov eax, ecx
dec eax
retn 4
tbl dd offset @loop, offset @ret
strlen endp
Can do it without recursion if you don’t mind breaking a few taboos…
int mystrlen(const char *s) {
void *my_labels[] = {&&loop, &&done};
int result = -1;
loop:
++result;
goto *my_labels[*(s++) == ''];
done:
return result;
}
Works on GCC 4.6.3 on Ubuntu on x86-64. Not sure how portable that bool->int work is, but it consistently works as expected on Intel gear.
My last comment was messed up by the blog software. The boolean test was written against a NULL byte, but the escaped zero got filtered somehow.
From an anonymous contributor:
“My personal goal here was to do it without an additional
function call and not in asm.
#include //for printf
unsigned int no_cond_branch_strlen(char *s) {
void *jmp_table[] = {&&end, &&increment};
unsigned int length = 0;
char num_ones;
while(1){
// calculate the number of ones in a byte
num_ones = *(s+length) – ((*(s+length) >> 1) & ‘\x55′);
num_ones = (num_ones & ‘\x33′) + ((num_ones >> 2) & ‘\x33′);
num_ones = (num_ones + (num_ones >> 4) & ‘\x0f’);
goto *jmp_table[ num_ones > 0 ];
increment:
length++;
}
end:
return length;
}
int
main(int argc, char **argv)
{
printf(“length of %s is %d\n”, argv[1], no_cond_branch_strlen(argv[1]));
return 0;
}
This will only work for gcc. To make this work in msvc you’ll need to
use inline asm to get the address of the labels via something like
http://stackoverflow.com/questions/6421433/address-of-labels-msvc“
This is what I came up with. Written before reading any of your solutions. Fun challenge!
int finish(const char *s) { return -1; }
int length(const char *s) {
int (*functions[])(const char*) = { length, finish };
return 1 + functions[!*s](s + 1);
}
Quick clarification – Would bool result = a < b; be considered a conditional? I mean it is a comparision and not a conditional.
Pingback: strlen without conditionals « mlindgren.ca
Here’s mine. It’s pretty much the same as the ones in the original post.
#include
int mystrlen(char *str, int count);
int mystrlenret(char *str, int count);
int (*funcs[2])(char *, int) = {mystrlen, mystrlenret};
int mystrlen(char *str, int count)
{
return funcs[(int)(!*str)](++str, ++count);
}
int mystrlenret(char *str, int count)
{
return count – 1;
}
int main(int argc, char *argv[])
{
printf(“%i\n”, mystrlen(argv[argc - 1], 0));
}
int strlen(char *str) { int n = 0; return (*str && (n = strlen(str + 1) + 1), n); }
Oops. That will most likely still compile to a branch.
Doesn’t using ‘!!’ effectively introduce conditional branches in the compiled code?
It seems Rofl’s solution can be improved upon slightly: you can eliminate the need to negate *c twice by reversing the function pointer table:
ctr table[2] = { &func_x, &func_0 };
int func(char *c) { return table[!*c](c+1); }
Nitpicky,I know, but I just thought I’d point it out anyway. :)
This one is ugly, convoluted, inefficient, non-threadsafe, and only works on POSIX systems :-)
#include
#include
#include
#include
static sigjmp_buf fpe_env;
static void _fpe_handler(int signal, siginfo_t *w, void *uap) {
siglongjmp(fpe_env, w->si_code);
}
int _done(const char * buffer, int * ptr_length) {
int length = *ptr_length;
free(ptr_length);
return length;
}
int _not_done(const char * buffer, int * ptr_length) {
struct sigaction act;
act.sa_sigaction = _fpe_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
sigaction(SIGFPE, &act, NULL);
int accum = 0;
*ptr_length = 0;
startloop:
accum += 255 / *buffer;
buffer++;
(*ptr_length)++;
goto startloop;
}
int strlen0(const char * buffer) {
int * ptr_length = (int*)malloc(sizeof(int));
int (*workers[])(const char *, int * ptr_length) = { _not_done, _done };
return workers[sigsetjmp(fpe_env, 1)](buffer, ptr_length);
}
int main(int argc, char **argv) {
if (argc != 2) {
fprintf(stdout, “usage: %s STRING\n”, argv[0]);
} else {
const char * buffer = (const char *)argv[1];
int length = strlen0(buffer);
fprintf(stdout, “strlen(%s) = %d\n”, buffer, length);
}
return 0;
}
If someone asked me to implement a strlen() function in C that, when compiled, would not contain any conditional branches, I’d promptly hire a subcontractor. :-)
Looks like someone else thought to count by fail as well… Cheers.
#include
#include
#include
int count = 0;
void weee(int signal)
{
printf(“count %i\n”,count);
exit(0);
}
int main()
{
char* string =”This should be 17″;
signal(SIGFPE,weee);
bad:
*string = *string/(*string++);
count++;
goto bad;
return 42;
}
Tail-recursive version (i.e. the compiler replaces the call with an unconditional jump):
typedef int (*ctr)(char *, int);
static int func_0(char *c, int len) { return len-1; }
static int func(char *c, int len)
{
static ctr table[2] = { &func_0, &func };
return table[!!*c](c+1, len+1);
}
int str_len(char *c) { return func(c, 0); }
This one involves C++ exception handling and compiled with MSVC.
int div0_strlen(char* str)
{
char* str_end = str;
__try {
for(int trash;;*str_end++, trash = *str_end / *str_end) {}
} __except (EXCEPTION_EXECUTE_HANDLER) {}
return str_end – str;
}
int main(int argc, char* argv[])
{
printf(“len = %d\n”, div0_strlen(“test string”));
return 0;
}
I think everyone has missed the point of the question. Every one of these solutions just find a way to trick the compiler (or CPU) to make the decision for you. Tail recursion and computed jumps are technicalities to avoid jcc or equivalent, as is evident when the compiler transforms your beautiful Ocaml style tail recursion back into a loop Rolf!
The only real way to solve this without making a decision at any point is something like:
int crazy_strlen(char *str) {
int non_null = 1;
int count = 0;
non_null &= !!str[0]; count += non_null;
non_null &= !!str[1]; count += non_null;
non_null &= !!str[2]; count += non_null;
non_null &= !!str[3]; count += non_null;
…
non_null &= !!str[MAX_LEN]; count += non_null;
return count;
}
Of course this is just because all the decisions (ie. max string length) have been made before compilation.
This will almost certainly segfault. Your code doesn’t stop counting at the NUL terminator; it will merrily trample onwards through the rest of the machine’s address space until it hits something it’s not allowed to read.
Correct. I’ve been spending most of my time in environments without memory protection hardware but my point stands. Besides there are ways around this eg. allocate a separate page for each string per page and set MAX_LEN to the page size.
Every other solution has an explicit loop, just not actually using the branch instructions. If this is the criterion then the simplest solution on x86 is just “rep scasb” since there is no explicit branch or loop here, the loop is performed in microcode.
(Also observe that the result is incorrect: at least one byte after the terminating NUL is very likely to be truthy.)
? After the first NUL non_null is 0 and therefore count will not change again.
Of course although the previous version is data storage optimal, it is not time optimal unless string length happens to equal MAX_LEN!
If we take no “conditional branches” to mean no loops we can build some logic to perform this operation optimally (and introduce it as an instruction in our soft-core processor):
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
– Calculate string length, fetching string from asynchronous memory
entity strlen is
port
(
clk : in std_logic;
— Start strlen calculation
start : in std_logic;
— String memory address
str_addr : in std_logic_vector(31 downto 0);
– Memory
mem_addr : out std_logic_vector(31 downto 0);
mem_data : in std_logic_vector(7 downto 0);
mem_read : out std_logic;
— Result
count : out std_logic_vector(31 downto 0);
done : out std_logic
);
end entity strlen;
architecture SYN of strlen is
signal incr : unsigned(count’high downto count’low);
begin
count_proc : process (clk, start)
variable count_u : unsigned(count’high downto count’low);
variable mem_addr_u : unsigned(mem_addr’high downto mem_addr’low);
begin
– Note for timing reasons start should be synchronous with clk even though
– it is used as async reset
if start = ’1′ then
count_u := (others => ’0′);
done <= '0';
incr <= to_unsigned(1, incr'length);
mem_addr_u <= unsigned(str_addr);
elsif rising_edge(clk) then
incr(0) <= incr(0) and (mem_data(0) or mem_data(1) or mem_data(2) or mem_data(3) or
mem_data(4) or mem_data(5) or mem_data(6) or mem_data(7));
count_u := count_u + incr;
end if;
count <= std_logic_vector(count_u);
mem_addr <= std_logic_vector(mem_addr_u);
end process;
done <= not incr(0);
end architecture SYN;