Sunday, February 28, 2016

Tweetable Brainfuck Interpreter in C

Background

    Brainfuck is a turing complete esoteric programming language created by Urban Müller in 1993. The language is made up of only eight instructions that easily translate into C. A Hello World program could look something like:

    --[>--->->->++>-<<<<<-------]>--.>---------.>
    --..+++.>----.>+++++++++.<<.+++.------.<-.>>+.


    Urban’s primary goal was to create a language that could be compiled with the smallest possible compiler. Several small compilers has been developed over the years. The smallest compiler is less than 100 bytes of binary code.

    In addition to create small compiler binaries, it has become a sport to write interpreters in as few bytes of source code as possible, often referred to as code golfing. This article describes a tweetable implementation written in the C language.

    However, it may be good to tell up front, that it is only a tweetable method. Although fully functional in 140 characters of code, it needs to be called by a separate main method. Extending the method to a fully compilable program requires 151 characters of code which places the implementation among the shortest presented to date.


Language Overview

    A brainfuck program is a sequence of commands and an instruction pointer keeps track of the instruction to execute next. The program starts at the first instruction and terminates when the last instruction is executed. The language uses an array of memory cells that can be modified by the program. In addition the language supports reading data from standard in and writing data to standard out.

Program environment:
char cells[infinitely large] = { 0 };
char* cell_ptr = cells;

Brainfuck Command     C equivalent code
          > ++cell_ptr;
          < --cell_ptr;
          + ++*cell_ptr;
          - --*cell_ptr;
          . putchar(*cell_ptr);
          , *cell_ptr = getchar();
          [ while(*cell_ptr) {
          ] }


Golfing the Interpreter

    To start off, let's begin with a fairly standard brainfuck interpreter implementation:

    int m[30000], *d=m, t;
    void g(char *p) {
      for (; *p != 0 ; p++) {
        switch (*p) {
        case '>': ++d; break;
        case '<': --d; break;
        case '+': ++*d; break;
        case '-': --*d; break;
        case '.': putchar(*d); break;
        case ',': *d = getchar(); break;
        case '[':
          if (*d == 0) {
            for (t = 1; t != 0;) {
              ++p;
              if (*p == '[') ++t;
              else if (*p == ']') --t;
            }
          }
          break;
        case ']':
          for (t = 1; t != 0;) {
            --p;
            if (*p == '[') --t;
            else if (*p == ']') ++t;
          }
          --p;
          break;
        }
      }
    }


    There are multiple approaches to reduce the number of code characters. Some implementations use recursive calls to the method, some tries to eliminate / combine for loops, some use modulus on the ascii value of the instructions and look at bit patterns, and some use the question mark operator and implicit conversions from boolean expressions to integers.

    All these techniques are great for code golfing, although the implementation described here uses almost none of these. There are a few places where these techniques actually do help, but the main code size reduction is done in a different way.

    There are four groups of brainfuck commands, memory cell increment/decrement, cell pointer increment/decrement, input/output, and loops. An interesting observation is that the ascii codes of the two commands in each group differs by 2. The input and output needs to be handled by special casing, but we could implement the increment/decrement the same way, for example:

    if (*p == ‘>’) d++; else if (*p == ‘<’) d--;
    if (*p == 62) d++; else if (*p == 60) d--;
    d += (*p == 62); *d -= (*p ==60);
    d += (*p==62)-(*p==60);
    d += *p-60?*p==62:-1;
    d += 1/(61-*p);


    The last one has a caveat, and that is if *p == ‘=’. All characters that aren’t brainfuck commands are supposed to be ignored, but with ‘=’ we do get a division by zero. It is relatively easy to avoid this, but for the tweetable version we will just accept that programs containing ‘=’ characters won’t run.

    The interesting thing with the last example is that we can save some extra characters by assigning *p to a temporary variable and subtract 44 which is the ascii value of the input command ‘,’. In addition to making the check for an input command short, it will make one of our increment/decrement operations sort as well. Although simple, it turns out that these savings on the non loop commands are larger than any advanced modulus or bitmasking.

    Another saving can be made by combining the two inner for loops. This is actually quite straightforward. One for loop increments the cell pointer, and the other one decrements it. As we have an efficient way to compute increment/decrement, we can use it to determine the step to advance the cell pointer in the inner loop. If assign s to the cell pointer step and t to the nesting count we get something like:

    for(t=s=1/(t-48);*d?t>0:t;t+=1/(*p-92))p-=s;

    Note that we initialize the nesting counter t to -1 if we are searching forward, and 1 if we are searching backward. This is to allow us to use the short increment/decrement statement when updating the nesting count. The only caveat is the additional decrement at the end of one of the loops. But we can avoid that decrement by not incrementing the data pointer at the end of the outer loop in this case:

    p+=s<1

    A few characters can also be saved by using read() and write() instead of getchar() and putchar(). The saving comes from the fact that we can read or write 1 or 0 bytes using boolean to integer conversion in the method call, and also that we can embed a d+=... in one of the calls instead of having it as a separate statement.

    From a golfing perspective the C language evolution hasn’t been all that great. If we limit ourselves to C89, we can get rid of type declarations and instantly save some code characters. The types default to int. The brainfuck language doesn’t dictate the size of each cell, so we could without violating the language accept that the cell size is int.

    Lastly, but quite unfortunately, the code after all golfing isn’t tweetable with 30,000 memory cells, so the only thing left is to limit the number of data cells. Small brainfuck programs run fine with only 10 cells, so the last thing we’ll do is to reduce the number of cells from the recommended 30,000 to 10 and we are tweetable!

    m[9],*d=m,s,t;g(char*p){for(;t=*p;p+=s<1)
    for(t-=44,t?*d-=1/t,write(1,d+=1/(t-17),t==2),
    t=s=1/(t-48):read(0,d,1);*d?t>0:t;t+=1/(*p-92))p-=s;}


Compilable Program

    The easiest way to make a compilable program with the tweetable method is to implement a main method that calls the brainfuck method, e.g.:

    main(int a,char**b){g(b[1]);}

    But this adds a whopping 29 characters of code. If we limit ourselves to 32 bit computer systems we can merge the main method into the brainfuck method. By using implicit integer to pointer conversions we can get something reasonable compact:

    m[9],*d;main(s,t){char*p=1[d=t];for(d=m;t=*p;p+=s<1)
    for(t-=44,t?*d-=1/t,write(1,d+=1/(t-17),t==2),t=s=1/
    (t-48):read(0,d,1);*d?t>0:t;t+=1/(*p-92))p-=s;}


    The most interesting part here is probably the assignment of p. The arguments of main defaults to int types in C89, but we know the second argument of main is a pointer to a character array, where the first array holds the executable name, and the second the argument to the program. So by temporarily assigning d to the pointer value stored in t, we can index the second character array which holds the code. then we assign p using an implicit cast from int* to char*.

    It is of course also possible to remove all compiler and platform specific features and write a fully portable interpreter adhering to the initial language specification (including ignoring all non command characters). Here is one 171 character long version:

    char m[30000],*d=m,s;main(int t,char**p){for(++p;t=**p;
    *p+=s<1)for(t-=44,t?*d-=1/t,write(1,d+=t-17?0:1/(t-17)
    ,t==2),s=t=t-47?w==49:-!*d:read(0,d,1);t;t+=1/(**p-92))
    *p-=s;}


Conclusion

    As mentioned in the introduction the article do present a tweetable brainfuck method written in C, but falls short of being a complete program. Despite that it may be interesting reading for any code golf enthusiasts and perhaps it gives some ideas to someone to get all the way to a complete tweetable brainfuck interpreter program written in C.

    And finally, if you feel the urge to tweet, copy paste the code below and send it off to your friends:

    m[9],*d=m,s,t;g(char*p){for(;t=*p;p+=s<1)
    for(t-=44,t?*d-=1/t,write(1,d+=1/(t-17),t==2),
    t=s=1/(t-48):read(0,d,1);*d?t>0:t;t+=1/(*p-92))p-=s;}


No comments:

Post a Comment