Sunday, February 28, 2016

Tweetable Brainfuck Interpreter in C

Background

    Brainfuck is a turing complete esoteric programming language created by Urban Müller in 1993. The language is made up of only eight instructions that easily translate into C. A Hello World program could look something like:

    --[>--->->->++>-<<<<<-------]>--.>---------.>
    --..+++.>----.>+++++++++.<<.+++.------.<-.>>+.


    Urban’s primary goal was to create a language that could be compiled with the smallest possible compiler. Several small compilers has been developed over the years. The smallest compiler is less than 100 bytes of binary code.

    In addition to create small compiler binaries, it has become a sport to write interpreters in as few bytes of source code as possible, often referred to as code golfing. This article describes a tweetable implementation written in the C language.

    However, it may be good to tell up front, that it is only a tweetable method. Although fully functional in 140 characters of code, it needs to be called by a separate main method. Extending the method to a fully compilable program requires 151 characters of code which places the implementation among the shortest presented to date.


Language Overview

    A brainfuck program is a sequence of commands and an instruction pointer keeps track of the instruction to execute next. The program starts at the first instruction and terminates when the last instruction is executed. The language uses an array of memory cells that can be modified by the program. In addition the language supports reading data from standard in and writing data to standard out.

Program environment:
char cells[infinitely large] = { 0 };
char* cell_ptr = cells;

Brainfuck Command     C equivalent code
          > ++cell_ptr;
          < --cell_ptr;
          + ++*cell_ptr;
          - --*cell_ptr;
          . putchar(*cell_ptr);
          , *cell_ptr = getchar();
          [ while(*cell_ptr) {
          ] }


Golfing the Interpreter

    To start off, let's begin with a fairly standard brainfuck interpreter implementation:

    int m[30000], *d=m, t;
    void g(char *p) {
      for (; *p != 0 ; p++) {
        switch (*p) {
        case '>': ++d; break;
        case '<': --d; break;
        case '+': ++*d; break;
        case '-': --*d; break;
        case '.': putchar(*d); break;
        case ',': *d = getchar(); break;
        case '[':
          if (*d == 0) {
            for (t = 1; t != 0;) {
              ++p;
              if (*p == '[') ++t;
              else if (*p == ']') --t;
            }
          }
          break;
        case ']':
          for (t = 1; t != 0;) {
            --p;
            if (*p == '[') --t;
            else if (*p == ']') ++t;
          }
          --p;
          break;
        }
      }
    }


    There are multiple approaches to reduce the number of code characters. Some implementations use recursive calls to the method, some tries to eliminate / combine for loops, some use modulus on the ascii value of the instructions and look at bit patterns, and some use the question mark operator and implicit conversions from boolean expressions to integers.

    All these techniques are great for code golfing, although the implementation described here uses almost none of these. There are a few places where these techniques actually do help, but the main code size reduction is done in a different way.

    There are four groups of brainfuck commands, memory cell increment/decrement, cell pointer increment/decrement, input/output, and loops. An interesting observation is that the ascii codes of the two commands in each group differs by 2. The input and output needs to be handled by special casing, but we could implement the increment/decrement the same way, for example:

    if (*p == ‘>’) d++; else if (*p == ‘<’) d--;
    if (*p == 62) d++; else if (*p == 60) d--;
    d += (*p == 62); *d -= (*p ==60);
    d += (*p==62)-(*p==60);
    d += *p-60?*p==62:-1;
    d += 1/(61-*p);


    The last one has a caveat, and that is if *p == ‘=’. All characters that aren’t brainfuck commands are supposed to be ignored, but with ‘=’ we do get a division by zero. It is relatively easy to avoid this, but for the tweetable version we will just accept that programs containing ‘=’ characters won’t run.

    The interesting thing with the last example is that we can save some extra characters by assigning *p to a temporary variable and subtract 44 which is the ascii value of the input command ‘,’. In addition to making the check for an input command short, it will make one of our increment/decrement operations sort as well. Although simple, it turns out that these savings on the non loop commands are larger than any advanced modulus or bitmasking.

    Another saving can be made by combining the two inner for loops. This is actually quite straightforward. One for loop increments the cell pointer, and the other one decrements it. As we have an efficient way to compute increment/decrement, we can use it to determine the step to advance the cell pointer in the inner loop. If assign s to the cell pointer step and t to the nesting count we get something like:

    for(t=s=1/(t-48);*d?t>0:t;t+=1/(*p-92))p-=s;

    Note that we initialize the nesting counter t to -1 if we are searching forward, and 1 if we are searching backward. This is to allow us to use the short increment/decrement statement when updating the nesting count. The only caveat is the additional decrement at the end of one of the loops. But we can avoid that decrement by not incrementing the data pointer at the end of the outer loop in this case:

    p+=s<1

    A few characters can also be saved by using read() and write() instead of getchar() and putchar(). The saving comes from the fact that we can read or write 1 or 0 bytes using boolean to integer conversion in the method call, and also that we can embed a d+=... in one of the calls instead of having it as a separate statement.

    From a golfing perspective the C language evolution hasn’t been all that great. If we limit ourselves to C89, we can get rid of type declarations and instantly save some code characters. The types default to int. The brainfuck language doesn’t dictate the size of each cell, so we could without violating the language accept that the cell size is int.

    Lastly, but quite unfortunately, the code after all golfing isn’t tweetable with 30,000 memory cells, so the only thing left is to limit the number of data cells. Small brainfuck programs run fine with only 10 cells, so the last thing we’ll do is to reduce the number of cells from the recommended 30,000 to 10 and we are tweetable!

    m[9],*d=m,s,t;g(char*p){for(;t=*p;p+=s<1)
    for(t-=44,t?*d-=1/t,write(1,d+=1/(t-17),t==2),
    t=s=1/(t-48):read(0,d,1);*d?t>0:t;t+=1/(*p-92))p-=s;}


Compilable Program

    The easiest way to make a compilable program with the tweetable method is to implement a main method that calls the brainfuck method, e.g.:

    main(int a,char**b){g(b[1]);}

    But this adds a whopping 29 characters of code. If we limit ourselves to 32 bit computer systems we can merge the main method into the brainfuck method. By using implicit integer to pointer conversions we can get something reasonable compact:

    m[9],*d;main(s,t){char*p=1[d=t];for(d=m;t=*p;p+=s<1)
    for(t-=44,t?*d-=1/t,write(1,d+=1/(t-17),t==2),t=s=1/
    (t-48):read(0,d,1);*d?t>0:t;t+=1/(*p-92))p-=s;}


    The most interesting part here is probably the assignment of p. The arguments of main defaults to int types in C89, but we know the second argument of main is a pointer to a character array, where the first array holds the executable name, and the second the argument to the program. So by temporarily assigning d to the pointer value stored in t, we can index the second character array which holds the code. then we assign p using an implicit cast from int* to char*.

    It is of course also possible to remove all compiler and platform specific features and write a fully portable interpreter adhering to the initial language specification (including ignoring all non command characters). Here is one 171 character long version:

    char m[30000],*d=m,s;main(int t,char**p){for(++p;t=**p;
    *p+=s<1)for(t-=44,t?*d-=1/t,write(1,d+=t-17?0:1/(t-17)
    ,t==2),s=t=t-47?w==49:-!*d:read(0,d,1);t;t+=1/(**p-92))
    *p-=s;}


Conclusion

    As mentioned in the introduction the article do present a tweetable brainfuck method written in C, but falls short of being a complete program. Despite that it may be interesting reading for any code golf enthusiasts and perhaps it gives some ideas to someone to get all the way to a complete tweetable brainfuck interpreter program written in C.

    And finally, if you feel the urge to tweet, copy paste the code below and send it off to your friends:

    m[9],*d=m,s,t;g(char*p){for(;t=*p;p+=s<1)
    for(t-=44,t?*d-=1/t,write(1,d+=1/(t-17),t==2),
    t=s=1/(t-48):read(0,d,1);*d?t>0:t;t+=1/(*p-92))p-=s;}


Tuesday, January 20, 2015

ColecoVision Driving Module

Introduction

    ColecoVision became one of the major game consoles when released in 1982 and sold over two million units, mainly in North America. However, sales and popularity of the console dropped quickly in the big video game crash of 1983 and was withdrawn from the video game market in 1985. ColecoVision carried titles such as Donkey Kong, Smurf, and Zaxxon.

    The console was powered by an 8 bit Z80 processor running at 3.58MHz Z80 and had a whopping 1 kB RAM. The video processor had 16kB video ram providing 256 x 192 pixel resolution in 16 predefined colors. The sound was powered by a three channel square wave sound generator with an additional noise channel. The games were released on cartridges with up to 32 kB ROM.

    One of the more interesting aspects of the console was the hardware expandability through the Expansion Module interface. The first expansion module made the ColecoVision compatible with Atari 2600, enabling the biggest game library at the time. The second expansion module was a driving controller with a steering wheel and gas pedal, and other expansion modules followed.

    This article focuses on the driving controller, and the challenges using it in modern homebrew development projects. The steering wheel was first introduced with the game Turbo, and it was later used in a handful of games. Typically for the games in the 80’s is that these where one person games, and we’ll go into details on why and what actually can be done to allow two steering wheels to be used at the same time in a game.


Motivation

    The main reason for me learning about and solving the problems with the driving module came when me and Vincent van Dam decided to develop a new racing game for the ColecoVision with graphics help from Luc Miron. Our goal was to do something never done before on ColecoVision; a variation of the microcars game, including multi directional scrollers, real physics engines to model the cars, advanced music players, and of course support for the driving module allowing two players to compete. Many of the features we wanted had never been done on ColecoVision, so it was a quite ambitious goal. The hardware constraints of the console made the development hard and the game took over 2,000 hours to develop. The finished game was a game published by CollectorVision in 2011:



Driving Module Overview

    The steering wheel has a 2 bit rotary encoder with two concentric rings with contact surfaces and openings offset by 90 degrees as the picture below illustrates.



    The rings move as a person turns the steering wheel, metal plates at fixed position generates an ‘on’ signal when they touch the metal surface of the rings, and an ‘off’ signal otherwise. These signals can be read by software on I/O port $FC for the first steering wheel and I/O port $FF for the second steering wheel, allowing the software to read the angle of the rotary wheel as a 2 bit value:

Contact 1 Contact 2
    Angle
Off Off
    0° to 90°
Off On
    90° to 180°
On On
    180° to 270°
On Off
    270° to 360°


On the ColecoVision, contact 1 is mapped to bit 4 and contact 2 to bit 5.

Driving games typically want to know how much the user moved the steering wheel. A convenient measurement is a single integer value (called steeringCunter in the algorithms below) where a negative value means steering to the left, and a positive value steering to the right. A small value means steering a little bit and a large value steering a lot. We’ll discuss two approaches, one pure polling based, and one that is using interrupts that calculates the steeringCunter value.


Polling

    A straightforward way of maintaining the steeringCounter value is to initialize it to 0, then read the values of contact 1 and contact 2 at a periodic interval. The contact bits are Gray coded, which means that it is a binary counter where two adjacent codes differ by only one bit position. The algorithm for a polling function increases or decreases the steeringCounter value by one depending on the previous values of the contact bits and which contact changed since last read. If both contact bits has changed we don’t do anything as we can’t say if the wheel turned left or right.

    The code snippet below illustrates how the polling method could be implemented in a fairly efficient way utilizing the Gray coded binary counter.

  #define CONTACT_1 0x10
  #define CONTACT_2 0x20
  #define CONTACT_MASK (CONTACT_1 | CONTACT_2)

  void pollSteeringWheel() {
  int wheelCounter = 0;
  unsigned char lastState = 0;

  void pollSteeringWheel() {
    /* Read the current state of the steering wheel contacts */
    /* and compute the bits that changed since last call. */

    unsigned char newState = readIoPort(0xFC) & CONTACT_MASK;
    unsigned char bitsChanged = lastState ^ newState;
    /* Mirror bitsChanged if the high bit is set in lastState
    /* to allow a single test for increment and decrement. */

    if (lastState & CONTACT_2) {
      bitsChanged ^= CONTACT_MASK;
    }

    /* Increase or decrease wheelCounter if only one */
    /* bit has changed. */

    if (bitsChanged == CONTACT_1) {
      wheelCounter++;
    }
    if (bitsChanged == CONTACT_2) {
      wheelCounter--;
    }
    lastState = newState;
  }


    The code can easily be extended to support two steering wheel controllers. Just duplicate the code for the second steering wheel, reading from port $FF instead of port $FC.

    The drawback of using a polling method like the one above, is that it needs to be called at a frequency that is higher than the rotary wheel moves 90 degrees as the user moves the steering wheel. Although the code is not complicated it uses a fair bit of CPU on a ColecoVision system, and interleaving calls to the polling method is sometimes not that easy.


Interrupt Driven Approach

    To allow software to register steering wheel movements in real time, the first contact is connected to a non maskable interrupt, which means that when the first contact changes from ‘off’ to ‘on’ the cpu is interrupted and the software can read the bits at that time. We can use the interrupt and the state of the rotary encoder to implement a counter that measures how much the wheel has been rotated to the left or to the right.

    The first contact can turn from ‘off’ to ‘on’ in two places, either if the wheel is turned clockwise from ‘off’ to ‘on’ or counter clockwise from ‘off’ to ‘on’. By looking at the state of the second contact we can see if the wheel was turned clockwise or counter clockwise. The second contact is ‘on’ when the transition from ‘off’ to ‘on’ occurs on the first contact when we rotate clockwise. Similarly, the second contact is ‘off’ if we rotate counter clockwise.

    There are however two quite big problems with using interrupts on the steering wheel controllers.
  1. Both steering wheels share the same nonmaskable interrupt, so when supporting two steering wheels, it is not possible to tell which of them that caused the interrupt to be fired.
  2. The mechanical contacts can cause jitter if the steering wheel is positioned just at the edge of the connecting metal plates, causing multiple interrupts to be fired even though the wheel is not spinning.

Algorithm using Interrupts

    To address both these issues, we’ll implement a clock pulse for each steering wheel that is set in the interrupt service routine, and reset in a background polling task. When the interrupt is triggered, the program checks if the logical clock pulse. If the clock pulse is high, no action is taken. If the clock pulse is low, we know that the steering wheel has moved, and we update the counter based on the state of the second contact. The pseudo code below implements the algorithm for one of the steering wheel controller. The real code linked at the end of the article duplicates the code for the second controller both in the polling method and the interrupt service routine.

  #define CONTACT_1 0x10
  #define CONTACT_2 0x20

  unsigned char clockPulse = CONTACT_1;
  int wheelCounter = 0;

  void pollSteeringWheel() {
    /* Clear clock pulse if contact 1 is off. */
    clockPulse &= readIoPort(0xFC);
  }

  void interruptServiceRoutine() {
    /* Update counter for steering wheel if clock pulse */
    /* was reset in the polling routine. */
    if (clockPulse == 0) {
      /* Read contact 2 to check direction. */
      if (readIoPort(0xFC) & CONTACT_2) {
        wheelCounter++;
      } else {
        wheelCounter--;
      }
      /* Set clock pulse. */
      clockPulse = CONTACT_1;
    }
  }


    Using one clock pulse for each steering wheel (setting in the interrupt service routine and clearing it in the background) serves two purposes:
  1. It allows for two steering wheels to share a single interrupt
  2. It provides a pseudo low-pass filter that significantly reduces any jitter caused by the mechanical contacts.

    Although this approach also require polling, it has some advantages. First the polling interval can be twice as long as the pure polling based version. Second, the polling method is rather short compared to the one in the polling only version. Together this gives a solution that require much less CPU time.


Tuning

    To get a responsive input from the controllers, the background task needs to run around once every one millisecond. Running it too often causes more jitter, and running it less often reduces the responsiveness of the steering wheel. The algorithm is not that sensitive for some variance in when the background task is run, and occasional large delays between runs does not cause any large impact to the user. An easy way during development to see that the background task is called at the desired interval is to add a code snippet that changes the border color every time it is been called.

    Also, a game may want to downshift the wheelCounter values to get the desired sensitivity of the steering wheel.


Sample Code

    A Hitech C implementation (assembly code using c calling conventions) of the interrupt based algorithm can be downloaded here.


Tuesday, July 9, 2013

Rewinding CRC - Calculating CRC backwards

Background

    In two earlier articles (Calculating Reverse CRC and Reverse CRC Patch with Readable Characters) I've described algorithms to find a sequence of bytes and ASCII characters that will append to a buffer to change the CRC to a desired CRC.
    Some readers have said that it would be nice to be able to modify the data and apply a patch when the beginning of the buffer is unknown. This article describes a method to do just that, although it does require the final CRC of the original buffer to be known.


Problem

    The algorithms described in earlier articles require that the CRC at the beginning of the buffer is known. But suppose you have a message with an unknown beginning and you want to change a sequence in the buffer and apply a patch to make the mutated buffer end up with the same CRC as the original. For example changing:

    {unknown data} + {known buffer} -> Known CRC

to

    {same unknown data} + {modified buffer} + {patch} -> Same Known CRC

    In order for the previously described patch algorithms to work, you would need to know the CRC at the beginning of the known buffer to use the algorithms. This article describes an algorithm that starts at the end of a known buffer and calculates the CRC at an earlier point in the buffer.


Solution

    The solution is not that complicated but has a few caveats that we'll go into. The algorithm is basically the reverse of the normal CRC algorithm that looks like:

    newCrc = (oldCrc >> 8) ^ crc32Table[value ^ (oldCrc & 0xff)];

    Assume that the CRC at the current location in a buffer contains the octets [An Bn Cn Dn] where An is the high eight bits and Dn is the low eight bits. And the CRC before the last byte Vn was added to the CRC was [Ao Bo Co Do]. If we put these two CRCs into the equation and performs the shifts, we'll get:

    [An Bn Cn Dn] = [0 Ao Bo Co] ^ crcTable[Vn ^ Do]

    Note the 0 in the high byte of the old CRC. Looking at this it is quite clear that the high byte of the new/current CRC is equivalent to the high byte in the CRC table lookup. Knowing this we can easily look at all 256 CRC table indices and find the ones that matches the equation:

    An = crcTable[i] >> 24

Once we know a table index that matches the equation, we can calculate the old octet Do:

    Do = i ^ Vn

where Vn was the byte that we added to the old CRC to get the new CRC. This is however where the caveat comes in. There may be more than one table entry that satisfy the equation, which means that there will be multiple possible previous CRCs. The good news is that the standard CRC with the polynomial 04c11db7h and most other polynomials has exactly one table entry that do, so if using the standard CRC polynomial there is a single solution to the equation and thus only a single possible previous CRC.

    Let us continue. Once Do is calculated, it is easy to calculate the other three octets in the previous CRC, by inserting the found Dn into the equation:

    [0 Ao Bo Co] = [An Bn Cn Dn] ^ crcTable[Vn ^ Do]

    So to address the original problem where we wanted to find the CRC X number of bytes before the end, we'll apply this algorithm recursively on the known buffer starting at the last byte and the known CRC and end when we reach the beginning of the known buffer:

  for (int i = len(buffer) - 1; i >= 0; --i) {
    for (int j = 0; j < 256; ++j) {
      if ((crc32Table[j] >> 24) == (crc >> 24)) {
        crc = ((crc ^ crc32Table[j]) << 8) | (j ^ buffer[i]);
        break;
      }
    }
  }


    This gives us the CRC at the beginning of the known buffer, and we can apply the patch algorithms described in earlier articles.

    Although not that common, it may be worth looking at a solution to address the case where the polynomial doesn't have unique high octets. In this case we need to do a bit more work to find all solutions. We can use a recursive function call, but this may not scale very well, so the best option is to use a stack where we keep track of each found match in the CRC table.
    There are two pieces of information that needs to be pushed on the stack for each match; the current location in the buffer and the CRC at that location. Then we need to modify the program slightly to use the stack. The CRCs we found at the beginning of the known buffer is then stored in a list.

  std::list<UInt32> startCrcs;
  std::stack<match> stack;

  // Push last octet in the buffer to seed the loop.
  stack.push(new Match(size, INV(endCrc)));

  while (!stack.empty()) {
    Match* node = stack.top();
    stack.pop();
      if (node->offset == 0) {
      // Reached the beginning of the buffer; save the CRC.
      startCrcs.push_back(INV(node->crc));
    }
    else {
      // Find previous CRCs and push nodes to stack.
      for (int i = 0; i < 256; ++i) {
        if ((crc32Table[i] >> 24) == (node->crc >> 24)) {
          UInt32 prevCrc = ((node->crc ^ crc32Table[i]) << 8) |
                           (i ^ buffer[node->offset - 1]);
          stack.push(new Node(node->offset - 1, prevCrc));
        }
      }
    }
    delete node;
  }


Usage

    As mentioned, this algorithm may be useful if you want to back calculate CRC in a stream of data or in a frame when you know the final CRC in order to modify the data. There are some limitations though, one is that there may be multiple solutions depending on polynomial. The bigger limitation is that the algorithm requires that the CRC at the end (or at least at some point later than where the to calculate CRC at) which may be unknown.


Source

    I have added an implementation of the Rewind algorithm to the previous CRC package with some examples and test code. The source code can be found here.

Sunday, May 13, 2012

How to win (or not to win) IOCCC

Background

    The International Obfuscated C Code Contest, IOCCC, is a programming contest where the goal is to write the most obscure/obfuscated C program that shows the importance of programming style, stress the compilers and illustrates some of the subtleties of the C language. The competition started in 1984 and is now the longest running contest on the Internet. Through the years, the competition has produced amazing small programs. Man y of them are unique and fun, but there are also programs that solve the same problems as previous winners. One of the most popular is to generate small primes.

    This article talks about different ways to make a small primes program and reasons for why they didn’t win, followed by what eventually led to the 7th winning small primes program.


Simple Small Primes Program

    One of the entries I made to the competition is what may be the shortest small primes program with its 52 characters:

  l;main(O){for(;2-l||printf("%8d",O);l=O%--l?l:++O);}


    This program has some features of winning entries:
  • It is short and compact
  • uses hard to read variable names
  • Uses question mark operator
  • implicit type definitions
  • depends on order of Boolean operators
    This is a fairly good list of features that makes up a good obfuscated program. But it has one big flaw (apart from calculating primes), which is that the algorithm is quite easy to spot in the code. Anything that looks for modulus of one variable by a decreasing or increasing second variable is most likely a prime number generator. To win over the previous winners the program really need to be a bit more creative in hiding what it does, or do it in a more interesting way than easy to spot brute force.


Disguising the Algorithm

    A simple modification to the program is to make the brute force approach a bit less revealing. One way is to look at what makes a prime. A prime number p is a number greater than 1 that satisfies the equation with only integer numbers:

          a * b = p    iff:   a = 1 or b = 1

A simple substitution of a and b where

          a = i + j
          b = i – j

gives us the following equation:

          (i + j) * (i – j) = i2 – j2 = p    iff:   i – j = 1

    With this equation we can be a bit more creative. We still need to find an integer solution to the equation and one approach is to iterate over x and y until we find two numbers x and y that satisfy the equation above. Knowing that the derivate of n2 is 2n, we can find the squares iteratively where

          12 = 1
          d1 = 1
          dn = dn - 1 + 2
          n2 = (n-1)2 + dn

    The nice thing with this iteration is that we can use only additions and subtractions to find whether a number is a prime. We need two derivate variables in our iteration, say x and y to test the potential prime number. An interesting observation from the initial algorithm is that we actually don’t need to keep track of both squared values as we are only interested in iterating until we find two values of x and y that gives us an integer solution to the equation. Once the integer solution is found, we can check if x – y = 2, which means that p must be a prime.

    We still need one variable to hold the accumulated value of the squares, so we introduce a variable z. Each step in the iteration for finding will increase z by x or decrease z by y depending on whether z is greater or less than 0. We have an integer solution to the problem once z is 0. The iterative algorithm is implemented in the following obfuscated program:

  x,y,z;main(p){for(;;z+=!z?x-y-2?p:printf(
      "%8d",p),x=y=1,p+=2:z>0?y-=2:-(x-=2));}

    This program has many of the characteristics of the first implementation, but it also hides the obvious modulus that killed the first version. For most people it would be pretty hard to see that this version calculates prime numbers, but for anyone skilled in hyperbolic functions, the program still gives enough hints on what it’s doing, and since the program is trying to compete in a category where the IOCCC judges explicitly mention that it is much harder to win with a small primes program, its simply not good enough.


Be Creative in the Implementation

    Although the last program may have been too trivial, it does open up for several interesting implementations as it the problem is solved by simple additions and subtractions. One interesting feature of C is the pre-processor. A prime number generator using the pre-processor has already won IOCCC, so it may be a long shot going that route. That particular entry did hide the math in several pre-processor macros, where defined variable holds some value needed for the calculation. In fact it is very easy to find the actual test for a prime number as it is limited to one line doing a modulus test.

    Since we have a simple algorithm, we could use it to do something less obvious using the pre-processor. What if we didn’t assign any values to the variables we use? We could let a defined variable represent a 1 bit, and if the variable is not defined, it is a 0. This way we can use 10 variable names to represent a 10 bit number.

    We could also use recursive includes, where we have a state variable indicating what one pass of the program code does. For example we could exercise part of the program to do an addition, and a second to do subtraction etc.

    In order to do an addition of two numbers with these constraints, we need to have a one bit adder, that generates a carry. This is really not that hard. Assume we use x and y as the bits, c, as carry, and place the output back in x and c, we can do this in the pre-processor as:

  #ifdef x
  #ifdef y
  #ifndef c
  #define c
  #endif
  #else
  #ifdef c
  #undef x
  #endif
  #endif
  #else
  #ifdef y
  #ifndef c
  #define x
  #endif
  #else
  #ifdef c
  #undef c
  #endif
  #endif
  #endif

    Then we can use the one bit adder to add our 10 bit values A0-A9 and B0-B9 by first defining x and y for each of the bits in the two values and include the program recursively to exercise the code above, like:

  #ifdef A0
  #define x
  #else
  #undef x
  #endif
  #ifdef B0
  #define y
  #else
  #undef y
  #endif
  #include "program.c"

    And then do the same for for A1-A9 and B1-B9. We can do similar for subtractions. Furthermore, we can save the value of the last carry and use it as an overflow bit. Then we can leverage that for testing if a value is greater than another. Finally we need some similar logic to run the actual algorithm by including and exercising the right portions of the code. The whole program can be found at the IOCCC who won page.

    What makes this a winner compared to the other examples is probably that this really pushes the compilers to the limit. Not a single compiler was able to calculate prime numbers up to 1024 at the time of the competition. Linus Torvalds did work hard to fix one compiler within 12 hours after the source code was released to the public, but that is really an exception. Most compilers still don’t handle the program. The number of recursive includes are over 6.8 million, and the amount of memory required by the compiler is significant. It is also extremely slow; it takes hours to compute just a few hundred prime numbers. Furthermore, the actual pre-processed code is many megabytes and the only actual code, is one print statement for each prime number.


Conclusion

    To win IOCCC with something that is not unique or new is a challenge, but it is possible to find a new approach that is interesting enough, even with something as common as a prime number generator. In general though, it is much easier to win with something new, but even then, some fairly clever and interesting techniques need to be present. The good thing is that it is possible to improve an idea and reach something worthy by looking at alternatives and some out of the box thinking, just as these examples show. And the improvements can be introduces iterative and doesn’t need to be thought out in advance.

Happy coding.

Sunday, January 29, 2012

Finding Reverse CRC Patch with Readable Characters

Background

    In the article Calculating Reverse CRC I described an algorithm to create a four byte patch sequence that can be inserted into a message to give the message a desired CRC. A reader asked if and how the algorithm could be adopted to return a patch sequence with only ASCII characters. This article details how to extend the original algorithm to achieve this.


Problem

    As described in the original article, one of the properties of CRC is that you can find and append a four byte sequence to any message and get any desired resulting CRC hash.

    For a given input CRC and desired CRC, we can easily check if the four patch bytes are ASCII characters. The problem is that for many combinations of input CRC and desired CRC the byte sequence includes non-ASCII characters. The problem is how we address the case when we don’t get ASCII characters in our patch sequence.


Observations

    Before we go into the solution, we can make some observations that may be beneficial. The binary patch sequence is four bytes, or 32 bits. As we want to limit the patch sequence to ASCII characters, each patch byte need to be limited to characters 0-9, A-Z, and a-z. The number of bits we can encode with these characters are roughly 6 (actually 62 different character values, but close enough to 6 bits for our needs). Using four characters would only give us 24 bits of information and we can’t represent all possible patch bytes. By adding two additional characters to our patch sequence, we will get 36 bits which would be sufficient.


Solution

    The problem is that the CRC algorithm and the reverse CRC algorithm are working on 8 bit values, and can’t be changed. So we need to find a solution around how we could add the extra two patch characters without changing the nature of the original CRC algorithm.

    This can fortunately be done relatively easy. Assume that our patch sequence is six characters represented by the names A-F:

     ABCDEF

    We know the initial CRC (before the patch bytes are added to the CRC) as well as the desired CRC. We also know that the reverse algorithm operates on four characters. We can use this knowledge to break the problem into two by introducing a temporary CRC.

    If we pick two characters for the substring AB and then append these characters to the original CRC we get a new CRC value. The new CRC value can then be used in our original algorithm to get the patch bytes CDEF. If we pick the two characters AB carefully, we can get patch bytes CDEF that are only ASCII.

    So the problem is now reduced to finding two characters AB that satisfy the requirements that the patch sequence CDEF only contains ASCII characters.

    This problem can be tough to solve, but as we are only trying to find two characters which means that there are only 62 * 62 = 3844 possible solutions. This is a very small number for trial and error on a computer, and the best approach is to do a brute force trial to find these characters, as shown in the following pseudo code:


  for (char a in ValidCharset) {
    for (char b in ValidCharset) {
      Crc32 crc32(msgCrc); // Temporary CRC object
      crc32.append(a);
      crc32.append(b);
      if (ContainsValidChars(crc32.findReverse(desiredCrc)) {
        // Return the found patch bytes....
      }
    }
  }


Usage

    Finding a patch sequence with a limited character set that generates a desired CRC can be useful to for example finding weak passwords in systems that use CRC for encryption. Other uses may include cases where you want to CRC any message but for some reason have constraints on what the generated CRC can be and the only option for patching is to use printable characters.


Source Code

    The complete source code of a CRC implementation that includes a method for finding a six readable character patch sequence for the reverse CRC, as well as some test cases can be found here.