Sunday, January 29, 2012

Finding Reverse CRC Patch with Readable Characters

Background

    In the article Calculating Reverse CRC I described an algorithm to create a four byte patch sequence that can be inserted into a message to give the message a desired CRC. A reader asked if and how the algorithm could be adopted to return a patch sequence with only ASCII characters. This article details how to extend the original algorithm to achieve this.


Problem

    As described in the original article, one of the properties of CRC is that you can find and append a four byte sequence to any message and get any desired resulting CRC hash.

    For a given input CRC and desired CRC, we can easily check if the four patch bytes are ASCII characters. The problem is that for many combinations of input CRC and desired CRC the byte sequence includes non-ASCII characters. The problem is how we address the case when we don’t get ASCII characters in our patch sequence.


Observations

    Before we go into the solution, we can make some observations that may be beneficial. The binary patch sequence is four bytes, or 32 bits. As we want to limit the patch sequence to ASCII characters, each patch byte need to be limited to characters 0-9, A-Z, and a-z. The number of bits we can encode with these characters are roughly 6 (actually 62 different character values, but close enough to 6 bits for our needs). Using four characters would only give us 24 bits of information and we can’t represent all possible patch bytes. By adding two additional characters to our patch sequence, we will get 36 bits which would be sufficient.


Solution

    The problem is that the CRC algorithm and the reverse CRC algorithm are working on 8 bit values, and can’t be changed. So we need to find a solution around how we could add the extra two patch characters without changing the nature of the original CRC algorithm.

    This can fortunately be done relatively easy. Assume that our patch sequence is six characters represented by the names A-F:

     ABCDEF

    We know the initial CRC (before the patch bytes are added to the CRC) as well as the desired CRC. We also know that the reverse algorithm operates on four characters. We can use this knowledge to break the problem into two by introducing a temporary CRC.

    If we pick two characters for the substring AB and then append these characters to the original CRC we get a new CRC value. The new CRC value can then be used in our original algorithm to get the patch bytes CDEF. If we pick the two characters AB carefully, we can get patch bytes CDEF that are only ASCII.

    So the problem is now reduced to finding two characters AB that satisfy the requirements that the patch sequence CDEF only contains ASCII characters.

    This problem can be tough to solve, but as we are only trying to find two characters which means that there are only 62 * 62 = 3844 possible solutions. This is a very small number for trial and error on a computer, and the best approach is to do a brute force trial to find these characters, as shown in the following pseudo code:


  for (char a in ValidCharset) {
    for (char b in ValidCharset) {
      Crc32 crc32(msgCrc); // Temporary CRC object
      crc32.append(a);
      crc32.append(b);
      if (ContainsValidChars(crc32.findReverse(desiredCrc)) {
        // Return the found patch bytes....
      }
    }
  }


Usage

    Finding a patch sequence with a limited character set that generates a desired CRC can be useful to for example finding weak passwords in systems that use CRC for encryption. Other uses may include cases where you want to CRC any message but for some reason have constraints on what the generated CRC can be and the only option for patching is to use printable characters.


Source Code

    The complete source code of a CRC implementation that includes a method for finding a six readable character patch sequence for the reverse CRC, as well as some test cases can be found here.


18 comments:

  1. This is an awesome article, thank you. I do however have a question ..

    I may be misunderstanding the implementation so please excuse me.

    I have a string : abcd - that has a CRC32 hash of 0xccc6120d

    testManyAscii returns, as expected 100 results.

    but the string returned cannot be used in place of the abcd string I had.

    Is this down to the poly in place being different, or am i using the implementation incorrectly?

    ReplyDelete
  2. I added the method to set up the tables with a different polynomial because I had a feeling that it may be needed for you. I tried quickly to find any info on the web but didn't find anything. I'm sure with some more searching it will come up. Maybe there are some other minor differences as well (e.g. the xor on the crc before new data is added.)

    ReplyDelete
    Replies
    1. Hi, thank you so much for the response and also the addition of the Poly Method :)
      Having done a fair bit of reading about the topic, I have found that it is the first and last integers aren't XOR'd for my purposes.
      I am facing the challenge of getting this into C# as well as finding the poly that i should be using, so it's a lot to take in at the moment.

      Thank you again for such an interesting article :)

      Delete
    2. Thanks :) Let me know how it goes.

      Delete
  3. Hi,
    I feel really bad for asking as I really appreciate what you have applied in this article, but I am struggling with altering the method to not XOR the first and last CRC32 integers .. can you offer a pointer at all?

    ReplyDelete
    Replies
    1. It looks like the CRC value is not inverted in PST CRC. so remove all the ~ characters in the file (which inverts the CRC) and initialize crc_ to 0 (the latter is probably not needed for just reversing CRC). Not sure if you tried that already.

      Delete
  4. Hello, Thank you so much for this, I am learning a great deal from the code, and having a great time doing it. You will have to excuse my ignorance here, I genuinely apologise in advance (or is it too late? :) )
    Initialising crc_ to 0 instead of 0xffffffff was tried earlier, I have left that in place.
    Removing the ~ from the file stepped me out of my knowledge comfort zone. IT threw an error around the Destructor function stating that the function already has a body.

    ReplyDelete
    Replies
    1. If you have a password and expected CRC I can try to see how the algorithm needs to be modified. I didn't find any documentation that described it detailed enough to be sure what needs to be done, but if I can see some expected output it may help to get you on the right track.

      Delete
    2. Thanks!! I've been playing around a fair bit today, but the results have varied significantly.

      An example would be :
      Password: abcd
      CRC: 0xccc6120d

      Delete
    3. Sorry for dome delay. I updated the zip file with the option to not invert the CRC. You need to comment out the define in the beginning of the Crc32.cpp file. When you do, you get the result you want from your example. Hope this helps.

      Delete
  5. Hi, there is absolutely no need for you to apologise!! You're helping me enormously!!
    I'm trying it at the moment to see if the ASCII strings can be substituted in place of abcd and it isn't working .. for example ... one of the strings returned is from the CRC 0xccc6120d is Bze8PW .... So in an attempt to understand how things go together, I applied the Bze9PW string as a password and it came back with 0x98C67913 as it's CRC hash .. is it me being stupid ?

    ReplyDelete
    Replies
    1. Try to replace main() with the following (and comment out the INVERSE_CRC define:

      #include <string.h>
      int main(void) {
        Crc32 crc32;
        char startStr[] = "abcd";
        crc32.append(startStr, strlen(startStr));
        UInt32 crcOrig =   crc32.get();
        printf("CRC of: %s = 0x%.8x\n", startStr, crcOrig);

        crc32.set(0); // Reset CRC
        const char* patch = crc32.findReverseAscii(crcOrig);
        printf("Patch for 0x%.8x: %s\n", crcOrig, patch);

        crc32.set(0); // Reset CRC
        crc32.append(patch, strlen(patch));
        UInt32 crcPatch = crc32.get();
        printf("CRC of patch: %s = 0x%.8x\n", patch, crcPatch);
      }

      This should output:

      CRC of: abcd = 0xccc6120d
      Patch for 0xccc6120d: 1sUE2k
      CRC of patch: 1sUE2k = 0xccc6120d

      Let me know if you get the same result. If you use the same Crc32 object (as the example above) you need to reset the CRC between calculations.

      Delete
    2. I cannot find any other way to say this than you are a wonderful human being.
      You have taught me so very much. a HUGE thank you.

      Delete
    3. Hi Daniel!
      THANKS A TON! For the guide as well as the source code in C. I haven't used it yet, but I was pleasantly surprised by your prompt response! I will try this and get back to you once I am able to implement this in my work.
      Thanks again! You're really helpful!

      Delete
  6. Thanks a lot. I tried code and working well.
    But it works only for ASCII string. What to do with UNICODE string?

    ReplyDelete
    Replies
    1. One question is what the desired behavior is, e.g. perhaps allowing characters of a particular language. Another is what type of encoding to use, e.g. UTF-8, UTF-16 etc. What use case did you have in mind?

      Delete
  7. These are the most common benefits that you get by installing an effective data recovery software. Although, you will get several other benefits by installing the software for data recovery, but the above mentioned are the most common that you will surely get by installing them. winzip activation code

    ReplyDelete