Wednesday, October 20, 2010

Calculating Reverse CRC

Background

    Cyclic Redundancy Check is a commonly used method for performing integrity checks on data. It is a simple hash function that provides assurance that data is not corrupted and is widely used in data communication and data storage applications.

    CRC is not suitable for protecting against intentional modification of data. It is a cryptographically weak algorithm and can easily be reversed. This article describes a simple algorithm to generate a four byte data sequence that can be added to a message to make the resulting CRC any value you want.


CRC algorithm

    The mathematics behind CRC is described in many articles on the internet and I will not go into any details about the theory. The algorithm to calculate a ‘reverse CRC’ described here is based on the 32-bit polynomial, CRC-32-IEEE, most commonly used by standards bodies, but can easily be adapted to other CRC types.

    The algorithm described in this article uses a table driven CRC implementation. A CRC lookup table contains 256 pre-calculated values, derived from the CRC polynomial. When the CRC of a message is calculated, each byte in the message modifies a 32 bit hash:

     crc = (crc >> 8) ^ crc32Table_[*ptr++ ^ (crc & 0xff)];

    One small thing to note is that the final CRC hash is the inverse of the calculated hash. A typical method to calculate CRC of a message or to append more data to a CRC would look something like:


  UInt32 append(const void* buffer, UInt32 size, UInt32 oldCRC)
  {
     const UInt8* ptr = (const UInt8*)buffer;
     UInt32 crc = ~oldCrc;

     while (size--) {
        crc = (crc >> 8) ^ crc32Table_[*ptr++ ^ (crc & 0xff)];
     }

     UInt32 newCrc = ~crc;
     return newCrc;
  }


Reversing CRC

    One of the properties of CRC is that you can find and append a four byte sequence to any message and get any desired resulting CRC hash. This means that a message can be patched with a four byte sequence in order to get any hash you want.

    Let us define our desired CRC as AABBCCDD where AA is the most significant byte and DD is the least significant byte. For each byte in the message, the CRC algorithm shifts the hash one byte to the right and xor it with a value from the lookup table. This means that in each step a new byte is added to the most significant byte in the hash, and in consecutive calculations the byte is shifted to less significant positions. This behavior can be used to quite easily create a sequence of bytes that will result in our desired CRC.

    The table below shows how the CRC algorithm transforms the hash into the desired CRC. The xx values are reminders of the CRC of the message preceding the patch bytes.

        CRC after 1st patch byte   = DD3 xx xx xx
        CRC after 2nd patch byte  = CC2 DD2 xx xx
        CRC after 3rd patch byte  = BB1 CC1 DD1 xx
        CRC after 4th patch byte   = AA BB CC DD


Getting more into concrete details of the algorithm, we can easily see that the byte AA is the high byte of a CRC table lookup value xored with 0. The CRC table is called T in the following equations.


      AA = ( T[ i3 ] ^ 0 ) >> 24


What we need to achieve is to find the appropriate index i3 that makes this equation true. This can be rewritten as a Boolean expression

      ( AABBCCDD ^ T[ i3 ]) >> 24 == 0, where 0 <= i3 < 256


    So the challenge is to find the index i3 that satisfies the expression and this shouldn’t be a hard challenge for anyone with basic skills in programming. Although it is not really needed to spell out the whole hash, AABBCCDD, in the expression above, it allows us to simplify successive calculations.

Moving forward to the second highest byte, we can unroll the CRC algorithm to get


      BB = ( ( ( BB1 ^ 0 ) >> 8 ) ^ T[ i3 ] ) >> 16


Where BB1 is calculated the same way as AA, i.e.


      BB1 = ( T[ i2 ] ^ 0 ) >> 24


Now we can create a Boolean expression for calculating the table index for the 3rd patch byte. By putting the equations together and rearrange the shifts a bit, we can see that the index of the table lookup for calculating BB1 needs to satisfy the Boolean expression


      (((AABBCCDD ^ T[ i3 ]) << 8) ^ T[ i2 ]) >> 24 == 0, where 0 <= i2 , i3 < 256


Note that the value BB is fetched from the full desired CRC value and shifted up to become most significant byte, just like AA was the most significant byte in the expression to calculate i3.

Similarly the table indexes for calculating the first and second byte and we four nice equations to solve:


          (AABBCCDD ^ T[ i3 ]) >> 24 == 0
        (((AABBCCDD ^ T[ i3 ]) << 8) ^ T[ i2 ]) >> 24 == 0
      (((((AABBCCDD ^ T[ i3 ]) << 8) ^ T[ i2 ]) << 8) ^ T[ i1 ]) >> 24 == 0
  (((((((AABBCCDD ^ T[ i3 ] ) << 8) ^ T[ i2 ]) << 8) ^ T[ i1 ]) << 8) ^ T[ i0 ]) >> 24 == 0


   So by finding the indexes of the table entries, it is possible to end up with the desired CRC. Solving these equations are easy, just do a reverse lookup on the CRC table. As you see, the first equation gets the table index for the last patch byte, and the value can be used to calculate the earlier table indexes.
    The code below shows the algorithm in practice. The calculated indexes are stored in the tableIdx array. Note that the desired CRC is inverted before starting the algorithm. The reason is that the CRC algorithm operates with inverted values as described in the beginning of the article.


  int tableIdx[4];

  UInt32 iterCrc = ~desiredCrc;
  for (int j = 3; j >= 0; j--) {
     for (int i = 0; i < 256; i++) {
        if (((iterCrc ^ crc32Table_[i]) >> 24) == 0) {
           tableIdx[j] = i;
           iterCrc = (iterCrc ^ crc32Table_[i]) << 8;
           break;
        }
     }
  }


    If performance is a concernt, the inner for loop can easily be converted into a table lookup. The code would then look like


  int tableIdx[4];

  UInt32 iterCrc = ~desiredCrc;
  for (int j = 3; j >= 0; j--) {
     tableIdx[j] = crc32RevTable_[iterCrc];
     iterCrc = (iterCrc ^ crc32Table_[i]) << 8;
  }


    So far we haven’t looked at either the CRC of the message we are patching, nor the actual values of the patch bytes. None of these values were needed to find the table indexes. The next step in the algorithm is to find the values that generate the indexes needed to produce the final desired CRC. This is done by starting the normal CRC algorithm from the end of the original message and appending the four patch bytes. In each step, we need to calculate the actual patch byte before calculating the CRC. Since we know what table index to expect, we simply need to make sure that

  crc32Table_[tableIdx] == crc32Table_[patch ^ (crc & 0xff)]

for each patch byte. So with msgCrc being the CRC hash of the message, we get the following code to finally find the four patch bytes:


  UInt32 crc = ~msgCrc;
  for (int j = 0; j < 4; j++) {
     patch[j] = (crc ^ tableIdx[j]) & 0xff;
     crc = (crc >> 8) ^ crc32Table_[patch[j] ^ (crc & 0xff)];
  }


Usage

    Reverse CRC may be useful when a message needs to be modified but the CRC of the message cannot be changed. The message can then be patched as desired, and a four byte patch block will be inserted to correct changes in CRC.

    Other uses may include cases where you want to CRC any message but for some reason have constraints on what the generated CRC can be. I’ve seen cases where this actually was needed and the best solution was to add a patch to the message CRC to force the final CRC to be within a certain range.


Source Code

    The complete source code of a CRC implementation that includes a method for finding a four byte patch for the reverse CRC, as well as some test cases can be found here.


26 comments:

  1. nice, here we have a crc32 patch byte calc, even you can predefinie the patchbyte positions.

    http://www.xs4all.nl/~itsme/projects/sources/crc.c

    I would like to adopt this code to crc16 but I am unable :( could you help me?

    ReplyDelete
  2. It should be pretty simple to port to crc16. In the code you referred to, you need to change the target CRC calculation to

    crcTarget=fileCRC(f, origCRC, fileLength, offset+2);

    and then modify calcBytes() to calculate only two bytes (remove indexes 2 and 3 and change the shift from 24 to 8) and update the table calculations in initcrc() to initialize tables of shors instad of longs based on the CRC polynom you want (there are a couple of commonly used ones, check e.g. wikipedia)

    You can also modify the code posted here to calculate the bytes, just change the reverse lookup to loop through 2 bytes and change the shift from 24 to 8.

    Hope this helps

    ReplyDelete
  3. Is your CRC reverse method valid for checksums created using a really long string(say, of 100 characters)?

    ReplyDelete
  4. Yes it should be. The algorithm starts with the CRC of the message and you can get that using a standard CRC algorithm, and then it appends four bytes to get the CRC you want.

    As the first comment suggests, there are cases where you want to have other positions than the end for your CRC. Then the program needs to be patched, but it should be pretty straight forward.

    ReplyDelete
  5. Thanks for the reply Daniel.
    Well I have a question in mind.
    Suppose I have a CRC32 checksum, and I have a string which was used to create that CRC32 checksum. But I also do NOT have a portion of they string(called the key), which was used to create the CRC32 checksum.
    So let's say the string(minus the key) is like this:
    abcdef000000001.00mnopxyz
    The key is also somewhere in the string which was used to create the final CRC32.
    I wanted to know, suppose I wish to interchange that 00000001.00 part to 00001000.00, keeping the rest of the string the same, is it possible to somehow keep the CRC32 value(checksum) the same as before, by using your method?
    In short, is your technique of reversing the CRC32 valid if a part of the string is not known at all?

    ReplyDelete
  6. If you want to keep the same final CRC you have two options:

    1. Add four patch bytes somewhere in the string after the change (either in the middle of the string or at the end)

    2. Replace four bytes in the original string after the change with four patch bytes

    So with your example you can do e.g. (first two are adding patch bytes and the last replacing):

    abcdef00001000.00mnopxyzXXXX
    abcdef00001000.00mnoXXXXpxyz
    abcdef00001000.00XXXXxyz

    All you need to do is calculate the CRC of the original message at the location just after the patch bytes are inserted/replaced, then use the algorithm to generate the patch bytes.

    If you do this you'll keep the original CRC.

    Its not possible though to keep the original CRC of the message by only making your change. Four patch bytes somewhere after the change is needed.

    ReplyDelete
  7. Okay so I need to calculate the CRC of the original message at the location just after the patch bytes are inserted/replaced before using the algorithm.
    That means, I calculate the CRC of abcdef000000001.00 and then proceed as you have done.
    So the fact that I have no idea about a part of the string(which is probably at the end) doesn't hamper my efforts, right? (Remember that original string+that hidden'key' part was used to create the CRC32 checksum)
    Sorry for sonding dumb, but this hidden 'key' is creating too much problems for me!

    ReplyDelete
  8. Indeed, you begin by calculating the CRC of
    abcdef000000001.00
    assuming that you can insert four patch bytes after your new key, i.e.
    abcdef00001000.00XXXX

    Then you calculate the CRC for the new part (excluding the patch bytes).

    When you have these two CRC's you apply the algorithm to calculate the patch bytes. Use the original CRC as the desired CRC, and the CRC from the modified string as the input.

    Finally you put the entire string together and you have your new string:

    abcdef00001000.00XXXXmnopxyz

    The new string will have the same CRC as the original string.

    Hope this helps.

    ReplyDelete
  9. I'm new to C++ and would like to use your code to generate a CRC32 from a number, and subsequently reverse the CRC32 back to the original number. Would you mind demonstrating how I would go about this? Thanks very much.

    ReplyDelete
  10. It sounds like you are interested in encrypting numbers and then decrypting them. A CRC is a hash function that basically creates a checksum of the data. There is no way to revert back to the original data from a CRC.

    You may want to look at e.g. http://en.wikipedia.org/wiki/Format-preserving_encryption

    or any block ciphers to get what you need.

    ReplyDelete
  11. @Daniel- thanks a ton! I could do it finally. Though in one of the cases where I didn't know the initial parts of the string, I couldn't, as I could never find out the original CRC upto that point where I want the change.

    ReplyDelete
  12. I'm glad that it worked! Just curious, in the case where you didn't know the initial parts of the string, how are you using it? If you know the whole string after the point where you add the patch, you could reverse calculate the CRC from the end of the string to where you want to add the patch. But then you need to know all data from that point to the end.

    ReplyDelete
  13. Hi Daniel. No it never worked in the case when I didn't know the initial parts of the string. But it did when I knew it fully. Oh yes, though I didn't try this case- Should it work in cases when I don't know, say, last few characters of the string, instead of first few?(This part isn't involved in patching or anything)

    ReplyDelete
  14. I think it is possible to revert the CRC back from the end to say the N'th character of the string, and then on add a 4 byte patch at that location based on your new string that would give the same CRC as your original message. So essentially you could have a (partially) unknown beginning of the string, replace it with a known beginning of the string and a patch sequence, followed with the known ending and end up with the same CRC as the original string.

    If this is what you need I can take a look at how to achieve it. I think that it is possible to do this with only a small modification to the algorithm.

    ReplyDelete
  15. Yes I got your point. But what if the original CRC was calculated from beginning to end? Will the method of calculating from end till N work?

    ReplyDelete
  16. Hi Daniel. Nice method this.
    But I have a problem while working in plain-text method. As in, I could find patch bytes XXYYZZAA but how do I insert it in my original string?
    For example, if my original string is DANIELVIK. I want to keep the original CRC(x=crc(DANIELVIK)) and my new string is DANIELVIK1. To have the same x, how do I insert patch bytes?
    This is because the patch bytes obtained are hex values. Of course they can be converted to plain-text printable values, but there are certain non-printable characters which are coming up too. In that case how I do use the obtained patch bytes?
    Thanks, and eagerly awaiting your answer.

    ReplyDelete
  17. Interesting question. I think the algorithm need to be modified a bit for ascii. Now it inserts 4 binary values. For ascii I would insert e.g. Base64 encoded values, but it would require some work to modify the algorithm.

    ReplyDelete
  18. Hi Daniel. I concur with PHURPA above.
    It would really help if you could suggest a way around those non-printable characters.
    Hope you can find time for it.
    Regards.

    ReplyDelete
  19. I also think it would be a very handy feature. I'll see what I can do.

    ReplyDelete
  20. Hi,
    I am hoping you can help me with a bit of a challenge. I am trying to write a PST password app for myself. There are loads of them published on the web, and I know I am getting close, and I want to do it as a challenge to myself.. (There are some great free ones out there already).
    I am aware that the password hash is stored in the PST and can retrieve this.. but that is where my understanding fails. Can you offer any advice? From what I have read it is a simple modification of CRC32, do you have any knowledge of how i could cause the collisions I am after to get a different string to work as the password?

    ReplyDelete
  21. I am about to publish an article and an update to the reverse CRC that returns an ASCII string instead of a binary patch. If you want to implement it yourself, you can read the post and not peek at the code. I hope to have it ready tonight.

    ReplyDelete
  22. Hi Daniel,
    This method will work when CRC method of the original message is the same like in my CRC.
    Am I right?
    Another question - is it possible to find real CRC method when some packets have right CRC using standard CRC16 check, but some packets haven't?

    ReplyDelete
    Replies
    1. Hi,

      Sorry for slow response (missed the post). This method uses the standard 32 bit CRC method used in many places. There are however other 32 bit CRC polynoms in used, and they would need a different CRC table. There are also some variations on whether the CRC is negated before/after calculation.

      The code use fairly standard naming, and I don't think its hard to make a CRC16 version.

      I'm not sure what you mean with the second question. Do you want to correct invalid CRC's by modifying the packets, or try to detect whether a CRC is present?

      Delete
    2. Hi,
      thank you for answer.
      Re second question:
      I did a small mistake while capturing and analysing the incoming data and therefore I was looking for CRC "crack".
      Finally, when I have checked everything again, problem has been solved :).
      It was soemtimes like a panic while you have good results with bad results and you are looking for help instead of self-checking.

      Delete
  23. Hi,
    Please help me with something...
    I am using a slight different method of generating crc table and computing the hash and I am stuck at modifying the reverse computing function.
    Could you please help me and tell me where I am wrong?
    Here's the modified code. Thank you!

    //////////////////////////////////////////////////////////////////////////////
    // Calculate lookup tables for any polynomial. The default tables use the
    // polynomial 0x04c11db7.
    //////////////////////////////////////////////////////////////////////////////
    void Crc32::CalculateTables(UInt32 polynomial)
    {
    // Create CRC lookup table
    for(int i = 0; i < 256; i++)
    {
    crc32Table[i] = i<<24;
    for (int j = 0; j < 8; j++)
    crc32Table[i] = (crc32Table[i] << 1) ^ (crc32Table[i] & (1 << 31) ? polynomial : 0);
    }
    }


    //////////////////////////////////////////////////////////////////////////////
    // Appends data to the CRC object. The CRC hash is updated with CRC
    // calculations of the provided buffer.
    //////////////////////////////////////////////////////////////////////////////
    void Crc32::append(const void* buffer, UInt32 size)
    {
    const UInt8* ptr = (const UInt8*)buffer;

    while (size--) crc_ = (crc_ << 8) ^ crc32Table[*ptr++ ^ (crc_ >> 24)];
    }


    //////////////////////////////////////////////////////////////////////////////
    // Finds reverse CRC patch bytes. The method returns four bytes that
    // can be appended to the CRC in order to get the desired CRC
    //////////////////////////////////////////////////////////////////////////////
    const UInt8* Crc32::findReverse(UInt32 desiredCrc) const
    {
    int crcIdx[4];
    int i, j;

    UInt32 iterCrc = desiredCrc;
    for (j = 3; j >= 0; j--)
    {
    for (i = 0; i < 256; i++)
    {
    if (((iterCrc ^ crc32Table[i]) && 0xFF) == 0)
    {
    crcIdx[j] = i;
    iterCrc = (iterCrc ^ crc32Table[i]) >> 8;
    break;
    }
    }
    }

    UInt32 crc = crc_;
    for (j = 0; j < 4; j++)
    {
    patchBytes[j] = ((crc>>24) ^ crcIdx[j]) & 0xff;
    crc = (crc << 8) ^ crc32Table[patchBytes[j] ^ (crc >> 24)];
    }

    return patchBytes;
    }

    ReplyDelete