Daniel's Software Blog: C Language Quirks

Saturday, May 1, 2010

C Language Quirks

Background

During the years, I’ve had the privilege to work with prototypes of processors still under development that sometimes existed only in a handful of copies in the world. My task has been to port operating systems to these new processors. Similar the processors, the accompanying tools were also early beta versions. As with most beta software, the compilers I used were somewhat buggy and they required a quite deep knowledge about the C language in order to debug issues introduced by incorrect compilations.

Learning the ins and outs of the C language was something I really enjoyed and at the time I knew more or less the entire C language standard with all its weird behavior that are pretty unknown to most regular developers. In this article I’ll try to describe some of the more interesting things I learned over the years.

Shift Operator Oddities

Many developers interested in the details of the C language know that shift operators don’t always have an intuitive behavior. For example, shifting down a negative number may on some targets be a division while it isn’t a division on others. This is called undefined behavior and the result of the operation depends on whether the CPU fills the high bits with the sign bit or not. Basically it is strongly recommended not to use shift operators on signed operands (unless you are absolutely sure you are only working with positive numbers, which you probably can’t be).

My favorite quirks on shift operators are on unsigned operands, for which shift operations generally work fine. Consider the following example and try to guess the output:

    unsigned char c = 2;

    unsigned int  i = 2;

    printf("%d\n", c >> 9);

    printf("%d\n", i >> 33);

    printf("%d\n", c << 9);

    printf("%d\n", i << 33);

The example is performing left and right shift on a byte and a word, and the amount shifted is bigger than the width of the type, in this case 9 for the byte and 33 for the word. The idea is to perform the same operation on two different types and highlight some artifacts. The example actually shows a couple of different quirks of the C language, and the result also dependent on the processor it is executing on.

First off, the result of the shift operation is undefined if the right hand operand is greater or equal to the width of the left hand operand. This means that if the amount shifted is larger than 32 on a 32 bit type, the result depends on the compiler and the processor. Power PC and ARM processors typically yield the result 0 for both left and right shift of more than 32 bits, which is what most humans would think is logical. But most Intel processors since 80286 use only the low five bits in the right hand operand, so the expressions above are in reality shifting the integers only one bit.

The second issue is that shift operations always promote the left hand operand to (16 or 32 bit) integer. Although it looks like the bytes in the example are shifted 9 bits which is more than a byte, it is not affected the undefined behavior described above. The reason is that the byte is promoted (i.e. converted) to an integer before the shift operation is executed, and 9 bits is of course a perfectly valid amount to shift an integer value without any side effects.

The result of the small program is shown in the table below. As we are dealing with undefined behavior, I captured the results of the most common architectures to highlight the difference:

                   Intel    PPC & ARM
      c >> 9         0          0
      i >> 33        1          0
      c << 9         1024       1024
      i << 33        4          0

More Shift Operator Confusion

Shift operators also have another unexpected behavior. The shift operators work logically much like multiplications or divisions and most programmers would probably expect the same precedence as the multiplicative operators. That is, the expectation may be that shift operations are executed before additions. This is however a big mistake. The shift operators have in fact very low precedence, and most arithmetic operators including additive operators have higher precedence. This can easily lead to bugs for example:

int i = a << 2 + 1; /* Calculate a * 4 + 1 */

But because of the weaker precedence of the shift operator, the expression above is actually equivalent to

int i = a << (2 + 1);

which is far from what the programmer intended.

Bitwise Operator Mistakes

Many developers that are programming embedded systems and applications that are close to hardware are well aware of the powers of bitwise operators and use them frequently. There is really nothing strange with these operators, but sometimes there are bugs caused by not understanding their behavior or just too speedy programming. The example below shows one case where the behavior isn’t what the programmer intended, and it is quite well hidden:

/*****************************************************
** Outputs state of individual bits in input value
*****************************************************/
void printBits(unsigned char val)
{
    if (val & 1 != 0) printf("Bit 0 is set\n");
    else              printf("Bit 0 is cleared\n");

    if (val & 2 != 0) printf("Bit 1 is set\n");
    else              printf("Bit 1 is cleared\n");

    if (val & 4 != 0) printf("Bit 2 is set\n");
    else              printf("Bit 2 is cleared\n");
}

The function is nicely written, well commented, and the logic is straight forward. The developer designed it to test one bit at the time by masking other bits, and if the result is non-zero, the bit is set. So for example invoking the function like

printBits(0x05);

should ideally show that bit 0 and 2 is set and bit 1 is cleared. Somewhat surprising though is that the output is

    Bit 0 is set
    Bit 1 is set
    Bit 2 is set

The reason is however quite simple. Most mistakes with bitwise operators are caused by not understanding the precedence rules. In this case the inequality operator has higher precedence than the bitwise AND operator. What this means is that the expressions in fact look like:

if (val & (1 != 0))

A logical expression X != Y returns 0 if X and Y are equal and non zero if they are different. Most compilers always generate 1 as result if the test is true, which means that all if statements in the function turns out to be the same:

if (val & 1)

and basically test if bit 0 is set or cleared regardless of the numeric value, which wasn’t at all what the developer had in mind when writing the function.

Bitwise Not Operator

The bitwise not is a unary operator that calculates the bitwise complement of the operand and is a handy operator when performing bitwise arithmetic. Below is an actual real life example on when the behavior may not be exactly as one would expect. The code sets the bitwise complement of two bytes into an integer:

    unsigned char a = 0x44;
    unsigned char b = 0x88;

    unsigned short val = (((~a) << 8) | (~b));

This looks like a safe operation, the code carefully left shifts one of the values to avoid overlap, and then OR the two bitwise complements into the integer. The expression is well parenthesized to ensure there are no precedence issues. The expected value is naturally 0xBB77, with the high 8 bits being the bitwise complement of variable a, and the low 8 bits being the bitwise complement of variable b.

There is a small catch though; the developer writing this program overlooked that the bitwise complement operator performs integer promotion of the operand. In the example above, this means that the bitwise complement of 0x88 is not 0x77 as the developer assumed. The integer promotion is done prior to the operator being executed, and the result is actually 0xFFFFFF77. When OR-ing with the bitwise complement of the other byte, and after truncation to unsigned short, the resulting value turns out to be 0xFF77.

Bizarre Promotions

Some of the examples so far have shown unexpected consequences of automatic integer promotion. Sometimes these cause bugs that are quite hard to find. The following example shows a perfectly valid and commonly used test to see if an addition of two positive integers overflows.

    void testOverflow(int a, int b)
    {
        if (a + b < a) printf("overflow\n");
        else           printf("no overflow\n");
    }

This function could for example be called with two large numbers, and the function would print “overflow”, e.g.

testOverflow(INT_MAX, INT_MAX);

It is easy to think that this piece of code can be reused with different data types. Assume that someone needs to do the exact same test, but he is working with signed bytes instead. It may seem to be straightforward to just take the code and change the types:

    void testOverflow2(char a, char b)
    {
        if (a + b < a) printf("overflow\n");
        else           printf("no overflow\n");
    }

But when invoking it with the signed byte equivalent of INT_MAX:

testOverflow2(CHAR_MAX, CHAR_MAX);

The output surprisingly enough is “no overflow”.

Considering the only change that was made was to change the type, the different output may be a bit confusing. But there is a very good reason for the behavior, and you guessed, it is spelled integer promotion, and the variables a and b are promoted to integer before the addition is executed, and thus the result is always greater or equal to the right hand operand in the comparison.

More Bizarre Promotions

Another example when the promotion rules play some tricks is shown below. Consider the code:

    signed char a = 127;

    printf("%d\n", (a + 1));
    printf("%d\n", ++a);

In this example it looks like both print statements would show the same output. Since the variable a is a signed character with the value 127, the natural expectation is that when adding one, the value would overflow and wrap around, and become -128.

This is however only true for one of the statements. The ++ operator adds one to the operand and the result is stored in the operand. Since the variable is a char, the addition will overflow and give the result -128. The printf function takes an integer as argument, and the char value will be promoted to integer. Promoting a negative char value to integer, keeps the sign and value, and thus the output of the printf function will be -128.

For the first statement the integer promotion works differently. The variable a is promoted to integer prior to executing the addition, which results in an integer addition 127 + 1, which equals 128. The result is already an integer so no further promotion is needed for the printf call, and the output is 128.

Even More Bizarre Promotions

Integer promotion can also be a problem in conditions. The following example illustrates this:

    signed char   cs = -1;
    unsigned char cu = cs;

    signed int   is = -1;
    unsigned int iu = is;

    if (is == iu)     printf("equal\n");
    else              printf("not equal\n");

    if (cs == cu)     printf("equal\n");
    else              printf("not equal\n");

Again, we are comparing the behavior of the same operations made on an integer and on a byte type and as you may expect we get different result and the cause is not surprisingly integer promotion.

The first if statement compares a signed and an unsigned version of the same integer value. Both operands are promoted to the biggest type larger or equal to int. In this case unsigned int is consider biggest and both operands are thus converted to unsigned int. The conversion is trivial, and since we assigned the variables to the same value the comparison will show the values are equal.

More interesting is the second comparison. The same rules apply, both operands are promoted to the biggest type larger or equal to int. In this case both types are smaller than int, so both operands are promoted to int before the comparison is executed. Here comes the interesting part. The variable cs is a signed char with the value -1.

Promoting it to integer preserves the sign and the resulting iteger is thus -1. The variable iu on the other hand has the value 255 (same binary representation) but promoting unsigned characters to int also preserves the sign, and the resulting integer will be 255. It is clear then that the test will show that the two values are not equal.

Lexical Ambiguities

The C language has many lexical and syntactical issues including case statements without breaks, if – else statements without brackets, and other weak constructions that allow for mistakes. There are however a couple of cases, where the language actually is ambiguous and you can write a statement that doesn’t do what you expect. One of my favorites is shown the following example:

    #define inverse(a) 1.0/a

    double d[2] = { 2.0, 2.0 };

    printf("%.1f\n", inverse(*d)   /* print inverse */ );
    printf("%.1f\n", inverse(d[1]) /* print inverse */ );

The program is very straight forward with a a method, inverse, that is implemented as a macro and does just what it suppose to do, calculating the inverse of the input argument. The output is not as straight forward though. You would expect the two print statements to output the same value, 0.5, since the program is passing the value 2.0 to inverse in both cases. At least that what it looks like and the output is correct for the second statement. But the first statement outputs 1.0 (!). So did we exploit some weird floating point bug here or what is going on?

Well, the answer may get clearer if we perform macro expansion on the first statement:

printf("%.1f\n", 1.0/*d /* print inverse */ );

Maybe not entirely clear yet, we all know that the indirection has higher precedence than the division, so the program should fetch the first value of the array which is 2.0.

The problem is however not at all related to any pointer indirection issues or floating point bugs. What happens here is that the division followed by the indirection is parsed as a multi-character token in the C language, in this case the start of a comment. As you know a comment ends with the */ token, and everything in between is treated as a comment and not code. So the division and the dividend are interpreted as part of a comment and are ignored.

Fun with unary operators

Although the following example is really not that weird and it’s a perfectly valid and well specified C expression, I find it somewhat amusing.

    int i = 2 - - - - - - 2;

    printf("i = %d\n", i);

Most programmers know that unary operators have higher precedence than additive operators which basically means that e.g. 2 + (-1) equals 1. Usually the unary operator is placed immediately before the operand, but this is not required by the language. Furthermore, the parentheses aren’t needed either, because the unary operator has higher precedence than the additive operator. A more explicit version of the same expression would look like

int i = 2 – (-(-(-(-(-2)))));

and it is more clear that it is a subtraction of two values where the latter has a series of negations. The output of the small program would thus be 4.

Conclusion

I tried to show a couple of interesting cases where the C language may not be as clear and straight forward as it perhaps should be. These issues have resulted in many bugs and I think a stricter definition had helped a lot, but I’m sure there are good reasons for all the undefined and unintuitive behavior in the C language. The solution is rather simple; spend a lot of time to learn the specification. This may sound harsh, but I’m really happy that I was forced to learn it the hard way. I’ve had plenty of fun with the knowledge over the years and I know I share it with others that also had the privilege to dig deep into the language.

40 comments:

Maintenance ManMay 6, 2010 at 12:42 PM
That expression with the bunch of unary negative sign operators is pretty trippy. So should we ask this one on C programmer interviews? Ha.
ReplyDelete
Replies
Isaak LimMay 6, 2010 at 12:49 PM
Imagine my surprise when reading through reddit.com and finding your blog ^^
- Isaak
ReplyDelete
Replies
AnonymousMay 6, 2010 at 5:18 PM
Shifting signed integers is not undefined behavior. It's implementation-defined behavior.
ReplyDelete
Replies
AnonymousMay 6, 2010 at 5:43 PM
Correction: Right shifting signed integers is not undefined behavior. If the number is negative, it's implementation-defined behavior, otherwise it's defined. Left shifting signed integers is undefined behavior if an overflow occurs or if the number is negative.
ReplyDelete
Replies
AnonymousMay 6, 2010 at 10:16 PM
Your overflow examples are not valid code in the specification sense. Integer overflow is undefined behaviour . For example a compiler could rightfully optimize the if away, because (a+b < a) is always false for positive ints. Unfortunately many programs rely on this behaviour. Thus, a compiler writer not only has to know the C standard, but also the expectations.
ReplyDelete
Replies
AnonymousMay 6, 2010 at 10:42 PM
Also regarding overflow, if you use unsigned ints it is not undefined.
ReplyDelete
Replies
AnonymousMay 7, 2010 at 12:18 AM
The bug with the inverse macro can only happen on a broken compiler, or maybe a very old one. The compiler has to remove comments and replace them with spaces before it executes preprocessor directives. So when it's doing macro substitution, comments no longer exist.
ReplyDelete
Replies
Jean-DanielMay 7, 2010 at 4:05 AM
Nice post.

About the "Bitwise Operator Mistakes". Smart compiler (clang for instance) are able to detect this kind of mistake.

For example, trying to compile the printBits() function, it will output a couple of warning like the following one:

"warning: & has lower precedence than !=; != will be evaluated first "
ReplyDelete
Replies
Daniel VikMay 10, 2010 at 8:00 PM
Thanks for great comments. I am aware that the overflow example is a bit misleading and the behavior with integers is not fully described. I wanted to show the different behavior of a 'simple' type change but I realize after reading comments that the initial behavior of the integers probably are more interesting.

The inverse macro example is also more interesting than I first thought and if someone have some more insight I'd be happy to learn more. I tested the code on several different compilers, and some remove the comments before macro expansion and some after. Interesting is that even two different versions of gcc got different result.
ReplyDelete
Replies
mumpsimusMay 19, 2010 at 8:55 PM
Variation on one of your examples above:

#define inverse(a) 1.0/a

double d[2] = { 2.0, 2.0 };

cout << inverse(*d) << ", " /* print both */
<< inverse(d[1]) << endl; /* versions */

This will print "10.5".
ReplyDelete
Replies
mumpsimusMay 20, 2010 at 11:35 AM
By the way, this works fine in GCC. Looking at the preprocessor output, it looks like the GCC preprocessor actively prevents generation of '/*':

inverse(*d)

becomes:

1/ *a

Which raises the challenge, can one defeat GCC's preprocessor so as to generate the original, 'bad' behavior...
ReplyDelete
Replies
-February 9, 2011 at 4:20 AM
also, the % operand acts differently between processors... try % in a negative number. like -10 % 360 , would contain 350 on some processors, and -10 on others...
ReplyDelete
Replies
heymeSeptember 4, 2011 at 2:28 AM
amazing concepts of c guys...
watch these on http://technoease.com/453/programming/c-programming/control-instructions-i/#more-453
ReplyDelete
Replies
UnknownFebruary 24, 2014 at 6:11 AM
http://letsplaywithc.wordpress.com/
ReplyDelete
Replies
IQ Business CenterAugust 12, 2014 at 5:44 AM

this is very good nice article. this is very useful for C Language students.

==========================================================
This is very nice article. This is very use ful for C Language Learners.
http://www.bytesonlinetraining.com/c-language-online-training/
===========================================================

hi sir. i want to do C Language training.
C Language ONlINE TRAINING Thanks for providing valuable information.
ReplyDelete
Replies
UnknownMarch 2, 2016 at 1:48 AM
Hey man I just wanted to say thanks for taking the time to write something worth reading .i am also say somethink about sap modules likesap modules
thank u
ReplyDelete
Replies
AngelFebruary 10, 2017 at 3:16 AM
Lovely Post!
Angel Investor in India
ReplyDelete
Replies
AngelFebruary 10, 2017 at 3:16 AM
Thanks for doing such a great job which many friends through this.Keep related blogs is really well written.
San Diego Movers
ReplyDelete
Replies
jOHNJanuary 14, 2019 at 7:22 AM
vidmatein JhR
vidmatein JhR
ReplyDelete
Replies
tiger shroffFebruary 5, 2019 at 5:06 PM
The article was very impressive i love to read this article vidmate I love reading bloggin everyday.
ReplyDelete
Replies
SOPHIE lilyMay 22, 2019 at 12:22 AM
https://nine9appsdownload.blogspot.com/2019/04/free-grab-app.html
https://nine9appsdownload.blogspot.com/2019/04/master-plus-app.html
https://nine9appsdownload.blogspot.com/2019/04/hd-videos-downloader.html
https://nine9appsdownload.blogspot.com/2019/04/download-helper.html
https://nine9appsdownload.blogspot.com/2019/04/downloader-app.html
https://nine9appsdownload.blogspot.com/2019/04/video-player.html
https://nine9appsdownload.blogspot.com/2019/04/video-player_11.html
https://medium.com/@msshilpikgupta/hulu-plus-96d1fec25554
ReplyDelete
Replies
EntertainmentMay 27, 2019 at 11:44 PM
This post is really helpful to know. Thank you very much!Tamil Movies Download
ReplyDelete
Replies
yahoosportsJune 19, 2019 at 7:53 PM
For free video downloader, you can also try Vidmate, a powerful aggregated audio-video player & live broadcast software, you can watch a large number of videos from over 1000+ sites in overseas countries. It is renowned as a well-deserved movie-watching artifact. What’s more, you can download the hottest movies, music, HD videos and live TV on your Android device for free! Vidmate
ReplyDelete
Replies
patJune 30, 2019 at 9:09 PM
very good
vidmate
ReplyDelete
Replies
patJuly 3, 2019 at 10:10 PM
nice blog
vidmate
ReplyDelete
Replies
patJuly 3, 2019 at 10:10 PM
nice blog
vidmate
ReplyDelete
Replies
patJuly 3, 2019 at 11:17 PM
nice web
vidmate
ReplyDelete
Replies
HowToJuly 16, 2019 at 5:21 AM
Thanks For This Post It helped Me Alot
ReplyDelete
Replies
AliceSeptember 16, 2019 at 9:16 PM
Well done. Download Likee video for short video making and sharing. Magical video editing tools.
ReplyDelete
Replies
Work MailSeptember 19, 2019 at 12:15 AM
There are three main reasons many people fail building business credit. The 1st BIG reason people fail, is that their business isn't setup credibly in the lender's eyes. The perception lenders, vendors, and creditors have of your business is critical. Before applying for business credit a business must insure it meets or exceeds all lender credibility standards. There are over 20 credibility points that are necessary for a business https://adesk.cheapsoftwaredownload.net/autodesk-fabrication.html have a strong, credible foundation.
ReplyDelete
Replies
Justin WadeJanuary 7, 2020 at 2:38 AM
Private loans, also known as alternative or private student loans, are providing a growing number of college students with much-needed education funds to cover college-related expenses that may not be covered by award caps, Federal student loans, scholarships and grants. Find out why... buymodafinilonline.reviews
ReplyDelete
Replies
AshuJanuary 12, 2020 at 6:47 AM
Street Dancer Full Movie Download 720p leaked by Filmywap,Filmyzilla,123mkv watch online

Street Dancer Full Movie Download Free Leaked

Street Dancer Full Movie Download
ReplyDelete
Replies
Edward foxJanuary 14, 2020 at 11:02 PM
Dreamweaver is the web's number one development software for websites and our chosen platform for Adobe Creative Suite Training. Over the years it has progressed from being a static web development tool, to being one that can integrate with dynamic, database driven software like WordPress. how to download lumion
ReplyDelete
Replies
PM Tech TricksFebruary 1, 2020 at 6:12 PM
Avee Player template
Avee Player template
ReplyDelete
Replies
PM Tech TricksFebruary 1, 2020 at 6:12 PM
VidMate apk download install
VidMate apk download install
ReplyDelete
Replies
Jakson butterFebruary 4, 2020 at 10:46 PM
I can’t really help but admire your blog, your blog is so adorable and nice , putlocker
ReplyDelete
Replies
rakhi sharmaJune 23, 2020 at 2:01 AM
APPRECIATING POST Yuvi's Blog Canlimacizle
ThopTV App

ReplyDelete
Replies
adityaApril 12, 2021 at 1:58 AM

very nice blogs!!! i have to learning for lot of information for this sites.Sharing for wonderful information. Thanks for sharing this valuable information to our vision. You have posted a trust worthy blog keep sharing

Acquired and Inherited Traits
fiu canvas
Alphanumeric Characters
Area of a Rhombus
ReplyDelete
Replies
Toasty Toad stoolDecember 24, 2021 at 12:28 AM
C is interesting language ! shareit app vidmate 2017
ReplyDelete
Replies
Toasty Toad stoolJanuary 4, 2023 at 11:16 PM
This function could for example be called with two large numbers, and the function would print “overflow” vidmate 2019 instasave downloader
ReplyDelete
Replies

Add comment

Daniel's Software Blog

Saturday, May 1, 2010

C Language Quirks

40 comments:

About Me

Followers