When first learning C pointers, there is one thing I wish had been better explained: operator precedence vs order of operations.
    int myarray[4] = {1, 2, 3, 0};
    int *p = myarray;
    int out = 0;
    while (out = *p++) {
        printf("%d ", out);
    }
The above example prints out 1 2 3. Code like *p++ is a common sight in C, so it is important to know what it does. The int pointer p starts out pointing to the first element of myarray, &myarray[0]. On each pass through the loop, p is incremented so that it moves up one index in the array, but it is the previous, unincremented address that is dereferenced and assigned to the out variable. This continues until the loop reads the fourth element of the array, 0, at which point the assignment evaluates to 0 and the while loop stops. But what does *p++ actually do? And how does it move from one element in the array to the next?
In terms of operator precedence, the postfix operator (++) binds tighter than the dereference operator (*). If we read this wrong, we might think that we are incrementing the value pointed to by p. What is actually happening is four separate operations and a clash between operator precedence and order of operations.
Think about this code.
    int x = 0;
    int y = x++;
    printf("x = %d, y = %d\n", x, y);
First x is set to 0. Then x is incremented by the postfix operator, but y is assigned the old x value of 0. As the print shows, x = 1 and y = 0. The postfix operator can be thought of as shorthand for four steps: it first makes a copy of x’s value, then adds one to that value, then assigns the incremented value back to x, and finally returns the old value of x. (A compiler is free to implement this differently, but the observable result is the same.) If we were using the prefix operator ++x instead of the postfix operator x++, the newly incremented value of x would have been assigned to y, and y would have printed out 1.
The shortcut postfix code is the same as if we did this.
    int x = 0;
    int y = x;
    x = x + 1;
    printf("%d ", y);
Think of order of operations like driving a car home from work. There are multiple steps, and some steps have to happen before others. First you pull out of the parking lot, then drive down street A, then street B, and so on until you get home. You can’t pull into your driveway before you pull out of the work parking lot. It doesn’t work that way. In the above example, the x + 1 has to be evaluated before its result can be assigned with x =. The operations have to go in that order for the program to even work. Where people get confused is thinking about the ++ as a single operation instead of as a shortcut for multiple operations.
The p++ in our first example acts the same way as the x code above. The * vs ++ in *p++ is a question of which operator gets applied to p. Because the postfix operator (++) binds tighter, the expression groups as *(p++): the ++ steps apply to the pointer p itself, and the dereference (*) is then applied to the value that p++ returns, which is the old address. This is order of operations vs operator precedence.
The postfix and prefix operators always follow four steps.
- Make a copy of the variable
- Increment the copy of the variable
- Assign the incremented copy back to the original variable
- Return a value: prefix ++x returns the incremented value, postfix x++ returns the original, unincremented value
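The steps in the list above can be sketched as two small functions. This is only a model of the behaviour, not how a compiler has to implement ++, and the names post_increment and pre_increment are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Models x++ on the int that v points to (hypothetical helper). */
int post_increment(int *v) {
    assert(v != NULL);
    int old = *v;               /* step 1: copy the current value */
    int incremented = old + 1;  /* step 2: increment the copy */
    *v = incremented;           /* step 3: assign it back to the original */
    return old;                 /* step 4: postfix returns the old value */
}

/* Models ++x on the int that v points to (hypothetical helper). */
int pre_increment(int *v) {
    assert(v != NULL);
    int old = *v;               /* step 1: copy the current value */
    int incremented = old + 1;  /* step 2: increment the copy */
    *v = incremented;           /* step 3: assign it back to the original */
    return incremented;         /* step 4: prefix returns the new value */
}
```

Calling post_increment(&x) with x set to 0 leaves x at 1 and returns 0, which matches int y = x++; from earlier. Calling pre_increment(&x) on the same starting value would return 1 instead.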
With an expression like *p++ there is an implicit return from p++ to the dereference operator.
A fully expanded version of *p++ in code might look like this. The p pointer is assigned to x, then p is incremented. But x is still pointing at the original unincremented p address.
    int myarray[4] = {1, 2, 3, 0};
    int *p = myarray;
    int *x = p;
    p = p + 1;
    printf("*p = %d, *x = %d\n", *p, *x);
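To make the implicit return visible, the same expansion can be wrapped in functions. A rough sketch, assuming hypothetical helper names post_increment_ptr and deref_post_increment that do not come from any library:

```c
#include <assert.h>
#include <stddef.h>

/* Models p++ for an int pointer: advances the caller's pointer by one
   element and returns the address it held before (hypothetical helper). */
int *post_increment_ptr(int **pp) {
    assert(pp != NULL && *pp != NULL);
    int *old = *pp;   /* remember the unincremented address */
    *pp = old + 1;    /* move the caller's pointer to the next element */
    return old;       /* hand back the old address */
}

/* Models *p++: the dereference is applied to whatever p++ returned. */
int deref_post_increment(int **pp) {
    return *post_increment_ptr(pp);
}
```

With p pointing at myarray[0], deref_post_increment(&p) returns 1 and leaves p pointing at myarray[1], which is what one pass of the first loop does.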
Now that we know ++ is multiple operations and is really about an order of operations, let’s play around with operator precedence. What happens if we use parentheses around the p++?
    int myarray[4] = {1, 2, 3, 0};
    int *p = myarray;
    int out = 0;
    while (out = *(p++)) {
        printf("%d ", out);
    }
Here we are performing the postfix operations first. This increments the p pointer address but returns the original unincremented pointer address, which is then dereferenced. This is the same thing that happened previously. *(p++) is equivalent to *p++; the parentheses only make the grouping explicit for the reader.
What happens if we put the parentheses around *p instead?
    int myarray[4] = {1, 2, 3, 0};
    int *p = myarray;
    int out = 0;
    int i = 0;
    while (out = (*p)++) {
        if (i++ >= 4) {
            break;
        }
        printf("%d ", out);
        printf("%p\n", (void *)p);
    }
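Here the parentheses cause the pointer to be dereferenced first. Then the value pointed at is incremented and stored back, while the old value is what gets assigned to out. As the output shows, we never move to the next pointer address; we just keep incrementing the value of myarray[0]. The printed values are 1, 2, 3 and 4, each followed by the same address. The i counter is only there to stop the loop after four passes, since the value would otherwise keep growing instead of reaching the terminating 0.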
What about this?
    int myarray[4] = {1, 2, 3, 0};
    int *p = myarray;
    int out = 0;
    while (out = *++p) {
        printf("%d ", out);
    }
This will print out 2 and 3 but not 1. Why? Because here we are using the prefix operator, which does the increment and returns the incremented pointer address, not the original unincremented address. It is still order of operations, just a different operation. The dereference then happens on the incremented pointer address. If we used parentheses for *(++p), nothing would change.
What about if we changed the order and used parentheses?
    int myarray[4] = {1, 2, 3, 0};
    int *p = myarray;
    int i = 0;
    int out = 0;
    while (out = ++(*p)) {
        if (i++ >= 4) {
            break;
        }
        printf("%d ", out);
        printf("%p\n", (void *)p);
    }
Now we get an entirely different result: it prints out 2 3 4 5. Why? The parentheses in (*p) say to first dereference the p pointer to get its value. Then the prefix ++ increments the dereferenced value and stores it back. Finally the incremented value is assigned to the out variable. The first dereferenced value is the int at myarray[0], which is 1; that gets incremented to 2 and stored back into myarray[0]. On the next pass of the loop the same int gets incremented to 3, then 4, and so on. And as the output shows, we never move to the next pointer address.
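The four forms covered so far can be put side by side. The sketch below is one way to do that; the enum form, struct step_result and evaluate names are invented for this example. Each call starts from a fresh copy of the same array and evaluates the chosen expression exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Which expression to evaluate (hypothetical names). */
enum form {
    FORM_DEREF_POSTINC,   /* *p++   */
    FORM_POSTINC_TARGET,  /* (*p)++ */
    FORM_DEREF_PREINC,    /* *++p   */
    FORM_PREINC_TARGET    /* ++(*p) */
};

/* What a single evaluation did (hypothetical struct). */
struct step_result {
    int value;        /* value the expression yielded */
    ptrdiff_t moved;  /* how many elements the pointer advanced */
    int first;        /* contents of the first array element afterwards */
};

struct step_result evaluate(enum form f) {
    int arr[4] = {1, 2, 3, 0};
    int *p = arr;
    int value = 0;

    switch (f) {
    case FORM_DEREF_POSTINC:  value = *p++;   break;
    case FORM_POSTINC_TARGET: value = (*p)++; break;
    case FORM_DEREF_PREINC:   value = *++p;   break;
    case FORM_PREINC_TARGET:  value = ++(*p); break;
    default: assert(0 && "unknown form");     break;
    }

    struct step_result r = { value, p - arr, arr[0] };
    return r;
}
```

For the array {1, 2, 3, 0}, *p++ yields 1 and moves the pointer, (*p)++ yields 1 and changes the first element to 2, *++p yields 2 and moves the pointer, and ++(*p) yields 2 and changes the first element to 2. Only the two forms where ++ applies to p itself ever advance through the array.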
Getting this understanding of order of operations vs operator precedence makes understanding pointers in C much easier. Thinking about postfix and prefix operators in terms of all of their steps makes common C idioms clearer.
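One idiom where this pays off is the classic string copy loop, which uses the postfix form on both sides of an assignment. A minimal sketch, with copy_string as a hypothetical name (the standard library already provides strcpy for this job):

```c
#include <assert.h>
#include <stddef.h>

/* Copies the string at src, including its terminating '\0', into dst.
   Hypothetical helper; the caller must make sure dst is large enough. */
char *copy_string(char *dst, const char *src) {
    assert(dst != NULL && src != NULL);
    char *start = dst;

    /* Each pass reads through the old src address, writes through the old
       dst address, and leaves both pointers advanced by one. The loop ends
       once the '\0' has been copied, because the assigned value is then 0. */
    while ((*dst++ = *src++) != '\0') {
        /* all of the work happens in the loop condition */
    }
    return start;
}
```

Read with the four steps in mind, *dst++ = *src++ is just "use the old addresses for the copy, then move both pointers along".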
Let me know your thoughts.
Update 1: There were some errors in the first version of this post. Thanks to lmm from Hacker News for catching them. Code has been updated.
Update 2: Thanks to memorylane from Reddit for bugfixes with the last example and suggestions about printing out the pointer values. Code has been updated.
Update 3: Thanks to belkiss from reddit for suggestions on clarity. Post has been updated.