If you are reading this, you want to know more about C pointers. That’s a good thing. Even if you don’t program in C very often, understanding pointers gives you a deeper understanding of how programming and memory work “under the hood”. Learning pointers will make you a better programmer. In this post we will start with variables and memory and look at how they relate to pointers. We will talk about the “why” behind pointers and discuss pointer operations. Then we will finish up with the different types of pointers you will encounter.

What is a variable?

Let’s start simple. What is a variable? Most programmers will say a variable is a name for a piece of data that can change in a program. That’s true but it’s also just scratching the surface.

When a variable gets declared, memory to hold a variable of that type is allocated at an unused memory location. The location that is allocated is the variable’s memory address. For a compiler, a variable is a symbol for a starting memory address. The compiler knows two things about any variable: the name and the type. For a declaration such as int anum = 1, anum is a symbol that gets translated to a memory address. The type, int, tells the compiler how much memory to reserve starting at that address.

A C compiler converts C source code to assembly source code. During that conversion variable names are converted to relative memory addresses. Here is an example in assembly of the code above. Don’t worry, you don’t need to know assembly to know pointers. This is just an example to show what happens.

Three things to notice: the DWORD and BYTE labels, the [rbp-4] and [rbp-5] pieces, and the values 1 and 97. The rbp is a base pointer. For our discussion, think of it like a starting point, a starting memory address. The [rbp-4] and [rbp-5] are relative offsets, minus 4 and minus 5, from the starting point. The DWORD and BYTE are sizes, the number of bytes to store. On my machine, a DWORD is 4 bytes, 32 bits, and a BYTE is 1 byte, 8 bits.

Put this all together and mov DWORD PTR [rbp-4], 1 says store 4 bytes with the value 1 starting at the relative offset [rbp-4], and mov BYTE PTR [rbp-5], 97 says store 1 byte with the value 97, the ASCII value for ‘a’, starting at the offset [rbp-5]. When the program runs, offsets like [rbp-4] resolve to actual memory addresses. The key takeaway is this. To a compiler all variables are just memory addresses and sizes.

What is a pointer?

C programs have different types of variables including ints, floats, arrays, chars, structs, and pointers. An int holds an integer number, a float holds a floating point number. Arrays hold multiple values. A pointer is a variable that holds the memory address of another variable. It’s that simple. The int variable anum above holds the number 1, which the compiler stores as 4 bytes starting at the relative offset [rbp-4]. When the program runs that offset might be the real memory address 0x1234. A pointer to anum would hold the value 0x1234.

The why behind pointers

Why do pointers exist? Why do we need them? The simple answer is efficiency. Back when C was created, computers were much slower. Most programs were written in assembly. Programmers needed to be much more efficient at solving problems.

The more detailed answer has to do with call semantics. The C language is call-by-value. When you call a function in C, the values of its arguments are literally copied into the function’s stack frame. Pass an int and 4 bytes are copied into the function. Pass a char and 1 byte is copied. What happens when you need to get a 100k element int array into a function? You don’t want to copy 400,000 bytes into a function. That is really inefficient. Instead you pass a pointer which references the array. The pointer, all 4 or 8 bytes of it, is copied into the function, where it can be dereferenced and the array accessed. The same goes for large structs. Don’t pass in a copy of the large struct, pass in a pointer to the struct.

The operators

There are two main operators for working with pointers. The * operator and the & operator. There is also the -> operator but we will get to that later.

The * operator is used when declaring a pointer and when dereferencing a pointer. Declaring a pointer is like declaring any other variable. The compiler allocates space for the pointer. The size of a pointer, the number of bytes used to store it, depends on the architecture of the machine. For 32-bit systems, pointers will be 4 bytes or 32 bits. For 64-bit systems, like most are these days, pointers will be 8 bytes or 64 bits.

The & operator is used to get the address of another variable. It is used to assign a value to a pointer. Putting the & operator in front of another variable returns a pointer to that variable of the type of that variable.

Pointer usage

Take the following code which shows some simple usage of the * and & operators.

  • Line 2 we use the * operator to declare an int pointer. In other words, we declare a variable that holds a memory address where the value at that memory address is an int.
  • Line 5 we declare an int variable and assign it the literal value 1.
  • Line 8 we use the & operator to get the address-of the variable val and assign that address value to the ptr variable. We store the memory address of val in the variable ptr.
  • Line 11 we dereference the ptr variable retrieving the value at the address stored in the pointer.
  • Line 14 we dereference the ptr variable to set a new value to the address stored in the pointer.

Declaring a pointer is easy. It is the same as declaring a variable, the only difference being the * operator used in front of the variable name, which indicates a pointer. Assigning a value to the pointer is easy: we use the & operator to get the address-of a variable of the correct type. Dereferencing is often where confusion lies.

Dereferencing Pointers

Dereferencing is just indirection. It is telling the compiler, “I have the address of a variable in the pointer. I want to access that pointed-to address either to get a value or set a value”. A pointer holds a reference to a variable; the reference being the memory address stored in the pointer. When we access the value at that reference, we de-reference the pointer.

Dereferencing can be used to either indirectly get a value from the pointer address or to assign a value to the pointer address.

Let’s look at an example.

  • Line 6 we declare an int value named ival and assign it the value 1.
  • Line 7 we declare an int pointer iptr and assign the address of ival to iptr.
  • Line 10 we dereference the iptr variable to get the value pointed to by iptr and assign it to the int variable named get.
  • Line 11 we print out the get variable.
  • Line 14 we dereference the iptr variable to set a new value, changing the value to 2. Literally we are assigning the value 2 to the address pointed to by iptr.
  • Line 15 we dereference the iptr variable again to get its value and assign that value to the int set variable.
  • Line 16 we print out the value of the set variable. It is now 2.
  • Line 17 we print out the value of the ival variable. It is also now 2.

If we run that code we get:

*iptr = 1
*iptr = 2
ival = 2

In this example we have used dereferencing to both get and set values. Some people get confused and think dereference means getting a value. It doesn’t. Dereference means to indirectly access the address stored in the pointer. You can get a value, like we do in line 10 above, or you can set a value, like we do in line 14 above.

Pointers and types

Take a look at the following code.

When we declare an int pointer we are declaring that the variable is a pointer, that it holds the address of another variable, and that the value at that address is an int. The same goes for a float pointer, char pointer, or any other type. Declaring a pointer to be a specific type tells the compiler that when the pointer is dereferenced, the value pointed to will be of that type.

You will notice in the example above we declare a pointer type and then assign the address of a value of the same type. If you were to uncomment the last few lines and try to compile that code, the compiler would complain with “assignment from incompatible pointer type” diagnostics; depending on the compiler and its version these are warnings or hard errors. You should only assign an address to a pointer of the matching type.

The & operator returns a pointer of the type it is in front of. In the code above &ival returns an int pointer, &fval returns a float pointer, and &cval returns a char pointer. Anywhere a pointer can be used, an equivalent &var can also be used.

Pointers to arrays

Just like you have a pointer to an int or float, you can have a pointer to an array as long as the pointer is the same type as the elements of the array.

Pretty simple. In fact, if int *ptr looks exactly like an int pointer, that’s because it is. When an array is created, int myarray[4] = {1,2,3,0};, the compiler allocates memory for the entire array. The array variable, in this case myarray, names that block of memory, and when it is used in most expressions it is converted to a pointer holding the address of the first element in the array.

Some people get confused and start thinking you can interchange pointers and arrays. You cannot. You can assign an array variable to a pointer of the same type but not the opposite. When an array is created, the array variable cannot be reassigned.

Here is an example

Pointers to structs

Like an array, a pointer to a struct holds the memory address of the first element in the struct. Here is some example code for declaring and using a struct pointer.

  • Lines 5-10 we declare the struct person, a variable to hold a person struct, and a pointer to a person struct. The declaration for a pointer to a struct is similar to a pointer to any other type, type *.
  • Lines 12-13 we fill the struct with age and name values.
  • Line 14 we assign the address of the first variable to the struct pointer ptr.
  • Line 16 we print out values from the struct.

If we run that code we get:

age=21, name=full name

On line 16 we have a new operator, ->, as in ptr->name. The -> operator is used to access a field through a struct pointer. This is the same as writing (*ptr).field, where we first dereference the struct pointer and then access the field using the standard . notation. Accessing a field from a struct pointer is so common that the -> operator exists to make it easier.

Pointers to pointers

A pointer can point to another pointer variable. You can have a pointer to a pointer, and a pointer to a pointer to a pointer, and so on down the rabbit hole. In practice it is rare to see more than a pointer to a pointer. Usually two levels of indirection are enough.

Take the following code:

If you run this code, you should get output similar to this but with different memory addresses.

&ptr=0x7fff390fa6f8, &val=0x7fff390fa70c
ptr2ptr=0x7fff390fa6f8, *ptr2ptr=0x7fff390fa70c, **ptr2ptr=1

  • Lines 1-2 declare an int variable val and an int pointer variable ptr.
  • Line 5 is new. Here we are saying that we have a variable ptr2ptr that holds the address of another int pointer.
  • Line 6 we assign the ptr variable the address-of the val variable. We have seen this before.
  • Line 7 we assign the ptr2ptr variable the address-of the ptr variable. Double indirection. The ptr2ptr variable stores the address-of ptr which in turn stores the address-of val.
  • Line 8 we print out the address-of the ptr and val variables.
  • Line 9 we print out the value stored in ptr2ptr which is the same as &ptr. When we dereference that we get the address of val. When that is dereferenced we get the value 1.

Conclusion

I hope this (somewhat) brief overview helps with some of the different types of pointers you will see. If you found this useful, check out some of my other posts on function pointers in C and pointers and arrays in C.