When C was created, in 1972, computers were much slower. Most programs were written in assembly. C came along as a better assembly allowing programmers to manipulate memory directly with pointers. Programmers worked much closer to the machine and had to understand how memory worked to make their programs efficient.
Memory Addresses
Memory can be thought of as an array of bytes where each address is an index in the array and holds 1 byte. If a computer has 4K of memory, it has 4096 addresses in the memory array. How operating systems handle memory is much more complex than this, but the analogy provides an easy way to think about memory to get started.
Let’s say our computer has 4K of memory and the next open address is 2048. We declare a new char variable i = 'a'. When the variable is declared, enough unused memory is set aside to hold its value, and the variable name is linked to the starting address of that memory. Our char i has the value 'a' stored at address 2048. A char is a single byte, so it only takes up index 2048. If we use the & operator on our variable, it returns the address 2048. If the variable were a different type, int for instance, it would take up 4 bytes and use elements 2048-2051 in the array. Using the & operator would still return 2048, though, because the int starts at that index even though it takes up 4 bytes. Let’s look at an example.
```c
#include <stdio.h>

int main(void)
{
    // initialize a char variable, print its address and the addresses around it
    char charvar = '\0';
    printf("address of charvar = %p\n", (void *)(&charvar));
    printf("address of charvar - 1 = %p\n", (void *)(&charvar - 1));
    printf("address of charvar + 1 = %p\n", (void *)(&charvar + 1));

    // initialize an int variable, print its address and the addresses around it
    int intvar = 1;
    printf("address of intvar = %p\n", (void *)(&intvar));
    printf("address of intvar - 1 = %p\n", (void *)(&intvar - 1));
    printf("address of intvar + 1 = %p\n", (void *)(&intvar + 1));

    return 0;
}
```

Running that, you should get output like the following:

```
address of charvar = 0x7fff9575c05f
address of charvar - 1 = 0x7fff9575c05e
address of charvar + 1 = 0x7fff9575c060
address of intvar = 0x7fff9575c058
address of intvar - 1 = 0x7fff9575c054
address of intvar + 1 = 0x7fff9575c05c
```

In the first half of the example we declare a char variable, print its address, and then print the addresses just before and just after the char in memory. We get the addresses before and after by applying the & operator and then subtracting or adding one. In the second half we do the same thing with an int variable, printing its address and the addresses right before and after it.
In the output we see the addresses in hexadecimal. What is important to notice is that the char addresses are 1 byte before and after, while the int addresses are 4 bytes before and after. Math on memory addresses, pointer math, is based on the size of the type being referenced. The size of a given type is platform dependent, but for this example our char takes 1 byte and our int takes 4 bytes. Subtracting 1 from the address of a char gives a memory address that is 1 byte lower, while subtracting 1 from the address of an int gives a memory address that is 4 bytes lower.
Even though our example used the address-of operator to get the addresses of our variables, the operations work the same way on pointers that hold the address of a variable.
Some commenters have pointed out that computing &charvar - 1, an invalid address because it lies before the start of the variable, is technically undefined behavior. This is true. The C standard leaves some behavior undefined, and on some platforms even computing an invalid address can cause an error.
Array Addresses
Arrays in C are contiguous memory areas that hold a number of values of the same data type (int, long, char *, etc.). Many programmers, when they first use C, think arrays are pointers. That isn’t true. A pointer stores a single memory address, while an array is a contiguous area of memory that stores multiple values.
```c
#include <stdio.h>

int main(void)
{
    // initialize an array of ints
    int numbers[5] = {1, 2, 3, 4, 5};

    // print the address of the array variable
    printf("numbers = %p\n", (void *)numbers);

    // print addresses of each array index
    for (int i = 0; i < 5; i++) {
        printf("numbers[%d] = %p\n", i, (void *)(&numbers[i]));
    }

    // print the size of the array
    printf("sizeof(numbers) = %zu\n", sizeof(numbers));

    return 0;
}
```
Running that, you should get output like the following:

```
numbers = 0x7fff0815c0e0
numbers[0] = 0x7fff0815c0e0
numbers[1] = 0x7fff0815c0e4
numbers[2] = 0x7fff0815c0e8
numbers[3] = 0x7fff0815c0ec
numbers[4] = 0x7fff0815c0f0
sizeof(numbers) = 20
```
In this example we initialize an array of 5 ints. We then print the address of the array itself. Notice we didn’t use the address-of & operator. This is because, in most expressions, an array name decays to the address of the first element in the array. As you can see, the address of the array and the address of the first element in the array are the same. Then we loop through the array and print out the memory address of each index. Each int is 4 bytes on our computer and array memory is contiguous, so each int address is 4 bytes away from the next.
The final printf prints the size of the array. The size of an array is sizeof(type) * the number of elements in the array. Here the array holds 5 ints, each of which takes up 4 bytes, so the entire array is 20 bytes.
Struct Addresses
Structs in C are also contiguous memory areas, though they may contain padding bytes between members. Like arrays they hold multiple values, but unlike arrays those values can be of different data types.
```c
#include <stdio.h>

struct measure {
    char category;
    int width;
    int height;
};

int main(void)
{
    // declare and populate the struct
    struct measure ball;
    ball.category = 'C';
    ball.width = 5;
    ball.height = 3;

    // print the addresses of the struct and its members
    printf("address of ball = %p\n", (void *)(&ball));
    printf("address of ball.category = %p\n", (void *)(&ball.category));
    printf("address of ball.width = %p\n", (void *)(&ball.width));
    printf("address of ball.height = %p\n", (void *)(&ball.height));

    // print the size of the struct
    printf("sizeof(ball) = %zu\n", sizeof(ball));

    return 0;
}
```
Running that, you should get output like the following:

```
address of ball = 0x7fffd1510060
address of ball.category = 0x7fffd1510060
address of ball.width = 0x7fffd1510064
address of ball.height = 0x7fffd1510068
sizeof(ball) = 12
```
In this example we have our struct definition. Then we declare an instance ball of the struct measure and populate its category, width, and height members with values. Then we print out the address of the ball variable. Unlike an array, a struct variable does not decay to a pointer, so we use the & operator, but the address of a struct is the same as the address of its first member. We then print out the address of each of the struct members. Category is the first member, and we see that it has the same address as the ball variable. The width member is next, followed by the height member. Both have addresses higher than the category member.
You might think that because category is a char and chars take up 1 byte, the width member should be at an address 1 byte higher than the start. As you can see from the output, this isn’t the case. According to the C99 standard (C99 §6.7.2.1), a C implementation can add padding bytes after members to align them on suitable boundaries. It cannot reorder the data members, but it can add padding bytes. In practice most compilers place each member at an address that is a multiple of that member's alignment requirement and pad the end of the struct to suit its most strictly aligned member, but this is entirely implementation specific.
In our example the char itself takes up 1 byte, but it is followed by 3 padding bytes so that width starts 4 bytes into the struct, and the struct takes a total of 12 bytes. What should you take away?
- The address of a struct variable is the same as the address of the first member in the struct.
- Don’t assume that struct members will be a specific number of bytes away from another field; there may be padding bytes between them depending on the implementation. Use the address-of (&) operator on the member to get its address.
- Use sizeof(struct instance) to get the total size of the struct. Don’t assume it is just the sum of its member fields, because it may include padding.
Conclusion
Hope this post helps you to understand more about how addresses operate on different data types in C. In a future post we will go over some basics on pointers and arrays in C.