How to Think About Variables in C

C is memory with syntactic sugar, so it helps to think about things in C starting from memory. One piece that I think is often overlooked is variables and data types. With the right mental model for variables and data types, other concepts in C, and in other languages, become easier. Let’s start with three definitions.

  • Every variable is a starting memory address to the compiler.
  • Every variable has a data type.
  • A data type is a number of bytes to the compiler.

Yes, I am being simplistic, and yes, certain data types have certain syntactic sugar, but I have found this to be a good mental model. In most assembly languages, data types don’t exist. You operate on bytes and offsets. Most C compilers operate only one step above assembly, giving you useful abstractions so that you don’t have to deal with individual bytes and instructions.

When you write a variable like int x = 10; what you are saying to the compiler is: there is a memory address, which we have labeled x. Starting at that address we have sizeof(int) bytes. Copy the value 10 into those sizeof(int) bytes. Notice I said a variable is a starting location. Under the hood, all variables in C reference their single starting memory address, and the compiler knows to use a certain number of bytes based on the data type.

Most of this happens automatically in C. You say int and the compiler generates the assembly code to copy the correct number of bytes to the correct memory locations. How does a C compiler know the size of its data types? The sizes of the built-in integer and floating-point types are fixed by the compiler for the target hardware and ABI. The limits.h header describes the ranges of the integer types through macros, and the float.h header does the same for the floating-point types. Other data types such as structs and typedefs are defined in code.

Let’s think about copying variables.

If we copy an int from y to x, we are saying: there is a value of sizeof(int) bytes starting at the memory address y; copy it to the location starting at the memory address x. Under the hood that is what the C compiler and the assembly it generates are doing.

Thinking of variables as memory addresses and data types as a number of bytes has helped clarify many concepts for me in C. Whenever I get stuck on a concept, I come back to what is going on at the memory level, and that usually helps me move forward.

Update: I removed a section on L-values and R-values. I was trying to make a point linking L-values and assignable memory locations, but in the end it was creating more confusion than it was helping.


9 Responses to “How to Think About Variables in C”

  1. Keith Thompson says:

    Repeating my comment on Hacker News:

    A data type is a number of bytes to the compiler.

    The size of a type is just one of its many attributes. Even if, for example, long, float, and void* happen to have the same size, they’re still very distinct types.

    Integer data types are defined in the limits.h file. Float data types are defined via macros in the floats.h file.

    Integer and floating-point types are defined by the compiler, guided by the hardware and the ABI for the platform. <limits.h> and <float.h> document the characteristics of the predefined numeric types.

    A pointer doesn’t hold a memory address, it holds a number that represents a memory address.

    Sure, and a floating-point object is ultimately just a collection of bits — but that’s hardly the best way to think about either of them. Integers and pointers (addresses) are logically very distinct things, even if they happen to have similar representations. For example, the addresses of two distinct variables have no defined relationship to each other (other than being unequal); just evaluating (&x < &y) has undefined behavior.

    C lets you get away with a lot of type-unsafe stuff, particularly if you resort to pointer casts, but it’s fundamentally much more strongly typed than you imply it is.

    • Dennis Kubes says:

      What would you recommend as the best way to think about these things?

      In terms of type safety I wasn’t implying or suggesting anything. I was describing a mental model of variables as memory and numbers of bytes. The reason that you can, but shouldn’t, get away with type unsafe things in C is because it is just memory and syntactic sugar.

    • Prathamesh says:

      Just a minor point “Every variable is a starting memory address to the compiler”.
      Not true for bit-fields and variables that have register storage class specifier.

  2. […] How to Think About Variables in C […]

  3. root says:

    a data type is not only a number of bytes to the compiler.
    a data type is size AND representation.
    if you do

    int i = -1;
    unsigned int u = (unsigned int) i;

    then the compiler converts from two’s complement to one’s complement (on most systems, or guaranteed by the C standard, I don’t know exactly).

  4. Derek says:

    Robert Traister wrote a book twenty yrs ago that explains exactly this called Mastering C Pointers.

  5. Dan Sutton says:

    It’s interesting for me to read about things like this: thirty years ago, this was the only way anyone ever thought about it – principally because we had no choice: everything we did (especially on PCs and CP/M machines and so on) ended up having to have a certain amount of straight machine code in it, simply for speed – the machines were so, so slow… and then there were things like reading from COM ports and so on – the libraries for C, and so on, were so few and far between that one ended up writing all this stuff in assembler, just to be able to do it at all. If you were writing code in other languages like Pascal, which didn’t run as fast as C when compiled, then you were switching to machine code all the time, just to get some response out of the thing. So you had no choice but to understand how things worked internally… my fascination at this article stems from the fact that, these days, most programmers are isolated from that level of the machine. It makes me wonder what they think is happening, and how that level of disassociation from the machine level affects them as programmers.

    • Dennis Kubes says:

      I completely agree in that I think one of the problems in software today is that we are moving too far away from the machine.

      There aren’t enough programmers that know or want to know what happens at the assembly or machine level. Some of the things that blew my mind were learning how things operate at the memory level. I have a friend who digs deeply into hard drive performance. He has shown how to get tremendous performance on spindled disks by understanding the internals.

      I have been accused of going in the wrong direction moving back towards assembly. But I think understanding how something operates from first principles gives an ability to create and program on a different level. I hope more programmers in the future start moving backwards toward the machine.

      • Richard M. Garber says:

        I couldn’t agree more. As a computer science major, it was frustrating writing code in a high level language and hitting “RUN”. I quickly switched to computer engineering because I wanted to understand how the machine actually works from the MOSFET level on up. It was enlightening to say the very least. BTW, a computer engineering degree also carries with it the benefit of being able to do amazing things in MineCraft with RedStone!
