TL;DR: Someone was wrong on the Internet and I just couldn’t help myself. If you already know how memory allocation works you’ll find this post boring and you can skip it. But if you don’t, read on… :)
I was just reading an article called “A look at Heartbleed and why it really isn’t that bad” and, while I usually tend to agree with anyone who tries to fight against FUD, in this case the article happens to be dangerously wrong. I’d write this as a blog comment rather than an entry of my own, but Tumblr seems firmly stuck in the ’90s and won’t even give me that option :/ so here it goes…
In a nutshell, the article downplays the severity of the Heartbleed attack based on the Address Space Layout Randomization (ASLR) feature of most modern operating systems, which randomizes where things are placed in memory as a mitigation against buffer overflow exploits. The reasoning goes: since memory allocations are random, and the Heartbleed bug allows you to read memory at random as well, the odds of reading important data are pretty much zero. Therefore the Heartbleed attack is useless and you shouldn’t change your passwords.
This is based, I presume, on the following (wrong) assumptions:
- ASLR means that every single memory allocation for every single variable in memory is 100% random.
- The Heartbleed attack is also completely random, since it reads past the end of a memory buffer that was allocated randomly.
- All memory (be it code, static data, stack data or heap data) is mixed up randomly with no grouping of any kind.
- The memory space is somehow shared across all processes, except for root processes that can freely access all of the RAM (yes, weird, but the author seems to imply that for some reason!).
If those assumptions were correct then the conclusion would be sound, but unfortunately, they are not. Memory allocation is a little more complicated than that. I’ll try to give a brief description of how memory layout works. There are much smarter people than me out there who have described it better, so if you’re interested in the topic, Google and Duck Duck Go are your friends. ;)
There are many other factual errors in the article, like confusing files with memory, mentioning Perfect Forward Secrecy when it’s not relevant at all, assuming the attack captures passwords from a database instead of from other users’ requests in transit, and so on. In fact, there are so many errors I suspect the author may just be pulling a prank on gullible readers! Still, let’s try to make things clearer. :)
We must begin by making the distinction between memory space and memory addresses. Memory space (the physical RAM) is shared by the entire machine, since there is only one RAM. However, the RAM is not accessed directly by user programs, but through memory addresses that are assigned by the operating system. This mechanism is what prevents one program from directly accessing the memory of another program: each one is assigned its own set of memory addresses, which the OS maps to wherever it wants in RAM. So it’s not user permissions or configuration that prevents it, or whether the program is running as root or not, but simply that the same memory address in two different programs is mapped to different places in RAM. (There is an exception to this, in which the OS can create shared memory between two or more programs by mapping their addresses to the same place in RAM, but it’s a special mechanism in which the programs cooperate to share a piece of memory, and it’s not relevant to the Heartbleed bug.) So whether the services that have the bug are running as root or not is not important at all. What’s important is what kind of information these services have in their own memory.
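To make that idea a bit more concrete, here is a toy model of the address mapping in Python. It is only a sketch under heavy assumptions: ToyMachine, ToyProcess and the addresses used are names I made up for illustration, and a real OS does this in hardware with page tables, not with a dictionary. The point it tries to show is that two processes can use the very same address and still never see each other’s data.

```python
PAGE_SIZE = 4096  # assumed page size, as in the rest of this post


class ToyMachine:
    """One physical RAM shared by every process (toy model, not a real MMU)."""

    def __init__(self, num_frames):
        self.ram = bytearray(num_frames * PAGE_SIZE)
        self.next_free_frame = 0

    def new_frame(self):
        # Hand out physical frames one after another (no exhaustion check).
        frame = self.next_free_frame
        self.next_free_frame += 1
        return frame


class ToyProcess:
    """A process that only ever sees its own virtual addresses."""

    def __init__(self, machine):
        self.machine = machine
        self.page_table = {}  # virtual page number -> physical frame number

    def map_page(self, virtual_address, frame=None):
        # With no frame given, the "OS" picks a fresh one for this process.
        if frame is None:
            frame = self.machine.new_frame()
        self.page_table[virtual_address // PAGE_SIZE] = frame
        return frame

    def _physical(self, virtual_address):
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page not in self.page_table:
            raise MemoryError("address not mapped in this process")
        return self.page_table[page] * PAGE_SIZE + offset

    def write(self, virtual_address, data):
        for i, byte in enumerate(data):
            self.machine.ram[self._physical(virtual_address + i)] = byte

    def read(self, virtual_address, length):
        return bytes(
            self.machine.ram[self._physical(virtual_address + i)]
            for i in range(length)
        )


machine = ToyMachine(num_frames=8)
web_server = ToyProcess(machine)
other_program = ToyProcess(machine)

ADDRESS = 0x7F0000001000  # hypothetical address, same number in both processes
web_server.map_page(ADDRESS)
other_program.map_page(ADDRESS)
web_server.write(ADDRESS, b"secret")
other_program.write(ADDRESS, b"public")

server_view = web_server.read(ADDRESS, 6)
other_view = other_program.read(ADDRESS, 6)

# The exception: shared memory maps two addresses to the same physical frame.
SHARED = 0x7F0000002000  # hypothetical address
shared_frame = web_server.map_page(SHARED)
other_program.map_page(SHARED, frame=shared_frame)
web_server.write(SHARED, b"hello")
shared_view = other_program.read(SHARED, 5)
```

In this model other_program has no way to even name the frame holding the server’s "secret" unless the OS deliberately maps it, which is the whole point: whatever a Heartbleed-style bug leaks, it leaks from the vulnerable process’s own memory.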
The next thing we need to know is how memory allocation works. From the perspective of a high level language, like Python or Ruby, memory is a kind of magic: you just start using a new variable and it’s simply there, no questions asked. :) That’s because such languages have mechanisms in place to shield the programmer from knowing how memory allocation actually works. More often than not, programmers who have never played with low level languages have some interesting misconceptions about it…
In reality, memory is allocated in pages, where each page is 4096 bytes long (could be a different size for non-Intel architectures, or in some other circumstances, but let’s pretend it’s always that size). In order to create new variables, you need some memory to hold their values, so the program needs to request this memory from the operating system. This is where the ASLR feature comes into play: in older operating systems like, say, Windows NT, the new pages were assigned memory addresses in a sequential way, so it was easy to predict where data was. But in newer operating systems each page is given a random memory address.
…well, not exactly. If you ask the operating system for a larger area than just 4096 bytes, you need those bytes to have contiguous memory addresses, or you wouldn’t know how to access them. So the only thing that’s random is the address of the first byte of the contiguous memory you request. Also, it can’t be 100% random: since all pages are 4096 bytes in size, and you can only request a whole number of pages, it stands to reason that these random memory addresses will always be aligned to 4096 bytes. In fact, for efficiency reasons and depending on the operating system, they may even be aligned to larger numbers like 64 kilobytes. Still random, but not SO random. And that’s not the end of it.
See, 4096 bytes is an awful lot of space. You can’t just request a new page from the OS every time you need room for a single variable, which may only be a handful of bytes long. So what programs do is place variables inside two structures called the stack and the heap. Both work by requesting memory in pages, using that room to store many variables, and keeping track of which variable is stored where, in order to know which bytes in each page are available for new variables. As more and more variables are created, the stack and heap request new pages, and as variables are destroyed, the memory space they lived in is marked as free to be reused by newly created variables.
The stack is used when creating variables that live in the scope of a function call, and that’s why it’s called a stack: when you call another function, its local variables are stacked on top of those of the previous function. And when the function returns, its local variables are removed from the top of the stack, leaving their memory space free for the next call. All stack space is necessarily contiguous, at least for a single thread (each thread gets its own separate stack). Therefore, while the base memory address of the stack is random thanks to ASLR, the relative locations of variables within it are not: they are deterministic! They will depend on the order in which functions are called and how many local variables each one has. If you have a bug that lets you access memory past a stack variable, you’ll be accessing other variables in the stack without problems, despite ASLR.
The heap, on the other hand, is for variables that can live outside the scope of a single function call. When you create a new variable with the new operator in C++, or when you call the malloc() function in C, what the program is really doing is grabbing some available bytes from one of the pages of the heap and marking them as used by that variable. When you destroy that variable by calling free() in C or using delete in C++, you’re marking that space as free so other variables can use it. It follows, then, that ASLR doesn’t help you all that much here either: even though the pages of the heap are more or less random (since they don’t all have to be contiguous, as happens with the stack), the variables stored in it are still grouped together. This mechanism is also deterministic, but since a server may have multiple users doing things at any given time, the pattern of memory use will depend on lots of factors you don’t know (user activity, basically), so the exact memory layout is hard to predict. Still, variables are being grouped together, and they are unlikely to sit next to anything else like executable code or static data, which is allocated in neither the heap nor the stack. So if you can read memory past the end of a variable, you’re likely to read data from other nearby variables, quite possibly related to what the vulnerable code was doing (in this case, handling decrypted SSL traffic from users).
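The arithmetic above can be sketched in a few lines of Python. Take this as an illustration only: random_region_base is a hypothetical stand-in for whatever the OS really does, and the 47-bit address range is an assumption I picked to get round numbers (real systems randomize fewer bits than that).

```python
import secrets

PAGE_SIZE = 4096  # assumed page size, as in the rest of this post


def pages_needed(size_in_bytes):
    """Whole number of pages required to hold a request (ceiling division)."""
    return -(-size_in_bytes // PAGE_SIZE)


def random_region_base(alignment=PAGE_SIZE, address_bits=47):
    """Hypothetical ASLR-like choice: a random address that is a multiple
    of `alignment` inside an assumed 47-bit address range."""
    slots = (1 << address_bits) // alignment
    return secrets.randbelow(slots) * alignment


def random_bits(alignment=PAGE_SIZE, address_bits=47):
    """How many address bits are actually free to vary in this model."""
    return address_bits - (alignment.bit_length() - 1)


base = random_region_base()
print(hex(base))                      # the low 12 bits are always zero
print(pages_needed(10_000))           # 10,000 bytes still cost 3 whole pages
print(random_bits())                  # 35 bits vary with 4 KiB alignment
print(random_bits(alignment=65536))   # 31 bits vary with 64 KiB alignment
```

Only the base address is random. Once you know it (or don’t need it at all, as with a relative over-read), everything inside the region sits at a fixed offset from it.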
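Here is a toy model of that in Python, to show why a random base doesn’t change relative positions. It’s a simplification under stated assumptions: ToyStack, the function names and the variable sizes are all made up, and a real stack also holds return addresses, saved registers and padding that I’m ignoring here.

```python
class ToyStack:
    """Toy call stack that grows toward lower addresses (simplified model)."""

    def __init__(self, base_address):
        self.top = base_address
        self.frames = []  # (function name, value of top before the call)

    def call(self, function_name, local_sizes):
        """Push a frame. local_sizes maps each local's name to its size."""
        saved_top = self.top
        addresses = {}
        for name, size in local_sizes.items():
            self.top -= size          # each local goes right below the last
            addresses[name] = self.top
        self.frames.append((function_name, saved_top))
        return addresses

    def ret(self):
        """Pop the newest frame, freeing its locals for the next call."""
        _, saved_top = self.frames.pop()
        self.top = saved_top


def distance_between_locals(base_address):
    """Distance from handle_request's buffer up to main's config."""
    stack = ToyStack(base_address)
    main_locals = stack.call("main", {"config": 64, "counter": 8})
    handler_locals = stack.call("handle_request", {"buffer": 256, "length": 8})
    return main_locals["config"] - handler_locals["buffer"]


# Two hypothetical "random" stack bases, as ASLR might pick on two runs.
first_run = distance_between_locals(0x7FFD00000000)
second_run = distance_between_locals(0x7FFE12340000)
print(first_run, second_run)  # 264 264
```

The absolute addresses differ between the two runs, but the distance from one variable to the other is the same, and that distance is all a relative read or write past a buffer cares about.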
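To illustrate the kind of over-read we’re talking about, here’s a toy model in Python. Everything in it is an assumption made for the example: ToyHeap is a trivial allocator I made up (real ones are far more complex), the stored strings are invented, and heartbeat_reply only imitates the general shape of the bug, which is trusting the length the client claims instead of the length of what it actually sent.

```python
class ToyHeap:
    """Trivial allocator over one bytearray standing in for the heap pages."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.next_free = 0

    def malloc(self, size):
        if self.next_free + size > len(self.memory):
            raise MemoryError("toy heap exhausted")
        address = self.next_free
        self.next_free += size  # the next variable lands right after this one
        return address

    def store(self, data):
        address = self.malloc(len(data))
        self.memory[address:address + len(data)] = data
        return address


def heartbeat_reply(heap, payload_address, claimed_length):
    """Buggy echo: copies as many bytes as the client CLAIMS it sent."""
    return bytes(heap.memory[payload_address:payload_address + claimed_length])


def heartbeat_reply_checked(heap, payload_address, actual_length, claimed_length):
    """Checked echo: ignore requests that claim more than was received."""
    if claimed_length > actual_length:
        return b""
    return bytes(heap.memory[payload_address:payload_address + claimed_length])


heap = ToyHeap(4096)
payload = b"ping"                      # what the attacker really sent
payload_address = heap.store(payload)
heap.store(b"password=hunter2")        # made-up data from another user's request

leak = heartbeat_reply(heap, payload_address, claimed_length=20)
print(leak)  # b'pingpassword=hunter2'

safe = heartbeat_reply_checked(heap, payload_address, len(payload), 20)
print(safe)  # b''
```

Notice the attacker never needed to know where the heap lives. The read starts at the attacker’s own buffer and simply keeps going into whatever the allocator placed next to it.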
In conclusion: while it’s always healthy not to panic :) and the Internet has definitely seen worse bugs time and again, let’s not get carried away: if you used any service that contained the bug, you should be changing your passwords just to be on the safe side.