Tuesday, February 17, 2009

Finding memory stomps - strategies

Tracking down a memory stomp is long, tedious work. Still, there are a few basic strategies that can help. Suppose your application crashes on roughly 1 launch in 10, anywhere from 10 to 15 minutes after startup. In the debugger, it appears that a memory stomp is hitting apparently random elements of a 10000-element linked list. How would you track down this bug?

Most strategies involve instrumenting the code in some fashion, and given the nature of random crashes, the instrumentation is likely to change the circumstances under which your memory stomp occurs. After each change, rerun the application to make sure that you still see the stomp before attempting to track it down. There is no one-size-fits-all solution for memory stomps, but watching your memory is a good strategy.

Before I talk about the strategies, we must cover the most common causes of memory stomps: uninitialized pointers and memory overwrites. Uninitialized pointers are becoming rarer because most people know how important initialization is, but basically this amounts to someone trying to use memory that was never allocated (or a pointer that was never assigned). The memory overwrite problem occurs when someone does a memcpy with too many elements, writing beyond the end of a block of allocated memory.
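
To make the overwrite case concrete, here is a minimal sketch. The Record type and CopyName helper are hypothetical names invented for illustration. The copy targets the whole record so that the sketch itself stays well defined, but the effect is the same as a memcpy into name with too large a count: the bytes spill into whatever sits next.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical record: an 8-byte buffer followed directly by a link field.
struct Record {
    char name[8];
    std::uint32_t next;
};
static_assert(offsetof(Record, next) == 8,
              "sketch assumes no padding between name and next");

// Copies `count` bytes into the record, starting at `name`.
// Any count above sizeof(Record::name) overwrites `next`: that is the stomp.
inline void CopyName(Record& r, const char* src, std::size_t count) {
    assert(count <= sizeof(Record));  // keep the sketch itself in bounds
    std::memcpy(&r, src, count);
}
```

With a count of 8 the link field survives; with a count of 12 it is silently replaced by the source bytes, and nothing fails until someone follows the link later.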

It turns out that most of the strategies listed here capture both circumstances.

Move the stomped memory
This strategy means leaving the original allocation in place but moving the data you actually use somewhere else. The original location becomes dummy memory, filled with a known pattern, that you can check periodically for stomping. Because nothing legitimate writes there any more, any change is the stomp, and you are more likely to track down the circumstances that cause it. Just make sure that the check runs often. This will slow your application somewhat, but since it is just a raw block of RAM, it'll be extremely fast to check.
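
A minimal sketch of such a check, assuming you can leave a dummy block at the stomped address. DecoyBlock, ArmDecoy, and FindStomp are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr unsigned char kDecoyFill = 0xA5;  // known pattern for the dummy block

// Hypothetical descriptor for the dummy block left at the stomped address.
struct DecoyBlock {
    unsigned char* bytes;
    std::size_t size;
};

// Fill the decoy with the known pattern. Call once after moving the real data.
inline void ArmDecoy(DecoyBlock& d) {
    std::memset(d.bytes, kDecoyFill, d.size);
}

// Returns the offset of the first changed byte, or d.size if nothing changed.
inline std::size_t FindStomp(const DecoyBlock& d) {
    for (std::size_t i = 0; i < d.size; ++i) {
        if (d.bytes[i] != kDecoyFill) return i;
    }
    return d.size;
}
```

Calling FindStomp once per frame, or after each suspect subsystem runs, narrows the window in which the stomp happened. The returned offset and the overwriting bytes are often a clue to who wrote them.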

Memory segmentation strategy
Windows gives each process its own address space, meaning that allocations made by one process are not directly accessible from another. (Threads within a single process share one address space, so moving code to another thread alone does not give you this protection.) By moving some of your code into a separate process, you partly prevent the memory stomp. More importantly, when the offending code goes to overwrite that memory, Windows will barf because that memory is no longer mapped, and you will get an access violation (the Windows counterpart of a "segmentation fault"). You will know exactly when the stomp occurs.

Heap movement strategy
Memory stomps often show up when your memory management scheme is home-grown and most of your RAM is managed. This means that you performed a huge allocation at some point and doled out portions when people requested them.

Now, the first, and easiest, thing to do is look at your heaps. Which heap appears right before the stomped-on heap? That is a good place to start. Put some sentinel values in that heap (0xA5, binary 10100101, works nicely because it is easy to spot in a memory dump) which you can examine readily. Now run your application and, when it crashes, look at a few memory locations close to the maximum addresses in that heap. Do they still contain your sentinel values? If not, then that heap is also being stomped, and you just don't see the bug. But if you know who uses that heap, then you know whom to bang over the head.
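
The two checks above can be sketched as follows. HeapInfo and the helper names are hypothetical, and the sketch assumes you keep a table of your heaps with their address ranges and that the suspect heap was pre-filled with 0xA5.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr unsigned char kSentinel = 0xA5;

// Hypothetical description of one managed heap.
struct HeapInfo {
    const char* name;
    std::uintptr_t begin;
    std::size_t size;
};

// Find the heap that contains a stomped address, or nullptr if none does.
inline const HeapInfo* HeapContaining(const std::vector<HeapInfo>& heaps,
                                      std::uintptr_t addr) {
    for (const HeapInfo& h : heaps) {
        if (addr >= h.begin && addr - h.begin < h.size) return &h;
    }
    return nullptr;
}

// Find the heap that ends closest below the victim heap: the first suspect.
inline const HeapInfo* HeapJustBefore(const std::vector<HeapInfo>& heaps,
                                      const HeapInfo& victim) {
    const HeapInfo* best = nullptr;
    for (const HeapInfo& h : heaps) {
        std::uintptr_t end = h.begin + h.size;
        if (end <= victim.begin && (!best || end > best->begin + best->size)) {
            best = &h;
        }
    }
    return best;
}

// True if the last `count` bytes below `end` still hold the sentinel.
inline bool TailIntact(const unsigned char* end, std::size_t count) {
    for (std::size_t i = 1; i <= count; ++i) {
        if (*(end - i) != kSentinel) return false;
    }
    return true;
}
```

If TailIntact fails on the suspect heap, whoever owns that heap is probably running off the end of it and into its neighbor.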

If things aren't arranged this way in your application, consider making it so because this memory "shell game" allows you to move heaps around until you can find the offending subsystem and narrow the problem. Finding memory stomps can take a long time.

Memory sentinel strategy
Your allocation scheme can be modified, with very little effort, to include a few extra bytes at the beginning and end of each memory allocation. Most allocators do this anyway; you just may not know it. This is easy to see in the debugger: break at a call to new where you know the memory originates and look at the few bytes preceding the pointer returned. Most allocators store between 16 and 80 bytes of extra info for every allocation you do. This holds extra data like who allocated it, on which thread, how many items (new item[num]), and so on. The minimum needed is a size field so that delete knows how much RAM to free at destruction time.

You can do something similar by adding a small chunk of known data at the beginning and end of each allocation. Then, when deallocations are performed, you can check that these sentinels are still valid and throw an exception if they are not. Eight to sixteen bytes is a good starting point; for small-block allocators, 2 bytes is best. Remember that all memory returned should still be on 4-byte or 8-byte boundaries, depending on the platform's word size.
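
Here is a minimal sketch of that idea layered on malloc. GuardedAlloc, GuardsIntact, and GuardedFree are hypothetical names. The 16-byte guards and the 16-byte header that stores the requested size are assumptions chosen to keep the returned pointer aligned on common platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <stdexcept>

constexpr unsigned char kGuardFill = 0xA5;
constexpr std::size_t kGuard = 16;   // sentinel bytes on each side
constexpr std::size_t kHeader = 16;  // stores the requested size
static_assert(sizeof(std::size_t) <= kHeader, "header must hold the size");
static_assert((kHeader + kGuard) % alignof(std::max_align_t) == 0,
              "user pointer must stay aligned");

// Layout: [header][front guard][user bytes][back guard]
inline void* GuardedAlloc(std::size_t size) {
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(kHeader + kGuard + size + kGuard));
    if (!raw) throw std::bad_alloc();
    std::memcpy(raw, &size, sizeof(size));
    std::memset(raw + kHeader, kGuardFill, kGuard);
    std::memset(raw + kHeader + kGuard + size, kGuardFill, kGuard);
    return raw + kHeader + kGuard;
}

// True if both sentinels around a GuardedAlloc block are untouched.
inline bool GuardsIntact(const void* user) {
    const unsigned char* p = static_cast<const unsigned char*>(user);
    std::size_t size = 0;
    std::memcpy(&size, p - kGuard - kHeader, sizeof(size));
    const unsigned char* front = p - kGuard;
    const unsigned char* back = p + size;
    for (std::size_t i = 0; i < kGuard; ++i) {
        if (front[i] != kGuardFill || back[i] != kGuardFill) return false;
    }
    return true;
}

// Frees the block, then throws if a sentinel had been overwritten.
inline void GuardedFree(void* user) {
    if (!user) return;
    const bool ok = GuardsIntact(user);
    std::free(static_cast<unsigned char*>(user) - kGuard - kHeader);
    if (!ok) throw std::runtime_error("memory stomp: guard bytes overwritten");
}
```

The check only fires at free time, so it tells you which block was overrun but not when. Calling GuardsIntact on long-lived blocks at intervals tightens that window. Note that a stomp that reaches the header corrupts the stored size, so this sketch only reliably catches overruns that stay within the guards.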

Third-party tools
BoundsChecker is a good tool for helping track memory stomps. The instrumentation that BoundsChecker adds to the code often masks the problem, but it can still be a fabulous tool for tracking memory stomps, and the performance impact can be minimal with a few minor settings changes.

Walking the memory
The stomped memory is being victimized by another portion of your application. Catching the criminal has been elusive, so one approach is to periodically walk the RAM and report changes. You know the approximate time when the stomp occurs, so see if you can turn the "stomp checker" on only around then and just walk the RAM looking for important changes like bad pointers. This will slow your application a lot, but it can throw an exception as soon as it "sees" any changes, and thus you can narrow down who is giving you grief.
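
For the linked-list scenario from the start of this post, a walk might look like the sketch below. It assumes all nodes come from one contiguous pool, which may not match your allocator. Node, InPool, and FirstBadLink are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical list node; all nodes are assumed to live in one pool array.
struct Node {
    Node* next;
    int payload;
};

// True if p points at a properly aligned slot inside the pool.
inline bool InPool(const Node* p, const Node* pool, std::size_t count) {
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(pool);
    const std::uintptr_t hi = lo + count * sizeof(Node);
    return a >= lo && a < hi && (a - lo) % sizeof(Node) == 0;
}

// Walks the list from head. Returns the position of the first node pointer
// that is outside the pool (or the position at which the walk exceeds the
// pool size, which indicates a cycle). Returns -1 if the list looks sane.
inline long FirstBadLink(const Node* head, const Node* pool,
                         std::size_t count) {
    const Node* n = head;
    for (std::size_t i = 0; n != nullptr; ++i) {
        if (i >= count || !InPool(n, pool, count)) {
            return static_cast<long>(i);
        }
        n = n->next;
    }
    return -1;
}
```

The node just before the reported position holds the bad pointer, so the bytes around that node are where to look for the stomper's fingerprints.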

Using some combination of these should allow you to track down memory stomps. Don't give up and never accept that the bug mysteriously 'disappeared'. The sooner you find it and squish it, the sooner you can get onto coding more fun things.


John said...

I wanted to be the first non-bot to comment here.

I am surprised this doesn't have more comments.
I am new to c++ and the concept of memory stops is foreign to me.
I am not even sure I get why 0xA5 is a sentinel sequence.
So... A non-null terminator? Why do you need anything but 0?

Do these memory stops happen with classes and pointers? Or is
it only when you do a malloc to allocate a continuous section of
memory on the stack?

Do any of these questions make sense or am I full of B.S.

Also, found your article from here: