Finding bad RAM with memtest86

Lately Firefox started crashing randomly for no apparent reason. Other software on my computer also started acting weird. It took me a while to find the cause of the problems: one of my memory modules had gone bad. Finding that out was not exactly easy. Normally I would run memtest86+ from an Ubuntu live CD. That was not possible, as I have a UEFI BIOS (without legacy support) and memtest86+ lacks UEFI support.

[Image: memtest splash screen]

I was able to create a virtual machine (with KVM or VirtualBox) with reserved RAM and run memtest86+ in there. That actually showed the problem. Another way to detect the problem is to run memtest86 (without the plus), as it has supported booting via UEFI since version 5. It can be downloaded from memtest86.com (choose the CD image) and put on a USB stick using UNetbootin (which can be installed using apt-get).

[Image: RAM modules]

Once I had found the problem, there was no other way than to add and remove memory modules and run the test again to find out which one was broken. It took some time, but eventually I succeeded. It wasn’t a pretty process and it took way too long. Somebody should write a memory testing program in user-space that also reports the slot of the broken RAM module. In the end the broken module turned out to be the one closest to the CPU (see picture). Maybe it got too hot.
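To give an idea of what the testing half of such a user-space program could look like, here is a minimal Python sketch of a write/verify pattern test. It is only an illustration, not how memtest86 works: the function names `find_mismatches` and `pattern_test` and the chosen bit patterns are my own assumptions. It can only touch the memory the operating system hands to the process, the reported offsets are positions inside the buffer rather than physical addresses, and CPU caches may hide some faults, so it cannot tell you which slot holds the bad module.

```python
def find_mismatches(buf, pattern):
    """Return (offset, expected, actual) for every byte that differs from pattern."""
    return [(i, pattern, b) for i, b in enumerate(buf) if b != pattern]


def pattern_test(size_bytes, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Fill a buffer with each pattern in turn, read it back, collect mismatches.

    Offsets are relative to the buffer (virtual memory), not physical
    addresses, so they say nothing about the RAM slot involved.
    """
    buf = bytearray(size_bytes)
    errors = []
    for pattern in patterns:
        expected = bytes([pattern]) * size_bytes
        buf[:] = expected          # write pass
        if buf != expected:        # quick whole-buffer verify
            errors.extend(find_mismatches(buf, pattern))
    return errors


if __name__ == "__main__":
    # 16 MiB is an arbitrary size for illustration; a real test would
    # cover far more memory and run for many passes.
    bad = pattern_test(16 * 1024 * 1024)
    print(f"{len(bad)} mismatching bytes")
```

On healthy memory this should report zero mismatching bytes. A clean run proves very little, since the buffer covers only a small part of RAM that the OS happened to pick.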

Suggested tools:

  1. MemTest86 from: memtest86.com
  2. Memtest86+ from: memtest.org
  3. Windows Memory Diagnostic from: microsoft.com

I hope it will help you.


2 thoughts on “Finding bad RAM with memtest86”

  1. Dear Maurits,

    Besides finding what you did quite bizarre, you could have guessed which memory bank was damaged by looking at the address reported during the failed write/verify cycle. But that, of course, would only have been valid if you were dealing with real hardware.

    In fact, even if memtest86 were able to tell you the exact memory bank that was damaged, running it behind a hypervisor that implements its own memory address translation would have made no sense, since the virtual mapped address would never match the physical memory address of the damaged memory bank. Maybe the hypervisor could have told you that. Or maybe the hypervisor could have crashed when trying to dynamically allocate more physical memory for the guest.

    The reason I find it bizarre to run memtest behind a hypervisor is simply that you will never be able to walk all memory addresses when accessing memory in protected mode (since it is shared and some other process is probably using it), which defeats the purpose of running a memory test in the first place.

    So, nice try! And consider yourself lucky for being able to find the broken memory, eheh 😉

  2. @Paolo: Thank you for your comment. Nice to hear from you again. How are you doing? I completely agree that it was an unorthodox approach, but hey, it worked! You seem to imply that there is a guaranteed distribution of the RAM addresses over the RAM slots that applies to all motherboards. I did not know about that. How does that work? Do you have a reference?
