Non-contiguous memory after "running for a while"?

I'd like to know how this quote I found in another forum relates to Linodes, the part about servers "running for a while" and fragmented memory:

"32 bit Linux can malloc (allocate) a max of 3G (1G is reserved for the Kernel), and that is if and only if it can allocate 3 contiguous (no interruptions) Gig of memory. If your server has been running for a while it is quite possible that the memory is fragmented and therefore 2G of contiguous memory isn't available."

Basically, I'm using a 32-bit OS and getting close to where I'm not able to give MySQL the memory it wants. The 2G limit makes sense (my buffer pool needs to be LESS than 2G) but the part about fragmenting after "running for a while" doesn't make sense to me.

So let's say I have 1.5G of contiguous memory (allocated to my InnoDB) and everything is running fine. Rough guess, is there a high probability that my 1.5G of memory will become non-contiguous "after running for a while" and everything will start swapping to disk?

If this isn't really a Linode-specific question, I can ask this on ServerFault.com or SuperUser.com, but I'm guessing you guys know the answer!

7 Replies

Linux has a memory map per _process_. If you run a program that allocates a tonne of short memory segments and then frees some of them, then _that process_ may not be able to malloc 2GB of space, because its memory map may be fragmented. However, a different process will be able to allocate 2GB, because it has a different memory map. The uptime of the OS won't impact that.

(Side note: what the process sees as "contiguous" isn't what the kernel sees; to the kernel the memory map is always all fragmented and all over the place because of paging).

Now, process 1's fragmented memory could impact process 2 only insofar as the fragmented usage might take up more virtual memory than it otherwise would, leaving less free virtual memory for the whole system, but that's not really what the original statement is talking about.
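To make the per-process point concrete, here's a toy model in Python. It is only a sketch: the 64 MiB block size and the "free every other block" pattern are made up for illustration, and real allocators (glibc malloc, mmap placement) behave very differently. It just shows how one process can have 1.5G free in total and still have no single hole bigger than 64 MiB, while a fresh process starts with one big hole.

```python
# Toy model of ONE process's 32-bit user address space, in MiB units.
# Assumption: this is not how glibc or the kernel really place mappings;
# it only illustrates "plenty free in total, no single hole big enough".

USER_SPACE_MIB = 3 * 1024  # the 3G user / 1G kernel split from the quote


def free_holes(space_mib, live_blocks):
    """Return the sizes of the free gaps between live (start, size) blocks."""
    holes = []
    cursor = 0
    for start, size in sorted(live_blocks):
        if start > cursor:
            holes.append(start - cursor)
        cursor = max(cursor, start + size)
    if cursor < space_mib:
        holes.append(space_mib - cursor)
    return holes


# Fill the address space with 64 MiB blocks, then free every other one.
BLOCK_MIB = 64  # hypothetical block size
all_blocks = [(start, BLOCK_MIB) for start in range(0, USER_SPACE_MIB, BLOCK_MIB)]
live = all_blocks[::2]  # even-numbered blocks stay allocated

holes = free_holes(USER_SPACE_MIB, live)
total_free = sum(holes)
largest_hole = max(holes)
print(f"fragmented process: {total_free} MiB free, largest hole {largest_hole} MiB")

# A different (fresh) process has its own map: one hole covering everything.
fresh = free_holes(USER_SPACE_MIB, [])
print(f"fresh process: largest hole {max(fresh)} MiB")
```

In this model a request for 1.5G of contiguous address space fails in the fragmented process and succeeds in the fresh one, which is the distinction being made above.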

You should also be careful: if you're running a 32-bit kernel and have >4GB of RAM then you should run the PAE kernel (or switch to the 64-bit kernel), otherwise you'll be swapping to disk before you need to. If you've got <2GB of RAM then you don't want to let MySQL use that much 'cos your machine will swap.
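For reference, the arithmetic behind those numbers, as a quick sketch. It assumes the common 3G/1G user/kernel split from the quote and classic 36-bit PAE; other splits exist.

```python
GIB = 2 ** 30

virtual_per_process = 2 ** 32   # 32-bit pointers: 4 GiB of virtual addresses
kernel_reserved = 1 * GIB       # assumed 3G/1G split, as in the quote
user_space = virtual_per_process - kernel_reserved

pae_physical = 2 ** 36          # classic PAE: 36-bit physical addresses

print(user_space // GIB)        # 3  -> GiB of address space per process
print(pae_physical // GIB)      # 64 -> GiB of RAM the kernel can address
```

Note that PAE only raises how much physical RAM the kernel can use. Each process still has 32-bit pointers, so the per-process limit stays the same; only a 64-bit kernel and userland lifts that.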

I've seen this on a Linode, and on an ODROID with no swap, where you get tonnes of "unable to allocate memory" errors even when there were stacks of free memory (over a GB for the ODROID). They aren't fatal errors, but I'd guess they cause a slight delay. The fix is to add vm.min_free_kbytes=16384 to /etc/sysctl.d/local.conf

After adding that line (and you probably need to create that file as well) you need to run

sysctl vm.min_free_kbytes=16384

I'm scratching my head as to why that works when there's a GB of free memory already, but it does.
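If you want to check what the setting currently is before or after changing it, something like this should work. It's a minimal sketch that assumes a Linux box with /proc mounted; the function name is just one I made up.

```python
from pathlib import Path

# Standard procfs location for the vm.min_free_kbytes sysctl on Linux.
MIN_FREE_KBYTES = Path("/proc/sys/vm/min_free_kbytes")


def read_min_free_kbytes(path=MIN_FREE_KBYTES):
    """Return the current value in KiB, or None if it can't be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_min_free_kbytes()
    if value is None:
        print("vm.min_free_kbytes not readable (not Linux, or no /proc?)")
    else:
        print(f"vm.min_free_kbytes = {value} KiB ({value / 1024:.1f} MiB)")
```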

That's likely the kernel allocation bug in the IPv4 stack - see http://forum.linode.com/viewtopic.php?t=7805

Not in my case; I was getting those messages from practically every process.

Thanks Stephen, that was informative.

Maybe this is a little off-topic from the original question, but I'm thinking the easiest solution for me will be to just move MySQL to a new 64-bit Linode (same datacenter) so I'm not starting over from scratch. I never have to reboot the current setup, it just runs forever like butter. So if I can leave it alone and just move the database to 64-bit, I'll be thrilled. I got the idea from this forum. So that's on my to-do list in the near future.

One thing to be aware of; communication from your front end (eg web PHP, whatever) to your backend (MySQL) will be slower if split over two machines. So you'll get potentially bigger scaling but at the expense of latency.
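A back-of-envelope way to size that cost, as a rough sketch. The round-trip time and the query counts are assumed figures, not measurements, and it assumes queries run one after another with one network round trip each.

```python
def added_latency_ms(queries_per_page, round_trip_ms):
    """Extra time per page if each query pays one network round trip, serially."""
    return queries_per_page * round_trip_ms


ASSUMED_ROUND_TRIP_MS = 0.3  # hypothetical same-datacenter LAN figure

for queries in (5, 50, 500):
    extra = added_latency_ms(queries, ASSUMED_ROUND_TRIP_MS)
    print(f"{queries} queries per page: about {extra:.1f} ms extra")
```

So whether it matters mostly comes down to how many queries each page makes.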

@sweh:

> One thing to be aware of; communication from your front end (eg web PHP, whatever) to your backend (MySQL) will be slower if split over two machines. So you'll get potentially bigger scaling but at the expense of latency.

Since every non-small website ever does precisely that, I don't think 0.3 ms or whatever is a major catastrophe.

Edit: On the other hand, they've carefully optimized to reduce queries and cache everything. On the other other hand, where are all those caches stored? Memcached, on other machines, accessed over the LAN. :P
