So every big shop I ever worked at (all had over 200+ firewalls) always had log servers that were just sucking air. I’ve seen 16 processor 32 systems just melt. Why?
- Logs coming in
- Smart log indexing
- SmartTracker users searching
- SmartLog users searching
- SEIM like RSA and Arcsight sucking logs
- Tufin Sucking logs
and so I am always wondering do I need more CPU or more memory? Of course logging is the last thing funded by management so I’m going to have to squeeze blood from a turnip here.
Anyways because I have no life, I decided to analyze how virtual memory works on Linux and see if I can fine tune it somehow. As it turns out nicely, this analysis applies to Linux anything including NGF and VSX firewalls as well as the MDS management stations.
What am I looking for?
- Too much context switching
- Too much paging by user processes
- Too much paging by kernel processes
- Too much CPU processor hopping thus losing hot caches
R75.40=R77.10 to date are on Linux kernel 2.6.18. I found out that ‘top’ on these systems are wrong when it comes to memory analysis. Turns out ‘top’ and and memory reporting in general was fixed somewhere before Linux kernel 3.8.0.
BUT before I do I want to clarify for the masses “What is virtual memory”? I have been pontificating this for a while and come up with this. VM is three things, not one:
- The illusion that a 32bit process (and Linux kernel) can access up to 4GB of memory regardless if physical memory is available.
- Each process has its isolated process space
- A paging system that magically makes more memory appear to a process A as it is requested. It does this by paging out unused pages from physical memory to disk from some other poor sap slacker process B that is asleep at the wheel and allocating more memory to the requesting process A from the recently freed physical memory. The best analogy I know of is in Windows if you have a window application B minimized and you haven’t touched it in weeks, chances are that it has been banished to disk so that some other young vivacious process A that could use the physical memory that the minimized window B was sucking.
So when one of your young hot shot co-admins says “We’re outta virtual memory” you can now sound smart and say “What type of virtual memory???” and watch their eyes glaze over. Did a process try to go beyond its 4GB limit and blow up? Was there not enough paging space for all those fat memory sucking processes?
Oh yeah, one more thing. When a program that you write is linked to some helper library for say I/O, or database access, or GUI interface 99% of the time you are going to share memory with that library. So how is that shared memory charged?? To the VIRT of the library or to the requesting program(s)? THIS!!! is why 2.6.18 is reporting wrong stats as you shall see. 2.6.18 calculates wrong and it is fixed in 3.8.0.
How did I do this? Well I wrote a program and simulated a out of control process leaking memory on the heap 100MB at a time. I was trying to simulate a VSX kernel that kept increasing its heap space to allow for more connections. This is the problem I ran into.
If you look at the following diagram the system is reporting that 52KB of SWAP is being used. Well who is using it? If you look at the individual users of swap space it adds up to about 1GB of swap. So who is right?
Turns out the SWAP column is calculated WRONG!!! They just subtracted from a processes Virtual Memory usage (VIRT) the amount of Physical memory it is occupying (RSS) and they figured the difference was swapped out to disk. Unfortunately they forgot one minor issue…shared memory. VIRT includes shared memory. A process might have 46632KB of VIRT but 4136KB of that is shared by 1,2,5,10,100 other processes. In reality the process only has 424496KB of virtual memory and you then have to reduce the SWAP by that same amount….Kind of….
If library A with 100KB of data is shared by process A, B, C then A,B,C will have 300KB VIRT under Linux 2.6.18. So 100KB is now 300KB So what they decided to do was split it evenly between the 3. So each will have 33.33KB under 3.8.0 which makes more sense and the number will start to work out.
Prove it!
Somewhere leading up to 3.8.0, they updated the memory statistics in /proc/<procid>/smem which lists all the memory segments of a process. What this use to look like in 2.6.18 AND VSX and NGF (left side) was missing SWAP and SHR memory calculation. In 3.8.0 they now include these components and you can see it on the right hand side. The SWAP is the true SWAP space of the process and ‘top’,’ps’ reflect it accurately instead of guess at it with subtraction games.
So I know you are asking yourself “Geez Dreez, they lied to us once how do I know they are not lying to us again???”
Well Holmes (I’m into the new Sherlock Holmes series lately), I wrote a program (memgrow )that would simulate a memory leak in heap space by 100MB at a time and watched the smap table of all the processes to see what would happen. Check it out.
I also checked the VIRT and RSS tables and it all is very very close in 3.8.0 vs WRONG 2.6.18.
One more thing. You might see HUGE data allocation and you’d think that it should be swapped, but SWAP space is small. Why is that?
Below I allocated a 1GB chunk of heap space, but did NOT write to it. Check out ‘top’ it shows it allocated but did not get SWAPPED out or sitting in memory because the kernel is just holding it in reserve and not creating page tables for it yet because it has not been referenced.
Look what happens when I write to the space. The virtual memory is actually allocated and and because we are out of memory a lot of it is SWAPPED out and as much as possible it kept in RSS physical memory.
So when you see programs that have HUGE VIRT and DATA, you have to ask yourself are they REALLY REALLY REALLY using that space?? Or reserving it for a special day?
Its all about page faults baby…read on.
Have you fallen asleep yet? I know, I know….Why do I care???
Now you know what stats to trust and not trust. Now you can figure out what processes to watch that are running out of memory. Now you know if a system is memory starved or CPU starved.
How??
- It is not necessarily bad that the SWAP USED: is not = Zero. It means that processes that are near dead anyways are swapped out. The system got busy at some point and so it relegated those near dead processes into SWAP, and they haven’t woken up yet. You have to do more analysis to see if swapping impacts RUNNING processes.
- If process that is spending a higher percentage of its time RUNNING (not sleeping) and is swapping…you have a memory problem.Start ‘top’. Hit ‘i’ to see only the running processes (‘z’ will highlight them). Hit ‘f’ and add the Major Fault column. If running processes have a lot of faults, go to http://www.memory.com and buy more memory. Check out this MLM log server. The two busiest process
are spending most their time handling page faults. There might be 8 CPUs, but when the ‘fwd’ gets
on a CPU, it spends 75% of its time swapping then handling logs (OK, I am exaggerating but you get the point). Mo memory Mo memory Mo memory
- Pay special attention to your [kernel processes]. If they are swapping and faulting you’ve got big problems.
- If these same processes are NOT faulting but using 100% of the processor then you have 2 options:
- Use CoreXL fw ctl affinity to wire the affinity to 1 processor so it doesn’t jump around and lose its hot cache
- You need a box with a FASTER processor.
Well I could go on and on about this but I know you are both lost and bored. I talk about this a lot in my class.
Take aways:
- Do NOT trust R75.40+ ‘top’, ‘ps’ stats on SWAP space and virtual memory usage.
- Make sure your running processes are not faulting
Mystery solved Holmes: ‘There is nothing like first-hand evidence.’
Virtually yours,
dreez