Memory Management

  • Tuning the memory sub-system can be a complex process.

  • Memory usage and I/O throughput are intrinsically related because, in most cases, most memory is used to cache the contents of files on disk. Changing memory parameters can therefore have a large effect on I/O performance, and changing I/O parameters can have an equally large effect on the virtual memory sub-system.

    free -m
                  total        used        free      shared  buff/cache   available
    Mem:           7763        3178         646        1022        3938        3262
    Swap:          7762        1034        6728
    cat /proc/meminfo
    MemTotal: 7949804 kB
    MemFree: 669748 kB
    MemAvailable: 3355456 kB
    Buffers: 28 kB
    Cached: 3777140 kB
    SwapCached: 13160 kB
    Active: 2357428 kB
    Inactive: 3249488 kB
    Active(anon): 1659132 kB
    Inactive(anon): 1201760 kB
    Active(file): 698296 kB
    Inactive(file): 2047728 kB
    Unevictable: 583624 kB
    Mlocked: 220 kB
    SwapTotal: 7949308 kB
    ...

    UTILITY  PURPOSE                                                                PACKAGE
    free     Brief summary of memory usage                                          procps
    vmstat   Detailed virtual memory statistics and block I/O, dynamically updated  procps
    pmap     Process memory map                                                     procps

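
As a rough sketch of how the /proc/meminfo fields listed above can be consumed in a script, the helper functions below pull a single value out of the file and express MemAvailable as a percentage of MemTotal. The names meminfo_kb and mem_available_pct are illustrative and not part of any package; the optional file argument is an assumption added so the functions can be tried on a saved copy.

```shell
# meminfo_kb FIELD [FILE]: print the value (in kB) of one meminfo field.
# FILE defaults to /proc/meminfo. Returns non-zero if the field is absent.
meminfo_kb() {
    awk -v key="$1:" '
        $1 == key { print $2; found = 1; exit }
        END       { exit !found }
    ' "${2:-/proc/meminfo}"
}

# mem_available_pct [FILE]: MemAvailable as an integer percentage of MemTotal.
mem_available_pct() {
    total=$(meminfo_kb MemTotal "$1") || return 1
    avail=$(meminfo_kb MemAvailable "$1") || return 1
    echo $(( avail * 100 / total ))
}

# Example use on a live system (guarded so it is skipped where /proc is absent).
if [ -r /proc/meminfo ]; then
    echo "MemAvailable: $(meminfo_kb MemAvailable) kB ($(mem_available_pct)% of MemTotal)"
fi
```

With the sample values shown earlier (MemTotal 7949804 kB, MemAvailable 3355456 kB), mem_available_pct would report 42.
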
  • The pseudofile /proc/meminfo contains a wealth of information about how memory is being used.

  • The /proc/sys/vm directory contains many tunable knobs to control the Virtual Memory system.

  • Values can be changed either by directly writing to the entry, or using the sysctl utility.

  • When tweaking parameters in /proc/sys/vm, the usual best practice is to adjust one thing at a time and look for effects. The primary (inter-related) tasks are:

  • Controlling flushing parameters; i.e., how many pages are allowed to be dirty and how often they are flushed out to disk

  • Controlling swap behavior; i.e., how many pages that reflect file contents are allowed to remain in memory, as opposed to those that need to be swapped out because they have no other backing store

  • Controlling how much memory overcommission is allowed, since many programs never need the full amount of memory they request, particularly because of copy on write (COW) techniques

  • Memory tuning can be subtle: what works in one system situation or load may be far from optimal in other circumstances.

  • Exactly what appears in /proc/sys/vm depends somewhat on the kernel version. Almost all of the entries are writable (by root).

    ls /proc/sys/vm/
    admin_reserve_kbytes dirty_ratio legacy_va_layout min_unmapped_ratio numa_zonelist_order panic_on_oom watermark_boost_factor
    compaction_proactiveness dirtytime_expire_seconds lowmem_reserve_ratio mmap_min_addr oom_dump_tasks percpu_pagelist_high_fraction watermark_scale_factor
    compact_memory dirty_writeback_centisecs max_map_count mmap_rnd_bits oom_kill_allocating_task stat_interval zone_reclaim_mode
    compact_unevictable_allowed drop_caches memfd_noexec mmap_rnd_compat_bits overcommit_kbytes stat_refresh
    dirty_background_bytes extfrag_threshold memory_failure_early_kill nr_hugepages overcommit_memory swappiness
    dirty_background_ratio hugetlb_optimize_vmemmap memory_failure_recovery nr_hugepages_mempolicy overcommit_ratio unprivileged_userfaultfd
    dirty_bytes hugetlb_shm_group min_free_kbytes nr_overcommit_hugepages page-cluster user_reserve_kbytes
    dirty_expire_centisecs laptop_mode min_slab_ratio numa_stat page_lock_unfairness vfs_cache_pressure
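
The two ways of reading and changing an entry (writing to the file directly, or using sysctl) can be sketched as follows. The helper names sysctl_path and vm_get are hypothetical, and the value 40 in the commented write commands is purely illustrative, not a recommendation.

```shell
# sysctl_path NAME: map a dotted sysctl name to its path under /proc/sys,
# e.g. vm.swappiness -> /proc/sys/vm/swappiness.
sysctl_path() {
    echo "/proc/sys/$(echo "$1" | tr . /)"
}

# vm_get NAME: read one entry under /proc/sys/vm, e.g. "vm_get swappiness".
vm_get() {
    cat "$(sysctl_path "vm.$1")"
}

# Reading needs no special privileges:
#   vm_get swappiness
#   sysctl vm.swappiness            # same value via the sysctl utility
#
# Writing requires root. Change one thing at a time and observe the effect:
#   echo 40 > /proc/sys/vm/swappiness
#   sysctl -w vm.swappiness=40      # equivalent to the echo above
```

Changes made either way last only until the next reboot.
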
  • vmstat is a multi-purpose tool that displays information about memory, paging, I/O, processor activity and processes.

    vmstat [options] [delay] [count]

If delay (in seconds) is given, the report is repeated at that interval, count times; if count is not given, vmstat keeps reporting statistics until it is killed by a signal, such as the one sent by Ctrl-C.

    vmstat 2 4
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b    swpd   free buff   cache   si   so    bi    bo   in   cs us sy id wa st
     4  0 1048576 910912   28 4061280    6   16    62    42   52  151  4  2 94  0  0
     0  0 1048576 940816   28 4040172    0    0     0   266 2874 5571  3  2 94  0  0
     0  0 1048576 939220   28 4042236    0    0     0    44 2850 5257  3  2 95  0  0
     0  0 1048576 938500   28 4042236    0    0     0     0 2695 5135  3  2 95  0  0

    vmstat -SM -a 2 4
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
     2  0   1024    825   3128   2305    0    0    62    42   55  157  4  2 94  0  0
     0  0   1024    824   3122   2305    0    0     0    38 2829 5162  3  3 94  0  0
     0  0   1024    836   3122   2305    0    0     0  3438 2983 5966  3  2 94  0  0
     1  0   1024    841   3122   2305    0    0     0    44 2672 5199  3  2 95  0  0

    vmstat -p /dev/sda3 2 4
    sda3       reads   read sectors    writes   requested writes
              258262       26944496    303063           13001080
              258262       26944496    303083           13001448
              258262       26944496    303108           13001744
              258263       26944504    303137           13004376

  • If the -S option is given with m or M, memory statistics are reported in megabytes instead of kilobytes (m means units of 1,000,000 bytes, M means units of 1,048,576 bytes).

  • With the -a option, vmstat displays information about active and inactive memory.

  • Active memory pages are those which have been recently used; they may be clean (disk contents are up to date) or dirty (need to be flushed to disk eventually).

  • By contrast, inactive memory pages have not been used recently; they are more likely to be clean and are released sooner under memory pressure.

  • If you just want to get some quick statistics on only one partition, use the -p option.
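
One common use of the periodic vmstat report is to check whether the system is actively swapping. The filter below is a minimal sketch: it counts the samples in which the si or so column is non-zero. The function name swapping_samples is hypothetical, and the code assumes the default column layout shown above, where si and so are the 7th and 8th fields (this also holds with -a). The first data row is skipped because it reports averages since boot rather than current activity.

```shell
# swapping_samples: read vmstat output on stdin and print how many samples
# show swap traffic (si > 0 or so > 0).
# Only rows whose first field is numeric are treated as data, so header lines
# are ignored; the first data row (averages since boot) is skipped.
swapping_samples() {
    awk '
        $1 ~ /^[0-9]+$/ {
            if (rows++ == 0) next
            if ($7 > 0 || $8 > 0) n++
        }
        END { print n + 0 }
    '
}

# Usage (takes about 8 seconds):
#   vmstat 2 5 | swapping_samples
```

A result of 0 over a representative interval suggests that no pages moved to or from swap while sampling, even if swpd shows that some swap space is occupied.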

Linux employs a virtual memory system, in which the operating system can function as if it had more memory than it really does. This kind of memory overcommission functions in two ways:

  • Many programs do not actually use all the memory they are given permission to use. Sometimes, this is because child processes inherit a copy of the parent’s memory regions utilizing a COW (Copy On Write) technique, in which the child only obtains a unique copy (on a page-by-page basis) when there is a change.
  • When memory pressure becomes important, less active memory regions may be swapped out to disk, to be recalled only when needed again.

Such swapping is usually done to one or more dedicated partitions or files; Linux permits multiple swap areas, so the needs can be adjusted dynamically. Each area has a priority, and lower priority areas are not used until higher priority areas are filled.

In most situations, the recommended swap size is the total RAM on the system. You can see what your system is currently using for swap areas by looking at the /proc/swaps file, and you can get a report on current usage with free.

The commands involving swap are:

  • mkswap: format swap partitions or files

  • swapon: activate swap partitions or files

  • swapoff: deactivate swap partitions or files

    cat /proc/swaps
    Filename    Type       Size     Used     Priority
    /dev/zram0  partition  7949308  1111296  100
    free -m
                  total        used        free      shared  buff/cache   available
    Mem:           7763        3104        1528         707        3129        3532
    Swap:          7762        1085        6677
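
As an illustration of working with the swap commands and /proc/swaps, the sketch below reports how full each active swap area is. The function name swap_usage is hypothetical, as is the /swapfile path in the commented commands, which show one common way of adding a swap file and are not taken from the text above.

```shell
# swap_usage [FILE]: print each active swap area and the percentage in use.
# FILE defaults to /proc/swaps, where the Size and Used columns are in KiB.
swap_usage() {
    awk 'NR > 1 && $3 > 0 { printf "%s %.1f%%\n", $1, $4 * 100 / $3 }' \
        "${1:-/proc/swaps}"
}

# Adding a 1 GiB swap file (requires root; /swapfile is a made-up path):
#   dd if=/dev/zero of=/swapfile bs=1M count=1024
#   chmod 600 /swapfile
#   mkswap /swapfile              # format it as swap
#   swapon /swapfile              # activate it
#   swap_usage                    # it should now be listed
#   swapoff /swapfile             # deactivate it again
```

For the /proc/swaps sample shown above, swap_usage would print "/dev/zram0 14.0%".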

At any given time, most memory is in use for caching file contents to prevent actually going to the disk any more than necessary, or in a sub-optimal order or timing. Such pages of memory are never swapped out as the backing store is the files themselves, so writing out to swap would be pointless; instead, dirty pages (memory containing updated file contents that no longer reflect the stored data) are flushed out to disk.

  • Simplest way to handle memory pressure: Permit memory allocations until all memory is exhausted, then fail.
  • Second simplest way: Use swap space on disk to free up some resident memory. Total available memory is RAM + swap space.
  • Linux allows memory overcommitment, granting memory requests beyond RAM + swap, as many processes don’t use all requested memory.
  • Examples:
    • A program allocates a 1 MB buffer and then uses only a few pages of it.
    • Every time a child process is forked, it receives a copy of the entire memory space of the parent. Because Linux uses the COW (copy on write) technique, no actual copy needs to be made unless one of the processes modifies memory. However, the kernel has to assume that the copy might need to be made.
  • The kernel allows overcommitment only for user process pages; kernel pages are not swappable and are allocated at request time.
  • The OOM (Out of Memory) killer selects which processes to terminate during severe memory pressure.
  • Overcommission can be modified, or even turned off, by setting the value of /proc/sys/vm/overcommit_memory:
    • 0 (default): Permit overcommission but refuse obvious overcommits. Root users get more memory allocation than normal users.
    • 1 : Allow all memory requests to overcommit.
    • 2 : Turn off overcommission. Memory requests fail when total memory commit reaches swap space + a configurable percentage of RAM (/proc/sys/vm/overcommit_ratio).
  • The OOM killer's selection algorithm is heuristic; it is not meant for normal operations, but to permit graceful shutdown or retrenchment.
  • Process selection is based on a badness value for each process, visible in /proc/[pid]/oom_score.
  • Adjust oom_score_adj in the same directory to make a given task more or less likely to be selected.
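
The arithmetic behind overcommit mode 2, and the per-process badness values, can be explored with the sketch below. The function names commit_limit_kb and oom_top are hypothetical. The limit calculation is a simplification that uses only swap plus a percentage of RAM, as described above; it assumes no huge pages are reserved and that overcommit_kbytes is not in use.

```shell
# commit_limit_kb MEM_TOTAL_KB SWAP_TOTAL_KB RATIO
# Approximate commit ceiling when overcommit_memory is 2:
#   swap + RAM * overcommit_ratio / 100
commit_limit_kb() {
    echo $(( $2 + $1 * $3 / 100 ))
}

# oom_top [N]: list the N processes with the highest oom_score (default 5),
# one per line as "score pid name". Processes that exit mid-scan are skipped.
oom_top() {
    for f in /proc/[0-9]*/oom_score; do
        pid=${f#/proc/}
        pid=${pid%/oom_score}
        score=$(cat "$f" 2>/dev/null) || continue
        name=$(cat "/proc/$pid/comm" 2>/dev/null) || continue
        echo "$score $pid $name"
    done | sort -rn | head -n "${1:-5}"
}

# Inspecting the current policy and the kernel's own accounting:
#   cat /proc/sys/vm/overcommit_memory
#   cat /proc/sys/vm/overcommit_ratio
#   grep -E 'CommitLimit|Committed_AS' /proc/meminfo
#
# Making one process a less likely OOM victim (range is -1000 to 1000;
# lowering the value requires privileges, and -500 is only an example):
#   echo -500 > /proc/<pid>/oom_score_adj
```

With the MemTotal and SwapTotal values from the earlier /proc/meminfo listing and an overcommit_ratio of 50, commit_limit_kb 7949804 7949308 50 gives 11924210 kB.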