Python subprocess.Popen "OSError: [Errno 12] Cannot allocate memory"


Question

Note: This question was originally asked here but the bounty time expired even though an acceptable answer was not actually found. I am re-asking this question including all details provided in the original question.

A python script is running a set of class functions every 60 seconds using the sched module:

# sc is a sched.scheduler instance
sc.enter(60, 1, self.doChecks, (sc, False))

The script is running as a daemonised process using the code here.

A number of class methods that are called as part of doChecks use the subprocess module to call system functions in order to get system statistics:

ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]

This runs fine for a period of time before the entire script crashing with the following error:

File "/home/admin/sd-agent/checks.py", line 436, in getProcesses
File "/usr/lib/python2.4/subprocess.py", line 533, in __init__
File "/usr/lib/python2.4/subprocess.py", line 835, in _get_handles
OSError: [Errno 12] Cannot allocate memory

The output of free -m on the server once the script has crashed is:

$ free -m
                  total       used       free     shared     buffers    cached
Mem:                894        345        549          0          0          0
-/+ buffers/cache:  345        549
Swap:                 0          0          0

The server is running CentOS 5.3. I am unable to reproduce on my own CentOS boxes nor with any other user reporting the same problem.

I have tried a number of things to debug this as suggested in the original question:

  1. Logging the output of free -m before and after the Popen call. There is no significant change in memory usage i.e. memory is not gradually being used up as the script runs.

  2. I added close_fds=True to the Popen call but this made no difference - the script still crashed with the same error. Suggested here and here.

  3. I checked the rlimits which showed (-1, -1) on both RLIMIT_DATA and RLIMIT_AS as suggested here.

  4. An article suggested the having no swap space might be the cause but swap is actually available on demand (according to the web host) and this was also suggested as a bogus cause here.

  5. The processes are being closed because that is the behaviour of using .communicate() as backed up by the Python source code and comments here.

The entire checks can be found at on GitHub here with the getProcesses function defined from line 442. This is called by doChecks() starting at line 520.

The script was run with strace with the following output before the crash:

recv(4, "Total Accesses: 516662\nTotal kBy"..., 234, 0) = 234
gettimeofday({1250893252, 887805}, NULL) = 0
write(3, "2009-08-21 17:20:52,887 - checks"..., 91) = 91
gettimeofday({1250893252, 888362}, NULL) = 0
write(3, "2009-08-21 17:20:52,888 - checks"..., 74) = 74
gettimeofday({1250893252, 888897}, NULL) = 0
write(3, "2009-08-21 17:20:52,888 - checks"..., 67) = 67
gettimeofday({1250893252, 889184}, NULL) = 0
write(3, "2009-08-21 17:20:52,889 - checks"..., 81) = 81
close(4)                                = 0
gettimeofday({1250893252, 889591}, NULL) = 0
write(3, "2009-08-21 17:20:52,889 - checks"..., 63) = 63
pipe([4, 5])                            = 0
pipe([6, 7])                            = 0
fcntl64(7, F_GETFD)                     = 0
fcntl64(7, F_SETFD, FD_CLOEXEC)         = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f12708) = -1 ENOMEM (Cannot allocate memory)
write(2, "Traceback (most recent call last"..., 35) = 35
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/usr/bin/sd-agent/agent."..., 52) = 52
open("/home/admin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/home/admin/sd-agent/dae"..., 60) = 60
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/usr/bin/sd-agent/agent."..., 54) = 54
open("/usr/lib/python2.4/sched.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File \"/usr/lib/python2.4/sched"..., 55) = 55
fstat64(8, {st_mode=S_IFREG|0644, st_size=4054, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "\"\"\"A generally useful event sche"..., 4096) = 4054
write(2, "    ", 4)                     = 4
write(2, "void = action(*argument)\n", 25) = 25
close(8)                                = 0
munmap(0xb7d28000, 4096)                = 0
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/usr/bin/sd-agent/checks"..., 60) = 60
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/usr/bin/sd-agent/checks"..., 64) = 64
open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File \"/usr/lib/python2.4/subpr"..., 65) = 65
fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
read(8, "lso, the newlines attribute of t"..., 4096) = 4096
read(8, "code < 0:\n        print >>sys.st"..., 4096) = 4096
read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096
read(8, " p2cread\n        # c2pread    <-"..., 4096) = 4096
write(2, "    ", 4)                     = 4
write(2, "errread, errwrite)\n", 19)    = 19
close(8)                                = 0
munmap(0xb7d28000, 4096)                = 0
open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File \"/usr/lib/python2.4/subpr"..., 71) = 71
fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
read(8, "lso, the newlines attribute of t"..., 4096) = 4096
read(8, "code < 0:\n        print >>sys.st"..., 4096) = 4096
read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096
read(8, " p2cread\n        # c2pread    <-"..., 4096) = 4096
read(8, "table(self, handle):\n           "..., 4096) = 4096
read(8, "rrno using _sys_errlist (or siml"..., 4096) = 4096
read(8, " p2cwrite = None, None\n         "..., 4096) = 4096
write(2, "    ", 4)                     = 4
write(2, "self.pid = os.fork()\n", 21)  = 21
close(8)                                = 0
munmap(0xb7d28000, 4096)                = 0
write(2, "OSError", 7)                  = 7
write(2, ": ", 2)                       = 2
write(2, "[Errno 12] Cannot allocate memor"..., 33) = 33
write(2, "\n", 1)                       = 1
unlink("/var/run/sd-agent.pid")         = 0
close(3)                                = 0
munmap(0xb7e0d000, 4096)                = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x589978}, {0xb89a60, [], SA_RESTORER, 0x589978}, 8) = 0
brk(0xa022000)                          = 0xa022000
exit_group(1)                           = ?
1
106
5/23/2017 12:02:38 PM

As a general rule (i.e. in vanilla kernels), fork/clone failures with ENOMEM occur specifically because of either an honest to God out-of-memory condition (dup_mm, dup_task_struct, alloc_pid, mpol_dup, mm_init etc. croak), or because security_vm_enough_memory_mm failed you while enforcing the overcommit policy.

Start by checking the vmsize of the process that failed to fork, at the time of the fork attempt, and then compare to the amount of free memory (physical and swap) as it relates to the overcommit policy (plug the numbers in.)

In your particular case, note that Virtuozzo has additional checks in overcommit enforcement. Moreover, I'm not sure how much control you truly have, from within your container, over swap and overcommit configuration (in order to influence the outcome of the enforcement.)

Now, in order to actually move forward I'd say you're left with two options:

  • switch to a larger instance, or
  • put some coding effort into more effectively controlling your script's memory footprint

NOTE that the coding effort may be all for naught if it turns out that it's not you, but some other guy collocated in a different instance on the same server as you running amock.

Memory-wise, we already know that subprocess.Popen uses fork/clone under the hood, meaning that every time you call it you're requesting once more as much memory as Python is already eating up, i.e. in the hundreds of additional MB, all in order to then exec a puny 10kB executable such as free or ps. In the case of an unfavourable overcommit policy, you'll soon see ENOMEM.

Alternatives to fork that do not have this parent page tables etc. copy problem are vfork and posix_spawn. But if you do not feel like rewriting chunks of subprocess.Popen in terms of vfork/posix_spawn, consider using suprocess.Popen only once, at the beginning of your script (when Python's memory footprint is minimal), to spawn a shell script that then runs free/ps/sleep and whatever else in a loop parallel to your script; poll the script's output or read it synchronously, possibly from a separate thread if you have other stuff to take care of asynchronously -- do your data crunching in Python but leave the forking to the subordinate process.

HOWEVER, in your particular case you can skip invoking ps and free altogether; that information is readily available to you in Python directly from procfs, whether you choose to access it yourself or via existing libraries and/or packages. If ps and free were the only utilities you were running, then you can do away with subprocess.Popen completely.

Finally, whatever you do as far as subprocess.Popen is concerned, if your script leaks memory you will still hit the wall eventually. Keep an eye on it, and check for memory leaks.

79
10/31/2015 2:45:42 AM

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon