The Python developers made sure that the APIs of threading and multiprocessing are similar, so that switching between the two variants is easier for programmers.
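As a rough illustration of that similarity, the sketch below runs the same function once in a thread and once in a process. The names greet and run_with are hypothetical helpers made up for this example; only Thread, Process, the two Queue classes and their start/join/put/get methods come from the standard library.

```python
import multiprocessing
import queue
import threading


def greet(name, out):
    # `out` only needs a put() method, so the same function serves both workers.
    out.put(f"hello from {name}")


def run_with(worker_cls, queue_cls):
    """Run greet() in one worker of the given class and return its message."""
    out = queue_cls()
    # Thread and Process accept the same target/args and expose start()/join().
    worker = worker_cls(target=greet, args=(worker_cls.__name__, out))
    worker.start()
    message = out.get(timeout=10)  # read the result before joining the worker
    worker.join()
    return message


if __name__ == "__main__":
    print(run_with(threading.Thread, queue.Queue))
    print(run_with(multiprocessing.Process, multiprocessing.Queue))
```

Only the worker class and the queue class change between the two calls; the rest of the code is identical.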
Passing data between multiprocessing processes
Because sharing data between two threads is delicate (concurrent reads and concurrent writes can conflict with one another, causing race conditions), a set of dedicated objects was created to pass data back and forth between threads. Any truly atomic operation can be used between threads, but it is always safe to stick with Queue.
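A minimal sketch of passing data through a queue, here between two processes using multiprocessing.Queue. The producer function, the collect_squares helper and the SENTINEL end-of-stream marker are assumptions chosen for illustration, not part of any library.

```python
import multiprocessing

SENTINEL = None  # hypothetical end-of-stream marker agreed on by both sides


def producer(q, n):
    # Runs in the child process: push n values, then signal completion.
    for i in range(n):
        q.put(i * i)
    q.put(SENTINEL)


def collect_squares(n):
    """Start a producer process and gather everything it sends."""
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=producer, args=(q, n))
    p.start()
    results = []
    while True:
        item = q.get(timeout=10)  # blocks until the child has sent something
        if item is SENTINEL:
            break
        results.append(item)
    p.join()
    return results


if __name__ == "__main__":
    print(collect_squares(5))
```

The parent drains the queue before calling join, so the child is never left waiting to flush data that nobody reads.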
Most people will suggest that, when using a queue, you should always wrap the queue access in a try: except: block instead of checking empty(). However, for applications where it does not matter if you skip a scan cycle (data can be placed in the queue just as empty() is flipping from True to False), it is usually better to place read and write access in what I call an Iftry block, because an 'if' statement is technically more performant than catching the exception.
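One way to read the Iftry idea is sketched below: a cheap empty() check guards the read, and a try/except still covers the case where another consumer takes the item between the check and the read. The function name poll_once is hypothetical, and the sketch assumes that None is never a real queue item.

```python
import queue


def poll_once(q):
    """Return one item if available, else None, without ever blocking."""
    if not q.empty():  # cheap check for the common "nothing to do" cycle
        try:
            return q.get_nowait()
        except queue.Empty:
            # Another consumer emptied the queue after our check.
            return None
    return None


if __name__ == "__main__":
    q = queue.Queue()
    print(poll_once(q))  # nothing queued yet
    q.put("reading")
    print(poll_once(q))
```

If an item arrives right after the empty() check, this cycle simply misses it and the next call picks it up, which is the skipped scan cycle mentioned above.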
The multiprocessing module
Here, each function is executed in a new process. Since a separate instance of the Python VM runs the code in each process, there is no shared GIL and you get parallelism across multiple cores.
The Process.start method launches the new process and runs the function passed in the target argument with the arguments given in args.
The Process.join method waits for the process to finish executing.
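A small sketch of start and join with two processes. The function report and the helper run_two_processes are hypothetical names; the example returns each worker's process ID so that it is visible that the function really ran outside the parent process.

```python
import multiprocessing
import os


def report(label, results):
    # Executed in the child: send back our label and process ID.
    results.put((label, os.getpid()))


def run_two_processes():
    """Run report() in two child processes and return {label: pid}."""
    results = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=report, args=("first", results))
    p2 = multiprocessing.Process(target=report, args=("second", results))
    p1.start()  # launches the process, which then calls report("first", results)
    p2.start()
    pids = dict(results.get(timeout=10) for _ in range(2))
    p1.join()   # wait for each process to finish
    p2.join()
    return pids


if __name__ == "__main__":
    print("parent:", os.getpid())
    print("children:", run_two_processes())
```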
The new processes are launched differently depending on the version of Python and the platform on which the code is running, e.g.:
- Windows uses spawn to create the new process.
- On Unix systems with Python versions before 3.4, the processes are created using fork. Note that this method does not respect the POSIX usage of fork and thus leads to unexpected behaviors, especially when interacting with other multiprocessing libraries.
- On Unix systems with Python 3.4+, you can choose to start the new processes with either fork, forkserver or spawn, using multiprocessing.set_start_method at the beginning of your program. The forkserver and spawn methods are slower than forking but avoid some unexpected behaviors.
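The choice of start method could look like the sketch below. It shows both the program-wide multiprocessing.set_start_method and the per-use multiprocessing.get_context; the functions noop and run_with_start_method are hypothetical helpers. The spawn method is used here because it is available on every platform.

```python
import multiprocessing


def noop():
    # Trivial target; must live at module level so spawn can import it.
    pass


def run_with_start_method(method):
    """Start one process with the given start method and return its exit code."""
    ctx = multiprocessing.get_context(method)  # does not change the global default
    p = ctx.Process(target=noop)
    p.start()
    p.join()
    return p.exitcode


if __name__ == "__main__":
    # Program-wide choice: call once, before creating any process or queue.
    multiprocessing.set_start_method("spawn")
    print(multiprocessing.get_all_start_methods())
    print(run_with_start_method("spawn"))
```

With spawn and forkserver the main module is imported again in the child, which is why the process-creating code sits behind the __name__ guard.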
POSIX fork usage:
After a fork in a multithreaded program, the child can safely call only async-signal-safe functions until such time as it calls execve.
With fork, the new process is launched with the exact same state for all current mutexes, but only the MainThread is launched.
This is unsafe as it could lead to race conditions, e.g.:
- If you create a Lock in the MainThread and pass it to another thread which is supposed to lock it at some point. If the fork occurs simultaneously, the new process will start with a locked lock which will never be released, as the second thread does not exist in this new process.
Actually, this kind of behavior should not occur in pure Python, as multiprocessing handles it properly, but if you are interacting with other libraries, it can occur and lead to a crash of your system (for instance with numpy/Accelerate on macOS).
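The locked-lock scenario can be reproduced on Unix by calling os.fork directly, bypassing multiprocessing. This is only a sketch: the helper name lock_is_stuck_in_child is hypothetical, the child uses a timeout so that the demonstration reports the problem instead of hanging, and recent Python versions may print a DeprecationWarning about forking a multithreaded process.

```python
import os
import threading


def lock_is_stuck_in_child():
    """Fork while a second thread holds a lock; report whether the child is stuck."""
    lock = threading.Lock()
    held = threading.Event()
    done = threading.Event()

    def holder():
        with lock:
            held.set()   # tell the main thread the lock is now taken
            done.wait()  # keep holding it until the parent says otherwise

    t = threading.Thread(target=holder)
    t.start()
    held.wait()

    pid = os.fork()
    if pid == 0:
        # Child: the holder thread does not exist here, yet the lock is locked.
        code = 2
        try:
            code = 0 if lock.acquire(timeout=0.5) else 1
        finally:
            os._exit(code)

    _, status = os.waitpid(pid, 0)
    done.set()
    t.join()
    return os.WEXITSTATUS(status) == 1  # 1 means the child could not acquire


if __name__ == "__main__":
    print("child saw a lock it could never acquire:", lock_is_stuck_in_child())
```

Without the timeout, the acquire call in the child would block forever, since no thread in that process can release the lock.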
The threading module
In certain implementations of Python, such as CPython, threads do not achieve true parallelism because of what is known as the GIL, or Global Interpreter Lock.
Here is an excellent overview of Python concurrency: