It does not matter where you run your code, either on your local multi-core machine or on a cluster: Parallel programming approaches have become pivotal when it comes to scientific computing.
At the same time Python has become very popular and powerful. Hence, I recently started to learn more about parallel coding approaches in Python. There are two main approaches: using threading modules (either the low level _thread or the higher level threading module) or the multiprocessing module. The reason for the two different approaches is CPython’s GIL (Global Interpreter Lock) that the threading modules rely on. GIL is the subject of controversy because it prevents running true parallel applications. I am aware of the contradiction caused by using a higher level language like Python on the one hand and craving optimization on the other hand. Nevertheless, the multiprocessing module, which side-steps GIL, seems to be a good place to dive into parallel coding. Another advantage of the multiprocessing module is its higher level approach and own memory space, i.e., global variables are not automatically shared. Therefore, multiprocessing provides its own objects for shared memory (Array and Value).
After going through several excellent tutorials, e.g. by Doug Hellman ( pt. I , pt. II ) or Norm Matloff’s (UC Davis) book (chapter 14), I created the following template for parallel simulations. Simulations are particularly adaptable because they usually don’t share memory (communication between threads is computationally expensive) and run independently. However, the simulation results need to be stored, usually in an array. Here, the result objects are defined in line 43&44. The associated Locks are defined in line 46&47 and they take care of race conditions: a delicate synchronization issue that arises when two or more processes try to access shared memory.
#!/usr/bin/env python
# Multiprocessing simulation template
#
# Lars Seemann
# lseemann [at] uh.edu
import time
import random
import multiprocessing as mp
class glbls: # globals, other than shared
DIM_RESULTS = 5 # dimensionality of simulation result
thrdlist = [] # list of all instances
def simulation(rnd):
#______ACTUAL SIMULATION IN HERE________
# waste some time and get random results for now
tmp_count=0
for i in xrange(int(1e5)): # usually, the simulation
tmp_count+=random.random() # computes and accesses memory
result = []
result = []
for i in range(glbls.DIM_RESULTS):
result.append(rnd.gauss(mu=0, sigma=i+1))
return result
def worker(id,mynreps,rslt,rsltlock,rslt_sq,rsltsqlock,n_sim,nsimlock):
rnd = random # set up random number generator
for n in range(mynreps):
tmp_result = simulation(rnd)
nsimlock.acquire()
n_sim.value += 1
nsimlock.release()
for (i,x) in enumerate(tmp_result):
rsltlock.acquire() # acquire: make sure shared global is
rslt[i] += x # hold by one process only
rsltlock.release() # release
rsltsqlock.acquire()
rslt_sq[i] += x*x
rsltsqlock.release()
print "\tProcess%d ran %d simulations"%(id,mynreps)
def main(N_MC = 1e3 , N_THREADS = 2):
nreps=int(N_MC)/N_THREADS
result=mp.Array('d', glbls.DIM_RESULTS) # shared array object
result_sq=mp.Array('d',glbls.DIM_RESULTS) # shared array object
num_sim=mp.Value('i',0) # shared number array updates
rslt_lock = mp.Lock() # lock
rslt_sq_lock = mp.Lock() # lock
nsim_lock = mp.Lock() # lock
# Create and start processes
t_0=time.time()
for i in range(N_THREADS):
p = mp.Process(target=worker,args=(i,nreps,result,rslt_lock,result_sq,rslt_sq_lock, num_sim,nsim_lock))
glbls.thrdlist.append(p)
p.start()
for thrd in glbls.thrdlist:thrd.join() # wait till all processes finish
t_1=time.time()
Actual_N_MC = nreps * N_THREADS # actual nr of simulations
if num_sim.value!=Actual_N_MC: # consistency check
raise ValueError("Inconsistent number of simulations! %d,%d"%(Actual_N_MC,num_sim.value))
# Assess result
for i in range(glbls.DIM_RESULTS):
mean=result[i]/Actual_N_MC
std_dev=((result_sq[i])/Actual_N_MC - (result[i]/Actual_N_MC)**2)**(0.5)
print "\tN(mu=0,sigma=%d) : sample mean=%.4f \tstd dev=%.4f"%(i+1, mean, std_dev)
t = t_1-t_0
print "Time elapsed: %ds"%t
return t
if __name__ == '__main__':
main()
The class glbls define global variables: the dimensionality of the simulation results DIM_RESULTS (here set to 5) and a list of all process instances.
simulation contains the actual simulation. I am planning to use this template for an agent-based model. In general a simulation can return several quantities of interest. These quantities are returned as a list whose dimensionality must be defined as the global variable DIM_RESULTS. For demonstration purposes I just create normal distributed variables with zero mean and different variance.
The worker runs the simulation multiple times and stores the result in the shared results arrays. Note that the input of worker contains two multiprocessing.Array objects which are shared by all threads. I also keep the squared results to calculate the standard deviation later on.
main creates, starts, and joins all threads. Input is the number of monte carlo runs N_MC and number of threads N_THREADS. Optimal performance can usually achieved by setting the number of threads to the number of cores of the cpu. Once all processes ran, the results can be assessed. Here I just calculate the sample mean and sample standard deviation of the normal random numbers.
I hope this template can be utilized for different purposes. Feedback is always appreciated.