In the previous part of this series, we tried to do some very simple parallel computing (without being “embarrassingly parallel”) using MPI. The task was simple: have several independent processes (procs) guess random integers from 1 to 10, scoring a point for each integer that hasn’t been guessed yet. We achieved something kind of close to this, but we weren’t able to de-sync the procs. Basically, even after making some procs slower than others, each was still getting the same number of guesses over the full computation. An obvious reason for this is that the previous treatment was symmetric with respect to the procs. Each was treated the same as all the others and, once per loop, we used comm.allreduce to sync up the lists.
So, let’s try something different. We will re-write the code so that one proc is special. It will not be guessing, but will instead serve as a hub for the other processes to send information to. We will assign proc 0 to be this middleman. Here is some updated code, based on where we left off:
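Since the original listing didn’t survive into this post, here is a minimal sketch of the idea, assuming mpi4py. The helper name `new_score`, the round count, and the “slower procs skip rounds” rule (`round_no % rank`) are my own illustrative stand-ins, not the original code:

```python
# Sketch of the hub design (an illustrative reconstruction, not the original listing).
# Run under MPI, e.g.:  mpiexec -n 3 python guess_hub.py
import random


def new_score(guessed, guess):
    """Award a point if `guess` hasn't been seen yet; record it either way."""
    if guess in guessed:
        return 0
    guessed.add(guess)
    return 1


def main(n_rounds=20):
    from mpi4py import MPI  # imported here so new_score stays testable without MPI
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    guessed = set()
    scores = [0] * size

    for round_no in range(n_rounds):
        if rank == 0:
            # Hub: expect one guess from every other proc, every round.
            for src in range(1, size):
                guess = comm.recv(source=src)
                scores[src] += new_score(guessed, guess)
        elif round_no % rank == 0:
            # Guessers: proc 1 guesses every round, proc 2 every other round, ...
            comm.send(random.randint(1, 10), dest=0)
        # Everyone syncs on the up-to-date guessed set before the next round.
        guessed = comm.bcast(guessed, root=0)

    if rank == 0:
        print("scores:", scores[1:])


if __name__ == "__main__":
    main()
```

Note the mismatch baked into this sketch: the hub does a comm.recv from *every* proc *every* round, but the slower procs don’t send every round.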
Does this approach seem plausible to you? Take some time to make sure you understand what we are trying to do. Go ahead and try to run it with just 3 processes (it isn’t going to work, but try it anyway for the experience.)
Why did we fail again?
You should have found that the program hangs unless every guessing proc sends a message. The reason? MPI really, really, really needs a one-to-one correspondence between messages sent and messages received. So, when execution reaches a line like comm.recv(source=1) and proc 1 hasn’t sent a message, proc 0 just hangs there until a message arrives. But this also stops the loop from progressing, which halts all the other processes; thus, no message will ever come. The most straightforward solution is to have a boolean flag for each proc that proc 0 can check to see whether there is a message incoming. Then, we can call comm.recv only for procs that have messages to be received. See below for the inclusion of this boolean flag.
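Again as a hedged sketch rather than the original listing (same assumed names and skip rule as before), the flag might be threaded through like this:

```python
# Sketch of the boolean-flag fix (an illustrative reconstruction, not the original listing).
# Run under MPI, e.g.:  mpiexec -n 3 python guess_hub_flag.py
import random


def new_score(guessed, guess):
    """Award a point if `guess` hasn't been seen yet; record it either way."""
    if guess in guessed:
        return 0
    guessed.add(guess)
    return 1


def main(n_rounds=20):
    from mpi4py import MPI  # imported here so new_score stays testable without MPI
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    guessed = set()
    scores = [0] * size

    for round_no in range(n_rounds):
        if rank == 0:
            for src in range(1, size):
                # First receive the flag; only then receive a guess, if there is one.
                if comm.recv(source=src):
                    guess = comm.recv(source=src)
                    scores[src] += new_score(guessed, guess)
        else:
            has_guess = (round_no % rank == 0)  # stand-in: slower procs skip rounds
            comm.send(has_guess, dest=0)        # the flag is sent every round, no matter what
            if has_guess:
                comm.send(random.randint(1, 10), dest=0)
        guessed = comm.bcast(guessed, root=0)

    if rank == 0:
        print("scores:", scores[1:])


if __name__ == "__main__":
    main()
```

Because the flag is sent every round regardless, the hub always has exactly one recv to match each send, so nothing hangs.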
Ok, now this is starting to look more like it! Give it a go. Can you get it to run? (this time, it should)
Did we fail yet again?
Unfortunately, it looks like we did! Here is a sample of a run of this code:
They are still getting the same number of guesses, when one of the processes should be able to guess about twice as fast as the other! The issue is that, despite conceptually disentangling proc 0 from the other procs, the message passing we are using is fundamentally what is called “blocking” communication. This means that, even on the very first loop, proc 0 cannot continue until it gets a message from every other proc. We have avoided the deadlock by using the “message” boolean, but we have kicked the parallelization can down the road, because it still synchronizes the loops: the loop cannot go on to the next iteration until proc 0 receives that boolean message from each proc.
Another issue is the comm.bcast, which also requires that all the different processes “catch up” before being able to continue. This could be worked around as well, using more explicit comm.send and comm.recv calls; but again, it won’t solve the fundamental issue.
The solution is to use explicitly “nonblocking” communication instead. This is a set of MPI bindings structured so that a call returns immediately instead of holding up the calling process. I think we have finally pinpointed the issue, so my next post will definitely be about how we can finally play the guessing game we deserve!