Communication in parallel simulations

In parallel simulations, communication between the CPUs can be a challenge. In most cases it is handled automatically by Asap, but sometimes you will need to communicate in your own modules. This is most likely to happen in on-the-fly data analysis.

This page falls into three parts. The first part explains how normal Asap communication works. The second part explains how you can use these pre-existing communication patterns in your own module. The third part explains how you can roll your own communication patterns from scratch, using the Message Passing Interface (MPI).

In the following, a task refers to the process running on each CPU/core (this is consistent with usual MPI terminology). Thus, if a calculation is running on eight CPUs/cores, there will be eight tasks, each taking care of the atoms belonging to one part of space. A neighboring task is a task responsible for an adjacent part of space.

Normal Asap communication

When an Asap calculator calculates forces, energies or stresses in a parallel simulation, the following communication steps occur:

  1. Global communication of changes: Each task determines whether the atoms have been changed, and if so, whether any atom has moved far enough to trigger a migration. This is then communicated globally, so that all tasks proceed through the following steps even if atoms have only been modified in some tasks.

  2. Migration (not every time): If any atom has moved sufficiently outside the part of space handled by its task, it is necessary to migrate atoms. All data about atoms that should be handled by another task is moved to that task.

  3. Ghost atoms update: Ghost atoms are atoms that are not in the part of space handled by this task, but are sufficiently close that they interact with atoms handled by this task. Before a calculation can proceed, the positions and atomic numbers of all ghost atoms must be transmitted from the task responsible for them. The calculation can now proceed.

  4. Additional ghost update in some calculators: Some calculators need an additional communication step halfway through the calculation, where intermediate results for the ghost atoms are communicated. This is for example the case for EMT, but not for LennardJones.

Using Asap’s communication patterns

Migrating your own data

If you have your own data about the atoms, you will need to migrate that data along with the atoms as they migrate. Example: During an MD simulation, you calculate the average position of all the atoms. This average value is associated with the atoms, and needs to remain associated with the right atoms when atoms migrate between processors.

Fortunately, Asap can do this for you. If you store data on the atoms with atoms.set_data(name, array), that data is now associated with the atoms and will migrate along with them. Access the data with atoms.get_data(name). Just be sure that all such data is stored on the atoms whenever the calculator is invoked (and migration may happen): after a call to atoms.get_forces(), atoms.get_potential_energy(), etc., any data held in local variables of your script may no longer be associated with the right atoms.
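The pattern can be sketched as follows. To keep the sketch self-contained, a tiny stand-in object provides the set_data/get_data interface described above; in a real simulation, atoms is the parallel Asap atoms object and the commented-out dyn.run(1) is your MD dynamics.

```python
import numpy as np

# Stand-in object with the set_data/get_data interface; a real parallel
# simulation would use the Asap atoms object instead.
class _FakeAtoms:
    def __init__(self, n):
        self._data = {}
        self._pos = np.random.uniform(0.0, 1.0, (n, 3))
    def __len__(self):
        return len(self._pos)
    def get_positions(self):
        return self._pos
    def set_data(self, name, array):
        self._data[name] = array
    def get_data(self, name):
        return self._data[name]

atoms = _FakeAtoms(10)
# Store the running sum of positions *on the atoms*, so it migrates with them.
atoms.set_data("sumpos", np.zeros((len(atoms), 3)))

nsteps = 3
for step in range(nsteps):
    # dyn.run(1)  # one MD step; atoms may migrate between tasks here
    # Re-fetch after every step: migration may have reordered the rows.
    sumpos = atoms.get_data("sumpos")
    sumpos += atoms.get_positions()
    atoms.set_data("sumpos", sumpos)

avgpos = atoms.get_data("sumpos") / nsteps
```

The key point is that the array is fetched with get_data after every call into the calculator and stored back with set_data, never kept in a local variable across such calls.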

Piggybacking on the ghost atom mechanism

If your on-the-fly analysis needs information not only about the atoms but also about their neighbors, difficulties arise, as the neighboring atoms may be on other processors. In some cases, Asap’s internal mechanisms can help.

If you need information about an atom’s neighbors at distances closer than the cutoff used in the calculator, you can access the calculator’s neighbor list object. This object will return information about neighbors even if they are on other processors. The XXXX method returns the positions directly; it also returns the indices of the atoms. If an index is larger than the number of atoms on this processor, it refers to a “ghost atom” on another processor. Data for the ghost atoms can be found in the atoms.ghosts dictionary, which normally contains ‘positions’ and ‘numbers’ (atomic numbers). To access e.g. the atomic numbers of the ghost atoms, use atoms.ghosts['numbers'][i - len(atoms)], where i is the index returned by the neighbor list.

Other kinds of data may be added to the atoms.ghosts dictionary. Add an array of the same name and type as an array in atoms.arrays, with the appropriate shape (i.e. the first dimension should be the number of ghosts, the remaining dimensions should be the same as in atoms.arrays). After the next force/energy calculation, the ghost data will be updated. If you add the ghost data before the first calculation, the first dimension of the array should be 0.
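The shape rule above can be illustrated with a short sketch. Plain dicts stand in for atoms.arrays and atoms.ghosts so the logic can be shown outside a live parallel simulation, and the array name "myvalue" is made up for the example.

```python
import numpy as np

# Stand-ins for the real attributes of a parallel Asap atoms object:
arrays = {"myvalue": np.zeros((100, 3))}   # like atoms.arrays
ghosts = {"positions": np.zeros((0, 3))}   # like atoms.ghosts

# Register ghost storage for the per-atom array "myvalue".  Before the
# first force/energy calculation there are no ghosts yet, so the first
# dimension is 0; the remaining dimensions match the array in atoms.arrays.
name = "myvalue"
ghosts[name] = np.zeros((0,) + arrays[name].shape[1:], arrays[name].dtype)
# After the next force/energy calculation, Asap replaces this with an
# array of shape (number_of_ghosts, 3).
```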

Beware that calling e.g. atoms.get_forces() does not guarantee an update of the ghost data if another kind of calculation has been performed since the last time the atoms were moved. An update can be guaranteed by invalidating the neighbor list, but that will also trigger a migration.

Communication using message passing

The module asap3.mpi exposes the default MPI communicator, world, which can be used to perform your own communication. The interface is the same as that of the corresponding module in GPAW; any divergences that may arise should be regarded as bugs.

The MPI Communicator object

The MPI communicator objects (e.g. world) have the following attributes:

size:

The number of MPI tasks.

rank:

The rank of this task.

In addition, it has the following methods:

send(a, dest, tag=123, block=True):

Send the array a to task dest using tag tag.

If block is True, a blocking send is used and None is returned by the method.

If block is False, a non-blocking send is used, and the array a may not be modified until the communication has completed. The method returns a Request object that can be used to detect when this occurs (see the wait method of the communicator object). The Request object keeps a reference to the array a until the communication has completed.

receive(a, src=-1, tag=123, block=True):

Receive data from task src into array a; if src is -1 then data is accepted from any task. Important: The array a must be contiguous, and of the correct type and size. This is not checked!

If block is True, the method returns when the data has been received, and the return value is the task from which it was received.

If block is False, the method returns immediately, returning a Request object that can be used to query if communication has occurred. The array a should not be accessed until the communication has completed.
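A point-to-point exchange using these methods might look like the sketch below. If Asap is not installed (or the run is serial), the sketch degrades to a local copy so that it remains self-contained; the tag value 7 is arbitrary.

```python
import numpy as np

try:
    from asap3.mpi import world
except ImportError:
    world = None  # Asap not installed; the sketch degrades to serial

a = np.arange(10.0)       # data to send
b = np.empty_like(a)      # receive buffer: contiguous, correct type and size!

if world is not None and world.size >= 2:
    if world.rank == 0:
        # Non-blocking send: 'a' must not be modified until wait() returns.
        req = world.send(a, 1, tag=7, block=False)
        # ... overlap other work here ...
        world.wait(req)
    elif world.rank == 1:
        world.receive(b, src=0, tag=7)   # blocking receive
else:
    b[:] = a              # serial run: nothing to communicate
```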

sendreceive(a, dest, b, src, desttag=123, recvtag=123):

Simultaneously send array a to task dest and receive array b from task src. The same effect may be obtained by combining a non-blocking send with a blocking receive (or vice versa), but sendreceive may give better performance with some MPI implementations.
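A typical use is a ring exchange, where every task passes data to its right neighbor while receiving from its left neighbor. The sketch below falls back to a local copy when Asap is unavailable or the run is serial, so it stays self-contained.

```python
import numpy as np

try:
    from asap3.mpi import world
except ImportError:
    world = None  # Asap not installed; the sketch degrades to serial

a = np.zeros(4)           # our data (filled with this task's rank below)
b = np.empty(4)           # buffer for the left neighbor's data

if world is not None and world.size > 1:
    a[:] = world.rank
    right = (world.rank + 1) % world.size
    left = (world.rank - 1) % world.size
    # Pass 'a' to the right neighbor while receiving from the left into 'b'.
    world.sendreceive(a, right, b, left)
else:
    b[:] = a              # single task: we are our own neighbor
```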

probe(src=-1, tag=-1, block=False):

Check if data is available from task src with tag tag (-1 means any task/tag). If data is available, the method returns a tuple containing the number of bytes, the source and the tag. If no data is available, None is returned. If block is True, None is never returned, instead the method waits until data is available.

barrier():

Wait until all the other tasks have also called this method.

wait(request):

Wait until the non-blocking communication represented by the Request object request has completed. Equivalent to request.wait().

test(request):

Tests if a communication has completed, returns True if complete, False if not. Cleans up after the communication if it is complete, so you do not have to call wait(), but doing so is allowed. Equivalent to request.test().

waitall(requests):

Takes a sequence (list, tuple or similar) of requests, and waits until all of them have completed.

testall(requests):

Takes a sequence of requests and tests if they have all completed. If just one is still incomplete, False is returned.

sum(data, root=-1):

Sums the scalar or array data on all tasks. If root is -1, all tasks receive the result, otherwise only the task root receives it. If data is a scalar, the method returns the result, if it is an array it is overwritten with the result.
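The scalar/array asymmetry is worth showing explicitly. In the sketch below, a minimal serial stand-in replaces world when Asap is not installed, so the example runs anywhere; with one task the global sum is just the local data.

```python
import numpy as np

try:
    from asap3.mpi import world
except ImportError:
    # Minimal serial stand-in so the sketch runs without Asap installed.
    class _SerialWorld:
        size, rank = 1, 0
        def sum(self, data, root=-1):
            return data if np.isscalar(data) else None
    world = _SerialWorld()

# Scalar data: the summed value is *returned*.
total = world.sum(3.0)

# Array data: summed in place; the method returns None.
hist = np.array([1.0, 2.0, 0.0])
world.sum(hist)           # hist now holds the global sum
```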

product(data, root=-1), min(data, root=-1) and max(data, root=-1):

As sum(...), but performs a product, a min, or a max operation instead of a sum. Unlike sum(...), these methods do not support complex data.

broadcast(data, root):

Send the array data from task root to all other tasks. The array must have the same size and type on all tasks.
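For example, distributing a parameter array computed on task 0 might look like this. A no-op serial stand-in is used when Asap is not installed, so the sketch is self-contained; the parameter values are made up.

```python
import numpy as np

try:
    from asap3.mpi import world
except ImportError:
    class _SerialWorld:
        size, rank = 1, 0
        def broadcast(self, data, root):
            pass   # single task: nothing to distribute
    world = _SerialWorld()

# The array must already exist with the same size and dtype on all tasks.
params = np.empty(3)
if world.rank == 0:
    params[:] = [0.1, 0.2, 0.3]   # only the root holds the real values
world.broadcast(params, 0)
# Every task now holds [0.1, 0.2, 0.3].
```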

scatter(a, b, root):

Send data from array a on task root to the array b on all tasks, such that task N receives the Nth part of a. The size of array a must thus be the size of b times the number of tasks.

all_gather(a, b):

Data from the a arrays on all tasks are gathered into the b array. The size of the b array must be the size of a times the number of tasks. This version of the call updates b on all tasks.
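A sketch of the size relationship, again with a serial stand-in replacing world when Asap is unavailable (with one task, b is simply a copy of a):

```python
import numpy as np

try:
    from asap3.mpi import world
except ImportError:
    class _SerialWorld:
        size, rank = 1, 0
        def all_gather(self, a, b):
            b[:len(a)] = a   # single task: b is just a copy of a
    world = _SerialWorld()

# Each task contributes its rank; b must be len(a) * world.size long.
a = np.array([float(world.rank)])
b = np.empty(world.size * len(a))
world.all_gather(a, b)
# b now holds task 0's data first, then task 1's, and so on.
```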

gather(a, root, b=None):

As all_gather, but the result is only available on task root.

new_communicator(ranks):

Create a new communicator object containing the tasks listed in ranks.

name():

Returns the processor name.

abort():

Aborts all tasks. For emergency use only!

Request object

The send and receive methods of the communicator object return a Request object when called with block=False. The request object should be used to wait for the communication to complete by calling its wait and/or test method.

The request object has the following methods:

wait():

Wait until the communication has completed, and release the reference to the array used for the communication.

test():

Test if the communication has completed, return True if complete, False if not. After True has been returned, the reference to the array used for the communication has been discarded.

It is safe to call wait() and test() multiple times on the same object. Once test() has returned True or wait() has been called, any subsequent call to test() returns True.

You should not discard a request until wait() has been called, test() has returned True, or you have by some other means ensured that the communication has completed. Failing to do so may cause your script to hang while the request object is garbage collected, until the communication has completed.