Genesis 2.2
Written by Rizki Noor Hidayat Wijaya

Reference Manual (Nigel Goddard)

Introduction

Parallel Genesis allows a modeler to distribute a simulation across a computational platform that supports the PVM message-passing library. This includes most supercomputers and workstation networks. There is a natural mapping from network-level models to a parallel computer, but Parallel Genesis is also capable of executing a single-cell model on a parallel platform. This document describes the language extensions for parallel programming in the Genesis script language. The first section introduces the programming model and the second section specifies the syntax and semantics of each new script language extension.

PVM and Parallel Genesis

The Parallel Virtual Machine (PVM) system provides the illusion of a parallel platform. The PVM system may run on a single CPU or on multiple CPUs, possibly of different types. In the rest of this document, when we refer to the *parallel platform* we mean the illusion provided by PVM. When we refer to the *parallel machine*, we mean the physical set of CPUs, and the network connecting them, that PVM is running on. An executing PVM program consists of user processes, typically one per CPU, which communicate via the PVM daemon that runs on each participating CPU. In Parallel Genesis, each user process is an independent Genesis simulation, which we call a *node* of the parallel simulation. Nodes are uniquely identified by a node number (consecutive integers starting at zero).

Zones

Nodes can be grouped in *zones* when the simulation is started. Each node is in exactly one zone (by default, every node is in zone 0). The zones form a fixed partition of the parallel platform. The motivation for zones is to allow different parts of the simulation to run asynchronously (uncoordinated). For example, in a parameter search application one might wish to run many instances of a four-node model in parallel. Each instance uses four nodes which must run synchronously, but the instances need not be coordinated (except at start and finish). Thus we can run each instance in a separate zone, each zone containing four nodes. Zones are uniquely identified by consecutive integers starting at zero. The nodes within a zone are uniquely identified with a znode number (consecutive integers starting at zero).
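The numbering scheme above can be illustrated with a small Python sketch. It assumes equal-sized zones filled in node-number order, which is only an illustrative layout: the manual does not say how nodes are assigned to zones, and the function names `locate` and `partition` are hypothetical, not part of Parallel Genesis.

```python
def locate(node: int, nodes_per_zone: int) -> tuple[int, int]:
    """Return (zone, znode) for a global node number.

    Assumes equal-sized zones filled in node-number order. This layout is
    an illustrative assumption, not something the manual specifies.
    """
    if node < 0 or nodes_per_zone <= 0:
        raise ValueError("node must be >= 0 and nodes_per_zone must be > 0")
    # Quotient is the zone number, remainder is the znode number.
    return divmod(node, nodes_per_zone)


def partition(total_nodes: int, nodes_per_zone: int) -> dict[int, list[int]]:
    """Group global node numbers by zone under the same assumption."""
    zones: dict[int, list[int]] = {}
    for node in range(total_nodes):
        zone, _znode = locate(node, nodes_per_zone)
        zones.setdefault(zone, []).append(node)
    return zones


# Two instances of a four-node model, one instance per zone.
layout = partition(total_nodes=8, nodes_per_zone=4)
```

Under this assumed layout, global node 5 would be znode 1 of zone 1, and every node belongs to exactly one zone, as the text requires.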

Programming model

Namespace (memory): the parallel library currently provides a private-namespace programming model. This means that each node has no knowledge of the elements that reside on other nodes, so every reference to an element on another node must specify the node explicitly. It is envisioned that a shared-namespace programming model will be implemented eventually. This will allow nodes within a zone to reference elements on other nodes in the zone without specifying the node number. To ease the upgrade of parallel models to the shared-namespace paradigm, it is recommended that element names within a zone be unique. If this recommendation is not followed, there will be naming conflicts when a model tries to take advantage of the shared-namespace capability once it becomes available.
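The uniqueness recommendation can be checked mechanically. The following Python sketch is not Genesis code: it assumes the modeler has already collected the element names created on each node of a zone into a dictionary, and the helper `name_conflicts` and the element paths are hypothetical.

```python
from collections import defaultdict


def name_conflicts(elements_by_node: dict[int, list[str]]) -> dict[str, list[int]]:
    """Return element names that appear on more than one node of a zone.

    elements_by_node maps a node number to the element names created on
    that node (hypothetical input gathered by the modeler).
    """
    owners: dict[str, list[int]] = defaultdict(list)
    for node, names in elements_by_node.items():
        for name in set(names):  # ignore repeats within a single node
            owners[name].append(node)
    return {name: sorted(nodes) for name, nodes in owners.items() if len(nodes) > 1}


# Hypothetical element names on the two nodes of one zone.
zone_elements = {
    0: ["/cell/soma"],
    1: ["/cell/soma", "/cell/dend"],
}
conflicts = name_conflicts(zone_elements)
```

An empty result would mean the zone's element names are already unique and should not clash under a shared namespace.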

Execution (threads and synchronization): the main thread of control on each node is the one that reads commands from the script file (or from the keyboard if the session is interactive). The parallel library provides limited capabilities for this thread to create new threads on any node. On each node the threads are pushed onto a stack, with the main thread at the bottom. Only the topmost thread may execute; when it completes, it is popped off the stack so that the next thread down can continue. Threads that are ready to execute are NOT guaranteed to execute: if the topmost thread is blocked or looping, no ready thread lower on the stack can continue. An executing thread is guaranteed to run to completion (assuming it does not contain an infinite loop or block on I/O) so long as it executes only local operations, i.e. no operations that explicitly or implicitly involve communication with other nodes.
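The stack discipline described above can be modeled in a few lines of Python. This is a toy model for intuition only, not Genesis code; the class `NodeThreadStack` and the thread names are hypothetical.

```python
class NodeThreadStack:
    """Toy model of the thread stack on one node (not Genesis code)."""

    def __init__(self) -> None:
        # The main thread always sits at the bottom of the stack.
        self._stack = ["main"]

    def push(self, name: str) -> None:
        """A newly created thread goes on top and becomes the running one."""
        self._stack.append(name)

    def running(self):
        """Only the topmost thread may execute; None if the stack is empty."""
        return self._stack[-1] if self._stack else None

    def complete(self, name: str) -> None:
        """Pop a finished thread; only the topmost thread can finish."""
        if name != self.running():
            raise RuntimeError(
                f"{name!r} cannot complete while {self.running()!r} is on top"
            )
        self._stack.pop()


node = NodeThreadStack()
node.push("child_a")
node.push("child_b")  # child_a is ready but cannot run until child_b finishes
```

In this model, if `child_b` blocked forever, neither `child_a` nor `main` could ever continue, which is the situation the text warns about.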

The command descriptions below specify whether each command is local or non-local. In addition, simulation steps and reset are by definition non-local operations if there is more than one node in the zone. Users are strongly encouraged to use only local operations in child threads whenever possible. Users need to be very careful about thread creation to ensure that deadlock (no thread can continue) does not occur.

The parallel library provides facilities for blocking and non-blocking thread creation, usually used to execute commands on nodes other than the one the script is being executed on (*remote* nodes). When a thread (including the main script) initiates a blocking remote thread (also known as a remote procedure call or remote function call), it waits until that thread completes before continuing. When a thread initiates a non-blocking remote thread (an asynchronous thread), it continues immediately without waiting for the thread to terminate. While a thread is waiting, the node can accept a request for thread creation arriving from any node (including itself). This new thread is pushed on the thread stack and executed, so the original waiting thread does not continue until the new thread has completed.

Scripts running on different nodes can synchronize via several different synchronization primitives. There are two types of barrier (each script waits at a barrier until all have reached it): one involves all nodes in a zone, the other involves all nodes in the parallel platform. By default there is an implicit zone-wide barrier before a simulation step is executed, although this can be disabled. Pairwise synchronization of nodes is also possible.

When a script requests that a command be run asynchronously on another node, it initiates a child thread of control on the other node. The child thread runs asynchronously with its parent. The parent can request notification, or the child's result, when the child completes, and can wait on that notification or result (a *future*). By default all threads block for child completion before each simulation step, although this behavior can be changed by the user. It is easy to reach deadlock (no thread able to continue) if the creation and execution of threads is handled carelessly. If a node initiates several child threads on a particular remote node, these are guaranteed to commence (but not necessarily complete) execution in the order in which they were initiated. A thread is guaranteed to execute eventually so long as no preceding thread 1) enters a loop which only executes local operations, or 2) blocks indefinitely because of deadlock. Once execution of a thread begins, it runs to completion without interruption so long as it only executes local operations.
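As a loose analogy only, the difference between blocking and non-blocking thread creation, and the idea of waiting on a future, can be sketched with Python's standard `concurrent.futures` module. This is not the Parallel Genesis API: the single-worker executor merely stands in for one remote node, and `remote_command` is a hypothetical placeholder for a command run there.

```python
from concurrent.futures import ThreadPoolExecutor


def remote_command(x: int) -> int:
    """Hypothetical stand-in for a command executed on a remote node."""
    return x * x


# One worker stands in for one remote node, so submitted work starts in
# the order it was submitted (an analogy for the ordering guarantee).
with ThreadPoolExecutor(max_workers=1) as remote_node:
    # Blocking style: the parent waits for the result before continuing.
    blocking_result = remote_node.submit(remote_command, 3).result()

    # Non-blocking style: the parent gets a future and continues at once.
    future = remote_node.submit(remote_command, 4)
    local_work = sum(range(10))  # parent keeps working meanwhile

    # Later, the parent waits on the future to obtain the child's result.
    async_result = future.result()
```

The analogy is imperfect: here the operating system schedules the worker, whereas in the model described above a node runs only the topmost thread on its stack.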

Simulation and scheduling

The parallel library provides the ability to set up a Genesis message between two objects on different nodes (a remote message), provided the nodes are in the same zone. Data is physically transferred from one node to the next at the beginning of a simulation step (see section ??? for user control of this process). This means that there is no transfer of data between objects within a single timestep, which has ramifications for the schedule. The parallel library guarantees that execution on a parallel platform will be identical to that on a single processor if and only if there are no remote messages for which the source object precedes the destination object in the schedule (we assume that every node in a zone has the same schedule).
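The equivalence condition can be read as a simple check over the schedule. On a single processor, a destination scheduled after its source would see the source's output from the same step; on the parallel platform, remote data arrives only at the start of a step, so the destination would see the previous step's value. The Python sketch below flags such messages. It is not Genesis code, and the function names and object names are hypothetical.

```python
def delayed_messages(schedule, remote_messages):
    """Return remote messages whose source precedes their destination.

    schedule is the ordered list of object names shared by every node in
    the zone; remote_messages is a list of (source, destination) pairs.
    Both are hypothetical inputs supplied by the modeler.
    """
    position = {name: index for index, name in enumerate(schedule)}
    return [
        (src, dst)
        for src, dst in remote_messages
        if position[src] < position[dst]
    ]


def matches_serial(schedule, remote_messages):
    """True if no remote message violates the condition stated above."""
    return not delayed_messages(schedule, remote_messages)


schedule = ["soma", "dend", "syn"]
ok = matches_serial(schedule, [("syn", "soma")])       # source is later
not_ok = matches_serial(schedule, [("soma", "syn")])   # source is earlier
```

A message flagged by `delayed_messages` does not stop the simulation from running; it only means the parallel results may differ from a single-processor run.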
