SimGrid 3.7.1
Scalable simulation of distributed systems
What is GRAS
Tutorial
Table of contents
  1. What will you find here
  2. Further readings
  3. What is GRAS
    1. GRAS allows you to run both in simulation mode and on real platforms
    2. GRAS was designed for distributed computing, not parallel computing
    3. GRAS was designed for large scale computing
    4. GRAS targets applicative overlays rather than end-user applications
    5. GRAS tries to remain simple to use
  4. The model provided by GRAS
    1. Event types
    2. Communication model
    3. Timing policy
    4. Error handling through exceptions
    5. RPC messaging
  5. What's next?

What will you find here

Further readings

After this page, you may find this one interesting: HOWTO design a GRAS application. If you're new to GRAS, you may want to take the initiatic tour first, beginning with Lesson 0: Installing GRAS or Lesson 1: Setting up your own project.


What is GRAS

GRAS is a framework to implement and study distributed algorithms. It provides a simple communication API allowing several processes to interoperate through the exchange of messages. This is quite classical, but GRAS differs from other existing messaging APIs on several points:

We now detail each of these points.

GRAS allows you to run both in simulation mode and on real platforms

We wrote two implementations of the interface: the first one is built on top of the SimGrid simulator, allowing you to run your application in a controlled environment, which proves invaluable when debugging and studying algorithms. Anyone who has tried to run even simple tests on more than 100 real machines will consider a simulator a nirvana.

The experiments can be reproduced under the exact same conditions (which is somewhat hard to achieve in real settings), allowing you, for example, to reproduce a bug as many times as you want while debugging. You can also test your algorithm under experimental conditions you couldn't achieve on a real platform (like a network topology and/or size you don't have access to). Under some conditions, SimGrid simulations are also much faster than real executions, allowing you to run more experiments in less time.

Once you have assessed the quality of your algorithm in the simulator, you can deploy it on real platforms using the second implementation of the library. Taking an algorithm out of a simulator usually implies an almost complete rewrite; in GRAS, there is no need to modify your program at all. You don't even need to recompile it: simply relink it against the right library.

GRAS applications running on real hardware deliver high performance. The sequential parts of your code are not mediated by GRAS or slowed down in any way. The communications use advanced data exchange and conversion mechanisms ensuring that you are likely to get performance at least comparable to other communication solutions (FIXME: cite the paper once it gets accepted).

GRAS applications are portable across several operating systems (Linux, Mac OS X, Solaris, IRIX, AIX and soon Windows) and several processor architectures (x86, amd64, ppc, sparc, etc). Moreover, GRAS processes can interoperate efficiently even when deployed on different hardware. You can for example have a process deployed on ppc/Mac OS X interacting transparently with another one deployed on alpha/Linux.

The simulation mode of GRAS is usually called SG (for SimGrid), while the in situ execution mode is called RL (for Real Life).

GRAS was designed for distributed computing, not parallel computing

In GRAS, you build your algorithm as a set of independent processes interacting through messages. This is the well-known MPMD model (multiple program, multiple data). It contrasts with the SPMD model (single program, multiple data) of communication solutions such as MPI or PVM, where you build a single program with conditionals here and there specifying what each process should do (something like "If I'm process number 0, then send data to the others, else get the data sent to me").

Neither of these models is inherently better than the other, and plenty of algorithms are better expressed in the SPMD paradigm. If your program falls into that category, then GRAS may not be the right tool for you. We think however that most non-sequential algorithms can be expressed gracefully in an MPMD way, while some are really difficult to express in an SPMD way.

There is no parallelism in GRAS, and introducing threads in GRAS is discouraged (although it should become possible in a few months). This is an explicit choice, since threads are so hard to use (see the section GRAS tries to remain simple to use below). The framework itself does use threads to achieve good performance, but I don't want to impose this on users (FIXME: actually, GRAS is not multi-threaded internally yet, but I plan to make it so really soon).

GRAS was designed for large scale computing

Another difference from the MPI communication libraries is that GRAS was designed not for static, small-sized platforms such as clusters, but for dynamic, larger-scale platforms such as grids. That is why GRAS does not include static membership solutions such as the MPI channels. Support for fault tolerance is provided through timeouts on the communication primitives and through an exception mechanism.

GRAS also comes with a sister library called AMOK containing several useful building blocks for large-scale, network-aware applications. The most prominent one allows you to assess network availability through active testing, just like the classical NWS tool in the grid research community. We are actively working on a network topology discovery mechanism and a distributed locking solution. Some other modules are planned, such as reliable broadcasting in open environments.

GRAS targets applicative overlays rather than end-user applications

The application class targeted by GRAS consists of so-called overlays. They do not constitute a complete application by themselves, but can be seen as a "distributed library", i.e. a service offered to another application through a set of physically distributed entities. An example of such an overlay could be a monitoring system allowing you to retrieve the available bandwidth between two remote hosts. It could be used in a network-aware parallel matrix multiplication library assigning more work to well-interconnected nodes. I wouldn't advise building a physical or biological computation program on top of GRAS, even if it would be possible in theory.

In other words, GRAS is not a grid middleware in the common sense of the term, but rather a tool for building the bricks of such a middleware. GRAS is thus a sort of "underware" ;)

GRAS tries to remain simple to use

A lot of effort was put into the framework so that it remains simple for its users. For example, you can exchange structured data (any kind of C data structure) just by passing its address, and the framework will create the exact same structure on the receiver side.

There are no pthread-like threads in GRAS, and it is not planned to introduce them in the future. This is an explicit choice, since I consider multi-threading too complicated for usual users: there is too much non-determinism, too many race conditions, and too few language-level constructs to keep yourself from screwing up. This idea is well expressed by John Ousterhout in Why Threads Are a Bad Idea (for most purposes), published at USENIX'96. See the section GRAS was designed for distributed computing, not parallel computing for performance considerations.

For the user code, I plan to allow the co-existence of several "gras processes" within the same regular Unix process. The communication semantics will still be message-oriented, even if implemented using shared memory for efficiency.

Likewise, there is no interruption mechanism in GRAS which could break the user code's execution flow. When you write a function, you can be absolutely sure that nothing will happen between its lines. This assumption considerably simplifies the code written in GRAS. The main use of interruptions in a distributed application is to time out communications when they fail. GRAS communication calls allow you to set up a timeout value and handle it internally (see below).

The only interruption mechanism used is exceptions, just like in C++ or Java (but implemented directly in C). They are propagated from the point where they are raised to a point where they get trapped, if any, or abort the execution when not trapped. You can still be certain that nothing will happen between two lines of your code, but the second line may never be executed if the first one raises an exception ;)

This exception mechanism was introduced because, without it, user code has to be cluttered with tons of non-functional code checking whether an operation was properly performed or whether the error condition has to be passed on to the caller.


The model provided by GRAS

From a more formal point of view, GRAS overlays (= applications) can be seen as a set of state machines mainly interacting through messages. Because of the distributed setting of overlays, the internal state of each process cannot be accessed or modified directly by other processes. Even when it would be practically possible (like in SG), it is forbidden by the model. This makes it difficult to gain complete knowledge of the global system state. The global system state can still be defined by aggregating the states of the individual processes, but this remains theoretical and impractical because of the probable combinatorial explosion.

Event types

Two main types of events may change the internal state of a given process:

Messages are sent using the gras_msg_send function. You should specify the receiver, the message type and the actual payload. This operation can happen at any point in your program. Message sending is not considered a process state change, but rather a reaction to an incoming event; it changes the state of another process, though. Trying to send a message to yourself will deadlock (although this may change in the future).
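As a rough sketch of what such an exchange looks like, here are a sender and a receiver side by side. This is illustrative only: it uses the gras_msg_send and gras_msg_wait functions named in this document, but the message-type declaration and socket helpers are assumptions taken from typical GRAS examples, and the exact signatures should be checked against the API reference.

```c
/* Illustrative sketch only: requires the GRAS headers and runtime. */
#include <gras.h>

/* Sender side: open a socket to the peer and send a typed message. */
void hello_sender(void) {
  gras_socket_t peer = gras_socket_client("host2", 4000); /* assumed helper */
  int payload = 42;

  gras_msgtype_declare("hello", gras_datadesc_by_name("int"));
  gras_msg_send(peer, gras_msgtype_by_name("hello"), &payload);
}

/* Receiver side: block (here, up to 60 seconds) until a "hello" arrives. */
void hello_receiver(void) {
  gras_socket_t from;
  int payload;

  gras_msg_wait(60, gras_msgtype_by_name("hello"), &from, &payload);
}
```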

Communication model

Send operations are, in practice, as synchronous as possible. They block the process until the message actually gets delivered to the receiving process. An acknowledgment is awaited in SG, and we consider the fact that RL does not do the same to be a bug to be fixed one day. We thus have a 1-port model in emission. This limitation allows the framework to signal error conditions to the user code in the section which asked for the transmission, without having to rely on an interruption mechanism to signal errors asynchronously. This communication model is not completely synchronous in the sense that the receiver cannot be sure that the acknowledgment has been delivered (this is the classical Byzantine generals problem). In practice, the acknowledgment is so small that there is a good probability that the message was delivered. If you need stronger guarantees, you will need to implement better solutions in user space.

As of SimGrid v3.3, receive operations are done in a separate thread, but they are performed sequentially by this thread. The model is thus 1-port in reception, but something like 2-port overall. Moreover, messages not matching the criterion of an explicit receive (see for example gras_msg_wait) are queued for later use. Thanks to this dedicated thread, emission and reception are completely decoupled: the main thread can perfectly well send a message while the listener is receiving another one.

Here is a graphical representation of a scenario involving two processes A and B. Each is naturally composed of two threads: the one running user code, and the listener in charge of listening for incoming messages from the network. Both processes also have a queue for the communication between the two threads, even if only the queue of process B is depicted in the graph.

The experimental scenario is as follows:

[Figure gras_comm.png: message exchange between processes A and B]

This figure is a bit dense, and there are several points to detail here:

Timing policy

All communication primitives allow three timeout policies: you can poll for incoming events (using timeout=0), wait endlessly for the communication to be performed (using timeout<0), or specify a maximal delay to wait for the communication to proceed (using timeout>0, expressed in seconds).

Again, this describes the targeted model. The current implementation does not let you specify a delay for outgoing communications: in SG, the delay is hardcoded to 60 seconds, while in RL outgoing communications wait forever to proceed.

Another timing policy we plan to implement in the future is "adaptive timeouts", where the timeout is computed automatically by the framework according to the performance of previous communications. This was demonstrated, for example, in the NWS tool.

Error handling through exceptions

As explained in the section GRAS tries to remain simple to use, any function may raise exceptions that break its execution. No support is provided by the framework to ensure that the internal state remains consistent when exceptions are raised. Changing this would imply being able to checkpoint the internal state to provide a transaction service, which seems quite difficult to achieve efficiently.

RPC messaging

In addition to the one-way messages described above, GRAS supports RPC communication. Using this, a client process asks for the execution of a callback on a server process. RPC types are close to regular message types: they are described by a name (a string) and a payload type for the request but, in addition, they also have a payload type for the answer sent from the server back to the client.

RPCs can be either synchronous (the function blocks until an answer is received) or asynchronous (you send the request and wait for the answer later). They accept the same timing policies as regular messages.
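A client-side synchronous call might look like the following sketch. This is illustrative only: the function names (gras_msgtype_declare_rpc, gras_msg_rpccall) and signatures are assumptions taken from typical GRAS examples and should be checked against the API reference.

```c
/* Illustrative sketch only: requires the GRAS headers and runtime. */
#include <gras.h>

void rpc_client(void) {
  gras_socket_t server = gras_socket_client("host2", 4000); /* assumed helper */
  int request = 21, answer = 0;

  /* An RPC type has a payload type for the request AND for the answer. */
  gras_msgtype_declare_rpc("double it",
                           gras_datadesc_by_name("int"),  /* request */
                           gras_datadesc_by_name("int")); /* answer  */

  /* Synchronous call: blocks (here, up to 60 seconds) for the answer.
   * If the server-side callback raises an exception, it is re-raised here. */
  gras_msg_rpccall(server, 60, gras_msgtype_by_name("double it"),
                   &request, &answer);
}
```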

If the callback raises an exception on the server side, this exception will be trapped by the framework on the server side, sent back over the network, and re-raised on the client side. So, if the client calls an RPC which raises an error, it will have to deal with the exception itself. No guarantee is given concerning the state consistency on the server side when an exception arises. The host field of the exception structure indicates the name of the host on which it was raised.

The callback implementing the treatment associated with an RPC can perform any kind of communication itself, including RPCs. In the case where A calls an RPC on B, leading B to call an RPC on C (i.e., A->B->C), if an exception is raised on C, it is forwarded back to A, and its host field will indicate C.


What's next?

Now that you know what GRAS is and which communication model it uses, it is time to move to the Initiatic tour section. There, you will incrementally build a full-featured GRAS application demonstrating most aspects of the framework.


The version of SimGrid documented here is v3.7.1.
Documentation of other versions can be found in their respective archive files (directory doc/html).