Project Goal and Usage

SMPI aims to simulate MPI programs within SimGrid. No source modification is required, only a recompilation. You can then use SimGrid to let your code "run" on top of simulated platforms. For more details on the scientific side of this project, please refer to the relevant publication (or the corresponding slides).

The rest of this page details how SMPI is implemented, and gives insider information about the underlying research effort. If you just want to use SMPI, you should refer to the online documentation. The documentation for SimGrid v3.7 is here. If a newer version is available, check the file group__SMPI__API.html in the doc directory of the archive (and please drop me a reminder mail so that I update this page).

Modeling the MPI performance

SMPI is still a very active research topic in our group. Getting MPI codes to run on top of the simulator is not that difficult (you simply have to reimplement parts of the MPI interface on top of the simulator's own interface), but getting realistic simulation results is far more challenging. You have to come up with a model of the interactions between the platform, the MPI runtime and the applications. This is the current frontier for us.
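As a point of reference only, the simplest textbook candidate for such a model is an affine one, where sending a message costs a fixed latency plus its size divided by the bandwidth. The sketch below illustrates that assumption; it is not the model used in SMPI, and the function name and the parameter values are made up for the example.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Affine estimate of a point-to-point transfer time, in seconds.

    Illustrative assumption: time = latency + size / bandwidth.
    Real MPI runtimes deviate from this, which is why a finer model is needed.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    if size_bytes < 0:
        raise ValueError("message size cannot be negative")
    return latency_s + size_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical link: 50 microseconds of latency, 125 MB/s of bandwidth.
    for size in (0, 1024, 1024 * 1024):
        estimate = transfer_time(size, 50e-6, 125e6)
        print(f"{size:>8} bytes -> {estimate:.6f} s")
```

Such a model has only two parameters per link, which makes it easy to fit but too coarse to capture what the MPI runtime does with messages of different sizes.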

To face this difficulty, and because our group is geographically distributed, we decided to conduct this research in the open: our Open Lab Notebooks are publicly available on the Internet. The full list of existing lablogs is given below, while the findings are summarized on this page. Also, all our scripts and traces are available from github. Please remember that this material is mainly written to be useful within our group. If it proves useful to you, we're happy (and we'd love to hear from you), but if you don't understand it, I fear that there is not much that we can do to improve it...

Testing Methodology

The main goal for now is to understand the MPI runtime's performance. For that, we follow a physicist's approach, doing experiments on an instrumented platform and visualizing the results. Later, we will devise a model reproducing what we observe, and implement it in SimGrid. Our setup comprises several tools, detailed below. In addition, the test programs are executed on the Grid'5000 clusters.

  • Synthetic Applications. We use coNCePTuaL to that end. It is "A Network Correctness and Performance Testing Language", that is, a domain-specific language to write network benchmark programs. The experiment description language is much more expressive than its counterpart in SkaMPI. Also, it does not apply any hidden statistical treatment to the results, but that's less relevant because we don't actually use the logs of coNCePTuaL (see below).
    Although a bit verbose, the coNCePTuaL language is quite easy to understand. Some examples can be found on the project webpage, and below. We compile these coNCePTuaL scripts into C+MPI programs using the provided tools. Numerous other target formats are provided, and we may be interested in compiling to C+raw sockets at some point, to isolate the network modeling from the MPI runtime modeling.

  • Instrumenting Application Execution. We use Akypuera to capture traces of the tested applications. This software is written by Lucas Schnorr, and is thus very efficient and pleasant to use. Basically, you simply have to link your C+MPI program against the library to make it produce trace files. The whole procedure is detailed on the dedicated wiki page on github.

  • Visualizing the result. Pajé is an annoying trace-based visualization tool. It is annoying because its advanced features make it very difficult to avoid, but its GNUstep interface is really, really frustrating. It is the result of a thesis work. It is very versatile, highly scalable and you can trust it. But we will have to rewrite it at some point to make it a bit more user friendly, I guess. You can get it from Debian and its derivatives, or directly from SourceForge. An example of visualization is available here (its context is here).

  • Statistically analyzing the result. Visualization is just perfect to help us build hypotheses about the data, but other tools are needed to assess them. For that, Arnaud hacked a little Perl script that turns a Pajé trace into a file that you can load in R.
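To give an idea of what this last conversion step involves, here is a minimal sketch in Python. It is not Arnaud's script, and it does not parse the real Pajé format: it assumes a hypothetical, already simplified trace with one state interval per line, written as "container state start end" and separated by whitespace. The function name and the column names are assumptions too.

```python
import csv
import io


def trace_to_csv(lines, out):
    """Write one CSV row per state interval; return the number of rows written.

    Assumed input (hypothetical, NOT the real Pajé format):
        <container> <state> <start> <end>
    Blank lines and lines starting with '#' are ignored.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["container", "state", "start", "end", "duration"])
    rows = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        container, state, start, end = line.split()
        start, end = float(start), float(end)
        # The duration is precomputed so that it is directly usable as a column.
        writer.writerow([container, state, start, end, end - start])
        rows += 1
    return rows


if __name__ == "__main__":
    sample = [
        "# container state start end",
        "rank0 MPI_Send 0.0 0.5",
        "rank1 MPI_Recv 0.25 0.75",
    ]
    buffer = io.StringIO()
    trace_to_csv(sample, buffer)
    print(buffer.getvalue(), end="")
```

A file produced this way has a header row and comma-separated columns, so it can be loaded in R with read.csv.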

Initial Hypothesis

TODO (http://mescal.imag.fr/membres/arnaud.legrand/blog/2012/01-january/2012-01-24_16:43.php#sec-5)

Designed Experiments

http://localhost/wiki/blog/2012/0127/lablog_2012-01-27/

Findings

TODO

Detailed list of the existing LabLogs

Since this list is manually maintained, some posts may be missing. I do my best to prevent that, but I welcome mails reporting such errors when they occur anyway.