This article is the first in a series about my recent work on SimGrid.


During these ten days, I did almost 200 commits in SimGrid. I must have doubled or even tripled my 2015 score in one week! Initially, I wanted to write a little POC for a meeting: we wanted to run any application on top of SimGrid, provided that you rewrite the communication code of that application to use a SimGrid-provided class instead of TCP.

I thought I would only have to write a little REST server that would run and remotely control a SimGrid simulation. The REST clients would be the little code snippets embedded in the application in place of the TCP interactions, and the server would send back the simulated answers. That would also require reimplementing the SimGrid user interface on top of the RESTful protocol, but that sounded easy too.

But the existing SimGrid interface, called MSG, is really messy, so this sounded like a perfect occasion to rebuild it from the ground up. Problem: SimGrid 3.12 was almost ready for release, and 3.11 was released one year ago already, while we aim at two releases per year. So I had to first finish the cleanups of 3.12 and release it before starting the new interface that would enable me to write my remote server.

And that's how I came to do 160 commits in one week instead of enjoying my vacations. Of course, I had a hard deadline on the remote server thing. I did not manage to finish that POC on time, but here is what I got done before breaking down.

Clean all the things!

I must confess that I'm physically unable to hack dirty code. I really see this as a drawback in my professional personality, as it often prevents me from being pragmatic (as this whole story proves), and as a result I often fall behind the planned schedule. This time again, I was unable to do just what I had to do in SimGrid and felt forced to do some long-awaited internal cleanups on my way.

The NS3 support of SimGrid should now compile out of the box, the broken support for GTNetS is gone (I had no need of NS3 nor GTNetS for this project), and the C version of tesh is gone (we have used the Perl version for a while). The most outrageous parts of the CMake files and the documentation were updated. It drives me crazy that every refactoring seems to justify two other cleanups. Some tesh-related portability bits remain in XBT. The GTNetS examples remain in the code until we port them to NS3. Our CMake usage remains incredibly painful. in-cre-di-bly PAIN-FUL.

And more importantly, I'm far from done with the whole project architecture.

Since its inception, SimGrid has been somehow object oriented, but in pure C. Most functions take a pointer to a structure as first parameter, which serves as a this. Inheritance was implemented through the inclusion of structures within structures. It kinda worked, but was so intricate that only 2 or 3 people could understand and modify it. So we decided to move to C++ for the sake of evolution and maintenance. The SURF module, in charge of the network, CPU and disk models, was converted to C++ last year.

But that conversion is still messy and somehow ongoing. That's because we had very specific settings that are not easy to port to classical designs. For example, the SimGrid hosts are stored in an associative map called xbt_lib, and each element (each host) is a tuple. Each module of SimGrid stores the data it wants in its part of the tuple: Surf stores the resource modeling information, Simix stores the list of processes, Java stores the Java object representing the host, SMPI stores the private host mailbox, and so on. Of course, we implemented the corresponding data containers ourselves, as every C project does.

When you ask for something to happen from the Java layer, it goes down, and each layer retrieves the information it needs from the tuple constituting the host. More specifically, Task.execute from Java is implemented as the native Java_org_simgrid_msg_Task_execute(). It calls MSG_task_execute(), which calls MSG_parallel_task_execute() (the generic case), which calls simcall_process_parallel_execute(). On the other side of the simcall barrier, SIMIX_process_parallel_execute() gets called and the simulated process waits for the completion of the created action. SIMIX_process_parallel_execute() calls surf_host_model_execute_parallel_task(), which calls (in C++) host_model->executeParallelTask(), with host_model being a global. Unless you use the ptask host model, this calls host->p_cpu->execute(), which actually creates the action on which the simulated process can block. Yes, this is a mess, and these 8 call levels (actually a bit more, thanks to the simcall barrier and the C++ object hierarchy in surf) could easily be replaced with only 3 calls: user mode, simix and surf.

Things improved a bit. For example, we now require g++ 4.7 because C++ before C++11 is way too prehistoric for me. Not being able to initialize fields at their declaration is not bearable to me.

But what drove me crazy was that surf++ introduces another data container for the hosts, duplicating the xbt_lib story. The representative of the CPU model lives both in the xbt_lib tuple and as a field of the surf++ host. I tried to merge both data structures for a long time, but I did not manage to. Parallel tasks and VMs build some stuff upon surf++ hosts, and I did not manage to sort things out correctly.

Dozens of times, I tried to implement something, but when I ran the tests, they revealed some forgotten corner cases, and I had to revert all the changes. Sometimes, I managed to save some of my attempted changes, or to redo them after another preliminary cleanup. But without the testing infrastructure, I would not have been able to get anywhere. I love tests. I need tests. I want more tests.

Run all the tests!

We already had a comprehensive testing infrastructure, but Gabriel and I had to clean and refresh it. For example, our Ubuntu tester was based on Precise Pangolin, so g++ 4.7 was not available. In the end, we erased and reconfigured everything.

On CI@Inria, we now test both regular and MC builds on every Unix system we could get: Debian (Wheezy, 32/64 bits), Ubuntu (Trusty, 32/64 bits), Fedora (v18/v20, 64 bits), Mac OS X (Mavericks, 64 bits) and FreeBSD (v9, 64 bits). In each case, we run all our tests (542 integration tests and over 10,000 unit tests). The integration tests are complete simulations of varying complexity that exercise different parts of the framework under several options (with pthreads, with our hand-written contexts, with ucontext; with each of the network models). In each case, we check that the output is exactly the expected one. Since these outputs contain the computed events' timestamps, we know that we will notice if our models diverge.

In addition, we test SMPI for standard compliance with the MPICH3 integration test suite. Since SMPI is still missing some features, we can only activate 239 tests out of 500. So SMPI is not a complete implementation of MPI, but our fairly decent set of features is thoroughly tested. We are also in the process of integrating the ISP test suite. That would ensure that our model checker for MPI applications can run at least the same tests as ISP. That would be good because, in addition, we can check liveness properties, while ISP is limited by design to safety properties only.

So that's a huge amount of tests run on each commit, but that's what we came up with to ensure that SimGrid will not fail on our users, and that the work of one of us will not break the work of the others. The drawback is that one CI@Inria run takes about half an hour, making it rather unpleasant to debug.

So we also set up a lighter Travis compilation project that tests only the regular (non-MC) build, on Ubuntu 64 bits only, without the MPICH3 and ISP test suites. It usually answers in a matter of minutes.

Getting everything to work on all these architectures and testing conditions was not easy. I still tried to limit the amount of commits by rewriting them and force-pushing to the remote servers when the daemons didn't like my previous attempt. What I'm trying to say here is that, admittedly, such portability work generates a large amount of commits, but I did my best not to exaggerate the numbers.

The big OS missing from that picture is Windows. Releasing without Windows would be a pity, as some of our users do use it. Most of them happen to use the Java bindings, so we need the whole stack on that operating system, not only our C core.

We tried to set up a Windows compilation slave on CI@Inria, in vain (it freezes while scanning the dependencies before the actual compilation). We tried to set up an AppVeyor project, which is similar to Travis but targets the Windows world. That failed too. Right now, the mkdir, cat and cp tools that we use to test our testing infrastructure segfault, and all the programs that we compile ourselves fail to start: we get a 5-minute timeout even on programs that usually take 0.02 seconds.

If someone can help with Windows portability, that would be really welcome. Like I said, we would really like to have SimGrid working on Windows (in particular the Java bindings), but we feel like unwelcome strangers in a Windows world.

In the end, I decided to postpone the release until September. I think that any mail sent during the summer will get scavenged by people struggling with their inboxes after vacation. At least, we are (almost) there, so it will be easy to release during the first week of September.

Pave all the way to SimGrid 4!

If you remember from the introduction, I planned to work on the next user interface of SimGrid (before exporting it over REST for remote use). This interface, which will become part of SG 4, is nicknamed S4U: SimGrid For You.

So far, it lives only in a separate git branch, so as not to mess with SG 3.12 (which is kinda waiting for Windows and ISP to stabilize). The goal is for this C++ interface to become the next user interface of choice. The Java bindings will be generated automatically from it using SWIG (the current binding is completely manual, which is a pain to write and maintain). So S4U offers the features that the Java interface should offer.

The first class was s4u::Engine, which starts a simulation and loads the configuration files. Then, s4u::Host allows retrieving a given host and provides the basic interactions with it (get its speed, get the consumed energy, etc.). Since I failed to merge the xbt_lib and surf++ visions of the SimGrid host, each s4u::Host object has an inferior, that is, the corresponding MSG/smx host. I was sad to add a third layer of organization just because I didn't manage to merge the two existing ones, but the clock was ticking and I really wanted to proceed. I think that I will continue to clean the internals and eventually merge everything.

Then came the expected s4u::Process, also implemented using an inferior from simix for now. I think that I will rename it to s4u::Actor because that's how I usually define the MSG processes: they are actors that run in the simulated world. I just have to read a bit about Scala actors to check that they are actually similar to what I intend.

The s4u::Channel replaces the MSG mailbox. It serves the same purpose, inspired by the Linda communication blackboard: it is only a rendez-vous point where a sender and a receiver can meet up. The name "mailbox" was confusing the users, who often thought that they HAD to use host names as mailbox names. This is not the case: you can use any name you want. It's just a facility to match a send action with a receive action.

I will not port the MSG tasks to S4U. They were perfectly adapted to the kind of studies that Arnaud et al. conducted 15 years ago, but they are often troublesome for our users. Instead, s4u processes directly have a send() and a recv(), and these are already implemented.

The next step will be to provide a nice coherent interface to the non-blocking actions. s4u::Async will represent a non-blocking action on which you can block. They are called synchros in simix, but saying that communications are just particular synchronization objects was disturbing to the users. I guess that saying that trying to acquire a mutex is an asynchronous action on which you can block will feel more natural to users.

I will also provide a consistent API for Async. The current send() is equivalent to send_async() + wait(): send_async() creates an asynchronous send on which you can block, waiting for its completion (s4u::Send will derive from s4u::Async). Instead of having a dozen send_async() variants with various amounts of optional parameters, I will introduce a send_init() function that creates an asynchronous send action, but does not start it. Then you'll use some setters to fill in your options, and finally s4u::Async::start() will let the action begin. So actually, send() is equivalent to send_init() + ::setThis() + ::setThat() + ::start() + ::wait(). Of course, the same kind of interface will be provided to receive data, but also to execute stuff (exec_async() is sometimes useful to simulate the effect of a thread without creating a specific process) or to use mutexes and semaphores.

Once this interface is in place (in the far future), I will seriously consider reimplementing MSG and SMPI on top of it, and do the profound cleanups made possible in Simix and below.

Implement all the Remote SimGrid!

With my working client/server data exchange implemented in S4U, I had enough to work on my Remote SimGrid thing. I first created the GitHub project and wrote a nice README that explains the goal of the project. I'm a bit ashamed: this project is somewhat oversold by its documentation :)

Then, I set up an rsg binary: a REST-like server which will run an RSG simulation containing real programs. If you want to run the Samba server on top of RSG, you'll replace the code that interacts with TCP with some code using RSG (Remote SimGrid), whose interface will closely follow the S4U interface. Of course you will not change that code permanently, but only when you want to run Samba on top of SimGrid. You can do that with #ifdef or any other solution you see fit for your project.

Then, you will write a deployment file specifying which instance of smb you want to run on which (simulated) host, and with which parameters. The rsg server will open a master socket, parse that file and fork+exec every listed process. When the smb process starts communicating, your code using RSG will be used instead of the classical TCP one. The RSG library will read from an environment variable the port on which it should connect back to the rsg server, and every interaction with the simulation kernel will be conveyed over the network.

So that's the theory. In practice, nothing of RSG is written yet and there is absolutely no protocol. All you can do is start an "echo Hello | nc 127.0.0.1 8888" from the deployment file, and see that rsg manages to start it and sees the message on its socket. That's a bit short, but I'm very close to a nice POC.

With the addition of a little send/recv exchange, I would have proved that the planned synchronization between the simulated world and the real sockets actually works. The first time Gabriel told me that nothing specific was needed, and that the real sockets could be served by the SimGrid processes only when unlocked by maestro, I thought that this would lead to deadlocks and madness. But I'm now convinced that Gabriel was right. As usual...

Do all the things?

So you got it already: I underestimated the difficulty and dramatically increased the amount of things to do for that meeting. So in the end, I started everything and finished nothing. I do that all the time :(

I'm not sure what I'll do next. Now that the hard deadline is missed anyway, I plan to enjoy a bit of vacation (I just wrote this as a brain dump), but I'm not sure what the next step will be. I could finish the integration of the ISP test suite, the implementation of S4U, or the implementation of RSG (it would be useful even with only the blocking send/recv that are currently in S4U).

Well, we will see. I now HAVE TO enjoy the sea. My sons are calling me to build a nice castle, and I feel I have to obey that call after that crazy week. That's me, in the binge coding mental state. Shame, shame.


(read the follow-up of this article here: Remote SimGrid working prototype)