For two days, we had a plenary meeting of the SONGS project. These meetings are always impressive because there are so many people working on SimGrid that we have trouble finding the time to listen to everyone. To me, that's a bit like a birthday party where all my friends come and talk about my pet project.
From Nancy, Marion gave the latest news on the model-checking front. We are very close to the point where we can check liveness properties on unmodified SimGrid programs. We are now blocked only by some bugs in SimGrid itself, which are triggered only in very rare cases in simulation (such as canceling a communication exactly when it ends) but which the model checker never misses. I gave a very short presentation of our current work with Cristian to distribute the simulation. It is less mature, but with Paul Bedaride working on this full time now, I'm still sure that it will land in the main releases before Christmas. This led to an interesting discussion with Olivier Dalle on how the simulation should be distributed and parallelized. Finally, Lucas Nussbaum gave a short presentation on the work we are doing with Maximiliano on merging the experimental methodologies. I'm still a bit unsure of where this will lead, but someone has to do it to detect and fix the glitches.
From Grenoble, Augustin Degomme presented the recent advances on SMPI. It's pleasant to see that the engineering aspect is now almost solved: we pass all compatibility tests provided by MPICH. Some applications still resist our source-to-source modifications though. Augustin presented some work from BigSim on that issue, and none of the proposed solutions is perfect. Source-to-source is not robust; modifying the binary can highly degrade the code performance, endangering the simulation soundness; automatically changing the .DATA segment to mark it as TLS (so that the variables become thread-specific) is not sufficient, as the assembly code to access such a variable is not the same, so you have to complete it with source-to-source modification or binary rewriting. Finally, memcopying the .BSS segment around on context switches sounds like a performance killer, and prevents the user processes from running in parallel. I guess that we'll have to come up with an execution mode like "locally distributed" where all MPI processes still live in their own UNIX process, interacting with maestro that lives in another process, but with all processes on the same host and interacting through shared memory for speed. Let's see if the crazy macros currently written by Paul within simix can deliver such a beast. Also, we discussed with Abdou how to develop a Fortran 99 binding of SMPI to allow the simulation of the Mumps solver. That'd be huge!
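To make the "memcopy the segment on every context switch" option more concrete, here is a toy sketch in Python. It is only an analogy and not how SMPI or BigSim do it: the dictionary `shared_globals` stands in for the process-wide .DATA/.BSS segment, and `Rank`, `step` and `run_round_robin` are hypothetical names invented for this illustration.

```python
import copy

# Stand-in for the process-wide .data/.bss segment: one shared dict of "globals".
shared_globals = {"counter": 0}


class Rank:
    """A simulated MPI rank holding a private snapshot of the global segment."""

    def __init__(self, rank_id):
        self.rank_id = rank_id
        self.saved_globals = copy.deepcopy(shared_globals)

    def step(self):
        # The user code is unmodified: it only ever touches the shared globals.
        shared_globals["counter"] += self.rank_id


def run_round_robin(ranks, rounds):
    """Run each rank in turn, swapping the global segment at every switch."""
    copies = 0
    for _ in range(rounds):
        for rank in ranks:
            # Switch in: overwrite the shared segment with this rank's snapshot.
            shared_globals.clear()
            shared_globals.update(copy.deepcopy(rank.saved_globals))
            rank.step()
            # Switch out: save the segment back into the rank's snapshot.
            rank.saved_globals = copy.deepcopy(shared_globals)
            copies += 2
    return copies


ranks = [Rank(i) for i in range(1, 4)]
copies = run_round_robin(ranks, rounds=5)
final = [r.saved_globals["counter"] for r in ranks]
print(final, copies)
```

Each rank ends up with its own value of `counter`, but at the price of two full copies of the segment per scheduling step. And since there is a single shared segment, only one rank can run at any time, which is exactly the parallelism problem mentioned above.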
Also from Grenoble, Lucas Schnorr presented their latest results on the visualization front. Viva is getting more and more mature with time, and that's pleasing. This time, they implemented an immensely useful feature, where the tool can automatically decide which parts of your platform must be aggregated depending on the entropy of your measurements. These guys really rule.
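I don't want to put words in their mouths, so take the following as my own rough sketch of the general idea and not as Viva's actual algorithm. It assumes non-negative measurements (say, the load of each host in a cluster) and uses the normalized Shannon entropy: when the values are nearly uniform, replacing them by one aggregate loses little information. The function names and the 0.95 threshold are hypothetical.

```python
import math


def normalized_entropy(values):
    """Shannon entropy of the value distribution, scaled to [0, 1]."""
    total = sum(values)
    if total <= 0 or len(values) < 2:
        # Nothing to tell apart: treat the group as perfectly homogeneous.
        return 1.0
    probs = [v / total for v in values if v > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(values))


def should_aggregate(values, threshold=0.95):
    """Aggregate a group when its measurements are close to uniform."""
    return normalized_entropy(values) >= threshold


homogeneous = [10.0, 10.0, 10.0, 10.0]   # four hosts with the same load
heterogeneous = [97.0, 1.0, 1.0, 1.0]    # one hot spot among idle hosts
print(should_aggregate(homogeneous), should_aggregate(heterogeneous))
```

With this criterion, the four equally loaded hosts would be displayed as one aggregated node, while the group containing the hot spot stays expanded so that the anomaly remains visible.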
Jonathan gave us a presentation (through Skype since he was still in Salt Lake for a meeting) on his work. He modeled the whole Amazon cloud environment on top of the simple APIs I added in 3.8. This is really impressive, he did incredible work. We now have to work on the integration of this work in the framework, and we'll reach the dream of a sound simulation framework of Cloud settings. I mean, something that is NOT comparable to CloudSim.
From Bordeaux, we had several presentations on what people want to simulate about P2P systems and what they need for that. Very good food for thought. We also had many potential users from the HPC community who are not part of the project yet. The discussions were enlightening, and very practical too. I went to everyone, asking what should be added to the framework to please them. I got a sizable TODO list, but some of the points are rather simple, and I already did some of them the next day. I love this way of moving forward, with practical feedback and real needs.
Also, we had a long discussion on how to improve the documentation. I'm getting a bit tired of that, I must confess. My feeling is that we need to provide what motivated users need to get into the framework, while some colleagues are very (very) eager to get many users, no matter how clever they are. For that we'd need to write tons of dumb-proof documentation, something I'm not sure I have the time to do. I have the feeling that I can either help motivated and interesting users to go further, or try to save students who are not willing to learn anyway. At some point, someone noticed that the main page of the documentation still mentions GRAS although it was deprecated, which constitutes a clear pitfall for newcomers. So I ran amok and killed GRAS. I had wanted to do so for a long time, and this remark was too much for my nerves. RIP GRAS, any reference to this project disappeared from our code base and documentation. As usual, Arnaud Giersch was the perfect hacker to help me achieve this tedious transition without breaking the rest of SimGrid. I want to work more with this brilliant guy.
Voilà. These are only the most prominent elements of the 3-day meeting, but this blog post is too long already. I really love the way this SONGS project goes, and I'm confident that we will deliver great results. Actually, I get the feeling that we will finish what we promised within the first 2 years of the project's lifetime, and then use the second half to do really great things with that framework!