My research puts a particular emphasis on experimentation methodologies for distributed applications and algorithms. To this end, I use several approaches, such as direct execution on Experimental Facilities, Emulation, Simulation, and Formal Methods. I also bridge these approaches where possible.
Simulation with SimGrid
This project aims to build a simulation framework that lets researchers working on distributed applications test their algorithms in a scientifically sound way. This main goal encompasses several objectives. First, we strive to solve the methodological issues raised by simulation, so that users can rely on the tool and focus on their algorithmic difficulties without having to think about the methodology. To that end, considerable effort was put into validating the results of the simulation kernel against real-world executions. This is still an active research topic in the SONGS project (which I lead), funded by the French ANR agency. Newer research in this project addresses methodological issues around the simulation itself: campaign management, to decide how many experiments are needed to answer a given question and which ones, and result analysis and visualization, to help users avoid visualization artifacts. Another crucial aspect of ensuring that the tool helps scientists in their day-to-day work is its intrinsic usability. That is why the simulation kernel is highly optimized for both speed and memory. Also, we never change the interface of a published function, so that code written earlier remains runnable, which eases comparison with existing work. This work and these results explain why SimGrid is one of the most widely used simulation kernels in the grid algorithm community.
More details can be found on my page dedicated to SimGrid, or on the SimGrid web page.
Formal Assessment of Distributed Applications
For a few years, I have been working on the use of model checking to assess the validity of distributed applications. This naturally complements my earlier work on the topic through the methodologies of direct execution on Grid'5000 and of simulation with SimGrid. This work, although still preliminary at this point, led me to add basic model-checking abilities to SimGrid, giving users the power of formal approaches to assess some aspects of their solutions by exploring every possible execution path of their algorithms. The main difficulty of this methodology is its inherent combinatorial explosion. We reduced it in our prototype by using state-of-the-art dynamic partial order reduction (DPOR). We are now at the point where this approach becomes viable in practice for SimGrid users, and these changes were recently integrated into the tool. There is still a long way to go, however, since model checking is mainly aimed at finding incorrect behaviors such as deadlocks or race conditions, whereas most SimGrid users are more interested in the performance of their applications. Assessing performance through formal methods seems really difficult at this point, since time is usually either undefined or only discrete in model checking. Nevertheless, this change of research theme is proving enriching: it sheds new light (through semantic considerations) on issues such as parallel simulation, collective communication primitives, or the comparison of differing execution traces. I expect this to lead to original solutions to these difficult issues in the future.
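To give a concrete idea of what "exploring every possible execution path" means, here is a minimal sketch in Python. It is not SimGrid code and it does not implement DPOR: it naively follows every schedule of a hypothetical two-process toy system in which the processes take two locks in opposite order. All names (`PROCESSES`, `explore`, `interleavings`) are made up for this illustration.

```python
from math import factorial

# Hypothetical toy system: each process is a sequence of (action, lock) steps.
# The two processes take the same two locks in opposite order.
PROCESSES = (
    (("acquire", "A"), ("acquire", "B"), ("release", "B"), ("release", "A")),
    (("acquire", "B"), ("acquire", "A"), ("release", "A"), ("release", "B")),
)

# A state is (program counters, set of (lock, owner) pairs).
INITIAL = ((0,) * len(PROCESSES), frozenset())


def enabled(state, pid):
    """Return True if process pid can take its next step in this state."""
    pcs, owners = state
    if pcs[pid] == len(PROCESSES[pid]):
        return False  # this process has already finished
    action, lock = PROCESSES[pid][pcs[pid]]
    if action == "release":
        return True
    # An acquire is only possible if nobody holds the lock.
    return all(held != lock for held, _ in owners)


def step(state, pid):
    """Return the state reached after process pid takes its next step."""
    pcs, owners = state
    action, lock = PROCESSES[pid][pcs[pid]]
    if action == "acquire":
        owners = owners | {(lock, pid)}
    else:
        owners = owners - {(lock, pid)}
    return pcs[:pid] + (pcs[pid] + 1,) + pcs[pid + 1:], owners


def explore(state, deadlocks):
    """Follow every schedule from state; return the number of paths explored."""
    runnable = [pid for pid in range(len(PROCESSES)) if enabled(state, pid)]
    if not runnable:
        pcs, _ = state
        finished = all(pc == len(p) for pc, p in zip(pcs, PROCESSES))
        if not finished:
            deadlocks.add(state)  # nobody can move, yet work remains
        return 1
    return sum(explore(step(state, pid), deadlocks) for pid in runnable)


def interleavings(n_processes, n_steps):
    """Number of interleavings of independent processes of n_steps each."""
    return factorial(n_processes * n_steps) // factorial(n_steps) ** n_processes


found = set()
paths = explore(INITIAL, found)
print(paths, "schedules explored; deadlocked states:", found)
print(interleavings(2, 4), interleavings(3, 4))
```

The exploration reports the state in which each process holds one lock and waits for the other, a deadlock that a single test run could easily miss. The `interleavings` helper shows why the naive approach does not scale: without synchronization, two processes of four steps already admit 70 schedules, and three such processes admit 34650. Reduction techniques such as DPOR aim to avoid exploring schedules that differ only in the order of independent steps.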
See also this blog post: Model-checking efforts in SimGrid
Past Projects
Experimental Facilities
The most natural way to experiment with and evaluate distributed algorithms and applications is through direct execution on experimental facilities such as Grid'5000. Grid'5000 is a scientific instrument for experimenting with distributed applications. Specifically designed to ease experimentation, it gives strong control over the experimental settings and allows testing every layer of the system, from the application down to the operating system. You can boot your own version of Linux onto 2000 nodes in less than one hour.
A user of the instrument since its beginnings in 2002, I was responsible for the Grid'5000 site in my lab for a few years, and in charge of scientific coordination through the EDGE project. Lucas Nussbaum took over this responsibility a few years ago.
Ultra Scalable Simulation with SimGrid
USS SimGrid is a project funded by the ANR (2009-2011), of which I am the PI. The goal is to push the scalability limits of the tool so that it becomes usable in P2P research.
Fast Application System Timer (details)
FAST is a library that gathers information on application needs and system availability to help network-aware applications make appropriate decisions. This code was developed during my PhD, and then integrated into DIET, a grid-enabled Problem Solving Environment. Its development is now discontinued.