This article is part of a series about my recent work on SimGrid.


Two more weeks have gone by already, so it's time for a little status update. I tried to do as much as I could before the new academic year, but it was not easy: I'm moving to a new position (full professor) in Rennes, 700 km from my previous one in Nancy, and that is rather time-consuming too. I think this will be the last report in this series: the lectures begin very soon, and I still have many things to prepare. Anyway, here is what I managed to get done before the new term.

In SimGrid

I sorted out the cmake files a bit, which they really needed. That's an unpleasant task, but there is so much cruft in there that it had to be done: having a good testing infrastructure is mandatory. At some point, I broke Travis because of a change that seems to come down to:

-if (CMAKE_COMPILER_IS_GNUCC)
+if(${CMAKE_C_COMPILER_ID} STREQUAL "GNU")

It could have been because Clang used to pretend it was GCC, until recent CMake versions (2.8.2) were instructed about the fraud. But that wasn't the reason, since changing that test was not enough to get it compiling on Travis. In the end, I failed to understand the breakage and just killed that piece of cmake. It was intended to work around a bug in gcc 4.9 that required linking with gcc-ar instead of ar when using LTO (link-time optimization). The build daemons are all green now, so I guess that workaround became useless at some point (I do use gcc 4.9.3 on my box)...

Such a story just reinforces my profound hatred of cmake. Portability is already a true nightmare, and cmake just makes it worse. I liked the autotools much better and was deeply sorry that we had to move away from them to become buildable on Windows. The autotools had a strong philosophy, so you could guess how a given thing was supposed to be done in the framework. Cmake is just a pile of junk elements packed together.

Enough ranting. I won't move back to the autotools nowadays, so we have to sort that mess out. We should kill every bit of the cmake config that is not required by one of the configured build daemons. Too bad we don't have a working Windows builder: without one, I cannot actually do it.

In S4U, aka the future SimGrid4

I implemented files and storages in S4U. I'm not very happy with either the internals of the storage handling or the S4U interface, so that's probably still a work in progress.

Declaring the storages in XML is rather messy: you have to declare a storage type (with its performance model), then create a particular storage and attach it to a given host, and finally mount it on a given host. If it's attached to the host where it's mounted, that's a local disk. If not, that's a remote disk (think of NFS or SMB).

<storage_type id="single_HDD" model="linear_no_lat" 
              content="content/storage_content.txt" size="500GiB"
              content_type="txt_unix">
   <model_prop id="Bwrite" value="30MBps" />
   <model_prop id="Bread" value="100MBps" />
   <model_prop id="Bconnection" value="120MBps" />
</storage_type>
...
<storage id="Disk3" typeId="single_HDD" attach="carl" />
...
<host id="carl" power="1Gf">
  <mount storageId="Disk3" name="/home"/>
</host>

That's messy, but sorting out the XML is always a pain. Our evil plan is to deprecate the XML interface in favor of a Lua interface instead. Christian is working on this, and that's also planned for SimGrid4.

The API is also complicated. We keep track of which files are located on which storage so we can detect space shortages. We have some code that does the same for the VMs on PMs, and I had almost decided that keeping track of this for the VMs was the user layer's responsibility. I guess we should factor this code out for both cases, moving that unpleasant and verbose bookkeeping into an external entity while surf concentrates on the actual performance actions (read/write for files, advanced CPU contention for VMs). But not this time...

Another complexity is that the storages are implicit when using files. In the above example, we declare that Disk3 is mounted on /home, so every file created under /home will end up on that disk. So open() has only one path parameter and does the resolution transparently. It would be simpler to make the storage explicit, I think. Granted, you don't have to specify your mount point when you create a file on Linux, but being explicit would remove the need to mount disks in the XML description. I will discuss that point with Fred to see whether I misunderstood something.

The implementation is also rather messy, for the usual reason: you never know what belongs in MSG, what belongs in libsmx and what belongs in Surf. The file name bookkeeping is in Surf, where it clearly does not belong, but I actually have no idea where to move it. I am slowly getting the feeling that splitting the code into layers as we do is not right. We could have a single Host class, with all layers mixed into it. But then, how could we separate user-mode code from kernel-mode code within a single class? I hope that things will become clearer after converting simix to C++...

In Remote SimGrid

Protobuf

Writing such blog posts is rather time-consuming and sometimes feels a bit vain. But shortly after my previous post, Lucas told me that my JSON approach was not optimal and that I should have used Protobuf instead. As I was not motivated to rewrite all my code, I first answered that I wanted a pure Java implementation too, without JNI. But such an implementation of Protobuf exists.

Then I argued that I was not interested in performance yet (which is true) and that Protobuf would have made things complicated for nothing. But then I lost half an hour chasing a segfault where I had passed an object pointer to my communication library, which was expecting a string, without any compiler warning. And I realized that Protobuf would generate large parts of the Java code that I would otherwise have to write. So I decided to rewrite most of RSG and switch to Protobuf.

I first looked at gRPC, the RPC facility on top of Protobuf. It's still unstable and relies on the alpha release of Protobuf v3. After downloading 1.6 GB of dependencies and compiling 250 MB of binaries, I realized that it's not suited to RSG, where each SimGrid thread needs to act as a little server to get the commands sent by the remote process that it serves. gRPC only lets you have one central server accepting connections from the network and dispatching to the right thread afterward. That synchronization within the RSG server would have been very complex, so I scrapped gRPC and all my POC code, and started over with raw Protobuf (v2, non-alpha).

Converting my little communication library to Protobuf was rather easy. I have a proto file which declares two messages (Request and Answer), plus an enum encoding the kind of request (sleep, exec, send, etc.). Encoding the data using Protobuf instead of my lib also went well, and it does raise compilation errors when you use the wrong type. Cool.
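
To give an idea of what that looks like on the C++ side, here is a small sketch of using the generated classes. The message and field names below are illustrative guesses, not a copy of the actual RSG proto file.

// Assuming a proto file roughly like this (names are illustrative only):
//   enum RequestType { SLEEP = 1; EXEC = 2; SEND = 3; }
//   message Request  { required RequestType type = 1; optional double duration = 2; }
//   message Answer   { required int32 status = 1; }

#include <string>
#include "rsg.pb.h"  // header generated by protoc from the hypothetical rsg.proto

std::string encode_sleep(double duration) {
  rsg::Request req;
  req.set_type(rsg::SLEEP);      // typed setter: only the enum value is accepted
  req.set_duration(duration);    // only a double is accepted, no silent pointer-to-string mixups
  std::string wire;
  req.SerializeToString(&wire);  // Protobuf does the actual binary encoding
  return wire;
}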

But sending the data over the wire was a major failure. It turns out that Protobuf is unable to properly frame its data. By default, it wants to consume all available data before starting to parse: when instructed to parse from a socket, it reads until the socket gets closed on the other side. Bummer. In Java, you have a "delimited" API, where you can post several messages on the same wire and read them back separately. Unfortunately, that was never ported to C++ because the main Protobuf author left Google before merging that change!

So you have to send the encoded size of the message before the message itself, so that the receiver can give a size limit to the Protobuf parser. I tried to adapt the code provided in the above link, but failed: I'm not fluent enough in C++, which I had never used before this summer, so I could not convert it from boost::asio to my good old sockets (and I was not in the mood to switch). Instead, I read everything into a memory buffer (first the size, then the data), and Protobuf happily parses that buffer until its end.

I did not manage to get my C++ version to interoperate with the classical Java version, because the Java side uses so-called variable-length integers (varints) on the wire. In my case, most message sizes are smaller than 128, so they fit in the first byte of the varint. But parsing that would have required either one read() syscall per byte encoding the length (yuck), or reading 4 bytes and then reinjecting the unused ones (I failed at that), or something else. I resigned myself to sending the size as 4 raw bytes, wasting 3 bytes per message, and to reimplementing that inefficient framing scheme on the Java side. That's a pity, but at least it works. Now that I live in a rainy area, I may get a weekend to fix it in the future :)
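
For the record, here is roughly what that framing scheme amounts to on the C++ side. This is only a minimal sketch over plain blocking POSIX sockets, with the 4-byte length prefix described above (and without handling partial writes); the helper names are mine, not the actual RSG code.

#include <arpa/inet.h>    // htonl / ntohl
#include <unistd.h>       // read / write
#include <cstdint>
#include <string>
#include <vector>
#include <google/protobuf/message.h>

// Read exactly 'len' bytes from 'fd', or return false on EOF/error.
static bool read_all(int fd, char* buf, size_t len) {
  while (len > 0) {
    ssize_t got = read(fd, buf, len);
    if (got <= 0)
      return false;
    buf += got;
    len -= got;
  }
  return true;
}

// Send one message: 4 bytes of length (network byte order), then the payload.
bool send_framed(int fd, const google::protobuf::Message& msg) {
  std::string payload;
  msg.SerializeToString(&payload);
  uint32_t size = htonl(payload.size());
  return write(fd, &size, sizeof(size)) == (ssize_t)sizeof(size)
      && write(fd, payload.data(), payload.size()) == (ssize_t)payload.size();
}

// Receive one message: read the 4-byte length, then that many bytes into a
// memory buffer, and let Protobuf parse that buffer up to its end.
bool recv_framed(int fd, google::protobuf::Message* msg) {
  uint32_t size;
  if (!read_all(fd, reinterpret_cast<char*>(&size), sizeof(size)))
    return false;
  std::vector<char> buffer(ntohl(size));
  if (!read_all(fd, buffer.data(), buffer.size()))
    return false;
  return msg->ParseFromArray(buffer.data(), static_cast<int>(buffer.size()));
}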

Having to frame the Protobuf data myself actually drove me nuts. I can't understand how other people use that library. I guess they use ZeroMQ for the communication instead of good old sockets? Or maybe HTTP? Using HTTP to send wire-efficient data seems sick to me, but you never know: some people actually like JavaScript nowadays. I feel particularly angry at Google, who could just integrate that piece of code. The weather is definitely not rainy enough in Mountain View; they should move to Redmond instead. They already have the right mindset anyway.

But comments and feedback like the ones from Lucas make it more interesting to blog about my ongoing research. Thanks! Please drop me an email to comment on this.

Threads

So I got a working prototype of RSG, with the C++ and Java sides interacting nicely. I handed it with a large smile to the intern working at Octo. But when he tried to inject RSG into Cassandra (with AspectJ), it failed miserably. Cassandra starts, the injection works and RSG starts communicating with the server. But like every Java application, Cassandra starts a large bunch of threads, while RSG does not expect that.

When two threads send a request to the RSG server, their requests are queued and they both wait for an answer. But it may well happen that the first answer is received by the second thread, as they are both blocked on a recv() on the same socket. So we have to make the RSG socket thread-specific on the client side, open a new socket for each user thread that uses the RSG features requiring communication with the server, and start a new SimGrid thread on the server side for each incoming client thread.
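
On the client side, "thread-specific" would really just mean something like a thread-local descriptor. A minimal sketch, with connect_to_server() as an assumed helper rather than actual RSG code:

// Assumed helper that opens a fresh connection to the RSG server.
int connect_to_server();

// One RSG socket per user thread: the first RSG call made by a thread opens
// its own connection, and later calls from the same thread reuse it.
static thread_local int rsg_socket = -1;

int get_rsg_socket() {
  if (rsg_socket < 0)
    rsg_socket = connect_to_server();
  return rsg_socket;
}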

It does not seem very complex: accept() can be made non-blocking, so the server can regularly try to accept on its master socket and spawn a SimGrid thread when it gets a new connection. On the client side, making the socket thread-specific (as sketched above) seems very feasible too.
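
The server side of that plan could look like the following sketch (plain POSIX calls; the part spawning the SimGrid thread is just a placeholder, not actual RSG code):

#include <cerrno>
#include <cstdio>
#include <fcntl.h>       // fcntl, O_NONBLOCK
#include <sys/socket.h>  // accept

// Make the master socket non-blocking, so that accept() returns immediately
// with EAGAIN/EWOULDBLOCK when no client is waiting.
void make_nonblocking(int master_fd) {
  fcntl(master_fd, F_SETFL, fcntl(master_fd, F_GETFL, 0) | O_NONBLOCK);
}

// Called regularly by the RSG server: drain all pending connections and hand
// each one over to a freshly spawned per-client thread.
void poll_new_clients(int master_fd) {
  for (;;) {
    int client_fd = accept(master_fd, nullptr, nullptr);
    if (client_fd < 0) {
      if (errno != EAGAIN && errno != EWOULDBLOCK)
        perror("accept");  // real error; EAGAIN just means nobody is knocking
      return;
    }
    // TODO: spawn the SimGrid thread serving this client, passing it client_fd.
  }
}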

The main problem is that the summer is over. I cannot implement this right now, so I added many comments to the RSG code so as not to forget. I should have switched to "preparing lectures" mode a few weeks ago already, and cannot delay it any more. If I manage to do a status update in two weeks, I think it will be about PLM and other teaching-related topics rather than SimGrid.

We will see in time. Have a good one in the meantime.