This article is part of a series about my recent work on SimGrid:


I wanted to blog every two weeks, and a month has already gone by! Things did not exactly go as expected. I'm still laying the groundwork for SimGrid 4, the next major version. It will take some time, but I want to get it right, so that we can have 10 more years with this software.

I did some source code cleanups in August, but this month I went even deeper: I cleaned up the build infrastructure. Most notably, I spent an inordinate amount of time getting SimGrid to build on Windows. This was important to me because there is quite a lot of code for that platform (both in C and in CMake), and I could not change it without a working test environment. Code that you cannot change is code that cannot evolve, and it soon becomes dead code.

I'm there now: the Java bindings of SimGrid are usable on Windows. But that wasn't easy. Here is how I did it.

Travis on Mac OS X: Win

This month, Travis started to allow Mac OS X builds for free and open source projects. That worked rather nicely for me, so we now have fast portability tests on Travis (Ubuntu and Mac only, regular builds), and long, complete ones on ci.inria (Ubuntu, Debian, Fedora, FreeBSD, Mac; regular and MC builds; also testing make dist). Cool, it works.

Installing a Windows slave: Failure

My first try was to install a Windows node on http://ci.inria.fr. Even with the precious help of Gabriel, we did not manage to make it work. We gave MinGW a spin, in vain. Then we discovered that this project is long dead (without any notice on its webpage) and that you should use MinGW-w64, a still-maintained fork that targets both Win64 and Win32 (despite the name).

MinGW-w64 was also a FAIL: cmake runs, but the system seems to freeze when scanning the dependencies of the first compilation target. We suspected that something was wrong with the installation on the node, so I decided to move to another solution where I'm not the one in charge of installing the node.

MSVC on AppVeyor: EPIC FAILURE

AppVeyor is a nice continuous integration solution for Windows. And the best part: it's free for open source projects. You register on the site, write a little appveyor.yml file at the root of your project, push to GitHub, and you're set. The cloud tries to compile your project under Windows. Very cool.

But the default environment is MSVC, Microsoft Visual C++. At least 3 tests in SimGrid refused to compile with MSVC, but since I had failed to use MinGW on AppVeyor, I disabled them and proceeded anyway. And now I have very good arguments when I say that Windows and its main compiler are brain-dead systems.

System modularity and headers

The system headers are not self-contained. You cannot include only windef.h, for example, because it lacks some of the declarations it uses. As a result, you can only include the whole windows.h. Extra complication: you get all Windows symbols everywhere. I had to rename our _THROW because Microsoft uses that very name (SimGrid had no more right to pollute the namespace that way, but at least we are less widely used), and also send() in code that was not using the network at all. But I was forced to include the whole windows.h in that file too.
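A common way to limit that pollution, sketched here under my own assumptions rather than taken from SimGrid's sources, is to funnel every inclusion of windows.h through a single prelude that defines WIN32_LEAN_AND_MEAN and NOMINMAX first. WIN32_LEAN_AND_MEAN keeps windows.h from pulling in rarely used headers such as winsock.h (which declares send()), and NOMINMAX stops it from defining min and max as macros. The largest() helper is hypothetical, only there to show that std::max survives the inclusion.

```cpp
// Hypothetical prelude: include this instead of <windows.h> everywhere,
// so that all files see the same, trimmed-down set of Windows symbols.
#ifdef _WIN32
#  ifndef WIN32_LEAN_AND_MEAN
#    define WIN32_LEAN_AND_MEAN  // skip rarely used headers such as winsock.h
#  endif
#  ifndef NOMINMAX
#    define NOMINMAX             // do not define min() and max() as macros
#  endif
#  include <windows.h>
#endif

#include <algorithm>

// Without NOMINMAX, the max macro from <windows.h> would break this call.
int largest(int a, int b) { return std::max(a, b); }
```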

Macros and parser

There is a macro to detect whether you are in 64-bit mode: _WIN64. But the macro _WIN32 is always defined on Windows, regardless of whether you are compiling for 32 or 64 bits.
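Since _WIN32 is defined for both targets, the order of the checks matters: _WIN64 has to be tested first. A minimal sketch (the function name is mine):

```cpp
#include <string>

// _WIN32 is defined for both 32-bit and 64-bit Windows targets,
// so _WIN64 must be tested first to tell them apart.
std::string target_description() {
#if defined(_WIN64)
  return "Windows, 64-bit";
#elif defined(_WIN32)
  return "Windows, 32-bit";
#else
  return "not Windows";
#endif
}
```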

You can detect the version of MSVC that you are using with the _MSC_VER macro. For example, if its value is 1900, then you are using Microsoft Visual C++ 2015. Easy to guess.
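Since the numbers do not follow the product names, a lookup table is the only sane option. Here is a sketch mapping _MSC_VER values to Visual Studio releases; each threshold is the first value of a release (1910 for 2017, 1920 for 2019, 1930 for 2022), and the function name is hypothetical. Note that clang-cl defines _MSC_VER too, so this tells you the compatibility level rather than the actual compiler.

```cpp
#include <string>

// Map an _MSC_VER value to the Visual Studio release that introduced it.
std::string msvc_release(int msc_ver) {
  if (msc_ver >= 1930) return "Visual Studio 2022 or later";
  if (msc_ver >= 1920) return "Visual Studio 2019";
  if (msc_ver >= 1910) return "Visual Studio 2017";
  if (msc_ver >= 1900) return "Visual Studio 2015";
  if (msc_ver >= 1800) return "Visual Studio 2013";
  return "older than Visual Studio 2013";
}

// 0 when the translation unit is not compiled by an MSVC-compatible compiler.
#ifdef _MSC_VER
constexpr int current_msc_ver = _MSC_VER;
#else
constexpr int current_msc_ver = 0;
#endif
```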

MSVC accepts variadic macros, but __VA_ARGS__ is seen as a single macro argument when you pass your arguments from one macro to another. In the end, I manually expanded some of our macros so that they never pass __VA_ARGS__ to a helper macro, but directly do the work in the first macro that receives the arguments. Evil code duplication :(
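For the record, a widely used alternative to that duplication (not what I did above) is to wrap the forwarding call in an identity macro, which forces the traditional MSVC preprocessor to rescan the expansion and split the arguments on their commas. Newer MSVC releases also provide a standard-conforming preprocessor, enabled with /Zc:preprocessor. The macro names below are illustrative:

```cpp
// Identity macro: its only job is to make the preprocessor rescan its argument.
#define EXPAND(x) x

// Picks the 5th argument; the trailing 4, 3, 2, 1, 0 shift into place
// depending on how many arguments the caller passed (1 to 4 supported).
#define COUNT_ARGS_IMPL(a1, a2, a3, a4, N, ...) N

// Without EXPAND, the traditional MSVC preprocessor hands __VA_ARGS__ to
// COUNT_ARGS_IMPL as one single argument, so COUNT_ARGS(x, y, z) would give 1.
#define COUNT_ARGS(...) EXPAND(COUNT_ARGS_IMPL(__VA_ARGS__, 4, 3, 2, 1, 0))

static_assert(COUNT_ARGS(x) == 1, "one argument");
static_assert(COUNT_ARGS(x, y, z) == 3, "three arguments");
```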

In our C++ code, we comment out the names of the parameters that we do not use, as follows:

function(int /* ignored */)
function(char */* ignored */)

Both are valid C++, but MSVC spits out the following warning on the second one:

warning C4138: '*/' found outside of comment

So I had to add a space to work around that stupid MSVC parser.
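Here are a few spellings that keep the intent without tripping C4138; the function names are illustrative. The last one relies on the C++17 [[maybe_unused]] attribute, which avoids the comment trick altogether.

```cpp
// Leaving a space between '*' and the comment avoids MSVC warning C4138.
int ignore_int(int /* ignored */) { return 0; }
int ignore_ptr(char* /* ignored */) { return 1; }

// C++17: keep the parameter name and tell the compiler it may be unused.
int ignore_ptr17([[maybe_unused]] char* ignored) { return 2; }
```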

Standard compliance

Windows does not claim to be POSIX; we were warned. The funny part is that even when you use a POSIX function that is supported, such as strdup, you get the following warning. I like the way they try to convey the message that POSIX is deprecated :)

warning C4996: 'strdup': The POSIX name for this item is deprecated. Instead, use the ISO C and C++ conformant name: _strdup.
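One way out is to define _CRT_NONSTDC_NO_DEPRECATE before including any CRT header, which silences that family of warnings. Another is to sidestep the POSIX name entirely with a few ISO C calls, as in this hypothetical portable_strdup():

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical portable replacement for strdup(), built only on ISO C calls,
// so MSVC has nothing to deprecate. The caller must free() the result.
char* portable_strdup(const char* s) {
  std::size_t len = std::strlen(s) + 1;  // include the terminating '\0'
  char* copy = static_cast<char*>(std::malloc(len));
  if (copy != nullptr)
    std::memcpy(copy, s, len);
  return copy;
}
```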

Some functions have funny names (strcasecmp is called _stricmp), but others are completely missing, such as the ones related to path names (basename, getcwd), and I had to reimplement them. getcwd is not trivial to implement, as Windows happens to have a separate current directory per drive. That's a Windows feature, not a bug :)
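A minimal sketch of that kind of shim, assuming only the MSVC runtime lacks these functions (MinGW-w64 ships more POSIX headers). portable_basename() is a hypothetical helper that accepts both '/' and '\\' as separators; it does not treat drive letters such as C: specially, and it is not the code that went into SimGrid.

```cpp
#include <cstring>
#include <string>

#ifdef _MSC_VER
#  define strcasecmp _stricmp   // same semantics, Microsoft name
#else
#  include <strings.h>          // POSIX home of strcasecmp
#endif

// Hypothetical basename() replacement accepting both separators.
// Follows POSIX for the corner cases: "" gives ".", "/" gives "/",
// and trailing separators are ignored ("/usr/lib/" gives "lib").
std::string portable_basename(const std::string& path) {
  if (path.empty())
    return ".";
  auto is_sep = [](char c) { return c == '/' || c == '\\'; };
  std::size_t end = path.size();
  while (end > 0 && is_sep(path[end - 1]))
    --end;                       // drop trailing separators
  if (end == 0)
    return path.substr(0, 1);    // the path contained only separators
  std::size_t start = end;
  while (start > 0 && !is_sep(path[start - 1]))
    --start;                     // walk back to the previous separator
  return path.substr(start, end - start);
}
```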

Since unistd.h does not exist on Windows, you also have to clutter your sources with guards so that it is only included where it exists. This is particularly awkward for flex-generated files. Another solution would have been to write our own unistd.h for Windows. I did not do that (yet), but I like the idea.
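Flex itself can help here: %option nounistd (or defining YY_NO_UNISTD_H) keeps the generated scanner from including unistd.h. As for the own-unistd.h idea, a compat header could look like the following sketch; it is hypothetical, assumes that only isatty() and fileno() are needed, and relies on MinGW-w64 providing a real unistd.h.

```cpp
#include <cstdio>

// Hypothetical "compat_unistd.h": use the real header where it exists,
// and map the few functions we need to their MSVC names otherwise.
#if defined(_WIN32) && !defined(__MINGW32__)
#  include <io.h>
#  define isatty _isatty
#  define fileno _fileno
#else
#  include <unistd.h>
#endif

// Example client code, written once for all platforms.
bool stdout_is_terminal() { return isatty(fileno(stdout)) != 0; }
```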

The good point is that I had to convert some calls to __sync_fetch_and_add into std::atomic calls. That's a real improvement in code readability. Maybe the only one of this porting battle...
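The conversion looks like this on a hypothetical counter shared by several threads: the GCC builtin, which MSVC does not know, becomes a member call on a std::atomic, and the atomicity is now visible in the type instead of at each call site.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each thread increments a shared counter; no increment may be lost.
int count_with_threads(int threads, int increments_per_thread) {
  std::atomic<int> counter{0};
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t)
    pool.emplace_back([&counter, increments_per_thread] {
      for (int i = 0; i < increments_per_thread; ++i)
        counter.fetch_add(1);  // was: __sync_fetch_and_add(&counter, 1)
    });
  for (auto& th : pool)
    th.join();
  return counter.load();
}
```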

Dynamic linker madness

The worst part of Windows BY FAR is the dynamic linker. It is so sick that it is not safe to access an STL object that was created in another DLL. So sick! I cannot believe that they write "This behavior is by design" on that page!

In the end, I did not manage to get MSVC to link the SimGrid library. It complains about missing symbols, even though it had just compiled them in another file.
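A common defensive pattern for the STL-across-DLL problem (a sketch of mine, not SimGrid's actual API) is to keep STL objects on one side of the boundary and export only C-compatible functions that fill a caller-owned buffer. example_get_host_name() and the "host-0" value are hypothetical, and the export decoration (__declspec(dllexport) on Windows) is left out to keep the example in one file.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Inside the library: the std::string never leaves this translation unit.
static std::string host_name_impl() { return "host-0"; }

// Hypothetical exported function with a C ABI. It copies the name into a
// buffer owned by the caller, so no STL object crosses the DLL boundary.
// Returns the full length (without the '\0'), so callers can detect truncation.
extern "C" std::size_t example_get_host_name(char* buf, std::size_t size) {
  const std::string name = host_name_impl();
  if (buf != nullptr && size > 0) {
    std::size_t n = name.size() < size - 1 ? name.size() : size - 1;
    std::memcpy(buf, name.data(), n);
    buf[n] = '\0';
  }
  return name.size();
}
```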

MinGW-w64 on appveyor: Epic Win

And then I discovered MSYS2 in the AppVeyor update of 9/13. So I gave the free tools another try, and after some frustration, I got SimGrid and its Java bindings compiling with MinGW-w64 on AppVeyor.

Only to discover that our test suites did not work at all on Windows. SimGrid has its own TEsting SHell (called tesh) to ensure that the output of our 400+ integration tests (including the timings computed by our models) stays exactly the same over time. Tesh is written in Perl because it's presumably portable, but it was using many POSIX constructs, so I had to rewrite it with the IPC::Run module. The result is much cleaner, and I think that this is a win for SimGrid, somehow.

The tests failed even with my newly rewritten tesh, and the message was as informative as:

^CTerminate batch job (Y/N)?

I failed to fix the issue, which seems linked to the fact that AppVeyor keeps the stdin of my processes open, leading cmd.exe to think that it can freely talk to me this way. But still, I should not get any SIGINT. I tried calling tesh directly instead of through ctest, and the "Terminate batch job" errors are gone. So SimGrid works on Windows; only its CMake friends don't. I like this idea.

Finally, the last problem was a segfault in the portability layer over threads. The Java bindings actually create system threads and register them with the JVM using AttachCurrentThread(). That's much faster, but it requires a portability layer that uses Windows threads instead of pthreads when needed. It turns out that some months (or years?) ago, I did an optimization that moved some code from the common part to the pthread part, and I forgot to do the same in the Windows part.
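The lesson generalizes: in such a layer, anything both backends need should live in common code that each backend merely calls, so the two cannot drift apart. A minimal sketch of the idea with hypothetical names, where the only backend-specific parts are thread creation and joining:

```cpp
#ifdef _WIN32
#  include <windows.h>
#else
#  include <pthread.h>
#endif

using thread_body = void (*)(void*);

struct start_info {
  thread_body body;
  void* arg;
};

// Common part: the work every new thread does, written once for both backends.
static void run_start_info(void* p) {
  auto* info = static_cast<start_info*>(p);
  info->body(info->arg);
}

// Backend-specific trampolines only adapt the calling convention.
#ifdef _WIN32
static DWORD WINAPI trampoline(LPVOID p) { run_start_info(p); return 0; }
#else
static void* trampoline(void* p) { run_start_info(p); return nullptr; }
#endif

// Hypothetical helper: runs body(arg) in a new OS thread, waits for it,
// and returns false if the thread could not be created or joined.
bool run_in_os_thread(thread_body body, void* arg) {
  start_info info{body, arg};
#ifdef _WIN32
  HANDLE h = CreateThread(nullptr, 0, trampoline, &info, 0, nullptr);
  if (h == nullptr)
    return false;
  WaitForSingleObject(h, INFINITE);
  CloseHandle(h);
  return true;
#else
  pthread_t t;
  if (pthread_create(&t, nullptr, trampoline, &info) != 0)
    return false;
  return pthread_join(t, nullptr) == 0;
#endif
}
```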

This one was a beast. I think I would never have tracked it down without the help of Guillaume Turchini, who provided me with a nice backtrace of the issue. That's easier than working remotely with build daemons located somewhere in the cloud...

But in the end, here I am. SimGrid works on Windows again (at least its Java bindings). I can now release and proceed with the code cleanups without fear of breaking untested code.

That's useless, but it fills me with joy.