During a talk on SimGrid at Louvain last November, I was asked whether SimGrid was scalable enough to simulate the whole Internet. I suspect that the expected answer was "no", but instead I challenged my interlocutor back by asking for a description of the whole Internet. Indeed, even if SimGrid were able to simulate such a system, we do not have enough information to actually instantiate the simulation.
That is why I was very interested in this article, where the author explains how he built and ran a botnet of 150,000 machines to scan absolutely every IPv4 address of the Internet several times per hour, for months. The security "breach" he used is horrific, and the procedure is certainly on the borderline of ethics, but the data he collected are amazing.
He provides 68 million traceroute records, collected over three weeks last summer (18 GB of data once unpacked). That's incredible: this could help us build a representation of the graph of the Internet. Unfortunately, there is not enough information to build a really interesting representation of it: traceroute gives the sequence of hops along a path, but not the per-link bandwidth and latency that pathchar estimates. I wish I had pathchar traces instead, but anyway. It would still be fun to play with.
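To give an idea of what "a representation of the graph" means here, the following is a minimal sketch that turns traceroute paths into an adjacency map. It assumes, hypothetically, that each record has already been parsed into a list of hop addresses, with `None` for hops that did not answer; the actual format of the published data is not described here, and the function name `build_graph` is made up for illustration. Note that the result is pure topology: a SimGrid platform would still need a bandwidth and a latency for every link, which is exactly what the traceroute data lacks.

```python
from collections import defaultdict


def build_graph(paths):
    """Build an undirected adjacency map from traceroute hop sequences.

    Hypothetical input format: each path is a list of hop addresses
    (strings), with None for hops that did not answer ("* * *").
    An unanswered hop breaks the chain, since we cannot tell which
    routers it actually connects.
    """
    graph = defaultdict(set)
    for path in paths:
        # Consecutive hops on a path are assumed to share a link.
        for a, b in zip(path, path[1:]):
            if a is None or b is None or a == b:
                continue
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Usage with made-up documentation addresses:
paths = [
    ["10.0.0.1", "192.0.2.1", "198.51.100.7"],
    ["10.0.0.1", None, "198.51.100.7"],
    ["10.0.0.1", "192.0.2.1", "203.0.113.9"],
]
graph = build_graph(paths)
print(sorted(graph["192.0.2.1"]))
```

At 68 million records, an in-memory dictionary of sets like this one would likely need to be replaced by something more compact, but the principle stays the same.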
And now the initial question strikes back: is SimGrid scalable enough to simulate the whole Internet? Hmm, challenge considered...