--------------------------------------------------------------
--- For your convenience, this form can be processed by EasyChair
--- automatically. You can fill out this form offline and then
--- upload it to EasyChair. Several review forms can be uploaded
--- simultaneously. You can modify your reviews as many times as
--- you want.
--- When filling out the review form please mind
--- the following rules:
--- (1) Lines beginning with --- are comments. EasyChair will
---     ignore them. Do not start lines in your review with ---
---     as they will be ignored. You can add comments to the
---     review form or remove them
--- (2) Lines beginning with *** are used by EasyChair. Do not
---     remove or modify these lines or the review will become
---     unusable and will be rejected by EasyChair
--------------------------------------------------------------
*** REVIEW FORM ID: 762221::392010
*** SUBMISSION NUMBER: 7
*** TITLE: A Framework for distributing Agent-based simulations: D-MASON
*** AUTHORS: Gennaro Cordasco, Rosario De Chiara, Ada Mancuso, Dario Mazzeo, Vittorio Scarano and Carmine Spagnuolo
--------------------------------------------------------------
*** REVIEW:
--- Please provide a detailed review, including justification for
--- your scores. This review will be sent to the authors unless
--- the PC chairs decide not to do so. This field is required.

This paper presents a new tool called D-MASON, whose purpose is to run agent-based simulations in a distributed fashion.

Disclaimer: I am quite new to this kind of simulation, but I am fairly expert in high-performance simulation of discrete-event systems.

The paper is fairly good and presents an interesting piece of work. As with most tool papers, the article by itself is not sufficient, but together with the tool itself it should constitute a very decent contribution. It is a pity that the tool is not available for download right now (only its documentation seems to be).
This made me change my overall status from "accept" to "weak accept".

Strong points of the article
============================

In my view, it constitutes a good report on non-specialists of HPC developing software that targets high performance (sorry if I'm wrong, no offense intended ;). The literature on high-performance simulation is presented very decently. The crucial point of simulation (correctness and reproducibility before performance) is reflected better here than in most other workshop papers presenting a tool that I have been given to read and review in the past.

Weaknesses of the article and possible improvements
===================================================

The grammar could be improved (sorry to say so with my own broken English). For example, "performance" never takes an "s" in English, and "improvemente" sounds more Italian than English to me ;)

The discussion of [5] seems more than strange to me. I seem to understand that [5] is a framework allowing specialists from diverse communities to collaborate toward the establishment of simulations of human societies (I may be wrong, since I had never encountered it before). This seems quite unrelated to the work presented here. To put it another way, as far as I know, D-MASON is about the tactics of running each simulation in a distributed setting, while [5] is about macro-economic strategy, about allowing collaboration between people not used to it, also in a distributed setting. In any case, the authors state that their work is superior to [5], which requires heterogeneous resources to run. As an HPC specialist, I fear a profound misunderstanding on the authors' part, since homogeneity is just a special case of heterogeneity. This is even more astonishing when, two paragraphs below, the authors say that their target platform is heterogeneous by nature. I must have misunderstood something here; please improve the explanation.

The main issue in the presented solution is load balancing, I'd say.
The authors propose to split the simulated world into a grid and dispatch regions to workers. They also state that they have a horizontal world-dispatch algorithm, but it is not used in the evaluation. That is a pity, because the major issue of grid-like domain partitioning is that it requires a power-of-two number of hosts to work well, while the authors seem to have only 7 high-end machines at hand. With horizontal domain partitioning, that would have gone unnoticed, while here you struggle with your 4,4,4,4,3,3,3 distribution of the 25 regions over the 7 hosts. It certainly hurts your performance. Also, adding some level of dynamicity to horizontal dispatch seems far more feasible than with the grid one (although some fancy grid-dispatch algorithms exist for the heterogeneous case; there must be something like this in Parallel Algorithms, Henri Casanova, Arnaud Legrand and Yves Robert, Chapman and Hall/CRC Press, 2008, for example).

I don't see any performance hit that may be due to this partitioning in your case. Usually (i.e., for matrix multiplication and similar algorithms), 1D splits are bad because the communication costs vary with the perimeter of the regions while the computation varies with their surface; hence 1D induces larger perimeters for the same surfaces. In your case, communication does not seem to be a performance cruncher. It is badly characterized in the article, and the evaluation is only very vague about it: "communication costs increase proportionally with the amount of regions". Not a word on whether this increase is linear, polynomial, or whatever. I think you would get better communication performance by greatly increasing the number of JMS channels you are using, and having each neighboring worker subscribe only to the sub-region corresponding to the frontier with its own region. I find it strange that communication has as little impact on your performance as you say, and I'd like to see more precise figures about it.
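To make the load-balancing and perimeter-vs-surface arguments above concrete, here is a small back-of-the-envelope sketch (my own illustration, not code from the paper; the world size w = 1000 is an arbitrary assumption): it computes the imbalance of 25 regions over 7 hosts, and compares the ghost-zone (frontier) cells an inner worker exchanges per step under a 1D strip split versus a square 2D grid split.

```python
import math

# Load imbalance: 25 equal regions over 7 hosts gives 4,4,4,4,3,3,3.
regions, hosts = 25, 7
per_host = [regions // hosts + (1 if i < regions % hosts else 0)
            for i in range(hosts)]
imbalance = max(per_host) / (regions / hosts)
print(per_host)             # [4, 4, 4, 4, 3, 3, 3]
print(round(imbalance, 3))  # 1.12 -> the 4-region hosts set the pace

# Communication vs computation: for a w x w world cut into p parts,
# the ghost zone exchanged per step grows with a region's perimeter,
# while its computation grows with the region's surface.
def ghost_cells_1d(w, p):
    # horizontal strips: an inner strip shares two w-wide frontiers
    return 2 * w

def ghost_cells_2d(w, p):
    # sqrt(p) x sqrt(p) grid: an inner square shares four frontiers
    side = w / math.isqrt(p)
    return 4 * side

w, p = 1000, 25
print(ghost_cells_1d(w, p))  # 2000 cells per step
print(ghost_cells_2d(w, p))  # 800.0 -- smaller perimeter per worker
```

The 2D split wins on perimeter per worker, which is why it matters for matrix-like workloads; but if communication is not the bottleneck here, as the authors claim, the easier-to-balance 1D split may be the better trade-off.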
"Our tests report that the load on the communication server was always reasonably low" does not quite convince me here...

Another strange point concerning performance is the proposed approach to proximity screening. The article says that "the most simple (sic) way of do (sic, again) this filtering consists in a O(n2) filtering". An n^2 algorithm when n = 10^9 does not look quite doable. It is good that simple algorithms break this down greatly (like splitting the world over a grid and considering only the elements in the same cell and the adjacent cells as potential neighbors -- the smaller the cells, the fewer comparisons you have to do, but the more dynamic your data structure becomes). We both know it, but stating so in the article would be better. Maybe something like "Skip List Data Structures for Multidimensional Data" may help too, but your case seems too dynamic for such data structures.

I don't see the point of using a P2P algorithm instead of JMS when the maximal number of hosts you use is 7. Even at a few dozen, I would still think the same, and I suspect you will encounter other issues when trying to pass the 100-host limit. I'd remove the reference to P2P DHTs altogether at this point.

The evaluation process is not detailed enough in my view, but that may be due to my acquaintance with the systems community. Did you run your experiments several times, remove outliers, and take averages or something? Are you sure that your hosts did not swap during the experiments? The last point of MASON in Fig. 5 seems suspect to me. I would like to see an evaluation of the multithreaded aspects too (what is the breakdown of computation and synchronization times within a step, for example). Another missing evaluation is the impact of the system on the responsiveness of the harnessed desktops; that would be fully within the scope of the target use case presented in the introduction. Of course, an evaluation with more than 7 hosts would be more than welcome.
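For the record, the cell-based filtering I allude to above looks roughly like this (my own sketch, not the authors' code; the agent coordinates are made-up test data): bucketing agents into grid cells whose side equals the interaction radius means each agent only needs to be checked against the 3x3 block of neighboring cells, instead of all n-1 others.

```python
from collections import defaultdict
from itertools import product

def neighbors_within(agents, radius):
    """Return index pairs of agents closer than `radius`, using a
    uniform grid of cell size `radius` instead of an O(n^2) scan."""
    cells = defaultdict(list)
    for i, (x, y) in enumerate(agents):
        cells[(int(x // radius), int(y // radius))].append(i)
    pairs = set()
    for (cx, cy), members in cells.items():
        # two agents within `radius` are at most one cell apart
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in cells.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j:
                        xi, yi = agents[i]
                        xj, yj = agents[j]
                        if (xi - xj) ** 2 + (yi - yj) ** 2 < radius ** 2:
                            pairs.add((i, j))
    return pairs

agents = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (3.4, 3.0)]
print(sorted(neighbors_within(agents, 1.0)))  # [(0, 1), (2, 3)]
```

The trade-off I mention in the text is visible here: shrinking the cell size reduces the number of candidate pairs per cell, but the `cells` index must be rebuilt (or updated) every step as agents move.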
256 machines seems like a bare minimum to me, but I may be biased ;)

Points I'd like to see developed in the presentation (if any)
=============================================================

You seem to attach very great importance to ease of use (even passing specific options to the JVM to increase the heap size seems too much for your targeted users). I would thus like to see some material demonstrating the ease of use of your solution. Maybe a short screencast of a demo, if time permits? At least a couple of screenshots would be welcome here.

Of course, as an in-depth-evaluation freak, I would like to see lots of curves explaining where your time goes exactly and why it is too near to optimal to be further improved. But with "only" 10^9 agents and 12 seconds per step on a decent machine, I'd say that there is still a lot to improve on the technical side, so that is no strong request from me here.

--------------------------------------------------------------
--- In the evaluations below, uncomment the line with your
--- evaluation or confidence. You can also remove the
--- irrelevant lines
*** OVERALL EVALUATION:
--- 3 (strong accept)
--- 2 (accept)
1 (weak accept)
--- 0 (borderline paper)
--- -1 (weak reject)
--- -2 (reject)
--- -3 (strong reject)
*** REVIEWER'S CONFIDENCE:
--- 4 (expert)
3 (high)
--- 2 (medium)
--- 1 (low)
--- 0 (null)
*** END
--------------------------------------------------------------