This post is about an ongoing research effort that we are conducting on the PLM with Alexandre and Gerald. The starting point is that we have a very respectable amount of student submissions. According to the numbers that I extracted for our presentation last week at ITiCSE, we had 133,636 code submissions as of July 7th: 17,480 successes, 42,693 compilation errors, and 73,463 failures.
We are interested in these failures, i.e., submissions that compile correctly but do not produce the right solution. We would like to see whether some errors are frequent enough to deserve a specific remediation text. If so, it would be awesome if the PLM could provide feedback such as "well, it looks like the stop condition of your loop could be improved, as you did one step too many".
Technically, the idea would be to attach some CommonErrorEntity to the exercises. Once the student code is executed, the resulting world will be compared to the solution (to check whether it is correct) but also to those common errors. If it matches one of them, the error message will be the one associated with that common error rather than the vanilla "oops, you could do better". This could be evaluated through the students' feedback on that message.
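To make the idea concrete, here is a minimal sketch of that matching logic. It assumes a World that can be compared by value; the ExerciseChecker class, the field names and the messages are made up for illustration, they are not the actual PLM API.

```java
import java.util.List;
import java.util.Objects;

// Stand-in for the PLM world, reduced to its serialized state for this sketch.
class World {
    private final String state;
    World(String state) { this.state = state; }
    @Override public boolean equals(Object o) {
        return o instanceof World && Objects.equals(state, ((World) o).state);
    }
    @Override public int hashCode() { return Objects.hashCode(state); }
}

// A common error: the world that the typical mistake produces, plus its remediation text.
class CommonErrorEntity {
    final World erroneousWorld;
    final String remediationText;
    CommonErrorEntity(World w, String text) { erroneousWorld = w; remediationText = text; }
}

class ExerciseChecker {
    /** Feedback for the world produced by the student code. */
    static String checkResult(World student, World objective, List<CommonErrorEntity> commonErrors) {
        if (student.equals(objective))
            return "Congratulations, your world matches the objective!";
        for (CommonErrorEntity err : commonErrors)
            if (student.equals(err.erroneousWorld))
                return err.remediationText; // tailored message for this known mistake
        return "Oops, you could do better."; // vanilla fallback
    }
}
```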
First, we had to mine the data to see whether this idea is feasible and, if so, to identify the exercises that actually need such a remediation mechanism.
Rerunning the student code was already quite a challenge, as the students naturally made every possible mistake in their code (including calling System.exit(), which stops the replayer...), so we had to use ugly tricks. We ended up executing the exercises in a random order each time, with an on-disk cache plus a chaos monkey forcefully rebooting the tool every other minute. That's rather brutal, but it got us around the many cases where our tool hangs or breaks. After 2 days of batch processing, we managed to squeeze out 43,000 executions (out of the 73,000 in the trace).
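For the curious, the replay loop boils down to something like the sketch below. It is a simplification, not our actual scripts: the replayer.jar name and the cache layout are assumptions, and the two-minute per-run budget stands in for the every-other-minute chaos monkey. Spawning one fresh JVM per submission is also just one way to survive a System.exit() in student code.

```java
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

public class ReplayBatch {
    public static void main(String[] args) throws Exception {
        Files.createDirectories(Paths.get("cache"));

        List<Path> submissions = new ArrayList<>();
        try (Stream<Path> all = Files.list(Paths.get("submissions"))) {
            all.forEach(submissions::add);
        }
        Collections.shuffle(submissions); // random order: a crash only loses the current item

        for (Path submission : submissions) {
            Path cached = Paths.get("cache", submission.getFileName() + ".result");
            if (Files.exists(cached))
                continue; // already replayed during a previous run

            // One fresh JVM per submission, so a System.exit() cannot kill the whole batch.
            Process p = new ProcessBuilder("java", "-jar", "replayer.jar", submission.toString())
                    .redirectErrorStream(true)
                    .redirectOutput(cached.toFile())
                    .start();
            if (!p.waitFor(120, TimeUnit.SECONDS)) { // chaos-monkey-style hard limit
                p.destroyForcibly();                  // kill the hung replayer and move on
                Files.deleteIfExists(cached);         // do not keep a partial result
            }
        }
    }
}
```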
Getting the other failing submissions to replay would require improving our replay infrastructure. But we are not inclined to do so, because another student is currently working on a distributed infrastructure for the execution of PLM exercises, with proper message queues and dockerized judges. We will rely on that infrastructure when possible; for now, we already have enough data to play with.
The big news is that those 43,000 failures seem to contain only 8,600 distinct errors. In other words, those 43,000 failures raise only 8,600 different error messages from the PLM (one given message -- for the Moria exercise -- appears 2,600 times). Since the provided error message contains a diff between the world achieved by the student and the objective world, this probably means that our students only managed to create 8,600 differing errors over our whole exercise base.
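Counting those clusters is essentially a group-by on the error messages (which embed the world diff). A possible sketch, reusing the cache layout assumed in the replay sketch above:

```java
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

public class ClusterErrors {
    public static void main(String[] args) throws Exception {
        Map<String, Long> clusters;
        try (Stream<Path> results = Files.list(Paths.get("cache"))) {
            clusters = results.map(ClusterErrors::readQuietly)
                              .collect(Collectors.groupingBy(msg -> msg, Collectors.counting()));
        }
        System.out.println(clusters.size() + " distinct error messages");
        clusters.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(20) // the biggest clusters are the best remediation candidates
                .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }

    private static String readQuietly(Path p) {
        try { return Files.readString(p); }
        catch (Exception e) { return "<unreadable>"; }
    }
}
```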
So YES, there are errors that happen very frequently, and YES, it seems possible to provide an automatic remediation text for some of them. YUHU, life is good and I love science.
Now, we have to dig into each of these clusters to investigate whether we can (1) understand the error from a pedagogical point of view, and (2) provide an adapted error message that is at the same time positive/cheering, informative about the problem, and not spoiling the exercise too much. And (3), we need to check that all errors in a cluster really correspond to the same mistake: it may happen that two distinct mistakes lead to the same final situation, and in that case we don't want to show the remediation for one mistake to the students making the other one. To differentiate them, we would need to add more test cases to the exercises so that each group of errors leads to a separate error message, as in the sketch below.
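For point (3), one possible (and so far purely hypothetical) tooling direction would be: take the submissions of one cluster and a candidate extra test world, and check whether that world actually makes the different mistakes diverge. The runOn() call below is a placeholder for "replay this submission on that initial world and return the resulting message"; it is not an existing PLM function.

```java
import java.util.*;

public class ClusterSplitter {
    static class World { /* stand-in for a PLM initial world */ }

    interface Submission {
        String runOn(World initialWorld); // placeholder: replay this code on that world
    }

    /** True if the candidate world makes at least two submissions of the cluster diverge. */
    static boolean splitsCluster(List<Submission> cluster, World candidateWorld) {
        Set<String> outcomes = new HashSet<>();
        for (Submission s : cluster)
            outcomes.add(s.runOn(candidateWorld));
        return outcomes.size() > 1; // several outcomes: the cluster mixed distinct mistakes
    }
}
```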
Tools such as OverCode would be really helpful here, but we don't have them at hand, so we dig manually. That should not be a problem for (1) and (2), but (3) will probably be harder. We will see: that's our next step, and we could maybe develop our own tooling once the dockerized judges are working.
The sure thing is that we have to hurry up: with the PLM, we can only collect data once a year, during the first week of September when all the new students arrive at Telecom Nancy. That's so little time!