You are welcome to read (and contribute ideas to) this page even if you are not one of my students, of course. I will try to keep it short and efficient, mainly targeting my students, even though I'd love it to be informative enough to help others.
If you work with me, you were probably asked to address a given problem in a scientifically sound and original manner. Here are the big lines of your work during the next few months:
- Understand the problem I gave you and why it's interesting
- Read the bibliography to see what other solutions were proposed
- Contribute a (set of) new solution(s)
- Evaluate your solutions on some use cases
- Comment on your contributions and what could be improved by the next person working on that problem (possibly yourself)
This is not the only workflow in science, but this is by far the most common one in Computer Science. You may be asked to only review the literature and compare the previous findings on a topic.
In Natural Sciences (Physics, Biology, etc.), you will often be asked to conduct an experiment under a specific protocol, and discuss the results you got and/or the protocol's qualities. In Maths, you will be given a set of axioms, and you will have to prove one or several theorems (under some hypotheses). This is because you do not often contribute anything but findings in Natural Sciences (birds don't need you to breed), and you don't often run experiments in maths, while you need to do both in most Computer Science studies.
This page aims at explaining the methodological tools that I expect from you, in a very practical manner. Some elements of this approach can be discussed if you feel uncomfortable with my proposal. The ultimate goal is to do things right, and to do them efficiently. For that, your reporting is absolutely crucial.
Reporting
I ask you to write a little reporting regularly. Depending on the situation, it may be every day, every week or every month. In any case, your reporting is very important for the following reasons:
- It forces you to think about what you are doing, which may help you solve your problem on your own. Writing down the problems in a clear way is often sufficient to see the solution appear.
- It helps me follow your progress even between the meetings. I cannot unblock you if I don't detect that you are on a wrong lead or otherwise blocked.
- It keeps track of the steps in your work. That's good for the day when you want to write your final report (even if a final report should never be presented in chronological order). That's good for the next person after you, who will be supposed to continue your effort or to build upon it.
That person may be yourself (if you go for a PhD program), another intern, myself or even someone else on the Internet: that's what we call Open Science, an effort where everyone can build upon the scientific work of everyone.
I want you to write your reporting in an org file (yep, you don't have a choice here). These are plain text files following a very simple syntax called Org-mode. The best way to edit them is to use the emacs editor, even if you can also do a lot in vim (I dunno how -- link welcome).
Learning Org is not very difficult. Here is a ref card.
Learning emacs
Emacs is a really nice editor, but it comes with two main issues. The first one is its learning curve, which is definitely not nice. That's because emacs is one of the oldest programs out there: it was invented before the conventions that we use in most other programs. So the first thing to do is to copy the following in your ~/.emacs file, so that you can use Ctrl-C and friends to copy, cut and paste.
(cua-mode t)
Once you load your file in emacs, you know that everything works because it displays "(org)" in the footer line, as follows (the word "org" is manually highlighted in the screenshot below). If that word does not appear, try typing M-x org-mode (check the cheat sheet below to discover what M-x means in emacs). If it does not work, check your installation of emacs and org-mode and try again. If everything fails, then speak with me.
Another issue with emacs is that everything is configurable. This is good, but it sometimes drives me crazy when I feel that I should configure everything myself. This is why I'm using spacemacs, a default configuration for emacs. I use it myself and I think it's particularly valuable for new users. If you prefer vi to emacs (VI VI VI, the editor of the beast), spacemacs is definitely made for you.
There is much more to say about configuring emacs and org-mode. Read the doc, have a look at this page (Arnaud's org-mode configuration), and print a cheat sheet about emacs in general and orgmode in particular. Here is one for emacs:
Learning git
You will need to use git to store your files. Here are some links for beginners in git.
- http://swcarpentry.github.io/git-novice/01-backup.html
- http://betterexplained.com/articles/aha-moments-when-learning-git/
- http://git-lectures.github.io/
- https://try.github.io/
- http://pcottle.github.io/learnGitBranching/
Reporting logistics
Once you're set up with all software installed and somehow configured, you need to create a reporting file in a place where I can see it and where it won't get lost if your disk crashes or something. Open a dedicated git repository (on github or gitorious) for that. After your internship, your report should be archived directly in the source tree of the software that you are working on, if any. But having your reporting located in the source tree may complicate things during your work.
Yes, it means that your file will be public at some point, but that's why we call it "Open Science", after all. Also, you should write it in English if possible. The part of your reporting that is called "Journal" (see below) may be written in French if you are more efficient this way, but the rest must be in English. Don't make your tone overly formal just because the file is public. Make it efficient. Nobody will ever blame you for the work you did during an internship a long time ago. If you really want, we can even make this file anonymous. Just speak to me.
You want to write your reporting before leaving work. Weekly reporting should be written on Friday, one or two hours before leaving. That's the best way to have a nice weekend without thinking about work, and still lose no information that you would need on Monday morning.
Reporting Document Organization
Your reporting document should have four main parts:
- Findings: This section summarizes the general information that you gathered during your work. It is empty at the beginning of your internship, and gets fleshed out with the important things that you find on your way. That's where bibliographical information goes, for example. But that's definitely not where TODO notes go (see below).
- Development: This section presents the technical sides of your work. Don't write anything in there yet. Put it all in the Journal part for now.
- Journal: Describe the day-to-day work done for each period (day, week or month) of your internship. That's the most important part of your reporting, and we come back to it below.
- Conclusion: That's what you write in the last week of your internship. You can see it as a letter to the next guy, explaining the current state of your work, a few words about its technical organization, and what should be done next on that topic. Keep this part highly technical; the overall organization of your internship will be seen in your final report.
The Journal part is the only part that you may write in French if needed. You want to add one subsection per period to your journal. Don't make it too long, or you would waste time writing long texts that very few will ever read. Don't make it too short or it will be impossible to understand it on Monday morning (or three months later). Finding the right balance is sometimes difficult, but I will provide feedback on your first entries, so don't worry.
Each section describing a period should contain three subsubsections:
- Things done: 2 to 4 items with a few words describing what you've done. You can omit the title of that section and put the items directly in the upper section (see the example below).
- Blocking points and questions: try to explain clearly the things that block you or slow you down. If you found the solution already, then it should be part of the previous subsection (but you should say a few words nevertheless). Also ask every question that you may have for me in that section. If the questions are personal (e.g., about the logistics of your internship such as salary or so), please prefer emails that are not publicly visible. If this section is empty for a given period, skip it altogether (no empty subsubsections).
- Planned work: A few items about what you plan to work on during the next period.
A template of reporting file is given at the end of this section. This is just strong advice: if you really feel better with another file organization, then give it a try for one period, and ask for my feedback. I can adapt, and I do not pretend that my advice is the definitive answer. It's just the result of my experience so far.
Notice how TODO items are written: they are given as items in the Planned work sections of the journal. As explained in the documentation, you simply have to write "[ ]" in front of items that you plan to do in the future.
You should add a [1/] on the "Planned work" line, so that emacs keeps track of what is done and what is still to do. Once they are done, you type C-c C-c on their lines to change the blank box [ ] into a checked box [X]. Also, the [1/] will be changed to denote the amount of work that is still to be done.
At any point, you can see all ongoing TODO items with the following keystrokes: "C-c / t". More information on TODOs in orgmode's documentation. The important thing here is that most TODO items must only be written in the /Journal/ part (so that we know when they occurred).
Do not edit past entries of your journal, unless you have very good reasons. If you must, make sure that you don't lose information about the path that you took (remember the Open Science thingy). Rather than rewriting past entries, add information to them, such as:
- *edit* This hypothesis does not hold; see the entry of [the day where you found it] for more information.
The only exception is TODO entries, which should clearly be rewritten to DONE entries. If you need to adapt your TODO entry (because the initial goal was poorly stated or otherwise), change the initial entry from TODO to CANCELED (or check the box after stating in a subitem that it was not done but canceled, and why), and create a new TODO entry in the current period section.
* Introduction
This file contains the reporting for my beloved internship done on
this topic on that year. For now, just add the official title of
your internship (check the convention signed between your
university and my lab). After a few weeks, once you really
understand your internship, you should write a few paragraphs about
the context, problem and motivation of your work, with some
possible use cases. But don't do that right now.
* Bibliography
* Journal
** Week 2 feb
- read the doc about writing my reporting
*** Questions
- do I really have to use emacs?
*** Work Planned [1/2]
- [X] install emacs and setup orgmode
- [ ] read the provided articles
** Week 9 feb
- Installed emacs
(omit the Questions section if no question)
*** Work Planned
- do some useful work
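The editing rules given before this template (append an *edit* note instead of rewriting, and cancel a poorly stated item instead of silently changing it) might look as follows in practice. This is only a sketch: the weeks, the hypothesis and the task names are invented for illustration.

```org
** Week 16 feb
- Measured the first use case; it seems to confirm my hypothesis
- *edit* This hypothesis does not hold; see the entry of Week 2 mar
  for more information.
*** Work Planned [2/2]
- [X] rerun the measurements on a second machine
- [X] port the prototype to the new API
  - not done but canceled: that API is being removed; replaced by
    the planned item of Week 23 feb
** Week 23 feb
- Reran the measurements on a second machine
*** Work Planned [0/1]
- [ ] rewrite the prototype against the old API
```

The past entry keeps its original wording, so the path that was taken remains readable, while the new item lives in the period where it was actually decided.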
Reading the Bibliography
That's the very first step when doing a scientific work: you need to understand how previous people tackled your problem (and similar problems) to build upon their work and propose an original solution. Fail to read the literature, and you will probably reinvent a (square) wheel. But if you read too much literature, you won't be able to propose any decent solution yourself in the allocated time frame...
If you don't pay attention, you will drown in the scientific literature, which is incredibly dense and flourishing. Even for very narrow fields, dozens of new related articles have been appearing every month for years. First of all, learn to read scientific articles (and to not read irrelevant literature) from this very good memento: Efficient Reading of Papers in Science and Technology, Michael J. Hanson, Dylan J. McNamee (pdf). Print it, read it, keep it, read it again. And again.
The scientific literature is a web of knowledge: any article refers to other related articles, but no article is self-contained. When you start exploring a field, PhD theses are a blessing as they often present the classical background that you will find nowhere else. For every interesting article that you find, you want to explore its neighborhood within the web of knowledge. You want to check the articles that your nice finding refers to. Even better, you want to explore the articles citing the paper that you find interesting. Going backward in time is easy: you type the title+author in a search engine, and hope that the article is available online. To go forward in time, http://scholar.google.com provides an intuitive interface to the whole web of knowledge.
Most of the time, you will hit the paywalls of the scientific publishers. They provide the abstract online, but you have to pay to read the article. That's a pity, and these guys are definitely a pain in the ... scientific world. But that's how things go for now, so when you hit a paywall, drop me an email. My institution pays these vampires and I can retrieve the articles that you need. At least for most of the publications in computer science.
Of course, you want to alternate backward and forward explorations of the literature for each paper that is interesting enough to continue exploring. But remember: you don't want to read it all. You must be selective about what you read. Refer again to the memento above before drowning in that huge amount of information.
For each paper that-you-do-not-throw-away-after-checking-the-abstract-because-it's-not-related-to-your-problem, try to understand which problem is addressed and how it relates to your own problem. Summarize the attempted approach, its advantages and drawbacks compared to the other methods you found in the literature so far, and discuss how these methods could be applicable to your specific case.
Actually, reading the bibliography is a never-ending task. These days, I ask my students to read and present one new paper per meeting. Of course, I don't expect in-depth reading here, but I want to open their eyes to their scientific field. Here are the practical steps to take for that:
- Find a bunch of papers to consider:
  - Check on interesting conferences (they contain the papers you like)
  - Check on interesting authors (they wrote the papers you like)
  - Check the references of papers you like
  - Use google scholar to find papers citing papers you like
- Select a paper to read:
  - Read a lot of titles, and discard most of them
  - Read the remaining abstracts, and discard some of them
  - Read the remaining conclusions, and pick something looking interesting
- Read a (few) paper(s). Don't go too deep, you don't have the time for that. Just answer these questions in a few words.
  - Who they are, what they've done, who they worked with.
  - Promise of the paper: what they try to achieve, and how it relates to our work.
  - Method of the paper: how do they try to tackle their problem? What are their good ideas, nice tricks and grunt work?
  - Achievement: what do they really have at the end of the day? How does it relate to their promise, to the achievements of other methods and, more importantly, to our work?
  - Lessons learned: how does this reading contribute to our work? New idea, new potential research lead, new body of literature to explore/read.
- Don't try to exhaustively read the bibliography!! (main trap)
  - There is really too much scientific literature, you cannot read it all. Choose!
  - Do not forget your initial interest, your current work. Choose wisely!
  - You are reading too much to remember it all. Take notes! And organize your reporting wisely.
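The questions listed above can be turned into a fixed note skeleton, so that every paper you read is summarized the same way. The sketch below is one possible layout, not a required one: the heading format and the placeholders in angle brackets are assumptions made for the example.

```org
** <Title of the paper> (<authors>, <venue>, <year>)
- Who: who the authors are, what they've done, who they worked with
- Promise: what they try to achieve, and how it relates to our work
- Method: how they tackle their problem; good ideas, nice tricks,
  grunt work
- Achievement: what they really have at the end of the day, compared
  to their promise, to other methods and to our work
- Lessons learned: new idea, new research lead, new body of
  literature to explore
```

Keeping the five items short (a few words each) is what makes such notes cheap to write and easy to scan months later.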
So. What is it that you want to study?
The goal of reading the bibliography is to get a better understanding of the "State of the Art" in your field: what people know, and what remains to be found or at least explicitly expressed. While reading, you should mature the question to which you want to contribute an answer. You probably need some guidance to find a promising question. That's fine: your advisor will love refining your ideas.
Once you understand the scope of your work, write it down quickly before getting distracted toward other more appealing questions! Finding ideas is just like dreaming: mandatory but not enough. Our job is to improve contributions up to the point where other people want to learn and read about it. Pick that nice idea that you had, and keep on that track for a while without getting distracted.
It is now time to split the Findings section of your reporting into two parts:
- First, the Introduction section should present, motivate and detail the problem that you are tackling. Having a clear and well defined scope is important to remain productive and have a pleasant journey. Narrowing or modifying your scope may be helpful to rule out other solutions of the literature upon which you cannot build (or that you cannot beat).
- Then, the Related work section should summarize and discuss what you've learned about the previous attempts to solve that problem (or similar ones).
Writing is much easier if you imagine the ideal reader with whom you want to communicate. Once you've identified who will read your text, you can imagine their motivations and write to fulfill them.
You should write for the Next Guy, i.e., the next intern or scientist working on that topic. Be clear and informative so that s/he gets your point. Be concise to not waste her/his time. Remember how those long and boring papers put you to sleep, how you battled with those terse and impenetrable papers, how you hated summarizing unclear and inconclusive papers, how you felt frustrated reading these unconvincing arguments. And try to not be one of them, out of decency for your reader.
Also, you want to stay formal so as not to pollute your message with humor or other emotions. Humor is highly cultural while science tries to be timeless, so they rarely fit together. Don't do it.
You don't need to write a real text for now. An itemized list with a clearly organized narrative flow is much better, at least until the very end of your internship, during the wrap-up phase.
Building your Contribution
When your understanding of the State of the Art improves, some drawbacks in previous approaches will become clear to you, as well as the possible improvements. Some brand new solutions may also occur to you. All these ideas will eventually become your own contribution to the field.
Before trying to implement and evaluate these ideas, you want to clarify them. Add a new section in your Findings section: the Contribution section in which you present your set of new solutions.
The common pitfall is to make this presentation too technical. For me, the difference between scientific findings (that should be written) and technical results (that should remain in the source code only) is that the former last at least 10 years. Focus your presentation on elements that will remain true when your technical settings become obsolete. For example, present only the parts of your work that will remain useful even when Linux is made obsolete by a new and radically different operating system.
Technical details should first be written as comments in the source code, which should be committed to a specific branch in a public repository. You may also want to write them down in the Development section of your file. Be careful, most interns become overly verbose when describing their technical contribution (and waste their precious time doing so). You just want to give the big lines in your file; the code should speak for itself.
The trick is to get your Findings/Contributions section to be both generic enough to remain useful when the technical details pass, AND specific enough to allow the Next Guy to build upon your solution. That's a tough balance to achieve. Target the level of detail that you got in the papers that you read. Clarify the big lines of your contribution; say a word about the alternative designs and justify why you chose this one. Omit the technical details of your contribution: interested readers will turn to your source code, and to the Development section of your file.
That Contribution section must be drafted before writing the first line of code (don't be a mad coder). An itemized list of the big ideas often suffices. This section must then be adapted while you code. At the end, the scientific description must match your actual technical contribution, of course.
Evaluating your Contributions
The idea is simple: you wrote some contribution, you now devise a quality metric adapted to your case, you come up with representative use cases and you measure the values that your contribution reaches for this metric on the use cases. If possible, you should strive to compare your results to the ones of previously existing solutions (possibly after reimplementing these other solutions).
But the devil is in the details. This is particularly true when you study distributed computer systems as I do. If you are not furiously meticulous, things that used to work will stop working. Things that once failed will start working for no good reason. The solution should be clear to you: you want to write everything down.
Running the Experiments
Add a Data Provenance subsection (under Development), and write down all steps of your experiments. Use the babel part of org-mode to include everything directly in your file. Add the scripts installing the needed software on the host machine. Add the little source codes that exercise your contribution under the selected use cases. Add your platform files (or the scripts that generate them).
When done correctly, you can even have org-mode reexecuting these scripts and code blocks automatically for you when you press "Ctrl-C Ctrl-C" on them. It can even run these blocks remotely from your own emacs on your own laptop to a remote Grid'5000 node!
The configuration is a bit tricky, however. Here is a working example: https://raw.githubusercontent.com/mquinson/simgrid-simpar/master/report.org This file does not exactly follow the organization described on the current page: the journal is in a separate file, and the Findings and Development sections are intermixed (the Development sections are just marked as :noexport: in the document). But you should still be able to extract the information that you need.
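Independently of the working example linked above, a Data Provenance subsection could be sketched as below. Everything specific here is a placeholder: the login, the host name `frontend.example.org`, the `~/experiments/` directory and the `run-usecase.sh` script are hypothetical. The sketch assumes that shell blocks have been enabled in your emacs configuration through `org-babel-do-load-languages` (the language is registered as `shell` in recent Org versions and as `sh` in older ones), and that the remote path given to `:dir` is reachable through TRAMP.

```org
** Data Provenance
*** Software environment
Record what the experiment ran on, next to the results.
#+begin_src sh :results output :exports both
uname -a
git rev-parse HEAD
#+end_src

*** Running one use case on a remote node
The :dir header makes the block run in that (here remote) directory.
#+begin_src sh :results output :exports both :dir /ssh:mylogin@frontend.example.org:~/experiments/
./run-usecase.sh > results.csv
wc -l results.csv
#+end_src
```

Because the commands live in the file itself, rerunning an experiment after a change amounts to re-executing the blocks, and the recorded output stays next to the commands that produced it.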
Evaluating the Experimental Results
Experiments in modern science tend to generate large amounts of data. You then need to be careful when analyzing this data to draw a conclusion. The best tool for that analysis is clearly the R language.
Add a Data Analysis subsection (under Development), and write down all steps of your analysis. As for the data provenance, you want to write all R code in blocks of your file, so that data can be re-analyzed easily when new data is produced.
Learning R is not trivial, and I will answer your questions the best I can. As before, please refer to this working example to see how things can be organized: https://raw.githubusercontent.com/mquinson/simgrid-simpar/master/report.org
At the end, you want your R blocks to produce informative figures analyzing the performance of your contribution on the predetermined use cases. But you also have to comment on these figures in an Evaluation subsection (under Findings). That section should describe your experimental settings and comment on the figures to present the conclusions that can be drawn from them.
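As a hedged illustration of such an analysis block, the sketch below reads a result file and produces one figure. The file name `results.csv`, its `cores` and `time` columns and the unit on the axis label are invented for the example, and the block assumes that R is installed and enabled for babel in your emacs configuration. Depending on your Org version, the `:results` header may need to be written slightly differently.

```org
** Data Analysis
*** Execution time per configuration
#+begin_src R :results graphics file :file time-vs-cores.png :exports both :width 600 :height 400
# results.csv is assumed to hold one line per run,
# with (hypothetical) columns named cores and time
df <- read.csv("results.csv")
# average the repeated runs of each configuration before plotting
avg <- aggregate(time ~ cores, data = df, FUN = mean)
plot(avg$cores, avg$time, type = "b",
     xlab = "Number of cores", ylab = "Mean execution time (s)")
#+end_src
```

When new data is produced, re-executing the block regenerates the figure, so the analysis never drifts away from the data it claims to describe.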
Summarizing your Findings
This section should naturally be extended, but if you're doing it right, you may already have a very interesting scientific paper in your Findings section, along with the corresponding technical report in the Development section. I must confess that this was my plan from the beginning: having fun in Science is not enough. You should share that pleasure with the Next Guy looking at the same problem.
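Putting together the sections named throughout this page, the reporting file could end up with an outline like the following. This is a sketch of one possible layout: the exact nesting and ordering of the headings are assumptions, and the text in parentheses only recalls what each part is for.

```org
* Findings
** Introduction
(context, problem and motivation)
** Related work
(previous attempts to solve that problem or similar ones)
** Contribution
(your new solutions, without the technical details)
** Evaluation
(experimental settings and commented figures)
* Development
** Data Provenance
(scripts and steps that produce the data)
** Data Analysis
(R blocks that turn the data into figures)
* Journal
(one subsection per period)
* Conclusion
(letter to the Next Guy)
```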
Further Information on How to Do Science
I try to collect here some useful documents for (future) interns. Over time, I added some insightful documents aimed at PhD students and young colleagues.
- dea.ps.gz: the advice I received when I was an intern myself (in DEA). Sorry, that's in French.
- Poly rapport projet 2A.pdf: advice given a few years ago to the interns of ESIAL during their second-year research internship. Yeah, that one is in French too, sorry.
- efficientReading.pdf: Efficient Reading of Papers in Science and Technology, Michael J. Hanson, Dylan J. McNamee. A very good paper on how to read a scientific article in order to avoid getting flooded by the literature review that your advisor will give you...
- ScientificWriting.pdf: A very good presentation on how to write easy-to-read text. It's not only about what is correct or not (as I learned at school somehow), but about what sounds great or not.
- A good presentation on how to write an article in practice. I regularly read that presentation myself when starting a new article.
Some examples of LabBooks provided for inspiration
- Luka Stanisic has maybe the best labbooks I know. Part of his PhD thesis was about designing a methodology for reproducible experiments in large scale distributed systems. Don't miss his postdoc LabBook, that's enlightening. Also check the LabBook of his advisor, Arnaud Legrand. Arnaud also maintains a huge amount of resources on reproducible research.
- The students advised by Lucas Schnorr usually also write a very nice reporting. Check for example https://github.com/taisbellini/aiyra/blob/master/LabBook.org, https://github.com/mittmann/hpc/blob/master/LabBook.org, https://github.com/tcbozzetti/trabalhoconclusao/blob/master/LabBook.org
- Some other students: Léo Villeveygoux (co-advised by Luka Stanisic)
Here are the labbooks of some people working with me:
- Ezequiel Torti Lopez, M2R 2014. Report, with both the data provenance and the data analysis included in appendix.
- Betsegaw Lemma, M2R 2017. LabBook
- Gabriel Corona, engineer on SimGrid, 2015-2016. Journal, Blog (findings).
- Matthieu Nicolas, engineer on PLM, 2014-2016. Journal.
- The Anh Pham, PhD student, started in 2016. Journal.