#+TITLE: Internship on SimGrid with Martin Quinson and Anne-Cécile Orgerie.
#+DATE: [2016-05-17 Tue]--[2016-07-15 Fri]
#+AUTHOR: Simon Bihel
#+EMAIL: [[mailto:simon.bihel@ens-rennes.fr]]
#+WEBSITE: [[simonbihel.me]]
#+LINK: [[https://github.com/sbihel/internship_simgrid]]
#+LANGUAGE: en

* Introduction
Distributed systems have grown immensely complex. Simulators have been developed to help study and improve them. Besides being easy to access, they allow testing new pieces of software in the exact same environment each time. Extracting results from experiments is also easier, as a simulator can have perfect knowledge of all simulated components. The object of study can be energy consumption, computing time, bandwidth usage...

There have been many simulators, but few have kept up with the evolution of distributed systems and can simulate grids as well as clouds, for example. [[LMC03][SimGrid]] is one of them, and it is the one I worked on during my internship.

My work focused on the simulation of clouds with elastic/dynamic tasks. This kind of task can be used for website requests, like rendering markdown on a Wikipedia page, where the task is triggered/run/activated for each request. Depending on the usage, the resources needed fluctuate and, compared to a normal task, there is no single computation duration, as requests are made over a period of time, not all at once. I proposed a way for SimGrid users to formulate these tasks and implemented it. Experiments have shown the validity~ of the contribution.
* Findings
** Bibliography
*** Writing
*Background*
+ What's a cloud
+ Minimising the cost (in all forms) is a research challenge, particularly for fluctuating usage
+ Different types of scaling
+ Present the survey that has categorized these works
+ Simulation isn't used that much and it would give a lot
+ How are they categorized
+ Say that it gives a good overview of what is needed to simulate
+ Say that it covered everything

Cloud computing is a model that makes infrastructures, platforms and software available with a pay-as-you-go subscription. It aims to reduce costs with a layer of virtualisation that allows virtual resources to be dynamically adjusted and occupied on demand. Using the minimal resources for the current demand/usage is still a research challenge that spans all layers and applications. This dynamic management of clouds is called cloud elasticity.

<> has categorized works on cloud elasticity and shows which elements of a cloud infrastructure, platform or application/software are impacted. For now, most research works are evaluated on real clouds. It is interesting for a distributed systems simulator to investigate what is needed to simulate cloud elasticity. If research works on cloud elasticity can be evaluated on a simulator, they would benefit from cost reduction, re-runnable experiments, trust in results...

In this survey, proposals are categorized as follows. The scope is about which elements of a cloud the proposals work on. It can be the management of VMs, the allocation of resources... Then there is the purpose of the proposal: enhancing the /performance/ (to meet the SLA), reducing the /energy/ consumption/footprint, being /available/ when needed and reducing the overall /cost/. Another dimension is the decision making. This is what a proposal adds to an existing cloud to pursue its purpose. In addition to the scope, there are the elastic actions performed by the proposals.
As the scope is about which elements of a cloud are concerned, the elastic action is about what is done to them. Then there is the provider dimension, which tells whether there is only one provider or several. Lastly, there is the method used by the proposal to evaluate itself: real cloud, simulation or emulation.

The survey gives a good overview of which elements of a cloud are manipulated to achieve cloud elasticity. But in the end..
// How can I be sure that it has covered all cases?

As the proposals are about reacting to varying usage, simulators need a way to express this fluctuating workload. We worked on elastic tasks, which model tasks that are triggered regularly and with a usage that fluctuates over time.

*State of the art* (Needs for elastic tasks and concurrent tools)
+ Which elements have to be simulated and how do they work?
+ How workloads are generally modelled
+ What other simulators have done

Based on the classification of the survey, a simulator should allow the manipulation of scopes and the evaluation of the different purposes, make the elastic actions possible and allow multiple providers.
// Tell that simgrid does all last 3 ?

At the moment no simulator article talks about dynamic workload. On the other hand, in the code of DCsim there is an interactive task and in the code of CloudSim there is a host with dynamic workload.
// Go to contribution and go through each scope ?

For en-actor scopes, the point of elastic tasks is just to generate usage, so users can choose an application type depending on what kind of usage they want (CPU, disk...). For the different application types, elastic tasks have the same mechanism; only the inherent micro-task that is repeated changes. An elastic task will repeatedly execute an MSG task. Currently an MSG task can only simulate computing and message passing, so only Multi-tier Applications can be simulated at the moment.
Simulating disk and RAM usage would allow the simulation of databases, storage and thus generic applications.

*Contribution* What can my contribution do that is in the survey?
+ Only computational tasks for now, might be able to do storage/DB tasks in the future. Generic ones won't be possible unless you can pass a task directly to an ET.
+ Need to use the host of a task when executing to allow vertical scaling, and need to manage multiple hosts to allow horizontal scaling.
+ Multiple providers are possible but have to be coded.
+ Purpose?
*** References
+ Clouds
  - <>[[http://link.springer.com/chapter/10.1007/978-3-319-29919-8_12][Cloud Elasticity Survey]]. Survey of research work on cloud elasticity. Good overview of all research done on cloud elasticity. It hints at what people might want in SimGrid. Tons of references to papers that give a better understanding of how to formulate workload and other stuff. Highlight: "Finally, more research on benchmarks is needed to better assess the quality of each of the proposals.".
  - <>[[http://www.cs.rutgers.edu/~ricardob/papers/asplos12.pdf][DejaVu]]. Framework that enhances and accelerates resource allocation with e.g. caching. Used real traces for evaluation. Explains how to deal with dynamic workload. For their Hotmail traces they reference [[http://research.microsoft.com/pubs/144957/euro040-thereska.pdf][this]] article, which acknowledges some people for it at the end.
  - <>[[http://ac.els-cdn.com/S0167739X1400003X/1-s2.0-S0167739X1400003X-main.pdf?_tid=4acfd48e-3871-11e6-afe5-00000aab0f6b&acdnat=1466597171_52db5c840097473a97294f899053a67b][Coordinating Managers]]. Uses RUBiS for experiments.
+ Simulation
  - <>[[https://hal.inria.fr/hal-01017319/PDF/simgrid3-journal.pdf][SimGrid]].
  - <>[[http://research.microsoft.com/pubs/143358/socc10-spikes.pdf][Modeling workload spikes]]. Proposal for generating significant/realistic workload spikes. "In the rest of the paper, workload volume represents the total workload rate during a five-minute interval." What differentiates them from some related work is that they are interested in a minute scale. They take a normal workload and multiply it to get spikes. Based on their generator they would only use triggerOnce for ET. They use Zipf's law.
+ Concurrent tools
  - <>[[http://www.buyya.com/papers/CloudSim2010.pdf][CloudSim]], [[https://github.com/Cloudslab/cloudsim][repo]]. It's a simulator of clouds. Quite famous but nothing on elastic tasks (HostDynamicWorkload in the code). Good background section, specially built for clouds. No elastic task and is apparently missing VM related stuff (see [[TKBL12]]).
  - <>DCsim's [[https://github.com/digs-uwo/dcsim][repo]], [[http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6380046][paper1]], [[http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6727859][paper2]] and [[https://www.dmtf.org/sites/default/files/svm2012_presentation1.pdf][slides]]. Simulator for data centres to evaluate resource management. Potential users of SimGrid among its users, InteractiveTasks in the code.
  - Searched who cited DCsim. [[http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6380049][One]] paper was about comparing algorithms, [[http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6572981][another]] about switching strategies at runtime. They both seem to give details even if the code isn't available. Well, I have no idea how this could be useful, as they describe experiments that have nothing to do with elastic tasks.
  - <>[[http://rubis.ow2.org/][RUBiS]]. Benchmarking auction website.
  - <>[[http://delivery.acm.org/10.1145/1810000/1807152/p143-cooper.pdf?ip=131.254.104.45&id=1807152&acc=ACTIVE%20SERVICE&key=7EBF6E77E86B478F%2E9BD6B3DBCD4B0A3B%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=620961178&CFTOKEN=20477141&__acm__=1466600316_208bf65c16eed45e57cd254a778a1ecb][YCSB]]. Benchmarking Cloud Serving Systems.
+ Not relevant
  - [[http://ac.els-cdn.com/S1569190X1300124X/1-s2.0-S1569190X1300124X-main.pdf?_tid=0ede5a0c-2351-11e6-826f-00000aacb362&acdnat=1464274353_4043525da0d2e6c2cb9432f0a6955443][DCworms' paper]]. Simulation to study the energy consumption of datacenters, part of the CoolEmAll project. What's interesting for me is that it uses workflows to model workloads. Broad range of tools. But I think it focuses on a model that allows better analysis of energy consumption. Globally it is very focused on having control over everything to get a precise evaluation of the energy consumption.
  - <>[[http://download.springer.com/static/pdf/46/chp%253A10.1007%252F978-3-642-31552-7_39.pdf?originUrl=http%3A%2F%2Flink.springer.com%2Fchapter%2F10.1007%2F978-3-642-31552-7_39&token2=exp=1463995249~acl=%2Fstatic%2Fpdf%2F46%2Fchp%25253A10.1007%25252F978-3-642-31552-7_39.pdf%3ForiginUrl%3Dhttp%253A%252F%252Flink.springer.com%252Fchapter%252F10.1007%252F978-3-642-31552-7_39*~hmac=81aa15290d88a2cbd2017547f69672bbe5f6ce338b05eba1489ca37d2cfb1fa2][ISim]]. Took a look because it was speaking of dynamic workload. But it is a meta-scheduler and it performs workload consolidation for power management. In the end I think it has nothing to do with what I'm looking for.
+ Misc.
  - <>[[http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6779436&tag=1][Survey]].
  - <>[[http://ieeexplore.ieee.org/ielx7/7092813/7092808/07092927.pdf?tp=&arnumber=7092927&isnumber=7092808][Complement to simulations]].
+ Not categorized yet / Not read yet
  - <>[[http://ieeexplore.ieee.org/ielx7/6902666/6903436/06903474.pdf?tp=&arnumber=6903474&isnumber=6903436][Autoscaling]]. Autoscaling on heterogeneous resources and multiple levels of QoS requirements. It uses wikibench for the evaluation and runs it on real infrastructures...
  - <>[[http://faculty.cs.gwu.edu/~timwood/papers/icac13_final.pdf][Memory caching]]. Adaptive distributed (autoscaling, evenly distributed load) memory caching.
    It uses wikibench for the evaluation but runs it on real infrastructures...
  - <>[[http://ieeexplore.ieee.org/ielx5/6297612/6298144/06298161.pdf?tp=&arnumber=6298161&isnumber=6298144][Profit-Maximizing Resource Allocation]]. Again doing experiments for real.
  - <>[[http://www.cse.psu.edu/~bus145/MDCSIM.pdf][MDCSim]]. Simulation platform for in-depth analysis of multi-tier data centers.
** Contribution
- Proposals.
  1. Real traces.
  2. Tasks like in DCsim with a visit ratio (i.e. how many times the task is triggered/launched).
  3. Generator function.
- Scenario: You have a website. Each time a page is loaded, a task is triggered. In real life you have one VM exclusively for this task, and overall the amount of work depends on the activity of visitors over time. Thus you want to express a task that has fluctuating computing requirements and that lasts over time (there is no fixed amount of computation to execute immediately, using all available resources, and to kill when it's done).
- Criteria of quality for proposals.
  + Complexity for the user: describing elastic tasks must be at least as familiar as describing normal tasks.
  + Size on disk/in memory: real traces take a lot of space, so the description of fluctuations for an elastic task must be lighter.
  + Computing speed: elastic tasks should be precise enough to avoid wrong simulations, but without taking much longer than current performance.
  + Expressiveness: expressing elastic tasks should be natural and close to setting up real dynamic cloud tasks.
  + Implementable in SimGrid: avoiding massive refactoring and using current code would be appreciated.
- e = new ElasticTask(comp_size); e.setTriggerRatioVariation(vector); OR e.setTriggerTrace(FILE*); e2 = new ElasticTask(comp_size); e.addOutputStream(e2);
- Cases that the contribution should cover:
  + Horizontal scaling (number of VMs is modified).
  + Vertical scaling (dynamically configuring the CPU and the RAM and Disk size).
    /Should we deduce from that that DB tasks don't impact other stuff?/
  + (Application) Live migration where only specific DBs are migrated instead of full VMs.
  + Application reconfiguration (i.e. application architectural change).
- Develop on S4U
- See maxmin code to find out why it's difficult to write a callback for VMs
- Alice and Eve processes
  + S4U
  + 2 .hpp for deployment and execution
  + doc S4U 3.14
  + Eve's a user that's gonna verify that the contribution's working
  + See energy.cpp as an example of plugin
* Development
* Global Goals
** TODO Internship subject <2016-05-30 Mon>
** TODO Bibliography <2016-05-17 Tue>--<2016-05-27 Fri>
** TODO Contribution <2016-05-30 Mon>--<2016-06-17 Fri>
** TODO App + study <2016-06-20 Mon>--<2016-06-27 Mon>
** TODO Experiments <2016-06-28 Tue>--<2016-07-05 Tue>
** TODO Report writing <2016-07-06 Wed>--<2016-07-13 Wed>
** TODO Report 1.0 <2016-07-15 Fri>
* Journal
** Week 1 <2016-05-17 Tue>--<2016-05-20 Fri>
*** Things Done
- Read the Introduction, Background and Architecture parts of CloudSim's paper [[CRBRB10]]. Gave a better understanding of cloud layers and of the difficulties added compared to grids.
- Opened the [[http://www.buyya.com/papers/gridsim.pdf][GridSim paper]], looked at some figures and closed it upon encountering pages of UML class diagrams and code samples.
- Meet-up with Anne-Cécile and Martin. Better understanding of my role (how to express elastic tasks) and the context (other simulators, the point of this work, ...).
- Tweaked/Fixed vim/tmux/orgmode config stuff, [[https://github.com/sbihel/dotfiles][my dotfiles]].
- Looked at resources on DCsim <>. They said in 2012 that CloudSim is missing VM replication, VM dependencies, work-conserving CPU... Talks about reallocating resources to VMs (not wasting the CPU's unused shares/resources) and managing resources following fluctuating usage in general, but not elastic tasks.
  In the few examples, there is one about StaticPeak as a SimulationTask, but all examples look the same; I must have missed something.
*** Blocking Points
- +Can't connect on irc through Inria's network ??+ Currently using an SSH tunnel.
- "lua5.2 found when lua5.3 is required" for -Denable_lua. Library for 5.3 not installed. /on OS X/
- libdw not found for -Denable_model-checking. /on OS X/
- +Should I focus on VM deployment (allocation, provisioning) or VM usage (management) ? ("the loads")+ VM usage. -> The user is using the simulator to test their allocator of VMs.
*** Planned Work
- [X] Install SimGrid from source
- [X] Autoconnect #simgrid on irc.oftc.net
- [X] Read tutorial [[http://simgrid.gforge.inria.fr/documentation.php]]
- [X] Go through tutorial [[http://simgrid.gforge.inria.fr/simgrid/3.13/doc/tutorial.html]]
- [X] See concurrent tools like DCsim and GridSim. Pay attention to VM loads.
** Week 2 <2016-05-23 Mon>--<2016-05-27 Fri>
*** Things Done
- DCsim's code. There is InteractiveTasks, which might correspond to elastic tasks. It consists of default and max number of instances, resource size, normal service time, and visit ratio. I guess if the ratio changes over time the task becomes elastic.
- CloudSim's code. There is HostDynamicWorkload, which might correspond to elastic tasks. List of processing elements... Meh, looks like it's just for keeping up to date with the performance degradation of the VM.
- Took a look at [[IS_p][ISim's paper]] because it was speaking of dynamic workload. But it is a meta-scheduler and it performs workload consolidation for power management. In the end I think it has nothing to do with what I'm looking for.
- Contribution proposal 1. An elastic task is like a server's request log. The parts that aren't over 100% of usage are reduced to one task. And we deal with the other parts.
  Cons: a long non-excessive part translated into one task can lose a lot of information (a lot of usage over a short time can have an effect on bandwidth usage, for example?); if there are a lot of peaks over the limit, then there is a lot to deal with if usage goes down between each peak. Maybe maths could help to get a smarter decomposition.
- Contribution proposal 2. Like in DCsim, a task is triggered/visited regularly, and to simulate the elasticity the visit ratio has to be changed. Pros: the precision of the simulation depends on the precision of the ratio changes given by the user, thus performance depends on the user (avoiding responsibilities ¯\_(ツ)_/¯); convenient for the user.
- Contribution proposal 3~. If we consider that elastic tasks never really end, we could play with the resources of the VMs on which the task is executed and the task would use them fully. I guess that would be a way of doing proposal 2. Cons: playing with resources means not simulating the real world and would falsify the results, because resource management has a huge impact on other stuff.
- Contribution proposal 4~. Generating function or history {date; value}*.
- Read [[http://ac.els-cdn.com/S1569190X1300124X/1-s2.0-S1569190X1300124X-main.pdf?_tid=0ede5a0c-2351-11e6-826f-00000aacb362&acdnat=1464274353_4043525da0d2e6c2cb9432f0a6955443][DCworms' paper]]. Simulation to study the energy consumption of datacenters. Part of the CoolEmAll project. Broad range of tools. What's interesting for me is that it uses workflows to model workloads. But I think it focuses on a model that allows better analysis of energy consumption. Globally it is very focused on having control over everything to get a precise evaluation of the energy consumption.
- Explored wikibench.eu. Master's thesis on a large-scale benchmark. Real traces from Wikipedia with tools to, for example, reduce the intensity whilst keeping interesting properties. People like Guillaume Pierre are using it to evaluate autoscaling.
  More generally, all work on cloud and application management can be evaluated with it.
- Wrote some sort of scenario file for proposals 1 and 2. Needs more work to be correct C code. There is no task duration because I don't feel it's natural for a dynamic task to have a predetermined duration. I guess the user will have to kill it or reduce the visit ratio to 0. Still needs some work to get a satisfying description of the visit ratio fluctuations for proposal 2. And the base example chosen (cloud-two-tasks) might not be the best, because the two tasks aren't concurrent and one has to be killed before starting the other.
- Criteria of quality for proposals.
  + Complexity for the user: describing elastic tasks must be at least as familiar as describing normal tasks.
  + Size on disk/in memory: real traces take a lot of space, so the description of fluctuations for an elastic task must be lighter.
  + Computing speed: elastic tasks should be precise enough to avoid wrong simulations, but without taking much longer than current performance.
  + Expressiveness: expressing elastic tasks should be natural and close to setting up real dynamic cloud tasks.
  + Implementable in SimGrid: avoiding massive refactoring and using current code would be appreciated.
- Searched who cited DCsim. [[http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6380049][One]] paper was about comparing algorithms, [[http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6572981][another]] about switching strategies at runtime. They both seem to give details even if the code isn't available. Well, I have no idea how this could be useful, as they describe experiments that have nothing to do with elastic tasks.
- While trying to write an introduction I think I wrote some sort of abstract. Well, I guess I'll just have to fill it in to get a proper introduction.
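
A minimal sketch of how proposals 2 and 4 could fit together, assuming the history {date; value}* is read as a piecewise-constant visit ratio (triggers per simulated second). Everything here (=RatioChange=, =triggerDates=, the even spacing inside a segment) is hypothetical and only illustrates the idea; none of it is SimGrid or DCsim API.

#+BEGIN_SRC cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type, not part of SimGrid: one {date; value} entry of the
// history from proposal 4, read as "from `date` on, trigger `ratio` times
// per simulated second" (the visit ratio of proposal 2).
struct RatioChange {
  double date;   // simulated time (s) at which the ratio takes effect
  double ratio;  // triggers per second from that date on (0 = paused)
};

// Expand a ratio history into the dates at which the task would be
// triggered, up to (but excluding) `end`. Assumption: inside a segment the
// triggers are evenly spaced by 1/ratio, starting at the segment's date.
std::vector<double> triggerDates(const std::vector<RatioChange>& history, double end) {
  std::vector<double> dates;
  for (std::size_t i = 0; i < history.size(); i++) {
    // The history is expected to be sorted by date.
    assert(i == 0 || history[i - 1].date <= history[i].date);
    double segEnd = (i + 1 < history.size()) ? history[i + 1].date : end;
    if (segEnd > end)
      segEnd = end;
    if (history[i].ratio <= 0.0)
      continue;  // paused segment: no trigger at all
    double period = 1.0 / history[i].ratio;
    for (std::size_t k = 0; history[i].date + k * period < segEnd; k++)
      dates.push_back(history[i].date + k * period);
  }
  return dates;
}
#+END_SRC

For instance, the history {0; 1} {2; 2} {3; 0} with an end date of 10 gives triggers at 0, 1, 2 and 2.5: the user only writes three entries, and setting the ratio to 0 is enough to stop the task, which matches the "no predetermined duration" remark above.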
*** Blocking Points
- [[https://books.google.fr/books?id=io6aBQAAQBAJ&pg=PA92&lpg=PA92&dq=cloud+simulation+dynamic+workload&source=bl&ots=HkoqPCSnzM&sig=Ko-BHh-jMjx_6IDhE67RnTHW3h4&hl=en&sa=X&ved=0ahUKEwih0d65lPDMAhVrB8AKHW0EBVwQ6AEIMjAC#v=onepage&q=cloud%20simulation%20dynamic%20workload&f=false][This paper]] says that [[http://www.ijsr.net/archive/v2i8/MTIwMTMxMjA=.pdf][this paper]] presents an approach to modeling dynamic workloads in CloudSim, but I didn't understand why.
- Can't seem to find stuff about dynamic tasks/workload, only stuff like dynamic resource allocation.
- Haven't really found what injection is in NS-3.
- People have dealt without elastic tasks just fine. Is it really useful? Can't find stuff about it, so I guess it's hard to find potential users and their needs.
*** Planned Work
- [X] Find other simulators. (e.g. survey cloud simulators).
- [X] See concurrent tools like DCsim and GridSim. Pay attention to varying workload. Read doc and source. When reading articles, summarize them.
- [ ] Connect to iwifi-interne.
- [ ] Write introduction.
- [X] Explain why DCworms isn't that useful.
- [X] Discover [[http://www.wikibench.eu/]]. What is it? Who's using it?
- [X] Write a formal scenario file that uses the proposals.
- [X] Find criteria to quantify the quality of the proposals. (e.g. complexity for the user; size on disk/in memory; computing speed; expressiveness; implementable in SimGrid)
- [X] Bibliography, which papers use DCsim, CloudSim, SimWare... Bibliography, find some papers of (potential) users that describe their setup.
- [ ] See workload injection (load injectors) in NS-3. Should be similar to what we're trying to do.
- [ ] Think about application workflows and interactions between interdependent (micro)(elastic)tasks.
** Week 3 <2016-05-30 Mon>--<2016-06-03 Fri>
*** Things Done
- Copied paper descriptions into the bibliography section.
- Took a look at [[FPK14]] and it does its evaluation on real infrastructures with wikibench. Lame?
  Same for [[HW13]] and [[MVD12]].
- Partly read [[NGS15]] and [[ASPLOS12]]. As DejaVu clusters workloads into classes, proposal 2 (visit ratio) might be more convenient to study its reaction/adaptation (I'm assuming that the clustering doesn't have problems).
*** Blocking Points
- Still have a hard time figuring out what potential users would prefer for the API.
- Can a task know by itself when to update its visit ratio?
*** Planned Work
- [X] More detailed entries for papers read. Abstract (1 sentence, objectives), link with my work, pros (what I'd like to reuse and what's worrying), cons (what I should say in my article). For the papers' names use the first letters of the writers' names or the name of the conference.
- [X] Put the paper descriptions in the bibliography section (write it like a related work section).
- [X] Write a scenario file (needs description). Put it in the contribution section.
- [X] Search for potential users through wikibench citations.
- [ ] See load injectors of NS-3 because it's similar to what we're trying to do.
- [ ] See papers "multi-tiers applications" in [[<>][this.]]
- [X] Organize bibliography with categories.
- [ ] Propose clearer formulation of the elastic tasks API.
** Week 4 <2016-06-06 Mon>--<2016-06-10 Fri>
*** Things Done
- Worked on writing ElasticTask.hpp with the declaration of the class ElasticTask and an example of its use.
- [[https://github.com/sbihel/simgrid-1][Forked SimGrid.]] Started integrating ElasticTask in s4u, but that might change later to become a plugin.
- Examples of internship reports (best ones from last year at ENS Rennes): [[http://perso.eleves.ens-rennes.fr/people/Timothee.Haudebourg/public/work/ecofen.pdf]], [[http://perso.eleves.ens-rennes.fr/people/Alexandre.Debant/work/rapport_stage_l3.pdf]], [[http://perso.eleves.ens-rennes.fr/people/Dominique.Barbe/derivationAI_long.pdf]], [[http://perso.eleves.ens-rennes.fr/people/Raphael.Berthon/docs/Berthon_Internship_2015.pdf]].
- What work is left to do compared to others? A friendly approach to the problem. A more developed analysis of the state of the art. A more meaningful purpose for the work.
*** Blocking Points
*** Planned Work
- [X] .hpp of elastic task (API proposal).
- [X] Read the survey in detail to avoid missing uses/POVs of clouds.
- [ ] Develop the idea of resizing VMs for another POV of clouds (where you seek to lower the price of the overcost of what you make available to users)
- [X] Compared to good interns' reports, say what's left to do.
** Week 5 <2016-06-13 Mon>--<2016-06-17 Fri>
*** Things Done
- Filled the holes in the code.
- Worked on background and state of the art.
- Meeting notes
  + State of the art is about models used
  + Don't write sentences, use itemize
  + The contribution is a model
  + See the article modeling workload spikes cause they do what we want
  + Use set/getData(), attach data to actor (data examples: )
  + ElasticTask should be called ElasticTaskManager
  + MSG_task can't be created once and executed multiple times -> give what's needed to create the tasks
- Meeting notes
  + The ETM is global and an ET changes the data of the ETM, and when it wakes up it looks at what it has to do.
- Meeting notes
  + Wake up using semaphore timeandwait
  + execute(flops) for each micro task
  + no tasks, just nextEventQueue
  + when a microtask is executed and you add another: execute_init() execute_start()
- Meeting notes
  + write what I understood of the modeling spikes paper, look at what probability law they use
  + use classes instead of structs
  + which parts of the API answer to applications of the survey
  + think of examples
*** Blocking Points
*** Planned Work
- [X] <2016-06-13 Mon 17:00> Compared to good interns' reports, say what's left to do.
- [X] Set up your own project; don't touch pimpl_, just use regular msg tasks
- [X] <2016-06-15 Wed 09:00> Write background and state of the art using the survey. (Explain what information there is in it, how the studies are classified, the good ideas, its limits...)
- [X] Read the paper on modeling workload spikes.
- [X] Work on the code
- [ ] Which part of the survey is covered by the API, which might be in the future and which won't.
** Week 6 <2016-06-20 Mon>--<2016-06-24 Fri>
*** Things Done
- If we try to simulate the workload generator of <>. Normally each client thread executes requests in a loop. Each thread selects a request type, selects parameters, sends the request, waits for a response and repeats. If we had to translate it, we'd need to create a task that triggers the ET once when it is finished. As requests have parameters, I guess we would need one ET for each request and set of parameters, then clients trigger one of them. We don't use the repeated triggering (ratio stuff) here.
- If we try to simulate DejaVu <>. "Both traces contain measurements at 1-hour increments during one week, aggregated over thousands of servers.". For each kind of request there would be an ET, and we would set a constant triggering over 1 hour and change the ratio each hour.
- Attended "Journées scientifiques" (scientific days). "Vérifier et corriger les logiciels" (verifying and fixing software), "Modélisation pour la biologie et la médecine" (modelling for biology and medicine) and "Vers une informatique ouverte et reproductible" (towards open and reproducible computer science).
- A lot of papers use RUBiS. It's an auction website benchmark. Three kinds of user sessions: visitor, buyer and seller. We could have just 3 ETs, with maybe complex microtasks as users can see bids and bid themselves.
- A lot of papers use YCSB. "Each workload represents a particular mix of read/write operations, data sizes, request distributions, and so on, and can be used to evaluate systems at one particular point in the performance space." Four different kinds of ET, and it chooses one at random each time, so if we proactively compute the number of times each operation will be chosen, we could use the repeating characteristic of ETs. One thing: as there are multiple records to read/write, we would have more than 4 ETs.
  We would need more than computing tasks, as reading records can vary depending on the writes.
- Meeting notes
  + Ask if there is a detach for microtasks
  + Still a while(1) and use a semaphor_acquire
  + talk with Gabriel
- Probably don't need ETM to be an Actor
- Meeting notes
  + Use futures to do the microtasks
  + Still a msg_task in the end, the future is controlling the execution
*** Blocking Points
- Segfault when calling etm->run();
*** Planned Work
- [ ] Write examples.
- [ ] Write correct execute code for ETM.
** Week 7 <2016-06-27 Mon>--<2016-07-01 Fri>
*** Things Done
- Meeting notes
  + Examples, and see the time to do it and the load which is equivalent to a paper (run multiple experiments by increasing the rate/number of ET and see the overall time).
  + One figure that shows the number of microtasks over time (the little boxes with the start and end...).
- Experiments planning
  DejaVu. "Two servers. Intel SR1560 Series rack servers with Intel Xeon X5472 processors (eight cores at 3 GHz), 8 GB of DRAM, and 6 MB of L2 cache per every two cores. EC2 cluster of 20 virtual machines. To demonstrate DejaVu’s ability to scale out, we vary the number of active instances from 2 to 10 as the workload intensity changes, but resort only to EC2’s large instance type. In contrast, we demonstrate its ability to scale up by varying the instance type from large to extra-large, while keeping the number of active instances constant." Multi-tier apps, serving static and dynamic content, DB interactions... Evaluate time for increasing ETs rate.
  + Platform similar to DejaVu.
- Meeting notes
  + 2 Gflops for one host seems pretty standard
  + multiple hosts, one cluster
  + try to imitate the papers that use simulation
  + One experiment to show performance
  + Show with experiments which (interesting) studies it allows
  + One slide: what they wanted to do, how do we do it in SimGrid
  + 4 subsections in experimentation: performance, functionalities (one for each paper)
  + see cfg=tracing, should be automatic
  + for performance do n hosts and n ET (one for each)
  + 5 pages of what people do, why they need to evaluate using my work
- 5 papers using simulation: OMNeT++, home-made Python discrete event simulator that models a service deployed in the cloud (WC98 traces), ??, SPECjEnterprise2010 (WC98 traces), SPECjEnterprise2010 (WC98 traces)
  5 papers using simulation: generic??, vertical scaling, vertical scaling, generic??, vertical scaling??
- Meeting note
  + ET should be able to take a text file of timestamps from WC98
  + do one ET with an average flops and use ^
  + keep the file opened and add tasks over time (like add a new parameter like repeating)
  + translate WC98 to a timestamp (one timestamp per line)
  + use XBT::translateinteger
  + 3 types of experiments: functionalities, traces, perf
  + add deadline with outputFunction
  + add probabilistic law
- Meeting notes
  + use deployment file
  + put scripts in begin{example} in reporting.org, so that it can be executed with C-c C-c
  + look at [[https://github.com/taisbellini/aiyra/blob/master/LabBook.org]]
- Experiments of the weekend
  + 0.80s user time to execute the 1000 requests test log file of WC98, with one ET and 10 hosts (18+ seconds from real traces) and ~14000 MB max memory
    Tried the day20 of WC98 with ~2 million requests with 2000 hosts, but after a few hours it bricked my MacBook and it restarted.
  + 2nd experiment for raw perfs.
- Lost a day figuring out that my queue has the biggest element on top instead of the lowest. FeelsGoodMan
#+BEGIN_SRC cpp
#include <memory>
#include <string>
#include "simgrid/s4u.h"
#include "ElasticTask.hpp"
#include "simgrid/msg.h"

XBT_LOG_NEW_DEFAULT_CATEGORY(s4u_test, "a sample log category");

// NOTE: the template arguments of shared_ptr/make_shared were lost in these notes;
// simgrid::s4u::ElasticTaskManager is assumed from the "ETM" abbreviation.
void eve(std::shared_ptr<simgrid::s4u::ElasticTaskManager> etm, double loadIncrease) {
  XBT_INFO("Starting");
  simgrid::s4u::ElasticTask *e1 = new simgrid::s4u::ElasticTask(simgrid::s4u::Host::by_name("cb1-2"), 5.0, 0.0, etm.get());
  simgrid::s4u::ElasticTask *e2 = new simgrid::s4u::ElasticTask(simgrid::s4u::Host::by_name("cb1-3"), 5.0, 0.0, etm.get());
  // Chain the tasks: the output function of e1 triggers e2 once.
  e1->setOutputFunction([e2]() { e2->triggerOneTime(1.5); });

  // e3 starts on cb1-4, gets hosts cb1-5 to cb1-19 added, and is given a timestamps file.
  simgrid::s4u::ElasticTask *e3 = new simgrid::s4u::ElasticTask(simgrid::s4u::Host::by_name("cb1-4"), 5.0, 0.0, etm.get());
  for (int i = 5; i < 20; i++) {
    e3->addHost(simgrid::s4u::Host::by_name("cb1-" + std::to_string(i)));
  }
  e3->setTimestampsFile("d81_timestamp_wc.txt");

  simgrid::s4u::this_actor::sleep(99999999);
  etm->kill();
  XBT_INFO("Done.");
}

int main(int argc, char **argv) {
  simgrid::s4u::Engine *e = new simgrid::s4u::Engine(&argc, argv);
  std::shared_ptr<simgrid::s4u::ElasticTaskManager> etm = std::make_shared<simgrid::s4u::ElasticTaskManager>();
  e->loadPlatform("dejavu_platform.xml");
  simgrid::s4u::Actor("ETM", simgrid::s4u::Host::by_name("cb1-1"), [etm] { etm->run(); });
  simgrid::s4u::Actor("main", simgrid::s4u::Host::by_name("cb1-1"), [etm] { eve(etm, 1.0); });
  e->run();
  return 0;
}
#+END_SRC
#+BEGIN_SRC cpp
#include <memory>
#include <string>
#include <vector>
#include "simgrid/s4u.h"
#include "ElasticTask.hpp"
#include "simgrid/msg.h"

XBT_LOG_NEW_DEFAULT_CATEGORY(s4u_test, "a sample log category");

// NOTE: simgrid::s4u::ElasticTaskManager is assumed (template arguments lost in these notes).
void eve(std::shared_ptr<simgrid::s4u::ElasticTaskManager> etm, int n) {
  XBT_INFO("Starting");
  // One elastic task per host cb1-1 to cb1-n.
  // std::vector replaces the original variable-length array, which is not standard C++.
  std::vector<simgrid::s4u::ElasticTask*> ets(n);
  for (int i = 0; i < n; i++) {
    ets[i] = new simgrid::s4u::ElasticTask(simgrid::s4u::Host::by_name("cb1-" + std::to_string(i + 1)), 5.0, 1.0, etm.get());
  }
  simgrid::s4u::this_actor::sleep(100);
  etm->kill();
  XBT_INFO("Done.");
}

int main(int argc, char **argv) {
  // Pass argc = 1 so the engine only sees the program name; argv[1] is read below as n.
  int argcE = 1;
  simgrid::s4u::Engine *e = new simgrid::s4u::Engine(&argcE, argv);
  std::shared_ptr<simgrid::s4u::ElasticTaskManager> etm = std::make_shared<simgrid::s4u::ElasticTaskManager>();
  e->loadPlatform("dejavu_platform.xml");
  simgrid::s4u::Actor("ETM", simgrid::s4u::Host::by_name("cb1-1"), [etm] { etm->run(); });
  simgrid::s4u::Actor("main", simgrid::s4u::Host::by_name("cb1-1"), [etm, argv] { eve(etm, std::stoi(argv[1])); });
  e->run();
  return 0;
}
#+END_SRC
*** Blocking Points
*** Planned Work
- [ ] Write what I'm planning to do with the experiments, what I want to show...
- [ ] Write what I'm planning to say in my final report.
** Week 8 <2016-07-04 Mon>--<2016-07-08 Fri>
*** Things Done
- I upgraded the platform twice and it's visible in the memory usage.
*** Blocking Points
*** Planned Work
** Week 9 <2016-07-11 Mon>--<2016-07-13 Wed>
*** Things Done
- Meeting notes
  - Use the LNCS LaTeX template.
    + Look back at the tables to show what papers need and what I can do.
  - Paper notes: NP -> theory isn't enough as there are too many NP problems.
  - Only me as author; put advisors in the acknowledgements.
  - Before clouds, people ran their own stuff; now it's pay-as-you-go for some parts.
  - When saying that clouds are complex, say that so many problems are NP-complete that theory isn't enough.
    + Using simulations that are simple, reproducible, simplistic.
  - At the end of the introduction, put an itemize to say what the main contributions are (tell simgrid): modeling of the workload, implementation of an API, evaluation.
    + The contribution is about unraveling concepts about cloud workload.
    + Main contribution of the survey: categorizing works.
    + The problem I solve is answering a need.
  - Put the state of the art at the end and name it Related Works.
    + Say what a typical simulation is: resources, algorithms used, what they are evaluating, scenario.
    + Talk about SimGrid in the background.
    + At the start of the contribution, tell what the characterization of a workload is (what is a task/gridlet/cloudlet (a constant computing load), which doesn't match a fluctuating reactive cloud workload); we're proposing a discrete representation of a continuous event (one argument is that a simulator is discrete, so it's easier).
    + Link the models to elements of the contribution (an ET represents a flux).
    + Instead of floprate we used taskrates to be generic (make sure to define atomic/classic tasks).
    + From this, elastic action types come naturally... list all possible actions (using tables makes it easier), with one paragraph for each.
  - Change "Nb of ET" to "number of elastic tasks".
  - Say the total number of microtasks.
  - Do another experiment with only one ET and an increasing rate.
  - Add a conclusion to the experiments.
  - Use a table for real traces instead of tons of numbers, and use the same plan for the other experiments.
    + Use a table for C to show which elastic actions are possible and which papers use them.
- Meeting notes
  + To detect a threshold we would need to set an alarm and be killed if we don't kill it before (that would be online; offline is easy, we just compute the time used) (we can talk about the online way in the conclusion as we didn't do it) (the callback is defined by the user).
  + Say in the introduction why I did this internship; say in the conclusion what I learned.
  + Conclusion: what was done, what still has to be done, what I learned (what a simulator is, what research code is, life in a research environment).
- Experiment with only one ET and a growing number of triggers per second
#+BEGIN_SRC cpp
#include <memory>
#include <string>
#include "simgrid/s4u.h"
#include "ElasticTask.hpp"
#include "simgrid/msg.h"

XBT_LOG_NEW_DEFAULT_CATEGORY(s4u_test, "a sample log category");

// NOTE: simgrid::s4u::ElasticTaskManager is assumed (template arguments lost in these notes).
void eve(std::shared_ptr<simgrid::s4u::ElasticTaskManager> etm, int n) {
  XBT_INFO("Starting");
  // A single elastic task created with n as its third argument, on hosts cb1-2 to cb1-199.
  simgrid::s4u::ElasticTask *e3 = new simgrid::s4u::ElasticTask(simgrid::s4u::Host::by_name("cb1-2"), 1.0, n, etm.get());
  for (int i = 3; i < 200; i++) {
    e3->addHost(simgrid::s4u::Host::by_name("cb1-" + std::to_string(i)));
  }
  simgrid::s4u::this_actor::sleep(100);
  etm->kill();
  XBT_INFO("Done.");
}

int main(int argc, char **argv) {
  // Pass argc = 1 so the engine only sees the program name; argv[1] is read below as n.
  int argcE = 1;
  simgrid::s4u::Engine *e = new simgrid::s4u::Engine(&argcE, argv);
  std::shared_ptr<simgrid::s4u::ElasticTaskManager> etm = std::make_shared<simgrid::s4u::ElasticTaskManager>();
  e->loadPlatform("dejavu_platform.xml");
  simgrid::s4u::Actor::createActor("ETM", simgrid::s4u::Host::by_name("cb1-1"), [etm] { etm->run(); });
  simgrid::s4u::Actor::createActor("main", simgrid::s4u::Host::by_name("cb1-1"), [etm, argv] { eve(etm, std::stoi(argv[1])); });
  e->run();
  return 0;
}
#+END_SRC
- Meeting notes
  + Increase figures in the slides.
  + A first small sentence followed by a really long one -> not good.
*** Blocking Points
*** Planned Work
- [ ] Give back keys and pass and ethernet adapter.
* Conclusion