SHADI IBRAHIM

Tenured Inria Research Scientist

Inria Rennes – Bretagne Atlantique Research Center

Contact details
Inria Rennes – Bretagne Atlantique
Campus Universitaire de Beaulieu
35042 Rennes
Phone : +33 (0) 2 99 84 25 34
Fax : +33 (0) 2 99 84 71 71
Email: shadi DOT ibrahim AT inria DOT fr


Job Openings [Master Internships]

Several Master internships funded by the ANR KerStream project and the Apollo Connect Talent project are available. Please feel free to email me your resume.


Master Internship [Container image management in the Fog]

Advisors: Shadi Ibrahim (STACK team), Jad Darrous (Avalon team)

Main contacts: shadi.ibrahim (at) inria.fr

Application deadline: as early as possible

Location: Inria, Nantes

Containers are an operating-system-level virtualization technology that has spread widely in industry since the launch of the Docker project [1]. Large-scale container deployments can be found in big cloud data centers alongside traditional Virtual Machines (VMs) [2]. Compared to VMs, containers are more lightweight in terms of provisioning time and image size. Therefore, containers are emerging as a key technology to facilitate the deployment of Fog/Edge computing [3, 4].
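
Part of this lightweight nature comes from the fact that a container image is a stack of content-addressed layers that can be shared across images and nodes. As a minimal illustration (assuming the docker SDK for Python is installed and a local Docker daemon is running; the image name is only an example), the following sketch lists the layer digests and total size of an image:

    # Sketch: inspect the layers of a container image with the docker SDK for Python.
    # Assumes a running local Docker daemon; the image name is an arbitrary example.
    import docker

    client = docker.from_env()
    image = client.images.pull("python", tag="3.9-slim")

    layers = image.attrs["RootFS"]["Layers"]        # content-addressed layer digests
    size_mb = image.attrs["Size"] / (1024 * 1024)

    print(f"{image.tags[0]}: {len(layers)} layers, {size_mb:.1f} MB")
    for digest in layers:
        print("  ", digest)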

This internship will investigate possible solutions for container image management in Fog/Edge environments. The proposed solution should leverage the specific structure of container images (i.e., images are represented as layers) and take into account the constraints imposed by the Fog/Edge environment (e.g., low network quality, limited storage) to achieve efficient container provisioning. A simple illustrative sketch of this idea is given below.
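
To make the idea concrete, here is a minimal, hypothetical sketch of a layer-aware placement heuristic: among candidate Fog nodes, pick the one that already caches the largest share (in bytes) of the layers of the image to deploy, so that only the missing layers must be pulled over the constrained network. All names and data structures below are illustrative assumptions, not part of an existing system.

    # Hypothetical sketch of layer-aware container placement in a Fog/Edge setting:
    # choose the node that already holds the most bytes of the requested image's
    # layers, so that only the missing layers need to be transferred.

    def pick_node(image_layers, layer_sizes, node_caches):
        """image_layers: list of layer digests composing the image to deploy.
        layer_sizes: dict digest -> size in bytes.
        node_caches: dict node_id -> set of layer digests already cached on that node.
        Returns (best_node, bytes_to_transfer)."""
        best_node, best_missing = None, None
        for node, cached in node_caches.items():
            missing = sum(layer_sizes[l] for l in image_layers if l not in cached)
            if best_missing is None or missing < best_missing:
                best_node, best_missing = node, missing
        return best_node, best_missing

    # Toy example with made-up digests and sizes.
    layers = ["sha256:a", "sha256:b", "sha256:c"]
    sizes = {"sha256:a": 50e6, "sha256:b": 10e6, "sha256:c": 5e6}
    caches = {"edge-1": {"sha256:a"}, "edge-2": {"sha256:b", "sha256:c"}}
    print(pick_node(layers, sizes, caches))   # ('edge-1', 15000000.0)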

Experimental evaluation of the proposed solution will be performed on top of Grid'5000 [5]. Thus, good programming skills, in addition to good problem analysis and critical thinking, are required. Depending on the progress and results, this internship could lead to a publication.

[1] https://www.docker.com

[2] Improving Docker Registry Design Based on Production Workload Analysis, FAST, 2018

[3] Evaluation of Docker as Edge Computing Platform, ICOS, 2015

[4] Feasibility of Fog Computing Deployment Based on Docker Containerization over Raspberry Pi

[5] http://grid5000.fr


Master Internship [On the efficiency of straggler mitigation for Big Data applications]

Advisors: Shadi Ibrahim, Thomas Lambert (STACK team), Amelie Chi Zhou (Shenzhen University, China)

Main contacts: shadi.ibrahim (at) inria.fr

Application deadline: as early as possible

Location: Inria, Nantes

Many Big Data processing applications nowadays run on large-scale multi-tenant clusters. In such clusters, heterogeneous hardware models and diverse application categories result in unavoidable performance variability. Although there have been many studies aiming to design improved task/job schedulers [1, 2], unexpected long tails in job execution still exist and have become the norm rather than the exception [4]. These heavy-tailed tasks, i.e., stragglers, can severely prolong the job execution time [3]. More importantly, those heavy tails can result in high resource and energy consumption and hence adversely impact the resource utilization and energy efficiency of the cluster.

This internship will investigate how to efficiently mitigate stragglers in heterogeneous environments by (1) introducing an accurate detection mechanism that considers the network dynamics during the execution of the Big Data application and (2) extending our heterogeneity-aware copy allocation scheduler [5] to enable resource sharing between multiple users and thus improve the resource utilization of the cluster. A simple sketch of a baseline detector is given below.
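
As a starting point, the following is a minimal sketch of a progress-rate based straggler detector in the spirit of LATE-style speculative execution [4]; the threshold, data structures, and names are illustrative assumptions only, and the network dynamics mentioned above are deliberately left out.

    # Hypothetical sketch of progress-rate based straggler detection:
    # a task is flagged when its progress rate falls well below the mean rate
    # of the currently running tasks of the same job.
    from statistics import mean, pstdev

    def detect_stragglers(tasks, now, slow_factor=1.0):
        """tasks: list of dicts with 'id', 'progress' in [0, 1], and 'start_time'.
        Returns the ids of tasks whose progress rate is more than slow_factor
        standard deviations below the mean rate."""
        rates = {t["id"]: t["progress"] / max(now - t["start_time"], 1e-6) for t in tasks}
        mu, sigma = mean(rates.values()), pstdev(rates.values())
        return [tid for tid, r in rates.items() if r < mu - slow_factor * sigma]

    # Toy example: task "t3" progresses much slower than its siblings.
    running = [
        {"id": "t1", "progress": 0.8, "start_time": 0},
        {"id": "t2", "progress": 0.7, "start_time": 0},
        {"id": "t3", "progress": 0.1, "start_time": 0},
    ]
    print(detect_stragglers(running, now=100))   # ['t3']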

This work will be an opportunity to learn how to conduct experiments with Hadoop/Spark and to rigorously evaluate these systems in a large-scale environment. Depending on the progress made on the detection mechanism and the speculation-aware scheduler, this work could lead to the publication of a research article.

[1] Shadi Ibrahim, Hai Jin, Lu Lu, Bingsheng He, Gabriel Antoniu, and Song Wu. 2012. Maestro: Replica-Aware Map Scheduling for MapReduce. In CCGrid'12. 59–72.

[2] Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma, Khaled Elmeleegy, Scott Shenker, and Ion Stoica. 2010. Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling. In EuroSys ’10. 265–278.

[3] Ionel Gog, Malte Schwarzkopf, Adam Gleave, Robert N. M. Watson, and Steven Hand. 2016. Firmament: Fast, Centralized Cluster Scheduling at Scale. In OSDI’16. 99–115.

[4] Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, and Ion Stoica. 2008. Improving MapReduce performance in heterogeneous environments. In OSDI’08. 29–42.

[5] Amelie Chi Zhou, Tien-Dat Phan, Shadi Ibrahim, and Bingsheng He. 2018. Energy-Efficient Speculative Execution Using Advanced Reservation for Heterogeneous Clusters. In ICPP'18.


Master Internship [On mitigating data skew in Big Data applications under failures]

Advisors: Shadi Ibrahim, Thomas Lambert (STACK team), Jad Darrous (Avalon team)

Main contacts: shadi.ibrahim (at) inria.fr

Application deadline: as early as possible

Location: Inria, Nantes

Failures are part of everyday life, especially in today's datacenters, which comprise thousands of commodity hardware and software components. For instance, Dean [1] reported that, in the first year of usage of a cluster at Google, there were around a thousand individual machine failures and thousands of hard-drive failures. Consequently, MapReduce was designed with hardware failures in mind. In particular, Hadoop handles machine failures (i.e., fail-stop failures) by re-executing all the tasks of the failed machines (i.e., executing recovery tasks), leveraging data replication. This may introduce a huge overhead, especially if the failure occurs after the map phase has completed, as the map tasks of the failed machine need to be re-executed and their output retransferred to the reducers [2].

There is some existing work on improving the availability of intermediate data (map outputs) by storing it in HDFS with several replicas [3]; however, this may come at the cost of high storage and network overhead. In this internship, we want to investigate new techniques to improve intermediate data availability while ensuring load balance (i.e., mitigating data skew and hotspots [4]) when data are transferred to the reduce tasks. A simple illustration of skew-aware partitioning is sketched below.
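
For illustration, the sketch below shows a simple greedy, frequency-aware key-to-reducer assignment, much simpler than LEEN [4] itself: keys are sorted by their record count and each key is assigned to the currently least-loaded reducer. All names below are hypothetical.

    # Hypothetical sketch of skew-aware key partitioning: instead of hash(key) % R,
    # assign keys to reducers greedily, heaviest keys first, always to the currently
    # least-loaded reducer (a classic longest-processing-time heuristic).
    import heapq

    def partition_keys(key_counts, num_reducers):
        """key_counts: dict key -> number of intermediate records.
        Returns dict key -> reducer index."""
        loads = [(0, r) for r in range(num_reducers)]   # min-heap of (load, reducer)
        heapq.heapify(loads)
        assignment = {}
        for key, count in sorted(key_counts.items(), key=lambda kv: -kv[1]):
            load, reducer = heapq.heappop(loads)
            assignment[key] = reducer
            heapq.heappush(loads, (load + count, reducer))
        return assignment

    # Toy example with a heavily skewed key distribution.
    counts = {"a": 1000, "b": 900, "c": 100, "d": 90, "e": 10}
    print(partition_keys(counts, num_reducers=2))
    # {'a': 0, 'b': 1, 'c': 1, 'd': 0, 'e': 1} -> reducer loads 1090 vs 1010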

This work will be an opportunity to learn how to conduct experiments with Hadoop/Spark and to rigorously evaluate these systems in a large-scale environment. Depending on the progress and results, this work could lead to the publication of a research article.

[1] J. Dean, “Large-scale distributed systems at Google: Current systems and future directions,” keynote speech at the 3rd ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware, Big Sky, MT, USA, 2009.

[2] F. Dinu and T. E. Ng, “Understanding the effects and implications of compute node related failures in hadoop,” in Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing (HPDC’12), 2012, pp. 187–198.

[3] Ko, S.Y., Hoque, I., Cho, B. and Gupta, I., 2009, May. On Availability of Intermediate Data in Cloud Computations. In HotOS.

[4] Ibrahim, S., Jin, H., Lu, L., Wu, S., He, B. and Qi, L., 2010, November. LEEN: Locality/Fairness-Aware Key Partitioning for MapReduce in the Cloud. In Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on (pp. 17-24). IEEE.


© Shadi Ibrahim 2018