Resource-Efficient Distributed Computer Systems for Data-Intensive Applications
I research adaptive resource management methods that make it easier to run distributed data-intensive applications (e.g. data analytics, scientific workflows, AI/ML) efficiently on computing infrastructures ranging from resource-constrained devices to large-scale clusters. I enjoy working with collaborators and partners and taking an iterative systems research approach.
Most organizations today make use of data-driven and AI-enabled applications, be it to search through billions of websites, to recommend songs or TV shows to millions of users, to accurately identify genetic disorders by comparing terabytes of human genomic data, to monitor current environmental conditions in urban areas using large sensor networks, or to detect fraudulent behavior in millions of business transactions. For this, businesses, research institutions, municipalities, and other large and small organizations employ scalable, fault-tolerant distributed systems (e.g. for data analytics and AI/ML) and distributed computing infrastructures.
The distributed computing infrastructures used for data-intensive applications are becoming increasingly diverse, distributed, and dynamic. Beyond the data center, there are edge and fog resources as well as IoT devices. These make it possible to run applications closer to data sources and users, allowing for lower latencies, improved security and privacy, and reduced energy consumption for wide-area networking, but they also create distinctly heterogeneous new computing environments. At the same time, cloud data centers are becoming more diverse as well. In public clouds, users can choose among hundreds of different virtual machine types, including instances optimized for compute, memory, and storage or for accelerated computing. Similarly, dedicated compute infrastructures at larger organizations are becoming more heterogeneous. Scientists at universities, for instance, often have access to several clusters, each potentially with multiple different types of machines, such as machines equipped with large amounts of memory or graphics processors.
It remains very difficult to run scalable distributed systems on today’s diverse and dynamic distributed computing environments in such a way that applications provide the required performance and dependability yet also run efficiently. Even for expert users, configuring distributed systems to run as required on a particular computing infrastructure is often not straightforward, since anticipating the behavior of distributed systems on specific infrastructures is inherently difficult: it depends on many factors, there are usually numerous configuration options and large parameter spaces, and environments and workloads often change dynamically over time.
As a result, data-intensive distributed applications deployed in practice commonly suffer from low resource utilization, severe failures, and limited energy efficiency. Meanwhile, computing's environmental footprint already rivals that of aviation and is projected to rise sharply over the coming years and decades. As an increasing number of businesses, scientific organizations, municipalities, and government bodies develop and deploy data- and AI-driven applications, it is therefore critical, both economically and environmentally, that computing infrastructures are used efficiently.
The main objective of my work is to support organizations and users in making efficient and dependable use of distributed computing infrastructures for their data-driven and AI-enabled applications. Towards this goal, I work with doctoral and postdoctoral researchers, collaborators, and partners to develop novel methods and useful tools that make the design, implementation, testing, and operation of resource-efficient and resilient distributed systems easier. Ultimately, we aim to realize systems that adapt automatically and autonomously to diverse computing infrastructures, dynamically changing workloads, and high-level performance and dependability requirements.
In line with this, we are working towards more adaptive resource management for data-intensive applications running on distributed computing infrastructures – from small IoT devices to large clusters of virtual resources – following three strategies:
Central to realizing adaptive, data-driven resource management according to high-level objectives and constraints are techniques for effectively modeling and optimizing the performance, dependability, and efficiency of distributed data-intensive applications. In addition, we investigate methods for automated and efficient monitoring, profiling, and experimentation (e.g. using sampling, hybrid testbeds, and simulations).
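To give a flavor of the kind of modeling and optimization this involves, the sketch below fits a simple parametric scale-out model to a handful of profiling runs and then selects the smallest cluster size predicted to meet a runtime target. It is a minimal, hypothetical example: the model form, the sample measurements, and all names (e.g. deadline_s) are illustrative assumptions, not a description of a specific system of ours.

```python
import numpy as np

# Hypothetical profiling runs: (number of worker nodes, measured runtime in seconds).
# In practice, such samples could come from monitoring, profiling, or dedicated test runs.
samples = [(2, 1950.0), (4, 1100.0), (8, 680.0), (16, 520.0)]

# Simple parametric scale-out model (an illustrative assumption):
#   runtime(n) ~ a * (1/n) + b * log(n) + c
def features(n):
    return np.array([1.0 / n, np.log(n), 1.0])

X = np.array([features(n) for n, _ in samples])
y = np.array([t for _, t in samples])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit of a, b, c

def predicted_runtime(n):
    return float(features(n) @ coeffs)

# High-level objective: meet a runtime target with as few nodes as possible.
deadline_s = 700.0  # hypothetical runtime target
suitable = [n for n in range(2, 65) if predicted_runtime(n) <= deadline_s]

if suitable:
    n_best = min(suitable)
    print(f"Smallest cluster size predicted to meet the target: {n_best} nodes "
          f"({predicted_runtime(n_best):.0f} s predicted)")
else:
    print("No candidate cluster size is predicted to meet the runtime target.")
```

In practice, such models are only one building block: they would be refined continuously with monitoring and profiling data and combined with further objectives and constraints, for example regarding dependability or energy efficiency.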
Systems research: I mostly do empirical systems research. Together with doctoral and postdoctoral researchers, collaborators, and partners, I therefore evaluate new ideas by implementing them prototypically in the context of relevant open-source systems (such as Kubernetes, Flink, Spark, Nextflow, Flower, and FreeRTOS) and by conducting experiments on actual hardware, with authentic applications and real-world data. For this, we use state-of-the-art infrastructures, including diverse cluster infrastructures, private and public clouds, as well as IoT devices and sensors.
Iterative approach: I am convinced that iterative processes and short feedback cycles are vital for research. We consequently follow a multi-stage approach: we first present new ideas at focused workshops and in work-in-progress tracks of conferences, then submit rigorously evaluated work to the main tracks of renowned international conferences, and finally, where sensible, compile more extensive findings for publication in reputable journals. At the same time, I believe it is essential to also be involved in interdisciplinary, applied, joint research projects with partners from both the private and public sectors. Such projects, in my view, offer unique opportunities to experience relevant problems first-hand and thereby uncover new avenues for well-motivated, impactful foundational research.
Research teams and training: I place great value on building a close-knit, cohesive research team around a shared, specific agenda, so that there are ample opportunities to involve multiple perspectives, learn from each other, get feedback, and iterate. To support this, I facilitate a variety of opportunities to connect (e.g. lab meetings, lunches, coffee breaks, seminars, away days, retreats), and I make a concerted effort to integrate new team members into my group and the institutional environment. In supervision, I carefully explain the changes I suggest and repeatedly provide detailed feedback on paper drafts. In addition, I organize research seminars and opportunities for my PhD students to collaborate with other groups.
Open exchange: Believing in discourse and feedback, I frequently collaborate with researchers from other academic institutions and with industry. I also actively serve the international scientific community by co-organizing international conferences and workshops and by reviewing research manuscripts and grant applications. Moreover, we make our results available as widely and quickly as possible via open-access publications, open-source software, and research-led teaching.