OptSCORE
Local project leader
Prof. Hans P. Reiser
Research team members
Johannes Köstler
Project partners
- Institute of Distributed Systems, Ulm University (Prof. Franz J. Hauck)
Summary
Networked IT systems today face high demands on reliability, availability, and security. Replication of services is a fundamental mechanism for meeting these requirements. To achieve scalability at the same time, many approaches exist, particularly for storage services, that settle for weak consistency. Some services, however, require stronger consistency guarantees for their replicated data. This applies, for example, to coordination services such as ZooKeeper, the HDFS NameNode, or identity management services. Weak consistency models are also unsuitable under Byzantine fault models, because values that diverge for consistency reasons cannot be distinguished from faulty ones. This project therefore focuses on replication procedures suitable for services with strong consistency requirements and for Byzantine fault models.
State-machine replication (SMR) is an established replication method. It combines a distributed agreement or a totally ordered multicast with a deterministic execution of all operations. These mechanisms are very complex and expose a wide range of configurable parameters, from the choice of protocol to the setting of timeout values. Today, deterministic execution is usually achieved by processing requests sequentially, which is unacceptable given the increasing prevalence of multi-core systems.
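The SMR principle described above, a total order of requests plus deterministic execution, can be illustrated with a minimal sketch. It is not the project's implementation: the `KeyValueReplica` class, the `replicate` helper, and the request format are hypothetical, and the agreement protocol is assumed to have already produced the ordered log.

```python
from dataclasses import dataclass, field


@dataclass
class KeyValueReplica:
    """Hypothetical deterministic state machine: same ordered input, same state."""
    state: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries executed so far

    def apply(self, request):
        # Requests are executed strictly one after another (sequential execution).
        op, key, value = request
        if op == "put":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        self.applied += 1


def replicate(ordered_log, replica_count):
    """Feed the same totally ordered log to every replica."""
    replicas = [KeyValueReplica() for _ in range(replica_count)]
    for request in ordered_log:
        for replica in replicas:
            replica.apply(request)
    return replicas


# Assumption: the agreement protocol has already fixed this order.
ordered_log = [
    ("put", "leader", "node-1"),
    ("put", "epoch", 7),
    ("delete", "leader", None),
    ("put", "leader", "node-2"),
]
replicas = replicate(ordered_log, replica_count=4)
```

Because every replica applies the same requests in the same order and `apply` has no source of nondeterminism, all replicas end in the same state. The sequential loop in `apply` is exactly the bottleneck on multi-core machines that deterministic multithreading aims to remove.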
In practice, the behavior of a system depends, among other things, on communication latencies, network throughput, the frequency of errors, the number of parallel CPUs, and the internal concurrency of the application. The overall goal of this project is to explore dynamically adaptable algorithms for group communication and deterministic multithreading, as well as strategies for self-configuration and self-optimization of both. As a result, we expect fundamental insights into the relationships between environmental conditions, application behavior, and configuration parameters or the various algorithms that can be used.
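As one illustration of how a configuration parameter could follow environmental conditions, the sketch below adapts a timeout value to observed communication latencies. It borrows the smoothed round-trip estimator used for TCP retransmission timers (RFC 6298) and is not taken from the project; the `AdaptiveTimeout` class and its parameter names are hypothetical.

```python
class AdaptiveTimeout:
    """Hypothetical self-adjusting timeout based on observed latencies.

    Uses the RFC 6298 style estimator: a smoothed latency plus a
    multiple of the smoothed latency variation.
    """

    def __init__(self, initial_timeout=1.0, alpha=0.125, beta=0.25, k=4.0):
        self.alpha = alpha      # weight of a new sample in the smoothed latency
        self.beta = beta        # weight of a new sample in the variation estimate
        self.k = k              # safety factor applied to the variation
        self.srtt = None        # smoothed latency, unset until the first sample
        self.rttvar = None      # smoothed latency variation
        self.timeout = initial_timeout

    def observe(self, latency):
        """Update the estimate with one measured latency (in seconds)."""
        if self.srtt is None:
            # First sample initializes both estimators.
            self.srtt = latency
            self.rttvar = latency / 2
        else:
            # The variation is updated first, using the previous smoothed latency.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - latency)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * latency
        self.timeout = self.srtt + self.k * self.rttvar
        return self.timeout


timer = AdaptiveTimeout()
first_timeout = timer.observe(0.1)   # a single low-latency sample
for _ in range(200):                 # the network then settles at 0.5 s
    timer.observe(0.5)
```

With stable latencies the variation term decays and the timeout approaches the observed latency, while a sudden change temporarily widens the safety margin. A rigidly configured timeout would be either too aggressive or too conservative in one of the two phases.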
A prototype of a reconfigurable, self-adapting group communication system and of a self-optimizing deterministic scheduler will be designed and integrated into a framework for replicated services, which will then enable practical evaluations. We expect self-adapting systems to perform better than rigidly configured or non-configurable systems. The project thus makes a fundamental contribution toward bringing SMR-based systems closer to practical use.
Funding
Deutsche Forschungsgemeinschaft
2021
DOI: 10.1145/3493499.3493501
2020
DOI: 10.1109/TDSC.2020.3030605
2019
2018
2016
Talks and other publications
Johannes Köstler, Hans P. Reiser |
Johannes Köstler, Hans P. Reiser: PEDSEWAN: Platform for the Evaluation of Distributed Systems in Emulated Wide-Area Networks. Presentation, Frühjahrstreffen der GI-Fachgruppe Betriebssysteme, Graz, Feb. 2016