Projects
SENLP - Software Engineering knowledge of NLP models
The transformer architecture has changed the field of Natural Language Processing (NLP) and paved the way for models such as BERT and GPT. These models have in common that they use transfer learning in the form of pre-training to learn a general representation of language, which can then be fine-tuned or prompted to perform various downstream tasks. While these models achieve remarkable results on a variety of NLP tasks, it is often unclear why they perform well on specific tasks and how well they work in different domains, such as Software Engineering (SE). In our prior work, we studied the impact of domain-specific pre-training on NLP tasks within the SE domain and found that, for polysemous words like "bug" (insect vs. defect) or "root" (plant vs. user), domain-specific pre-training helped the models capture the SE meaning and also led to better performance on domain-specific downstream tasks.
Within this project, we want to deepen our understanding of the capability of NLP models to capture concepts from the SE domain, with a focus on SE definitions and commonsense knowledge. We will use the analogy of NLP models as students to understand how they would perform in SE exams. For example, we will test whether the models contain accurate SE definitions and terminology: Can they spot the correct definition of a term in a multiple-choice test? Can they generate accurate definitions given a prompt? Can they recognize whether two definitions are synonymous? Can they differentiate between similar concepts with important differences and, given a prompt, even explain those differences? A known limitation of large language models in the general domain is that they always answer, even when the input is based on wrong assumptions. We will investigate whether the same holds in the SE domain, e.g., by looking at how models react to prompts asking which tools can be used to execute automated manual tests or what the best object-oriented design patterns for Haskell are. Through our work, we not only try to identify whether we get nonsense responses, but also whether we can find methods to infer that generated responses are nonsense, as is possible in the general domain.
Additionally, we study the above aspects for different types of models: smaller models with an encoder-only transformer architecture (e.g., BERT), larger encoder-only models (e.g., RoBERTa), models with variations of the transformer architecture that allow for longer contexts (e.g., Big Bird), GPT-style decoder-only models (e.g., GPT-NeoX), and encoder-decoder models (e.g., T5). We will consider both SE-specific pre-training and models trained on general-domain data. Since some general-domain models were already pre-trained on corpora that included SE data, this also allows us to understand whether SE knowledge is sufficiently captured when it is only a small part of a very large data set.
Deutsche Forschungsgemeinschaft (DFG) - Project number 524228075
Duration: from 2024
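To make the multiple-choice probing described above more concrete, the following is a minimal sketch, not the project's actual evaluation code: it scores the options of a hypothetical definition question with an off-the-shelf causal language model by comparing the average per-token log-likelihood of each option. The model name (gpt2, as a stand-in for GPT-style models), the prompt template, and the example question are illustrative assumptions.

```python
# Minimal sketch: multiple-choice definition probing with a causal LM.
# Model, prompt template, and example question are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in; larger decoder-only models could be swapped in
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def option_score(term: str, definition: str) -> float:
    """Average per-token log-likelihood of 'term is definition' under the model."""
    text = f"In software engineering, a {term} is {definition}"
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # The returned loss is the mean negative log-likelihood over all tokens.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return -loss.item()

# Hypothetical multiple-choice item about the SE meaning of "regression test".
term = "regression test"
options = [
    "a test that checks whether previously working functionality still works after a change",
    "a statistical model that predicts a continuous outcome from input variables",
    "a test that is executed manually by the end user before release",
]

scores = [option_score(term, option) for option in options]
best = max(range(len(options)), key=lambda i: scores[i])
print(f"Model prefers option {best}: {options[best]}")
```

Length-normalized likelihood is only one possible scoring choice; encoder-only models such as BERT would instead require, e.g., masked-token or pseudo-log-likelihood scoring.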
Externally Funded Projects
KI4ALL – Interdisciplinary teaching of data-centric method and application competences (2021-ongoing)
Artificial intelligence (AI) is everywhere and quickly becoming a standard method for solving problems with data. Within this project, we develop new and enhance existing teaching materials for AI that everyone can use. We follow an interdisciplinary approach, in which we also produce materials for teaching AI in different domains, e.g., health care, transport, or engineering.
This is a joint project of TU Clausthal, TU Braunschweig, and Ostfalia HAWK, in which we are involved as an external partner.
Funded by the BMBF for four years
DEFECTS – Comparable and Externally Valid Software Defect Prediction (2018-2022)
The comparability and reproducibility of empirical software engineering research is, for the most part, an open problem. This statement holds true for the field of software defect prediction. Within this project, we create a solid foundation for comparable and externally valid defect prediction research. Our approach rests on three pillars. The first pillar is the quality of the data we use for defect prediction experiments; current studies on data quality do not cover the impact of mislabeled data. The second pillar is the replication of the current state of the art. The third pillar is guidelines for defect prediction research. If we cannot get researchers to avoid the anti-patterns that lead to low validity of results, our efforts to combat the replication crisis in defect prediction research will only have a short-term effect. To make our results sustainable, we will work together with the defect prediction community to define guidelines that allow researchers to conduct their defect prediction experiments in such a way that we hopefully never face such problems with replicability again.
GAIUS – Maintenance activities for the sustainability of AUGUSTUS (2018-2022)
AUGUSTUS is a tool for the structural annotation of genes in genomic sequences. Within this joint project with Prof. Dr. Mario Stanke from the University of Greifswald, we will work on the maintenance of AUGUSTUS. While the prediction methods of AUGUSTUS were advanced over the years, the maintainability and sustainability of the software did not receive the same attention. This is highlighted by usability issues, but also by general issues within the codebase. This project focuses solely on the maintenance of AUGUSTUS to improve upon this, i.e., to improve the usability of AUGUSTUS as well as the maintainability of the codebase.
Pilot Study: Defect Prediction at Continental (April 2017-December 2017)
SmartSHARK is a versatile tool for software repository mining. Within this project, we performed a pilot study in cooperation with Continental GmbH to assess the potential benefits of using SmartSHARK for defect prediction on C and C++ software developed in-house at Continental.