Details

DFG-Funded Project TYPES4STRINGS: Searching for Errors in Character Strings

A team from the University of Passau, led by Professor Gordon Fraser, is developing tools for checking strings in programming languages. The aim is to detect string-related errors during the testing and production of programs, with the help of generative language models, among other things.

‘Strings are the Swiss army knife of data structures,’ says Professor Gordon Fraser, holder of the Chair of Software Engineering II at the University of Passau. Programs use strings to represent all kinds of textual data, including names, credit card numbers, email addresses, URLs, bank accounts, colour codes and much more.

A team from the University of Passau, led by Professor Fraser, is now using the specific properties of such strings to find new ways of detecting string-related software errors. In cooperation with the Technical University of Vienna and the CISPA Helmholtz Center for Information Security, the researchers in the DFG-funded project TYPES4STRINGS are developing a method for creating string types that can check string results for lexical, syntactic and semantic correctness.

The background to the project: ‘Programming languages offer little support for checking whether the contents of these strings actually meet expectations’, says Professor Fraser. Most programming languages and type systems leave such checks to the operating systems, if they are carried out at all. Consequently, a string that should contain a phone number could suddenly contain an SQL query, and a string that should contain a colour could instead contain executable JavaScript code, with all the resulting problems. This can lead to functional errors or to common attacks such as script injection or SQL injection.

Massive automated testing of string results

The TYPES4STRINGS project therefore aims to introduce string types that express the valid values of strings in an understandable way using formal languages, such as regular expressions and grammars. This should make it possible to determine which sets of strings are acceptable as values and to check, both dynamically and statically, whether a program is correct with regard to the specified string types. Furthermore, this comprehensible representation is to be created automatically for given programs, in part using generative language models.
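To illustrate the idea of a dynamically checked string type, here is a minimal sketch in Python. It is not the project's implementation: the `StringType` class, the `HEX_COLOUR` type and the `set_background` function are hypothetical names chosen for this example, and the regular expression covers only six-digit hexadecimal colour codes.

```python
import re


class StringType:
    """Hypothetical string type: a name plus a regular expression
    describing the set of acceptable values."""

    def __init__(self, name: str, pattern: str) -> None:
        self.name = name
        self._regex = re.compile(pattern)

    def accepts(self, value: str) -> bool:
        # fullmatch requires the whole string to match, not just a prefix
        return self._regex.fullmatch(value) is not None

    def check(self, value: str) -> str:
        """Dynamic check: return the value unchanged or raise ValueError."""
        if not self.accepts(value):
            raise ValueError(f"{value!r} is not a valid {self.name}")
        return value


# Illustrative string type: six-digit hexadecimal colour codes such as '#1a2b3c'
HEX_COLOUR = StringType("hex colour", r"#[0-9a-fA-F]{6}")


def set_background(colour: str) -> str:
    # Without the check, any string at all would be accepted here
    return f"background-color: {HEX_COLOUR.check(colour)};"
```

With this sketch, `set_background("#FFFFFF")` returns a style declaration, while a value containing JavaScript code is rejected with a `ValueError` before it reaches the output. A static check, which the project also targets, would need to reason about the program without running it and is not shown here.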

The use of formal languages makes it possible to generate instances of strings for different string types automatically. This in turn allows for massive automated testing of string-processing functions, with string types checking the string results for lexical, syntactic and semantic correctness. Finally, the project will introduce ways of learning such specifications from the code and its executions, so that string types can be adopted easily.
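The following Python sketch shows how a grammar can drive this kind of automated testing. It is an illustration under stated assumptions, not the project's tooling: the grammar, the function under test (`invert`) and the helper names are all hypothetical. Inputs are generated from a small grammar, the result is checked against a regular expression (a lexical and syntactic check), and a round-trip property stands in for a semantic check.

```python
import random
import re

# Hypothetical grammar for lowercase hex colours; nonterminals are in <...>
GRAMMAR = {
    "<colour>": [["#", "<hex>", "<hex>", "<hex>", "<hex>", "<hex>", "<hex>"]],
    "<hex>": [[c] for c in "0123456789abcdef"],
}

# Hypothetical string type for the results of the function under test
RESULT_TYPE = re.compile(r"#[0-9a-f]{6}")


def generate(symbol: str, rng: random.Random) -> str:
    """Expand a symbol by choosing a random alternative per nonterminal."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit as-is
    expansion = rng.choice(GRAMMAR[symbol])
    return "".join(generate(part, rng) for part in expansion)


def invert(colour: str) -> str:
    """Function under test: return the complementary colour of '#rrggbb'."""
    value = int(colour[1:], 16)
    return f"#{0xFFFFFF - value:06x}"


def run_tests(count: int, seed: int = 0) -> int:
    """Generate `count` inputs and check every result; return the count."""
    rng = random.Random(seed)
    for _ in range(count):
        colour = generate("<colour>", rng)
        result = invert(colour)
        # Lexical/syntactic check: the result must match the string type
        assert RESULT_TYPE.fullmatch(result), result
        # Semantic check: inverting twice must give back the input
        assert invert(result) == colour, (colour, result)
    return count
```

Calling `run_tests(1000)` exercises `invert` on a thousand generated colour strings. The grammar here is deliberately non-recursive, so generation always terminates; a recursive grammar would need a depth limit.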

In addition to the University of Passau, Professor Jürgen Cito from the Technical University of Vienna and Professor Andreas Zeller from the CISPA Helmholtz Center for Information Security are involved in TYPES4STRINGS. The German Research Foundation (DFG) is funding the project for a period of three years.

This text was machine-translated from German.

Principal Investigator(s) at the University	Prof. Dr. Gordon Fraser (Chair of Software Engineering II)
Project period	01.09.2024 - 31.08.2027
Source of funding	DFG - Deutsche Forschungsgemeinschaft > DFG - Sachbeihilfe