Bachelor and Master thesis projects
We are always looking for computer science or physics students with experience in Python and machine learning, and ideally some initial experience with high-performance computing (HPC), for exciting and ambitious thesis projects.
Master thesis in computer science: Large language models for the automatic extraction of data from scientific literature
The goal of this project is to use state-of-the-art large language models (LLMs) to extract tabular data from scientific literature. Multiple models will be tested, fine-tuned and included in various extraction strategies in order to maximize extraction accuracy. A manually annotated dataset, developed in cooperation with materials scientists at KIT, will be used as a proof-of-principle test case. The downstream goal of this project is to extend existing databases used to predict the properties and synthesis conditions of as-yet unknown materials.
In your work, you will deploy and fine-tune pretrained LLMs (e.g. for text generation or question answering) on HPC infrastructure (bwUniCluster and HoreKa at KIT), systematically develop extraction strategies, and apply them to our database of synthesis paragraphs. Your main task will be to develop a workflow for automatically extracting experimental parameters, such as the temperature or the solvents used in an experiment, from the literature. This will require testing and optimizing different prompts/questions and possibly also fine-tuning models on scientific literature or, more specifically, on descriptions of experimental setups. Developing strategies to generate and incorporate synthetic data may also benefit the project.
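To give a rough idea of what one such extraction step might look like, here is a minimal Python sketch. It is not our actual workflow: the prompt wording, the JSON keys `temperature_celsius` and `solvents`, and the `fake_model` stand-in are all assumptions made for illustration. In practice, `generate` would wrap a call to a real LLM.

```python
import json
from typing import Callable, Optional

# Hypothetical prompt; the wording and the JSON keys are illustrative assumptions.
PROMPT_TEMPLATE = (
    "Extract the synthesis parameters from the paragraph below.\n"
    "Answer with a JSON object with the keys "
    '"temperature_celsius" (number or null) and "solvents" (list of strings).\n\n'
    "Paragraph: {paragraph}\n"
)


def build_prompt(paragraph: str) -> str:
    """Insert one synthesis paragraph into the prompt template."""
    return PROMPT_TEMPLATE.format(paragraph=paragraph.strip())


def parse_answer(answer: str) -> Optional[dict]:
    """Return the validated record, or None if the answer is unusable."""
    # Models often wrap the JSON in extra text, so cut out the outermost braces.
    start, end = answer.find("{"), answer.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        record = json.loads(answer[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    temperature = record.get("temperature_celsius")
    solvents = record.get("solvents") or []
    # Reject booleans explicitly, since bool is a subclass of int in Python.
    if isinstance(temperature, bool) or not isinstance(
        temperature, (int, float, type(None))
    ):
        return None
    if not isinstance(solvents, list) or not all(isinstance(s, str) for s in solvents):
        return None
    return {"temperature_celsius": temperature, "solvents": solvents}


def extract(paragraph: str, generate: Callable[[str], str]) -> Optional[dict]:
    """Run one paragraph through a text-generation function and parse the result."""
    return parse_answer(generate(build_prompt(paragraph)))


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call, used only to make the sketch runnable."""
    return 'Sure: {"temperature_celsius": 120, "solvents": ["DMF", "ethanol"]}'


if __name__ == "__main__":
    print(extract("The mixture was heated at 120 °C in DMF and ethanol.", fake_model))
```

Validating the parsed answer before storing it is one simple way to catch malformed model output; comparing the resulting records against the manually annotated dataset would then give an accuracy estimate for each prompt or model variant.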
Contact: tobias.schloeder∂kit.edu, pascal.friederich∂kit.edu
Master thesis in physics: Machine learning based analysis of powder X-ray diffraction patterns
In the context of state-of-the-art automated high-throughput experiments, automatic data analysis methods are becoming essential, since manual data analysis quickly becomes infeasible. The goal of this project is the automated analysis of powder X-ray diffraction (pXRD) patterns with neural networks. pXRD is a method for determining the structure of crystals and is used in a large number of physics and materials science laboratories worldwide.
The work offers the opportunity to develop state-of-the-art ML models such as ResNet on large synthetic data sets and to apply them to the automated analysis of experimental data. Furthermore, you will learn to work with high-performance computing resources. Building on prior work by the AiMat laboratory on pXRD space group classification, you will be able to use and further develop an automated computational workflow for simulating diffractograms of synthetically generated crystal structures, which serve as training data for the supervised learning of deep CNN models (e.g. ResNet).
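As a very simplified illustration of what simulating a diffractogram involves, the following Python sketch computes only the peak positions for a primitive cubic lattice from Bragg's law. It is not the group's simulation workflow: it assumes a primitive cubic cell (so every reflection is allowed), Cu K-alpha1 radiation, and it ignores peak intensities and broadening entirely. The function names are hypothetical.

```python
import math
from itertools import product
from typing import List, Optional, Tuple

CU_K_ALPHA1_ANGSTROM = 1.5406  # assumed X-ray wavelength (Cu K-alpha1)


def two_theta_degrees(
    a: float, hkl: Tuple[int, int, int], wavelength: float = CU_K_ALPHA1_ANGSTROM
) -> Optional[float]:
    """Diffraction angle 2*theta for reflection (h, k, l) of a cubic cell with edge a.

    Uses d = a / sqrt(h^2 + k^2 + l^2) and Bragg's law, lambda = 2 d sin(theta).
    Returns None if the reflection cannot be reached at this wavelength.
    """
    h, k, l = hkl
    d_spacing = a / math.sqrt(h * h + k * k + l * l)
    sin_theta = wavelength / (2.0 * d_spacing)
    if sin_theta > 1.0:
        return None
    return 2.0 * math.degrees(math.asin(sin_theta))


def peak_positions(
    a: float, max_index: int = 2, wavelength: float = CU_K_ALPHA1_ANGSTROM
) -> List[float]:
    """Sorted, de-duplicated 2*theta positions for all indices from 0 to max_index."""
    peaks = set()
    for hkl in product(range(max_index + 1), repeat=3):
        if hkl == (0, 0, 0):
            continue  # (0, 0, 0) is not a reflection
        angle = two_theta_degrees(a, hkl, wavelength)
        if angle is not None:
            # Symmetry-equivalent reflections share one angle; rounding merges them.
            peaks.add(round(angle, 3))
    return sorted(peaks)


if __name__ == "__main__":
    # Peak positions for a hypothetical primitive cubic cell with a = 4.0 Angstrom.
    print(peak_positions(4.0))
```

A full simulation would additionally need structure factors for the peak intensities, a peak-shape model, and noise and background, which is also where the gap between synthetic and experimental patterns arises.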
There are currently several ideas for going beyond the state-of-the-art methods developed in our group and reported in the literature. The details can be discussed individually and adapted to your interests. Possible directions (not all of them have to be pursued!):
- ML model development: Further development of ML models to predict lattice parameters and unit cell structure.
- Transfer from synthetic to experimental data: Collecting and collating experimental data for transfer and testing. Synthetic diffractograms must be adapted to match experimental data.
- Multi-phase XRD: Distinguishing single-phase from multi-phase samples based on their symmetry elements. Development of appropriate synthetic data and ML models.
Contact: henrik.schopmans∂kit.edu, pascal.friederich∂kit.edu
Are you looking for a different topic?
We constantly have new ideas for potential thesis projects, so don’t hesitate to contact pascal.friederich∂kit.edu or any of our group members to find out about other open Bachelor/Master theses. Please also check out our thesis guidelines for more details on the process of a Bachelor/Master thesis in our group!