Read this first

If you contact me and it is obvious from your e-mail that you have not read this page, I will ignore your request and I will not feel bad about it. If you have an idea in mind, read this short, less-than-perfect, but better-than-nothing guide to proposing your own project. If you do not, scroll down to the list of projects I am interested in supervising. Some are better thought out than others, but I expect you to bring your own ideas to them as well. I am not the kind of supervisor who gives weekly to-do lists.

List of potential projects

NOTE: I update these projects regularly, so while you can read the current list to get an idea of the kinds of things that interest me, new ones will probably appear now and then. If you see something interesting and want to add a twist to it, feel free to reach out and discuss it with me.

Here are a number of project ideas I would like to supervise in some form. They are not fixed: you can come up with a variation on any of them and we can talk it through. They can be done at undergraduate or MSc level, but some may need slight adaptations to fit the timeline of your degree. Some are research-oriented and would suit a student aiming for further study. Others are more engineering-focused and would suit a student who wants to build something cool (hopefully). I classify projects into three broad categories: (1) projects loosely affiliated with my research group, which might build on one another; (2) one-shot projects which I think are fun and interesting but are completely unrelated to my work; (3) general lines of investigation that I am interested in, but without a clear direction (you will be expected to bring a lot more of your own ideas into these).

The following projects are ongoing lines of interest for me. Some of them have already been tackled to some extent by previous students, but that does not mean you cannot propose your own twist on them during our initial discussion phase.

Changelog

Added project(s) 1.7, a set of three different projects that aim to embed modern LLM-based AI in professional practice.
Added project 1.8 on scalable model benchmarking, in collaboration with Zijian Ling from ApplyU.

Theme 1. Human-AI Interaction and Large Language Models

Project 1.1 – Committee-of-LLMs: LLM-to-LLM dialogue and distributed decision making

In this project, you will build a platform that allows multiple (local) Large Language Models to communicate with each other. The platform will let the user upload LLM personas (in a format you will need to investigate and define) as well as a topic of debate, and will then let the personas debate the topic until they reach a conclusion. You will have to investigate ways to run multiple personas efficiently in parallel (and the hardware challenges this poses), as well as ways to hold conversations that go beyond the context size limit of standard large language models. While we have access to servers for experiments, you should ideally have a machine that can run a small local LLM in order to do this project.

Note: this project has already been done by a previous student, leading to the publication of a paper (APOLLO: An Open Platform for LLM-Based Multi-agent Interaction Research), so you will need to propose a genuinely new spin on it or a way to extend the idea further.
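To make the context-size challenge concrete, here is a minimal Python sketch of one possible debate loop. It assumes, purely for illustration, that a persona is just a name plus a plain-text system prompt and that an LLM is any function from a prompt string to a reply string; neither choice is part of the project specification. It keeps only the last few turns verbatim and folds older turns into a running summary (one common way of keeping prompts bounded), and it stops once every persona opens its reply with "AGREE" in the same round, which is a deliberately naive stopping rule.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical LLM interface: any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class Persona:
    name: str
    system_prompt: str  # assumed persona format: a plain-text description


@dataclass
class Transcript:
    window: int = 4  # number of recent turns kept verbatim in the prompt
    turns: List[str] = field(default_factory=list)
    summary: str = ""  # compressed record of all turns that left the window

    def add(self, turn: str, summarise: LLM) -> None:
        self.turns.append(turn)
        if len(self.turns) > self.window:
            # Fold the oldest turn into the running summary so the prompt stays bounded.
            oldest = self.turns.pop(0)
            self.summary = summarise(f"Summary so far: {self.summary}\nNew turn: {oldest}")

    def context(self) -> str:
        return f"Summary of earlier debate: {self.summary}\n" + "\n".join(self.turns)


def debate(topic: str, personas: List[Persona], llm: LLM, summarise: LLM,
           max_rounds: int = 5, window: int = 4) -> Tuple[int, Transcript]:
    """Round-robin debate; returns (rounds used, transcript)."""
    transcript = Transcript(window=window)
    for round_no in range(1, max_rounds + 1):
        agreed = 0
        for p in personas:
            prompt = f"{p.system_prompt}\nTopic: {topic}\n{transcript.context()}\n{p.name}:"
            reply = llm(prompt)
            transcript.add(f"{p.name}: {reply}", summarise)
            # Naive stopping rule (an assumption): a persona agrees by starting with "AGREE".
            if reply.strip().upper().startswith("AGREE"):
                agreed += 1
        if agreed == len(personas):
            return round_no, transcript
    return max_rounds, transcript
```

Running the personas sequentially, as here, sidesteps the parallel-inference question the project asks you to investigate; the summarisation step is also where information can silently get lost, which is worth measuring.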

Project 1.2 – An LLM-powered social bot that knows when to talk (and when to listen)

For embodied AI to be accepted in human society, voice interfaces need to determine accurately when their human interlocutor has finished talking. In this project, we will investigate multimodal techniques to detect when a speaker is about to finish talking. As a stretch goal, we will study the possibility of giving conversational AIs the ability to interrupt their users, and the impact of doing so on their trustworthiness.
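As a point of comparison, the simplest end-of-turn detector is a silence threshold on the audio alone, which is the kind of baseline a multimodal approach would need to beat. The Python sketch below assumes the audio has already been split into fixed-length frames with one energy value each; the 20 ms frame length, the energy threshold and the 700 ms silence window are illustrative assumptions, not tuned values.

```python
from typing import Optional, Sequence


def end_of_turn(frame_energies: Sequence[float], frame_ms: int = 20,
                energy_threshold: float = 0.01, silence_ms: int = 700) -> Optional[int]:
    """Return the index of the frame at which the turn is declared over, or None."""
    needed = silence_ms // frame_ms  # consecutive silent frames required
    silent_run = 0
    heard_speech = False  # ignore silence before the speaker has said anything
    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None
```

A fixed silence window forces a trade-off: a long window makes the system feel sluggish, while a short one cuts speakers off during mid-sentence pauses. Cues such as prosody, wording and gaze are what a multimodal detector could add to escape that trade-off.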

Project 1.3 – An LLM-powered social bot that knows what to remember, and what to forget

Retrieval-Augmented Generation (RAG) is a general framework for retrieving facts from a knowledge base and feeding them into a large language model in order to generate output grounded in a specific set of facts. The goal of this project is to investigate the use of RAG with continuously updated knowledge bases, by building a system that analyses user input for factual statements and uses them to update the knowledge base, and that also handles explicit update commands (e.g., “update my address to X”) and forget commands (e.g., “forget my address”, or “forget anything related to X”). You will need access to a machine that can run a small local LLM in order to do this project.

Note: this project has already been done by a previous student, leading to the publication of a paper (Remembering Things Makes Chatbots Sound Smarter, but Less Trustworthy), so you will need to propose a genuinely new spin on it or a way to extend the idea further.
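The bookkeeping behind update and forget commands can be illustrated with a toy in-memory store. This Python sketch deliberately replaces the LLM-based fact extraction with a few hypothetical regular-expression patterns and replaces embedding-based retrieval with exact slot-name matching. It only shows what the knowledge base has to support, not how a real system would understand the user.

```python
import re
from typing import Dict, List


class FactStore:
    """Toy knowledge base mapping a slot (e.g. 'address') to a fact string."""

    def __init__(self) -> None:
        self.facts: Dict[str, str] = {}

    def handle(self, utterance: str) -> str:
        text = utterance.strip().rstrip(".")
        # Hypothetical patterns standing in for LLM-based intent and fact extraction.
        if m := re.fullmatch(r"update my (\w+) to (.+)", text, re.IGNORECASE):
            self.facts[m.group(1).lower()] = m.group(2)  # overwrite the old value
            return f"updated {m.group(1).lower()}"
        if m := re.fullmatch(r"forget my (\w+)", text, re.IGNORECASE):
            self.facts.pop(m.group(1).lower(), None)  # forgetting a missing slot is a no-op
            return f"forgot {m.group(1).lower()}"
        if m := re.fullmatch(r"forget anything related to (.+)", text, re.IGNORECASE):
            term = m.group(1).lower()
            # "Related" is approximated by substring match on slot name or value.
            doomed = [k for k, v in self.facts.items() if term in k or term in v.lower()]
            for k in doomed:
                del self.facts[k]
            return f"forgot {len(doomed)} fact(s)"
        if m := re.fullmatch(r"my (\w+) is (.+)", text, re.IGNORECASE):
            self.facts[m.group(1).lower()] = m.group(2)  # implicit factual statement
            return f"stored {m.group(1).lower()}"
        return "no fact found"

    def retrieve(self, query: str) -> List[str]:
        """Return facts whose slot name appears in the query (stand-in for RAG retrieval)."""
        words = set(re.findall(r"\w+", query.lower()))
        return [f"{k}: {v}" for k, v in self.facts.items() if k in words]
```

The retrieved strings are what would be inserted into the LLM prompt. Deciding what counts as a fact, and what "anything related to X" should actually cover, is exactly where the regular expressions above fall short.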

Project 1.4 – Large Language Models with multiple memories

In this project I would like to explore two things: (1) potential architectures for integrating different types of memory (for example semantic, procedural, and episodic memory) into large language models in a way that leads to interesting results; and (2) the impact of those memories on how human users perceive these conversational agents. This is a large project and could potentially be done by multiple students at the same time, as it involves not only a great deal of creativity and engineering, but also a user experiment to validate the work.

Project 1.5 – Trustworthy, scrutable, and teachable intelligent agents

One way to make LLM-based agents trustworthy enough to deploy in the wild is to ensure that their reasoning can be interrogated, both from an inference perspective (making their reasoning explicit) and from a grounding perspective (making their memory scrutable). In this project, I would like to explore ways in which either or both of these can be made interactive for human users. What is the optimal way of presenting memory and reasoning traces so that users understand them? What is the optimal way to let users amend reasoning and memories in order to correct the agent?

Project 1.6 – Measuring the impact of embodiment on LLM trustworthiness and blame allocation

When a Large Language Model makes a mistake, who does the user blame: the developer, the data, or the AI itself? In this experimental study, you will investigate how the way the AI presents itself (e.g., giving it a name and a face vs. presenting it as a command-line tool) influences the allocation of blame and legal liability in the mind of the user. It will involve building the tooling to run this experiment, as well as running a user study with participants.

Project(s) 1.7 – Embedding LLMs in professional practice

Project 1.7a – Building a legal assistant

Drafting formal legal correspondence, such as a letter before action, is a knowledge-intensive task requiring familiarity with the relevant statutes, applicable case law, and established conventions of legal writing. While Large Language Models (LLMs) can generate fluent text, their outputs often lack the necessary legal accuracy, hallucinate non-existent cases, and fail to ground their arguments in verifiable sources, making them unreliable for professional use. The goal of this project is to develop and evaluate an AI assistant that overcomes these limitations by combining an LLM with a structured legal knowledge base and an autonomous agent architecture. You will design and implement a multi-component system, “The Legal AIde,” focused on drafting letters related to a specific area of UK law. From a technical perspective, the system will combine conceptual knowledge from knowledge graphs with more conventional data stored as documents.

Project 1.7b – Building a pedagogical co-pilot

Designing effective lesson plans is a knowledge-intensive task at the core of a lecturer’s professional life. It requires an understanding of pedagogical theory, basic principles of instructional design, and the specific subject matter that is being taught. However, we are increasingly burdened by administrative overhead and the high cognitive load of planning, which stifles creativity and leads to burnout. While LLMs can generate fluent lesson outlines, their outputs often lack pedagogical soundness, promote lower-order thinking, contain factual inaccuracies (hallucinations), and are not grounded in verifiable educational theory. This project aims to develop and evaluate an AI assistant that overcomes these limitations by combining an LLM with a structured knowledge base of educational theory and a specific academic domain, all centred on a human-in-the-loop philosophy. You will design and implement a multi-component system (“The Pedagogical Co-Pilot”) focused on assisting a lecturer in constructing a lesson plan and detailed lecture outlines for a specific higher education course (to be decided later).

Project 1.7c – LLM-based student simulation for guided deliberate teaching practice

Large language models are good at roleplaying. But are they good at roleplaying well-defined characters? In this project, you will build a tool that allows higher education professionals (such as lecturers) to simulate a student with specific challenges (e.g., a knowledge gap, tiredness) in order to practise their teaching technique. It will involve working with speech-to-text, text-to-speech, and (hopefully local) large language models, and running a user study with some of your favourite people: your lecturers.

Project 1.8 – Scaling the benchmark of multimodal large language models on label-intensive tasks with generative multimodal synthetic data

Note that this project was proposed by, and will be supervised in collaboration with, Zijian Ling from ApplyU (and a UoN alumnus).

Multimodal large language models (MLLMs) have recently achieved strong performance across a wide range of tasks, yet their evaluation is still dominated by benchmarks in reasoning, mathematics, coding, and general question answering. In contrast, label-intensive multimodal tasks, such as document parsing, key information extraction, and table and chart understanding, are pervasive in both academia and industry, and early evidence suggests that modern MLLMs can already compete with, and potentially replace, traditional vision–language models (VLMs) and OCR-centric pipelines on many of these workloads. However, systematic comparison is currently hindered by the scarcity of large-scale public benchmarks, largely due to annotation cost, data sensitivity, and privacy constraints. At the same time, LLM-generated synthetic data has become a prevalent paradigm for training and evaluating text-only LLMs. This project investigates whether analogous generative multimodal synthetic data can be used to construct scalable, label-rich synthetic benchmarks for evaluating MLLMs on label-intensive tasks, and uses such benchmarks to study the comparative strengths and weaknesses of MLLMs against prior VLM/OCR systems under realistic application and regulatory constraints.
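The appeal of synthetic benchmarks is that labels come for free: the ground truth is fixed before the sample is generated, so no annotation is needed. The Python sketch below shows that idea for a hypothetical key-information-extraction task on invoices. The field names are invented for illustration, `render` is a text stand-in for what would really be a generative image model, and field-level exact-match accuracy is only one simple choice of metric.

```python
import random
from typing import Dict, List

FIELDS = ("invoice_no", "date", "total")  # hypothetical schema for illustration


def synth_labels(rng: random.Random) -> Dict[str, str]:
    """Sample ground-truth labels first; the document is generated from them."""
    return {
        "invoice_no": f"INV-{rng.randint(1000, 9999)}",
        "date": f"2024-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}",
        "total": f"{rng.randint(1, 5000)}.{rng.randint(0, 99):02d}",
    }


def render(labels: Dict[str, str]) -> str:
    # Stand-in for a generative model producing a document image from the labels.
    return (f"Invoice {labels['invoice_no']} dated {labels['date']}. "
            f"Amount due: {labels['total']} GBP")


def field_accuracy(gold: List[Dict[str, str]], pred: List[Dict[str, str]]) -> Dict[str, float]:
    """Exact-match accuracy per field; missing predicted fields count as wrong."""
    return {f: sum(g[f] == p.get(f) for g, p in zip(gold, pred)) / len(gold)
            for f in FIELDS}


rng = random.Random(0)
gold = [synth_labels(rng) for _ in range(100)]
documents = [render(labels) for labels in gold]  # what the model under test would see
```

Predictions from an MLLM and from an OCR pipeline could then be scored against the same `gold` list, which is what makes a like-for-like comparison cheap to scale.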

Theme 2. Applied Machine Learning and Modelling

Project 2.1 – Modelling brain data using dynamic system theory

This project will be done as part of the Brain and Physiological Data Group and in collaboration with Dr. Mustafa Hamada from Mendi. Dynamic System Theory (DST) provides a mechanistic framework for understanding how complex systems, e.g. the brain, change over time. It emphasises interaction and timing within the system and between the system and its environment, moving beyond static descriptions of localised activity. DST highlights key properties like stability, flexibility, nonlinearity, and history dependence in neural processing. It is widely used to explain phenomena like tipping points, cycles, and chaos in various complex systems. In this project, you will design, build, and evaluate one or more DST-based model(s) (benchmarked against more standard machine learning approaches) in order to predict properties of brain data. You will then build a small prototype application that can stream data directly from an fNIRS device in order to classify the mental workload of the people wearing it.

Project 2.2 – Building a brain-responsive user interface

Current interfaces display information regardless of the user’s capacity to process it. In this project, you will build a simple reading application that adapts text complexity based on real-time brain activity. You will train a mental workload classifier (using the Tufts fNIRS2MW dataset) to distinguish between low and high cognitive load. You will then integrate this classifier with a local Large Language Model: when the system detects high workload, it will automatically prompt the LLM to simplify the text. This project effectively builds a closed-loop system in which physiological data controls a generative AI agent.
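The closed loop can be sketched as a small controller sitting between the workload classifier and the LLM. In this Python sketch the classifier is assumed to emit a probability of high workload for each time window, `request_simplification` is a hypothetical hook that would prompt the local LLM, and the smoothing window and both thresholds are illustrative values rather than anything derived from fNIRS2MW. Using separate on and off thresholds (hysteresis) is one way to stop the text from flickering between versions when the estimate hovers around a single cut-off.

```python
from collections import deque
from typing import Callable, Iterable, List


def workload_controller(probabilities: Iterable[float],
                        request_simplification: Callable[[], None],
                        window: int = 5, on: float = 0.7, off: float = 0.4) -> List[bool]:
    """Return the high/low workload state after each classifier output."""
    recent = deque(maxlen=window)  # moving average smooths noisy per-window predictions
    high = False
    states = []
    for p in probabilities:
        recent.append(p)
        mean = sum(recent) / len(recent)
        if not high and mean >= on:
            high = True
            request_simplification()  # fire once on the low -> high transition
        elif high and mean <= off:
            high = False  # user has recovered; stop simplifying further
        states.append(high)
    return states
```

Firing only on the transition, rather than on every high-workload window, keeps the LLM from repeatedly re-simplifying text the user is already reading.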

Theme 3. Natural Language Processing

Project 3 – Cross-lingual disinformation tracking

Most misinformation detection tools focus on English. This project investigates the propagation of specific narratives across language barriers. You will build a pipeline that identifies a false claim in one language (e.g., English) and attempts to detect its translated variants in a target low-resource language, analysing how the narrative adapts to the new cultural context.
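One plausible core for the detection step is to embed the original claim and candidate posts in a shared multilingual vector space and rank candidates by cosine similarity. The Python sketch below assumes those embeddings have already been computed by some multilingual sentence encoder (not shown, and not specified by the project); the toy vectors and the 0.8 threshold are purely illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_variants(claim_vec: Sequence[float],
                  candidates: Dict[str, Sequence[float]],
                  threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Return (candidate id, similarity) pairs above threshold, most similar first."""
    scored = [(doc_id, cosine(claim_vec, vec)) for doc_id, vec in candidates.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)


# Toy 3-d vectors standing in for real sentence embeddings.
claim = [1.0, 0.0, 0.0]
posts = {"post_a": [0.9, 0.1, 0.0], "post_b": [0.0, 1.0, 0.0], "post_c": [1.0, 0.0, 0.0]}
```

Similarity search only surfaces candidate variants; judging how the narrative was reshaped for the new cultural context would still require a closer analysis of the matched posts.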

Jeremie Clos

Project ideas (UG/PGT)