I take inspiration from neural dynamics, translating the brain’s mechanisms into AI that is interpretable, efficient, and robust for real-world deployment. Living organisms anticipate, coordinate, and adapt with remarkable precision, as when humans synchronize to a musical beat or localize a sound in a noisy room. By modeling neural rhythms and Hebbian learning, I create algorithms that replicate these abilities, supporting applications from music therapy and motor rehabilitation to human-AI collaboration.
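As a minimal illustration of the idea (a sketch, not the lab's actual models), the snippet below shows a single phase oscillator that entrains to a periodic "beat" by adapting its frequency with a Hebbian-style rule, in the spirit of adaptive-frequency-oscillator accounts of beat synchronization. All parameters and the stimulus are illustrative assumptions.

```python
import numpy as np

# Sketch only: a phase oscillator entrains to a periodic "beat" by adapting
# its natural frequency with a Hebbian-style rule. Parameters are arbitrary
# illustrative choices, not values from any published model.

dt = 0.001                       # Euler integration step (s)
t = np.arange(0.0, 20.0, dt)     # 20 s of simulation
beat_hz = 2.0                    # stimulus beat rate (120 BPM)
stimulus = np.cos(2 * np.pi * beat_hz * t)

phase = 0.0                      # oscillator phase (rad)
omega = 2 * np.pi * 1.7          # initial natural frequency (rad/s), mistuned
k = 5.0                          # coupling / adaptation gain

freq_trace = np.empty_like(t)
for i, s in enumerate(stimulus):
    coupling = k * s * np.sin(phase)
    phase += dt * (omega - coupling)   # phase dynamics pulled by the beat
    omega += dt * (-coupling)          # Hebbian-style frequency adaptation
    freq_trace[i] = omega / (2 * np.pi)

print(f"start: {freq_trace[0]:.2f} Hz -> end: {freq_trace[-1]:.2f} Hz "
      f"(beat: {beat_hz:.2f} Hz)")
```

Because the same error signal drives both the phase and the frequency update, the oscillator's frequency drifts toward the stimulus rate and then stays locked, which is the basic mechanism behind beat anticipation in oscillator models.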
A central challenge in AI is the scarcity of diverse, high-quality data. To address this, I developed SpatialScaper, an open-source, multimodal tool for simulating realistic 3D sound scenes, enabling AI systems to learn sound localization entirely from simulation. I have also pioneered self-supervised methods that allow machines to construct high-resolution, real-time “sound maps.” These can be used to teach robots to track voices in noisy factories or to help security systems pinpoint the location of incidents.
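To make the simulation-to-localization idea concrete, here is a generic sketch of the underlying principle: synthesize a source arriving at a two-microphone array from a known angle, then recover that angle from the inter-microphone time difference using GCC-PHAT. This is standard array-processing material shown for illustration; it is not SpatialScaper's API nor the lab's self-supervised method, and all names and parameters are assumptions.

```python
import numpy as np

# Generic illustration (not SpatialScaper or the lab's method): simulate a
# source at a known angle for a 2-mic array, then estimate the direction
# from the time difference of arrival via GCC-PHAT.

fs = 16_000                      # sample rate (Hz)
c = 343.0                        # speed of sound (m/s)
mic_spacing = 0.2                # distance between the two microphones (m)
true_angle = np.deg2rad(40.0)    # source direction relative to broadside

# Far-field model: the wavefront reaches the second mic tau seconds later.
tau = mic_spacing * np.sin(true_angle) / c
delay_samples = int(round(tau * fs))

rng = np.random.default_rng(0)
src = rng.standard_normal(fs)                    # 1 s of broadband "source"
mic1 = src + 0.05 * rng.standard_normal(fs)      # reference channel + noise
mic2 = np.roll(src, delay_samples) + 0.05 * rng.standard_normal(fs)

# GCC-PHAT: whiten the cross-power spectrum, transform back to time lags.
n = 2 * fs
X1, X2 = np.fft.rfft(mic1, n), np.fft.rfft(mic2, n)
cross = X1 * np.conj(X2)
cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)
cc = np.concatenate((cc[-fs:], cc[:fs]))         # put zero lag at index fs

max_lag = int(np.ceil(mic_spacing / c * fs)) + 1
lags = np.arange(-max_lag, max_lag + 1)
best_lag = lags[np.argmax(cc[fs - max_lag: fs + max_lag + 1])]

est_angle = np.arcsin(np.clip(-best_lag / fs * c / mic_spacing, -1, 1))
print(f"true: {np.rad2deg(true_angle):.1f} deg, "
      f"estimated: {np.rad2deg(est_angle):.1f} deg")
```

Because the source position, room, and microphone geometry are all known in simulation, the recovered angle comes with a free ground-truth label, which is exactly what makes simulated spatial audio so useful for training localization models.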
My lab extends beyond perception to affective and multimodal intelligence. Students at all levels (from high school to PhD and beyond) have contributed to projects that model emotional intensity for empathetic AI, predict human actions from gaze, hand, and object cues, or decode audio from electroencephalography recordings. Others have developed interpretable audio features for efficient classification, analyzed musical pitch structures in traditional instruments, or transferred singing styles across genres. These projects integrate neuroscience, machine learning, and music information retrieval, giving students hands-on experience in building AI systems with immediate societal relevance.
By combining neural principles, signal processing, and modern AI, my research creates systems that anticipate, adapt, and connect with people and their environments. Students working with me gain the tools and knowledge to develop the next generation of intelligent systems that are interpretable, human-centered, and ready for real-world impact.
Dr Iran Roman is a Lecturer in Sound and Music Computing; you can read more about the module he teaches on his profile. He is part of the Centre for Multimodal AI.