Keynote Speakers

Wenwu Wang

University of Surrey

Wenwu Wang is a Professor in Signal Processing and Machine Learning at the University of Surrey, UK. He is also an AI Fellow at the Surrey Institute for People Centred Artificial Intelligence. His current research interests include signal processing, machine learning and perception, artificial intelligence, machine audition (listening), and statistical anomaly detection. He has (co-)authored over 300 papers in these areas. He has been recognized as a (co-)author or (co-)recipient of more than 15 accolades, including the 2022 IEEE Signal Processing Society Young Author Best Paper Award, ICAUS 2021 Best Paper Award, DCASE 2020 and 2023 Judge’s Award, DCASE 2019 and 2020 Reproducible System Award, and LVA/ICA 2018 Best Student Paper Award. He is an Associate Editor (2020-2025) for IEEE/ACM Transactions on Audio, Speech, and Language Processing. He was a Senior Area Editor (2019-2023) and Associate Editor (2014-2018) for IEEE Transactions on Signal Processing. He is the elected Chair (2023-2024) of the IEEE Signal Processing Society (SPS) Machine Learning for Signal Processing Technical Committee, a Board Member (2023-2024) of the IEEE SPS Technical Directions Board, the Vice Chair (2022-2024) of the EURASIP Technical Area Committee on Acoustic, Speech and Music Signal Processing, and an elected Member (2021-2026) of the IEEE SPS Signal Processing Theory and Methods Technical Committee. He was a Satellite Workshop Co-Chair for INTERSPEECH 2022, a Publication Co-Chair for IEEE ICASSP 2019, a Local Arrangement Co-Chair of IEEE MLSP 2013, and a Publicity Co-Chair of IEEE SSP 2009. He is a Satellite Workshop Co-Chair for IEEE ICASSP 2024, a Special Session Co-Chair of IEEE MLSP 2024, and a Technical Program Co-Chair of IEEE MLSP 2025.

Keynote Title: Generative AI for Text to Audio Generation

Abstract: Text-to-audio generation aims to produce an audio clip from a text prompt, that is, a natural-language description of the audio content to be generated. Such systems can serve as sound synthesis tools for film making, game design, virtual reality/metaverse, and digital media, and as digital assistants that help the visually impaired understand text. To achieve cross-modal text-to-audio generation, it is essential to comprehend the audio events and scenes within an audio clip, as well as to interpret the textual information presented in natural language. In addition, learning the mapping and alignment between these two streams of information is crucial. Exciting developments have recently emerged in the field of automated audio-text cross-modal generation. In this talk, we will introduce this field, covering the problem description, potential applications, datasets, open challenges, recent technical progress, and possible future research directions. We will focus on deep generative AI methods for text-to-audio generation. We will start with our earlier work on conditional audio generation, published in MLSP 2021, which was used as the baseline system in DCASE 2023. We will then discuss several algorithms that we have developed recently, including AudioLDM, AudioLDM2, Re-AudioLDM, and WavJourney, which are becoming increasingly popular in the signal processing, machine learning, and audio engineering communities.

Mark Salisbury

University of St. Thomas

Mark Salisbury is a computer scientist, professor, leader, speaker, author, consultant, and expert on the future. After completing his Ph.D. at the University of Oregon, Mark spent eleven years at The Boeing Company, where he worked in research and development in the field of artificial intelligence. After leaving Boeing, Mark founded Vitel, Inc., a knowledge management solution provider for the U.S. Department of Energy and the National Laboratories. Mark was then a professor and program director at the University of New Mexico for seventeen years, where he published extensively in artificial intelligence and knowledge management. Currently, he is a professor of computer science and organizational development & change at the University of St. Thomas. Mark’s latest book, Leadership in the Era of Personal AI, published by Routledge, will be released in the fall of 2024.

Keynote Title: Leadership in the Era of Personal AI

Abstract: The arrival of artificially intelligent avatars and the automation they bring is worrying many of us, not only for our own livelihoods but also for the jobs that may no longer be available to our kids. We worry about what our place will be as leaders in this new economy, much of which will be conducted online in the metaverse (a network of 3D virtual worlds), working with intelligent machines. This presentation addresses these fears and shows what our place will be, the right place, in this new economy of AI avatars, automation, and 3D virtual worlds. However, this presentation is about more than the AI avatars that we will work with in the metaverse. It is about how to lead the effort to grow organizational intelligence (OI): the capability of an organization to comprehend and create knowledge relevant to its purpose; in other words, the intellectual capacity of the entire organization. Increasing organizational intelligence requires a new kind of leadership for a new kind of knowledge worker, a wisdom worker. This presentation begins your story of how to become a wisdom worker, lead other wisdom workers, and be successful in the emerging wisdom economy.