Panel Discussion

LLMs and Machine Translation for Low-Resource Languages: Bridging Gaps or Widening Divides?

24th June 15:00-16:00

Abstract: LLMs such as ChatGPT, Claude and Gemini 1.5 have come to dominate the AI landscape through their ability to perform well across a wide range of tasks and languages. They have excellent abilities in machine translation for high-resource languages, often performing on par with dedicated translation models, and with exciting use-cases including stylization, post-editing, and human-in-the-loop approaches. Nevertheless, these models’ capabilities are much more limited in languages with less digital representation: performance in lower-resource languages can be regarded as a byproduct rather than a focus, and the reliance on English-language training data reinforces English-language cultural hegemony, with particularly high representation of American English cultural knowledge in model weights. In downstream evaluation, claims of multilinguality typically belie the dependence on English-centric data: the FLORES dataset, for example, which contains MT evaluation data in over 200 languages, is largely translated from English. This panel will explore the challenges and opportunities associated with LLMs for translating low-resource languages, investigating the dangers of exacerbating existing linguistic and cultural biases, the potential of LLMs to democratise information access, and how to ensure that these models benefit rather than marginalise underrepresented linguistic communities.

Panelists

Adaeze Ngozi Ohuoba
University of Leeds, UK

Adaeze Ngozi Ohuoba is a PhD researcher at the School of Languages, Cultures and Societies, University of Leeds. Her PhD research focuses on using large language models to detect and predict English medical source texts that could produce potentially harmful outputs when machine translated into a low-resource language like Igbo.

Prior to commencing her PhD studies, she worked as a lecturer at the Department of Foreign Language and Translation Studies, Abia State University, Nigeria. She is also a freelance translator/editor specialising in legal, medical and literary translations from French/Igbo into English and English/French into Igbo.

Her research interests include Machine Translation for Low-Resourced Languages, Computational Linguistics, French as a Foreign Language and Language in Health.

Alexandra Birch
University of Edinburgh, UK

Alexandra Birch is a Reader in Natural Language Processing in the Institute for Language, Cognition and Computation (ILCC), School of Informatics, University of Edinburgh. She is a leader of the StatMT group and a co-founder of Aveni.ai, an award-winning startup in speech analytics and conversational AI. Her main research focuses on machine translation and multilingual dialogue, but she has a broad interest in leveraging NLP to create compelling applications that improve people's lives.

Chris Oakley
ZOO Digital, UK

Chris Oakley is the Chief Technology Officer (CTO) of ZOO Digital, a leading provider of cloud-based localization and digital distribution services for the global entertainment industry. With a career spanning over two decades in the technology and digital media sectors, Chris brings a wealth of experience and a visionary approach to his role at ZOO Digital.

As CTO, Chris Oakley is responsible for overseeing the development and implementation of cutting-edge AI and ML technologies that power ZOO Digital's innovative services. Under his leadership, the company has continued to pioneer advancements in AI and ML cloud-based solutions, enabling efficient and scalable workflows for the localization and distribution of movies, TV shows, and other digital content.

Helena Moniz
President of EAMT & IAMT. University of Lisbon, Portugal. INESC-ID, Portugal

Helena Moniz is the President of the European Association for Machine Translation (2021-) and President of the International Association for Machine Translation (2023-). She is also the Vice-Coordinator of the Human Language Technologies Lab at INESC-ID, Lisbon. Helena is an Assistant Professor at the School of Arts and Humanities at the University of Lisbon, where she teaches Computational Linguistics, Computer Assisted Translation, and Machine Translation Systems and Post-editing. She currently chairs the Ethics Committee of the Center for Responsible AI (https://centerforresponsible.ai), a very exciting project coordinated by Unbabel within the Portuguese Recovery and Resilience Plan.

Helena graduated in Modern Languages and Literature at the School of Arts and Humanities, University of Lisbon (FLUL), in 1998. She took a Teacher Training graduation course in 2000, a Master’s degree in Linguistics in 2007, and a PhD in Linguistics at FLUL in cooperation with the Technical University of Lisbon (IST) in 2013. She has been working at INESC-ID/CLUL since 2000, in several national and international projects involving multidisciplinary teams of linguists and speech processing engineers. Within these fruitful collaborations, she participated in more than 20 national and international projects.

From 2015/09 to 2024/04, she was the PI of a bilateral project between INESC-ID and Unbabel, a translation company combining AI + post-editing, working on scalable Linguistic Quality Assurance processes for crowdsourcing. She was responsible for the implementation in 2015 of the MQM metric and for the creation of the Linguistic Quality Assurance processes developed at Unbabel for Linguistic Annotation and Editors' Evaluation. She also worked on research projects involving Linguistics, Translation, and Responsible AI, and on products developed by the Labs Team, mostly cultural transcreation, high-risk products, and silently controlled language metrics for dialogues.

In a sentence, she is passionate about Language Technologies in a human-centric perspective and always feels like a child eager to learn!

Mirko Lorenz
Deutsche Welle, Germany

Mirko Lorenz is an Innovation Manager working for Deutsche Welle, Germany's international broadcaster. He has been a member of the Research and Cooperation Team (ReCo) since 2008. One main outcome of his work is plain X, a 4-in-1 software to simplify content adaptation. In plain X, users can transcribe, translate, subtitle, and create (synthetic) voice-overs.

Mirko has a master's in economics and history from the University of Cologne and a professional background in journalism. He co-founded Datawrapper, a tool to create charts and maps which is used in many large newsrooms worldwide.

Valter Mavrič
DG TRAD, European Parliament

Valter Mavrič is Director-General of the Translation Service (DG TRAD) at the European Parliament (since 2016), where he was previously acting Director-General (from 2014), Director (from 2010) and Head of the Slovenian Translation Unit (from 2004). With an MA in applied linguistics and further training in translation, interpretation, linguistics and management, he has long experience as a manager, translator, interpreter and teacher of languages. He works in Slovenian, Italian, English, French, and Croatian and is currently preparing a PhD in strategic communication.
