WiNLP 2025 Workshop - Program
Our annual workshop will be co-located with EMNLP 2025 in Suzhou, China. Both the virtual and in-person events will take place on November 8th, with the in-person sessions held in Room A303.
Full Schedule of Events
| Event | Time (in China Standard Time [UTC+8]) |
|---|---|
| Virtual Poster Session | 8:00-9:00 |
| Opening Remarks | 9:00-9:15 |
| Keynote A: David Adelani | 9:15-10:15 |
| Coffee Break | 10:30-11:00 |
| Panel Discussion | 11:00-12:00 |
| Mentorship Session: Apple | 12:00-12:40 |
| Lunch Break | 12:40-14:00 |
| In-Person Poster Session | 14:00-15:30 |
| Coffee Break | 15:30-16:00 |
| Keynote B: Jen-Tse Huang | 16:00-17:00 |
| Closing Remarks & Best Paper Award | 17:00-17:15 |
Keynote A (Time: 9:15-10:15)
Title: Scaling Multilingual Evaluation of LLMs to Many Languages
Speaker: David Adelani
Abstract: Despite the widespread adoption of large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. In this talk, I will describe different approaches to scaling evaluation to several languages. First, I will describe simple strategies for extending multilingual evaluations by re-purposing existing English datasets to over 200 languages for both the text (SIB-200) and speech (Fleurs-SLU) modalities. Second, I will introduce our recent benchmark IrokoBench, a human-translated benchmark dataset for 17 typologically-diverse low-resource African languages covering three tasks: natural language inference, mathematical reasoning, and multi-choice knowledge-based question answering. This expands the evaluation of many low-resource languages from simple text classification tasks to more challenging knowledge and reasoning tasks. We observe a significant performance gap between open and proprietary models, with the highest-performing open model, Gemma 2 27B, reaching only 60% of the performance of the best proprietary model, GPT-4o. These findings suggest that more effort is needed to develop and adapt LLMs for low-resource languages. Finally, I will highlight some of our recent projects that make some of these challenging datasets more multicultural for visual question answering and intent detection tasks, to encourage practical usage of LLMs within low-resource communities.
Bio: Dr. David Adelani is an Assistant Professor at the McGill University School of Computer Science, a Core Academic Member at Mila - Quebec AI Institute, an IVADO Professor, and a Canada CIFAR AI Chair. He received his Ph.D. in Computer Science from the Department of Language Science and Technology, Saarland University, Germany. His research interests include multilingual natural language processing with a focus on low-resource languages, speech processing, and the privacy and safety of large language models. With over 20 publications in leading NLP and speech processing venues such as ACL, TACL, EMNLP, NAACL, COLING, and Interspeech, he has made significant contributions to NLP for low-resource languages. Notably, one of his publications received the Best Paper Award (Global Challenges) at COLING 2022 for developing AfroXLMR, a multilingual pre-trained language model for African languages. Other notable awards include an Area Chair Award at IJCNLP-AACL 2023, and the Outstanding Paper and Best Theme Paper Awards at NAACL 2025.
Keynote B (Time: 16:00-17:00)
Title: Language Models Do Not Have Human-Like Working Memory
Speaker: Jen-Tse Huang
Abstract: While Large Language Models (LLMs) exhibit remarkable reasoning abilities, we demonstrate that they lack a fundamental aspect of human cognition: working memory. Human working memory is an active cognitive system that enables not only the temporary storage of information but also its processing and utilization, supporting coherent reasoning and decision-making. Without working memory, individuals may produce unrealistic responses, exhibit self-contradictions, and struggle with tasks that require mental reasoning. Existing evaluations based on N-back or context-dependent tasks fall short, as they allow LLMs to exploit external context rather than retain the reasoning process in the latent space. We introduce three novel tasks: (1) Number Guessing, (2) Yes-No Deduction, and (3) Math Magic, designed to isolate internal representation from external context. Across seventeen frontier models spanning four major model families, we consistently observe irrational or contradictory behaviors, indicating LLMs' inability to retain and manipulate latent information. Our work establishes a new benchmark for evaluating working memory in LLMs and highlights this limitation as a key bottleneck for advancing reliable reasoning systems.
Bio: Jen-Tse (Jay) Huang is a postdoctoral researcher at the Center for Language and Speech Processing (CLSP) at Johns Hopkins University, working with Mark Dredze. He received his Ph.D. in Computer Science and Engineering from the Chinese University of Hong Kong and his B.Sc. from Peking University. His research explores the evaluation of large language models (LLMs), both as individual agents and as collectives in multi-agent systems, through the lens of social science. His work has been published in top-tier AI venues, including an oral presentation at ICLR 2024. He actively serves as a reviewer for major conferences and journals such as ICML, NeurIPS, and ICLR, and as an area chair for ARR.
Panel Discussion (Time: 11:00-12:00)
Topic: After a PhD, What Is Waiting for Us? A Discussion of Experiences from Industry, Academia, and Startups
Panelists:
Christos Christodoulopoulos is a Principal Technology Adviser in the AI Policy & Compliance teams of the Information Commissioner’s Office, the UK’s data protection regulator. Before joining the ICO, he was an Applied Scientist at Amazon, starting in 2016 on the Alexa AI Knowledge team and ending as a Senior Applied Scientist on Amazon’s Responsible AI team, working on multimodal and agentic FM development. Before Amazon, he was a postdoctoral researcher at UIUC, working with Dan Roth on Semantic Role Labeling and Cindy Fisher on computational models of child language acquisition. He holds an MSc and a PhD from the University of Edinburgh. He is a Program Chair for EMNLP 2025, an organiser of the FEVER, GenBench, and TrustNLP workshops, and has served as a reviewer, area chair, and senior area chair for many *CL conferences.
Julia Kreutzer is a Senior Research Scientist at Cohere Labs, where she conducts research on large language models, currently focused on multilinguality, evaluation, and inference. Previously, she worked at Google Translate and completed her PhD at Heidelberg University on learning from human feedback in machine translation. She has been an active contributor to multiple open-science communities and a co-organizer of COLM, WMT shared tasks, and various NLP workshops.
Pittawat (Pete) Taveekitworachai is a research scientist on the Typhoon team at SCB 10X in Thailand. His research interests include reasoning models, test-time scaling, prompt engineering, and reinforcement learning. He completed his Master’s degree (as valedictorian) at Ritsumeikan University, Japan, under the Japanese Government Scholarship (MEXT), where his research focused on prompt engineering, large language models, and their applications in gaming, healthcare, and autonomous driving. At SCB 10X, he leads research collaborations with academic and industry partners, both domestically and internationally. He is passionate about translating cutting-edge research into real-world applications and values both the scientific rigor and engineering practicality that drive impactful innovation.
Zhisong Zhang is an Assistant Professor in the Department of Computer Science at City University of Hong Kong. He holds a PhD from the Language Technologies Institute at Carnegie Mellon University. His doctoral research focused on advancing natural language processing (NLP) systems, particularly in data-limited scenarios, where his work aimed to reduce the need for labor-intensive manual data labeling while improving task performance. After completing his PhD, he worked as a researcher at Tencent before joining CityUHK. His current research focuses on NLP and large language models (LLMs), with particular interests in long-context language modeling, LLM-based agent systems, and understanding the underlying mechanisms of language models. Please refer to his homepage for more details: https://zzsfornlp.github.io/
Mentorship Session (Time: 12:00-12:40)
We are excited to have a mentorship session hosted by our sponsor Apple! This session will provide an opportunity for attendees to engage with Apple researchers, ask questions, and gain insights into career development and research.
Mentors: Tianjun Ye, Shu W., Vivien Zhao