Adaptive Computation in Language Models: Does Every Token Require the Same Effort?


Xifeng Yan, University of California at Santa Barbara

3:30 p.m., April 10, 2025   |   138 DeBartolo Hall

Transformer-based large language models (LLMs) have achieved remarkable success, yet significant challenges remain. In this talk, I will explore two key questions: (1) Does every token require the same level of computational effort? and (2) Why do LLMs sometimes struggle with seemingly simple tasks, such as arithmetic operations? To address these questions, techniques such as mixture of experts (MoE), speculative decoding, and early-exit strategies have been developed to adjust computational demands dynamically based on task complexity. However, these approaches are not sufficient.
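The early-exit idea mentioned above can be illustrated with a toy sketch (not taken from the talk; the model, layer count, and confidence threshold below are all hypothetical): a decoder probes its prediction after each layer and stops as soon as it is confident, so "easy" tokens receive fewer layers of computation than "hard" ones.

```python
import numpy as np

# Hypothetical toy model: random, untrained weights stand in for a
# real Transformer. In practice, early-exit methods use trained
# per-layer classifiers and calibrated confidence thresholds.
rng = np.random.default_rng(0)
N_LAYERS, D, VOCAB = 6, 16, 10
layers = [rng.normal(scale=0.5, size=(D, D)) for _ in range(N_LAYERS)]
unembed = rng.normal(size=(D, VOCAB))

def softmax(x):
    z = np.exp(x - x.max())  # subtract max for numerical stability
    return z / z.sum()

def decode_token(h, threshold=0.9):
    """Run layers until the intermediate prediction is confident enough.

    Returns (predicted_token, layers_used): tokens whose prediction
    stabilizes early exit with less computation than harder tokens.
    """
    for i, w in enumerate(layers, start=1):
        h = np.tanh(h @ w)               # toy "transformer layer"
        probs = softmax(h @ unembed)     # probe the prediction so far
        if probs.max() >= threshold:     # confident enough -> exit early
            return int(probs.argmax()), i
    return int(probs.argmax()), N_LAYERS  # fell through: used all layers

token, used = decode_token(rng.normal(size=D))
print(f"token={token}, layers used={used} of {N_LAYERS}")
```

A threshold of 0 makes every token exit after one layer, while a threshold above 1 forces the full stack, recovering the standard fixed-depth Transformer; the interesting regime is in between, where depth adapts per token.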

I will share our insights and introduce some early ideas for tackling these challenges, with the goal of inspiring further research beyond the Transformer architecture. Lastly, I will briefly discuss our ongoing AI research in materials science and finance.

Xifeng Yan is a professor at the University of California at Santa Barbara, where he holds the Venkatesh Narayanamurti Chair of Computer Science. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2006 and was a research staff member at the IBM T. J. Watson Research Center from 2006 to 2008. His work centers on knowledge discovery, knowledge bases, and artificial intelligence. He has received the NSF CAREER Award, the IBM Invention Achievement Award, the ACM-SIGMOD Dissertation Runner-Up Award, the IEEE ICDM 10-Year Highest Impact Paper Award, a 2022 PLDI Distinguished Paper Award, the 2022 VLDB Test of Time Award, and the first-place prize in the Amazon SocialBot Grand Challenge 5. His team was the first to leverage Transformers for time-series forecasting, opening a new research area.