Hezhen “Alex” Hu
The University of Texas at Austin

Recent advances in large models have significantly expanded the capabilities of artificial intelligence. However, when AI systems are deployed in human-centered settings, strong performance on generic benchmarks does not guarantee effective interaction with people. Many existing models are designed around coarse visual representations and lack explicit mechanisms to model the human bodies, expressions, and behaviors that are fundamental to human-centered applications.

In this talk, I present a research agenda toward human-centered AI agents that can perceive, represent, and interact with humans in an inclusive, context-aware manner. The agenda is structured around three complementary directions: (1) human-centric foundation model pretraining, which addresses the scarcity of labeled human data by learning transferable representations from large-scale, weakly structured human videos; (2) embodiment, which enables high-fidelity modeling of hands, faces, and articulated motion from minimal inputs, supporting realistic and inclusive human modeling; and (3) context-aware human understanding, which integrates structural and situational cues to move beyond isolated perception toward intent-level reasoning. Together, these directions aim to shift AI systems from merely detecting humans to understanding and interacting with them as embodied, expressive, and contextualized agents. I conclude by outlining future directions toward establishing human-centered interaction as a core capability of next-generation AI systems.
Hezhen “Alex” Hu is a postdoctoral fellow at The University of Texas at Austin, working with Prof. Atlas Wang and Prof. Georgios Pavlakos. His research aims to build human-centered AI systems that treat human communication and interaction as first-class capabilities, grounded in embodiment and real-world context, enabling physically and socially meaningful interactions between people and AI. This vision has translated into sustained real-world deployments for deaf communities. For example, his SignBERT series (ICCV’21, TPAMI’23) serves as the core AI foundation for iFLYTEK service robots that understand and communicate with deaf users in real time; it has also powered a sign-language video question-answering system for municipal public services, now integrated into city hotline platforms to support accessible civic interaction. His work has received four awards in CVPR/ECCV sign-language competitions and has resulted in six granted patents. In parallel, he is deeply engaged in education and outreach, including co-authoring a nationally adopted AI textbook series for K–12 learners that has been distributed in more than 80,000 copies across 500+ Chinese primary schools, reaching more than 300,000 students to date.