Our visual world is naturally open, containing visual elements that are dynamic, vast, and unpredictable. However, existing computer vision models are often developed within a closed-world paradigm, for example, recognizing objects or human actions from a fixed set of categories. When exposed to the realistic complexity of the open world, these models are brittle and fail to generalize.
For example, an autonomous vehicle may fail to avoid an accident when it encounters an overturned truck, because it has never seen such a novelty in its training data. To enable visual understanding in the open world, vision models need to be aware of unknown novelties and, eventually, adapt themselves to them.
In this talk, I will describe our recent work on open-set recognition, where unknown novelties may be either absent from or present in the training data. I will also discuss how to recognize arbitrary visual objects by leveraging knowledge from large language models. Together, these approaches make strides toward assured AI in the open world.
Dr. Yu Kong is an assistant professor directing the ACTION Lab in the Department of Computer Science at Michigan State University (MSU). He received his B.Eng. in automation from Anhui University and his Ph.D. in computer science from Beijing Institute of Technology. Before joining MSU, he was an assistant professor at Rochester Institute of Technology.
Dr. Kong’s research interests include open-world recognition, video understanding, and vision-language modeling. His work is supported by the National Science Foundation, the Army Research Office, and the Office of Naval Research, among others. He serves as an Associate Editor for the Springer journal Multimedia Systems and as an Area Chair for CVPR, IJCAI, AAAI, and ACM Multimedia. He also serves as a reviewer and program committee member for prestigious journals and conferences, including T-PAMI, T-IP, T-NNLS, T-CSVT, CVPR, ICLR, NeurIPS, AAAI, and IJCAI. More information can be found on his webpage at https://www.egr.msu.edu/~yukong/.