Merging Models for Memory-Efficient Real-Time Video Analytics at the Edge

Video data is often processed with ML at the edge rather than on powerful cloud servers, both to address privacy concerns and to avoid the added latency of sending data to the cloud and back. However, edge devices tend to have fewer resources and, in particular, less memory. We study the task of running ML models on edge devices and find that memory is often a bottleneck for real-world workloads. To address this, we propose a model merging strategy in which we identify layers that are redundant across models and try to keep only one copy of each.

In this talk, I’ll give background about the memory needs of running ML models, discuss ways of addressing memory concerns, and present our model merging strategy.

Arthi Padmanabhan is an assistant professor of computer science at Harvey Mudd College. Her research focuses on building systems that make machine learning more resource-efficient, enabling it to run on constrained edge devices. She is currently working on running machine learning on low-power, energy-harvesting devices. She also enjoys thinking about CS education and teaching courses in networking and systems. She received her Ph.D. from UCLA in 2022 and, before that, a B.S. from Pomona College in 2014.
