Software reuse practices and hazards in the pre-trained neural network supply chain

Deep neural networks are widely used in computing systems, from image recognition in autonomous vehicles to anomaly detection in system logs. Creating and specializing neural networks is becoming more difficult as state-of-the-art architectures grow more complex. Following the path of traditional software engineering, machine learning engineers have begun to exchange and reuse pre-trained neural networks (PTNNs).

Understanding this software engineering process is the first step to optimizing and securing it, e.g., through model search engines for reuse, improved testing techniques for validation, and better definitions for PTNN packaging. However, the details of real-world PTNN reuse remain unknown. In this talk, I present results from our empirical software engineering work to define the PTNN supply chain and evaluate aspects of trust in this context. I will discuss three projects: (1) characterizations of the kinds of PTNN registries; (2) interviews with software engineers describing their processes and challenges; and (3) measurements of the research-to-practice pipeline for PTNNs.

James C. Davis is an assistant professor of electrical and computer engineering at Purdue University. He worked for IBM from 2012 to 2015 and received his Ph.D. from Virginia Tech in 2020. His research is in software engineering, with applications to computing systems and cybersecurity. His work appears at venues such as ICSE, FSE, and IEEE S&P, and has been recognized with three ACM SIGSOFT distinguished paper awards. His lab is supported by the US National Science Foundation, Google, Rolls-Royce, and Cisco.
