Published 2024 | Version v1
Journal article Open

Data Isotopes for Data Provenance in DNNs

  • 1. University of Chicago
  • 2. University of California, Berkeley
  • 3. Cornell University

Description

Today, creators of data-hungry deep neural networks (DNNs) scour the Internet for training fodder, leaving users with little control over or knowledge of when their data, and in particular their images, are used to train models. To empower users to counteract unwanted use of their images, we design, implement and evaluate a practical system that enables users to detect if their data was used to train a DNN model for image classification. We show how users can create special images we call isotopes, which introduce "spurious features" into DNNs during training. With only query access to a model and no knowledge of the model-training process, nor control of the data labels, a user can apply statistical hypothesis testing to detect if the model learned these spurious features by training on the user's images.
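As a rough illustration of how a digital mark might be added to an image to form an isotope, the sketch below blends a mark into a small grayscale image. The blending formula, the function name make_isotope, and the alpha parameter are assumptions made for this example; the description above does not specify how isotopes are constructed.

```python
def make_isotope(image, mark, alpha=0.4):
    """Blend `mark` into `image` pixel by pixel (hypothetical construction).

    Both inputs are lists of rows of grayscale intensities in [0, 1].
    `alpha` is an assumed knob for how visible the mark is:
    0.0 returns the image unchanged, 1.0 returns the mark itself.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    same_shape = len(image) == len(mark) and all(
        len(row) == len(mark_row) for row, mark_row in zip(image, mark)
    )
    if not same_shape:
        raise ValueError("image and mark must have the same dimensions")
    # Convex combination keeps every blended pixel inside [0, 1].
    return [
        [(1.0 - alpha) * p + alpha * q for p, q in zip(row, mark_row)]
        for row, mark_row in zip(image, mark)
    ]


# Toy 2x2 inputs, purely illustrative.
image = [[0.0, 0.5], [1.0, 0.25]]
mark = [[1.0, 1.0], [0.0, 0.0]]
isotope = make_isotope(image, mark, alpha=0.5)
```

A user would apply the same mark to many of their own images before sharing them, so that a model trained on those images has a consistent feature to pick up.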

Isotopes can be viewed as an application of a particular type of data poisoning. In contrast to backdoors and other poisoning attacks, our purpose is not to cause misclassification but rather to create tell-tale changes in confidence scores output by the model that reveal the presence of isotopes in the training data. Isotopes thus turn DNNs' vulnerability to memorization and spurious correlations into a tool for data provenance. Our results confirm efficacy in multiple image classification settings, detecting and distinguishing between hundreds of isotopes with high accuracy. We further show that our system works on public ML-as-a-service platforms and larger models such as those trained on ImageNet, can use physical objects in images instead of digital marks, and remains robust against several adaptive countermeasures.
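The detection idea can be sketched as a paired comparison of confidence scores. The example below uses a one-sided sign-flip permutation test, which is a stand-in chosen for this sketch because it needs only the standard library; the description says only that statistical hypothesis testing is used. The function name, the choice of a control mark, and the scores are all hypothetical and synthetic, not results.

```python
import random


def paired_permutation_pvalue(marked_scores, control_scores,
                              n_resamples=10000, seed=0):
    """One-sided paired sign-flip permutation test (assumed test, see lead-in).

    marked_scores[i]: model confidence for the target class on test image i
        carrying the user's mark.
    control_scores[i]: confidence on the same image carrying an unrelated mark.
    Returns an estimated p-value for "the user's mark does not raise confidence".
    """
    if len(marked_scores) != len(control_scores) or not marked_scores:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [m - c for m, c in zip(marked_scores, control_scores)]
    observed = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null, each paired difference is equally likely to be +d or -d.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if flipped >= observed:
            hits += 1
    # Add-one correction avoids reporting a p-value of exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Synthetic confidence scores for 20 query images (illustrative only).
control = [0.10 + 0.01 * i for i in range(20)]
marked = [0.30 + 0.01 * i for i in range(20)]

p_detect = paired_permutation_pvalue(marked, control)
p_null = paired_permutation_pvalue(control, control)
```

A small p_detect would lead the user to reject the null hypothesis and conclude the model likely trained on their isotopes, while identical score lists give no evidence either way.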

Files

Data-Isotopes-for-Data-Provenance-in-DNNs.pdf

Files (3.1 MB)

md5:5b19a510bbbf0967ffb3b630fa6e04d3
3.1 MB

Additional details

Identifiers

DOI
10.56553/popets-2024-0024
Other
oai:uchicago.tind.io:10336

Funding

U.S. National Science Foundation
CNS-2241303
U.S. National Science Foundation
CNS-1949650
Defense Advanced Research Projects Agency
DARPA GARD program
Amazon (United States)
Unknown funder
C3 AI
U.S. National Science Foundation
1916717
Unknown funder
GFSD Fellowship
Unknown funder
Harvey Fellowship
University of Chicago
Neubauer Fellowships

UChicago Information

Division(s)
Physical Sciences Division
Department(s)
Computer Science