Published July 8, 2024 | Version v1
Journal article Open

A Selective Preprocessing Offloading Framework for Reducing Data Traffic in DL Training

  • 1. University of Chicago
  • 2. IBM Research

Description

Deep learning (DL) training is data-intensive and is often bottlenecked by fetching data from remote storage. Because many samples shrink in size during data preprocessing, we explore selectively offloading preprocessing to remote storage to reduce data traffic. We conduct a case study to uncover the potential benefits and challenges of this approach. We then propose SOPHON, a framework that selectively offloads preprocessing tasks at a fine granularity to reduce data traffic, using online profiling and adaptive algorithms to optimize the offloading decision for each sample in each training scenario. Our results show that SOPHON can reduce data traffic and training time by 1.2-2.2x over existing solutions.
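
The per-sample offloading decision described above can be illustrated with a minimal sketch. This is not SOPHON's actual algorithm, which this record does not specify; it is a simple greedy heuristic under stated assumptions. The names `SampleProfile`, `best_split`, `plan_offloading`, and the `cpu_budget` parameter are hypothetical, and the sketch assumes that profiling has already recorded each sample's size after every preprocessing step along with the storage-side CPU cost of each step.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SampleProfile:
    """Hypothetical per-sample profile, e.g. collected during an early epoch."""
    sample_id: int
    # sizes[k] is the sample's size in bytes after the first k preprocessing
    # steps; sizes[0] is the raw size as stored.
    sizes: List[int]
    # cpu_cost[k] is the storage-side CPU time (seconds) to run step k + 1.
    cpu_cost: List[float]


def best_split(p: SampleProfile) -> int:
    """Number of leading steps after which the sample is smallest.

    Ties resolve to the earliest point, so a sample that never shrinks
    gets 0 (no offloading).
    """
    return min(range(len(p.sizes)), key=lambda k: p.sizes[k])


def plan_offloading(profiles: List[SampleProfile],
                    cpu_budget: float) -> Dict[int, int]:
    """Map sample_id -> number of steps to run on the storage side.

    Assumed heuristic: rank samples by bytes saved per CPU second and
    offload greedily until the storage-side CPU budget is exhausted.
    """
    candidates = []
    for p in profiles:
        k = best_split(p)
        saved = p.sizes[0] - p.sizes[k]
        cost = sum(p.cpu_cost[:k])
        if k > 0 and saved > 0:
            candidates.append((saved / max(cost, 1e-9), p.sample_id, k, cost))

    candidates.sort(reverse=True)  # highest savings per CPU second first
    plan = {p.sample_id: 0 for p in profiles}
    used = 0.0
    for _, sample_id, k, cost in candidates:
        if used + cost <= cpu_budget:
            plan[sample_id] = k
            used += cost
    return plan
```

In this sketch, a sample whose size grows during preprocessing (for example, a small compressed image that is decoded into a larger tensor) is left untouched and fetched raw, while a large sample that is cropped or resized early is preprocessed up to its smallest point before transfer, provided the storage node has CPU time to spare.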

Files

Selective-Preprocessing-Offloading-Framework-for-Reducing-Data-Traffic-in-DL-Training.pdf

Additional details

Identifiers

DOI
10.1145/3655038.3665947
Other
oai:uchicago.tind.io:12778

Funding

National Science Foundation
CCF-2119184
National Science Foundation
CNS-2027170

UChicago Information

Division(s)
Physical Sciences Division
Department(s)
Computer Science