A Selective Preprocessing Offloading Framework for Reducing Data Traffic in DL Training

Wang, Meng; Waldspurger, Gus; Sundararaman, Swaminathan

doi:10.6082/tkn2g-g2834

Published July 8, 2024 | Version v1

Journal article Open

1. University of Chicago
2. IBM Research

Deep learning (DL) training is data-intensive and often bottlenecked by fetching data from remote storage. Recognizing that many samples' sizes diminish during data preprocessing, we explore selectively offloading preprocessing to remote storage to mitigate data traffic. We conduct a case study to uncover the potential benefits and challenges of this approach. We then propose SOPHON, a framework that selectively offloads preprocessing tasks at a fine granularity in order to reduce data traffic, utilizing online profiling and adaptive algorithms to optimize for every sample in every training scenario. Our results show that SOPHON can reduce data traffic and training time by 1.2-2.2x over existing solutions.
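The core idea in the abstract, offloading preprocessing only for samples that shrink, can be illustrated with a small sketch. This is not SOPHON's actual algorithm, which the abstract describes only as using online profiling and adaptive algorithms. It is a minimal illustration that assumes a hypothetical per-sample profile listing the sample's size after each pipeline stage, and it ignores storage-side CPU cost. The names `SampleProfile`, `best_split`, and `transferred_bytes`, and all byte counts, are made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SampleProfile:
    """Hypothetical profile of one sample.

    sizes[0] is the stored (raw) size in bytes; sizes[k] is the size after
    the first k preprocessing stages have been applied.
    """
    sizes: Sequence[int]


def best_split(profile: SampleProfile) -> int:
    """Return how many leading stages to run on the storage side.

    Chooses the prefix whose output is smallest, which minimizes the bytes
    sent over the network. Ties go to the shorter prefix, so storage-side
    work is only done when it actually saves traffic.
    """
    sizes = profile.sizes
    return min(range(len(sizes)), key=lambda k: (sizes[k], k))


def transferred_bytes(profiles: Sequence[SampleProfile], splits: Sequence[int]) -> int:
    """Total bytes sent when sample i is shipped after splits[i] stages."""
    return sum(p.sizes[k] for p, k in zip(profiles, splits))


# Illustrative stages: raw file -> decode -> crop/resize -> float tensor.
large = SampleProfile(sizes=[500_000, 2_700_000, 150_528, 602_112])
small = SampleProfile(sizes=[100_000, 600_000, 150_528, 602_112])
profiles = [large, small]

selective = [best_split(p) for p in profiles]          # per-sample decision
no_offload = transferred_bytes(profiles, [0, 0])       # ship raw files
fixed_offload = transferred_bytes(profiles, [2, 2])    # offload every sample
selective_total = transferred_bytes(profiles, selective)
```

With these made-up sizes, the large sample is shipped after the crop stage while the small one is shipped raw, so the per-sample choice sends fewer bytes than either uniform policy. A real system would also have to account for limited compute at the storage node, which this sketch leaves out.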

Files

Selective-Preprocessing-Offloading-Framework-for-Reducing-Data-Traffic-in-DL-Training.pdf (1.1 MB, md5:ddbea951e6a80884536195756f456f46)

Additional details

DOI: 10.1145/3655038.3665947
Other: oai:uchicago.tind.io:12778

National Science Foundation
CCF-2119184
National Science Foundation
CNS-2027170

Division(s): Physical Sciences Division
Department(s): Computer Science

Resource type

Journal article

Publisher

University of Chicago

Published in

Proceedings of the ACM Workshop on Hot Topics in Storage and File Systems, 2024.

Languages

English

License

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: May 22, 2026
Modified: May 22, 2026
