Published February 20, 2024 | Version v1
Journal article Open

YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs

Description

We address the challenges of deploying neural networks on CPUs, focusing on minimizing inference time while maintaining accuracy. Our approach uses the dataflow (i.e., computation order) of a neural network to explore data reuse opportunities through heuristic-guided analysis and a code generation framework, which enables exploration of various Single Instruction, Multiple Data (SIMD) implementations to achieve optimized neural network execution. Our results demonstrate that the dataflow that keeps outputs in SIMD registers while also maximizing both input and weight reuse consistently yields the best performance across a wide variety of inference workloads, achieving up to 3x speedup for 8-bit neural networks and up to 4.8x speedup for binary neural networks over today's optimized implementations of neural networks.
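To make the idea of a dataflow that keeps outputs in registers more concrete, the following is a minimal sketch in plain Python, not the code that YFlows generates. It assumes a 1D convolution, models one SIMD register as a short list of accumulators, and uses hypothetical names (`VEC`, `conv1d_reference`, `conv1d_output_stationary`) that do not come from the paper. Each block of outputs stays in the accumulator list while the weights are streamed past it, so every weight is read once per block and reused across all lanes, and each output is written back exactly once.

```python
VEC = 4  # hypothetical lane count of one SIMD register (an assumption)


def conv1d_reference(x, w):
    """Straightforward 1D convolution (valid mode, no kernel flip)."""
    n_out = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n_out)]


def conv1d_output_stationary(x, w, vec=VEC):
    """Same result, but computed one block of `vec` outputs at a time.

    The list `acc` stands in for a SIMD register that holds partial
    outputs for the whole inner computation.
    """
    n_out = len(x) - len(w) + 1
    y = [0] * n_out
    for base in range(0, n_out, vec):
        lanes = min(vec, n_out - base)  # last block may be partial
        acc = [0] * lanes               # outputs stay "in register"
        for k, wk in enumerate(w):      # one weight read per block...
            for lane in range(lanes):   # ...reused across every lane
                acc[lane] += x[base + lane + k] * wk
        y[base:base + lanes] = acc      # single write-back per block
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    w = [1, 0, -1]
    print(conv1d_output_stationary(x, w))
```

In a real SIMD implementation the inner `lane` loop would be a single vector multiply-accumulate instruction, and neighbouring values of `k` would touch overlapping input windows, which is where input reuse comes from. The sketch only illustrates the loop ordering; it says nothing about the speedups reported above.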

Files

YFlows.pdf

Files (2.5 MB)

Name: YFlows.pdf
Size: 2.5 MB
md5:f35af592c7a237f4ce52512b84435d3f

Additional details

Identifiers

DOI
10.1145/3640537.3641566
Other
oai:uchicago.tind.io:11118

Funding

University of Chicago
Quad Undergraduate Research Fellowship
University of Chicago
Metcalf Fellowship Grant

UChicago Information

Division(s)
Physical Sciences Division
Department(s)
Computer Science