Published September 11, 2024 | Version v1
Journal article Open

The simplicity of protein sequence-function relationships

  • 1. University of Chicago

Description

How complex are the rules by which a protein's sequence determines its function? High-order epistatic interactions among residues are thought to be pervasive, suggesting an idiosyncratic and unpredictable sequence-function relationship. But many prior studies may have overestimated epistasis, because they analyzed sequence-function relationships relative to a single reference sequence (which causes measurement noise and local idiosyncrasies to snowball into high-order epistasis) or they did not fully account for global nonlinearities. Here we present a reference-free method that jointly infers specific epistatic interactions and global nonlinearity using a bird's-eye view of sequence space. This technique yields the simplest explanation of sequence-function relationships and is more robust than existing methods to measurement noise, missing data, and model misspecification. We reanalyze 20 experimental datasets and find that context-independent amino acid effects and pairwise interactions, along with a simple nonlinearity to account for limited dynamic range, explain a median of 96% of phenotypic variance and over 92% in every case. Only a tiny fraction of genotypes are strongly affected by higher-order epistasis. Sequence-function relationships are also sparse: a minuscule fraction of amino acids and interactions account for 90% of phenotypic variance. Sequence-function causality across these datasets is therefore simple, opening the way for tractable approaches to characterize proteins' genetic architecture.
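
To make the reference-free idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a complete combinatorial dataset with no missing genotypes and omits the global nonlinearity and the pairwise terms described above. It takes the zero-order term to be the mean phenotype over all genotypes and the first-order effect of a state to be the mean phenotype of the genotypes carrying that state minus the global mean, so effects are measured against the average of sequence space rather than against one reference sequence. The function name, the data layout, and the toy phenotypes are hypothetical.

```python
from statistics import mean


def reference_free_first_order(phenotypes):
    """First-order reference-free decomposition (illustrative sketch only).

    phenotypes: dict mapping equal-length genotype strings to measured values.
    Assumes every combination of states is present (complete, balanced data).
    Returns (e0, effects, predicted, r2).
    """
    genotypes = list(phenotypes)
    n_sites = len(genotypes[0])

    # Zero-order term: mean phenotype across all genotypes.
    e0 = mean(phenotypes.values())

    # First-order effect of each (site, state): mean phenotype of genotypes
    # carrying that state, minus the global mean.
    effects = {}
    for site in range(n_sites):
        for state in sorted({g[site] for g in genotypes}):
            carriers = [phenotypes[g] for g in genotypes if g[site] == state]
            effects[(site, state)] = mean(carriers) - e0

    # Additive prediction: global mean plus the effect of each state present.
    predicted = {
        g: e0 + sum(effects[(site, state)] for site, state in enumerate(g))
        for g in genotypes
    }

    # Fraction of phenotypic variance explained by the first-order model.
    ss_total = sum((y - e0) ** 2 for y in phenotypes.values())
    ss_resid = sum((phenotypes[g] - predicted[g]) ** 2 for g in genotypes)
    r2 = 1 - ss_resid / ss_total
    return e0, effects, predicted, r2


# Hypothetical toy data: two sites, two states (A or B) per site.
toy = {"AA": 0.0, "AB": 1.0, "BA": 2.0, "BB": 5.0}
e0, effects, predicted, r2 = reference_free_first_order(toy)
print("global mean:", e0)
print("first-order effects:", effects)
print("variance explained by first-order terms:", round(r2, 4))
```

In this toy case the residual left by the additive prediction is what a pairwise term would absorb. With missing or noisy data, the simple averages used here would need to be replaced by a regression fit.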

Data availability

All sequence-function data were gathered from published studies (Table 1) and are available on GitHub (https://github.com/whatdoidohaha/RFA) and Zenodo (https://doi.org/10.5281/zenodo.8307147).

All scripts used for data analysis as well as tutorial scripts for performing reference-free analysis are available on GitHub (https://github.com/JoeThorntonLab/RFA) and Zenodo (https://doi.org/10.5281/zenodo.8307147).

Files

Simplicity-of-protein-sequence-function-relationships.pdf

Files (7.9 MB)

Name Size
Article
md5:5050a4e0b1d066f316f7dcdea275d7da 2.5 MB
md5:989350a1aa71b00932f125cd56b63f91 5.4 MB

Additional details

Identifiers

DOI
10.1038/s41467-024-51895-5
Other
oai:uchicago.tind.io:13531

Funding

National Institutes of Health
R35GM145336
National Institutes of Health
R01GM131128
National Institutes of Health
R01GM121931
National Institutes of Health
F32GM122251
Samsung Scholarship

UChicago Information

Division(s)
Biological Sciences Division
Department(s)
Ecology and Evolution; Genetics, Genomics, and Systems Biology; Human Genetics