Published September 11, 2024 | Version v1
Journal article
Open
The simplicity of protein sequence-function relationships
Description
How complex are the rules by which a protein's sequence determines its function? High-order epistatic interactions among residues are thought to be pervasive, suggesting an idiosyncratic and unpredictable sequence-function relationship. But many prior studies may have overestimated epistasis, either because they analyzed sequence-function relationships relative to a single reference sequence (which causes measurement noise and local idiosyncrasies to snowball into high-order epistasis) or because they did not fully account for global nonlinearities. Here we present a reference-free method that jointly infers specific epistatic interactions and global nonlinearity using a bird's-eye view of sequence space. This technique yields the simplest explanation of sequence-function relationships and is more robust than existing methods to measurement noise, missing data, and model misspecification. We reanalyze 20 experimental datasets and find that context-independent amino acid effects and pairwise interactions, along with a simple nonlinearity to account for limited dynamic range, explain a median of 96% of phenotypic variance and over 92% in every case. Only a tiny fraction of genotypes are strongly affected by higher-order epistasis. Sequence-function relationships are also sparse: a minuscule fraction of amino acids and interactions account for 90% of phenotypic variance. Sequence-function causality across these datasets is therefore simple, opening the way for tractable approaches to characterize proteins' genetic architecture.
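As a rough illustration of what "context-independent amino acid effects" can mean in a reference-free setting, the sketch below estimates each state's first-order effect as the mean phenotype of all genotypes carrying that state minus the global mean, then reports the fraction of phenotypic variance the first-order model explains. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the four-genotype toy library, and its phenotype values are all hypothetical, the library is assumed complete and noise-free, and pairwise terms and the global nonlinearity described above are left out.

```python
from statistics import mean


def reference_free_first_order(phenotypes):
    """Estimate the global mean and first-order effects.

    phenotypes: dict mapping genotype string -> phenotype, assumed to
    cover a complete combinatorial library (hypothetical toy input).
    """
    genotypes = list(phenotypes)
    n_sites = len(genotypes[0])
    grand_mean = mean(list(phenotypes.values()))
    effects = {}
    for site in range(n_sites):
        for state in sorted({g[site] for g in genotypes}):
            # Average over every genotype carrying this state, not over
            # the neighbors of one chosen reference sequence.
            carriers = [p for g, p in phenotypes.items() if g[site] == state]
            effects[(site, state)] = mean(carriers) - grand_mean
    return grand_mean, effects


def predict(genotype, grand_mean, effects):
    """First-order prediction: global mean plus the per-site effects."""
    return grand_mean + sum(effects[(i, s)] for i, s in enumerate(genotype))


def variance_explained(phenotypes, grand_mean, effects):
    """Fraction of phenotypic variance captured by the first-order model."""
    total = sum((p - grand_mean) ** 2 for p in phenotypes.values())
    resid = sum(
        (p - predict(g, grand_mean, effects)) ** 2
        for g, p in phenotypes.items()
    )
    return 1 - resid / total


# Hypothetical two-site, two-state library with a small pairwise deviation.
toy = {"AA": 0.0, "AB": 1.0, "BA": 2.0, "BB": 4.0}
grand_mean, effects = reference_free_first_order(toy)
r2 = variance_explained(toy, grand_mean, effects)
print(grand_mean, effects, round(r2, 4))
```

The variance left unexplained by the first-order model in this toy library comes entirely from the pairwise deviation in `BB`; a fuller treatment would add pairwise terms and a nonlinearity, as the abstract describes.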
Data availability
All sequence-function data were gathered from published studies (Table 1) and are available on GitHub (https://github.com/whatdoidohaha/RFA) and Zenodo (https://doi.org/10.5281/zenodo.8307147).
All scripts used for data analysis as well as tutorial scripts for performing reference-free analysis are available on GitHub (https://github.com/JoeThorntonLab/RFA) and Zenodo (https://doi.org/10.5281/zenodo.8307147).
Files (7.9 MB)
Simplicity-of-protein-sequence-function-relationships.pdf
| Name | Checksum | Size |
|---|---|---|
| Article | md5:5050a4e0b1d066f316f7dcdea275d7da | 2.5 MB |
| | md5:989350a1aa71b00932f125cd56b63f91 | 5.4 MB |
Additional details
Identifiers
- DOI: 10.1038/s41467-024-51895-5
- Other: oai:uchicago.tind.io:13531
Funding
- National Institutes of Health: R35GM145336
- National Institutes of Health: R01GM131128
- National Institutes of Health: R01GM121931
- National Institutes of Health: F32GM122251
- Samsung Scholarship