Published October 16, 2023 | Version v1
Journal article Open

Pervasive, conserved secondary structure in highly charged protein regions

  • 1. University of Pennsylvania
  • 2. California Institute of Technology
  • 3. University of Chicago

Description

Understanding how protein sequences confer function remains a defining challenge in molecular biology. Two approaches have yielded enormous insight yet are often pursued separately: structure-based, where sequence-encoded structures mediate function, and disorder-based, where sequences dictate physicochemical and dynamical properties which determine function in the absence of stable structure. Here we study highly charged protein regions (>40% charged residues), which are routinely presumed to be disordered. Using recent advances in structure prediction and experimental structures, we show that roughly 40% of these regions form well-structured helices. Features often used to predict disorder—high charge density, low hydrophobicity, low sequence complexity, and evolutionarily varying length—are also compatible with solvated, variable-length helices. We show that a simple composition classifier predicts the existence of structure far better than well-established heuristics based on charge and hydropathy. We show that helical structure is more prevalent than previously appreciated in highly charged regions of diverse proteomes and characterize the conservation of highly charged regions. Our results underscore the importance of integrating, rather than choosing between, structure- and disorder-based approaches.
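The ">40% charged residues" criterion above can be illustrated with a minimal sketch. It is not the authors' code (their analyses are in the repository listed under Code availability). The set of residues counted as charged (D, E, K, R, with histidine excluded), the function names, and the choice to score a whole sequence rather than a sliding window are all assumptions made for illustration.

```python
# Assumption: "charged" means Asp (D), Glu (E), Lys (K), Arg (R).
# Histidine is excluded here; the paper may define this differently.
CHARGED = frozenset("DEKR")


def charged_fraction(seq: str) -> float:
    """Return the fraction of residues in seq that are in CHARGED."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(res in CHARGED for res in seq) / len(seq)


def is_highly_charged(seq: str, threshold: float = 0.40) -> bool:
    """Hypothetical helper: True if the charged fraction exceeds the threshold."""
    return charged_fraction(seq) > threshold
```

With these assumptions, a region with 5 charged residues out of 10 would pass the cutoff, while one with exactly 4 out of 10 would not, because the comparison is strict.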

Data availability

Data used in this study are from publicly available datasets:

  • AlphaFold protein structure predictions: https://alphafold.ebi.ac.uk/download#proteomes-section
  • Yeast proteome (Saccharomyces Genome Database): http://sgd-archive.yeastgenome.org/?prefix=sequence/S288C_reference/orf_protein/
  • AYbRAH fungal ortholog database: https://github.com/LMSE/aybrah
  • DisProt yeast disordered regions: https://www.disprot.org/browse?sort_field=disprot_id&sort_value=asc&page_size=20&page=0&release=current&show_ambiguous=true&show_obsolete=false&ncbi_taxon_id=559292

All additional data generated in this study are available at https://github.com/drummondlab/highly-charged-regions-2022.

Code availability

All analyses and code used to generate the figures in this work can be found at https://github.com/drummondlab/highly-charged-regions-2022.

Files

journal.pcbi.1011565.pdf

Files (51.7 MB)

Name Size
md5:a7c18bcc5853c59546804990f21f17ee
47.7 MB
Article
md5:27679bf3c80122a9a5bfddcbf06e603b
4.0 MB

Additional details

Identifiers

DOI
10.1371/journal.pcbi.1011565
Other
oai:uchicago.tind.io:9286

Funding

Damon Runyon Cancer Research Foundation
Damon Runyon Postdoctoral Fellowship
University of Chicago
Biological Sciences Collegiate Division Summer Fellowship
Liew Family College
Research Fellows Fund
University of Chicago
Quantitative Biology Summer Fellowship
NIH
GM144278
NIH
GM127406
US Army Research Office
W911NF-14-1-0411
NIH
R35 GM136381

UChicago Information

Division(s)
Biological Sciences Division, Physical Sciences Division
Department(s)
Biochemistry and Molecular Biology, Chemistry, Medicine