Published December 12, 2023 | Version v1
Journal article Open

Everyday language input and production in 1,001 children from six continents

  • 1. Harvard University
  • 2. University of Manitoba
  • 3. Stockholm University
  • 4. Max Planck Institute for Psycholinguistics
  • 5. University of Connecticut
  • 6. Purdue University
  • 7. Stockholm University
  • 8. Basque Center on Cognition Brain and Language
  • 9. PSL University
  • 10. University of Chicago
  • 11. Ohio State University
  • 12. Royal Dutch Kentalis Utrecht

Description

Language is a universal human ability, acquired readily by young children, who otherwise struggle with many basics of survival. And yet, language ability is variable across individuals. Naturalistic and experimental observations suggest that children's linguistic skills vary with factors like socioeconomic status and children's gender. But which factors really influence children's day-to-day language use? Here, we leverage speech technology in a big-data approach to report on a unique cross-cultural and diverse data set: >2,500 d-long, child-centered audio-recordings of 1,001 2- to 48-mo-olds from 12 countries spanning six continents across urban, farmer-forager, and subsistence-farming contexts. As expected, age and language-relevant clinical risks and diagnoses predicted how much speech (and speech-like vocalization) children produced. Critically, so too did adult talk in children's environments: Children who heard more talk from adults produced more speech. In contrast to previous conclusions based on more limited sampling methods and a different set of language proxies, socioeconomic status (operationalized as maternal education) was not significantly associated with children's productions over the first 4 y of life, and neither were gender or multilingualism. These findings from large-scale naturalistic data advance our understanding of which factors are robust predictors of variability in the speech behaviors of young learners in a wide range of everyday contexts.

Data availability

Anonymized (tabular) data and all relevant code have been deposited with the Open Science Framework (https://osf.io/9v2m5/?viewonly=50df17fcf0844145ae692c35b78c6b08) (88). The raw audio recordings cannot be shared because of the consent process participants underwent, but all derived tabular data are fully shared.

Files

bergelson-et-al-2023-everyday-language-input-and-production-in-1-001-children-from-six-continents.pdf

Files (31.3 MB)

Name Size
Article
md5:fa186169d66607fbfcc458aad108f575
29.7 MB
Supporting information
md5:98ab4d7113f82151678fd5d109f6284a
1.6 MB

Additional details

Identifiers

DOI
10.1073/pnas.2300671120
Other
oai:uchicago.tind.io:10429

Funding

MechELex
ANR-16-DATA-0004 ACLEW
MechELex
ANR-14-CE30-0003
McDonnell Foundation
ExELang
ERC H2020
NEH
HJ-253479-17
NIH
DP5-OD019812
NSF
BCS-1844710
NSF
SBE-0354453
ESRC
ES/L008955/1
SSHRC
435-2015-0628
SSHRC
869-2016-0003
NSERC
501769-2016-RGPDD
Netherlands Organisation for Scientific Research
275-89-033
NIMH
K23MH111955
NIDCD
F31DC018219
MAW
2011.0070
MAW
2013.0056
Basque Government
BERC 2022-2025 program
Spanish State Research Agency
BCBL Severo Ochoa excellence accreditation
Unknown funder
Ramon y Cajal Fellowship
Unknown funder
ARC CE140100041

UChicago Information

Division(s)
Social Sciences Division
Department(s)
Comparative Human Development