Published April 16, 2025 | Version v1
Journal article Open

A Primer for Evaluating Large Language Models in Social-Science Research

  • 1. University of Southern California
  • 2. University of Illinois Chicago
  • 3. University of Chicago

Description

Autoregressive large language models (LLMs) exhibit remarkable conversational and reasoning abilities and exceptional flexibility across a wide range of tasks. Consequently, LLMs are increasingly used in scientific research to analyze data, generate synthetic data, or even write scientific articles. This trend requires that authors follow best practices for conducting and reporting LLM research and that journal reviewers be able to evaluate the quality of works that use LLMs. We provide authors of social-scientific research with essential recommendations to ensure replicable and robust results using LLMs. Our recommendations also highlight considerations for reviewers, focusing on methodological rigor, replicability, and validity of results when evaluating studies that use LLMs to automate data processing or simulate human data. We offer practical advice on assessing the appropriateness of LLM applications in submitted studies, emphasizing the need for transparency in methodological reporting and the challenges posed by the nondeterministic and continuously evolving nature of these models. By providing a framework for best practices and critical review, we aim in this primer to ensure high-quality, innovative research in the evolving landscape of social-science studies using LLMs.

Files

Primer-for-Evaluating-Large-Language-Models-in-Social-Science-Research.pdf (376.3 kB)

Additional details

Identifiers

DOI
10.1177/25152459251325174
Other
oai:uchicago.tind.io:14892

Funding

Defense Advanced Research Projects Agency
Influence Campaign Awareness and Sensemaking
Air Force Office of Scientific Research
A9550-23-1-0463

UChicago Information

Division(s)
Booth School of Business
Department(s)
Marketing