Using public clinical trial reports to probe non-experimental causal inference methods

Steinberg, Ethan; Ignatiadis, Nikolaos; Yadlowsky, Steve; Xu, Yizhe; Shah, Nigam

doi:10.6082/6zjh4-g0j31

Published September 9, 2023 | Version v1

Journal article Open

1. Stanford University
2. University of Chicago
3. Google

Background: Non-experimental studies (also known as observational studies) are valuable for estimating the effects of various medical interventions, but are notoriously difficult to evaluate because the methods used in non-experimental studies require untestable assumptions. This lack of intrinsic verifiability makes it difficult both to compare different non-experimental study methods and to trust the results of any particular non-experimental study.

Methods: We introduce TrialProbe, a data resource and statistical framework for the evaluation of non-experimental methods. We first collect a dataset of pseudo "ground truths" about the relative effects of drugs by using empirical Bayesian techniques to analyze adverse events recorded in public clinical trial reports. We then develop a framework for evaluating non-experimental methods against that ground truth by measuring concordance between the non-experimental effect estimates and the estimates derived from clinical trials. As a demonstration of our approach, we also perform an example methods evaluation comparing propensity score matching, inverse propensity score weighting, and an unadjusted approach on a large national insurance claims dataset.
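As a rough illustration of what "measuring concordance" could look like, the sketch below computes the fraction of drug/drug comparisons in which a non-experimental estimate has the same sign as a trial-derived reference effect. This is an assumption made for illustration only: the abstract does not specify the concordance metric TrialProbe uses, the function name `sign_concordance` is hypothetical, and the numbers are made-up toy values rather than results from the paper.

```python
from typing import Sequence


def sign_concordance(reference: Sequence[float], estimates: Sequence[float]) -> float:
    """Fraction of comparisons where the estimate agrees in sign with the reference.

    Hypothetical metric for illustration; not necessarily the one used by TrialProbe.
    """
    if len(reference) != len(estimates):
        raise ValueError("reference and estimates must have the same length")
    if len(reference) == 0:
        raise ValueError("at least one comparison is required")
    # A product greater than zero means both effects point in the same direction.
    agree = sum(1 for r, e in zip(reference, estimates) if r * e > 0)
    return agree / len(reference)


# Toy log-effect values (invented for this example, not taken from the paper).
reference = [0.4, -0.2, 0.7, -0.5]   # trial-derived reference effects
adjusted = [0.3, -0.1, 0.5, 0.2]     # estimates from some adjusted method
unadjusted = [-0.1, 0.3, 0.6, 0.4]   # estimates from an unadjusted comparison

adjusted_score = sign_concordance(reference, adjusted)
unadjusted_score = sign_concordance(reference, unadjusted)
print(adjusted_score, unadjusted_score)
```

A higher score would indicate that a method more often recovers the direction of effect seen in the trial-derived reference set; a real evaluation would also need to account for uncertainty in both sets of estimates.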

Results: From the 33,701 clinical trial records in our version of the ClinicalTrials.gov dataset, we are able to extract 12,967 unique drug/drug adverse event comparisons to form a ground truth set. During our corresponding methods evaluation, we are able to use that reference set to demonstrate that both propensity score matching and inverse propensity score weighting can produce estimates that have high concordance with clinical trial results and substantially outperform an unadjusted baseline.
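To make the contrast between an adjusted and an unadjusted estimate concrete, here is a minimal sketch of inverse propensity score weighting on a tiny synthetic dataset. Everything in it is an assumption for illustration: the data are invented, the propensity scores are taken as known (in practice they would be estimated, for example with a regression model), and the function names are hypothetical rather than part of the TrialProbe code.

```python
from typing import Dict, List, Tuple

# Each row is (confounder stratum x, treatment indicator t, outcome y).
Row = Tuple[int, int, float]


def unadjusted_difference(rows: List[Row]) -> float:
    """Difference in mean outcome between treated and control, ignoring confounding."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_difference(rows: List[Row], propensity: Dict[int, float]) -> float:
    """Normalized (Hajek-style) inverse propensity weighted difference in means.

    `propensity[x]` is the assumed-known probability of treatment in stratum x.
    """
    num_t = den_t = num_c = den_c = 0.0
    for x, t, y in rows:
        e = propensity[x]
        if t == 1:
            w = 1.0 / e            # treated rows are weighted by 1 / e(x)
            num_t += w * y
            den_t += w
        else:
            w = 1.0 / (1.0 - e)    # control rows are weighted by 1 / (1 - e(x))
            num_c += w * y
            den_c += w
    return num_t / den_t - num_c / den_c


# Synthetic data: the outcome depends only on x, so the true treatment effect is zero,
# but stratum 1 is treated far more often than stratum 0.
rows: List[Row] = [
    (0, 1, 0.0), (0, 0, 0.0), (0, 0, 0.0), (0, 0, 0.0),
    (1, 1, 1.0), (1, 1, 1.0), (1, 1, 1.0), (1, 0, 1.0),
]
propensity = {0: 0.25, 1: 0.75}

naive = unadjusted_difference(rows)
weighted = ipw_difference(rows, propensity)
print(naive, weighted)
```

In this toy setup the unadjusted difference is biased away from zero because treatment is concentrated in the higher-outcome stratum, while the weighted estimate removes that imbalance. Real claims data would involve many covariates and estimated propensity scores.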

Conclusions: We find that TrialProbe is an effective approach for probing non-experimental study methods: it generates large ground truth sets that can distinguish how well non-experimental methods perform on real-world observational data.

Data availability

Our code is available at https://github.com/som-shahlab/TrialProbe. The source clinical trial records can be found at clinicaltrials.gov. The data we used in our case study, Optum's Clinformatics Data Mart Database, is not publicly available because it is a commercially licensed product. To obtain access to Optum's Clinformatics Data Mart Database, it is generally necessary to contact Optum directly for both a license and the data itself. Contact information and other details about how to get access can be found on the product sheet [39]. Optum is the primary long-term repository for their datasets, and we are not allowed to maintain archive copies past our contract dates.

Files

Using-public-clinical-trial-reports-to-probe-non-experimental-causal-inference-methods.pdf

Files (1.9 MB)

Name	Size
Using-public-clinical-trial-reports-to-probe-non-experimental-causal-inference-methods.pdf (md5:18cceea8fe25f02303b8a1f63bbe4895)	1.9 MB

Additional details

DOI: 10.1186/s12874-023-02025-0
Other: oai:uchicago.tind.io:7960

NLM
R01-LM011369-05

Division(s): Physical Sciences Division
Department(s): Statistics

Resource type

Journal article

Publisher

University of Chicago

Published in

BMC Medical Research Methodology, 2023.

Languages

English

License

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.
Distribution License

No further description.

Copyrights

© The Author(s) 2023. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Technical metadata

Created: May 22, 2026
Modified: May 22, 2026
