Published February 27, 2019 | Version v1
Journal article Open

Script of Scripts: A pragmatic workflow system for daily computational research

  • 1. University of Chicago
  • 2. University of Texas

Description

Computationally intensive disciplines such as computational biology often require use of a variety of tools implemented in different scripting languages and analysis of large data sets using high-performance computing systems. Although scientific workflow systems can powerfully organize and execute large-scale data-analysis processes, creating and maintaining such workflows usually comes with nontrivial learning curves and engineering overhead, making them cumbersome to use for everyday data exploration and prototyping. To bridge the gap between interactive analysis and workflow systems, we developed Script of Scripts (SoS), an interactive data-analysis platform and workflow system with a strong emphasis on readability, practicality, and reproducibility in daily computational research. For exploratory analysis, SoS has a multilanguage scripting format that centralizes otherwise-scattered scripts and creates dynamic reports for publication and sharing. As a workflow engine, SoS provides an intuitive syntax for creating workflows in process-oriented, outcome-oriented, and mixed styles, as well as a unified interface for executing and managing tasks on a variety of computing platforms with automatic synchronization of files among isolated file systems. As illustrated herein by real-world examples, SoS is both an interactive analysis tool and pipeline platform suitable for different stages of method development and data-analysis projects. In particular, SoS can be easily adopted in existing data analysis routines to substantially improve organization, readability, and cross-platform computation management of research projects.
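The process-oriented and outcome-oriented styles mentioned in the description can be contrasted with a toy sketch. This is plain Python, not SoS syntax and not how SoS is implemented; the helper names (`run_process_oriented`, `build`) and the three-step analysis are hypothetical and exist only for illustration. In the process-oriented style, steps run in the order they are written; in the outcome-oriented style, each rule declares what it needs and the engine works backwards from the requested result.

```python
from typing import Callable, Dict, List, Tuple


def run_process_oriented(steps: List[Callable[[dict], None]]) -> dict:
    """Run steps one after another, sharing a single state dictionary."""
    state: dict = {}
    for step in steps:
        step(state)
    return state


# A rule is (names of required targets, action that produces this target).
Rule = Tuple[List[str], Callable[..., object]]


def build(target: str, rules: Dict[str, Rule], cache: dict) -> object:
    """Produce `target` by first producing everything it depends on."""
    if target in cache:
        return cache[target]
    deps, action = rules[target]
    inputs = [build(dep, rules, cache) for dep in deps]
    cache[target] = action(*inputs)
    return cache[target]


# Hypothetical analysis written as ordered steps (process-oriented).
def load(state: dict) -> None:
    state["raw"] = [3, 1, 2]


def clean(state: dict) -> None:
    state["clean"] = sorted(state["raw"])


def summarize(state: dict) -> None:
    state["summary"] = sum(state["clean"]) / len(state["clean"])


process_result = run_process_oriented([load, clean, summarize])["summary"]

# The same analysis written as target rules (outcome-oriented).
rules: Dict[str, Rule] = {
    "raw": ([], lambda: [3, 1, 2]),
    "clean": (["raw"], lambda raw: sorted(raw)),
    "summary": (["clean"], lambda cleaned: sum(cleaned) / len(cleaned)),
}
outcome_result = build("summary", rules, {})
```

Both variants compute the same summary; they differ only in whether the author specifies the order of execution or the dependencies between results. A mixed style, as the description puts it, would combine ordered steps with rules that are resolved on demand.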

Data availability

SoS is hosted at https://github.com/vatlab/SoS and is distributed under a 3-clause BSD license. It can be installed alone as a command-line tool or as part of the SoS suite, in which an IDE and notebook interface are provided by SoS Notebook. Both classic Jupyter and JupyterLab are supported, although the JupyterLab extension jupyterlab-sos is still evolving with the development of JupyterLab. The SoS website (https://vatlab.github.io/sos-docs/) contains documentation, tutorials, examples of SoS, and a video library demonstrating the design and syntax of SoS. Although we frequently release new versions of SoS following a “release early, release often” development philosophy, we created and deposited version 0.18.1 of SoS to the Zenodo research data depository (doi:10.5281/zenodo.1291523) for evaluation with this report. Examples described herein are available in the Publication section of the SoS documentation, as well as at Zenodo (doi:10.5281/zenodo.2537428).

Files

journal.pcbi.1006843.pdf

Files (1.7 MB)

Name Size
Article
md5:3fd5a5c0042347697d649eb4149e00b3
1.7 MB
md5:35c3b1c22c6bcc3050febd9d8a495a8c
25.4 kB

Additional details

Identifiers

DOI
10.1371/journal.pcbi.1006843
Other
oai:uchicago.tind.io:6309

Funding

  • National Human Genome Research Institute: R01HG008972
  • National Human Genome Research Institute: 1R01HG005859
  • NCI: CA143883
  • Cancer Prevention and Research Institute of Texas: RP130397
  • Gordon and Betty Moore Foundation: GBMF #4559
  • Mary K. Chapman Foundation
  • Michael & Susan Dell Foundation
  • NCI: P30CA016672

UChicago Information

Division(s)
Biological Sciences Division
Department(s)
Human Genetics