Building a Data Infrastructure for the Bioeconomy
- Gopal P. Sarma and Melissa Haendel, “Building a Data Infrastructure for the Bioeconomy,” Issues in Science and Technology, May 18, 2021 [Journal]
Formal Methods for the Informal Engineer (FMIE) was a workshop held at the Broad Institute of MIT and Harvard in 2021 to explore the potential role of verified software in the biomedical software ecosystem. The motivation for organizing FMIE was the recognition that the life sciences and medicine are undergoing a transition from being passive consumers of software and AI/ML technologies to fundamental drivers of new platforms, including those that will need to be mission- and safety-critical. Drawing on conversations leading up to and during the workshop, we make five concrete recommendations to help software leaders organically incorporate tools, techniques, and perspectives from formal methods into their project planning and development trajectories.
The intersection of medicine and machine learning (ML) has the potential to transform healthcare. We describe how physiology, a foundational discipline of medical training and practice with a rich quantitative history, could serve as a starting point for the development of a common language between clinicians and ML experts, thereby accelerating real-world impact.
Vannevar Bush enshrined the ‘basic’ and ‘applied’ research dichotomy on which much of science policy is still built 75 years later. However, it is time to assess whether this vision for science best serves the purposes of medical research and physician-scientists in the 21st century.
The training of physician-scientists lies at the heart of future medical research. In this commentary, we apply Narayanamurti and Odumosu’s framework of the “discovery-invention cycle” to analyze the structure and outcomes of the integrated MD/PhD program. We argue that the linear model of “bench-to-bedside” research, which is also reflected in the present training of MD/PhDs, merits continual re-evaluation to capitalize on the richness of opportunities arising in clinical medicine. In addition to measuring objective career outcomes, as existing research has done, we suggest that detailed characterization of researchers’ efforts using both qualitative and quantitative techniques is necessary to understand if dual-degree training is being utilized. As an example, we propose that the application of machine learning and data science to corpora of biomedical literature and anonymized clinical data might allow us to see if there are objective “signatures” of research uniquely enabled by MD/PhD training. We close by proposing several hypotheses for shaping physician-scientist training, the relative merits of which could be assessed using the techniques proposed above. Our overarching message is the importance of deeply understanding individual career trajectories as well as characterizing organizational details and cultural nuances to drive new policy which shapes the future of the physician-scientist workforce.
We describe a biologically-inspired research agenda with parallel tracks aimed at AI and AI safety. The bottom-up component consists of building a sequence of biophysically realistic simulations of simple organisms such as the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and the zebrafish Danio rerio to serve as platforms for research into AI algorithms and system architectures. The top-down component consists of an approach to value alignment that grounds AI goal structures in neuropsychology, broadly considered. Our belief is that parallel pursuit of these tracks will inform the development of value-aligned AI systems that have been inspired by embodied organisms with sensorimotor integration. An important set of side benefits is that the research trajectories we describe here are grounded in long-standing intellectual traditions within existing research communities and funding structures. In addition, these research programs overlap with significant contemporary themes in the biological and psychological sciences such as data/model integration and reproducibility.
Abstract: The adoption of powerful software tools and computational methods from the software industry by the scientific research community has resulted in a renewed interest in integrative, large-scale biological simulations. These typically involve the development of computational platforms to combine diverse, process-specific models into a coherent whole. The OpenWorm Foundation is an independent research organization working towards an integrative simulation of the nematode Caenorhabditis elegans, with the aim of providing a powerful new tool to understand how the organism’s behaviour arises from its fundamental biology. In this perspective, we give an overview of the history and philosophy of OpenWorm, descriptions of the constituent sub-projects and corresponding open-science management practices, and discuss current achievements of the project and future directions.
Abstract: Integrative biological simulations have a varied and controversial history in the biological sciences. From computational models of organelles, cells, and simple organisms, to physiological models of tissues, organ systems, and ecosystems, a diverse array of biological systems has been the target of large-scale computational modeling efforts. Nonetheless, these research agendas have yet to prove decisively their value among the broader community of theoretical and experimental biologists. In this commentary, we examine a range of philosophical and practical issues relevant to understanding the potential of integrative simulations. We discuss the role of theory and modeling in different areas of physics and suggest that certain sub-disciplines of physics provide useful cultural analogies for imagining the future role of simulations in biological research. We examine philosophical issues related to modeling which consistently arise in discussions about integrative simulations and suggest a pragmatic viewpoint that balances a belief in philosophy with the recognition of the relative infancy of our state of philosophical understanding. Finally, we discuss community workflow and publication practices to allow research to be readily discoverable and amenable to incorporation into simulations. We argue that there are aligned incentives in widespread adoption of practices which will both advance the needs of integrative simulation efforts as well as other contemporary trends in the biological sciences, ranging from open science and data sharing to improving reproducibility.
Abstract: The growth of the software industry has gone hand in hand with the development of tools and cultural practices for ensuring the reliability of complex pieces of software. These tools and practices are now acknowledged to be essential to the management of modern software. As computational models and methods have become increasingly common in the biological sciences, it is important to examine how these practices can accelerate biological software development and improve research quality. In this article, we give a focused case study of our experience with the practices of unit testing and test-driven development in OpenWorm, an open-science project aimed at modeling Caenorhabditis elegans. We identify and discuss the challenges of incorporating test-driven development into a heterogeneous, data-driven project, as well as the role of model validation tests, a category of tests unique to software which expresses scientific models.
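The distinction drawn above between ordinary unit tests and model validation tests can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the passive-membrane function `membrane_potential`, its parameter values, and the "observed" voltage range are hypothetical and are not taken from the OpenWorm codebase or from experimental data.

```python
import math
import unittest

# Hypothetical toy model (not OpenWorm code): a passive RC membrane
# responding to a constant current step. Units: ms, nA, mV, megaohm.
def membrane_potential(t_ms, i_na=0.0, v_rest_mv=-65.0, r_mohm=100.0, tau_ms=10.0):
    """Return the membrane voltage (mV) t_ms after a current step of i_na."""
    if t_ms < 0:
        raise ValueError("time must be non-negative")
    return v_rest_mv + i_na * r_mohm * (1.0 - math.exp(-t_ms / tau_ms))

# Placeholder range standing in for an experimental measurement.
# These numbers are invented for the example, not real data.
OBSERVED_STEADY_STATE_MV = (-60.0, -50.0)


class UnitTests(unittest.TestCase):
    """Check that the code does what the programmer intended."""

    def test_no_current_stays_at_rest(self):
        self.assertEqual(membrane_potential(5.0), -65.0)

    def test_negative_time_is_rejected(self):
        with self.assertRaises(ValueError):
            membrane_potential(-1.0)


class ModelValidationTests(unittest.TestCase):
    """Check that the model's prediction agrees with (placeholder) observations."""

    def test_steady_state_within_observed_range(self):
        low, high = OBSERVED_STEADY_STATE_MV
        # After ten time constants the voltage is effectively at steady state.
        v = membrane_potential(100.0, i_na=0.1)
        self.assertGreaterEqual(v, low)
        self.assertLessEqual(v, high)
```

Saved to a file, the tests can be run with `python -m unittest <filename>`. In this sketch a failing unit test points to a programming error, whereas a failing validation test may instead mean that the model or its parameters no longer agree with the data, which is why the two kinds of test are kept in separate classes.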