OpenWorm: overview and recent advances in integrative biological simulation of Caenorhabditis elegans

Abstract: The adoption of powerful software tools and computational methods from the software industry by the scientific research community has resulted in a renewed interest in integrative, large-scale biological simulations. These typically involve the development of computational platforms to combine diverse, process-specific models into a coherent whole. The OpenWorm Foundation is an independent research organization working towards an integrative simulation of the nematode Caenorhabditis elegans, with the aim of providing a powerful new tool to understand how the organism’s behaviour arises from its fundamental biology. In this perspective, we give an overview of the history and philosophy of OpenWorm, descriptions of the constituent sub-projects and corresponding open-science management practices, and discuss current achievements of the project and future directions.

  • Gopal P. Sarma, et al., “OpenWorm: overview and recent advances in integrative biological simulation of Caenorhabditis elegans”, Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 373(1758) (2018). [Journal]

AI Safety and Reproducibility: Establishing Robust Foundations for the Neuropsychology of Human Values

Abstract: We propose the creation of a systematic effort to identify and replicate key findings in neuropsychology and allied fields related to understanding human values. Our aim is to ensure that research underpinning the value alignment problem of artificial intelligence has been sufficiently validated to play a role in the design of AI systems.

  • Gopal P. Sarma, Nick J. Hay, and Adam Safron, “AI Safety and Reproducibility: Establishing Robust Foundations for the Neuropsychology of Human Values”, International Conference on Computer Safety, Reliability, and Security, pp. 507-512 (2018). [Proceedings][Preprint]

Integrative biological simulation praxis: Considerations from physics, philosophy, and data/model curation practices

Abstract: Integrative biological simulations have a varied and controversial history in the biological sciences. From computational models of organelles, cells, and simple organisms, to physiological models of tissues, organ systems, and ecosystems, a diverse array of biological systems have been the target of large-scale computational modeling efforts. Nonetheless, these research agendas have yet to prove decisively their value among the broader community of theoretical and experimental biologists. In this commentary, we examine a range of philosophical and practical issues relevant to understanding the potential of integrative simulations. We discuss the role of theory and modeling in different areas of physics and suggest that certain sub-disciplines of physics provide useful cultural analogies for imagining the future role of simulations in biological research. We examine philosophical issues related to modeling which consistently arise in discussions about integrative simulations and suggest a pragmatic viewpoint that balances a belief in philosophy with the recognition of the relative infancy of our state of philosophical understanding. Finally, we discuss community workflow and publication practices to allow research to be readily discoverable and amenable to incorporation into simulations. We argue that there are aligned incentives in widespread adoption of practices which will both advance the needs of integrative simulation efforts as well as other contemporary trends in the biological sciences, ranging from open science and data sharing to improving reproducibility.

  • Gopal Sarma and Victor Faundez, “Integrative biological simulation praxis: Considerations from physics, philosophy, and data/model curation practices”, Cellular Logistics 7(4) (2017). [Journal]

Robust Computer Algebra, Theorem Proving, and Oracle AI

Abstract: In the context of superintelligent AI systems, the term “oracle” has two meanings. One refers to modular systems queried for domain-specific tasks. Another usage, referring to a class of systems which may be useful for addressing the value alignment and AI control problems, is a superintelligent AI system that only answers questions. The aim of this manuscript is to survey contemporary research problems related to oracles which align with long-term research goals of AI safety. We examine existing question answering systems and argue that their high degree of architectural heterogeneity makes them poor candidates for rigorous analysis as oracles. On the other hand, we identify computer algebra systems (CASs) as being primitive examples of domain-specific oracles for mathematics and argue that efforts to integrate computer algebra systems with theorem provers, systems which have largely been developed independent of one another, provide a concrete set of problems related to the notion of provable safety that has emerged in the AI safety community. We review approaches to interfacing CASs with theorem provers, describe well-defined architectural deficiencies that have been identified with CASs, and suggest possible lines of research and practical software projects for scientists interested in AI safety.

  • Gopal P. Sarma and Nick J. Hay, “Robust Computer Algebra, Theorem Proving, and Oracle AI”, Informatica 41(3) (2017). [Journal][Preprint]

Doing Things Twice (Or Differently): Strategies to Identify Studies for Targeted Validation

Abstract: The “reproducibility crisis” has been a highly visible source of scientific controversy and dispute. Here, I propose and review several avenues for identifying and prioritizing research studies for the purpose of targeted validation. Of the various proposals discussed, I identify scientific data science as being a strategy that merits greater attention among those interested in reproducibility. I argue that the tremendous potential of scientific data science for uncovering high-value research studies is a significant and rarely discussed benefit of the transition to a fully open-access publishing model.

  • Gopal P. Sarma, “Doing Things Twice (Or Differently): Strategies to Identify Studies for Targeted Validation” (2017). [Preprint]

Scientific Literature Text Mining and the Case for Open Access

Abstract: “Open access” has become a central theme of journal reform in academic publishing. In this article, I examine the relationship between open access publishing and an important infrastructural element of a modern research enterprise, scientific literature text mining, or the use of data analytic techniques to conduct meta-analyses and investigations into the scientific corpus. I give a brief history of the open access movement, discuss novel journalistic practices, and provide an overview of data-driven investigation of the scientific corpus. I argue that particularly in an era where the veracity of many research studies has been called into question, scientific literature text mining should be one of the key motivations for open access publishing, not only in the basic sciences, but in the engineering and applied sciences as well. The enormous benefits of unrestricted access to the research literature should prompt scholars from all disciplines to lend their vocal support to enabling legal, wholesale access to the scientific literature as part of a data science pipeline.

  • Gopal P. Sarma, “Scientific Literature Text Mining and the Case for Open Access”, The Journal of Open Engineering (2017). [Journal]

Mammalian Value Systems

Abstract: Characterizing human values is a topic deeply interwoven with the sciences, humanities, political philosophy, art, and many other human endeavors. In recent years, a number of thinkers have argued that accelerating trends in computer science, cognitive science, and related disciplines foreshadow the creation of intelligent machines which meet and ultimately surpass the cognitive abilities of human beings, thereby entangling an understanding of human values with future technological development. Contemporary research accomplishments suggest increasingly sophisticated AI systems becoming widespread and responsible for managing many aspects of the modern world, from preemptively planning users’ travel schedules and logistics, to fully autonomous vehicles, to domestic robots assisting in daily living. The extrapolation of these trends has been most forcefully described in the context of a hypothetical “intelligence explosion,” in which the capabilities of an intelligent software agent would rapidly increase due to the presence of feedback loops unavailable to biological organisms. The possibility of superintelligent agents, or simply the widespread deployment of sophisticated, autonomous AI systems, highlights an important theoretical problem: the need to separate the cognitive and rational capacities of an agent from the fundamental goal structure, or value system, which constrains and guides the agent’s actions. The “value alignment problem” is to specify a goal structure for autonomous agents compatible with human values. In this brief article, we suggest that recent ideas from affective neuroscience and related disciplines aimed at characterizing neurological and behavioral universals in the mammalian kingdom provide important conceptual foundations relevant to describing human values. We argue that the notion of “mammalian value systems” points to a potential avenue for fundamental research in AI safety and AI ethics.

  • Gopal P. Sarma and Nick J. Hay, “Mammalian Value Systems”, Informatica 41(3) (2017). [Journal][Preprint]

Unit Testing, Model Validation, and Biological Simulation

Abstract: The growth of the software industry has gone hand in hand with the development of tools and cultural practices for ensuring the reliability of complex pieces of software. These tools and practices are now acknowledged to be essential to the management of modern software. As computational models and methods have become increasingly common in the biological sciences, it is important to examine how these practices can accelerate biological software development and improve research quality. In this article, we give a focused case study of our experience with the practices of unit testing and test-driven development in OpenWorm, an open-science project aimed at modeling Caenorhabditis elegans. We identify and discuss the challenges of incorporating test-driven development into a heterogeneous, data-driven project, as well as the role of model validation tests, a category of tests unique to software which expresses scientific models.

  • Gopal P. Sarma, Travis W. Jacobs, Mark D. Watts, S. Vahid Ghayoomie, Stephen D. Larson, and Rick C. Gerkin, “Unit Testing, Model Validation, and Biological Simulation”, F1000Research 5:1946 (2016). [Journal][Preprint]