Computational systems biology has emerged as the dominant framework for analyzing high-dimensional “omics” data in order to uncover the relationships among molecules, networks and disease. Many of its core methodologies are based on statistical modeling, including machine learning, stochastic processes and statistical inference. We will cover the key aspects of this methodology: measuring associations, testing multiple hypotheses, and learning predictors, Markov chains and graphical models. In addition, by studying important recent articles in cancer systems biology, we will illustrate how this approach enhances our ability to annotate genomes, discover molecular disease networks, detect disease, predict clinical outcomes, and characterize disease progression. A solid foundation in probability and statistics is necessary. No prior exposure to molecular biology is required, although it is helpful, and the course may be accessible to advanced undergraduates.
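As an illustration of the "testing multiple hypotheses" topic, here is a minimal sketch of the Benjamini-Hochberg step-up procedure for controlling the false discovery rate. It is one standard approach and not necessarily the one emphasized in the course. The function name `benjamini_hochberg`, the parameter `alpha`, and the example p-values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Illustrative sketch of the Benjamini-Hochberg step-up procedure:
    reject the k smallest p-values, where k is the largest rank with
    p_(k) <= k * alpha / m.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest 1-based rank whose p-value falls under its threshold.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank

    # Step-up: everything ranked at or below the cutoff is rejected,
    # even if its own threshold was not met.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one per gene in a differential expression test.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_rejected = benjamini_hochberg(example_p, alpha=0.05)
```

With thousands of genes tested at once, a per-test threshold of 0.05 would admit many false positives. The rank-dependent threshold is what lets this procedure control the expected fraction of false discoveries instead.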
A “readings” course organized around exceptional research articles in the recent literature in translational bioinformatics, systems biology and computational molecular medicine. Biomedical research has been transformed by the development of “omics” and other technologies. As a result, statistical learning, modeling and inference have emerged as core methodologies for analyzing these data to uncover the relationships among molecules, networks and disease, where knowledge extraction is formulated as a problem in high-dimensional pattern recognition. The papers are selected to illustrate how this methodology enhances our ability to annotate genomes, discover molecular disease networks, detect disease subtypes, predict clinical outcomes, and characterize disease progression, focusing mostly on human cancers. One major objective is to prepare students to comfortably read articles that involve machine learning and mathematical modeling. The papers will be presented by the students, but every student presentation will be preceded by a comprehensive “tutorial” by the instructor on the computational and theoretical background required for understanding the paper.
Statistical modeling and inference, inductive learning and information theory together provide a cohesive framework for machine perception, which amounts to building a data-description machine that converts physical measurements (images, molecular counts, etc.) into interpretations or descriptions. Recurring themes include quantifying uncertainty, estimating generalization error, model complexity, the bias/variance dilemma, small-sample learning and estimating interactions. Various problems in computational vision, speech and biology will be analyzed in this context, including visual tracking, object recognition, language modeling, molecular cancer diagnosis and learning gene networks.
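To make the theme of "estimating generalization error" concrete, here is a minimal sketch of k-fold cross-validation wrapped around a nearest-centroid classifier. Both the classifier and the interleaved fold assignment are assumptions chosen for brevity, not methods prescribed by the course. All function names and the toy data are hypothetical.

```python
def fit_centroids(X, y):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def cross_validated_error(X, y, k=5):
    """Estimate the misclassification rate with k interleaved folds.

    Each sample is held out exactly once, and the classifier that
    predicts it is fit only on the remaining samples.
    """
    n = len(X)
    errors = 0
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train_X = [X[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        centroids = fit_centroids(train_X, train_y)
        errors += sum(predict(centroids, X[i]) != y[i] for i in held_out)
    return errors / n


# Hypothetical toy data: one feature, two well-separated classes.
toy_X = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.3]]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]
toy_error = cross_validated_error(toy_X, toy_y, k=4)
```

Holding out each sample before fitting is the essential point. Error measured on the training samples themselves is typically optimistic, and the gap tends to widen in the small-sample, high-dimensional settings mentioned above.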