One of life’s deepest mysteries is how a single fertilized egg gives rise to an entire organism: a body built from many distinct cell types, each in its proper place and performing its own job, yet all carrying exactly the same genome. How can identical instruction manuals produce a neuron, a muscle fiber, and a skin cell? The answer lies in which genes each cell switches on or off. The genome acts like a computer program: it specifies when each gene turns on and off, wires genes into intricate networks in which they regulate one another, and ultimately orchestrates the functions that keep us alive. Decoding this program would transform human health. It would help us understand how an organism develops, why mutations and breakdowns in regulation cause birth defects and disease, and how we might redirect a cell’s fate to treat illness, regenerate damaged organs, or even slow aging. The challenge is daunting: the sheer number of molecules involved, and the combinatorial ways in which they regulate one another, make the program extraordinarily hard to read.
The era of big data offers new hope. Modern single-cell techniques have produced an unprecedented wealth of data, in particular detailed snapshots of which genes are active in individual cells. A central goal of my lab is to extract the underlying biology from these snapshots. The task is like being handed thousands of high-resolution photographs and having to assemble them into a movie, then work out what drives the decisions and behavior of each character on screen. Like many researchers, we use powerful machine-learning and artificial-intelligence tools. What sets our approach apart is that we build methods, grounded in rigorous physics and mathematics, that describe how molecules within and between cells communicate and regulate one another. We call the resulting models physics-based virtual cells and virtual organisms.
Two ongoing projects illustrate how we learn biology from data. With collaborators at Harvard Medical School and UCLA, we compared the regulatory dynamics of young and aged muscle stem cells using single-cell data, revealing how aging disrupts cellular function. Our analysis pointed to specific interventions that might rejuvenate aged cells, predictions our collaborators then confirmed in the laboratory. In a second project, funded by the NIH and carried out with colleagues at the Hillman Cancer Center, we are investigating why head and neck cancer patients with and without human papillomavirus (HPV) infection respond so differently to radiation and immunotherapy, and how combination treatments might make poorly responding patients more sensitive to these therapies.
Biology is entering an exciting era, maturing into a quantitative and predictive science much as physics and chemistry did before it. The history of physics offers a useful guide: the progression from careful data collection (Tycho Brahe) to data mining (Kepler’s laws) to fundamental theory (Newton’s equations). Only at that final stage did physics gain a self-consistent framework that powered centuries of progress. Today’s excitement, including much of the work on large language models, remains mostly at the data-mining stage. My lab aims for the next step: to formulate theory in the life sciences and find the “Newton’s equations” of cells and organisms. Such physics-based virtual cells and organisms promise not only a mechanistic understanding of life but also a rational guide for medical intervention.