Carnegie Mellon University
July 23, 2024

By the Numbers

Quantitative Biology Embraces New Techniques to Tackle Big Data

By Kirsten Heuring

Heidi Opdyke
  • Interim Director of Communications, MCS
  • 412-268-9982

All biology starts with data.

"In biology, there is so much data to be had," said Veronica Hinman, head of Carnegie Mellon University's Department of Biological Sciences and the Dr. Frederick A. Schwertz Distinguished Professor of Life Sciences. "Having new tools to understand this data is of profound importance."

Carnegie Mellon University's Department of Biological Sciences is bringing big data and new technology to the forefront by putting an emphasis on quantitative biology.

"What we really want to do is uncover mechanisms of biology that are important for humanity, whether it be health, the environment or basic processes of biology," Hinman said. "Experimentation is the core of what we do, but there are new tools and technologies that allow you to get masses of data, and you have new statistical and AI tools to make sense of that data. At Carnegie Mellon, we're well poised to take advantage of that."

The Department of Biological Sciences has fostered a community that uses quantitative biology as part of research. Quantitative biology uses big data and new tools, like artificial intelligence (AI) and machine learning, to answer research questions.

"We've targeted hiring people who would be generating a lot of data and who would be interested in using new quantitative biology approaches. We've got a lot of faculty that are smart and savvy," Hinman said.

New Tools, New Solutions

Professors are using big data to create new tools that, in turn, are used to acquire more big data. Technology is providing a self-perpetuating model for scientific advancement.

"There are new tools that allow you to get masses of data. The problem is trying to find patterns and trying to distill what that data is telling you," Hinman said.

Eric Yttri, Eberly Family Associate Professor of Biological Sciences, developed a machine learning program known as A-SOiD with Ph.D. alumnus Alex Hsu and University of Bonn researchers. A-SOiD allows researchers to upload data, such as video of a human waving, and train the program to recognize identifiable patterns.

The difference between A-SOiD and other machine learning models is its guided learning. Most machine learning programs are black boxes, Yttri said, where researchers input data and the program reports results. But researchers do not know exactly how or why the program arrives at those results.

A-SOiD allows researchers to show the program what it did wrong so that it can relearn from its mistakes. With proper data and training, A-SOiD can tell the difference between a healthy person's wave and the tremors of a patient with Parkinson's disease.
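
The idea of correcting a program on the examples it is least sure about can be sketched in a few lines of Python. This is not A-SOiD's code or its algorithm. It is a toy loop built on a nearest-centroid classifier, the one-dimensional "movement speed" values are made up, and every function name here (`centroids`, `predict`, `refine`, `oracle`) is hypothetical.

```python
def centroids(labeled):
    """Mean feature value for each label in the labeled set."""
    sums = {}
    for value, label in labeled:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + value, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def predict(value, cents):
    """Return (label, margin): the nearest centroid's label and how much
    closer it is than the runner-up. Assumes at least two labels."""
    ranked = sorted(cents, key=lambda label: abs(value - cents[label]))
    best, second = ranked[0], ranked[1]
    margin = abs(value - cents[second]) - abs(value - cents[best])
    return best, margin


def refine(labeled, pool, oracle, rounds):
    """Each round, ask the researcher (the oracle) to label the example
    the current model is least certain about, then refit the centroids."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(labeled)
        # Smallest margin = the example the model is least sure about.
        query = min(pool, key=lambda v: predict(v, cents)[1])
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return centroids(labeled)


# Made-up rule standing in for a researcher's judgment.
def oracle(value):
    return "tremor" if value >= 4.0 else "wave"


seed = [(1.0, "wave"), (9.0, "tremor")]   # two hand-labeled starting examples
pool = [2.0, 3.5, 4.5, 6.0, 8.0]          # unlabeled examples

before = predict(4.6, centroids(seed))[0]
model = refine(seed, pool, oracle, rounds=3)
after = predict(4.6, model)[0]
print(before, "->", after)
```

In this toy setup, the borderline value 4.6 is labeled "wave" by the two-example starting model and "tremor" after three rounds of correction, because each round spends the researcher's attention on the example nearest the current decision boundary.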

"This technique works great at learning classifications of animal behaviors," Yttri said. "We're going to be using it to address neuroscience questions."

Yttri said he plans to use A-SOiD in combination with other techniques to make connections between neural mechanisms and spontaneous behaviors. This new tool will allow Yttri and other researchers to understand nuances of human and animal behavior.

Yongxin (Leon) Zhao, Eberly Family Associate Professor of Biological Sciences, is building tools to pull big data from microscopic material.

Zhao's expansion microscopy technique known as Magnify is a set of protocols that use hydrogel to expand cells while keeping the cell's organelles, proteins and nucleic acids intact.

AI tools are enhancing his work further. Using Magnify, he expanded both cancerous and healthy cells and imaged them. Zhao and collaborators are using the images to train an AI tool to identify which cancerous cells could potentially respond to specific treatments.

"Magnify showed some features that allows AI to actually predict whether cancer patients can respond to chemotherapy, which is something that pathologists do not currently have the capability to do," Zhao said. "If it's successful, pathologists will have that capability, and oncologists will be able to optimize their plans."

Microscopic Components, Big Data

Many biological science professors are gathering large quantities of data from RNA and cellular mechanisms.

Joel McManus, associate professor of biological sciences, investigates gene expression, particularly how messenger RNAs (mRNAs) are used to synthesize proteins, a process called translation. He uses RNA sequencing and ribosome profiling to determine how efficiently mRNAs from different genes are translated into proteins. Since ribosomes bind to mRNA, ribosome profiling allows researchers to see the locations of ribosomes on mRNA, the amount of mRNA expressed and the number of ribosomes bound to each mRNA.
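
As a rough illustration of what these two measurements make possible, translation efficiency is commonly summarized as the ratio of ribosome footprint reads to mRNA reads for each gene. The Python sketch below is not the McManus lab's pipeline. The gene names and read counts are invented, the function names are hypothetical, and it assumes a simple reads-per-million normalization.

```python
def reads_per_million(counts):
    """Scale raw read counts so each library sums to one million."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def translation_efficiency(ribo_counts, rna_counts):
    """Ratio of normalized ribosome footprint reads to normalized
    mRNA reads for each gene."""
    ribo = reads_per_million(ribo_counts)
    rna = reads_per_million(rna_counts)
    # Skip genes with no mRNA reads to avoid dividing by zero.
    return {g: ribo[g] / rna[g] for g in ribo if rna.get(g, 0) > 0}


# Invented counts for three made-up genes.
ribo_counts = {"geneA": 300, "geneB": 100, "geneC": 600}
rna_counts = {"geneA": 100, "geneB": 400, "geneC": 500}

te = translation_efficiency(ribo_counts, rna_counts)
for gene, value in sorted(te.items()):
    print(f"{gene}: {value:.2f}")
```

A ratio above 1 means a gene carries more ribosomes than its mRNA level alone would suggest. In this toy input, geneA has the highest ratio and geneB the lowest.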

The McManus lab has also developed two massively parallel reporter assays to comb through tens of thousands of mRNA sequences to uncover how many ribosomes they can load and how much protein is synthesized.

"We generate a lot of data," McManus said.

With the help of former Ph.D. student Christina Akirtava and lab research scientist Gemma May, the lab used machine learning to develop models that explain how the features of these sequences affect ribosome loading in yeast. Now, the lab is applying the same approach to investigate translation of human genes, with a broad goal of cracking the code for mRNA translation.

"When we compare mammalian genomes to the human genome, there are regions where everyone has the same sequence," McManus said. "These regions are very likely to have important functions."

En Cai, assistant professor of biological sciences, investigates T-cell activation. T-cells are a type of white blood cell that detects and destroys cells that carry pathogens in the body.

Cai uses a combination of expansion microscopy and high-resolution imaging to create 3D images that show how T-cells respond at different points of activation, both before and after being exposed to pathogens. She has built a large data set that includes various markers of the different phases of T-cell activation. The next step is to collaborate with researchers in the Ray and Stephanie Lane Department of Computational Biology to design a machine learning-based image analysis platform that can precisely identify a cell's signaling stage based on these markers.

"There are a lot of features that we can already see, but they're difficult to quantify," Cai said. "What we're trying to do first is get to the quantification of features reliably. We want to be able to use an image to see what kind of cell we're seeing and in what state."

Next-Gen Researchers

Cai and other faculty members aim to have students in the Department of Biological Sciences ready for a future with big data, machine learning and AI. Cai teaches a graduate course on statistical and computational techniques that includes introducing students to the programming language Python.

"Students run into a lot of different types of data, for example, sequencing data, protein structure data, and imaging data. I want them to become confident that they know how to work with biological data and quantify it, so they can use it in their own research," Cai said.

For the final project, students are encouraged to apply the tools they've learned in the course to analyze raw data collected from real experiments. When Cai first taught the course in 2022, she had approximately 40 students; it has since expanded to more than 60 students.

Hinman said that the department will continue to emphasize the importance of quantitative biology tools, methods and data collection techniques. She said as the department expands, students, faculty and staff will continue to break new ground.

"I hope we maintain our focus on making important findings, doing rigorous, careful science that's going to stand the test of time," Hinman said. "We want to make the sort of findings that will make people 20 years from now say, 'that changed our way of thinking about something, and it's held true.'"
