by Robert Thurman, Principal Computational Biologist,
Computational biologists come in two types: those who were originally trained mathematically or computationally and then gravitated towards biological problems, and those who were formally trained on the biological side but couldn’t stay away from computers. That generalization is slightly dated, because many colleges now offer interdisciplinary degree programs in computational biology and bioinformatics, but those programs tend to be small and it is safe to say most practitioners currently in the field started out as one of the two types. Both perspectives are important.
I started with bachelor’s degrees in mathematics and computer science, and then took a break between my Master’s and PhD to work as a programmer for NASA’s Jet Propulsion Laboratory. I felt a call back to mathematics, and after following a traditional academic path from PhD to post-doc to tenure-track university teaching position, I switched gears again and took a programming position in the research division of a statistical software company. It was there I was exposed to machine learning techniques applied to biological problems, setting a course towards my current career in computational biology. There are many such winding paths into the field.
Recently I gave a presentation in Research Forum, which is a weekly opportunity at my company to share current results with the rest of the research community. We make targeted therapies for cancer, and I was hired to establish a devoted computational biology function. The forum was the first opportunity to try to neatly summarize what we do as computational biologists. It was a challenge. On the one hand, most people in the audience had some exposure to our work, because we collaborate with every group in research. And some functions are well-established and well-known: we do a lot of genomics, for instance, trying to untangle which genes are regulated under treatment, or which ones might presage resistance to therapy. On the other hand, the nature of our role is highly varied. In some ways we are analytical “fixers,” and we are happy to take on any kind of problem related to data analysis. In trying to concisely categorize this type of work for my presentation, the best I could come up with was… “Math.” It’s maybe a bit far from my PhD in complex analysis, but a definite path can be traced back to those roots. And there is a lot of math in this work, albeit in service to a specific (and valuable) purpose.
It’s truly an exciting time to be working in the field of computational biology, especially as it is applied to finding treatments for devastating, tough-to-treat diseases like cancer. Advances in biological understanding and experimental capabilities on the one side, and computational capacity and algorithmic sophistication on the other, have opened the way to new treatments and new tests to get the best therapies to the right patients. Breakthrough advances like immunotherapy have dramatically changed the prognosis for some patients. Advanced non-small-cell lung cancer (NSCLC), for example, has a terrible prognosis, with a 5-year survival rate near 0% for more advanced cases [1]. But so-called checkpoint inhibitors like nivolumab and atezolizumab, which target the cell surface proteins PD-1 and PD-L1, respectively, and free up the body’s immune system to attack cancer, have in some cases doubled overall survival rates compared to previous standards of care [2]. This level of improvement is virtually unheard of for new cancer therapies, and it means that the field now cautiously uses the word “cures” in cases it never could before. However, only a subset of patients respond to this type of therapy. So the race is on to 1) find biomarkers, that is, measurable patient characteristics that can predict who is most likely to respond or not respond; and 2) find other immune checkpoints that are successfully druggable.
Computational biology and bioinformatics have prominent roles to play in both of these endeavors. The search for biomarkers involves sifting through data in which dozens to thousands of variables are collected on patients: from height, weight, age and gender, to the number, length and types of previous treatments, to genomic features like gene expression and mutations measured across potentially hundreds or thousands of genes. All of these patient characteristics are then compared to clinical results to see if any variable, alone or in combination with others, could be related to response. Because it is often the case in these types of problems that there are more variables than patients, modern machine learning techniques, such as regularization and random forests, can be used to overcome the limitations of under-determined systems and identify which variables are most important in predicting response. In my own work I use these techniques as well, to try to understand, for instance, what measurable characteristics of our drugs (which are fairly complicated in their mechanisms of action) contribute most to their potency in an in vitro setting. (This would be a good place to add, as a general recommendation to others as well, that I wish I had taken more statistics!)
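To make the "more variables than patients" point concrete, here is a minimal sketch of one form of regularization, the lasso, fitted by cyclic coordinate descent. Everything in it is an assumption made for illustration: the data are synthetic, the function names are hypothetical, and the penalty value is arbitrary. It is not the analysis described above. In practice one would reach for an established implementation (glmnet in R, for example) rather than hand-rolled code.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the penalty become exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=200):
    """Minimize (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef, since coef starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual after adding back j's own contribution
            rho = sum(X[i][j] * (residual[i] + X[i][j] * coef[j]) for i in range(n)) / n
            new_coef = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_coef - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                coef[j] = new_coef
    return coef


# Synthetic example (assumption): 20 "patients", 50 "variables",
# and a response that truly depends only on variables 3 and 17.
random.seed(0)
n_patients, n_variables = 20, 50
X = [[random.gauss(0, 1) for _ in range(n_variables)] for _ in range(n_patients)]
y = [3.0 * row[3] - 2.0 * row[17] + random.gauss(0, 0.1) for row in X]

coef = lasso_coordinate_descent(X, y, penalty=0.5)
selected = [j for j, b in enumerate(coef) if b != 0.0]
print("variables with nonzero coefficients:", selected)
```

The L1 penalty sets most coefficients exactly to zero, which is what lets the method pick out a short list of candidate variables even though ordinary least squares has no unique solution here. The penalty strength would normally be chosen by cross-validation rather than fixed by hand.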
Finding new immune checkpoints is a special case of the general problem of finding new drug “targets”. This usually means identifying a host molecule like a protein or gene product that is in some way important for the progression of a disease, and whose function can be altered or co-opted with a drug. Computational biology contributes in important ways to this as well. While a traditional approach to finding new targets might be to follow up on a research article that addresses some specific fundamental biology, modern data mining techniques can be applied to vast public data resources like The Cancer Genome Atlas (TCGA) [3] to scan the entire genome, across all cancers, for genes that are, say, preferentially expressed in cancer compared to normal tissue.
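As a rough illustration of what "preferentially expressed in cancer compared to normal tissue" can mean computationally, the sketch below ranks genes by the log2 fold change of mean tumor expression over mean normal expression. The gene names, expression values, pseudocount, and cutoff are all made up for the example and do not come from TCGA. A real scan would also involve normalization, a statistical test, and multiple-testing correction.

```python
import math
from statistics import mean


def log2_fold_change(tumor_values, normal_values, pseudocount=1.0):
    """log2 ratio of mean tumor to mean normal expression; the pseudocount avoids log(0)."""
    return math.log2((mean(tumor_values) + pseudocount) / (mean(normal_values) + pseudocount))


def rank_tumor_enriched(expression, min_log2_fc=1.0):
    """Return (gene, log2 fold change) pairs at or above the cutoff, highest first."""
    scored = [
        (gene, log2_fold_change(values["tumor"], values["normal"]))
        for gene, values in expression.items()
    ]
    hits = [(gene, fc) for gene, fc in scored if fc >= min_log2_fc]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Made-up expression values for hypothetical genes; not real TCGA data.
toy_expression = {
    "GENE_A": {"tumor": [120.0, 95.0, 140.0], "normal": [10.0, 12.0, 8.0]},
    "GENE_B": {"tumor": [30.0, 28.0, 35.0], "normal": [29.0, 33.0, 31.0]},
    "GENE_C": {"tumor": [7.0, 7.0, 7.0], "normal": [1.0, 1.0, 1.0]},
}

hits = rank_tumor_enriched(toy_expression)
for gene, fc in hits:
    print(f"{gene}: log2 fold change = {fc:.2f}")
```

Fold change alone ignores variability between samples, so it works better as a first filter for candidates than as evidence that a gene is a viable target.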
Such an exercise ties directly into one of the pleasures of the field: a lot of the data is public, and most of the tools are open source. So a new “experiment” for computational biology practitioners can be as easy as clicking a few links, downloading some data (making sure you have enough local storage space, since the datasets can be huge), and writing some code. Speaking of which, another recommendation to those interested in the field is this: learn R. This open-source statistical package is an industry standard and my daily workhorse. Through its vast contributor network, R seemingly has a package for everything, including an easy-to-use framework for making web apps for visualizing and sharing data.
So, what does mathematics (at least, the math I spent all that time studying for my PhD) have to do with my new career? While I’m not proving theorems anymore, I would argue that my PhD experience provided important training for my work in a number of ways. A critical, analytical perspective is obviously important for both endeavors. Also, having a PhD background means mathematics is not a barrier to understanding new statistical techniques, and I can focus instead on the ideas. A love of learning, and a humility and curiosity about what you don’t know, are also crossover values. In my job, as in my PhD study, each day means another opportunity to learn, keeping things fresh and interesting. Finally, this is not a job for those who prefer to work alone. Creating new therapies is a complex, collaborative, multi-disciplinary endeavor, requiring clear communication with all the stakeholders. One of the joys of the position is to work with scientists who are not computationally or mathematically oriented and help translate their questions into concrete analytical problems. Teaching experience in academia has really helped in that regard, since it strengthened my skills of listening and explaining.
For those who love math, love programming, and love learning new things, computational biology is a great career option, and provides an opportunity to make a concrete difference in people’s lives.
[1] American Cancer Society, https://www.cancer.org/cancer/non-small-cell-lung-cancer/detection-diagnosis-staging/survival-rates.html
[2] “Further Evidence that Immunotherapy Provides a Longterm Survival Benefit for Lung Cancer Patients,” R&D online, 12 Apr 2018, https://www.rdmag.com/news/2018/04/further-evidence-immunotherapy-provides-longterm-survival-benefit-lung-cancer-patients
[3] The Cancer Genome Atlas, https://cancergenome.nih.gov/