A Winding Path from Complex Analysis to Computational Biology

by Robert Thurman, Principal Computational Biologist, Seattle Genetics, Bothell, WA

Computational biologists come in two types: those who were originally trained mathematically or computationally and then gravitated towards biological problems, and those who were formally trained on the biological side but couldn’t stay away from computers. That generalization is slightly dated, because many colleges now offer interdisciplinary degree programs in computational biology and bioinformatics, but those programs tend to be small, and it is safe to say most practitioners currently in the field started out as one of the two types. Both perspectives are important.

I started with bachelor’s degrees in mathematics and computer science, and then took a break between my Master’s and PhD to work as a programmer for NASA’s Jet Propulsion Laboratory. I felt a call back to mathematics, and after following a traditional academic path from PhD to post-doc to tenure-track university teaching position, I switched gears again and took a programming position in the research division of a statistical software company. It was there I was exposed to machine learning techniques applied to biological problems, setting a course towards my current career in computational biology. There are many such winding paths into the field.

Recently I gave a presentation in Research Forum, which is a weekly opportunity at my company to share current results with the rest of the research community. We make targeted therapies for cancer, and I was hired to establish a dedicated computational biology function. The forum was the first opportunity to try to neatly summarize what we do as computational biologists. It was a challenge. On the one hand, most people in the audience had some exposure to our work, because we collaborate with every group in research. And some functions are well-established and well-known: we do a lot of genomics, for instance, trying to untangle which genes are regulated under treatment, or which ones might presage resistance to therapy. But the nature of our role is also highly varied. In some ways we are analytical “fixers,” and we are happy to take on any kind of problem related to data analysis. In trying to concisely categorize this type of work for my presentation, the best I could come up with was… “Math.” It’s maybe a bit far from my PhD in complex analysis, but a definite path can be traced back to those roots. And there is a lot of math in this work, albeit in service to a specific (and valuable) purpose.

It’s truly an exciting time to be working in the field of computational biology, especially as it is applied to finding treatments for devastating, tough-to-treat diseases like cancer. Advances in biological understanding and experimental capabilities on the one side, and computational capacity and algorithmic sophistication on the other, have opened the way to new treatments and new tests to get the best therapies to the right patients. Breakthrough advances like immunotherapy have dramatically changed the prognosis for some patients. Advanced non-small-cell lung cancer (NSCLC), for example, has a terrible prognosis, with a 5-year survival rate near 0% for more advanced cases [1]. But so-called checkpoint inhibitors like nivolumab and atezolizumab, which target the cell surface proteins PD-1 and PD-L1 and free up the body’s immune system to attack cancer, have in some cases doubled overall survival rates compared to previous standards of care [2]. This level of improvement is virtually unheard of for new cancer therapies, and it means that the field now cautiously uses the word “cures” in cases it never could before. However, only a subset of patients respond to this type of therapy. So the race is on to 1) find biomarkers, that is, some measurable patient characteristics that can predict who is most likely to respond or not respond; and 2) find other immune checkpoints that are successfully druggable.

Computational biology and bioinformatics have prominent roles to play in both of these endeavors. The search for biomarkers involves sifting through data in which dozens to thousands of variables are collected on patients: from height, weight, age and gender, to the number, length and types of previous treatments, to genomic features like gene expression and mutations measured across potentially hundreds or thousands of genes. All of these patient characteristics are then compared to clinical results to see if any variable, alone or in combination with others, could be related to response. Because it is often the case in these types of problems that there are more variables than patients, modern machine learning techniques, such as regularization and random forests, can be used to overcome the limitations of under-determined systems and identify which variables are most important in predicting response. In my own work I use these techniques as well, to try to understand, for instance, what measurable characteristics of our drugs (which are fairly complicated in their mechanisms of action) contribute most to their potency in an in vitro setting. (This would be a good place to add, as a general recommendation to others as well, that I wish I had taken more statistics!)
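To make the regularization idea concrete, here is a minimal sketch in Python of one such technique, the lasso (L1-penalized least squares), fitted by cyclic coordinate descent. It is an illustration only, not the workflow described above: the function names, the penalty value, and the synthetic data (20 "patients", 50 variables, a response that depends on just two of them) are all assumptions made for the example. The point is that the penalty drives most coefficients to exactly zero, so a model can still be fitted, and variables ranked, when there are more variables than patients.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=200):
    """Approximately minimize (1/(2n)) * ||y - Xw||^2 + penalty * ||w||_1.

    X is a list of n rows (patients), each with p values (variables).
    Uses cyclic coordinate descent with a fixed number of sweeps.
    """
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    residual = list(y)  # equals y - Xw while all weights are zero
    column_scale = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if column_scale[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that ignores column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * weights[j])
                      for i in range(n)) / n
            updated = soft_threshold(rho, penalty) / column_scale[j]
            change = updated - weights[j]
            if change != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * change
                weights[j] = updated
    return weights


def demo(seed=0):
    """Synthetic data: 20 'patients', 50 variables, only two of which matter."""
    rng = random.Random(seed)
    n_patients, n_variables = 20, 50
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_variables)]
         for _ in range(n_patients)]
    # Hypothetical ground truth: the response depends on variables 3 and 7 only.
    y = [3.0 * row[3] - 2.0 * row[7] + rng.gauss(0.0, 0.1) for row in X]
    weights = lasso_coordinate_descent(X, y, penalty=0.5)
    # Report only the variables whose coefficients were not shrunk to zero.
    return [(j, round(w, 2)) for j, w in enumerate(weights) if w != 0.0]


if __name__ == "__main__":
    print(demo())
```

In practice one would use an established implementation (for example the glmnet package in R) and choose the penalty by cross-validation rather than fixing it by hand as this sketch does.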

Finding new immune checkpoints is a special case of the general problem of finding new drug “targets”. This usually means identifying a host molecule like a protein or gene product that is in some way important for the progression of a disease, and whose function can be altered or co-opted with a drug. Computational biology contributes in important ways to this as well. While a traditional approach to finding new targets might be to follow up on a research article that addresses some specific fundamental biology, modern data mining techniques can be applied to vast public data resources like The Cancer Genome Atlas (TCGA) [3] to scan the entire genome, across all cancers, for genes that are, say, preferentially expressed in cancer compared to normal tissue.
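As a rough illustration of what such a scan computes, the Python sketch below ranks genes by the log2 fold change of mean expression in tumor versus normal samples. Everything in it is an assumption made for the example: the gene names are placeholders, the expression values are invented toy numbers (not TCGA data), and the pseudocount and threshold are arbitrary. A real analysis would also apply a proper statistical test and correct for the many genes tested.

```python
import math
from statistics import mean

# Invented toy expression values, for illustration only (not real data).
TOY_EXPRESSION = {
    "GENE_A": {"tumor": [31.0, 31.0, 31.0], "normal": [3.0, 3.0, 3.0]},
    "GENE_B": {"tumor": [7.0, 7.0], "normal": [7.0, 7.0]},
    "GENE_C": {"tumor": [15.0, 15.0], "normal": [3.0, 3.0]},
}


def log2_fold_change(tumor_values, normal_values, pseudocount=1.0):
    """log2 of (mean tumor + pseudocount) / (mean normal + pseudocount).

    The pseudocount avoids division by zero for genes with no normal expression.
    """
    return math.log2((mean(tumor_values) + pseudocount)
                     / (mean(normal_values) + pseudocount))


def rank_tumor_enriched(expression, min_log2_fc=1.0):
    """Return (gene, log2 fold change) pairs at or above the threshold,
    sorted from most to least tumor-enriched."""
    hits = []
    for gene, groups in expression.items():
        lfc = log2_fold_change(groups["tumor"], groups["normal"])
        if lfc >= min_log2_fc:
            hits.append((gene, lfc))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for gene, lfc in rank_tumor_enriched(TOY_EXPRESSION):
        print(f"{gene}\t{lfc:.2f}")
```

With the toy values above, GENE_A and GENE_C pass the threshold and GENE_B, which is expressed equally in both groups, is dropped.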

Such an exercise ties directly into one of the pleasures of the field: a lot of the data is public, and most of the tools are open source. So a new “experiment” for computational biology practitioners can be as easy as clicking a few links, downloading some data (making sure you have enough local storage space, since the datasets can be huge), and writing some code. Speaking of which, another recommendation to those interested in the field is this: learn R. This open-source statistical package is an industry standard and my daily workhorse. Through its vast contributor network, R seemingly has a package for everything, including an easy-to-use framework for making web apps for visualizing and sharing data.

So, what does mathematics (at least, the math I spent all that time studying for my PhD) have to do with my new career?  While I’m not proving theorems anymore, I would argue that my PhD experience provided important training for my work in a number of ways. A critical, analytical perspective is obviously important for both endeavors. Also, having a PhD background means mathematics is not a barrier to understanding new statistical techniques, and I can focus instead on the ideas. A love of learning, and a humility and curiosity about what you don’t know, are also crossover values. In my job, as in my PhD study, each day means another opportunity to learn, keeping things fresh and interesting. Finally, this is not a job for those who prefer to work alone. Creating new therapies is a complex, collaborative, multi-disciplinary endeavor, requiring clear communication with all the stakeholders. One of the joys of the position is to work with scientists who are not computationally or mathematically oriented and help translate their questions into concrete analytical problems. Teaching experience in academia has really helped in that regard, since it strengthened my skills of listening and explaining.

For those who love math, love programming, and love learning new things, computational biology is a great career option, and provides an opportunity to make a concrete difference in people’s lives.


1 American Cancer Society, https://www.cancer.org/cancer/non-small-cell-lung-cancer/detection-diagnosis-staging/survival-rates.html

2 “Further Evidence that Immunotherapy Provides a Longterm Survival Benefit for Lung Cancer Patients,” R&D online, 12 Apr 2018, https://www.rdmag.com/news/2018/04/further-evidence-immunotherapy-provides-longterm-survival-benefit-lung-cancer-patients

3 The Cancer Genome Atlas, https://cancergenome.nih.gov/

My BIG Math Experience: A Java Web Development Boot Camp

by Joyce C. Yang

This summer, Codework Academy at Montgomery College had its first Java Web Development boot camp.  The program was in Gaithersburg, MD and taught students how to write and deploy web applications in eight weeks.  I participated in the boot camp, which was full-time, 9 am to 5 pm every day.  Starting from object-oriented programming fundamentals, I learned how to think like a programmer. The main things I gained from the camp were the programming skills and the professional network.

One of the model-view-controller (MVC) projects that our team worked on: a boot camp finder that enabled users to search for boot camps from a database, apply to a camp as a student, and accept applicants as an administrator. Top: a preliminary version of the code for the boot camp model. Bottom: the final version on the live site.

Programming Skills

Until the boot camp, I did not have experience in Java or C. While looking for employment opportunities, I examined software engineering job listings, and they generally required those languages. Because I already had experience with Python, R, MATLAB, and Visual Basic, I was familiar with programming fundamentals. The Java boot camp was a good way to learn relevant new programming concepts and apply them immediately.

Some of the skills I learned

  • Using relational database management systems—we used MySQL and PostgreSQL
  • Using the concept of encapsulation for data hiding
  • Making “Input-Processing-Output” (IPO) diagrams
  • Developing the model, view, and controller of an application
  • Using an application framework (Spring) to streamline the development process
  • Deploying applications to a cloud service (Heroku)

 

Building networks

I learned a lot outside the classroom by talking to others, and I expanded my professional network.  One graduate student had switched majors from chemical engineering to computer science, and they helped me decide to learn more about careers in web development.  Another student was considering applying to a four-year college, and in my capacity as a college graduate, I offered some advice.  The Montgomery College web development boot camp was supported by a grant.  As a result, it was completely free, and people who were underemployed and unemployed could attend! Students were constantly talking about new ways of solving problems, and the environment was collaborative.

The boot camp was quite challenging, and students needed to meet strict requirements.  The program’s aim was to make assignments as close to “real life” as possible.  Each day at camp consisted of testing code, determining new issues to fix, and fixing them.  One of the main differences between web development and math is that web development does not usually have well-posed problems.  There can be times when the problem is not clear.  I was prepared for the boot camp, but I wish that, before I started, I had learned a bit more object-oriented programming.   Overall, I gained software skills and a great professional network from this Java web development boot camp.

 

About the author: Joyce C. Yang graduated from Harvey Mudd College in December 2016 with a Bachelor’s degree in Mathematics. An experienced K-12 teacher, she has also worked on research problems in graph theory, statistics, and abstract algebra.  Currently living in the DC area, she is looking for employment opportunities. Joyce can be reached at jcyang@hmc.edu

ICME Data Science Workshop

ICME will hold one-day summer workshops on Fundamentals of Data Science from August 14 to 18 at Stanford. You can sign up for one workshop, or several, with topics ranging from Machine Learning to Natural Language Processing to Programming in R. Visit our Summer Workshops website for more information and to register.

Feel free to spread the word! We hope to see you there.

Judy and the ICME team

Judy Logan

Institute for Computational and Mathematical Engineering (ICME)

Stanford University

Women in Data Science (WiDS) Conference

Free AAAS Career Webinar

Webcast: Transitioning into a Non-academic Career
Tuesday, June 20, 12:00-1:00 p.m. EDT

This workshop explores the skills and best practices for transitioning from an academic environment to one of many non-academic career paths. It introduces strategies for career planning, emphasizing an ongoing process for professional development throughout your career.

Join us for this FREE webcast!


Presenter: Josh Henkin, PhD – Founder, STEM Career Services, LLC

Josh is the founder of STEM Career Services, a career coaching company aimed at helping STEM graduates launch and sustain careers outside of academia. He conducts workshops at conferences, universities and institutes across the country and provides career coaching to STEM graduates at all levels of their careers. Josh sits on the National Postdoctoral Association Board of Directors. He is also an AAAS Science and Technology Policy Fellow alum, an AAAS member, and an AAAS Career Development Center subject matter expert.

Agenda:

  • Being strategic in your career planning
  • What skills you need for non-academic jobs and how to acquire these skills while still in the lab
  • Networking as a part of life
  • Crafting your “elevator pitch”
  • How to create a master resume (inclusive of all your skills)
  • Creating position-specific resumes
  • Ample time will be provided for Q&A with Josh

Opportunity: NASA High Performance Fast Computing Challenge

FULL NOTICE HERE

Overview (from the site above)

Do you want to help aerospace engineers solve problems faster? Does the phrase “nonlinear partial differential equations used for unsteady computations” excite you? Do you want to try yourself with the complex computational software that NASA scientists use? This might be the challenge for you.

NASA’s Aeronautics Research Mission Directorate (ARMD) is responsible for developing technologies that will enable future aircraft to burn less fuel, generate fewer emissions and make less noise.  Every U.S. aircraft and U.S. air traffic control tower has NASA-developed technology on board. It’s why we like to say, NASA is with you when you fly!

We need to increase the speed of computations on the Pleiades supercomputer, specifically for computational fluid dynamics, by orders of magnitude, and could use your help!

This isn’t a quest for the faint of heart. As a participant, you’ll need to gain access to FUN3D software through an application process with the US Government.  Although this software usually runs on the Pleiades supercomputer, you can download and run it locally after applying HERE.

 

Background

NASA’s Aeronautics Research Mission Directorate (ARMD) is tasked with innovating at the cutting edge of aerospace.  Their work includes Innovation in Commercial Supersonic Aircraft, Ultra-efficient Commercial Vehicles and Transitioning to Low-Carbon Propulsion while also supporting the development of launch vehicles and planetary entry systems.  These strategic thrusts are supported by advanced computational tools, which enable reductions in ground-based and in-flight testing, provide added physical insight, enable superior designs at reduced cost and risk, and open new frontiers in aerospace vehicle design and performance.

The advanced computational tools include the NASA FUN3D software which is used for solving nonlinear partial differential equations, known as Navier-Stokes equations, used for steady and unsteady flow computations including large eddy simulations in computational fluid dynamics (CFD). Despite tremendous progress made in the past few decades, CFD tools are too slow for simulation of complex geometry flows, particularly those involving flow separation and multi-physics (e.g. combustion) applications. To enable high-fidelity CFD for multi-disciplinary analysis and design, the speed of computation must be increased by orders of magnitude.

NASA is seeking proposals for improving the performance of the NASA FUN3D software running on the NASA Pleiades supercomputer.  The desired outcome is any approach that can accelerate calculations by a factor of 10-1000x without any decrease in accuracy and while utilizing the existing hardware platform.

More info HERE.

 

 

Study Groups with Industry: Mathematics meets the real world

A study group is a type of workshop which brings together mathematicians and people from industry. The meetings typically last for 5 days, Monday to Friday. On the Monday morning, the industry representatives present problems of current interest to an audience of applied mathematicians. Subsequently, the mathematicians split into working groups to investigate the suggested topics. On the Friday, solutions and results are presented to the industry representatives. After the meeting, a report is prepared for the company, detailing the progress made and usually including suggestions for further work or experiments. Over the years, study groups have proved to be an excellent way of building bridges between universities and companies as well as providing exciting new topics for mathematicians. Of course, there is pressure involved in attempting to understand and solve a problem over a short time frame. This can often produce an exciting and intense atmosphere but, in general, a good time is had by all.

 

Experiments can often help guide a mathematical investigation (or cause even more confusion)

The original Study Groups with Industry started in Oxford in 1968. The format proved a popular way of initiating interaction between universities and private industry. The interaction often led to further collaboration, student projects and new fields of research. Consequently, study groups were adopted in other countries, starting in Europe to form the European Study Groups with Industry (ESGI) and then spreading throughout the world. Regular meetings are currently held in Australia, Canada, India, New Zealand, the US, Russia and South Africa. A vast range of topics have been covered in the meetings, including beer and wine bottle labelling, legal sale of rhino horn, spontaneous combustion, mortgaging of cows, building toys, city bike sharing strategies, determining fish freshness, etc. New forms of meeting have also evolved, such as the Mathematics in Medicine or Agri-Food Study Groups.

The popularity of study groups can be attributed to their mutually beneficial effects. Companies benefit from:

  1. The possibility of a quick solution to their problem, or at least guidance on a way forward.
  2. Help from mathematicians in identifying and correctly formulating a problem for further study.
  3. Access to state-of-the-art techniques.
  4. Contacts with top researchers in a given field.

The academics benefit from:

  1. Discovering new problems and research areas with practical applications.
  2. The possibility of further projects and collaboration with industry.
  3. The opportunity for future funding.

An important feature of these meetings is that they can also highlight the talents of students, leading to employment opportunities with the companies. In South Africa, after attending a number of study groups, a group of students took a new direction. Noting the gap in the market for applying mathematics to real-world problems, they started their own company, Isazi Consulting. Now they return to the meetings, this time posing their own problems and looking for new recruits.

Information on the European Study Groups can be found on the website of the European Consortium for Mathematics in Industry. A good source of information for meetings in Europe and the rest of the world is the Mathematics in Industry Information Service, see

ECMI Study Groups https://ecmiindmath.org/study-groups/

MIIS Website http://www.maths-in-industry.org/

 

Tim Myers

Centre de Recerca Matematica

Barcelona, Spain

Opportunity: 33rd Annual Mathematical Problems in Industry (MPI) Workshop

Registration is now open for the 33rd Annual Mathematical Problems in Industry (MPI) Workshop, to be held June 19-23, 2017 at New Jersey Institute of Technology in Newark, NJ. The Department of Mathematical Sciences at NJIT is hosting the meeting, with Linda Cummings and Richard Moore acting as local organizers. Funding is provided by our industrial participants and the National Science Foundation.

The format of MPI 2017 will be familiar to those of you who have attended MPI or a similar week-long study group in the past. On Monday, several industrial participants present their research problems to an assembled group of professors, postdocs and graduate students working in the field of applied mathematics. These presentations are followed by break-out sessions, where teams form to work on the problems throughout the week. The week culminates in presentations delivered Friday to the assembled group of industrial participants and applied mathematicians. A follow-up report is delivered to each industrial participant in the weeks following MPI. These reports are often modified and submitted for publication in peer-reviewed journals, and many past MPI workshops have produced fruitful long-term collaborations.

To learn more about MPI 2017 and prior workshops, please visit the workshop website:

http://web.njit.edu/~rmoore/MPI2017/

A link on the left menubar will direct you to the online registration form. Spaces and funding are limited, so please register as early as possible. Young researchers and those with prior experience at MPI or the GSMMC (see below) are especially encouraged to apply, as are members of groups traditionally underrepresented in applied mathematics.

Graduate students who have not already done so in a previous year are strongly encouraged to participate in the Graduate Student Mathematical Modeling Camp (GSMMC), held at Rensselaer Polytechnic Institute the week immediately preceding MPI. As a Camp attendee, you will automatically be registered for MPI. Please use the following link to register for the GSMMC:

http://homepages.rpi.edu/~schwed/Workshop/GSMMCamp2017/home.html

Although some of the industrial problems have already been selected, we are still accepting applications to participate as problem-presenters. Please forward this email to industrial contacts who might be interested in exposing their research problems to a large body of creative problem-solvers with broad expertise in industrial applied math.

Looking forward to seeing you at MPI 2017!

CC BY-NC-ND