How to Become an Industry Computational Biologist in a Year

Recently, MCS Advisors attended the MassBioEd Life Sciences Workforce Conference, where the discussion covered how demand for the life sciences workforce in Massachusetts continues to grow, especially for those with interdisciplinary skill sets such as computational biologists. Read more in the 2023 Massachusetts Life Sciences Employment Outlook Report. Leveraging our Harvard network, we asked alumnus and compbio guru Dean Lee to share his story and insights into this exciting field.

In February of 2017, I was officially stuck. I was in my late 20s, I had been working towards academic science for eight years, but I had also just decided to take a gap year from my graduate neuroscience program at Harvard. Grad school made it clear to me that I would not be happy if I continued with academic science, so I was back at the drawing board. As lost as I felt, I knew that I wasn’t navigating completely blindly; I still enjoyed the life sciences, and I just needed to find a way to work on interesting biological questions beyond the confines of academia.

I knew only vaguely that I wanted to transition into computational biology (compbio). I noticed that data generation from biological experiments was becoming cheaper each year, but most biologists were not equipped to analyze this dramatically increasing amount of data. I guessed that if I could figure out how to analyze that data for them, then I could still work on exciting biological questions and maybe even get paid for it!

For the next 2.5 years, I struggled to acquire the necessary skills for becoming a computational biologist in the biotech/pharma industry. The bar seemed high; I wasn’t sure if or how I would learn enough programming, math, and statistics to be qualified. No one I spoke with could give me clear guidance on how to make this transition. I found many master’s programs that claimed to be a direct path to industry roles, but upon closer inspection most of them seemed to be money grabs that offered content too generic to be useful and training that was outdated (e.g., programming in Perl, analysis with Galaxy). Even if some of those master’s programs were useful, I couldn’t afford them anyway. Eventually, I was able to navigate to my first industry compbio role, but only by trial and error. In retrospect, I probably could have made this transition in less than a year if I had had proper guidance.

I hope to provide a bit more clarity to this process so you don’t have to spend as much time as I did groping in the dark. I will highlight several practical skills/experiences that will help you prepare for a compbio job (my examples will be a bit biased towards omics-related compbio). These components can be acquired simultaneously, and at any stage in your postsecondary education: bachelor’s, master’s, PhD, or postdoc. Those with more years of education might reasonably require less time to acquire these components, while those with fewer years of education might require more time.

Want to know our little secret? Most industry computational biologists are not expert coders. I would be ashamed to admit how many for loops I write. Our product is not code; our product is biological insights we extract from data. We tend to perform ad hoc, highly customized analyses to answer niche questions. We are often superusers of a finite set of powerful Python/R packages that do all the heavy lifting for us in a particular domain of biology, rather than general programming maestros. We are very good at debugging by googling. We usually don’t need to code at the level of Google programmers.

With that in mind, your goal then is to become comfortable enough with Python and R such that you can quickly adopt any set of packages designed for biological data analysis. This familiarity should not require years and years of time. There are countless free online resources from which you can learn standard Python and R syntax. Start with one language, then eventually you can pick up the other. I personally think Python is the more efficient language and that compbio is slowly shifting towards Python. But for now many of the most popular packages for analyzing biological data are still in R, so it’s good to just learn both.

Make sure you learn how to make informative plots. Keep it simple. Boxplots, scatterplots, and heatmaps made with seaborn, matplotlib, or ggplot2 can go a long way.

I know machine learning is all the rage, but before you sink your teeth into the fancier techniques of machine learning, you should master the more traditional but still powerful approaches from statistics. Be very comfortable with foundational statistical topics/techniques such as probability theory, basic discrete and continuous distributions, hypothesis testing, p-values, multiple testing correction, various ways of normalizing data, measures of correlation, linear regression, logistic regression, principal component analysis, and cluster analysis.
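To make a couple of these topics concrete, here is a minimal sketch in plain Python of one common way of normalizing data (log2 counts per million) and one measure of correlation (Pearson). The function names, the pseudocount of 1, and the toy read counts are illustrative assumptions and do not come from any particular package; for real analyses you would likely rely on established Python/R packages instead.

```python
import math


def log2_cpm(counts, pseudocount=1.0):
    """Normalize raw counts to log2(counts per million + pseudocount)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize a sample with zero total counts")
    return [math.log2(c * 1_000_000 / total + pseudocount) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products and squares of deviations from the mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy example: made-up read counts for the same five genes in two samples
sample_a = [120, 30, 0, 560, 90]
sample_b = [200, 45, 5, 1100, 150]

r = pearson(log2_cpm(sample_a), log2_cpm(sample_b))
print(f"Pearson r between normalized samples: {r:.3f}")
```

Normalizing before correlating matters here because the two samples have different total counts; comparing raw counts would mix biological signal with sequencing depth.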

Your standard year-long college-level statistics course series should do the trick. Many free online courses also will teach you well. Don’t just watch videos, however. Work out problems by hand so you learn these concepts deeply. Your future self will thank you.
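As one example of working a concept through by hand, multiple testing correction can be written out in a few lines. The sketch below implements the Benjamini-Hochberg procedure, one common correction method, in plain Python. The function name and the toy p-values are illustrative assumptions; in practice you would probably use a statistics package, but writing it once yourself makes the logic stick.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1),
    # scaling each by m / rank and enforcing monotonicity with a running minimum
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: made-up p-values from four hypothetical tests
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p = {p:.3f} -> adjusted p = {q:.3f}")
```

Each adjusted value is the smallest false discovery rate threshold at which that test would still be called significant, which is why the adjusted values never decrease as the raw p-values increase.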

Industry computational biologists never work alone. We always work with bench scientists who generate the data we analyze. So we must speak their language. We need to understand the field in biology they are speaking from. We must understand why they designed their experiments a certain way, because it informs how we analyze their data. Being able to sympathize with the challenges faced by bench scientists also helps us to build positive working relationships with them. For this reason, experience as a bench scientist is highly relevant preparation for compbio roles. The better we can bridge the data-to-analysis-to-insight gap, the more valuable we are as computational biologists.

To gain deep understanding of a field in biology, read lots of primary literature in that domain. This is the most time-consuming piece of your preparation for a compbio role, but also the most fun! If you are a PhD student or postdoc in the life sciences, you should already have this skill; little to no further preparation is needed here. For undergrads and master’s students, please make sure that you learn how to dissect primary literature. It doesn’t matter how many or few biology classes you take; at the end of the day, you should be able to judge a Nature/Cell/Science article on its merits. There is no shortcut to learning this skill. You just have to sit down and read. Google is your friend. Joining a journal club can help. Your first scientific papers may take 10-20 hours each to digest.

One way to measure your ability to digest biology papers is to see whether you can pick up any Nature/Cell/Science paper in your chosen biological field and glean the gist of it in 15 minutes. You should be able to give an overview of the paper to a scientifically literate friend by drawing/writing on a single sheet of paper. The ability to do this implies you are familiar with the fundamental biology being addressed, the most popular/powerful experimental methods in that field, and the plots typically used to visualize results.

In addition to learning a field of biology, such as immunology or microbiology, we also have to follow the most recent technical advances in our own field of computational biology. New methods are published pretty much daily, and part of our jobs is to quickly decide which methods make sense and which do not. Having the ability to parse compbio primary literature will give you an additional edge in your preparation for an industry compbio role.

You need to complete a meaningful analysis of biological data as the final part of your preparation to become a computational biologist.

The most direct way to do this is to join a research lab that already has datasets you can play with. This might be imaging data, any kind of omics data (genomics, epigenomics, transcriptomics, proteomics) usually obtained by some sequencing approach (DNA-seq, RNA-seq, ATAC-seq), or data about DNA/RNA/protein structure. The variety of data types you might work with is too broad to list completely here.

Mentoring matters a lot. Join a lab with a supportive graduate student or postdoc skilled in computational methods who can guide your data analysis. This person will save you countless hours banging your head against your MacBook when you are stuck. This person will also be your reference when you apply for a job.

Working on this project is where you specialize in certain compbio analyses. This often looks like becoming an expert user of certain Python or R packages designed to parse a specific type of data. You might find this blog by Tommy Tang, a personal hero of mine, helpful for some of your omics data analysis.

When you have completed your analysis, put it together into a PowerPoint presentation that tells a story in 30 minutes. You will need to convey the background on your chosen topic of study (e.g., mechanical sensation in developing fruit flies, mechanisms of resistance in gastric cancer), the exact questions/hypotheses you address, the data generated to test your hypotheses, the computational method used and why, any positive or negative findings, the implications of your findings for your field, any caveats in your data or analysis the audience should be aware of, and which experiments or additional analyses you propose to do next. Practice really does make perfect. Get lots of feedback from your research mentor.

If joining a research lab is not accessible to you, you might also complete your compbio project as part of an industry internship. For those who are extremely motivated, you could also complete this compbio project on your own free time by analyzing published data. For example, you might find this paper on synovial sarcoma interesting and decide to download the associated data here for your own analysis.

Once you have your story, you are ready to start applying to compbio roles. This blog post by bitsinbio does a good job of broaching the variety of compbio roles; it is written for PhD-holders, but its content is helpful for job seekers at any stage in their education.

In your job search, you should be aware that there is a type of computational biologist for every flavor of biology. For example, a compbio role for evolutionary biology will share very few technical requirements with a compbio role for protein structure modeling, even though they may be advertised under the same job title. So read the job description closely to find out the skills required. Lots of nuances between compbio roles make it difficult for the hiring manager to identify the right candidate, so the more intentional candidate will be more successful in landing interviews.

If the data types and compbio analyses you specialized in for your compbio project are a match for the job description, you may get invited for interviews, which will typically involve 1) giving a short presentation on your project to showcase your scientific critical thinking abilities and technical skills and 2) one-on-one interviews with the hiring manager and your potential teammates to assess fit.

Most compbio jobs in the Boston area are hybrid; some WFH flexibility is the norm. Currently, base salaries for these industry compbio jobs are roughly $75-90K out of college, $80-110K out of a master’s, and $110-150K out of a PhD/postdoc. The Broad Institute also hires many computational biologists, but its salaries are lower than those of its industry counterparts.

And there you have it! I hope that this general guide provides a bit more clarity on what it takes to work in computational biology and dispels some myths about entering this field. You don’t need to have years and years of advanced biology, statistics, computer science, and math training to begin making meaningful contributions as a computational biologist.

I currently work as a computational biologist in Cambridge, MA. I am always open to connecting with aspiring computational biologists at any stage in their education, so don’t hesitate to message me on LinkedIn.

About the Author: Dean started his graduate training in neuroscience (GSAS ’18) studying the molecular rules directing the developing mammalian cortex. But he decided to change course to computational biology as he witnessed the data revolution in the life sciences being accelerated by next-generation sequencing technologies. He now queries this data to guide immuno-oncology drug development in biotech/pharma. He thinks a lot about how scientists grow professionally and the organizational ingredients that enable scientists to realize their full positive impact on human health.

How to Become an Industry Computational Biologist in a Year

1. Python and R (6 months-1 year)

2. Statistics (1 year)

3. Deep Understanding of a Field of Biology (1-2 years)

4. Compbio Project (3-6 months)

5. Apply and Interview!