From wet lab to data science

Data science

How I think experimental researchers are data scientist

Abel Torres Espín
2022-01-14

Not while ago, I introduced myself as a data scientist, but it has not always been like that. Let me tell you how a biologist like me gets to that and how I believe any researcher can. I am a biomedical researcher and I spent the first 10 years of my career performing lab experiments, what is known as “wet lab”. At some point I realized that doing experiments was not that exciting anymore (a story for another day), but I was not ready to stop doing research, so I decided to shift focus to a more data-intensive work. For the last 3 years I have moved from a pure wet lab work to a pure dry or computational professional activity. During this time, I have learned a lot of new stuff, but I have also got to the conclusion that researchers have a good head start to transition to data science and related work. After all, data is the prima matter for contemporary research. Don’t make me wrong, is not an easy path, but if you are reading this because you are thinking about it, let me convince you that your research knowledge and skills are not that far from those needed to start a career in data science.

NOTE: If you are a computational researcher or your work is heavy on the quantitative site, you probably already made the connections I am making here and you probably already call yourself a data scientist. I have the perception that there is an increase in the number of researchers that has some working experience on computational methods, but yet are not fully aware of it.

We scientist have a lot of work qualities, but acknowledging our transferable skills is not one of them. So let me illustrate my point. First thing first, researchers like to ask questions about the unknown and generate hypothesis about how they think things work. And how do we answer those questions? We turn into data. No matter your field, if you are doing research today, you probably have had to deal with data. What data scientist do? They ask questions, using data to answer them. Not that different so far. Let’s dissect this process and see how researcher’s skills can translate to data science.

Ask your question: We went over this already, all starts with some questions, whether you do research or business analytics, you always want to know something new and researchers sick the limits of knowledge by definition.

Gather your data: Here researchers excel! After years of research work, I am pretty sure you know your ups and downs about doing literature review, designing an experiment tailored to answer your question, and painstikly performed experiment/s and data collection. Data scientist do the same but using other tools. Actually, depending on their work, data scientist might not do any data gathering themselves. Researchers might also get the data already collected, but if you have done research for a while, you would probably be asking a lot of questions on how the data came to be.

Build your dataset: Do you remember all those hours entering numbers in a spreadsheet? You are halfway there then. In my experience, here is where researchers might lack basic concepts such as data structures, relational databases, metadata or data dictionary. So next time you enter data in any sort of digital data keeper, you could start by tidying your dataset (Data organization in spreadsheets) and build a data dictionary for it.

Data wrangling: This is a jargon to evoke any process related to joint, transform, re-organize, summarize and re-format datasets. While well formatted data for storage is important, almost every time you want to perform a specific analysis, you will have to make changes in the data structure. Most researchers probably know their way through spreadsheets or some stats software to bring the data to the required format for analysis.

Explore data: Meeting after meeting you show summary statistics and graphs to lab mattes, supervisors and collaborators. That is really it, you already know how to do this. Produce some summaries and visuals to start getting a sense of the data, check for potential errors and trends, etc.

Data analysis: If you have performed and analyzed research experiments, you probably navigated things like descriptive statistics, hypothesis testing, t-test, analysis of variance (ANOVA), p-values, parametric and non-parametric test, linear regression, correlation, normal distribution, to say a few. So you know more analysis and statistics that you might think. Although data analysis can be very complex and sophisticated in data science, researchers are very well equipped to do a lot of analytical task.

Get insights: Interpret your results. As a researcher, I believe this is one of your powers. Communication, data visualization and storytelling: If anyone things that researchers are these robots that communicate between them using dull technical reports full of numbers and lacking literary elements, will be very surprised at reading a scientific manuscript. Yes, these documents have more jargon and terminology than the average read, but nowadays manuscript and grant writing is an art of effective communication. How to say as much as possible, in the minimum space, for a broad audience (editors, reviewers and a broad reach of researchers), and at the same time captivate and avoid boresome. You know how to sell your insights. An although I think there is still a lot of space for improvement in scientific communication using viz, you know how to make pretty tables, plots, figures, posters and slide presentations.

Learn new tools: there are a lot of tools in data science that researchers don’t know about. And the number keeps increasing at a scary paste. But so it is in research. Both researchers and data scientist are used to the need of keeping up with new methods and learning some of them in detail. Don’t let this scare you.

As you can see, researchers and data scientist share more than what you might think at first glance. Sure you will have to learn a lot, and sure will get frustrating some times, but if research has taught me something is how to constantly learn and how to endure these frustrations and transform them into growth. If you are a researcher that have considered the idea of changing to a more data-centric role, I hope I have convinced you that you are not starting from nothing and I hope I can ease a bit the anxiety of confronting a change like this.

RECOMMENDATION: I truly believe researchers have a lot of transferable skills to data science. If I might give a recommendation for those wanting to do the transition, I would say “learn the terminology and start learning some coding”. Once you know how things are called in data science, you will realize you know more than what you think you know. Code will make your life easier even if you decide to stay in research after all.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/ATEspin/distill_ATE_blog, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Espín (2022, Jan. 14). health.data DRIVEN Lab: From wet lab to data science. Retrieved from https://atespin.github.io/posts/2022-01-14-from-wet-lab-to-data-science/

BibTeX citation

@misc{espín2022from,
  author = {Espín, Abel Torres},
  title = {health.data DRIVEN Lab: From wet lab to data science},
  url = {https://atespin.github.io/posts/2022-01-14-from-wet-lab-to-data-science/},
  year = {2022}
}