Creating a Data Science Learning Path

Posted on Thursday, June 25, 2020 in Newsletter.

On your journey as a data scientist, an effective learning strategy can be vital in determining your trajectory and career outcomes. Even if you are embarking on a structured graduate program where you are bound to learn the skills you need, it is a good idea to treat the coursework as an additional resource for achieving your own goals.

When structuring your learning within data science, a simple approach would be:

  • Identifying what core skill sets comprise a data scientist.
  • Identifying where you are, proficiency-wise, in those core skill sets.
  • Building and refining weak skill sets.
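The three steps above can be sketched as a small self-assessment. This is only an illustration: the skill names, the 1 to 5 rating scale, the target level, and the `weakest_skills` helper are all assumptions made for the example, not part of any official assessment tool.

```python
# Hypothetical self-ratings on a 1-5 scale; the skill names are placeholders.
ratings = {
    "programming": 4,
    "data wrangling": 3,
    "modelling": 2,
    "visualization": 3,
    "communication": 1,
}


def weakest_skills(ratings, target=4):
    """Return (skill, gap) pairs for skills below the target, largest gap first."""
    gaps = {
        skill: target - level
        for skill, level in ratings.items()
        if level < target
    }
    # sorted() is stable, so skills with equal gaps keep their original order.
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


plan = weakest_skills(ratings)
for skill, gap in plan:
    print(f"{skill}: {gap} level(s) below target")
```

Re-running the assessment after each project, with updated ratings, gives a rough view of which skill set to work on next.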

Identifying what core skill sets comprise a data scientist

The type of role you will play in a team as a data scientist determines the skill sets you need. Mango Solutions, a data science and analytics company, uses the following six competencies to assess a data science team with its “Data Science Radar” tool. While knowing these competencies can help a team identify its needs and where it stands as a whole, they can also be useful for assessing your own skills.

Identifying where you are, proficiency-wise, in those core skill sets

Image retrieved from https://www.mango-solutions.com/products/data-science-radar/

Building and refining weak skill sets

Based on the above approach, you could keep the competencies in mind and see where you stand in terms of proficiency in each of those areas. Then you could systematically, one by one, aim to improve your skills in each area until you master them. But one of the biggest problems with this approach is knowing when to stop, or knowing whether you’ve achieved mastery. While it might be nice to know a specific topic from start to finish, you likely only need to know 20% of a topic to accomplish 80% of what you need to do (see the Pareto Principle).

Another problem with this approach is that when you develop your skills in one area at a time, you are not connecting the dots between it and the other areas. A good example is using visualizations to guide your choice of machine learning algorithms. Envisioning how all the skill sets connect and complement each other is important for understanding the data science workflow. This type of intuition can only be acquired by working on projects and taking a very directed approach to learning.

In your career as a data scientist, you will see that working on any data science project is iterative: you must constantly go back and reevaluate or improve some part of the process. The same iterative mentality can be applied to developing an effective learning strategy. However, to apply this mentality to your learning, you will need to be comfortable with the notion that you will not know everything about the tools you are using to solve a specific problem. You should be comfortable with learning just enough to solve a problem without making serious mistakes or unexamined assumptions (as before, remember the Pareto Principle!). This approach is more effective because it teaches you how to identify what you need to learn to carry out a task and how to connect the dots between different parts of the project.

Imagine you are blindfolded and dropped off in an unknown forest. What is the first thing you do? You look around and try to identify where you are. You explore. You try to get a lay of the land. You identify what you need to do in order to survive; your goal of survival depends on getting food, water, and shelter. Getting food, water, and shelter depends on acquiring the skills you need to obtain them. Acquiring those skills comes from actually working on getting those three things, starting with the knowledge you already have. Everything circles back to having a clear and defined goal: survival. Having an end goal in your learning process will bring a lot of clarity, both in knowing when to stop and in knowing what you need to know. It will also allow you to see the learning as a whole rather than as bits and pieces that are supposed to fit together.

But, I get it! Having a clear and defined goal is difficult when you are just starting your journey as a data scientist. There can be a lot of noise regarding what you could be doing and what you should be doing, and everyone has an opinion (just like this article). However, it is important to realize that the best way to cut through the noise and get right to the signal is to know where the signal is coming from.

There are two ways to narrow your search for an end objective:

  • Identify an industry of interest. You should ask yourself the following questions: What type of industry are you most inclined towards? What type of problems are usually worked on in this industry? Do those problems and the solutions to those problems sound interesting to you? Can you reach out to data scientists in that specific industry and ask them about what drives them to work on the problems they work on?
  • Identify the type of data or problems you wish to work on. Do you find the idea of working with image data and computer vision interesting or do you wish to work with something different like audio or music? Do you find the types of problems that lie within natural language processing to be interesting? Do you wish to work with prediction tasks or inference tasks? Or do you wish to be able to tell a good story through beautiful data visualizations?

Asking yourself these questions will direct your research and help you gain a clearer picture of what you are interested in, which in turn helps structure your learning. The more you work with a specific goal in mind, the better you will be able to identify what to learn and when to stop because you have learned enough (I’m sure opinions on what qualifies as ‘enough’ will vary widely…). It is also beneficial to read papers on topics you are interested in and to stay aware of new developments in your field of interest. Connectedpapers.com is a great resource for finding papers that are thought-provoking and close to your scope of interest. It will also help you see what technologies are being used and what skills you will need in order to build the conceptual understanding required to reach your ultimate goals.

Finally, connecting the dots between different fragments of what you are learning can only be done by applying what you learn. You can do this by working on projects that you find interesting and going through the process from start to finish. A good way to work on a project is to re-implement a research project on your own and then write about it in an explanatory blog post that helps someone else understand the project better. A good resource for going through the code behind research projects is paperswithcode.com, which lists papers alongside the datasets they used and the code published online.

Your Data Science Journey

In the end, the journey you take towards becoming a better data scientist is always going to be paved with struggle. Knowing the right questions to ask yourself about where you want to be is the best approach of all. Most importantly, understand that you can learn to be better every single day, through consistent practice and research! Learning is an iterative process, and through consistency in learning, you can become the data scientist you want to be.

Written and submitted by Ali Yaqoob
ali.arsalan.yaqoob@vanderbilt.edu
https://www.linkedin.com/in/aliyaqoob