Assitan Koné
Feb 12

Guide to learning the basics of data science in 7 steps

1. Understand the basic concepts of statistics and mathematics

It's important to understand basic concepts in statistics, such as probability, descriptive statistics, and statistical inference, and in mathematics, such as linear algebra and optimization.
It's tempting to jump straight into practice, especially if you already know how to code and how to use libraries. But machine learning is about using mathematical formulas to make predictions, and data analysis requires statistics knowledge. You will be much more comfortable with your data science projects if you master the fundamentals, and you will be less likely to feel overwhelmed. Foundations are essential.

Let's say you have a machine learning project to build. With solid foundations, you will know right away which algorithm suits your data, and whether accuracy or the F1 score is the better metric for evaluating your model, so that you interpret the results correctly.
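To make the descriptive statistics part concrete, here is a minimal sketch using Python's built-in statistics module. The price list is a made-up sample, chosen only to show how one extreme value affects the mean but not the median.

```python
import statistics

# Hypothetical sample: six ordinary prices and one extreme value (an outlier).
prices = [12, 15, 14, 13, 16, 15, 90]

mean_price = statistics.mean(prices)      # pulled upward by the outlier
median_price = statistics.median(prices)  # robust to the outlier
spread = statistics.stdev(prices)         # sample standard deviation

print(f"mean={mean_price:.2f}, median={median_price}, stdev={spread:.2f}")
```

Here the mean is 25 while the median is 15. Knowing why they differ helps you decide which one describes your data honestly.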

Author

Assitan Koné
Founder @Codistwa

2. Learn how to manipulate data

This requires an understanding of the core Pandas data structures, Series and DataFrame, and knowledge of data preparation techniques using tools such as Pandas and NumPy. When it comes to machine learning, people think that creating a model is the most important part. But let me tell you the reality.

Actually, data processing is the most important part. Why? Because if your data is not well cleaned, you will have a poor model. So pay attention to missing values, bad formatting, etc. It turns out that this is the most tedious and time-consuming part, but in the end, it is so satisfying!
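As an illustration, here is a small cleaning sketch with Pandas. The table and its column names are invented for the example, and the strategy shown (drop rows without a name, fill missing ages with the median) is only one possible choice, not a universal rule.

```python
import pandas as pd

# Hypothetical raw data with typical problems: missing values,
# inconsistent text formatting, and numbers stored as strings.
raw = pd.DataFrame({
    "name": [" Alice", "BOB ", "carol", None],
    "age": ["34", "n/a", "29", "41"],
    "city": ["Paris", "paris", None, "Lyon"],
})

clean = raw.copy()

# Normalize text: strip extra spaces and use a consistent case.
clean["name"] = clean["name"].str.strip().str.title()
clean["city"] = clean["city"].str.strip().str.title()

# Convert age to numbers; values that cannot be parsed become NaN.
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")

# Count missing values per column before deciding how to handle them.
missing_per_column = clean.isna().sum()
print(missing_per_column)

# One possible strategy: drop rows without a name,
# then fill the remaining missing ages with the median age.
clean = clean.dropna(subset=["name"]).copy()
clean["age"] = clean["age"].fillna(clean["age"].median())
print(clean)
```

Always look at the missing-value counts first. Whether to drop or fill a value depends on your data and on what the model will be used for.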

3. Learn to use data visualization tools

Data visualizations are a crucial tool for understanding and interpreting data. It's important to know how to use tools such as Matplotlib and Seaborn to display plots, histograms, bar charts, etc. It can be hard to know how to write the code for these charts.

Seaborn is easier to use because its syntax is simple, but I recommend keeping snippets to help you understand how to use these libraries, because you usually use both in your code. AI tools like ChatGPT can also help you code your charts.

4. Understand machine learning algorithms

Today, we don't need to create our own algorithms, unless we have a very specific need or are working on a deep tech project. Libraries like Scikit-learn, TensorFlow, and PyTorch help us implement these algorithms very easily to create our models.

However, it's important to understand the main machine learning algorithms, such as linear regression, decision trees, neural networks, and clustering algorithms: how they work in general, why and when to use them, with which data, and the best practices.
This is essential because, for example, if we want to do image classification, neural networks are usually more effective than logistic regression. But if we want to do a simple regression, linear regression is more appropriate. Choose your algorithm according to your type of problem.
Let me ask you a question. If you want to buy a computer just for basic things like checking your emails, with no gaming whatsoever, would you buy a simple computer at $400 or a powerful one at $2500? Of course, you'd buy the cheaper one. Well, the same principle applies to machine learning algorithms.
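To illustrate the "simple tool for a simple problem" idea, here is a minimal sketch with Scikit-learn. The data is synthetic (an assumption for the example): the target is roughly 3 times the input plus 2, with some random noise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (for illustration only): y is roughly 3 * x + 2 plus noise.
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1, size=200)

# Keep part of the data aside to check the model on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the held-out data: the closer to 1, the better the fit.
r2 = model.score(X_test, y_test)
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}, R^2={r2:.3f}")
```

A few lines are enough here because the relationship is linear. A neural network would be the $2500 computer for checking emails.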

5. Understand model evaluation methods

Understanding model evaluation methods such as cross-validation and the confusion matrix, and performance measures such as AUC and the F1 score, is crucial to make sure that our model generalizes well.

Also, knowing exactly when accuracy is appropriate allows us to avoid errors of interpretation. This requires a deep understanding of your data. For example, if our dataset is imbalanced, accuracy can look high even when the model ignores the minority class, so it's better to use the F1 score.
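Here is a small sketch of why accuracy can mislead on imbalanced data. The labels are invented: 95 negatives and 5 positives, with a "lazy" model that always predicts the majority class.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Hypothetical imbalanced labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A "lazy" model that always predicts the majority class.
y_pred = [0] * 100

accuracy = accuracy_score(y_true, y_pred)
# zero_division=0 sets precision to 0 when no positive is predicted.
f1 = f1_score(y_true, y_pred, zero_division=0)
# Rows are the true classes, columns are the predicted classes.
matrix = confusion_matrix(y_true, y_pred)

print(f"accuracy={accuracy:.2f}")  # 0.95
print(f"f1={f1:.2f}")              # 0.00
print(matrix)
```

The accuracy is 95%, yet the model never finds a single positive case. The F1 score of 0 and the second row of the confusion matrix (5 positives, all missed) reveal the problem.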

6. Practice using data science projects

Of course, practice is mandatory for mastering data science skills. I recommend doing classic projects like Titanic, image classification, and spam detection to understand the underlying concepts and apply best practices.

However, don't hesitate to go further by participating in data science projects on online platforms such as Kaggle, Omdena, DrivenData, or MachineHack, or by working on personal projects. For your personal project, my advice is to choose a subject related to your passion, for example, music or ecology, or perhaps one related to your culture or to an important cause like women's rights. It will be so valuable to your portfolio. And remember, try to document everything. What is the best place? Your Notion workspace, for example, but even better, blog posts! If you want help with that, don't hesitate to join our membership.

I know that it can be so complicated to start a data science project. Which libraries do I need to import again? Should I use a histogram or a bar chart? You know what? Practice makes you better. So what are you waiting for? Let's go!
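To show how small a first project can be, here is a toy version of the classic spam detection project. The six messages are made up, and the choice of a bag-of-words model with Naive Bayes is just one common starting point. A real project needs a real dataset and a proper evaluation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up dataset, for illustration only.
messages = [
    "win free money now",
    "free prize claim now",
    "win a free prize",
    "meeting at noon tomorrow",
    "see you at lunch",
    "project report attached",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Step 1: turn each message into word counts.
# Step 2: fit a Naive Bayes classifier on those counts.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

new_messages = ["claim your free money", "lunch meeting tomorrow"]
predictions = model.predict(new_messages)
for text, label in zip(new_messages, predictions):
    print(f"{label}: {text}")
```

Once this runs, you have something to improve: more data, a train/test split, and better metrics.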

7. Stay up-to-date with the latest trends and developments in data science

Data science is a constantly evolving field. It's important to stay up to date with the latest trends by following data science blogs like KDnuggets, attending conferences like NeurIPS, and reading professional resources like Papers With Code.

However, subscribing to newsletters can be so overwhelming. My advice? Create filters in your inbox and set a day of the week when you will read everything. And most importantly, don't subscribe to everything. It can be tempting, but I'm sure that you already have a busy life. So choose 3 newsletters max.

Conclusion

This is a starting point for learning the basics of data science. It is important to persevere with learning and never stop growing by following trends and practicing regularly.