Assitan Koné
Dec 6

What's the Gini index for machine learning?

What is the Gini index used for?

The Gini index is used when building decision trees. How do we decide which feature should split the root node? There are a couple of methods, and the Gini index is a good one. It measures whether the leaves produced by a split are pure (all labels the same) or impure (a mix of labels).
The more diverse the labels in a leaf are, the higher its Gini index is. Why does that matter? If, let's say, you want to recommend a product using a decision tree, you want the leaves to be as homogeneous as possible so that you can be confident in your recommendation.

Formula

At a glance, feature A seems to give leaves with less diversity, and therefore a better score, because we have 3 purple circles and 2 red circles. But you know what, let's be a bit more rigorous.
So to choose which feature to use at the root node, we calculate the diversity of the leaves.
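This is the formula, where p_i is the proportion of labels belonging to class i in a leaf and k is the number of classes:

$$\text{Gini} = 1 - \sum_{i=1}^{k} p_i^2$$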
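In code, the standard definition (1 minus the sum of the squared class proportions in a leaf) can be sketched as follows. The helper name `gini_impurity` and the example labels are assumptions made for illustration; they are not the exact figures from this post.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one leaf: 1 - sum(p_i ** 2) over the classes i."""
    total = len(labels)
    if total == 0:
        return 0.0  # convention chosen here: an empty leaf counts as pure
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical leaves (illustrative labels only)
pure_leaf = ["purple", "purple", "purple"]
mixed_leaf = ["purple", "red"]

print(gini_impurity(pure_leaf))   # 0.0: only one class in the leaf
print(gini_impurity(mixed_leaf))  # 0.5: two classes, evenly split
```

A score of 0 means the leaf is perfectly pure, and the score grows as the labels get more mixed.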

Score

Then we compare the mean Gini score of the leaves for each candidate feature (usually weighted by the number of samples in each leaf) and choose the lowest number. Our winner is feature A!
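The comparison can be sketched like this, assuming each leaf's score is weighted by the number of samples it holds, which is the usual convention for decision trees. The function names and the leaf contents are hypothetical, chosen so that feature A comes out ahead as in the example; they are not the exact counts from the figure.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one non-empty leaf: 1 - sum(p_i ** 2)."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def split_score(leaves):
    """Weighted mean of the leaves' Gini impurities (lower is better)."""
    total = sum(len(leaf) for leaf in leaves)
    return sum(len(leaf) / total * gini_impurity(leaf) for leaf in leaves)


# Hypothetical splits: each feature produces two leaves of labels
splits = {
    "A": [["purple"] * 3, ["red"] * 2],                      # both leaves pure
    "B": [["purple", "purple", "red"], ["purple", "red"]],  # both leaves mixed
}

scores = {feature: split_score(leaves) for feature, leaves in splits.items()}
best = min(scores, key=scores.get)  # feature with the lowest weighted score

print(scores)
print(best)  # A
```

Weighting by leaf size keeps a tiny pure leaf from hiding a large, messy one.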

Author

Assitan Koné
Founder @Codistwa