Before starting with the Gini Index, let us first understand what splitting is and which measures are used to perform it.
What are Splitting Measures?
When more than one attribute takes part in the decision-making process, it is necessary to decide the relevance and importance of each attribute. The most relevant attribute is placed at the root node, and the tree is then built downward by splitting the nodes.
As we move further down the tree, the level of impurity or uncertainty decreases, leading to a better classification at every node. Splitting measures such as Information Gain and the Gini Index are used to decide which attribute gives the best split at each node.
What is Information Gain?
Information Gain is used to determine which feature/attribute gives us the maximum information about a class.
- Information Gain is based on the concept of entropy, which is the degree of uncertainty, impurity or disorder.
- Information Gain measures how much a split reduces entropy. Choosing the split with the highest Information Gain at each node reduces entropy from the root node down to the leaf nodes.
Formula for Entropy
$$E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$$
where $p_i$ denotes the proportion of elements in S that belong to class i, c is the number of classes, and E(S) denotes the entropy.
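A minimal Python sketch of the entropy formula, assuming the class labels of a node are given as a plain list. The function name `entropy` is our own, not part of any library.

```python
import math
from collections import Counter


def entropy(labels):
    """E(S) = sum over classes of -p_i * log2(p_i)."""
    n = len(labels)
    # Counter only holds classes that actually occur, so log2(0) is never evaluated.
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


# The 'Return' column of the example table below: 4 Up, 6 Down.
returns = ["Up"] * 4 + ["Down"] * 6
print(round(entropy(returns), 3))  # 0.971
```

A pure node (one class only) gives 0, and an even two-class split gives the maximum of 1 bit.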
Entropy is sometimes avoided because the ‘log’ function makes it more expensive to compute than measures such as the Gini Index.
What is the Gini Index?
The Gini Index, or Gini impurity, measures the probability that a randomly chosen element of a node would be classified incorrectly if it were labeled at random according to the distribution of classes in that node.
But what is actually meant by ‘impurity’?
If all the elements belong to a single class, the node is called pure. The Gini Index varies between 0 and 1,
where,
0 denotes that all elements belong to a single class (or that only one class exists), and
values close to 1 denote that the elements are spread evenly across many classes; with n equally likely classes the Gini Index reaches its maximum of 1 - 1/n, so it never actually equals 1.
For a two-class problem, the maximum is 0.5, which denotes that the elements are split equally between the two classes.
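Taking the Gini formula from the next section as given, the upper end of the range can be checked directly. With n classes whose probabilities $p_i$ sum to 1, the Cauchy-Schwarz inequality bounds the sum of squares from below:

$$
\sum_{i=1}^{n} (p_i)^2 \;\ge\; \frac{1}{n}\Big(\sum_{i=1}^{n} p_i\Big)^2 = \frac{1}{n}
\quad\Longrightarrow\quad
\text{Gini} = 1 - \sum_{i=1}^{n} (p_i)^2 \;\le\; 1 - \frac{1}{n},
$$

with equality when every $p_i = 1/n$. Two classes therefore give at most 0.5, ten classes at most 0.9, and the bound only approaches 1 as the number of classes grows.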
Formula for Gini Index
$$\text{Gini} = 1 - \sum_{i=1}^{n} (p_i)^2$$
where $p_i$ is the probability of an object belonging to class i, and n is the number of classes.
While building the decision tree, we prefer the attribute/feature with the lowest weighted Gini Index as the root node, since splitting on it leaves the child nodes least impure.
Let’s understand how the Gini Index works with a simple example.
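A minimal Python sketch of the Gini formula, plus a small simulation of the "random misclassification" interpretation given above. Both function names are our own, and the simulation assumes the guessed label is drawn from the same class distribution as the node.

```python
import random
from collections import Counter


def gini(labels):
    """Gini = 1 - sum over classes of p_i^2."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def simulated_misclassification(labels, trials=100_000, seed=0):
    """Estimate the chance that a randomly picked element receives a wrong
    label when that label is also drawn at random from the node's class mix."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        actual = rng.choice(labels)   # element picked at random
        guessed = rng.choice(labels)  # label assigned at random, same distribution
        wrong += actual != guessed
    return wrong / trials


node = ["Up"] * 4 + ["Down"] * 2  # e.g. a node with 4 Up and 2 Down rows
print(round(gini(node), 3))                         # 0.444
print(round(simulated_misclassification(node), 2))  # should be close to gini(node)
```

The simulated rate converges to the Gini Index because the probability of drawing two different classes is 1 minus the probability of drawing the same class twice, which is exactly 1 - sum of p_i^2.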
Example of Gini Index
Past Trend | Open Interest | Trading Volume | Return |
Positive | Low | High | Up |
Negative | High | Low | Down |
Positive | Low | High | Up |
Positive | High | High | Up |
Negative | Low | High | Down |
Positive | Low | Low | Down |
Negative | High | High | Down |
Negative | Low | High | Down |
Positive | Low | Low | Down |
Positive | High | High | Up |
Table: Gini Index example
Calculating the Gini Index
Calculating the Gini Index for Past Trend
P(Past Trend=Positive): 6/10
P(Past Trend=Negative): 4/10
- If (Past Trend = Positive & Return = Up), probability = 4/6
- If (Past Trend = Positive & Return = Down), probability = 2/6
Gini index = 1 - ((4/6)^2 + (2/6)^2) = 4/9 ≈ 0.44
- If (Past Trend = Negative & Return = Up), probability = 0
- If (Past Trend = Negative & Return = Down), probability = 4/4
Gini index = 1 - ((0)^2 + (4/4)^2) = 0
- Weighted sum of the Gini Indices can be calculated as follows:
Gini Index for Past Trend = (6/10)(4/9) + (4/10)(0) ≈ 0.27
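Hand calculations like the one above round intermediate values. A small sketch using Python's standard `fractions.Fraction` keeps the arithmetic exact. The helper names `gini_from_counts` and `weighted_gini` are our own, and the (Up, Down) counts are read off the example table.

```python
from fractions import Fraction


def gini_from_counts(counts):
    """Gini impurity of a node given its per-class counts, as an exact fraction."""
    total = sum(counts)
    return 1 - sum(Fraction(c, total) ** 2 for c in counts)


def weighted_gini(branches):
    """Weighted sum of the child Gini indices; `branches` maps value -> class counts."""
    n = sum(sum(counts) for counts in branches.values())
    return sum(Fraction(sum(counts), n) * gini_from_counts(counts)
               for counts in branches.values())


# (Up, Down) counts for each value of Past Trend.
past_trend = {"Positive": (4, 2), "Negative": (0, 4)}
print(gini_from_counts(past_trend["Positive"]))  # 4/9
print(weighted_gini(past_trend))                 # 4/15
print(round(float(weighted_gini(past_trend)), 2))  # 0.27
```

The same helper gives 7/15 (≈ 0.47) for Open Interest with `{"High": (2, 2), "Low": (2, 4)}`.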
Calculation of Gini Index for Open Interest
P(Open Interest=High): 4/10
P(Open Interest=Low): 6/10
- If (Open Interest = High & Return = Up), probability = 2/4
- If (Open Interest = High & Return = Down), probability = 2/4
Gini index = 1 - ((2/4)^2 + (2/4)^2) = 0.5
- If (Open Interest = Low & Return = Up), probability = 2/6
- If (Open Interest = Low & Return = Down), probability = 4/6
Gini index = 1 - ((2/6)^2 + (4/6)^2) = 4/9 ≈ 0.44
- Weighted sum of the Gini Indices can be calculated as follows:
Gini Index for Open Interest = (4/10)(0.5) + (6/10)(4/9) ≈ 0.47
Calculation of Gini Index for Trading Volume
P(Trading Volume=High): 7/10
P(Trading Volume=Low): 3/10
- If (Trading Volume = High & Return = Up), probability = 4/7
- If (Trading Volume = High & Return = Down), probability = 3/7
Gini index = 1 - ((4/7)^2 + (3/7)^2) = 0.49
- If (Trading Volume = Low & Return = Up), probability = 0
- If (Trading Volume = Low & Return = Down), probability = 3/3
Gini index = 1 - ((0)^2 + (1)^2) = 0
- Weighted sum of the Gini Indices can be calculated as follows:
Gini Index for Trading Volume = (7/10)0.49 + (3/10)0 = 0.34
Weighted Gini Index of each attribute
Attributes/Features | Gini Index |
Past Trend | 0.27 |
Open Interest | 0.47 |
Trading Volume | 0.34 |
Table 1: Gini Index attributes or features
From the above table, we observe that ‘Past Trend’ has the lowest Gini Index, so it is chosen as the root node of the decision tree.
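The three calculations and the choice of root node can be reproduced in a short, self-contained Python sketch. The table is typed in by hand as `ROWS`, and the helper names are illustrative, not part of any library.

```python
from collections import Counter, defaultdict

# Example table rows: (Past Trend, Open Interest, Trading Volume, Return).
ROWS = [
    ("Positive", "Low", "High", "Up"),
    ("Negative", "High", "Low", "Down"),
    ("Positive", "Low", "High", "Up"),
    ("Positive", "High", "High", "Up"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Negative", "High", "High", "Down"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Positive", "High", "High", "Up"),
]
FEATURES = ["Past Trend", "Open Interest", "Trading Volume"]


def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(rows, col):
    """Group rows by their value in column `col`; weight each group's Gini by its size."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])  # last column is the Return label
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


scores = {name: weighted_gini(ROWS, col) for col, name in enumerate(FEATURES)}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
# Past Trend: 0.27
# Open Interest: 0.47
# Trading Volume: 0.34

root = min(scores, key=scores.get)  # lowest weighted Gini wins
print("Root node:", root)           # Root node: Past Trend
```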
We will repeat the same procedure to determine the sub-nodes or branches of the decision tree.
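The ‘Negative’ branch of Past Trend is already pure (all four of its rows have Return = Down), so it becomes a leaf node. We will calculate the Gini Index for the ‘Positive’ branch of Past Trend as follows: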
Past Trend | Open Interest | Trading Volume | Return |
Positive | Low | High | Up |
Positive | Low | High | Up |
Positive | High | High | Up |
Positive | Low | Low | Down |
Positive | Low | Low | Down |
Positive | High | High | Up |
Table: Gini Index calculation for the Positive branch of Past Trend
Calculation of Gini Index of Open Interest for Positive Past Trend
P(Open Interest=High): 2/6
P(Open Interest=Low): 4/6
- If (Open Interest = High & Return = Up), probability = 2/2
- If (Open Interest = High & Return = Down), probability = 0
Gini index = 1 - ((2/2)^2 + (0)^2) = 0
- If (Open Interest = Low & Return = Up), probability = 2/4
- If (Open Interest = Low & Return = Down), probability = 2/4
Gini index = 1 - ((2/4)^2 + (2/4)^2) = 0.50
- Weighted sum of the Gini Indices can be calculated as follows:
Gini Index for Open Interest = (2/6)0 + (4/6)0.50 = 0.33
Calculation of Gini Index of Trading Volume for Positive Past Trend
P(Trading Volume=High): 4/6
P(Trading Volume=Low): 2/6
- If (Trading Volume = High & Return = Up), probability = 4/4
- If (Trading Volume = High & Return = Down), probability = 0
Gini index = 1 - ((4/4)^2 + (0)^2) = 0
- If (Trading Volume = Low & Return = Up), probability = 0
- If (Trading Volume = Low & Return = Down), probability = 2/2
Gini index = 1 - ((0)^2 + (2/2)^2) = 0
- Weighted sum of the Gini Indices can be calculated as follows:
Gini Index for Trading Volume = (4/6)0 + (2/6)0 = 0
Weighted Gini Index of each attribute for the Positive branch
Attributes/Features | Gini Index |
Open Interest | 0.33 |
Trading Volume | 0 |
Table 2: Gini Index attributes or features
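We will split the node further using the ‘Trading Volume’ feature, as it has the minimum Gini Index. Because its weighted Gini Index is 0, both resulting child nodes (High → Up, Low → Down) are pure, so they become leaf nodes and the tree is complete.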
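The procedure above repeats one rule at every node: stop when the node is pure, otherwise split on the feature with the lowest weighted Gini Index. A compact recursive sketch of that rule is shown below. It splits on every value of a categorical feature, as the worked example does; this is a simplification, and library implementations such as scikit-learn's decision trees use binary splits instead. All names here are our own.

```python
from collections import Counter, defaultdict

# Example table rows: (Past Trend, Open Interest, Trading Volume, Return).
ROWS = [
    ("Positive", "Low", "High", "Up"),
    ("Negative", "High", "Low", "Down"),
    ("Positive", "Low", "High", "Up"),
    ("Positive", "High", "High", "Up"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Negative", "High", "High", "Down"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Positive", "High", "High", "Up"),
]
FEATURES = ["Past Trend", "Open Interest", "Trading Volume"]


def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def split(rows, col):
    """Partition rows by their value in column `col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row)
    return groups


def weighted_gini(rows, col):
    return sum(len(g) / len(rows) * gini([r[-1] for r in g])
               for g in split(rows, col).values())


def build_tree(rows, features):
    """Return a class label (leaf) or a (feature name, {value: subtree}) pair."""
    labels = [r[-1] for r in rows]
    if gini(labels) == 0 or not features:
        # Pure node, or nothing left to split on: predict the majority class.
        return Counter(labels).most_common(1)[0][0]
    best = min(features, key=lambda col: weighted_gini(rows, col))
    remaining = [c for c in features if c != best]
    return (FEATURES[best],
            {value: build_tree(subset, remaining)
             for value, subset in split(rows, best).items()})


def predict(tree, row):
    """Follow the branches matching the row's values until a leaf is reached.
    Raises KeyError for a value never seen while building the tree."""
    while isinstance(tree, tuple):
        name, branches = tree
        tree = branches[row[FEATURES.index(name)]]
    return tree


tree = build_tree(ROWS, list(range(len(FEATURES))))
print(tree)
# ('Past Trend', {'Positive': ('Trading Volume', {'High': 'Up', 'Low': 'Down'}), 'Negative': 'Down'})
print(predict(tree, ("Positive", "Low", "Low")))  # Down
```

The resulting tree matches the hand calculation: Past Trend at the root, a pure ‘Negative’ leaf, and a Trading Volume split under ‘Positive’.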
Conclusion
The Gini Index, unlike Information Gain, does not involve the logarithm used to calculate entropy, so it is cheaper to compute. This is one reason the Gini Index is often preferred over Information Gain.
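For comparison, a small sketch can score the example table with both measures. Information Gain is taken here as the parent entropy minus the size-weighted entropy of the child nodes, the usual definition, which this article describes but does not write out. On this data both measures rank the three features in the same order, so they pick the same root node.

```python
import math
from collections import Counter, defaultdict

# Example table rows: (Past Trend, Open Interest, Trading Volume, Return).
ROWS = [
    ("Positive", "Low", "High", "Up"),
    ("Negative", "High", "Low", "Down"),
    ("Positive", "Low", "High", "Up"),
    ("Positive", "High", "High", "Up"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Negative", "High", "High", "Down"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Positive", "High", "High", "Up"),
]
FEATURES = ["Past Trend", "Open Interest", "Trading Volume"]


def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted(impurity, rows, col):
    """Size-weighted impurity of the child nodes created by splitting on `col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    return sum(len(g) / len(rows) * impurity(g) for g in groups.values())


parent_entropy = entropy([r[-1] for r in ROWS])
gini_scores, gain_scores = {}, {}
for col, name in enumerate(FEATURES):
    gini_scores[name] = weighted(gini, ROWS, col)
    gain_scores[name] = parent_entropy - weighted(entropy, ROWS, col)
    print(f"{name}: weighted Gini = {gini_scores[name]:.2f}, "
          f"Information Gain = {gain_scores[name]:.2f}")
```

Lower weighted Gini is better, while higher Information Gain is better, so the root is chosen with `min` for one measure and `max` for the other.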