Mathematics behind Gini Index for Decision Tree
In Data Science and ML, the Gini index is a measure of how "homogeneous" a node of a decision tree is with respect to the distribution of the categorical values it contains:
for instance, given a node in a decision tree with 10 elements and an attribute with three possible values, A-B-C, the node {A=10,B=0,C=0} is more "pure" than the node {A=3,B=4,C=3}, since using the latter node could lead to errors in the classification process. To measure the degree of impurity of a node, the following formula is used:
$$\text{GINI}(t) = 1-\sum_{j}[p(j|t)]^{2}$$
where $j$ ranges over the different classes in the node (in the example above there are 3 classes) and $p(j|t)$ is the frequency of the elements of class $j$ over all the elements of node $t$ (for instance, in the node {A=10,B=0,C=0}, $p(A|t)$ would be 10/10 = 1).
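To make the example concrete, here is a minimal sketch (not part of the original question) that computes the Gini index from a node's class counts; the function name `gini` and the list-of-counts input are just illustrative choices:

```python
def gini(counts):
    """Return 1 minus the sum of squared class frequencies for a node."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([10, 0, 0]))  # 0.0  -> perfectly pure node {A=10, B=0, C=0}
print(gini([3, 4, 3]))   # 0.66 -> impure node {A=3, B=4, C=3}
```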
I would like to understand under which conditions the summation $\sum_{j}[p(j|t)]^{2}$ reaches its highest and lowest possible values for a given number of classes $j$.
Intuitively, the highest possible value is 1, and this happens when one of the relative frequencies is equal to one and all the others are zero. Intuitively, the lowest possible value is reached when all the frequencies are the same, so when the summation equals $j\times(1/j)^2 = 1/j$.
But I would like to understand, formally and with a proof, why the maximum value of the summation is always 1 and the minimum value is always $j\times(1/j)^2 = 1/j$.
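As a quick numerical check of these two conjectured extremes of the summation (again just an illustrative sketch, with $j$ below standing for the number of classes):

```python
j = 3
uniform = [1 / j] * j                   # all class frequencies equal
degenerate = [1.0] + [0.0] * (j - 1)    # one class holds every element

print(sum(p ** 2 for p in uniform))     # 0.333... = 1/j, the conjectured minimum
print(sum(p ** 2 for p in degenerate))  # 1.0, the conjectured maximum
```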
Solution 1:
Let $p_j$ denote the probability masses with $n$ support points. By the power mean inequality,
$$ \left(\frac {1} {n} \sum_{j=1}^n p_j^2 \right)^{1/2} \geq \frac {1} {n} \sum_{j=1}^n p_j = \frac {1} {n} \Rightarrow \sum_{j=1}^n p_j^2 \geq \frac {n} {n^2} = \frac {1} {n} \tag{1}$$
Equality holds if and only if $$p_1 = p_2 = \ldots = p_n = \frac {1} {n}$$
On the other hand, since $$0 \leq p_j \leq 1 \text{ for } j = 1, 2, \ldots n $$ we have $$ p_j^2 \leq p_j \text{ for } j = 1, 2, \ldots n $$
Equality holds if and only if $p_j = 0$ or $1$; since the $p_j$ sum to $1$, this means exactly one of the $p_j$ equals $1$ and all the remaining $p_j$ vanish.
Summing the inequality from $j = 1$ to $j = n$, we have
$$ \sum_{j=1}^n p_j^2 \leq \sum_{j=1}^n p_j = 1 \tag{2}$$
Combining $(1)$ and $(2)$ together,
$$ \frac {1} {n} \leq \sum_{j=1}^n p_j^2 \leq 1 $$
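For completeness (this last step is not spelled out above), substituting this bound on the summation back into the definition of the Gini index gives its range:
$$ 0 \leq \text{GINI}(t) = 1 - \sum_{j=1}^n p_j^2 \leq 1 - \frac {1} {n} $$
with the minimum attained by a pure node (one class takes all the mass) and the maximum by a node whose classes are uniformly distributed.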