The Math Behind C4.5 Decision Tree Algorithm

Hello, and welcome to this overview video of the C4.5 decision tree algorithm. C4.5 is very similar to the ID3 algorithm. You might remember that in ID3 we calculated entropy and information gain for each feature. In C4.5, we calculate one additional metric, called split info, which lets us normalize the information gain.

For example, to calculate split info for humidity, we use this formula: split info is the negative sum, over all subsets, of the probability of each subset times the logarithm of that probability. There are 14 instances in the main data set. I separate the data set into two sub data sets, one for each humidity class. There are seven instances with high humidity and seven instances with normal humidity. So the probability of the first subset is 7/14, and the probability of the second subset is also 7/14:

SplitInfo(Humidity) = −(7/14) · log2(7/14) − (7/14) · log2(7/14) = 1

By the way, these logarithms are taken to base 2.

We then use split info to calculate the gain ratio. Gain is calculated the same way as in ID3, and dividing gain by split info gives the gain ratio:

GainRatio(A) = Gain(A) / SplitInfo(A)

C4.5 uses the gain ratio, rather than plain gain, to choose features when building the decision tree. Thank you for watching, and see you next time.
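The calculations above can be sketched in a few lines of Python. This is a minimal illustration, not a full C4.5 implementation. The helper names (`entropy`, `split_info`, `information_gain`, `gain_ratio`) are hypothetical. The video does not give the class labels inside each humidity subset, so the labels below are an assumption. They use the counts from the widely used play-tennis example: 3 yes and 4 no for high humidity, and 6 yes and 1 no for normal humidity.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def split_info(subsets):
    """SplitInfo = -sum(|D_j|/|D| * log2(|D_j|/|D|)) over the non-empty subsets of a split."""
    total = sum(len(s) for s in subsets)
    return -sum((len(s) / total) * math.log2(len(s) / total)
                for s in subsets if s)


def information_gain(parent, subsets):
    """ID3 gain: parent entropy minus the size-weighted entropy of the subsets."""
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in subsets if s)
    return entropy(parent) - weighted


def gain_ratio(parent, subsets):
    """C4.5 gain ratio: information gain divided by split info (0 if split info is 0)."""
    si = split_info(subsets)
    return information_gain(parent, subsets) / si if si > 0 else 0.0


# Hypothetical class labels for the humidity split (assumed, not from the video).
high = ["yes"] * 3 + ["no"] * 4
normal = ["yes"] * 6 + ["no"] * 1
parent = high + normal

print(split_info([high, normal]))                          # 1.0
print(round(information_gain(parent, [high, normal]), 3))  # 0.152
print(round(gain_ratio(parent, [high, normal]), 3))        # 0.152
```

A balanced two-way split has a split info of exactly 1, so the gain ratio equals the gain for humidity. An attribute that splits the data into many small subsets gets a larger split info, and its gain ratio is reduced accordingly.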
