I am trying to solve this problem from Stepic:
Download a dataset with three variables: sex, exang, num. Imagine that we want to use a decision tree to classify whether or not a patient has heart disease (variable num) based on two criteria: sex and the presence / absence of angina pectoris (exang). Train a decision tree on this data, using entropy as the criterion. Specify the Information Gain value for the variable that will be placed at the root of the tree. The answer must be a number rounded to 3 decimal places.
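For reference, the quantity asked for is the entropy of `num` minus the sample-weighted entropy of `num` within each group after splitting on one feature; the tree puts the feature with the larger gain at the root. A minimal pure-Python sketch of that calculation, assuming binary-coded columns; the helper names `entropy` and `information_gain` and the toy lists are hypothetical, not the Stepic data:

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of labels minus the weighted entropy after splitting on feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


# Hypothetical toy columns (illustration only, not the Stepic dataset).
sex = [1, 1, 0, 0, 1, 0, 1, 1]
exang = [0, 1, 0, 1, 1, 0, 0, 1]
num = [0, 1, 0, 1, 1, 0, 1, 1]

gains = {"sex": information_gain(sex, num), "exang": information_gain(exang, num)}
root = max(gains, key=gains.get)  # the feature with the largest gain goes to the root
print(root, round(gains[root], 3))  # → exang 0.549
```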
That's what I did:
```python
from sklearn import tree

clf = tree.DecisionTreeClassifier()
clf.fit(X, y)
tree.plot_tree(clf, filled=True)

l_node = clf.tree_.children_left[0]
r_node = clf.tree_.children_left[1]
n1 = clf.tree_.n_node_samples[l_node]
n2 = clf.tree_.n_node_samples[r_node]
e1 = clf.tree_.impurity[l_node]
e2 = clf.tree_.impurity[r_node]
n = n1 + n2
ig = 0.996 - (n1 * e1 + n2 * e2) / n
```

This gives an Information Gain of 0.607, but when I enter this value, the answer is marked as incorrect. What am I doing wrong?
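Two things in the snippet look suspect. `DecisionTreeClassifier()` defaults to `criterion="gini"`, so `tree_.impurity` holds Gini values rather than entropy unless `criterion="entropy"` is passed. Also, the right child of the root is `children_right[0]`; `children_left[1]` is the left child of node 1. A sketch under those assumptions, reading every number from the fitted tree instead of hard-coding the root entropy (`root_information_gain` and the toy arrays are hypothetical, not the Stepic data):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def root_information_gain(clf):
    """Information gain of the root split of a fitted DecisionTreeClassifier.

    Relies on the impurities stored in clf.tree_, so the tree must be
    trained with criterion="entropy" for the result to be entropy-based.
    """
    t = clf.tree_
    left, right = t.children_left[0], t.children_right[0]
    if left == -1:  # the root is a leaf: there is no split and no gain
        return 0.0
    w = t.weighted_n_node_samples  # equals n_node_samples without sample weights
    children = (w[left] * t.impurity[left] + w[right] * t.impurity[right]) / w[0]
    return t.impurity[0] - children


# Hypothetical toy data: columns are sex, exang; target is num.
X = np.array([[1, 0], [1, 1], [0, 0], [0, 1], [1, 1], [0, 0], [1, 0], [1, 1]])
y = np.array([0, 1, 0, 1, 1, 0, 1, 1])

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print("root feature index:", clf.tree_.feature[0])
print("information gain:", round(root_information_gain(clf), 3))
```

`clf.tree_.feature[0]` gives the column index used at the root, which tells you which variable the gain refers to.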
