In [1]:
import sklearn
import sklearn.datasets
import sklearn.ensemble
import numpy as np
import lime
import lime.lime_tabular
from __future__ import print_function
np.random.seed(1)

Continuous features

Loading data, training a model

For this part, we'll use the Iris dataset, and we'll train a random forest.

In [2]:
iris = sklearn.datasets.load_iris()
In [3]:
train, test, labels_train, labels_test = sklearn.model_selection.train_test_split(iris.data, iris.target, train_size=0.80)
In [4]:
rf = sklearn.ensemble.RandomForestClassifier(n_estimators=500)
rf.fit(train, labels_train)
Out[4]:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            n_estimators=500, n_jobs=1, oob_score=False, random_state=None,
            verbose=0, warm_start=False)
In [5]:
sklearn.metrics.accuracy_score(labels_test, rf.predict(test))
Out[5]:
0.96666666666666667

Create the explainer

As opposed to lime_text.TextExplainer, tabular explainers need a training set. The reason for this is because we compute statistics on each feature (column). If the feature is numerical, we compute the mean and std, and discretize it into quartiles. If the feature is categorical, we compute the frequency of each value. For this tutorial, we'll only look at numerical features.

We use these computed statistics for two things:

  1. To scale the data, so that we can meaningfully compute distances when the attributes are not on the same scale
  2. To sample perturbed instances - which we do by sampling from a Normal(0,1), multiplying by the std and adding back the mean.
In [6]:
explainer = lime.lime_tabular.LimeTabularExplainer(train, feature_names=iris.feature_names, class_names=iris.target_names, discretize_continuous=True)

Explaining an instance

Since this is a multi-class classification problem, we set the top_labels parameter, so that we only explain the top class.

In [7]:
i = np.random.randint(0, test.shape[0])
exp = explainer.explain_instance(test[i], rf.predict_proba, num_features=2, top_labels=1)

We now explain a single instance:

In [8]:
exp.show_in_notebook(show_table=True, show_all=False)