Use Shapash Webapp with Eurybia

With this tutorial, you will learn how to use Eurybia and the Shapash WebApp to understand your data drift classifier.

Contents:
- Build a model to deploy
- Run data validation between the learning dataset and the production dataset
- Generate a report
- Run the WebApp

Data from the Kaggle Titanic dataset.

Requirements notice: the following tutorial may use third-party modules not included in Eurybia.
You can find them all in one file on our GitHub repository, or you can manually install any that you are missing.
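To find out which modules you still need to install, a quick check with the standard library can help. This is a minimal sketch: the `REQUIRED_MODULES` list is an assumption built from the import statements in this notebook (import names, not pip package names; for example `sklearn` is installed as `scikit-learn`), and `missing_modules` is a hypothetical helper, not part of Eurybia.

```python
import importlib.util

# Assumed list: import names used in this tutorial (not pip package names).
REQUIRED_MODULES = ["category_encoders", "catboost", "sklearn", "eurybia"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in the current environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

Checking with `find_spec` avoids actually importing the packages, so the check stays fast even for heavy libraries.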
[1]:
from category_encoders import OrdinalEncoder
import catboost
from sklearn.model_selection import train_test_split

Building a Supervised Model

[3]:
from eurybia.data.data_loader import data_loading

[4]:
titan_df = data_loading('titanic')
[5]:
features = ['Pclass', 'Age', 'Embarked', 'Sex', 'SibSp', 'Parch', 'Fare']
features_to_encode = ['Pclass', 'Embarked', 'Sex']
[6]:
encoder = OrdinalEncoder(cols=features_to_encode)
encoder.fit(titan_df[features], verbose=False)
[6]:
OrdinalEncoder(cols=['Pclass', 'Embarked', 'Sex'],
               mapping=[{'col': 'Pclass', 'data_type': dtype('O'),
                         'mapping': Third class     1
First class     2
Second class    3
NaN            -2
dtype: int64},
                        {'col': 'Embarked', 'data_type': dtype('O'),
                         'mapping': Southampton    1
Cherbourg      2
Queenstown     3
NaN           -2
dtype: int64},
                        {'col': 'Sex', 'data_type': dtype('O'),
                         'mapping': male      1
female    2
NaN      -2
dtype: int64}])
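The mapping printed above shows how ordinal encoding works: each category receives an integer code, and missing values get a dedicated code (-2). The following pure-Python sketch reproduces that idea, assuming codes are assigned in order of first appearance starting at 1, which is consistent with the mapping above. Mapping unseen categories to -1 is an assumption modelled on category_encoders' default behaviour; `fit_ordinal_mapping` and `encode` are hypothetical helpers, and the `pclass` list is a hand-written toy example, not the Titanic data.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fit_ordinal_mapping(values):
    """Assign integer codes 1, 2, 3, ... to categories in order of first appearance."""
    mapping = {}
    for value in values:
        if is_missing(value):
            continue  # missing values are handled at encoding time
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def encode(values, mapping, missing_code=-2, unknown_code=-1):
    """Replace each category by its code; missing -> missing_code, unseen -> unknown_code."""
    return [
        missing_code if is_missing(value) else mapping.get(value, unknown_code)
        for value in values
    ]


# Toy list of Pclass labels (illustrative only).
pclass = ["Third class", "First class", "Third class", "First class", "Third class", "Second class"]
mapping = fit_ordinal_mapping(pclass)
print(mapping)  # {'Third class': 1, 'First class': 2, 'Second class': 3}
print(encode(["Second class", None, "Crew"], mapping))  # [3, -2, -1]
```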
[7]:
titan_df_encoded = encoder.transform(titan_df[features])
[8]:
X_train, X_test, y_train, y_test = train_test_split(
    titan_df_encoded,
    titan_df['Survived'].to_frame(),
    test_size=0.2,
    random_state=11
)
[9]:
# Positions of the categorical columns, passed to catboost.Pool(cat_features=...)
indice_cat = [i for i, feature in enumerate(titan_df_encoded.columns) if feature in features_to_encode]
[10]:
model = catboost.CatBoostClassifier(
    loss_function="Logloss",
    eval_metric="Logloss",
    learning_rate=0.143852,
    iterations=500,
    l2_leaf_reg=15,
    max_depth=4,
)
[11]:
train_pool_cat = catboost.Pool(data=X_train, label=y_train, cat_features=indice_cat)
test_pool_cat = catboost.Pool(data=X_test, label=y_test, cat_features=indice_cat)
[12]:
model.fit(train_pool_cat, eval_set=test_pool_cat, silent=True)
y_pred = model.predict(X_test)
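The model is trained and evaluated with `"Logloss"`, i.e. binary cross-entropy. As a reminder of what that metric computes, here is a minimal pure-Python sketch of the standard definition; it ignores sample weights and clips probabilities with an assumed `eps`, so it is an illustration rather than CatBoost's own implementation. `binary_log_loss` is a hypothetical helper.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p)).

    Probabilities are clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# A constant prediction of 0.5 scores log(2), roughly 0.693: the uninformative baseline.
print(binary_log_loss([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))
```

In the notebook, the inputs could come from `y_test['Survived'].tolist()` and `model.predict_proba(X_test)[:, 1].tolist()`; a value well below log(2) means the model does better than a coin flip.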

Creating a fake dataset to serve as the production dataset

[13]:
import random
[14]:
df_production = titan_df.copy()
[15]:
# Simulate drift: replace Age, Fare and Sex with random values
df_production['Age'] = df_production['Age'].apply(lambda x: random.randrange(10, 76)).astype(float)
df_production['Fare'] = df_production['Fare'].apply(lambda x: random.randrange(1, 100)).astype(float)
list_sex = ["male", "female"]
df_production['Sex'] = df_production['Sex'].apply(lambda x: random.choice(list_sex))
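The cell above draws from the module-level `random` generator without a seed, so the production dataset, and therefore the drift results, change on every run. A minimal sketch of a reproducible variant follows. To stay dependency-free it works on a list of dicts rather than a DataFrame; `make_drifted_rows`, the seed value and the sample rows are assumptions for illustration. In the notebook itself, calling `random.seed(42)` before the cell above would have a similar effect.

```python
import random


def make_drifted_rows(rows, seed=42):
    """Return copies of `rows` with Age, Fare and Sex replaced by random values.

    Uses the same ranges as the notebook cell (Age 10..75, Fare 1..99, Sex drawn
    from two labels) but a dedicated, seeded generator so that reruns match.
    """
    rng = random.Random(seed)
    drifted = []
    for row in rows:
        new_row = dict(row)  # copy, so the baseline rows stay untouched
        new_row["Age"] = float(rng.randrange(10, 76))
        new_row["Fare"] = float(rng.randrange(1, 100))
        new_row["Sex"] = rng.choice(["male", "female"])
        drifted.append(new_row)
    return drifted


# Illustrative rows, shaped like the Titanic features.
baseline_rows = [
    {"Pclass": "Third class", "Age": 22.0, "Sex": "male", "Fare": 7.25},
    {"Pclass": "First class", "Age": 38.0, "Sex": "female", "Fare": 71.28},
]
print(make_drifted_rows(baseline_rows))
```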
[16]:
df_baseline = titan_df[features]
df_current = df_production[features]
[17]:
df_current.head()
[17]:
                  Pclass   Age     Embarked     Sex  SibSp  Parch  Fare
PassengerId
1            Third class  36.0  Southampton    male      1      0  57.0
2            First class  11.0    Cherbourg  female      1      0  94.0
3            Third class  12.0  Southampton  female      0      0  25.0
4            First class  60.0  Southampton  female      1      0  94.0
5            Third class  22.0  Southampton  female      0      0  84.0
[18]:
df_baseline.head()
[18]:
                  Pclass   Age     Embarked     Sex  SibSp  Parch   Fare
PassengerId
1            Third class  22.0  Southampton    male      1      0   7.25
2            First class  38.0    Cherbourg  female      1      0  71.28
3            Third class  26.0  Southampton  female      0      0   7.92
4            First class  35.0  Southampton  female      1      0  53.10
5            Third class  35.0  Southampton    male      0      0   8.05
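As we understand it, the data drift classifier mentioned in the introduction is trained to tell baseline rows from current rows, and its AUC summarizes how separable the two datasets are. The sketch below is a deliberately simplified, swapped-in univariate analogue, not Eurybia's implementation: a single feature is used directly as the classifier score, and the AUC is the probability that a random current value exceeds a random baseline value (ties count one half). `univariate_auc` is a hypothetical helper; the inputs are the Age values of the five rows previewed above, far too few to draw any conclusion.

```python
def univariate_auc(baseline, current):
    """Probability that a random `current` value exceeds a random `baseline` value.

    Ties count for one half. 0.5 means this feature alone cannot tell the two
    samples apart; values near 0 or 1 mean it separates them well.
    Pairwise comparison is O(len(baseline) * len(current)), fine for small samples.
    """
    wins = 0.0
    for c in current:
        for b in baseline:
            if c > b:
                wins += 1.0
            elif c == b:
                wins += 0.5
    return wins / (len(baseline) * len(current))


# Age values of PassengerId 1 to 5 in the two previews above.
age_baseline = [22.0, 38.0, 26.0, 35.0, 35.0]
age_current = [36.0, 11.0, 12.0, 60.0, 22.0]

auc = univariate_auc(age_baseline, age_current)
# Fold so that 0.5 means no separation and 1.0 perfect separation, in either direction.
print(round(max(auc, 1 - auc), 2))
```

On these five rows the raw AUC is 0.38 (folded: about 0.62). A multivariate classifier such as the one Eurybia builds can also pick up drift that only shows in combinations of features, which this one-feature score cannot.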

Use Eurybia for data validation

[19]:
from eurybia import SmartDrift
[20]:
sd = SmartDrift(df_current=df_current, df_baseline=df_baseline, deployed_model=model, encoding=encoder)
[21]:
%time sd.compile(full_validation=True)
INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x11d28cad0>
CPU times: user 3.6 s, sys: 690 ms, total: 4.29 s
Wall time: 848 ms
[22]:
sd.generate_report(
    output_file='report_titanic.html',
    title_story="Data validation",
    title_description="""Titanic Data validation"""
    )

Launch the Shapash WebApp from SmartDrift

After the compile step, you can launch a Shapash WebApp directly from your SmartDrift object. It gives access to several interactive plots that help you understand where drift has been detected in your data. For more information on the Shapash WebApp, see https://github.com/MAIF/shapash.

[23]:
app = sd.xpl.run_app(title_story='Eurybia datadrift classifier')
INFO:root:Your Shapash application run on http://PMP01204:8050/
INFO:root:Use the method .kill() to down your app.

Stop the WebApp after using it

[24]:
app.kill()