Use Shapash Webapp with Eurybia

In this tutorial, you will learn how to use Eurybia and the Shapash webapp to understand your datadrift classifier.

Contents:

- Build a model to deploy
- Do data validation between learning dataset and production dataset
- Generate Report
- Run Webapp

Data from Kaggle Titanic

Requirements notice: the following tutorial may use third-party modules not included in Eurybia.
You can find them all in one file on our GitHub repository, or you can manually install any that you are missing.
[3]:
import pandas as pd
from category_encoders import OrdinalEncoder
import catboost
from eurybia.core.smartdrift import SmartDrift
from sklearn.model_selection import train_test_split

Building Supervised Model

[4]:
from eurybia.data.data_loader import data_loading
[5]:
titan_df = data_loading('titanic')
[6]:
features = ['Pclass', 'Age', 'Embarked', 'Sex', 'SibSp', 'Parch', 'Fare']
features_to_encode = ['Pclass', 'Embarked', 'Sex']
[ ]:
encoder = OrdinalEncoder(cols=features_to_encode)
encoder.fit(titan_df[features], verbose=False)
[8]:
titan_df_encoded = encoder.transform(titan_df[features])
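
The encoding step above replaces each category label with an integer code so that the model receives numeric columns. The following is a minimal plain-Python sketch of that idea, not the `category_encoders` implementation: the helper names `fit_ordinal_mapping` and `transform_ordinal`, the numbering (starting at 1, in order of first appearance), the `-1` code for unseen categories, and the label "Atlantis" are all assumptions made for illustration and may differ from what `OrdinalEncoder` actually produces.

```python
def fit_ordinal_mapping(values):
    """Assign 1, 2, 3, ... to categories in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def transform_ordinal(values, mapping, unknown=-1):
    """Replace each category by its integer code; unseen categories get `unknown`."""
    return [mapping.get(value, unknown) for value in values]


# Labels taken from the Embarked column shown in this tutorial
embarked = ["Southampton", "Cherbourg", "Southampton"]
mapping = fit_ordinal_mapping(embarked)

# "Atlantis" is a hypothetical label that was never seen during fitting
codes = transform_ordinal(embarked + ["Atlantis"], mapping)
print(codes)  # [1, 2, 1, -1]
```

Fitting on one dataset and transforming another with the same mapping is what keeps the codes consistent between the baseline and the production data.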
[9]:
X_train, X_test, y_train, y_test = train_test_split(
    titan_df_encoded,
    titan_df['Survived'].to_frame(),
    test_size=0.2,
    random_state=11
)
[10]:
# Positional indices of the categorical columns, as expected by catboost.Pool
indice_cat = [
    i for i, feature in enumerate(titan_df_encoded)
    if feature in features_to_encode
]
[11]:
model = catboost.CatBoostClassifier(
    loss_function="Logloss",
    eval_metric="Logloss",
    learning_rate=0.143852,
    iterations=500,
    l2_leaf_reg=15,
    max_depth=4,
)
[12]:
train_pool_cat = catboost.Pool(data=X_train, label=y_train, cat_features=indice_cat)
test_pool_cat = catboost.Pool(data=X_test, label=y_test, cat_features=indice_cat)
[13]:
model.fit(train_pool_cat, eval_set=test_pool_cat, silent=True)
y_pred = model.predict(X_test)

Creating a fake dataset as a production dataset

[14]:
import random
[15]:
df_production = titan_df.copy()
[16]:
df_production['Age'] = df_production['Age'].apply(lambda x: random.randrange(10, 76)).astype(float)
df_production['Fare'] = df_production['Fare'].apply(lambda x: random.randrange(1, 100)).astype(float)
list_sex= ["male", "female"]
df_production['Sex'] = df_production['Sex'].apply(lambda x: random.choice(list_sex))
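
The cell above draws from the global `random` module without a seed, so every run yields a different fake production dataset and therefore slightly different drift results. One possible way to make the draws repeatable is sketched below. It is only an illustration: the helper name `fake_numeric_column` and the seed value 42 are hypothetical and not part of the tutorial.

```python
import random


def fake_numeric_column(n_rows, low, high, seed):
    """Return n_rows floats drawn like random.randrange(low, high), using a seeded generator."""
    rng = random.Random(seed)  # private generator; the global random state is left untouched
    return [float(rng.randrange(low, high)) for _ in range(n_rows)]


# Same bounds as the Age column above; the seed is an arbitrary choice
ages_run_1 = fake_numeric_column(5, 10, 76, seed=42)
ages_run_2 = fake_numeric_column(5, 10, 76, seed=42)

print(ages_run_1 == ages_run_2)  # True
```

A list built this way could be assigned to `df_production['Age']`, provided its length matches the number of rows in the DataFrame.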
[17]:
df_baseline = titan_df[features]
df_current = df_production[features]
[18]:
df_current.head()
[18]:
Pclass Age Embarked Sex SibSp Parch Fare
PassengerId
1 Third class 19.0 Southampton female 1 0 41.0
2 First class 40.0 Cherbourg male 1 0 52.0
3 Third class 40.0 Southampton female 0 0 81.0
4 First class 53.0 Southampton male 1 0 15.0
5 Third class 46.0 Southampton male 0 0 69.0
[19]:
df_baseline.head()
[19]:
Pclass Age Embarked Sex SibSp Parch Fare
PassengerId
1 Third class 22.0 Southampton male 1 0 7.25
2 First class 38.0 Cherbourg female 1 0 71.28
3 Third class 26.0 Southampton female 0 0 7.92
4 First class 35.0 Southampton female 1 0 53.10
5 Third class 35.0 Southampton male 0 0 8.05

Use Eurybia for data validation

[20]:
from eurybia import SmartDrift
[21]:
sd = SmartDrift(df_current=df_current, df_baseline=df_baseline, deployed_model=model, encoding=encoder)
[22]:
%time sd.compile(full_validation=True)
CPU times: user 35.9 s, sys: 5.03 s, total: 40.9 s
Wall time: 1.97 s
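
The datadrift classifier mentioned at the top of this tutorial is a model that tries to tell baseline rows from current rows: the better it separates them, the stronger the drift. The sketch below is not Eurybia's implementation. It swaps the classifier for a much simpler stand-in, a rank-based AUC computed on a single numeric feature, just to show how the score reads: values near 0.5 mean the two samples are hard to tell apart, values near 1 mean they are easy to separate. The function name `auc_single_feature` and the synthetic fare ranges are assumptions made for this illustration.

```python
import random


def auc_single_feature(baseline, current):
    """Probability that a random current value exceeds a random baseline value (ties count half)."""
    wins = 0.0
    for c in current:
        for b in baseline:
            if c > b:
                wins += 1.0
            elif c == b:
                wins += 0.5
    return wins / (len(current) * len(baseline))


rng = random.Random(0)  # seeded so the sketch is repeatable

# Synthetic values standing in for a numeric feature such as Fare
baseline_fare = [rng.uniform(5, 60) for _ in range(200)]
same_fare = [rng.uniform(5, 60) for _ in range(200)]       # same distribution as baseline
shifted_fare = [rng.uniform(40, 100) for _ in range(200)]  # deliberately shifted upward

auc_no_drift = auc_single_feature(baseline_fare, same_fare)
auc_drift = auc_single_feature(baseline_fare, shifted_fare)

print(f"AUC without drift: {auc_no_drift:.2f}")
print(f"AUC with drift:    {auc_drift:.2f}")
```

A real datadrift classifier works on all features at once, which is why the Shapash plots are useful afterwards: they show which features contribute most to the separation.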
[24]:
sd.generate_report(
    output_file='report_titanic.html',
    title_story="Data validation",
    title_description="""Titanic Data validation"""
    )

Report saved to ./report_titanic.html. To upload and share your report, create a free Datapane account by running !datapane signup.

Launch the Shapash WebApp from SmartDrift

After the compile step, you can launch a Shapash WebApp directly from your SmartDrift object. It gives you access to several dynamic plots that help you understand where drift has been detected in your data. For more information on the Shapash WebApp, see https://github.com/MAIF/shapash

[ ]:
app = sd.xpl.run_app(title_story='Eurybia datadrift classifier')

Stop the WebApp after using it

[ ]:
app.kill()