Use Shapash Webapp with Eurybia

In this tutorial, you will learn how to use Eurybia and the Shapash webapp to understand your datadrift classifier.

Contents:

- Build a model to deploy
- Do data validation between learning dataset and production dataset
- Generate Report
- Run Webapp

Data from Kaggle Titanic

Requirements notice: this tutorial may use third-party modules that are not included in Eurybia.

You can find them all in one file on our GitHub repository, or you can manually install any that you are missing.

[3]:

import pandas as pd
from category_encoders import OrdinalEncoder
import catboost
from eurybia.core.smartdrift import SmartDrift
from sklearn.model_selection import train_test_split

Building Supervised Model

[4]:

from eurybia.data.data_loader import data_loading

[5]:

titan_df = data_loading('titanic')

[6]:

features = ['Pclass', 'Age', 'Embarked', 'Sex', 'SibSp', 'Parch', 'Fare']
features_to_encode = ['Pclass', 'Embarked', 'Sex']

[ ]:

encoder = OrdinalEncoder(cols=features_to_encode)
encoder.fit(titan_df[features], verbose=False)

[8]:

titan_df_encoded = encoder.transform(titan_df[features])

[9]:

X_train, X_test, y_train, y_test = train_test_split(
    titan_df_encoded,
    titan_df['Survived'].to_frame(),
    test_size=0.2,
    random_state=11
)

[10]:

# Column positions of the categorical features, passed to catboost.Pool as cat_features
indice_cat = [
    i for i, feature in enumerate(titan_df_encoded)
    if feature in features_to_encode
]

[11]:

model = catboost.CatBoostClassifier(
    loss_function="Logloss",
    eval_metric="Logloss",
    learning_rate=0.143852,
    iterations=500,
    l2_leaf_reg=15,
    max_depth=4,
)

[12]:

train_pool_cat = catboost.Pool(data=X_train, label=y_train, cat_features=indice_cat)
test_pool_cat = catboost.Pool(data=X_test, label=y_test, cat_features=indice_cat)

[13]:

model.fit(train_pool_cat, eval_set=test_pool_cat, silent=True)
y_pred = model.predict(X_test)

Creating a fake dataset as a production dataset

[14]:

import random

[15]:

df_production = titan_df.copy()

[16]:

df_production['Age'] = df_production['Age'].apply(lambda x: random.randrange(10, 76)).astype(float)
df_production['Fare'] = df_production['Fare'].apply(lambda x: random.randrange(1, 100)).astype(float)
list_sex= ["male", "female"]
df_production['Sex'] = df_production['Sex'].apply(lambda x: random.choice(list_sex))
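The cell above draws from Python's global random generator without a seed, so the fake production data (and therefore the drift results) will differ on every run. If reproducibility matters, one option is to draw the values from a seeded generator first. The following is a minimal sketch using only the standard library; the helper name `make_fake_columns` and the seed value are illustrative assumptions, not part of Eurybia or of the original notebook.

```python
import random

def make_fake_columns(n_rows, seed=11):
    """Draw synthetic Age, Fare and Sex values from a private, seeded generator.

    Hypothetical helper: the value ranges mirror the cell above.
    """
    rng = random.Random(seed)  # independent of the global random state
    ages = [float(rng.randrange(10, 76)) for _ in range(n_rows)]
    fares = [float(rng.randrange(1, 100)) for _ in range(n_rows)]
    sexes = [rng.choice(["male", "female"]) for _ in range(n_rows)]
    return ages, fares, sexes

# Two calls with the same seed return identical columns.
first = make_fake_columns(5)
second = make_fake_columns(5)
print(first == second)  # True
```

The returned lists could then be assigned to the corresponding columns of `df_production`, for example by calling the helper with `len(df_production)` rows.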

[17]:

df_baseline = titan_df[features]
df_current = df_production[features]

[18]:

df_current.head()

[18]:

	Pclass	Age	Embarked	Sex	SibSp	Parch	Fare
PassengerId
1	Third class	19.0	Southampton	female	1	0	41.0
2	First class	40.0	Cherbourg	male	1	0	52.0
3	Third class	40.0	Southampton	female	0	0	81.0
4	First class	53.0	Southampton	male	1	0	15.0
5	Third class	46.0	Southampton	male	0	0	69.0

[19]:

df_baseline.head()

[19]:

	Pclass	Age	Embarked	Sex	SibSp	Parch	Fare
PassengerId
1	Third class	22.0	Southampton	male	1	0	7.25
2	First class	38.0	Cherbourg	female	1	0	71.28
3	Third class	26.0	Southampton	female	0	0	7.92
4	First class	35.0	Southampton	female	1	0	53.10
5	Third class	35.0	Southampton	male	0	0	8.05

Use Eurybia for data validation

[20]:

from eurybia import SmartDrift

[21]:

sd = SmartDrift(df_current=df_current, df_baseline=df_baseline, deployed_model=model, encoding=encoder)

[22]:

%time sd.compile(full_validation=True)

CPU times: user 35.9 s, sys: 5.03 s, total: 40.9 s
Wall time: 1.97 s
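As background for the compile step: Eurybia's approach to drift detection is to train a classifier that tries to tell baseline rows from current rows, and to read the classifier's AUC as a drift measure (close to 0.5 means the two datasets are hard to distinguish). The sketch below illustrates that idea on a single feature with plain Python. It is not Eurybia's implementation, which fits a multivariate model; the function name `drift_auc` is hypothetical, and the values are the five `Fare` entries shown in the `head()` outputs above.

```python
def drift_auc(baseline_values, current_values):
    """AUC of a one-feature scorer that ranks rows by their raw value.

    Baseline rows are labelled 0 and current rows 1. The AUC equals the
    probability that a random current value exceeds a random baseline
    value, with ties counted as one half.
    """
    wins = 0.0
    for cur in current_values:
        for base in baseline_values:
            if cur > base:
                wins += 1.0
            elif cur == base:
                wins += 0.5
    return wins / (len(current_values) * len(baseline_values))

# Fare values taken from df_baseline.head() and df_current.head()
baseline_fare = [7.25, 71.28, 7.92, 53.10, 8.05]
current_fare = [41.0, 52.0, 81.0, 15.0, 69.0]

auc_drifted = drift_auc(baseline_fare, current_fare)
auc_same = drift_auc(baseline_fare, baseline_fare)
print(auc_drifted)  # 0.72
print(auc_same)     # 0.5
```

With a single raw feature as the score, values well below 0.5 also indicate that the two samples are separable; a trained classifier would normally report an AUC of 0.5 or above.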

[24]:

sd.generate_report(
    output_file='report_titanic.html',
    title_story="Data validation",
    title_description="""Titanic Data validation"""
    )

Report saved to ./report_titanic.html. To upload and share your report, create a free Datapane account by running !datapane signup.

Launch the Shapash WebApp from SmartDrift

After the compile step, you can launch the Shapash WebApp directly from your SmartDrift object. It gives you access to several dynamic plots that help you understand where drift has been detected in your data. For more information on the Shapash WebApp, see https://github.com/MAIF/shapash

[ ]:

app = sd.xpl.run_app(title_story='Eurybia datadrift classifier')

Stop the WebApp once you have finished using it:

[ ]:

app.kill()