Use Shapash Webapp with Eurybia¶
In this tutorial, you will learn how to use Eurybia and the Shapash webapp to understand your datadrift classifier.
Contents:
- Build a model to deploy
- Do data validation between learning dataset and production dataset
- Generate Report
- Run Webapp
Data from Kaggle Titanic
Requirements notice: this tutorial may use third-party modules that are not included in Eurybia.
You can find them all in one file on our GitHub repository, or you can manually install any that are missing.
[3]:
import pandas as pd
from category_encoders import OrdinalEncoder
import catboost
from eurybia.core.smartdrift import SmartDrift
from sklearn.model_selection import train_test_split
Building Supervised Model¶
[4]:
from eurybia.data.data_loader import data_loading
[5]:
titan_df = data_loading('titanic')
[6]:
features = ['Pclass', 'Age', 'Embarked', 'Sex', 'SibSp', 'Parch', 'Fare']
features_to_encode = ['Pclass', 'Embarked', 'Sex']
[ ]:
encoder = OrdinalEncoder(cols=features_to_encode)
encoder.fit(titan_df[features], verbose=False)
[8]:
titan_df_encoded = encoder.transform(titan_df[features])
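To make the encoding step more concrete, here is a minimal pure-Python sketch of what an ordinal encoding does: each distinct category is replaced by an integer code. This is a simplified stand-in and not the `category_encoders` implementation; the helper name `fit_ordinal_mapping` is hypothetical, and the choice of codes starting at 1 in order of first appearance is an assumption made for illustration.

```python
def fit_ordinal_mapping(values):
    """Map each distinct value to an integer code, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            # Assumption for this sketch: codes start at 1.
            mapping[value] = len(mapping) + 1
    return mapping


# Sample values of the kind found in the 'Embarked' column.
embarked = ["Southampton", "Cherbourg", "Southampton", "Southampton"]

embarked_mapping = fit_ordinal_mapping(embarked)
embarked_encoded = [embarked_mapping[v] for v in embarked]

print(embarked_mapping)
print(embarked_encoded)
```

The fitted mapping is what has to be reused later: the same categories must receive the same codes on any dataset that goes through the model.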
[9]:
X_train, X_test, y_train, y_test = train_test_split(
    titan_df_encoded,
    titan_df['Survived'].to_frame(),
    test_size=0.2,
    random_state=11
)
[10]:
# Collect the positional indices of the categorical columns for CatBoost
i = 0
indice_cat = []
for feature in titan_df_encoded:
    if feature in features_to_encode:
        indice_cat.append(i)
    i = i + 1
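The index computation above can also be written with `enumerate`. The sketch below works on the plain list of feature names instead of the DataFrame, assuming the encoded DataFrame keeps its columns in the same order as `features`.

```python
features = ['Pclass', 'Age', 'Embarked', 'Sex', 'SibSp', 'Parch', 'Fare']
features_to_encode = ['Pclass', 'Embarked', 'Sex']

# Position of every categorical feature within the feature list.
# Assumption: the column order of the encoded data matches `features`.
indice_cat = [
    i for i, feature in enumerate(features)
    if feature in features_to_encode
]

print(indice_cat)  # [0, 2, 3]
```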
[11]:
model = catboost.CatBoostClassifier(
    loss_function="Logloss",
    eval_metric="Logloss",
    learning_rate=0.143852,
    iterations=500,
    l2_leaf_reg=15,
    max_depth=4,
)
[12]:
train_pool_cat = catboost.Pool(data=X_train, label=y_train, cat_features=indice_cat)
test_pool_cat = catboost.Pool(data=X_test, label=y_test, cat_features=indice_cat)
[13]:
model.fit(train_pool_cat, eval_set=test_pool_cat, silent=True)
y_pred = model.predict(X_test)
Creating a fake dataset as a production dataset¶
[14]:
import random
[15]:
df_production = titan_df.copy()
[16]:
# Overwrite Age, Fare and Sex with random values to simulate a drifted production dataset
df_production['Age'] = df_production['Age'].apply(lambda x: random.randrange(10, 76)).astype(float)
df_production['Fare'] = df_production['Fare'].apply(lambda x: random.randrange(1, 100)).astype(float)
list_sex = ["male", "female"]
df_production['Sex'] = df_production['Sex'].apply(lambda x: random.choice(list_sex))
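Because the values above are drawn without a seed, the fake production dataset changes on every run. If reproducible results are wanted, one option is a seeded generator. The sketch below shows the idea on a plain list; the helper name `make_random_ages` and the seed value are hypothetical and not part of the original notebook.

```python
import random


def make_random_ages(n_rows, seed):
    """Draw n_rows random ages in [10, 75] from a locally seeded generator."""
    rng = random.Random(seed)  # local generator, leaves the global state untouched
    return [float(rng.randrange(10, 76)) for _ in range(n_rows)]


ages_a = make_random_ages(5, seed=11)
ages_b = make_random_ages(5, seed=11)

# The same seed yields the same sequence of draws.
print(ages_a == ages_b)  # True
```

Calling `random.seed(...)` once before the cell above would have a similar effect on the global generator used in this tutorial.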
[17]:
df_baseline = titan_df[features]
df_current = df_production[features]
[18]:
df_current.head()
[18]:
PassengerId | Pclass | Age | Embarked | Sex | SibSp | Parch | Fare |
---|---|---|---|---|---|---|---|
1 | Third class | 19.0 | Southampton | female | 1 | 0 | 41.0 |
2 | First class | 40.0 | Cherbourg | male | 1 | 0 | 52.0 |
3 | Third class | 40.0 | Southampton | female | 0 | 0 | 81.0 |
4 | First class | 53.0 | Southampton | male | 1 | 0 | 15.0 |
5 | Third class | 46.0 | Southampton | male | 0 | 0 | 69.0 |
[19]:
df_baseline.head()
[19]:
PassengerId | Pclass | Age | Embarked | Sex | SibSp | Parch | Fare |
---|---|---|---|---|---|---|---|
1 | Third class | 22.0 | Southampton | male | 1 | 0 | 7.25 |
2 | First class | 38.0 | Cherbourg | female | 1 | 0 | 71.28 |
3 | Third class | 26.0 | Southampton | female | 0 | 0 | 7.92 |
4 | First class | 35.0 | Southampton | female | 1 | 0 | 53.10 |
5 | Third class | 35.0 | Southampton | male | 0 | 0 | 8.05 |
Use Eurybia for data validation¶
[20]:
from eurybia import SmartDrift
[21]:
sd = SmartDrift(df_current=df_current, df_baseline=df_baseline, deployed_model=model, encoding=encoder)
[22]:
%time sd.compile(full_validation=True)
CPU times: user 35.9 s, sys: 5.03 s, total: 40.9 s
Wall time: 1.97 s
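As described in Eurybia's documentation, drift detection relies on a datadrift classifier: a model trained to tell baseline rows from current rows, with its AUC used as a drift measure. The sketch below is not Eurybia's implementation. It only illustrates the intuition on a single feature, using the `Fare` values from the first five rows shown above, and computes the AUC obtained when `Fare` alone is used as the score. The helper name `separation_auc` is hypothetical.

```python
def separation_auc(baseline_values, current_values):
    """Probability that a random current value exceeds a random baseline value.

    Ties count for 0.5. This equals the AUC of a classifier that scores
    each row with the raw feature value.
    """
    wins = 0.0
    for c in current_values:
        for b in baseline_values:
            if c > b:
                wins += 1.0
            elif c == b:
                wins += 0.5
    return wins / (len(baseline_values) * len(current_values))


# 'Fare' values from the first five rows of each dataset shown above.
baseline_fare = [7.25, 71.28, 7.92, 53.10, 8.05]
current_fare = [41.0, 52.0, 81.0, 15.0, 69.0]

auc = separation_auc(baseline_fare, current_fare)
print(auc)  # 0.72
```

An AUC close to 0.5 means the two samples are hard to tell apart on that feature, while a value close to 1 (or 0) means they separate easily. Five rows are far too few to draw a conclusion; the numbers only make the idea tangible.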
[24]:
sd.generate_report(
output_file='report_titanic.html',
title_story="Data validation",
title_description="""Titanic Data validation"""
)
Report saved to ./report_titanic.html. To upload and share your report, create a free Datapane account by running !datapane signup.
Launch WebApp Shapash from SmartDrift¶
After the compile step, you can launch a Shapash WebApp directly from your SmartDrift object. It gives you access to several dynamic plots that help you understand where drift has been detected in your data. For more information on the Shapash WebApp, see https://github.com/MAIF/shapash
[ ]:
app = sd.xpl.run_app(title_story='Eurybia datadrift classifier')
Stop the WebApp when you have finished using it:
[ ]:
app.kill()