Eurybia with custom colors

This tutorial shows how to customize the colors of Eurybia plots.

Contents:
- Compile Eurybia SmartDrift
- Use palette_name parameter
- Use colors_dict parameter
- Change the colors after compiling SmartDrift

Data from Kaggle House Prices

Requirements notice: the following tutorial may use third-party modules not included in Eurybia.
You can find them all in one file on our GitHub repository, or you can manually install any that you are missing.
[2]:
import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split

Building a Supervised Model

[3]:
from eurybia.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')
[4]:
# For the purpose of this tutorial, and to better represent a common use case of Eurybia,
# the house_prices dataset is split into two smaller sets: "training" and "production".
# To get an interesting analysis, we introduce a deliberate drift by splitting on the construction year:
# houses built before 1980 form the training set, houses built in 1980 or later the production set.
house_df_learning = house_df.loc[house_df['YearBuilt'] < 1980]
house_df_production = house_df.loc[house_df['YearBuilt'] >= 1980]
[5]:
# Separate the target (SalePrice) from the features; YearBuilt is dropped as well,
# since it is the variable used to split the two datasets
y_df_learning = house_df_learning['SalePrice'].to_frame()
X_df_learning = house_df_learning[house_df_learning.columns.difference(['SalePrice', 'YearBuilt'])]

y_df_production = house_df_production['SalePrice'].to_frame()
X_df_production = house_df_production[house_df_production.columns.difference(['SalePrice', 'YearBuilt'])]
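To make the data preparation concrete, here is a pandas-free sketch of the split and column selection on a few made-up rows (the values are hypothetical, not taken from the dataset). It mirrors the `.loc` filters (strictly before 1980 versus 1980 and later) and the `columns.difference(...)` call, which returns the remaining column labels sorted, so the feature columns come out in alphabetical order rather than in their original order. Dropping `YearBuilt` presumably matters because it defines the split: keeping it would make the two datasets trivially separable.

```python
# Made-up rows standing in for house_df (hypothetical values, for illustration only)
rows = [
    {"YearBuilt": 1965, "SalePrice": 140000, "LotArea": 8450, "BsmtQual": "Typical"},
    {"YearBuilt": 1979, "SalePrice": 155000, "LotArea": 9600, "BsmtQual": "Good"},
    {"YearBuilt": 1980, "SalePrice": 181000, "LotArea": 11250, "BsmtQual": "Good"},
    {"YearBuilt": 2003, "SalePrice": 223500, "LotArea": 9550, "BsmtQual": "Excellent"},
]

THRESHOLD = 1980
EXCLUDED = {"SalePrice", "YearBuilt"}

# Same threshold logic as the .loc filters: < 1980 is the baseline, >= 1980 the current set
learning = [r for r in rows if r["YearBuilt"] < THRESHOLD]
production = [r for r in rows if r["YearBuilt"] >= THRESHOLD]

# Like Index.difference, keep the remaining labels in sorted order
feature_cols = sorted(set(rows[0]) - EXCLUDED)


def to_features(records, cols):
    """Keep only the feature columns of each record."""
    return [{c: r[c] for c in cols} for r in records]


X_learning = to_features(learning, feature_cols)
X_production = to_features(production, feature_cols)
print(feature_cols)
```

The two sets are disjoint and together cover every row, since each row satisfies exactly one of the two conditions.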
[6]:
from eurybia import SmartDrift
[7]:
SD = SmartDrift(df_current=X_df_production, df_baseline=X_df_learning)
[8]:
%time SD.compile(full_validation=True)
The variable BldgType has mismatching unique values:
[] | ['Two-family Conversion; originally built as one-family dwelling']

The variable BsmtCond has mismatching unique values:
[] | ['Poor -Severe cracking, settling, or wetness']

The variable CentralAir has mismatching unique values:
[] | ['No']

The variable Condition1 has mismatching unique values:
["Within 200' of East-West Railroad"] | ['Adjacent to arterial street', 'Adjacent to postive off-site feature']

The variable Condition2 has mismatching unique values:
['Near positive off-site feature--park, greenbelt, etc.'] | ['Adjacent to arterial street', "Within 200' of North-South Railroad", 'Adjacent to feeder street', 'Adjacent to postive off-site feature', 'Adjacent to North-South Railroad', 'Adjacent to East-West Railroad']

The variable Electrical has mismatching unique values:
[] | ['60 AMP Fuse Box and mostly Romex wiring (Fair)', 'Fuse Box over 60 AMP and all Romex wiring (Average)', '60 AMP Fuse Box and mostly knob & tube wiring (poor)']

The variable ExterCond has mismatching unique values:
[] | ['Fair', 'Poor', 'Excellent']

The variable ExterQual has mismatching unique values:
[] | ['Fair']

The variable Exterior1st has mismatching unique values:
['Imitation Stucco'] | ['Asbestos Shingles', 'Brick Common', 'Asphalt Shingles', 'Stone', 'Cinder Block']

The variable Exterior2nd has mismatching unique values:
['Other'] | ['Asbestos Shingles', 'Brick Common', 'Asphalt Shingles', 'Stone', 'Cinder Block']

The variable Foundation has mismatching unique values:
['Wood'] | ['Brick & Tile', 'Stone']

The variable Functional has mismatching unique values:
[] | ['Major Deductions 2', 'Severely Damaged']

The variable GarageCond has mismatching unique values:
[] | ['Poor', 'Excellent']

The variable GarageQual has mismatching unique values:
[] | ['Excellent', 'Poor']

The variable GarageType has mismatching unique values:
[] | ['Car Port']

The variable Heating has mismatching unique values:
[] | ['Gas hot water or steam heat', 'Gravity furnace', 'Wall furnace', 'Hot water or steam heat other than gas', 'Floor Furnace']

The variable HeatingQC has mismatching unique values:
[] | ['Fair', 'Poor']

The variable HouseStyle has mismatching unique values:
[] | ['One and one-half story: 2nd level unfinished', 'Two and one-half story: 2nd level unfinished', 'Two and one-half story: 2nd level finished']

The variable KitchenQual has mismatching unique values:
[] | ['Fair']

The variable LandSlope has mismatching unique values:
[] | ['Severe Slope']

The variable MSSubClass has mismatching unique values:
[] | ['2-Story 1945 & Older', '2 Family Conversion - All Styles and Ages', '1-1/2 Story - Unfinished All Ages', '1-Story 1945 & Older', '2-1/2 Story All Ages', '1-Story w/Finished Attic All Ages']

The variable MSZoning has mismatching unique values:
['Floating Village Residential'] | ['Commercial']

The variable MasVnrType has mismatching unique values:
[] | ['Brick Common']

The variable Neighborhood has mismatching unique values:
['Northridge', 'Somerset', 'Northridge Heights', 'Stone Brook', 'Bloomington Heights', 'Bluestem'] | ['Brookside', 'Iowa DOT and Rail Road', 'Meadow Village', 'Northpark Villa', 'Briardale', 'South & West of Iowa State University']

The variable PavedDrive has mismatching unique values:
[] | ['Partial Pavement']

The variable RoofMatl has mismatching unique values:
['Clay or Tile'] | ['Metal', 'Membrane', 'Gravel & Tar', 'Roll']

The variable RoofStyle has mismatching unique values:
[] | ['Gabrel (Barn)', 'Mansard', 'Flat', 'Shed']

The variable SaleCondition has mismatching unique values:
[] | ['Adjoining Land Purchase']

The variable SaleType has mismatching unique values:
['Contract 15% Down payment regular terms'] | []

The variable Utilities has mismatching unique values:
[] | ['Electricity and Gas Only']

CPU times: user 2min 59s, sys: 33.8 s, total: 3min 33s
Wall time: 10.7 s
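The warnings printed by `compile` list, for each categorical variable, two sets of unique values separated by `|`. Judging from the data (for instance, the newer neighborhoods such as Northridge Heights appear on the left, while Brick & Tile and Stone foundations, typical of older houses, appear on the right), the left-hand list seems to hold values found only in `df_current` and the right-hand list values found only in `df_baseline`. The hypothetical helper below, which is not Eurybia's implementation, reproduces that kind of check with set differences on toy values. Unlike the output above, it sorts each list so the result is deterministic.

```python
def mismatching_unique_values(current, baseline):
    """Return (values only in current, values only in baseline), each sorted.

    Hypothetical helper for illustration; not Eurybia's implementation.
    """
    current_set, baseline_set = set(current), set(baseline)
    return sorted(current_set - baseline_set), sorted(baseline_set - current_set)


# Toy column values, loosely modelled on the Foundation warning above
current = ["Poured Concrete", "Cinder Block", "Wood", "Poured Concrete"]
baseline = ["Cinder Block", "Brick & Tile", "Stone", "Poured Concrete"]

only_current, only_baseline = mismatching_unique_values(current, baseline)
print(f"{only_current} | {only_baseline}")
```

With these toy values the printed line has the same shape as the Foundation warning: `['Wood'] | ['Brick & Tile', 'Stone']`.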
[9]:
SD.xpl.plot.features_importance()
[Figure: feature importance plot]

Eurybia with different colors

Option 1: define user-specific colors with the colors_dict parameter

The colors you declare replace the corresponding ones in the default palette.
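The replacement rule can be pictured as a dictionary merge. The sketch below assumes, as the sentence above suggests, that each plot element given in `colors_dict` (such as `featureimp_bar`) replaces the default entry for that element while all other elements keep their defaults. The helper `merge_colors` is hypothetical; the `featureimp_bar` default is the value printed from `SD.colors_dict` in this tutorial, and the `univariate_cat_bar` default is a placeholder.

```python
import copy

# Illustrative default palette. The featureimp_bar entry is the one printed from
# SD.colors_dict in this tutorial; the univariate_cat_bar entry is a placeholder.
DEFAULT_COLORS = {
    "featureimp_bar": {"1": "rgba(0,154,203,255)", "2": "rgba(223, 103, 0, 0.8)"},
    "univariate_cat_bar": {"1": "rgba(0, 0, 255, 1.0)", "2": "rgba(255, 0, 0, 1.0)"},
}


def merge_colors(default, overrides=None):
    """Return a new palette in which each overridden element replaces the default one.

    Hypothetical helper: the merge is a top-level dict update, so an element given
    in `overrides` is replaced as a whole, and the default palette is not mutated.
    """
    merged = copy.deepcopy(default)
    merged.update(copy.deepcopy(overrides or {}))
    return merged


custom = merge_colors(
    DEFAULT_COLORS,
    {"featureimp_bar": {"1": "rgba(244, 192, 0, 1.0)", "2": "rgba(52, 55, 54, 0.7)"}},
)
print(custom["featureimp_bar"])
```

With a top-level update like this one, an override that listed only color `"1"` would drop color `"2"`. Whether Eurybia merges more deeply is not stated here, so supplying every sub-key, as the example below does, is the safer choice.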

In the example below, we replace the colors used in the feature importance bar plot and in the univariate bar plots for categorical and continuous features:

[10]:
# First, let's print the colors used by the SmartDrift object compiled above (SD):
SD.colors_dict['featureimp_bar']
[10]:
{'1': 'rgba(0,154,203,255)', '2': 'rgba(223, 103, 0, 0.8)'}
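Each color is a CSS-style `rgba(red, green, blue, alpha)` string: the three channels range from 0 to 255, and alpha is an opacity between 0 and 1. The first default above uses 255 as its alpha, which is presumably treated as fully opaque. A small hypothetical helper such as `rgba` below (not part of Eurybia) can build these strings and catch out-of-range values before they reach `colors_dict`.

```python
def rgba(r, g, b, a=1.0):
    """Format a CSS-style rgba() string, checking channel ranges first.

    Hypothetical helper for illustration; not part of Eurybia.
    """
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"color channel out of range: {channel}")
    if not 0.0 <= a <= 1.0:
        raise ValueError(f"alpha must be between 0 and 1, got {a}")
    return f"rgba({r}, {g}, {b}, {a})"


yellow = rgba(244, 192, 0, 1.0)    # opaque yellow used in the next cell
dark_grey = rgba(52, 55, 54, 0.7)  # semi-transparent dark grey used in the next cell
print(yellow, dark_grey)
```

Calling `rgba(0, 154, 203, 255)` raises a `ValueError`, which flags the unusual alpha value in the default palette.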
[11]:
# Now we replace these colors using the colors_dict parameter
SD2 = SmartDrift(
    df_current=X_df_production,
    df_baseline=X_df_learning,
    colors_dict=dict(
        featureimp_bar={
            "1": "rgba(244, 192, 0, 1.0)",
            "2": "rgba(52, 55, 54, 0.7)",
        },
        univariate_cat_bar={
            "1": "rgba(244, 192, 0, 1.0)",
            "2": "rgba(52, 55, 54, 0.7)",
        },
        univariate_cont_bar={
            "1": "rgba(244, 192, 0, 1.0)",
            "2": "rgba(52, 55, 54, 0.7)",
        },
    ),
)
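Because the SD2 example above repeats the same pair of colors for three plot elements, the dictionary can also be built in a loop. This minimal sketch produces exactly the structure passed above, which could then be given to SmartDrift as `colors_dict=colors_dict`. The names `PAIR` and `ELEMENTS` are illustrative.

```python
# The two colors used in the SD2 example above
PAIR = {"1": "rgba(244, 192, 0, 1.0)", "2": "rgba(52, 55, 54, 0.7)"}

# Plot elements customized in the SD2 example above
ELEMENTS = ("featureimp_bar", "univariate_cat_bar", "univariate_cont_bar")

# Give each element its own copy so that editing one later does not affect the others
colors_dict = {name: dict(PAIR) for name in ELEMENTS}
print(colors_dict)
```

Copying `PAIR` for each element avoids a subtle bug: if all three keys shared one dictionary, changing the feature importance colors later would silently change the univariate colors too.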
[12]:
%time SD2.compile(full_validation=True)
[Output: the same "mismatching unique values" warnings as for SD above]

CPU times: user 2min 58s, sys: 33.5 s, total: 3min 31s
Wall time: 10.8 s
[13]:
SD2.xpl.plot.features_importance()
[Figure: feature importance plot with the custom colors]
[14]:
SD2.plot.generate_fig_univariate('BsmtQual')
[Figure: univariate plot of BsmtQual with the custom colors]

Option 2: redefine the colors after compiling SmartDrift

[15]:
SD3 = SmartDrift(df_current=X_df_production, df_baseline=X_df_learning)
[16]:
%time SD3.compile(full_validation=True)
[Output: the same "mismatching unique values" warnings as for SD above]

CPU times: user 3min 1s, sys: 33.1 s, total: 3min 34s
Wall time: 10.7 s
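Here SD3 is created and compiled with the default colors, and the cells do not show the call that changes them. Eurybia's SmartDrift is understood to offer a `define_style` method for this, accepting `palette_name` and/or `colors_dict` and called like `SD3.define_style(colors_dict=...)` between compiling and plotting; treat this name and signature as an assumption (modelled on Shapash's `SmartExplainer.define_style`) and check the SmartDrift API reference before relying on it. The toy class below is hypothetical and not Eurybia code. It only illustrates why recoloring works without recompiling: colors live in an attribute that is read when a figure is drawn, so updating it after the expensive compile step affects every later plot.

```python
import copy


class ToyDrift:
    """Hypothetical stand-in for a compiled drift object; not Eurybia code."""

    def __init__(self, colors_dict):
        self.colors_dict = copy.deepcopy(colors_dict)
        self.compiled = False

    def compile(self):
        # Expensive step, run once; in this toy it does not depend on the colors
        self.compiled = True

    def define_style(self, colors_dict):
        # Replace only the elements provided and keep every other element's colors
        self.colors_dict.update(copy.deepcopy(colors_dict))

    def bar_colors(self, element):
        # Colors are looked up when a figure is drawn, not at compile time
        return [self.colors_dict[element][key] for key in ("1", "2")]


toy = ToyDrift({"featureimp_bar": {"1": "rgba(0,154,203,255)", "2": "rgba(223, 103, 0, 0.8)"}})
toy.compile()
before = toy.bar_colors("featureimp_bar")

# Recolor after compiling; no second call to compile() is needed
toy.define_style({"featureimp_bar": {"1": "rgba(244, 192, 0, 1.0)", "2": "rgba(52, 55, 54, 0.7)"}})
after = toy.bar_colors("featureimp_bar")
print(before, after)
```

The design choice this illustrates is separating computation from presentation: compiling takes several seconds on this dataset, while changing colors only touches a small dictionary.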
[17]:
SD3.xpl.plot.features_importance()
[Figure: feature importance plot]
[18]:
SD3.plot.generate_fig_univariate('BsmtQual')
[Figure: univariate plot of BsmtQual]