How low code machine learning can enhance responsible AI

The rapid technical advancement and widespread adoption of AI-based products and workflows are influencing many aspects of human and commercial activity in banking, healthcare, advertising and many other fields. While the accuracy of AI models is undoubtedly the most important factor to consider when deploying an AI-based product, there is an urgent need to understand how AI can be designed to operate responsibly.

Responsible AI is a framework that any organization developing software needs to adopt in order to build customer trust in the transparency, accountability, fairness and security of any embedded AI decisions. At the same time, a key aspect of responsible AI is having a development pipeline that can facilitate reproducibility of results and manage the lineage of data and ML models.

Low-code machine learning tools such as PyCaret, H2O.ai and DataRobot are gaining popularity, allowing data scientists to run pre-canned patterns for feature engineering, data cleansing, model development and statistical performance comparison. However, the often-missing piece in these packages is a set of responsible AI patterns for evaluating ML models for fairness, transparency, explainability, causality and so on.

Here, we demonstrate a quick and easy way to integrate PyCaret with Microsoft's RAI (Responsible AI) framework to generate a detailed report showing error analysis, explainability, causal inference and counterfactuals. The first part is a code walkthrough for developers showing how an RAI dashboard can be built. The second part is a detailed evaluation of the RAI report.

Code walkthrough

First, we install the necessary libraries. This can be done on your local machine with Python 3.6+ or on a SaaS platform like Google Colab.

!pip install raiwidgets
!pip install pycaret
!pip install --upgrade pandas
!pip install --upgrade numpy

The pandas and numpy upgrades are needed at the time of writing, but that should be fixed soon. Also, don’t forget to restart your runtime if you are running on Google Colab.
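If you are on Colab, one way to force that restart programmatically (a Colab-specific convenience, not part of the original walkthrough) is to kill the kernel process, which Colab then restarts automatically:

import os
os.kill(os.getpid(), 9)  # crashes the kernel; Colab auto-restarts the runtime with the upgraded packages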

Next, we download the data from GitHub, clean it up and do the feature engineering with PyCaret.

import pandas as pd, numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
csv_url = 'https://raw.githubusercontent.com/sahutkarsh/loan-prediction-analytics-vidhya/master/train.csv'
dataset_v1 = pd.read_csv(csv_url)
dataset_v1 = dataset_v1.dropna()
from pycaret.classification import *
clf_setup = setup(data=dataset_v1, target='Loan_Status',
                  train_size=0.8, categorical_features=['Gender', 'Married', 'Education',
                  'Self_Employed', 'Property_Area'],
                  imputation_type='simple', categorical_imputation='mode',
                  ignore_features=['Loan_ID'], fix_imbalance=True, silent=True, session_id=123)

The dataset simulates loan applications, with applicant features such as gender, marital status, employment, income and so on. PyCaret has a wonderful feature that lets you access the training and test dataframes after feature engineering through its get_config method. We use this to get the engineered features that will later be fed to the RAI widget.

X_train = get_config(variable="X_train").reset_index().drop(['index'], axis=1)
y_train = get_config(variable="y_train").reset_index().drop(['index'], axis=1)['Loan_Status']
X_test = get_config(variable="X_test").reset_index().drop(['index'], axis=1)
y_test = get_config(variable="y_test").reset_index().drop(['index'], axis=1)['Loan_Status']
df_train = X_train.copy()
df_train['LABEL'] = y_train
df_test = X_test.copy()
df_test['LABEL'] = y_test

Now we run PyCaret to build multiple models and compare them on recall as the statistical performance metric.

top5_results = compare_models(n_select=5, sort="Recall")
Figure 1 – Comparison of PyCaret models on recall

Our top model is a random forest classifier with a recall of 0.9, which we can plot here.

selected_model = top5_results[0]
plot_model(selected_model)
Figure 2 – ROC curves with AUC for the selected model
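Beyond the default ROC plot, PyCaret's plot_model accepts other standard diagnostics, for example a confusion matrix and an impurity-based feature importance plot:

plot_model(selected_model, plot='confusion_matrix')  # confusion matrix on the test set
plot_model(selected_model, plot='feature')  # feature importance of the classifier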

Now, we will write about 10 lines of code to build the RAI dashboard using the feature dataframes and the model generated by PyCaret.

cat_cols = ['Gender_Male', 'Married_Yes', 'Dependents_0', 'Dependents_1', 'Dependents_2', 'Dependents_3+', 'Education_Not Graduate', 'Self_Employed_Yes', 'Credit_History_1.0', 'Property_Area_Rural', 'Property_Area_Semiurban', 'Property_Area_Urban']
from raiwidgets import ResponsibleAIDashboard
from responsibleai import RAIInsights

rai_insights = RAIInsights(selected_model, df_train, df_test, 'LABEL', 'classification',
                           categorical_features=cat_cols)
rai_insights.explainer.add()
rai_insights.error_analysis.add()
rai_insights.causal.add(treatment_features=['Credit_History_1.0', 'Married_Yes'])
rai_insights.counterfactual.add(total_CFs=10, desired_class='opposite')
rai_insights.compute()
rai_insights.compute()

The above code, although very minimalist, does a lot of work under the hood. It creates RAI insights for classification and adds modules for explainability and error analysis. Then a causal analysis is run on two treatment features: credit history and marital status. A counterfactual analysis is also run for 10 scenarios. Now, let's render the dashboard.

ResponsibleAIDashboard(rai_insights)

The above code starts the dashboard on a port like 5000. On a local machine, you can go directly to http://localhost:5000 and see the dashboard. On Google Colab, you need a simple trick to see this dashboard.

from google.colab.output import eval_js

print(eval_js("google.colab.kernel.proxyPort(5000)"))

This will give you a URL for viewing the RAI dashboard. Some of the RAI dashboard components are shown below. Here are some of the key insights from this analysis, which were generated automatically to supplement the AutoML analysis done by PyCaret.
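Computing the insights can take a while, so it can be handy to persist them to disk and reload them in a later session. A minimal sketch, assuming the save/load API of the responsibleai package works as shown here:

rai_insights.save('rai_insights_output')  # assumption: persists computed insights to a folder
loaded_insights = RAIInsights.load('rai_insights_output')  # reload in a new session
ResponsibleAIDashboard(loaded_insights)  # serve the dashboard from the reloaded insights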

Results: responsible AI report

Error analysis: We see that the error rate is high for rural property areas and that our model has a negative bias for this feature.
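You can sanity-check this cohort finding outside the dashboard with a few lines of pandas (a sketch that reuses the X_test and df_test objects from the walkthrough above):

preds = selected_model.predict(X_test)  # encoded predictions on the engineered test features
errors = (preds != df_test['LABEL'].values)  # boolean array of misclassifications
rural_mask = (df_test['Property_Area_Rural'] == 1).values  # rural property area cohort
print('Overall error rate:', errors.mean())
print('Rural cohort error rate:', errors[rural_mask].mean())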

Global explanations – feature importance: We see that the feature importance ranking stays similar across both cohorts – all data (blue) and the rural property area cohort (orange). For the orange cohort, property area has a bigger impact, but credit history is still the No. 1 factor.
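As a quick cross-check on the dashboard's global explanation, the random forest's own impurity-based importances (available directly on the scikit-learn estimator that PyCaret returns) should tell a similar story:

import pandas as pd
importances = pd.Series(selected_model.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False).head(10))  # top drivers of the model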

Local explanations: We see that credit history is an important feature for an individual prediction as well – here, row #20.
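To see the raw score behind a local explanation like this, you can query the classifier directly for that row (a small sketch; row 20 refers to the row index used on the dashboard):

row_20 = X_test.iloc[[20]]  # keep it as a one-row dataframe
print('Prediction:', selected_model.predict(row_20))
print('Class probabilities:', selected_model.predict_proba(row_20))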

Counterfactual analysis: We see that for the same row #20, a decision flip from N to Y is possible (based on the data) if the loan term and loan amount are changed.

Causal inference: We use causal analysis to study the effect of the two treatment features, credit history and marital status, and see that credit history has the greater impact on approval.

A responsible AI report showing model error analysis, explanations, causal inference and counterfactuals can add great value on top of the traditional statistical performance metrics we typically use as levers to evaluate models. With modern tools like PyCaret and the RAI dashboard, building these reports is easy. These reports could be built using other tools as well – what matters is that data scientists evaluate their models against these responsible AI patterns to make sure the models are ethical as well as accurate.

Dattaraj Rao is the Chief Information Officer at Persistent.
