Your own application
Getting set up with Pyreal using your own data and ML model.
The full code for this guide can be found in our user guide tutorial.
In the previous Quickstart guide, you learned how to use RealApps to make predictions with and understand your ML models. In this guide, we cover the basics of creating a RealApp for your own application.
Everything covered in this quickstart is discussed in more detail in our user guides.
Problem setup
In this guide, we use data about houses and an ML model that predicts house prices.
Data Preparation and Modeling
Data
Pyreal expects data in the format of Pandas DataFrames. Each row refers to one data instance (a person, place, thing, or entity), and each column refers to a feature, or piece of information about that instance. Column headers are the names of the features. Each instance may optionally have an instance ID, which can be stored either as the DataFrame's index (row IDs) or as a separate column.
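The sketch below uses pandas to show the two ways of storing instance IDs. It uses three columns from the sample data below. The `HouseID` column name and the ID values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Each row is one house (an instance); each column is a feature.
# Option 1: IDs live in a separate column ("HouseID" is a hypothetical name).
houses = pd.DataFrame(
    {
        "HouseID": [101, 102, 103],
        "LotArea": [12589.0, 9100.0, 10125.0],
        "Neighborhood": ["Gilbert", "Brookside", "Mitchell"],
    }
)

# Option 2: the same IDs stored as the DataFrame's index (row IDs).
houses_indexed = houses.set_index("HouseID")

# Rows can now be looked up by instance ID.
print(houses_indexed.loc[102, "Neighborhood"])
```

With the IDs in the index, the remaining columns are exactly the features.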
Next, we load some data and split it into training and test sets.
Our sample dataset looks like:
LotArea | Neighborhood | OverallQuality | YearBuilt | Material | BasementSize | CentralAir |
---|---|---|---|---|---|---|
12589.0 | Gilbert | 6 | 2005 | Vinyl Siding | 728.0 | True |
9100.0 | Brookside | 5 | missing | Vinyl Siding | 944.0 | True |
10125.0 | Mitchell | missing | 1977 | Plywood | 483.0 | False |
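The sketch below rebuilds the three sample rows as a DataFrame, with the cells shown as "missing" stored as `NaN`, and then makes a reproducible train/test split using only pandas. The helper name `split_rows` is hypothetical. In practice, `sklearn.model_selection.train_test_split` is a common alternative. The house prices (the target) are not part of the sample table, so they are not invented here.

```python
import numpy as np
import pandas as pd

# The sample rows above; cells shown as "missing" become NaN.
X = pd.DataFrame(
    {
        "LotArea": [12589.0, 9100.0, 10125.0],
        "Neighborhood": ["Gilbert", "Brookside", "Mitchell"],
        "OverallQuality": [6, 5, np.nan],
        "YearBuilt": [2005, np.nan, 1977],
        "Material": ["Vinyl Siding", "Vinyl Siding", "Plywood"],
        "BasementSize": [728.0, 944.0, 483.0],
        "CentralAir": [True, True, False],
    }
)


def split_rows(df, test_fraction=0.25, seed=0):
    """Hypothetical helper: shuffle rows reproducibly and hold out a test set."""
    shuffled = df.sample(frac=1.0, random_state=seed)
    n_test = max(1, round(len(df) * test_fraction))
    return shuffled.iloc[n_test:], shuffled.iloc[:n_test]


X_train, X_test = split_rows(X, test_fraction=1 / 3)
# A target Series y sharing X's index would be split the same way:
# y_train, y_test = y.loc[X_train.index], y.loc[X_test.index]
```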
Transformers
RealApps expect a list of Pyreal transformers, which they use to prepare the data for model predictions. Because the original data and the transformers are passed in separately, explanations can be presented in terms of the original, understandable, untransformed features.
To prepare the data for the model, we need to one-hot encode categorical features (that is, replace each feature that takes one of a fixed set of string category values with a series of Boolean features, one per category), impute features to fill in missing values, and scale features so they are all on the same numeric scale.
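The sketch below illustrates the three preparation steps and the fit-then-transform ordering. It is written in plain pandas and does not use Pyreal's own transformer classes. `OneHot`, `MeanImputer`, `Standardizer`, `fit_pipeline` and `apply_pipeline` are all hypothetical names chosen for illustration. Each transformer is fitted on training data, and the fitted list is then reused on new data. The original DataFrame is left unchanged.

```python
import numpy as np
import pandas as pd


class OneHot:
    """Hypothetical one-hot encoder: one Boolean column per category."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X):
        # Remember training categories so later data gets the same columns.
        self.categories_ = {c: sorted(X[c].dropna().unique()) for c in self.columns}
        return self

    def transform(self, X):
        X = X.copy()
        for c, cats in self.categories_.items():
            for cat in cats:
                X[f"{c}_{cat}"] = X[c] == cat
            X = X.drop(columns=c)
        return X


class MeanImputer:
    """Hypothetical imputer: fills missing values with training-set means."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X):
        self.means_ = X[self.columns].mean()
        return self

    def transform(self, X):
        return X.fillna(self.means_)


class Standardizer:
    """Hypothetical scaler: zero mean and unit variance per column."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X):
        self.mean_ = X[self.columns].mean()
        # Avoid dividing by zero for constant columns.
        self.std_ = X[self.columns].std(ddof=0).replace(0, 1.0)
        return self

    def transform(self, X):
        X = X.copy()
        X[self.columns] = (X[self.columns] - self.mean_) / self.std_
        return X


def fit_pipeline(transformers, X):
    """Fit each transformer on the output of the previous one, in order."""
    for t in transformers:
        X = t.fit(X).transform(X)
    return X


def apply_pipeline(transformers, X):
    """Apply already-fitted transformers in order (e.g. to test data)."""
    for t in transformers:
        X = t.transform(X)
    return X


X_train = pd.DataFrame(
    {
        "LotArea": [12589.0, 9100.0, 10125.0],
        "Neighborhood": ["Gilbert", "Brookside", "Mitchell"],
        "YearBuilt": [2005, np.nan, 1977],
    }
)
numeric = ["LotArea", "YearBuilt"]
transformers = [OneHot(["Neighborhood"]), MeanImputer(numeric), Standardizer(numeric)]
X_train_model = fit_pipeline(transformers, X_train)

# A hypothetical new house with a missing construction year.
X_new = pd.DataFrame({"LotArea": [9937.0], "Neighborhood": ["Mitchell"], "YearBuilt": [np.nan]})
X_new_model = apply_pipeline(transformers, X_new)
```

`X_train` still holds the original values, including the missing year. This separation is what allows explanations to refer back to the untransformed features.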
Modeling
We can now transform our training and testing data, and initialize, train, and evaluate our ML model.
In this guide, we will use LightGBM, a powerful and lightweight library that offers classifiers and regressors using the gradient boosting framework. It is an effective choice for many ML use cases.
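The sketch below shows the initialize-train-evaluate step. To keep it runnable without extra installs, it swaps in scikit-learn's `GradientBoostingRegressor` for LightGBM. It also uses synthetic numbers in place of the transformed housing data, so the values are not real results. `lightgbm.LGBMRegressor` follows the same scikit-learn-style `fit`/`predict` interface, so the surrounding code would look much the same.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
# Synthetic stand-in for transformed features and prices (not real data).
X = rng.normal(size=(200, 3))
y = 150_000 + 30_000 * X[:, 0] - 10_000 * X[:, 1] + rng.normal(scale=5_000, size=200)

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Initialize and train a gradient-boosted regressor.
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)

# Evaluate on held-out data with mean absolute error (in price units).
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Test MAE: ${mae:,.2f}")
```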
Creating and Using your RealApp
Creating a RealApp is easy once you have the required components. We will add two additional inputs to make our outputs easier to read: a dictionary mapping feature names (our data's column names) to readable descriptions, and a format function that converts float predictions to formatted dollar amounts.
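The sketch below defines the two extra inputs as plain Python objects. The two descriptions are taken from the sample output further down. The function name `format_price` is hypothetical. The exact keyword arguments used to pass these objects to the RealApp constructor are not shown here.

```python
# Readable descriptions keyed by column name (from the sample output).
feature_descriptions = {
    "LotArea": "Lot size in square feet",
    "YearBuilt": "Original construction date",
}


def format_price(value: float) -> str:
    """Hypothetical formatter: 127285.59 -> '$127,285.59'."""
    return f"${value:,.2f}"


print(f"Predicted price for House 101: {format_price(127285.59)}")
```

The print statement reproduces the style of the prediction output shown below.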
You can now use the .predict and .produce functions to make predictions with and understand your ML model.
Sample output:
Predicted price for House 101: $127,285.59
Sample output:
Feature Name | Feature Value | Contribution | Average/Mode |
---|---|---|---|
Lot size in square feet | 9937 | 1137.73 | 10847.56 |
Original construction date | 1965 | -3514.96 | 1981 |
... | ... | ... | ... |
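The sketch below shows one way to read a contribution table like the one above: rank features by the size of their contribution, regardless of sign. In this kind of table, positive contributions push the predicted price up and negative ones push it down. Only the two rows shown above are used, because the rest of the table is elided. The `key` argument to `sort_values` requires pandas 1.1 or later.

```python
import pandas as pd

# The two rows shown in the sample output above.
contributions = pd.DataFrame(
    {
        "Feature Name": ["Lot size in square feet", "Original construction date"],
        "Feature Value": [9937, 1965],
        "Contribution": [1137.73, -3514.96],
        "Average/Mode": [10847.56, 1981],
    }
)

# Sort by magnitude so the most influential features come first,
# whether they raised (positive) or lowered (negative) the prediction.
ranked = contributions.sort_values("Contribution", key=lambda s: s.abs(), ascending=False)
print(ranked[["Feature Name", "Contribution"]].to_string(index=False))
```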