Zephyr

Prediction engineering methods for wind turbine maintenance. Homepage: https://github.com/sintel-dev/zephyr

Overview

The Zephyr library is a framework designed to assist in the generation of machine learning problems from wind farm operations data by analyzing past occurrences of events.

The main features of Zephyr are:

  • Data Processing: a set of tools designed to load preprocessed turbine operations data into pre-defined EntitySets for Prediction Engineering. Expected data includes turbines, alarms, stoppages, work orders, notifications, and turbine signals data such as PI or SCADA data.

  • Labeling Functions: a collection of ready-to-use functions, as well as tools to create custom versions of them, for analyzing past operations data in search of occurrences of specific types of events.

  • Prediction Engineering: a flexible framework designed to apply labeling functions on wind turbine operations data to create labels for custom Machine Learning problems.

Install

Requirements

Zephyr has been developed and runs on Python 3.7 and 3.8.

Although it is not strictly required, using a virtualenv is highly recommended to avoid interfering with other software installed on the system where you are running Zephyr.

Download and Install

Zephyr can be installed locally using pip with the following command:

pip install zephyr-ml

If you want to install from source or contribute to the project please read the Contributing Guide.

Quickstart

In this short tutorial we will guide you through a series of steps that will help you get started with Zephyr.

1. Loading and preprocessing the Data

The first step is to load the data and apply any necessary preprocessing. Zephyr expects the following dataframes to create an EntitySet:

  • Turbines

  • Alarms

  • Stoppages

  • Work orders

  • Notifications

  • Signal data (PI or SCADA)

Additionally, Zephyr requires that a unique turbine identifier be present in all dataframes (by default, Zephyr uses COD_ELEMENT as the identifier). A unique work order identifier must also be present in the work order dataframe and the notifications dataframe (Zephyr uses COD_ORDER by default).
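As a quick sanity check, you can verify that those identifier columns are present with plain pandas before building the EntitySet. The file paths and column names below are simply the defaults used in this tutorial:

import pandas as pd

# Check that the default identifier columns exist in the raw CSVs
alarms = pd.read_csv('notebooks/data/alarms.csv')
work_orders = pd.read_csv('notebooks/data/work_orders.csv')
assert 'COD_ELEMENT' in alarms.columns
assert 'COD_ORDER' in work_orders.columns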

NOTE: if you cloned the Zephyr repository, you will find some demo data inside the notebooks/data folder. All of it has been pre-processed into the appropriate format, except for pidata.csv, which needs to be pivoted so that the values in its tag column become columns in the table.
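A minimal sketch of that pivoting step is shown below. The column names ('TIMESTAMP', 'tag', 'value') are assumptions about the raw file layout, so adjust them to match your data:

import pandas as pd

# Hypothetical example: reshape the long-format PI data so that each tag
# becomes its own column. Column names here are assumed, not prescribed.
pidata = pd.read_csv('notebooks/data/pidata.csv')
pidata_pivoted = pidata.pivot_table(
    index=['COD_ELEMENT', 'TIMESTAMP'],  # one row per turbine and timestamp
    columns='tag',                       # each tag becomes a column
    values='value'
).reset_index()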

import os
import pandas as pd
from zephyr_ml import create_scada_entityset

data_path = 'notebooks/data'

data = {
  'turbines': pd.read_csv(os.path.join(data_path, 'turbines.csv')),
  'alarms': pd.read_csv(os.path.join(data_path, 'alarms.csv')),
  'work_orders': pd.read_csv(os.path.join(data_path, 'work_orders.csv')),
  'stoppages': pd.read_csv(os.path.join(data_path, 'stoppages.csv')),
  'notifications': pd.read_csv(os.path.join(data_path, 'notifications.csv')),
  'scada': pd.read_csv(os.path.join(data_path, 'scada.csv'))
}
scada_es = create_scada_entityset(data)

This will load the turbine, alarms, stoppages, work order, notifications, and SCADA data, and return them as a single EntitySet.

Entityset: SCADA data
  DataFrames:
    turbines [Rows: 1, Columns: 10]
    alarms [Rows: 2, Columns: 9]
    work_orders [Rows: 2, Columns: 20]
    stoppages [Rows: 2, Columns: 16]
    notifications [Rows: 2, Columns: 15]
    scada [Rows: 2, Columns: 5]
  Relationships:
    alarms.COD_ELEMENT -> turbines.COD_ELEMENT
    stoppages.COD_ELEMENT -> turbines.COD_ELEMENT
    work_orders.COD_ELEMENT -> turbines.COD_ELEMENT
    scada.COD_ELEMENT -> turbines.COD_ELEMENT
    notifications.COD_ORDER -> work_orders.COD_ORDER
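The returned object behaves like a standard featuretools EntitySet (an assumption based on the output above), so you can inspect the individual dataframes and relationships directly, for example:

# Access a single dataframe from the EntitySet by name
alarms_df = scada_es['alarms']
print(alarms_df.head())

# List the dataframes and relationships that were created
print(scada_es.dataframes)
print(scada_es.relationships)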

2. Selecting a Labeling Function

The second step will be to choose an adequate Labeling Function.

We can see the list of available labeling functions using the zephyr_ml.labeling.get_labeling_functions function.

from zephyr_ml import labeling

labeling.get_labeling_functions()

This will return a dictionary with the name and a short description of each available function.

{'brake_pad_presence': 'Calculates the total power loss over the data slice.',
 'converter_replacement_presence': 'Calculates the converter replacement presence.',
 'total_power_loss': 'Calculates the total power loss over the data slice.'}

In this case, we will choose the total_power_loss function, which calculates the total power loss for each turbine over our time window.

3. Generate Target Times

Once we have loaded the data and selected the Labeling Function, we are ready to use the zephyr_ml.DataLabeler to generate a Target Times table.

from zephyr_ml import DataLabeler

data_labeler = DataLabeler(labeling.labeling_functions.total_power_loss)
target_times, _ = data_labeler.generate_label_times(scada_es)

This will return a composeml.LabelTimes table containing the three columns required to start working on a Machine Learning problem: the turbine ID (COD_ELEMENT), the cutoff time (time), and the label.

   COD_ELEMENT       time    label
0            0 2022-01-01  45801.0
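Since composeml's LabelTimes is built on a pandas DataFrame (an assumption about composeml's behavior, not something Zephyr documents here), the target times can be inspected and saved with regular pandas operations, for instance:

# Inspect the generated labels and persist them for later feature engineering
print(target_times['label'].describe())
target_times.to_csv('target_times.csv', index=False)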

What's Next?

If you want to continue learning about Zephyr and all its features please have a look at the tutorials found inside the notebooks folder.
