
Extensibility

Collaboration

A feature in machine learning is an individual measurable property or characteristic of the data you're analyzing. Think of it as a specific piece of information about each item in your dataset. For example, in a dataset of houses, features could include the size of the house, the number of bedrooms, its age, and its location. Machine learning models use features to learn patterns in the data and to make predictions or decisions. Each feature contributes a different piece of information, and together they help the model learn and perform better.

In the platform, a feature is computed by a Python script. The script's author specifies the variables the user must provide. For instance, if the feature represents OHLC (Open, High, Low, Close) bars, the variables might include the interval (second, minute, hour, or day) and the count. These variables are declared in the script's docstring, along with their types and tooltips that guide the user. The platform then presents the variables to the user in a web interface, where the user fills in the required values, names the feature, and saves it.

Once saved, the feature can be recalled by the platform for several purposes: computing features when building historical data, preparing real-time streaming data for predictions, or helping a data engineer with QA and other data analysis tasks. This integrates user-defined features into the machine learning workflow and keeps their use consistent across these tasks.
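The docstring format the platform expects isn't shown here, so the sketch below assumes a hypothetical convention: a `Variables:` section with one `name (type): tooltip` line per variable. It shows how a script author might declare the OHLC variables and how the platform could read them back with the standard library's `inspect.getdoc`. The function `ohlc_bars`, the `VariableSpec` class, and the `parse_variables` helper are illustrative names, not the platform's API.

```python
import inspect
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical convention: one "name (type): tooltip" line per variable,
# indented under a "Variables:" header. The platform's real format may differ.
VAR_LINE = re.compile(r"^\s+(\w+)\s+\(([^)]+)\):\s*(.+)$")


@dataclass
class VariableSpec:
    name: str
    type_name: str
    tooltip: str


def ohlc_bars(data, index, interval, count):
    """Compute OHLC (Open, High, Low, Close) bars.

    Variables:
        interval (choice[second, minute, hour, day]): Width of each interval.
        count (int): Number of intervals per bar.
    """
    raise NotImplementedError("body omitted; only the docstring matters here")


def parse_variables(func: Callable) -> List[VariableSpec]:
    """Read variable declarations from the 'Variables:' section of a docstring."""
    doc = inspect.getdoc(func) or ""  # getdoc also normalizes indentation
    specs: List[VariableSpec] = []
    in_section = False
    for line in doc.splitlines():
        if line.strip() == "Variables:":
            in_section = True
        elif in_section:
            match = VAR_LINE.match(line)
            if match is None:
                break  # the first non-matching line ends the section
            specs.append(VariableSpec(*match.groups()))
    return specs


for spec in parse_variables(ohlc_bars):
    print(f"{spec.name}: {spec.type_name} ({spec.tooltip})")
```

Keeping the declarations in the docstring means the script stays ordinary Python while still carrying enough metadata for a UI to render the inputs.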

1. The data scientist specifies, in the script's docstring, which inputs the feature needs from the user.


2. The platform presents those variable inputs in the graphical interface.

Feature variables are surfaced in the platform with rich user-interface support: strict types, drop-down lists, required vs. optional parameters, and UI hints. Here we see the variable inputs as specified by the feature script unit_bar.py.
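To make these UI behaviors concrete, here is a minimal sketch of how a variable description could drive form validation: `kind` is the strict type the input is coerced to, `choices` becomes a drop-down list, `required` and `default` separate required from optional parameters, and `hint` is the tooltip. The `Variable` class, the `validate` function, and the variable list (including the optional `price_field`) are hypothetical and are not the actual variables of `unit_bar.py`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Variable:
    """Hypothetical UI-facing description of one feature variable."""
    name: str
    kind: type                                # strict type the input is coerced to
    hint: str = ""                            # tooltip shown next to the input
    choices: Optional[Sequence[Any]] = None   # rendered as a drop-down list
    required: bool = True
    default: Any = None                       # used when an optional input is blank


def validate(variables: Sequence[Variable],
             submitted: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Coerce the raw form values and collect any validation errors."""
    values: Dict[str, Any] = {}
    errors: List[str] = []
    for var in variables:
        raw = submitted.get(var.name)
        if raw is None or raw == "":
            if var.required:
                errors.append(f"{var.name} is required")
            else:
                values[var.name] = var.default
            continue
        try:
            value = var.kind(raw)
        except (TypeError, ValueError):
            errors.append(f"{var.name} must be of type {var.kind.__name__}")
            continue
        if var.choices is not None and value not in var.choices:
            errors.append(f"{var.name} must be one of {list(var.choices)}")
            continue
        values[var.name] = value
    return values, errors


# Illustrative variables for an OHLC bar feature (not the real unit_bar.py list).
OHLC_VARIABLES = [
    Variable("interval", str, "Width of each interval",
             choices=("second", "minute", "hour", "day")),
    Variable("count", int, "Number of intervals per bar"),
    Variable("price_field", str, "Column to aggregate",
             required=False, default="price"),
]

values, errors = validate(OHLC_VARIABLES, {"interval": "day", "count": "90"})
print(values)  # {'interval': 'day', 'count': 90, 'price_field': 'price'}
print(errors)  # []
```

Coercing with `kind(raw)` keeps the check strict: a non-numeric `count` is rejected instead of being passed through as a string.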


3. The user provides the feature variables.

This gives SMEs and data scientists an intuitive playground for working at a higher level of abstraction than code, which can often be distracting. Variations of feature parameters can be previewed instantly against random points in time within the dataset, without running a costly build. Here the user has filled in the feature variables, named the feature 90_day_bar, and saved it.
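Previewing amounts to evaluating the feature at a small sample of indices rather than over the whole dataset. The sketch below assumes the feature is a plain function of `(data, index, **variables)` and uses a seeded `random.Random` so a given preview can be reproduced; `mean_of_last` and `preview` are illustrative names, and the platform's actual sampling strategy may differ.

```python
import random
from typing import Any, Callable, Dict, List, Sequence, Tuple


def mean_of_last(data: Sequence[float], index: int, count: int) -> float:
    """Toy feature: mean of the `count` values ending at `index` (inclusive)."""
    window = data[max(0, index - count + 1): index + 1]
    return sum(window) / len(window)


def preview(feature: Callable[..., Any], data: Sequence[float],
            variables: Dict[str, Any], samples: int = 5,
            seed: int = 0) -> List[Tuple[int, Any]]:
    """Evaluate `feature` at a few random indices instead of the whole dataset."""
    rng = random.Random(seed)  # seeded so the same preview can be reproduced
    k = min(samples, len(data))
    indices = sorted(rng.sample(range(len(data)), k))
    return [(i, feature(data, i, **variables)) for i in indices]


prices = [float(x) for x in range(100)]
for index, value in preview(mean_of_last, prices, {"count": 3}, samples=4, seed=1):
    print(index, value)
```

The cost of a preview scales with `samples`, not with the dataset size, which is what makes trying many parameter variations cheap.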


4. The feature, as a function together with its variables, is accessible to the platform for transforming historical training data, transforming streaming data, or automation via scripts.
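One practical payoff of treating the feature as a function is that the historical and streaming transforms can call the same code, which keeps training and serving consistent. A minimal sketch, assuming a hypothetical `(data, index, **variables)` signature and a toy `last_change` feature; `transform_history` and `StreamingTransformer` are illustrative names. The streaming buffer is unbounded for simplicity, whereas a production version would keep only as much history as the feature needs.

```python
from typing import Callable, List, Sequence


def last_change(data: Sequence[float], index: int, lag: int) -> float:
    """Toy feature: change in value over the previous `lag` steps."""
    return data[index] - data[max(0, index - lag)]


def transform_history(feature: Callable[..., float], data: Sequence[float],
                      **variables) -> List[float]:
    """Batch path: compute the feature at every index of a historical dataset."""
    return [feature(data, i, **variables) for i in range(len(data))]


class StreamingTransformer:
    """Streaming path: compute the same feature as each new point arrives."""

    def __init__(self, feature: Callable[..., float], **variables):
        self.feature = feature
        self.variables = variables
        self.buffer: List[float] = []  # unbounded for simplicity

    def on_tick(self, value: float) -> float:
        self.buffer.append(value)
        return self.feature(self.buffer, len(self.buffer) - 1, **self.variables)


prices = [1.0, 2.0, 4.0, 7.0]
batch = transform_history(last_change, prices, lag=1)
stream = StreamingTransformer(last_change, lag=1)
live = [stream.on_tick(p) for p in prices]
print(batch == live)  # True
```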

Here we are loading the feature as a function.


Here are the feature variables that the user set via the UI.
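Loading a saved feature can be pictured as binding the user's saved variables to the script's function, for example with `functools.partial`, so callers only supply the data and the index. In this sketch the storage layout, the `SAVED_FEATURES` and `SCRIPTS` registries, `unit_bar_stub`, and the variable values for `90_day_bar` (`interval="day"`, `count=90`, guessed from the name) are all assumptions.

```python
import functools
import json
from typing import Any, Callable, Dict

# Hypothetical storage: what the UI saved for each named feature.
SAVED_FEATURES: Dict[str, Any] = json.loads("""
{
  "90_day_bar": {"script": "unit_bar", "variables": {"interval": "day", "count": 90}}
}
""")


def unit_bar_stub(data: Any, index: Any, interval: str, count: int) -> Dict[str, Any]:
    """Stand-in for the script's entry point; it just echoes its arguments."""
    return {"index": index, "interval": interval, "count": count}


# Hypothetical mapping from script name to its entry-point function.
SCRIPTS: Dict[str, Callable[..., Any]] = {"unit_bar": unit_bar_stub}


def load_feature(name: str) -> functools.partial:
    """Return the saved feature as a function with its variables pre-bound."""
    saved = SAVED_FEATURES[name]
    return functools.partial(SCRIPTS[saved["script"]], **saved["variables"])


bar = load_feature("90_day_bar")
print(bar.keywords)   # {'interval': 'day', 'count': 90}
print(bar(None, 42))  # {'index': 42, 'interval': 'day', 'count': 90}
```

A `partial` object also exposes the bound variables through its `keywords` attribute, which is one way to inspect what the user set.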


Here, given a data adapter, the feature computes its value for a given index.
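What a data adapter exposes isn't specified here, so this sketch assumes a hypothetical `ListAdapter` with a `prices(start, end)` method over time-sorted ticks, and an `ohlc_bar` feature that returns one Open/High/Low/Close bar covering `count` intervals ending at the given index (a timestamp). It illustrates the shape of the computation, not the platform's actual `unit_bar.py`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, NamedTuple, Tuple

INTERVALS = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


class OHLC(NamedTuple):
    open: float
    high: float
    low: float
    close: float


@dataclass
class ListAdapter:
    """Hypothetical data adapter over (timestamp, price) ticks sorted by time."""
    ticks: List[Tuple[datetime, float]]

    def prices(self, start: datetime, end: datetime) -> List[float]:
        """Prices with start < timestamp <= end, in time order."""
        return [price for ts, price in self.ticks if start < ts <= end]


def ohlc_bar(adapter: ListAdapter, index: datetime, interval: str, count: int) -> OHLC:
    """One OHLC bar covering the `count` intervals that end at `index`."""
    start = index - count * INTERVALS[interval]
    prices = adapter.prices(start, index)
    if not prices:
        raise ValueError("no data in the requested window")
    return OHLC(prices[0], max(prices), min(prices), prices[-1])


t0 = datetime(2024, 1, 1)
adapter = ListAdapter([(t0 + timedelta(days=d), 100.0 + d) for d in range(120)])
bar = ohlc_bar(adapter, t0 + timedelta(days=119), interval="day", count=90)
print(bar)  # OHLC(open=130.0, high=219.0, low=130.0, close=219.0)
```

Because the feature only talks to the adapter, the same function could run against a historical store or a live buffer as long as both provide the assumed `prices` method.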
