Datasets#

`lit.sdk.data.datasets` #

This module provides functions for creating and managing the datasets used during the build process.

`Dataset` #

`__getitem__(key)` #

Enables subscription (indexing and slicing) of a dataset directly. Delegates to the adapter's `__getitem__` method.

Examples:

>>> ds = Dataset.from_team_and_name("contoso", "nvda")
>>> data = ds[-10:]  # Last 10 records
>>> data = ds[100:200]  # Records 100-199
>>> data = ds[50]  # Single record at index 50
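
The delegation described above can be pictured with a minimal sketch. `ListAdapter` and `SketchDataset` are hypothetical stand-ins invented for illustration; the SDK's actual adapter interface is not shown in this reference.

```python
from typing import Any, Sequence


class ListAdapter:
    """Hypothetical adapter backed by an in-memory sequence."""

    def __init__(self, records: Sequence[Any]) -> None:
        self._records = records

    def __getitem__(self, key):
        # Accepts an int or a slice, just like a built-in sequence.
        return self._records[key]


class SketchDataset:
    """Illustrative stand-in for Dataset: forwards subscription to its adapter."""

    def __init__(self, adapter: ListAdapter) -> None:
        self._adapter = adapter

    def __getitem__(self, key):
        # No indexing logic lives here; the adapter decides what a key means.
        return self._adapter[key]


ds = SketchDataset(ListAdapter(list(range(1000))))
last_ten = ds[-10:]   # last 10 records
window = ds[100:200]  # records 100-199
single = ds[50]       # single record at index 50
```

Keeping the indexing logic in the adapter means the same subscription syntax can work regardless of how the records are stored.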

`DatasetEvent` #

Bases: TypedDict

This class defines the structure for an event within a dataset.

`detail` `instance-attribute` #

Additional details or context about the event.

`timestamp` `instance-attribute` #

The Unix timestamp (in floating-point format) indicating when the event occurred.

`type` `instance-attribute` #

A string representing the type of the event.

`username` `instance-attribute` #

The Unix username of the individual responsible for triggering the event.
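
As a rough picture of the structure, the four attributes above could be declared as follows. `DatasetEventSketch` is a hypothetical name, and the `str` type for `detail` is an assumption, since this reference does not state it.

```python
from typing import TypedDict


class DatasetEventSketch(TypedDict):
    """Illustrative mirror of the documented DatasetEvent fields."""

    detail: str       # assumed type; the reference only says "additional details"
    timestamp: float  # Unix timestamp of the event
    type: str         # event type, e.g. "init"
    username: str     # Unix username that triggered the event


# A TypedDict instance is a plain dict at runtime.
event: DatasetEventSketch = {
    "type": "init",
    "detail": "began work on my_ds",
    "timestamp": 1729181556.0,
    "username": "lit_user",
}
```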

`add_path_to_dataset(team, name, path)` #

Adds a path to an existing dataset.

Parameters:

Name	Type	Description	Default
`team`	`str`	The team the dataset belongs to.	required
`name`	`str`	Name of the dataset to which the path is added.	required
`path`	`str`	Path of the file to add to the dataset.	required

Returns:

Type	Description
`dict`	The dataset.

Examples:

>>> add_path_to_dataset("contoso", "my_ds", "/data/contoso/raw/sample.csv.gz")
{
    "name": "my_ds",
    "raw": ["/data/contoso/raw/sample.csv.gz"],
    "events": [{
        "type": "Added raw",
        "detail": "/data/contoso/raw/sample.csv.gz",
        "timestamp": 1729182007,
        "username": "lit_user"
    },
    {
        "type": "init",
        "detail": "began work on my_ds",
        "timestamp": 1729181556,
        "username": "lit_user"
    }],
}
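
The example output lists the newest event first. A minimal in-memory sketch of that bookkeeping is shown below; `add_path_sketch` is a hypothetical helper, and the real function presumably persists the dataset and derives the username and timestamp itself.

```python
from copy import deepcopy


def add_path_sketch(dataset: dict, path: str, username: str, now: float) -> dict:
    """Return a copy of `dataset` with `path` recorded and an event prepended."""
    updated = deepcopy(dataset)  # leave the caller's dict untouched
    updated["raw"].append(path)
    # Newest-first ordering, matching the example output above.
    updated["events"].insert(
        0,
        {"type": "Added raw", "detail": path, "timestamp": now, "username": username},
    )
    return updated


before = {
    "name": "my_ds",
    "raw": [],
    "events": [
        {
            "type": "init",
            "detail": "began work on my_ds",
            "timestamp": 1729181556,
            "username": "lit_user",
        }
    ],
}
after = add_path_sketch(before, "/data/contoso/raw/sample.csv.gz", "lit_user", 1729182007)
```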

`demo(team_name, name, feature_path, index, params)` #

Runs a feature demonstration on the specified dataset and returns the result.

Parameters:

Name	Type	Description	Default
`team_name`	`str`	The name of the team.	required
`name`	`str`	The name of the dataset.	required
`feature_path`	`str`	The path to the feature to test.	required
`index`	`int`	The data index within the dataset to use for the demo.	required
`params`	`dict`	A set of parameters to pass to the feature script.	required

Returns:

Type	Description
`dict`	The result of the feature demonstration: the timestamp, the data returned by the feature, and any UI hints.

Examples:

>>> demo(
...     "contoso",
...     "my_ds",
...     "/data/contoso/features/ohlcv.py",
...     19562810,
...     {"count": 5, "size": 1, "unit": "hour"},
... )
{'timestamp': 1493994825045691315,
 'data': array([[2.38490005e+02, 2.38559998e+02, 2.33226593e+02, 2.38500000e+02,
         3.71410000e+04, 8.73224400e+06, 2.35110626e+02],
        [2.38500000e+02, 2.38660004e+02, 2.38300003e+02, 2.38520004e+02,
         2.33910000e+04, 4.86324900e+06, 2.07911118e+02],
        [2.38528900e+02, 2.38770004e+02, 2.38210007e+02, 2.38500000e+02,
         2.87640000e+04, 6.10002200e+06, 2.12071411e+02],
        [2.38500000e+02, 2.38798996e+02, 2.38399994e+02, 2.38740005e+02,
         3.50160000e+04, 7.89611100e+06, 2.25500092e+02],
        [2.39190002e+02, 2.39309998e+02, 2.38839996e+02, 2.38860001e+02,
         2.14480000e+04, 4.43271800e+06, 2.06672791e+02]]),
 'hints': {}}
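
The `timestamp` in the example output has 19 digits, which is consistent with nanoseconds since the Unix epoch. The reference does not state the unit, so the conversion below is a sketch under that assumption; `ns_to_utc` is a hypothetical helper, not part of the SDK.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def ns_to_utc(timestamp_ns: int) -> datetime:
    """Convert integer nanoseconds since the Unix epoch to an aware UTC datetime."""
    # Integer arithmetic avoids the rounding a float division would introduce.
    seconds, nanos = divmod(timestamp_ns, NANOS_PER_SECOND)
    # datetime resolves microseconds only, so sub-microsecond digits are dropped.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=nanos // 1000
    )


when = ns_to_utc(1493994825045691315)
print(when.isoformat())
```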

`estimate(team_name, name, feature_path, count, params)` #

Estimates feature data for a specified dataset.

Parameters:

Name	Type	Description	Default
`team_name`	`str`	The name of the team.	required
`name`	`str`	The name of the dataset.	required
`feature_path`	`str`	The path to the feature script.	required
`count`	`int`	The number of samples to estimate.	required
`params`	`dict`	A set of parameters to pass to the feature script.	required

Returns:

Type	Description
`NDArray`	The estimated feature data as a NumPy array.

Examples:

>>> estimate(
...     "contoso",
...     "spy",
...     "/data/contoso/features/ohlcv.py",
...     5,
...     {"count": 5, "size": 1, "unit": "hour"},
... )
array([2.43907004e+02, 2.44300000e+02, 2.43257996e+02, 2.43632999e+02,
    7.06504000e+04, 1.72431610e+07, 2.39403308e+02])

`get_data(team_name, name, start, stop)` #

Retrieves data for a specified dataset within a team over a given range.

This function fetches data between the start and stop indices for the given dataset. The returned data is either a JSON string or a dictionary. If the data is a JSON string, it is parsed into a dictionary before being returned.

Parameters:

Name	Type	Description	Default
`team_name`	`str`	The name of the team.	required
`name`	`str`	The name of the dataset.	required
`start`	`int`	The starting index for the data retrieval.	required
`stop`	`int`	The stopping index for the data retrieval.	required

Returns:

Type	Description
`dict`	The data for the specified dataset and range, parsed as a dictionary.

Raises:

Type	Description
`TypeError`	If the returned data is not of type 'str' or 'dict'.

Examples:

>>> get_data("contoso", "my_ds", 0, 100)
{...}
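
The string-or-dictionary handling described above can be sketched as a small normalization step. `normalize_payload` is a hypothetical name, and the error message is illustrative; only the accepted types and the `TypeError` come from this reference.

```python
import json
from typing import Union


def normalize_payload(payload: Union[str, dict]) -> dict:
    """Return `payload` as a dict, parsing it first if it is a JSON string."""
    if isinstance(payload, dict):
        return payload
    if isinstance(payload, str):
        return json.loads(payload)
    # Anything else (bytes, list, None, ...) is rejected, as documented.
    raise TypeError(
        f"expected data of type 'str' or 'dict', got {type(payload).__name__!r}"
    )


parsed = normalize_payload('{"rows": [1, 2, 3]}')
passthrough = normalize_payload({"rows": []})
```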

`get_data_by_date(team_name, name, timestamp, aperture)` #

Retrieves data for a specified dataset around a given timestamp.

Parameters:

Name	Type	Description	Default
`team_name`	`str`	The name of the team.	required
`name`	`str`	The name of the dataset.	required
`timestamp`	`float`	The timestamp around which data is to be retrieved.	required
`aperture`	`int`	The number of samples to retrieve on either side of the timestamp.	required

Returns:

Type	Description
`dict`	The data around the specified timestamp with the given aperture.

Examples:

>>> get_data_by_date("contoso", "my_ds", 1494858825, 10000)
{...}
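
One way to read `aperture` is as a half-width in samples around the sample nearest the timestamp. The sketch below assumes sorted timestamps and centres the window on the first sample at or after the requested time. How the SDK actually picks the centre is not documented here, and `window_bounds` is a hypothetical helper.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def window_bounds(
    timestamps: Sequence[float], timestamp: float, aperture: int
) -> Tuple[int, int]:
    """Return (start, stop) indices covering `aperture` samples on either side."""
    # Index of the first sample whose time is >= `timestamp`.
    center = bisect_left(timestamps, timestamp)
    # Clamp to the valid index range so edge requests return a shorter window.
    start = max(0, center - aperture)
    stop = min(len(timestamps), center + aperture + 1)
    return start, stop


timestamps = [float(t) for t in range(0, 100, 10)]  # 0.0, 10.0, ..., 90.0
start, stop = window_bounds(timestamps, 50.0, 2)
selected = timestamps[start:stop]
```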

`get_dataset(team, name)` #

Returns a dataset by name.

Parameters:

Name	Type	Description	Default
`team`	`str`	The team the dataset belongs to.	required
`name`	`str`	Name of the dataset to be returned.	required

Returns:

Type	Description
`dict`	The dataset.

Examples:

>>> get_dataset("contoso", "my_ds")
{
    "name": "my_ds",
    "raw": ["/data/contoso/raw/sample.csv.gz"],
    "events": [{
        "type": "Added raw",
        "detail": "/data/contoso/raw/sample.csv.gz",
        "timestamp": 1729182007,
        "username": "lit_user"
    },
    {
        "type": "init",
        "detail": "began work on my_ds",
        "timestamp": 1729181556,
        "username": "lit_user"
    }],
}

`get_sample_count(team_name, name)` #

Retrieves the sample count for a specified dataset within a team.

Parameters:

Name	Type	Description	Default
`team_name`	`str`	The name of the team.	required
`name`	`str`	The name of the dataset.	required

Returns:

Type	Description
`int`	The sample count for the specified dataset.

Examples:

>>> get_sample_count("contoso", "my_ds")
52042581
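
With tens of millions of samples, the count is typically used to split a dataset into manageable index ranges. The sketch below produces `(start, stop)` pairs that could be passed to `get_data`, assuming `stop` is exclusive, which this reference does not state. `chunk_ranges` is a hypothetical helper.

```python
from typing import Iterator, Tuple


def chunk_ranges(total: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield consecutive (start, stop) pairs that cover range(total)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        # The final chunk is shortened so it never runs past `total`.
        yield start, min(start + chunk_size, total)


ranges = list(chunk_ranges(250, 100))
```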

`init_dataset(team, name)` #

Creates a new dataset.

Parameters:

Name	Type	Description	Default
`team`	`str`	The team the dataset belongs to.	required
`name`	`str`	Name of the new dataset.	required

Returns:

Type	Description
`dict`	The dataset.

Examples:

>>> init_dataset("contoso", "my_ds")
{
    "name": "my_ds",
    "raw": [],
    "events": [{
        "type": "init",
        "detail": "began work on my_ds",
        "timestamp": 1729181556,
        "username": "lit_user"
    }],
}

`list_datasets(team)` #

Returns a list of dataset names.

Parameters:

Name	Type	Description	Default
`team`	`str`	The team the datasets belong to.	required

Returns:

Type	Description
`list[str]`	The collection of dataset names.

Examples:

>>> list_datasets("contoso")
["MSFT", "AAPL", "SPY"]

`remove_path_to_dataset(team, name, path)` #

Removes a path from an existing dataset.

Parameters:

Name	Type	Description	Default
`team`	`str`	The team the dataset belongs to.	required
`name`	`str`	Name of the dataset from which the path is removed.	required
`path`	`str`	Path of the file to remove from the dataset.	required

Returns:

Type	Description
`dict`	The dataset.

Examples:

>>> remove_path_to_dataset("contoso", "my_ds", "/data/contoso/raw/sample.csv.gz")
{
    "name": "my_ds",
    "raw": [],
    "events": [{
        "type": "Removed raw",
        "detail": "/data/contoso/raw/sample.csv.gz",
        "timestamp": 1729184479,
        "username": "lit_user"
    },
    {
        "type": "Added raw",
        "detail": "/data/contoso/raw/sample.csv.gz",
        "timestamp": 1729182007,
        "username": "lit_user"
    },
    {
        "type": "init",
        "detail": "began work on my_ds",
        "timestamp": 1729181556,
        "username": "lit_user"
    }],
}
