Datasets
lit.sdk.data.datasets
This module provides functions for creating datasets that are used during the build process.
DatasetEvent
Bases: TypedDict
This class defines the structure of an event within a dataset.
Attributes:
| Name | Description |
|---|---|
| detail | Additional details or context about the event. |
| timestamp | The Unix timestamp (in floating-point format) indicating when the event occurred. |
| type | A string representing the type of the event. |
| username | The Unix username of the individual responsible for triggering the event. |
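As a rough sketch, the structure described above could be written as the following `TypedDict`. This reference only states that `timestamp` is a float and `type` is a string; typing `detail` and `username` as `str` is an assumption based on the example output elsewhere on this page, and `ExampleDatasetEvent` is a hypothetical stand-in name, not the SDK's class.

```python
import time
from typing import TypedDict


class ExampleDatasetEvent(TypedDict):
    """Hypothetical mirror of the documented DatasetEvent fields."""

    detail: str       # assumed str: additional context about the event
    timestamp: float  # Unix timestamp of when the event occurred
    type: str         # the type of the event, e.g. "init"
    username: str     # assumed str: Unix username of whoever triggered it


# Build one event by hand; the values follow the examples on this page.
event: ExampleDatasetEvent = {
    "type": "init",
    "detail": "began work on my_ds",
    "timestamp": time.time(),
    "username": "lit_user",
}
```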
add_path_to_dataset(team, name, path)
Adds a path to an existing dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| team | str | The team the dataset belongs to. | required |
| name | str | Name of the dataset to which the path is added. | required |
| path | str | Path of the file to add to the dataset. | required |
Returns:
| Type | Description |
|---|---|
| dict | The dataset. |
Examples:
>>> add_path_to_dataset("contoso", "my_ds", "/data/contoso/raw/sample.csv.gz")
{
"name": "my_ds",
"raw": ["/data/contoso/raw/sample.csv.gz"],
"events": [{
"type": "Added raw",
"detail": "/data/contoso/raw/sample.csv.gz",
"timestamp": 1729182007,
"username": "lit_user"
},
{
"type": "init",
"detail": "began work on my_ds",
"timestamp": 1729181556,
"username": "lit_user"
}],
}
demo(team_name, name, feature_path, index, params)
Runs a feature demonstration on the specified dataset and returns the result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| team_name | str | The name of the team. | required |
| name | str | The name of the dataset. | required |
| feature_path | str | The path to the feature to test. | required |
| index | int | The data index within the dataset to use for the demo. | required |
| params | dict | A set of parameters to pass to the feature script. | required |
Returns:
| Type | Description |
|---|---|
| dict | The result of the feature demonstration: the timestamp, the data returned by the feature, and any UI hints. |
Examples:
>>> demo(
... "contoso",
... "my_ds",
... "/data/contoso/features/ohlcv.py",
... 19562810,
... {"count": 5, "size": 1, "unit": "hour"},
... )
{'timestamp': 1493994825045691315,
'data': array([[2.38490005e+02, 2.38559998e+02, 2.33226593e+02, 2.38500000e+02,
3.71410000e+04, 8.73224400e+06, 2.35110626e+02],
[2.38500000e+02, 2.38660004e+02, 2.38300003e+02, 2.38520004e+02,
2.33910000e+04, 4.86324900e+06, 2.07911118e+02],
[2.38528900e+02, 2.38770004e+02, 2.38210007e+02, 2.38500000e+02,
2.87640000e+04, 6.10002200e+06, 2.12071411e+02],
[2.38500000e+02, 2.38798996e+02, 2.38399994e+02, 2.38740005e+02,
3.50160000e+04, 7.89611100e+06, 2.25500092e+02],
[2.39190002e+02, 2.39309998e+02, 2.38839996e+02, 2.38860001e+02,
2.14480000e+04, 4.43271800e+06, 2.06672791e+02]]),
'hints': {}}
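The `timestamp` in the output above has 19 digits, which suggests nanoseconds since the Unix epoch. This reference does not state the unit, so treat that as an assumption. Under that assumption, a minimal sketch of converting the value to a readable UTC time (`ns_to_utc` is a hypothetical helper, not part of the SDK):

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000


def ns_to_utc(timestamp_ns: int) -> datetime:
    """Convert an integer nanosecond timestamp to a UTC datetime.

    Assumes the value counts nanoseconds since the Unix epoch. Digits below
    one microsecond are dropped because datetime stores microseconds only.
    """
    # Split with integer arithmetic to avoid float rounding on large values.
    seconds, remainder_ns = divmod(timestamp_ns, NS_PER_SECOND)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=remainder_ns // 1_000)


print(ns_to_utc(1493994825045691315).isoformat())
```

If the nanosecond assumption holds, the value in the example falls on 5 May 2017.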
estimate(team_name, name, feature_path, count, params)
Estimates feature data for a specified dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| team_name | str | The name of the team. | required |
| name | str | The name of the dataset. | required |
| feature_path | str | The path to the feature script. | required |
| count | int | The number of samples to estimate. | required |
| params | dict | A set of parameters to pass to the feature script. | required |
Returns:
| Type | Description |
|---|---|
| NDArray | The estimated feature data as a NumPy array. |
Examples:
get_data(team_name, name, start, stop)
Retrieves data for a specified dataset within a team over a given range.
This function fetches the data between the start and stop indices of the given dataset.
The retrieved data is either a JSON string or a dictionary; a JSON string is parsed into a dictionary before it is returned.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| team_name | str | The name of the team. | required |
| name | str | The name of the dataset. | required |
| start | int | The starting index for the data retrieval. | required |
| stop | int | The stopping index for the data retrieval. | required |
Returns:
| Type | Description |
|---|---|
| dict | The data for the specified dataset and range, parsed as a dictionary. |
Raises:
| Type | Description |
|---|---|
| TypeError | If the returned data is not of type 'str' or 'dict'. |
Examples:
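The normalization described above can be sketched as follows. `normalize_payload` is a hypothetical helper written for illustration; it is not part of the SDK and only mirrors the documented behavior (JSON strings are parsed, dictionaries pass through, anything else raises `TypeError`).

```python
import json
from typing import Any


def normalize_payload(payload: Any) -> dict:
    """Return payload as a dict, parsing it first if it is a JSON string.

    Assumes a string payload encodes a JSON object; json.loads would return
    a list or scalar for other JSON documents.
    """
    if isinstance(payload, str):
        return json.loads(payload)
    if isinstance(payload, dict):
        return payload
    # Mirror the documented failure mode for any other type.
    raise TypeError(
        f"expected 'str' or 'dict', got {type(payload).__name__!r}"
    )
```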
get_data_by_date(team_name, name, timestamp, aperture)
Retrieves data around a given timestamp for the specified dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| team_name | str | The name of the team. | required |
| name | str | The name of the dataset. | required |
| timestamp | float | The timestamp around which data is to be retrieved. | required |
| aperture | int | The number of samples to retrieve on either side of the timestamp. | required |
Returns:
| Type | Description |
|---|---|
| dict | The data around the specified timestamp with the given aperture. |
Examples:
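One way to picture `aperture` is as a window of sample indices centered on the sample nearest to `timestamp`. The sketch below assumes that samples are ordered by timestamp, that the center sample is included, and that the window is clipped at the dataset boundaries. None of these details are confirmed by this reference, and `window_around` is a hypothetical helper, not an SDK function.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def window_around(
    timestamps: Sequence[float], target: float, aperture: int
) -> Tuple[int, int]:
    """Return (start, stop) indices spanning `aperture` samples on each side
    of the sample closest to `target`. `stop` is exclusive."""
    if not timestamps:
        raise ValueError("timestamps must not be empty")
    pos = bisect_left(timestamps, target)
    # Pick whichever neighbor is closer to the target (ties go to the left).
    if pos == len(timestamps):
        center = pos - 1
    elif pos > 0 and target - timestamps[pos - 1] <= timestamps[pos] - target:
        center = pos - 1
    else:
        center = pos
    # Clip the window so it never runs past either end of the data.
    start = max(0, center - aperture)
    stop = min(len(timestamps), center + aperture + 1)
    return start, stop
```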
get_dataset(team, name)
Returns a dataset by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| team | str | The team the dataset belongs to. | required |
| name | str | Name of the dataset to return. | required |
Returns:
| Type | Description |
|---|---|
| dict | The dataset. |
Examples:
>>> get_dataset("contoso", "my_ds")
{
"name": "my_ds",
"raw": ["/data/contoso/raw/sample.csv.gz"],
"events": [{
"type": "Added raw",
"detail": "/data/contoso/raw/sample.csv.gz",
"timestamp": 1729182007,
"username": "lit_user"
},
{
"type": "init",
"detail": "began work on my_ds",
"timestamp": 1729181556,
"username": "lit_user"
}],
}
get_sample_count(team_name, name)
Retrieves the sample count for a specified dataset within a team.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| team_name | str | The name of the team. | required |
| name | str | The name of the dataset. | required |
Returns:
| Type | Description |
|---|---|
| int | The sample count for the specified dataset. |
Examples:
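A sample count is useful for walking a dataset in fixed-size ranges of start and stop indices. The sketch below only computes those ranges. `chunk_ranges` is a hypothetical helper, and treating the stop index as exclusive is an assumption, since this reference does not say whether the stop index is included.

```python
from typing import Iterator, Tuple


def chunk_ranges(sample_count: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs that together cover sample_count samples."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, sample_count, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size, sample_count)


ranges = list(chunk_ranges(10, 4))
print(ranges)
```

Each pair could then be passed as the `start` and `stop` arguments of `get_data`, with the count taken from `get_sample_count`.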
init_dataset(team, name)
Creates a new dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| team | str | The team the dataset belongs to. | required |
| name | str | Name of the new dataset. | required |
Returns:
| Type | Description |
|---|---|
| dict | The dataset. |
Examples:
list_datasets(team)
remove_path_to_dataset(team, name, path)
Removes a path from an existing dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| team | str | The team the dataset belongs to. | required |
| name | str | Name of the dataset from which the path is removed. | required |
| path | str | Path of the file to remove from the dataset. | required |
Returns:
| Type | Description |
|---|---|
| dict | The dataset. |
Examples:
>>> remove_path_to_dataset("contoso", "my_ds", "/data/contoso/raw/sample.csv.gz")
{
"name": "my_ds",
"raw": [],
"events": [{
"type": "Removed raw",
"detail": "/data/contoso/raw/sample.csv.gz",
"timestamp": 1729184479,
"username": "lit_user"
},
{
"type": "Added raw",
"detail": "/data/contoso/raw/sample.csv.gz",
"timestamp": 1729182007,
"username": "lit_user"
},
{
"type": "init",
"detail": "began work on my_ds",
"timestamp": 1729181556,
"username": "lit_user"
}],
}
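The example outputs on this page list events newest first. The sketch below models that shape with plain dictionaries to show how the event log might evolve across init, add, and remove. It is an in-memory illustration only: the `sketch_*` functions are hypothetical and do not call the SDK, and dropping the path from `raw` on removal is an assumption.

```python
import time
from typing import Any, Dict


def _event(kind: str, detail: str, username: str) -> Dict[str, Any]:
    """Build one event record with the four documented fields."""
    return {
        "type": kind,
        "detail": detail,
        "timestamp": time.time(),
        "username": username,
    }


def sketch_init(name: str, username: str = "lit_user") -> Dict[str, Any]:
    """Create an in-memory dataset record holding a single init event."""
    return {
        "name": name,
        "raw": [],
        "events": [_event("init", f"began work on {name}", username)],
    }


def sketch_add_path(
    dataset: Dict[str, Any], path: str, username: str = "lit_user"
) -> Dict[str, Any]:
    """Record a raw path and prepend the matching event (newest first)."""
    dataset["raw"].append(path)
    dataset["events"].insert(0, _event("Added raw", path, username))
    return dataset


def sketch_remove_path(
    dataset: Dict[str, Any], path: str, username: str = "lit_user"
) -> Dict[str, Any]:
    """Drop a raw path and prepend the matching event (newest first)."""
    dataset["raw"].remove(path)  # raises ValueError if the path is absent
    dataset["events"].insert(0, _event("Removed raw", path, username))
    return dataset


ds = sketch_init("my_ds")
sketch_add_path(ds, "/data/contoso/raw/sample.csv.gz")
sketch_remove_path(ds, "/data/contoso/raw/sample.csv.gz")
print([e["type"] for e in ds["events"]])
```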