In the main feature extraction loop, tsfeatures groups by the hard-coded unique_id column and then applies the transforms to the grouped data:
ts_features = pool.starmap(partial_get_feats, ts.groupby('unique_id'))
It would be more generic if you could pass in a Grouper to perform the grouping. At the moment I have to group my data myself and then flatten the resulting MultiIndex into a single column (i.e. a column of tuples):
# group by id and day
grouper = [pd.Grouper(key='id'), pd.Grouper(key='time', freq='1D')]
grouped_data = df.groupby(grouper, group_keys=True)
# join groups, use grouper key as new index
grouped_data = grouped_data.apply(lambda x: x.drop(columns=['id']))
grouped_data = grouped_data.droplevel(-1)
# flatten index to tuples
grouped_data.index = grouped_data.index.to_flat_index()
grouped_data.index.name = 'id'
grouped_data = grouped_data.reset_index()
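To make the request concrete, here is a rough sketch of the kind of interface I have in mind. This is not the tsfeatures API: the function name features_by_group, its by and feature_fn parameters, and the value column name y are all made up for illustration. The only point is that anything df.groupby accepts (a column name, a list of columns, or a list of Grouper objects) could be passed straight through.

```python
import pandas as pd


def features_by_group(df, by, feature_fn, value_col="y"):
    """Hypothetical helper (not part of tsfeatures): apply feature_fn to each group.

    `by` is forwarded unchanged to DataFrame.groupby, so it may be a column
    name, a list of names, or a list of pd.Grouper objects.
    """
    keys, rows = [], []
    for key, group in df.groupby(by):
        # Normalise scalar keys to 1-tuples so every key has the same shape.
        keys.append(key if isinstance(key, tuple) else (key,))
        rows.append(feature_fn(group[value_col]))
    # One row of features per group, indexed by the group key.
    return pd.DataFrame(rows, index=pd.MultiIndex.from_tuples(keys))


# Toy data: two series, sampled twice a day.
df = pd.DataFrame({
    "id": ["a", "a", "a", "a", "b", "b"],
    "time": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 12:00",
        "2024-01-02 00:00", "2024-01-02 12:00",
        "2024-01-01 00:00", "2024-01-01 12:00",
    ]),
    "y": [1.0, 2.0, 3.0, 5.0, 10.0, 20.0],
})

# Group by id and day, as in the workaround above.
grouper = [pd.Grouper(key="id"), pd.Grouper(key="time", freq="1D")]
features = features_by_group(
    df, grouper, lambda s: {"mean": s.mean(), "n": len(s)}
)
```

With something like this the caller never has to build a tuple column; the group key stays in the index of the result.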
The problem with this workaround is that I've been experimenting with Dask, and data formats like Parquet don't seem to support this column type. You can create a Dask DataFrame from a pandas DataFrame that contains tuple columns, but so far I've been unable to persist them. I know tsfeatures doesn't support Dask at this stage, but I guess it might be on the roadmap?
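One stopgap I can think of, sketched below, is to build the composite key as a plain string instead of a tuple, since string columns are something Parquet can store. The "_" separator and the date format are arbitrary choices on my part, and I haven't checked how tsfeatures treats the remaining columns, so take this as an illustration only.

```python
import pandas as pd

# Toy data with the same columns as the workaround above.
df = pd.DataFrame({
    "id": ["a", "a", "b"],
    "time": pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-02 12:00", "2024-01-01 06:00"]
    ),
    "y": [1.0, 2.0, 3.0],
})

# Truncate each timestamp to its day and render it as text.
day = df["time"].dt.floor("D").dt.strftime("%Y-%m-%d")

# Combine id and day into one string key (separator chosen arbitrarily).
flat = df.assign(unique_id=df["id"].astype(str) + "_" + day)
```

The downside is that the key has to be split again to recover the id and the day, which is another reason passing a Grouper directly would be nicer.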
For reference, the groupby call quoted at the top is at tsfeatures/tsfeatures/tsfeatures.py, line 916 in 5ce2ba7.