The dataset comes from the following Kaggle challenge:
https://www.kaggle.com/c/acquire-valued-shoppers-challenge
Note that a single table is enough to reproduce the error:
import pandas as pd
import auto_smart
import os.path
import time
import datetime
PREPROC = True
NROWS = None
TARGET = 'repeater'
DATE = 'offerdate'
if PREPROC:
    # train & target
    df_tr = pd.read_csv(os.path.join('data', 'train', 'trainHistory.csv'), nrows=NROWS)
    df_tr_lbl = df_tr[[TARGET]].copy()  # copy so the assignment below does not hit a slice of df_tr
    df_tr_lbl[TARGET] = df_tr_lbl[TARGET].map({'f': 0, 't': 1})
    df_tr_lbl = df_tr_lbl.rename(columns={TARGET: 'label'})
    df_tr_lbl.to_csv(os.path.join('data', 'train', 'main_train.solution'), index=False)
    df_tr = df_tr[df_tr.columns.difference([TARGET])]
    df_tr = df_tr.drop(['repeattrips'], axis=1)
    # convert 'YYYY-MM-DD' strings to Unix timestamps (local time)
    df_tr[DATE] = df_tr[DATE].apply(lambda s: time.mktime(datetime.datetime.strptime(s, '%Y-%m-%d').timetuple()))
    df_tr.to_csv(os.path.join('data', 'train', 'main_train.data'), index=False, sep='\t')
    # offers:
    # df_of = pd.read_csv(os.path.join('data', 'train', 'offers.csv'), nrows=NROWS)
    # df_of.to_csv(os.path.join('data', 'train', 'offers.data'), index=False, sep='\t')
    # transactions:
    # df_txs = pd.read_csv(os.path.join('data', 'train', 'transactions.csv'), nrows=NROWS)
    # df_txs['date'] = df_txs['date'].apply(lambda s: time.mktime(datetime.datetime.strptime(s, '%Y-%m-%d').timetuple()))
    # df_txs.to_csv(os.path.join('data', 'train', 'transactions.data'), index=False, sep='\t')
    # test:
    df_te = pd.read_csv(os.path.join('data', 'test', 'testHistory.csv'), nrows=NROWS)
    df_te[DATE] = df_te[DATE].apply(lambda s: time.mktime(datetime.datetime.strptime(s, '%Y-%m-%d').timetuple()))
    df_te.to_csv(os.path.join('data', 'test', 'main_test.data'), index=False, sep='\t')
print('info...')
info = auto_smart.read_info('data')
print('train...')
train_data, train_label = auto_smart.read_train('data', info)
print('test...')
test_data = auto_smart.read_test('data', info)
print('model...')
prd = auto_smart.train_and_predict(train_data, train_label, info, test_data)
print('finalizing...')
prd_df = pd.read_csv('sampleSubmission.csv')
prd_df['repeatProbability'] = prd
prd_df.to_csv('predictions.csv', index=False)
with the following JSON configuration:
{
  "time_budget": 300,
  "time_col": "offerdate",
  "start_time": 1550654179,
  "tables": {
    "main": {
      "id": "cat",
      "chain": "cat",
      "offer": "cat",
      "market": "cat",
      "offerdate": "time"
    }
  },
  "relations": []
}
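Before blaming the library, it can help to rule out a mismatch between this configuration and the preprocessed table. The following is only a rough sketch: `check_info` is a hypothetical helper, not part of auto_smart, and the column list is assumed from the script above (the columns left after dropping `repeater` and `repeattrips`).

```python
import json

# The configuration from the report, embedded so the sketch is self-contained.
CONFIG_TEXT = """
{
  "time_budget": 300,
  "time_col": "offerdate",
  "start_time": 1550654179,
  "tables": {
    "main": {
      "id": "cat",
      "chain": "cat",
      "offer": "cat",
      "market": "cat",
      "offerdate": "time"
    }
  },
  "relations": []
}
"""


def check_info(info, data_columns):
    """Hypothetical helper: list mismatches between the config and the data columns."""
    problems = []
    main = info.get("tables", {}).get("main", {})
    time_col = info.get("time_col")
    # The time column should be declared with type "time" in the main table.
    if main.get(time_col) != "time":
        problems.append("time_col %r is not typed 'time' in 'main'" % time_col)
    # Columns present in the data but not described in the config.
    for name in sorted(set(data_columns) - set(main)):
        problems.append("column %r is missing from the config" % name)
    # Columns described in the config but absent from the data.
    for name in sorted(set(main) - set(data_columns)):
        problems.append("config column %r is missing from the data" % name)
    return problems


info = json.loads(CONFIG_TEXT)
# Assumed column set of main_train.data after preprocessing.
columns = ["chain", "id", "market", "offer", "offerdate"]
problems = check_info(info, columns)
print(problems)
```

Under these assumptions the check finds no mismatch, which is consistent with the error originating inside the library rather than in the configuration.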
I got the following error:
'New categorical_feature is {}'.format(sorted(list(categorical_feature))))
--------------------total feat num:22, drop feat num:0
----------------End [LGBFeatureSelectionWait.fit]. Time elapsed: 0.56 sec.
----------------End time: 2020-02-11 06:21:35
----------------Start [LGBFeatureSelectionWait.transform]:
----------------Start time: 2020-02-11 06:21:35
----------------End [LGBFeatureSelectionWait.transform]. Time elapsed: 0.00 sec.
----------------End time: 2020-02-11 06:21:35
------------End [LGBFeatureSelectionWait.fit_transform]. Time elapsed: 0.56 sec.
------------End time: 2020-02-11 06:21:35
--------End [FeatEngine.fit_transform_keys_order2]. Time elapsed: 0.56 sec.
--------End time: 2020-02-11 06:21:35
--------Start [FeatEngine.fit_transform_keys_order3]:
--------Start time: 2020-02-11 06:21:35
Traceback (most recent call last):
  File "/home/mglowacki/Desktop/AVR_kaggle/autosmart_avr.py", line 61, in <module>
    prd = auto_smart.train_and_predict(train_data, train_label, info, test_data)
  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/auto_smart/__init__.py", line 71, in train_and_predict
    return cmodel.predict(test_data)
  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/auto_smart/util.py", line 38, in timed
    result = method(*args, **kw)
  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/auto_smart/model.py", line 358, in predict
    self.my_fit(self.Xs, self.y, X_test)
  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/auto_smart/util.py", line 38, in timed
    result = method(*args, **kw)
  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/auto_smart/model.py", line 156, in my_fit
    feat_engine.fit_transform_keys_order3(main_table,y)
  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/auto_smart/util.py", line 38, in timed
    result = method(*args, **kw)
  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/auto_smart/feat_engine.py", line 143, in fit_transform_keys_order3
    for feat_cls in self.feat_pipeline.keys_order3s:
AttributeError: 'DefaultFeatPipeline' object has no attribute 'keys_order3s'
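The last frame boils down to iterating over an attribute that was never assigned. Here is a minimal sketch of that failure mode, using a hypothetical stand-in class rather than the real auto_smart code; the placeholder list entries and the helper `missing_attrs` are illustrative assumptions.

```python
class DefaultFeatPipeline:
    """Hypothetical stand-in: sets keys_order2s but never keys_order3s."""

    def __init__(self):
        # Placeholder entries; the real pipeline holds feature classes here.
        self.keys_order2s = ["placeholder_feat_a", "placeholder_feat_b"]


def fit_transform_keys_order3(pipeline):
    """Mimics the loop shown in the last traceback frame."""
    return [feat_cls for feat_cls in pipeline.keys_order3s]


def missing_attrs(pipeline, names):
    """Hypothetical diagnostic: which expected attributes were never set?"""
    return [name for name in names if not hasattr(pipeline, name)]


pipeline = DefaultFeatPipeline()
try:
    fit_transform_keys_order3(pipeline)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

missing = missing_attrs(pipeline, ["keys_order2s", "keys_order3s"])
print(error_message)
print(missing)
```

A check like `missing_attrs` could be run against the real pipeline object to confirm which attributes the installed version defines, without waiting for the full training run to reach that step.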
This is an auto_smart issue: I've checked the file auto_smart/feat/feat_pipeline.py and there is no self.keys_order3s = ... assignment.
A "stop-error" workaround for a single table is to set self.keys_order3s to self.keys_order2s, but a different error (about a signature mismatch) appears when you add the offers table, and the workaround doesn't look right to me anyway. The additional error could be caused by this workaround or could be completely independent.