Channel: Active questions tagged python - Stack Overflow
How to use featuretools at the test time?

I will demonstrate the issue with an example.

Let us say we want to use the primitive 'PERCENTILE'.

Imports:

import pandas as pd
import featuretools as ft

For training, create a simple dataset with one value column and let featuretools compute a percentile feature on top of it:

df_train = pd.DataFrame({'index': [1, 2, 3, 4, 5], 'val': [1, 2, 3, 4, 5]})
es_train = ft.EntitySet("es_train")
es_train.add_dataframe(df_train, 'df')
fm, fl = ft.dfs(entityset=es_train,
                trans_primitives=['percentile'],
                agg_primitives=[],
                target_dataframe_name='df')

output:

print(fm)

       val  PERCENTILE(val)
index
1        1              0.2
2        2              0.4
3        3              0.6
4        4              0.8
5        5              1.0

So far, everything is as expected.

Now suppose that at test time I get an example with the value 3. I would want it translated to 0.6, as per the training data, but that is not what happens:

df_test = pd.DataFrame({'index': [1], 'val': [3]})
es_test = ft.EntitySet("es_test")
es_test.add_dataframe(df_test, 'df')
ft.calculate_feature_matrix(features=fl, entityset=es_test)

output:

       val  PERCENTILE(val)
index
1        3              1.0

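So the feature definitions in fl, as returned by ft.dfs, do not appear to store the training-time statistics needed to compute the features at test time. The percentile seems to be recomputed on the test data alone, which is why a single test row gets 1.0. A machine-learning model trained on the training-time percentiles would then receive inconsistent inputs at test time.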
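To make the desired behavior concrete, here is a minimal sketch of a percentile transform that remembers the training data. It is written outside featuretools, using only the standard library. The class name TrainPercentile and its fit/transform methods are hypothetical names chosen for illustration, not part of the featuretools API. It defines the percentile as the fraction of training values less than or equal to the input, which matches the training output above when all values are distinct.

```python
from bisect import bisect_right


class TrainPercentile:
    """Hypothetical helper: percentile of a value relative to training data."""

    def fit(self, train_values):
        # Keep a sorted copy of the training column; this is the
        # train-time state that the test-time computation needs.
        self.sorted_ = sorted(train_values)
        return self

    def transform(self, values):
        n = len(self.sorted_)
        # For each value, the fraction of training values <= that value.
        return [bisect_right(self.sorted_, v) / n for v in values]


pct = TrainPercentile().fit([1, 2, 3, 4, 5])
train_pct = pct.transform([1, 2, 3, 4, 5])
test_pct = pct.transform([3])
print(train_pct)  # [0.2, 0.4, 0.6, 0.8, 1.0]
print(test_pct)   # [0.6]
```

With this kind of fitted state, the test value 3 maps to 0.6 no matter how many rows the test set contains.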

What is the canonical way to apply featuretools at test time?

