Expression translator should support multi-dimensional array indexing syntax #15

Description

@AbdealiLoKo

Hi, I have a scenario where I need to use an array as an input column to my pipeline.
I've reduced the issue I'm having to a minimal example:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.compose import ColumnTransformer
from sklearn2pmml.preprocessing import ExpressionTransformer

df = pd.DataFrame({'c1': [1, 2, 3], 'c2': [[1,2], [1,2], [3,1]]})

pipeline = make_pipeline(
    ColumnTransformer(
        transformers=[
            ('get_item_0_from_c2_array', ExpressionTransformer('X["c2"][0]'), ['c2'])
        ]
    ),
    LogisticRegression(),
)
pipeline.fit(df, [0, 0, 1])
pipeline.predict(df)

The above pipeline works fine in my Jupyter notebook, but converting it to PMML gives an error:

import sklearn2pmml

pmml_pipeline = sklearn2pmml.PMMLPipeline(steps=[
    ('pipeline', pipeline)
])

sklearn2pmml.sklearn2pmml(pmml_pipeline, './pipeline.pmml', debug=True)

The conversion fails with:

java.lang.IllegalArgumentException: Python expression 'X["c2"][0]' is either invalid or not supported
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:36)
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:23)
	at sklearn2pmml.preprocessing.ExpressionTransformer.encodeFeatures(ExpressionTransformer.java:51)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.compose.ColumnTransformer.encodeFeatures(ColumnTransformer.java:63)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn.Composite.encodeModel(Composite.java:135)
	at sklearn.pipeline.PipelineClassifier.encodeModel(PipelineClassifier.java:86)
	at sklearn.Estimator.encode(Estimator.java:103)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:233)
	at org.jpmml.sklearn.Main.run(Main.java:217)
	at org.jpmml.sklearn.Main.main(Main.java:143)
Caused by: org.jpmml.python.ParseException: Encountered unexpected token: "]" "]"
    at line 1, column 10.

Was expecting one of:

    ":"

	at org.jpmml.python.ExpressionTranslator.generateParseException(ExpressionTranslator.java:2110)
	at org.jpmml.python.ExpressionTranslator.jj_consume_token(ExpressionTranslator.java:1973)
	at org.jpmml.python.ExpressionTranslator.StringSlicingExpression(ExpressionTranslator.java:956)
	at org.jpmml.python.ExpressionTranslator.PrimaryExpression(ExpressionTranslator.java:637)
	at org.jpmml.python.ExpressionTranslator.UnaryExpression(ExpressionTranslator.java:597)
	at org.jpmml.python.ExpressionTranslator.MultiplicativeExpression(ExpressionTranslator.java:538)
	at org.jpmml.python.ExpressionTranslator.AdditiveExpression(ExpressionTranslator.java:494)
	at org.jpmml.python.ExpressionTranslator.ComparisonExpression(ExpressionTranslator.java:434)
	at org.jpmml.python.ExpressionTranslator.NegationExpression(ExpressionTranslator.java:389)
	at org.jpmml.python.ExpressionTranslator.LogicalAndExpression(ExpressionTranslator.java:359)
	at org.jpmml.python.ExpressionTranslator.LogicalOrExpression(ExpressionTranslator.java:338)
	at org.jpmml.python.ExpressionTranslator.IfElseExpression(ExpressionTranslator.java:319)
	at org.jpmml.python.ExpressionTranslator.Expression(ExpressionTranslator.java:312)
	at org.jpmml.python.ExpressionTranslator.translateExpressionInternal(ExpressionTranslator.java:306)
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:34)
	... 12 more
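
Reading the trace, the parser appears to be inside StringSlicingExpression when it fails, and it expects ":" at column 10, which is the final "]" of the expression. My guess is that a second bracket pair is currently only accepted as a string slice, not as a second index. For comparison, here is a small sketch of how CPython itself parses the same expression, using only the standard `ast` module. It assumes Python 3.9 or later (where the index is stored directly in `Subscript.slice`) and only illustrates the Python grammar; it says nothing definite about the JPMML-Python internals.

```python
import ast

expr = 'X["c2"][0]'

# Column 10 in the ParseException is 1-based, so it is expr[9].
failing_char = expr[10 - 1]

# Parse the expression the way CPython does.
tree = ast.parse(expr, mode="eval")

outer = tree.body      # the whole expression: X["c2"][0]
inner = outer.value    # the indexed value:     X["c2"]

# Both levels are plain Subscript nodes with a constant index.
outer_kind = type(outer).__name__
inner_kind = type(inner).__name__
outer_index = outer.slice.value   # index applied last
inner_index = inner.slice.value   # index applied first
base_name = inner.value.id        # the variable being indexed

print(failing_char)
print(outer_kind, outer_index)
print(inner_kind, inner_index)
print(base_name)
```

So in Python terms this is just a Subscript applied to another Subscript, with the column lookup `"c2"` evaluated first and the element index `0` applied to its result.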
