
HiGEPS Presentation Code (2) in 11/22 #6

Description

@Integral-Snake

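After imputing missing values, the properties of the imputation itself can be exploited to generate new columns.

These columns may turn out to be strongly correlated with the target.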
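One way to check whether a generated column actually carries signal is to rank columns by their correlation with the target. The following is only a minimal sketch: the helper name `rank_features_by_target_correlation`, the synthetic data, and the choice of Pearson correlation are all assumptions for illustration and do not come from the issue.

```python
import numpy as np
import pandas as pd


def rank_features_by_target_correlation(features: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Return |Pearson r| between each column and the target, largest first."""
    # corrwith computes the column-wise Pearson correlation against y (aligned on index)
    return features.corrwith(y).abs().sort_values(ascending=False)


# Synthetic data for illustration only (hypothetical, not from the issue)
rng = np.random.default_rng(0)
y = pd.Series(rng.normal(size=200), name="target")
features = pd.DataFrame({
    "informative": y + rng.normal(scale=0.1, size=200),  # built to track the target
    "noise": rng.normal(size=200),                       # unrelated to the target
})
ranking = rank_features_by_target_correlation(features, y)
print(ranking)
```

Pearson correlation only captures linear relationships, so a low value does not rule out a useful non-linear feature.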

import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

Run KNN imputation

imputer = KNNImputer(n_neighbors=5)
X_imputed = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)

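Generate a local similarity index (difference between the imputed values and the neighbor mean)

from sklearn.neighbors import NearestNeighbors
nn = NearestNeighbors(n_neighbors=6).fit(X_imputed)  # 5 neighbors + the row itself
_, idx = nn.kneighbors(X_imputed)
neighbors_mean = X_imputed.to_numpy()[idx[:, 1:]].mean(axis=1)  # drop column 0 (normally the row itself)
local_similarity = np.abs(X_imputed.to_numpy() - neighbors_mean).mean(axis=1)
X_imputed["local_similarity_index"] = local_similarity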

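Uncertainty score from multiple imputation

imputed_sets = [KNNImputer(n_neighbors=k).fit_transform(X) for k in (3, 5, 7, 9, 11)]  # KNNImputer is deterministic, so k is varied here (an assumed choice)
variance_score = np.var(imputed_sets, axis=0).mean(axis=1)
X_imputed["imputation_uncertainty_score"] = variance_score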
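`KNNImputer` returns the same result every time it is fitted on the same data, so a variance across repeated fits only becomes informative when the imputations actually differ. One possible alternative, sketched here under assumptions, is a stochastic imputer: scikit-learn's `IterativeImputer` with `sample_posterior=True` and a different `random_state` per draw. This is a swapped-in technique, not the method used in the issue, and the helper name `imputation_uncertainty` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables IterativeImputer)
from sklearn.impute import IterativeImputer


def imputation_uncertainty(X: pd.DataFrame, n_imputations: int = 5) -> pd.Series:
    """Per-row variance across several stochastic imputations of X."""
    draws = [
        IterativeImputer(sample_posterior=True, random_state=seed).fit_transform(X)
        for seed in range(n_imputations)
    ]
    # Stack to (n_imputations, n_rows, n_cols); variance over draws, then mean over columns
    per_cell_var = np.var(np.stack(draws), axis=0)
    return pd.Series(per_cell_var.mean(axis=1), index=X.index,
                     name="imputation_uncertainty_score")


# Toy data for illustration only (hypothetical, not from the issue)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
X.iloc[:10, 1] = np.nan  # hide column "b" in the first 10 rows
score = imputation_uncertainty(X)
```

Rows without missing values keep their observed values in every draw, so their score is zero; only rows that were actually imputed receive a positive score.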
