Skip to content

Copilot based solution for some of the tasks.#2

Open
Michal-Fularz wants to merge 1 commit intomainfrom
feature/copilot-tests
Open

Copilot based solution for some of the tasks.#2
Michal-Fularz wants to merge 1 commit intomainfrom
feature/copilot-tests

Conversation

@Michal-Fularz
Copy link
Copy Markdown
Member

No description provided.

@Michal-Fularz
Copy link
Copy Markdown
Member Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request expands the machine learning lab script with new functions for anomaly detection, hyperparameter tuning, and ensemble methods. The review feedback identifies a critical issue in the LocalOutlierFactor implementation, which incorrectly uses fit_predict on test data and requires the novelty=True parameter to function correctly for model comparison. Additionally, the reviewer recommends refactoring the code to avoid redundant network requests for the same dataset and removing large blocks of commented-out code to improve maintainability.

# Define the anomaly detection methods
methods = {
'IsolationForest': ensemble.IsolationForest(contamination=0.1, random_state=42),
'LocalOutlierFactor': neighbors.LocalOutlierFactor(n_neighbors=20),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

To use LocalOutlierFactor for novelty detection (predicting on unseen data like the test set or the mesh grid), you must set novelty=True. Without this parameter, the model does not provide a predict method for new samples.

Suggested change
'LocalOutlierFactor': neighbors.LocalOutlierFactor(n_neighbors=20),
'LocalOutlierFactor': neighbors.LocalOutlierFactor(n_neighbors=20, novelty=True),

Comment on lines +314 to +327
if method_name == 'LocalOutlierFactor':
method.fit(X_train_features)
y_pred_train = method.fit_predict(X_train_features)
y_pred_test = method.fit_predict(X_test_features)
else:
method.fit(X_train_features)
y_pred_train = method.predict(X_train_features)
y_pred_test = method.predict(X_test_features)

# Predict on mesh
if method_name == 'LocalOutlierFactor':
Z = method.fit_predict(np.c_[xx.ravel(), yy.ravel()])
else:
Z = method.predict(np.c_[xx.ravel(), yy.ravel()])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current logic for LocalOutlierFactor is incorrect for a model comparison task. Calling fit_predict on the test set or the mesh grid ignores the training data and fits a new model on those specific samples. By setting novelty=True in the constructor, you can use the standard fit and predict methods consistently across all anomaly detection algorithms.

        method.fit(X_train_features)
        y_pred_test = method.predict(X_test_features)
        
        # Predict on mesh
        Z = method.predict(np.c_[xx.ravel(), yy.ravel()])


print_function_name()

X, y = datasets.fetch_openml('diabetes', as_frame=True, return_X_y=True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The 'diabetes' dataset is fetched from OpenML multiple times across different functions (e.g., here and in todo_6). This is inefficient and makes the script dependent on a stable internet connection. Consider fetching the data once and passing it as an argument or using a caching mechanism.

Comment on lines +231 to +266
# print("\n\nLocalOutlierFactor with n_neighbors=20")
# lof = neighbors.LocalOutlierFactor(n_neighbors=20)
# y_pred_lof = lof.fit_predict(X_test_features)
#
# plt.figure(figsize=(10, 6))
# plt.scatter(
# X_test_features[y_pred_lof == 1, 0],
# X_test_features[y_pred_lof == 1, 1],
# c='green', label='Inliers', alpha=0.6
# )
# plt.scatter(
# X_test_features[y_pred_lof == -1, 0],
# X_test_features[y_pred_lof == -1, 1],
# c='orange', label='Outliers', alpha=0.6
# )
#
# Create decision boundary for LOF using contour
# h = 0.5
# x_min, x_max = X_test_features[:, 0].min() - 1, X_test_features[:, 0].max() + 1
# y_min, y_max = X_test_features[:, 1].min() - 1, X_test_features[:, 1].max() + 1
#
# xx, yy = np.meshgrid(
# np.arange(x_min, x_max, h),
# np.arange(y_min, y_max, h)
# )
#
# Z_lof = lof.fit_predict(np.c_[xx.ravel(), yy.ravel()])
# Z_lof = Z_lof.reshape(xx.shape)
#
# plt.contourf(xx, yy, Z_lof, alpha=0.3, levels=[-1, 0], colors=['orange'])
# plt.contourf(xx, yy, Z_lof, alpha=0.1, levels=[0, 1], colors=['green'])
#
# plt.xlabel('plas')
# plt.ylabel('mass')
# plt.title('LocalOutlierFactor (n_neighbors=20)')
# plt.legend()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This large block of commented-out code should be removed to improve readability. If these experiments are intended to be optional or for future reference, consider moving them to a separate script or a dedicated function. Note that the code also contains a bug in the LocalOutlierFactor usage (missing novelty=True).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant