Predicting Football Player Positions: A Dynamic Streamlit Implementation - A repository for the final submission in the Computing for Data Science class at BSE.

m9o8/cfds_final

Computing for Data Science Final

A repository for the final submission in the Computing for Data Science class at BSE.

Group members:

  • Hannes Schiemann
  • Julian Romero
  • Lucia Sauer
  • Moritz Peist

Repository structure

  • api - FastAPI with one endpoint to make predictions.
  • app - Streamlit application folder for demonstration.
  • data - Contains the training data.
  • custom_library
    • src
      • fifa_library
        • eda - Quick exploratory analysis of the data.
        • model - Whole pipeline including feature creation, preprocessing, model fitting, hyperparameter tuning, model evaluation, and prediction.
        • tests - Unit tests for preprocessing and feature creation.

How to Run the Project

Prerequisites

Ensure you have Docker Desktop installed on your system. It provides the Docker environment required to build and run the services.

Steps to Run

  1. Open your terminal and navigate to the project folder containing the docker-compose.yml file.

  2. Execute the following command:

    docker-compose up

    This command will build the Docker images (if not already built) and start the services.

Accessing the Services

Once the services are running, you can access them via your browser:

The Fifa Library

This section outlines the design philosophy, scalability considerations, and best practices for contributing to the project's library. It also provides guidelines for adapting the library to work with different versions of the FIFA dataset.

Overview

The project is structured around a pipeline of well-defined, modular preprocessors. Each preprocessor is responsible for a specific set of transformations on the input data. By adhering to clear interface contracts and separation of concerns, the library remains extensible and maintainable as it evolves.

Installing Fifa Library

To run the main.ipynb notebook, create a new environment and install both the dependencies in requirements.txt and the custom FIFA library. To install the custom library locally, follow these steps:

  1. Open a terminal and navigate to the custom_library directory:

    cd custom_library
  2. Install the library using pip:

    pip install .

This will install the library and make its modules available for use in your Python environment. Ensure that your Python environment is activated before running the command.
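A quick import check can confirm that the installation is visible to the active environment. The sketch below assumes the import name is fifa_library, taken from the folder name in the repository structure; adjust it if the installed package is named differently.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the active interpreter can locate the package."""
    return importlib.util.find_spec(package_name) is not None


# "fifa_library" is assumed from the folder layout, not confirmed elsewhere.
print("fifa_library installed:", is_installed("fifa_library"))
```

If this prints False, the most likely cause is that pip ran in a different environment than the one the notebook uses.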

Data and Aim

The data set comes from EA's football video game FIFA 2021. The aim is to predict each player's position. The original data set distinguishes 24 fine-grained positions. For simplicity, we reduced these to 8 that reflect the key positions in the game:

Football pitch
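Reducing 24 positions to 8 amounts to a many-to-one lookup. The sketch below shows the idea only: the position codes are a small illustrative subset and the group labels are hypothetical, since the project's actual grouping is the one shown in the pitch figure.

```python
# Hypothetical mapping from fine-grained position codes to coarser groups.
# Neither the codes covered nor the group names are taken from the project.
POSITION_GROUPS = {
    "LB": "full_back",
    "RB": "full_back",
    "LWB": "full_back",
    "RWB": "full_back",
    "LCB": "centre_back",
    "CB": "centre_back",
    "RCB": "centre_back",
    "LDM": "defensive_midfield",
    "CDM": "defensive_midfield",
    "RDM": "defensive_midfield",
}


def group_position(code: str) -> str:
    """Map a fine-grained position code to its coarse group."""
    try:
        return POSITION_GROUPS[code.upper()]
    except KeyError:
        # Fail loudly so unmapped codes are noticed instead of silently dropped.
        raise ValueError(f"Unmapped position code: {code!r}") from None


print(group_position("lwb"))
```

Raising on unknown codes is a deliberate choice here: a new dataset version with extra positions would then surface immediately instead of producing missing labels.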

Design Principles

1. Modularity:
Each transformation step should be encapsulated in its own class. This ensures:

  • Clear separation of responsibilities.
  • Easier debugging, testing, and maintenance.
  • The ability to mix and match preprocessors as needed.

2. Scalability:
The codebase is expected to grow in terms of:

  • Number of Preprocessors: As new feature engineering ideas are introduced, they can be implemented as new classes or integrated into existing ones.
  • Variety of Features: The FeatureCreation class uses feature flags to enable or disable certain transformations. Adding a new feature involves:
    • Defining it in a separate method.
    • Adding a feature flag.
    • Updating get_feature_names_out and the input validation logic accordingly.
  • Number of Models and Metrics: Although the FeatureCreation class focuses on feature engineering, the same principles apply across the pipeline. Models and metrics should follow a similar, modular approach.
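The three steps for adding a feature can be sketched as follows. This is a simplified, dependency-free stand-in and not the library's actual class: it works on plain dictionaries instead of a DataFrame, and the stamina_balance flag and its ratio are hypothetical. Only the method names _validate_feature_requirements and get_feature_names_out and the two column names come from this README.

```python
class FeatureCreationSketch:
    """Simplified, hypothetical stand-in for the FeatureCreation class."""

    # Columns each flag depends on; consulted by the input validation.
    REQUIRED_COLUMNS = {
        "stamina_balance": ["power_stamina", "movement_balance"],
    }

    def __init__(self, feature_flags=None):
        # Step 2: the new feature gets its own flag (enabled by default here).
        defaults = {"stamina_balance": True}
        self.feature_flags = {**defaults, **(feature_flags or {})}

    def _validate_feature_requirements(self, row):
        # Step 3a: only enabled features impose column requirements.
        for flag, columns in self.REQUIRED_COLUMNS.items():
            if not self.feature_flags.get(flag):
                continue
            missing = [c for c in columns if c not in row]
            if missing:
                raise ValueError(f"Flag {flag!r} requires missing columns: {missing}")

    # Step 1: the feature lives in its own method.
    def _add_stamina_balance(self, row):
        row["stamina_balance_ratio"] = row["power_stamina"] / row["movement_balance"]

    def transform(self, rows):
        transformed = []
        for row in rows:
            self._validate_feature_requirements(row)
            new_row = dict(row)  # leave the caller's data untouched
            if self.feature_flags["stamina_balance"]:
                self._add_stamina_balance(new_row)
            transformed.append(new_row)
        return transformed

    # Step 3b: reported feature names follow the active flags.
    def get_feature_names_out(self, input_features):
        names = list(input_features)
        if self.feature_flags["stamina_balance"]:
            names.append("stamina_balance_ratio")
        return names


players = [{"power_stamina": 80, "movement_balance": 40}]
creator = FeatureCreationSketch()
print(creator.transform(players))
print(creator.get_feature_names_out(["power_stamina", "movement_balance"]))
```

Keeping the flag, the method, the validation entry, and the output name in step with each other is what lets a feature be switched off cleanly when its input columns are unavailable.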

3. Flexibility and Extensibility:
The FeatureCreation class allows customization through:

  • Feature Flags: Users can enable or disable sets of transformations.
  • Traits Mapping: Users can provide a custom traits_mapping dictionary to handle different trait sets or adapt to changes in the dataset’s trait definitions.
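To illustrate what a custom traits mapping might do, the sketch below turns a comma-separated trait string into indicator features. The shape of the mapping (trait name to feature name), the example traits, and the encode_traits helper are all assumptions made for illustration; the library's real traits_mapping format may differ.

```python
def encode_traits(player_traits, traits_mapping):
    """Turn a comma-separated trait string into 0/1 indicator features."""
    present = {t.strip() for t in (player_traits or "").split(",") if t.strip()}
    # Start every mapped feature at 0 so all players share the same columns.
    features = {name: 0 for name in traits_mapping.values()}
    for trait, feature_name in traits_mapping.items():
        if trait in present:
            features[feature_name] = 1
    return features


# Hypothetical mapping: several raw traits may feed one indicator feature.
custom_mapping = {
    "Finesse Shot": "trait_shooting",
    "Long Shot Taker (AI)": "trait_shooting",
    "Speed Dribbler (AI)": "trait_dribbling",
}

print(encode_traits("Finesse Shot, Power Header", custom_mapping))
```

Traits that do not appear in the mapping are ignored, so adapting to a new trait set only requires editing the dictionary.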

4. Consistency with scikit-learn Conventions:
By inheriting from BaseEstimator and TransformerMixin, the FeatureCreation class integrates seamlessly into a scikit-learn pipeline. This ensures:

  • Familiar APIs for users.
  • Compatibility with tools like GridSearchCV, Pipeline, and FeatureUnion.
  • Easy extension, and scaling up or down, of any step within the pipeline.
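A minimal sketch of this integration, assuming pandas and scikit-learn are installed. RatioFeature is a hypothetical stand-in with a single flag, not the library's FeatureCreation class, and the ratio it computes is illustrative.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class RatioFeature(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that adds one flag-controlled feature."""

    def __init__(self, feature_flags=None):
        # scikit-learn convention: store constructor arguments unchanged
        # so that get_params, set_params, and clone work as expected.
        self.feature_flags = feature_flags

    def fit(self, X, y=None):
        flags = self.feature_flags or {}
        self.ratio_enabled_ = flags.get("ratio_features", True)
        return self

    def transform(self, X):
        X = X.copy()  # avoid mutating the caller's DataFrame
        if self.ratio_enabled_:
            X["stamina_balance_ratio"] = X["power_stamina"] / X["movement_balance"]
        return X


players = pd.DataFrame(
    {"power_stamina": [80.0, 60.0, 90.0], "movement_balance": [40.0, 60.0, 45.0]}
)

pipeline = Pipeline(
    [
        ("features", RatioFeature(feature_flags={"ratio_features": True})),
        ("scale", StandardScaler()),
    ]
)
scaled = pipeline.fit_transform(players)
print(scaled.shape)
```

Because the flags are an ordinary constructor parameter, they can be changed through the standard step__parameter syntax, for example pipeline.set_params(features__feature_flags={"ratio_features": False}), which is also how a grid search would vary them.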

How the FeatureCreation Class Was Implemented

• Feature Flags:
The class accepts a feature_flags dictionary at initialization. This dictionary controls which features are computed. For example:

feature_flags = {
    "ratio_features": True,
    "foot_pace": True,
    "wide_player": True,
    "playmaker": True,
    "traits": True,
}

Adapting to different versions of FIFA

In general, we expect the structure of player stats not to change dramatically between versions, so in principle the library should remain applicable. However, new attributes may be introduced, old ones removed, or names changed. To adapt the library to a new version:

• Check for Renamed or Missing Columns: Review the dataset schema. If attributes like power_stamina or movement_balance are renamed or removed, you must:

  • Update the FeatureCreation class’s input validation (_validate_feature_requirements) to reflect the new names.
  • Adjust the feature computation methods to use the correct columns.
    • Alternatively, turn off the feature creation that relies on non-existent attributes.

• Player_traits: We expect the encoding of player_traits to be especially sensitive to a change of FIFA version. If new traits are added or old ones removed, the trait mapping can be updated and passed in as a custom traits_mapping dictionary.
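For the option of turning features off when their columns are missing, a helper along the following lines could derive the adjusted flags from a new dataset's schema. The flag names come from the feature_flags example in this README, but the flag-to-column requirements and the disable_unsupported_flags helper are assumptions made for illustration.

```python
def disable_unsupported_flags(feature_flags, required_columns, available_columns):
    """Return a copy of the flags with unsupported features switched off."""
    available = set(available_columns)
    adjusted = {}
    for flag, enabled in feature_flags.items():
        needed = required_columns.get(flag, [])
        # A flag stays on only if it was on and all of its columns exist.
        adjusted[flag] = bool(enabled) and all(col in available for col in needed)
    return adjusted


# Hypothetical requirements; the real ones live in the library's validation.
required = {
    "ratio_features": ["power_stamina", "movement_balance"],
    "traits": ["player_traits"],
}
flags = {"ratio_features": True, "traits": True}
new_schema = ["power_stamina", "player_traits"]  # movement_balance was removed

print(disable_unsupported_flags(flags, required, new_schema))
```

The adjusted dictionary can then be passed as feature_flags, so the remaining features keep working while the renamed or removed columns are investigated.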

Testing

For now, tests cover only preprocessing and feature engineering, where they reach 100% coverage. To run the tests:

  1. Navigate to the folder custom_library

  2. Execute the following command in the terminal:

    pytest --cov --cov-report=html:coverage_re
  3. This outputs a coverage report in HTML format under coverage_re
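As a rough idea of what a unit test for feature creation can look like, the sketch below tests a hypothetical ratio function with plain assertions. The function and both tests are illustrative and are not taken from the project's tests folder; pytest collects any function whose name starts with test_.

```python
def stamina_balance_ratio(row):
    """Hypothetical feature under test."""
    missing = [c for c in ("power_stamina", "movement_balance") if c not in row]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return row["power_stamina"] / row["movement_balance"]


def test_ratio_is_computed():
    assert stamina_balance_ratio({"power_stamina": 80, "movement_balance": 40}) == 2.0


def test_missing_column_is_reported():
    try:
        stamina_balance_ratio({"power_stamina": 80})
    except ValueError as error:
        # The error message should name the column that is absent.
        assert "movement_balance" in str(error)
    else:
        raise AssertionError("expected a ValueError for the missing column")
```

Covering both the normal path and the validation failure is what pushes line coverage toward the reported figure, since each branch is exercised at least once.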
