A repository for the final submission in the Computing for Data Science class at BSE.
Group members:
• Hannes Schiemann
• Julian Romero
• Lucia Sauer
• Moritz Peist
- `api` - FastAPI with one endpoint to make predictions.
- `app` - Streamlit application folder for demonstration.
- `data` - Contains the training data.
- `custom_library`
  - `src`
    - `fifa_library`
      - `eda` - Quick exploratory analysis of the data.
      - `model` - Whole pipeline including feature creation, preprocessing, model fitting, hyperparameter tuning, model evaluation, and prediction.
      - `tests` - Unit tests for preprocessing and feature creation.
Ensure you have Docker Desktop installed on your system. It provides the Docker environment required to build and run the services.
1. Open your terminal and navigate to the project folder containing the `docker-compose.yml` file.
2. Execute the following command:

   ```
   docker-compose up
   ```

   This command will build the Docker images (if not already built) and start the services.
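For orientation, a `docker-compose.yml` for these two services typically looks something like the sketch below. The build contexts and service names are assumptions based on the repository layout; the repository's actual file is authoritative.

```yaml
services:
  api:
    build: ./api        # assumed build context; adjust to the repo's layout
    ports:
      - "8000:8000"     # FastAPI / Swagger UI
  app:
    build: ./app
    ports:
      - "8501:8501"     # Streamlit
```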
Once the services are running, you can access them via your browser:
- Streamlit App: Access the Streamlit application at http://localhost:8501.
- FastAPI Swagger UI: Explore the FastAPI endpoints using the Swagger UI at http://localhost:8000/docs.
The following outlines the design philosophy, scalability considerations, and best practices for contributing to the project's library, as well as guidelines for adapting the library to different versions of the FIFA dataset.
The project is structured around a pipeline of well-defined, modular preprocessors. Each preprocessor is responsible for a specific set of transformations on the input data. By adhering to clear interface contracts and separation of concerns, the library remains extensible and maintainable as it evolves.
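The modular-preprocessor idea can be sketched as follows. This is a minimal, self-contained illustration; the class and column names are hypothetical and not the library's actual API:

```python
# Hypothetical sketch: each preprocessor owns one well-defined transformation
# and exposes the same `transform` interface, so steps can be mixed and matched.

class ColumnDropper:
    """Drops columns that should not reach the model."""
    def __init__(self, columns):
        self.columns = set(columns)

    def transform(self, rows):
        return [{k: v for k, v in row.items() if k not in self.columns}
                for row in rows]

class RatioAdder:
    """Adds a derived ratio column from two existing stats."""
    def __init__(self, numerator, denominator, name):
        self.numerator = numerator
        self.denominator = denominator
        self.name = name

    def transform(self, rows):
        out = []
        for row in rows:
            row = dict(row)  # copy, to avoid mutating the caller's data
            row[self.name] = row[self.numerator] / row[self.denominator]
            out.append(row)
        return out

def run_pipeline(rows, steps):
    """Applies each preprocessor in order."""
    for step in steps:
        rows = step.transform(rows)
    return rows

players = [{"name": "A", "pace": 80, "stamina": 40, "club": "X"}]
steps = [ColumnDropper(["club"]), RatioAdder("pace", "stamina", "pace_per_stamina")]
result = run_pipeline(players, steps)  # club removed, pace_per_stamina added
```

Because every step honors the same interface contract, a new transformation only requires a new class; the rest of the pipeline is untouched.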
To run the `main.ipynb` notebook, you need to create a new environment, install `requirements.txt`, and install the custom FIFA library. To install the custom library locally, follow these steps:

1. Open a terminal and navigate to the `custom_library` directory:

   ```
   cd custom_library
   ```

2. Install the library using pip:

   ```
   pip install .
   ```
This will install the library and make its modules available for use in your Python environment. Ensure that your Python environment is activated before running the command.
The dataset is from EA's football computer game FIFA 2021. The aim is to predict a player's position. The initial dataset had a fine-granular structure of 24 positions; for simplicity, we reduced the number of positions to eight, reflecting the key positions in the game.
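Such a reduction amounts to a lookup from fine-grained positions to coarse groups. The mapping below is purely illustrative; the project's actual eight-position grouping may differ:

```python
# Illustrative only: maps FIFA's fine-grained position codes to coarser
# groups. The project's actual grouping may use different labels or splits.
POSITION_GROUPS = {
    "ST": "Striker", "CF": "Striker",
    "LW": "Winger", "RW": "Winger",
    "CAM": "Attacking Midfielder",
    "CM": "Central Midfielder",
    "CDM": "Defensive Midfielder",
    "LB": "Full Back", "RB": "Full Back",
    "CB": "Centre Back",
    "GK": "Goalkeeper",
}

def coarse_position(position):
    """Collapses a fine-grained position code into its coarse group."""
    return POSITION_GROUPS[position]
```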
1. Modularity:
Each transformation step should be encapsulated in its own class. This ensures:
- Clear separation of responsibilities.
- Easier debugging, testing, and maintenance.
- The ability to mix and match preprocessors as needed.
2. Scalability:
The codebase is expected to grow in terms of:
- Number of Preprocessors: As new feature engineering ideas are introduced, they can be implemented as new classes or integrated into existing ones.
- Variety of Features: The FeatureCreation class uses feature flags to enable or disable certain transformations. Adding a new feature involves:
- Defining it in a separate method.
- Adding a feature flag.
- Updating get_feature_names_out and the input validation logic accordingly.
- Number of Models and Metrics: Although the FeatureCreation class focuses on feature engineering, the same principles apply across the pipeline. Models and metrics should follow a similar, modular approach.
3. Flexibility and Extensibility:
The FeatureCreation class allows customization through:
- Feature Flags: Users can enable or disable sets of transformations.
- Traits Mapping: Users can provide a custom traits_mapping dictionary to handle different trait sets or adapt to changes in the dataset’s trait definitions.
4. Consistency with scikit-learn Conventions:
By inheriting from BaseEstimator and TransformerMixin , the FeatureCreation class integrates seamlessly into a sklearn pipeline. This ensures:
- Familiar APIs for users.
- Compatibility with tools like GridSearchCV , Pipeline , and FeatureUnion .
- Easy and fast extensibility, and scaling any step within the pipeline up or down.
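As a minimal illustration of these conventions, here is a toy transformer (not the library's `FeatureCreation` class) that inherits from `BaseEstimator` and `TransformerMixin` and therefore drops straight into a `Pipeline`:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

class AddRowSum(BaseEstimator, TransformerMixin):
    """Toy transformer: appends one derived column (the row sum).
    Illustrative only; FeatureCreation follows the same contract."""

    def fit(self, X, y=None):
        return self  # nothing to learn, but fit must return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.hstack([X, X.sum(axis=1, keepdims=True)])

    def get_feature_names_out(self, input_features=None):
        base = list(input_features) if input_features is not None else ["x0", "x1"]
        return base + ["row_sum"]

pipe = Pipeline([("features", AddRowSum())])
out = pipe.fit_transform([[1, 2], [3, 4]])  # third column holds each row's sum
```

Because the step honors the fit/transform contract, the same object also works inside `GridSearchCV` or `FeatureUnion` without modification.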
• Feature Flags:
The class accepts a feature_flags dictionary at initialization. This dictionary controls which features are computed. For example:
```python
feature_flags = {
    "ratio_features": True,
    "foot_pace": True,
    "wide_player": True,
    "playmaker": True,
    "traits": True,
}
```

In general, we expect the structure of player stats not to change dramatically between versions, so in principle the library should still be applicable. However, some new attributes may be introduced, old ones removed, or names changed. To adapt the library to a new version:
• Check for Renamed or Missing Columns: Review the dataset schema. If attributes like power_stamina or movement_balance are renamed or removed, you must:
- Update the FeatureCreation class’s input validation (_validate_feature_requirements) to reflect the new names.
- Adjust the feature computation methods to use the correct columns.
- Or turn off the feature creation involving non-existent features.
• Player_traits: We expect the encoding of player_traits to be especially sensitive to changes in the FIFA version. If new traits are added or old ones removed, the mapping of these traits can be updated and passed as a dictionary.
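The adaptation workflow can be sketched as below. The column and trait names are illustrative, and `validate_feature_requirements` is a stand-in for the class's internal `_validate_feature_requirements`, not the library's real function:

```python
# Sketch of adapting to a new dataset version: check which required columns
# are missing from the new schema, and override the trait mapping.

REQUIRED_COLUMNS = {"power_stamina", "movement_balance"}  # illustrative

def validate_feature_requirements(columns, required=REQUIRED_COLUMNS):
    """Returns the required columns missing from the new dataset's schema."""
    return sorted(required - set(columns))

# Hypothetical custom mapping for a newer FIFA version; pass a dictionary
# like this to handle added or removed traits.
traits_mapping = {
    "Long Shot Taker": "shooting",
    "Speed Dribbler": "dribbling",
    "Newly Added Trait": "misc",
}

missing = validate_feature_requirements(["power_stamina", "age"])
# `missing` lists columns to rename upstream or to disable via feature flags
```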
For now, we test only preprocessing and feature engineering, reaching 100% coverage. To run the tests:

1. Navigate to the `custom_library` folder.
2. Execute the following command in the terminal:

   ```
   pytest --cov --cov-report=html:coverage_re
   ```

3. This outputs a coverage report in HTML format under `coverage_re`.
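A unit test in this spirit typically looks like the following pytest-style sketch; the function under test and the test names are illustrative, not the project's actual code:

```python
# Illustrative pytest-style unit test for a feature-creation step.
# The function under test is a toy stand-in, not the library's API.

def add_pace_ratio(row):
    """Toy feature-creation step: adds a pace/stamina ratio column."""
    row = dict(row)  # copy, so the caller's row is untouched
    row["pace_per_stamina"] = row["pace"] / row["stamina"]
    return row

def test_add_pace_ratio_creates_expected_column():
    out = add_pace_ratio({"pace": 90, "stamina": 45})
    assert out["pace_per_stamina"] == 2.0

def test_add_pace_ratio_does_not_mutate_input():
    row = {"pace": 90, "stamina": 45}
    add_pace_ratio(row)
    assert "pace_per_stamina" not in row
```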