MedHive-FL is a privacy-preserving federated learning system for medical diagnostics, built on the Flower framework, with MLflow experiment tracking and Docker containerization.
Features:
- Privacy-Preserving Learning: Train machine learning models collaboratively without sharing raw patient data
- Distributed Training: Support for multiple clients training simultaneously
- MLflow Integration: Track experiments, metrics, and model performance
- Docker Support: Easy deployment and distribution of both server and client components
- Ngrok Integration: Secure tunneling for remote client connections
- Real-time Monitoring: Track training progress and model performance
The system consists of three main components:
- FL Server: Coordinates the training process and aggregates model updates
- FL Clients: Train models on local data and share only model parameters
- MLflow Server: Tracks experiments and visualizes results
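The client role can be illustrated without any framework. The following is a minimal sketch, not the project's `client.py`, which relies on Flower and scikit-learn. A hypothetical `local_update` function runs full-batch gradient descent for logistic regression on one client's rows. It returns only the updated weights, the bias, and the example count. The plain-Python training loop and the learning rate are illustrative assumptions.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(
    weights: List[float],
    bias: float,
    X: List[List[float]],
    y: List[int],
    epochs: int = 5,
    lr: float = 0.1,
) -> Tuple[List[float], float, int]:
    """Hypothetical client step: train locally, return parameters only."""
    w = list(weights)  # copy so the incoming global parameters are untouched
    b = bias
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one example under the current parameters
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # One gradient step on the mean log-loss
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    # X and y never leave this function; only (w, b, n) would be sent
    return w, b, n
```

For example, `local_update([0.0], 0.0, [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1], epochs=500, lr=0.5)` learns a positive weight and a negative bias that separate the two classes, and reports 4 examples. The example count is what lets the server weight this client's contribution during aggregation.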
Prerequisites:
- Docker (with the Compose plugin, for `docker compose`)
- Python 3.9+
- Ngrok account (for server deployment)
Server setup:
- Clone the repository:
```bash
git clone https://github.com/yourusername/MedHive-FL.git
cd MedHive-FL
```
- Set up environment variables:
```bash
# Create a .env file with your Ngrok auth token
echo "NGROK_AUTHTOKEN=your_token_here" > .env
```
- Start the server:
```bash
docker compose up -d
```
The server will start and expose:
- MLflow UI on port 8080
- FL Server with a public Ngrok URL (find it in the container logs, e.g. with `docker compose logs`)
Client setup:
- Build the client Docker image:
```bash
docker build -t fl-client -f Dockerfile .
```
- Run a client instance:
```bash
docker run -e SERVER_ADDRESS="<ngrok_url>" -e CLIENT_ID="<unique_id>" fl-client
```
Model and training:
- Algorithm: Logistic Regression
- Training Strategy: FedAvg (Federated Averaging)
- Evaluation Metrics:
  - Accuracy
  - Loss
  - Training rounds completion
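FedAvg combines client updates by averaging each parameter, weighted by the number of training examples each client used. Below is a minimal plain-Python sketch. It assumes each client reports a flat list of floats plus its example count. The `fedavg` name and this data layout are illustrative. Flower's built-in `FedAvg` strategy performs the same example-weighted averaging on NumPy arrays for you.

```python
from typing import List, Tuple


def fedavg(updates: List[Tuple[List[float], int]]) -> List[float]:
    """Example-weighted average of client parameter vectors.

    Each update is (parameters, num_examples); clients with more data
    contribute proportionally more to the new global model.
    """
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no training examples reported")
    dim = len(updates[0][0])
    global_params = [0.0] * dim
    for params, n in updates:
        if len(params) != dim:
            raise ValueError("parameter vectors must have the same length")
        for j, p in enumerate(params):
            global_params[j] += p * n / total
    return global_params
```

With two clients, `fedavg([([1.0, 2.0], 100), ([3.0, 4.0], 300)])` returns `[2.5, 3.5]`. The second client holds three quarters of the examples, so the result lies three quarters of the way toward its parameters.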
Access the MLflow dashboard to monitor:
- Training progress
- Model metrics per round
- Aggregate model performance
- Client participation
Project structure:
```text
MedHive-FL/
├── client.py           # FL client implementation
├── server.py           # FL server implementation
├── docker-compose.yml  # Server orchestration
├── Dockerfile          # Client container definition
├── requirements.txt    # Python dependencies
└── data/
    ├── data.csv        # Medical dataset
    └── task.py         # Data loading and model definitions
```
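`task.py` is described as handling data loading. One common way to give each client its own slice of `data.csv` is to shuffle the row indices with a shared seed and let each client take every k-th row. The sketch below is an assumption, not necessarily what `task.py` does. It uses hypothetical `read_rows` and `load_partition` helpers and a CSV with a header row.

```python
import csv
import random
from typing import Dict, List


def read_rows(path: str) -> List[Dict[str, str]]:
    # Read the CSV into a list of dicts keyed by the header row
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def load_partition(
    rows: List[Dict[str, str]],
    partition_id: int,
    num_partitions: int,
    seed: int = 42,
) -> List[Dict[str, str]]:
    """Return a deterministic, disjoint shard of rows for one client."""
    if not 0 <= partition_id < num_partitions:
        raise ValueError("partition_id must be in [0, num_partitions)")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # identical shuffle on every client
    return [rows[i] for i in indices[partition_id::num_partitions]]
```

Every client uses the same seed, so the shards are disjoint and together cover the whole file. In a real deployment, each hospital would instead load its own local file, and no partitioning would be needed.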
Server configuration options:
- num-server-rounds: Number of training rounds (default: 5)
- penalty: Regularization type for logistic regression (default: "l2")
- local-epochs: Client-side training epochs per round (default: 5)
- min-available-clients: Minimum number of connected clients required before training starts (default: 2)
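The options and defaults above can be mirrored in a small typed config object. This sketch uses a hypothetical `ServerConfig` class and a `from_overrides` helper that maps the hyphenated option names to fields. The project itself may load these values differently.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class ServerConfig:
    # Defaults taken from the option list above
    num_server_rounds: int = 5
    penalty: str = "l2"
    local_epochs: int = 5
    min_available_clients: int = 2

    @classmethod
    def from_overrides(cls, overrides: Dict[str, Any]) -> "ServerConfig":
        # Accept the hyphenated names used in the docs, e.g. "local-epochs"
        known = {f.name for f in fields(cls)}
        kwargs = {}
        for key, value in overrides.items():
            name = key.replace("-", "_")
            if name not in known:
                raise KeyError(f"unknown option: {key}")
            kwargs[name] = value
        cfg = cls(**kwargs)
        if cfg.min_available_clients < 1:
            raise ValueError("min-available-clients must be at least 1")
        return cfg
```

For instance, `ServerConfig.from_overrides({"num-server-rounds": 10})` changes only the round count and keeps the other defaults. Typos such as `"local-epoch"` fail loudly instead of being silently ignored.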
The system tracks:
- Round-wise accuracy
- Loss metrics
- Client participation
- Training completion status
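Round-wise accuracy is typically a weighted average of the accuracies reported by clients, so that clients with larger evaluation sets count more. The sketch follows the shape of the `weighted_average` helper used in Flower's examples, which takes a list of `(num_examples, metrics)` pairs. The function actually used in `server.py` may differ.

```python
from typing import Dict, List, Tuple

Metrics = Dict[str, float]


def weighted_average(metrics: List[Tuple[int, Metrics]]) -> Metrics:
    """Aggregate per-client accuracy into a single round-level value."""
    total = sum(num_examples for num_examples, _ in metrics)
    if total == 0:
        return {"accuracy": 0.0}
    accuracy = sum(n * m["accuracy"] for n, m in metrics) / total
    return {"accuracy": accuracy}
```

With clients reporting `(50, {"accuracy": 0.8})` and `(150, {"accuracy": 0.9})`, the round accuracy is 0.875. That value could then be logged once per round, for example with `mlflow.log_metric("accuracy", value, step=round_number)`, where `round_number` is a placeholder for the current round.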
Privacy and security:
- Raw data never leaves client devices
- Only model parameters are shared
- Secure communication via Ngrok tunneling
- Client authentication support
All dependencies are listed in requirements.txt. Key dependencies:
- flwr>=1.0.0 (the Flower framework)
- scikit-learn>=1.0.2
- mlflow>=2.3.0
- pyngrok>=6.0.0
- pandas>=1.3.0
Contributions are welcome! Please read the contributing guidelines before submitting a pull request.
This project is licensed under the terms of the LICENSE file included in the repository.
Acknowledgments:
- Flower Framework team
- MLflow community
- Contributors and maintainers
For questions and support, please open an issue in the GitHub repository.