The Binance Data Management System (BDMS) is a Python-based framework designed to streamline the management of cryptocurrency trading data from Binance. It supports downloading, converting, merging, and updating that data for research, algorithmic trading, and data-driven decision-making.
Note: This project is currently under development and may not be fully functional. Please check back for updates!
- Automated Data Population: Download historical data from Binance and create your own database.
- Data Merging: Combine the natively distributed data to create a unified dataset.
- Data Format Conversion: Seamlessly convert data between CSV, Parquet, and ZIP formats.
- Real-Time Data Updates: Update datasets using Binance's API to include the latest trades.
- Scalable Processing: Utilize multi-threading and on-disk operations for efficient parallel processing.
- Clone the repository:
    git clone https://github.com/lepremiere/bdms.git
    cd bdms
- Install the required dependencies:
    pip install -r requirements.txt
- Ensure you have API access keys from Binance for real-time updates.
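One common way to keep the keys out of source code is to read them from environment variables. The sketch below assumes the hypothetical variable names `BINANCE_API_KEY` and `BINANCE_API_SECRET`; BDMS itself does not prescribe them, and the helper function is not part of the package.

```python
import os


def load_binance_credentials(env=None):
    """Return (api_key, api_secret) read from environment variables.

    The variable names are assumptions made for this example only.
    """
    env = os.environ if env is None else env
    names = ("BINANCE_API_KEY", "BINANCE_API_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

The returned pair can then be passed wherever an `api_key` and `api_secret` are expected.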
Download historical data directly from Binance:
    import warnings
    from bdms import populate_database
    warnings.filterwarnings("ignore", category=UserWarning)
    if __name__ == "__main__":
        populate_database(
            root_dir="C:/Binance",
            symbols=["BTCUSDT", "ETHUSDT"],
            trading_types=["spot"],
            market_data_types=["klines", "aggTrades"],
            intervals=["1h"],
            start_date="2023-01-01",
            end_date="2023-12-31",
            storage_format="parquet",
        )
- Does not support bookTicker and liquidationSnapshot data types since they are no longer supported by Binance.
- Automatically determines all valid combinations of the types and intervals provided.
- Sets start_date automatically to the earliest available date if it is not provided or is set too early. end_date defaults to the current date if not provided.
- Tries to download all valid combinations for the given date range. If a combination is not available, it issues a UserWarning and skips the file, so suppressing UserWarning is recommended.
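To illustrate what "valid combinations" can mean, the following sketch enumerates them under the assumption that only kline-style data types are split by interval, while types such as aggTrades are not. The function name and the `INTERVAL_TYPES` set are hypothetical and not part of BDMS, whose own logic may differ.

```python
from itertools import product

# Assumption: only kline-style data types take an interval.
INTERVAL_TYPES = {"klines"}


def enumerate_combinations(symbols, trading_types, market_data_types, intervals):
    """Return (symbol, trading_type, data_type, interval) tuples.

    The interval is None for data types that are not split by interval.
    """
    combos = []
    for symbol, trading_type, data_type in product(symbols, trading_types, market_data_types):
        if data_type in INTERVAL_TYPES:
            combos.extend((symbol, trading_type, data_type, i) for i in intervals)
        else:
            combos.append((symbol, trading_type, data_type, None))
    return combos


combos = enumerate_combinations(
    ["BTCUSDT", "ETHUSDT"], ["spot"], ["klines", "aggTrades"], ["1h"]
)
for combo in combos:
    print(combo)
```

With the arguments from the example above, this yields four combinations: one klines entry per symbol at the 1h interval and one aggTrades entry per symbol without an interval.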
Convert files from one format to another:
    from bdms.conversion import convert_files
    convert_files(
        folder="../Binance",
        input_format="csv",
        output_format="parquet",
        walk=True,  # Recursively search for files
        delete_original=False,  # Keep original files after conversion
    )
Merge all available data for a given combination into a single file:
    from bdms.merge import merge_data
    merge_data(
        root_dir="../Binance",
        symbols=["BTCUSDT"],
        trading_types=["spot"],
        market_data_types=["klines"],
        intervals=["1h"],
        output_format="parquet",
    )
- Automatically determines all valid combinations of the types and intervals provided.
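The merge logic itself is not described in this README. As a rough illustration of the general idea, the sketch below combines several per-period CSV chunks into one list of rows, sorted by timestamp and with duplicate timestamps dropped. It assumes the first column of every row is an integer timestamp; the function name and the sample values are made up for the example.

```python
import csv
import io


def merge_csv_chunks(chunks):
    """Merge CSV texts into rows sorted by the first column.

    Rows sharing a timestamp are deduplicated; later chunks win.
    """
    rows_by_time = {}
    for text in chunks:
        for row in csv.reader(io.StringIO(text)):
            if row:  # skip blank lines
                rows_by_time[int(row[0])] = row
    return [rows_by_time[t] for t in sorted(rows_by_time)]


# Illustrative values only, not real market data.
first_period = "1000,1.0\n2000,1.5\n"
second_period = "3000,2.0\n2000,1.5\n"  # overlaps with first_period at 2000
merged = merge_csv_chunks([second_period, first_period])
```

Keying rows by timestamp makes the result independent of the order in which the chunks are read.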
Update aggregate trade data:
    from bdms.update import update_aggTrades
    update_aggTrades(
        api_key="your_api_key",
        api_secret="your_api_secret",
        symbol="BTCUSDT",
        path="../BTCUSDT.parquet",
        write_interval=1000,  # Write trades to file every 1000 trades
    )
- Supports updating aggTrades and single files only.
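The `write_interval` parameter suggests that incoming trades are buffered and written in batches. A minimal sketch of that pattern follows, using a hypothetical `BatchWriter` class and a caller-supplied `sink` function. This is not BDMS's implementation, only an illustration of batched writes.

```python
class BatchWriter:
    """Collect items and hand them to `sink` in batches of `write_interval`."""

    def __init__(self, sink, write_interval=1000):
        if write_interval < 1:
            raise ValueError("write_interval must be at least 1")
        self.sink = sink
        self.write_interval = write_interval
        self.buffer = []

    def add(self, item):
        """Buffer one item and flush once the batch is full."""
        self.buffer.append(item)
        if len(self.buffer) >= self.write_interval:
            self.flush()

    def flush(self):
        """Write any buffered items; call once at the end for the remainder."""
        if self.buffer:
            self.sink(list(self.buffer))
            self.buffer.clear()


batches = []
writer = BatchWriter(batches.append, write_interval=3)
for trade_id in range(7):
    writer.add(trade_id)
writer.flush()  # writes the final, partially filled batch
```

A larger interval means fewer writes but more trades held in memory, and more to re-fetch if the process stops before a flush.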
- populate.py: Handles downloading and structuring historical data.
- conversion.py: Converts data between supported formats.
- merge.py: Merges data into unified files for analysis.
- update.py: Updates datasets with real-time trade data.
- utils.py: Utilities for common tasks like file handling and validation.
- enums.py: Contains constants for Binance API configurations.
- Python 3.8+
- Required libraries:
  - numpy
  - pandas
  - polars
  - pyarrow
  - tqdm
  - multiprocessing
  - python-binance
Install all dependencies using the provided requirements.txt.
Documentation will be available soon. Stay tuned!
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Submit a pull request for review.
This project is licensed under the MIT License. See the LICENSE file for details.
- Data provided by Binance.
- This repository is based on the code of the Binance-Team @ binance-public-data. Big thanks to them!
If you found this project useful, please consider giving it a star 🌟!