A robust Python utility to extract transaction data from Crédit Mutuel bank statement PDFs, validate data integrity, and export to structured formats (JSON/CSV) or Google Sheets.
- Automated Extraction: Parses transaction dates, descriptions, and amounts from multiple accounts per PDF.
- Balance Validation: Sums the transactions and checks the total against the starting and ending balances provided in the statement.
- Strict CLI: Explicit input file list and mandatory --output flag (with .csv or .json validation).
- French Format Support: Handles French number formatting (e.g., 1.234,56 or 1 234,56).
- Structured Logging: Uses the Python logging module for clean, professional output and error reporting.
- Automation: Includes a Justfile for common tasks like run and clean.
- Account Mapping: Support for custom account labels via YAML configuration.
- Google Sheets Export: Direct export to a Google Spreadsheet.
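
As an illustration of the French number handling listed above, here is a minimal sketch of how such amounts can be normalized. The function name `parse_french_amount` is hypothetical and this is not necessarily how the extractor implements it; it assumes that dots and spaces are always thousands separators and that the comma is the decimal mark.

```python
import re


def parse_french_amount(text: str) -> float:
    """Convert a French-formatted amount such as '1.234,56' or '1 234,56' to a float."""
    cleaned = text.strip()
    # Drop thousands separators: dots, regular spaces, and non-breaking spaces.
    cleaned = re.sub(r"[\s\u00a0\u202f.]", "", cleaned)
    # The decimal comma becomes a decimal point.
    cleaned = cleaned.replace(",", ".")
    return float(cleaned)


print(parse_french_amount("1.234,56"))  # prints: 1234.56
print(parse_french_amount("1 234,56"))  # prints: 1234.56
```

Under these assumptions a value such as `1.234` is read as one thousand two hundred thirty-four, which matches the French convention but would misread an English-formatted amount.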
You can install the extractor directly from PyPI:

    pip install credit_mutuel_pdf_extractor

Or using uv:

    uv tool install credit_mutuel_pdf_extractor

Once installed, you can use the cmut_process_pdf command from anywhere:

    cmut_process_pdf data/*.pdf --output results.csv --config config.yaml

If you have the source code and the just command runner installed:
To process all PDFs in the data/ directory using the labels defined in config.yaml (outputs to transactions.csv):

    just run

To output in JSON format:

    just run json

To clean up all generated files:

    just clean

You can map account numbers to custom labels by creating a config.yaml file. See config.example.yaml for a template.

    account_mapping:
      12345671: "Account 1"
      12345672: "Account 2"

Note: Account numbers are matched as integers (leading zeros are ignored).
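
The integer matching described in the note can be sketched as follows. The helper `label_for_account` is a hypothetical name, and falling back to the raw account number when no label is configured is an assumption, not documented behavior.

```python
def label_for_account(raw_number: str, account_mapping: dict[int, str]) -> str:
    """Look up a label for an account number as printed on the statement."""
    # int() drops leading zeros, so "00012345671" and "12345671" give the same key.
    key = int(raw_number)
    # Assumption: fall back to the raw number when no label is configured.
    return account_mapping.get(key, raw_number)


account_mapping = {12345671: "Account 1", 12345672: "Account 2"}
print(label_for_account("00012345671", account_mapping))  # prints: Account 1
```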
You can automatically rename transactions by adding a description_mapping section. If any key is found as a substring (case-insensitive) in the transaction description, it will be replaced by the corresponding label.
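
The matching rule described above can be sketched like this. It is only an illustration: the function name `apply_description_mapping` is hypothetical, and it assumes that the whole description is replaced by the label and that the first matching key in configuration order wins.

```python
def apply_description_mapping(description: str, mapping: dict[str, str]) -> str:
    """Return the label of the first key found in the description, ignoring case."""
    lowered = description.lower()
    for key, label in mapping.items():
        # Case-insensitive substring test.
        if key.lower() in lowered:
            return label
    # No key matched: keep the original description.
    return description


description_mapping = {
    "VIR SEPA FROM": "Transfer",
    "NETFLIX": "Entertainment",
    "AMAZON": "Shopping",
}
print(apply_description_mapping("PAIEMENT CB Netflix.com", description_mapping))  # prints: Entertainment
```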

    description_mapping:
      "VIR SEPA FROM": "Transfer"
      "NETFLIX": "Entertainment"
      "AMAZON": "Shopping"

To enable Google Sheets export, add a google_sheets section to your config.yaml:

    google_sheets:
      spreadsheet_id: "your-spreadsheet-id"
      sheet_name: "Transactions"
      credentials_file: "credentials.json"

Service Account Setup:
- Create a project in Google Cloud Console.
- Enable both Google Sheets API and Google Drive API.
- Create a Service Account (APIs & Services > Credentials > Create Credentials > Service Account).
- Create a JSON Key for that service account and download it.
- Save the key as credentials.json (or any path specified in your config.yaml).
- Permission: Share your Google Spreadsheet with the service account email (found in the JSON) with Editor access. (No broad IAM roles are needed if shared directly).
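
To find the address to share the spreadsheet with, you can read it from the downloaded key. The sketch below relies on the `client_email` field that Google service account key files contain; the helper name `service_account_email` is hypothetical and not part of this tool.

```python
import json
from pathlib import Path


def service_account_email(credentials_file: str) -> str:
    """Return the service account email stored in a JSON key file."""
    data = json.loads(Path(credentials_file).read_text(encoding="utf-8"))
    # Service account keys store the address under "client_email".
    return data["client_email"]
```

For example, `print(service_account_email("credentials.json"))` shows the address to add as an Editor on the spreadsheet.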
You can explicitly specify files, the output format, and enable Google Sheets export:

    uv run credit-mutuel-extractor data/*.pdf --output results.csv --config config.yaml

Or run it directly using uvx:

    uvx --from credit-mutuel-pdf-extractor cmut_process_pdf data/*.pdf --output results.csv --config config.yaml --gsheet --include-source-file

Requirements:
- At least one input PDF file.
- The --output flag is mandatory, and its value must end in .csv or .json.
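
The requirements above can be expressed with argparse roughly as follows. This is a sketch rather than the tool's actual parser: the flag names come from the commands shown earlier, while the helper names `output_path` and `build_parser` and the help texts are assumptions.

```python
import argparse
from pathlib import Path


def output_path(value: str) -> Path:
    """Accept only output paths ending in .csv or .json."""
    path = Path(value)
    if path.suffix not in {".csv", ".json"}:
        raise argparse.ArgumentTypeError("--output must end in .csv or .json")
    return path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cmut_process_pdf")
    # nargs="+" enforces at least one input PDF.
    parser.add_argument("files", nargs="+", type=Path, help="input PDF files")
    # required=True makes the flag mandatory; the type callable checks the extension.
    parser.add_argument("--output", required=True, type=output_path, help="output file (.csv or .json)")
    parser.add_argument("--config", type=Path, help="path to config.yaml")
    parser.add_argument("--gsheet", action="store_true", help="also export to Google Sheets")
    parser.add_argument("--include-source-file", action="store_true", help="record the source PDF name")
    return parser
```

With this setup, a missing `--output` or an unsupported extension makes argparse print a usage error and exit with status 2 before any PDF is opened.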
- Account Identification: Uses vertical Y-coordinate mapping to associate tables with the correct account number headers.
- Data Normalization: Amounts are cleaned and converted to standard floats.
- Validation: If Starting Balance + Σ(Transactions) != Ending Balance, the script will report a CRITICAL error and halt execution.
- Modular Design: Utility functions are separated into utils.py for maintainability.
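
The account identification and validation steps above can be sketched as follows. This is a simplified illustration, not the project's code: the function names, the data shapes, the assumption that Y grows downward on the page, and the half-cent tolerance used to compare floats are all hypothetical.

```python
import logging
import math

logger = logging.getLogger("cmut_sketch")


def assign_tables_to_accounts(headers, tables):
    """Attach each table to the closest account header located above it.

    headers: list of (y, account_number); tables: list of (y, table).
    Assumes y grows downward, so "above" means a smaller or equal y.
    """
    assignments = {}
    ordered = sorted(headers)
    for table_y, table in tables:
        above = [account for y, account in ordered if y <= table_y]
        if not above:
            continue  # No header above this table: leave it unassigned.
        assignments.setdefault(above[-1], []).append(table)
    return assignments


def validate_balance(start, transactions, end, tolerance=0.005):
    """Halt if start + sum(transactions) differs from end by more than the tolerance."""
    total = start + math.fsum(transactions)
    if not math.isclose(total, end, rel_tol=0.0, abs_tol=tolerance):
        logger.critical("Balance mismatch: computed %.2f, statement says %.2f", total, end)
        raise SystemExit(1)


headers = [(100.0, "12345671"), (400.0, "12345672")]
tables = [(150.0, "table A"), (450.0, "table B"), (500.0, "table C")]
print(assign_tables_to_accounts(headers, tables))
# prints: {'12345671': ['table A'], '12345672': ['table B', 'table C']}
validate_balance(100.0, [-20.5, 50.25], 129.75)  # passes silently
```

Comparing with a small tolerance is a design choice made for this sketch, since sums of floats rarely match a printed balance bit for bit.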
This project uses pre-commit and detect-secrets to prevent accidental commits of sensitive data.
Before committing, the hooks will scan for potential secrets.
Publishing is automated via the Justfile and integrated with 1Password for security.
- Store your PyPI Token: Create a "Login" or "Password" item in 1Password.
- Add Environment Variable: Add a field named UV_PUBLISH_TOKEN containing your PyPI API token.
- Publish:

      just publish

This uses op run to securely inject the token into the uv publish command without it ever being stored in plain text or history.
Keep your config.yaml and secrets/credentials.json synced across machines using 1Password:
- Initial Setup: Create the documents in your vault:

      just secrets-setup

- Pull Changes: On a new machine:

      just secrets-pull

- Push Changes: After updating your local config:

      just secrets-push

Note: These commands use the titles "CMut Config" and "CMut Google Credentials". You can change these in the Justfile if needed.