Geopolitical, Global Macro & Financial News Sentiment Analysis
This project implements a macro-focused sentiment analysis system designed to ingest global financial and geopolitical news, classify dominant market narratives, and quantify sentiment trends over time.
The system is intended to support macro analysis, market monitoring, and research workflows, enabling users to observe how narratives such as inflation, monetary policy, geopolitics, and risk sentiment evolve over time and may come to be reflected in market expectations.
The project was built as part of a Data Bootcamp but deliberately uses modern, industry-relevant tools and patterns so that it can function as a portfolio-grade project rather than a purely academic exercise.
- Ingest financial and geopolitical news from multiple sources
- Clean and normalise unstructured text data
- Apply NLP techniques to score sentiment
- Classify news into interpretable macro narratives
- Aggregate sentiment by source and narrative over time
- Produce structured outputs suitable for dashboards and decision support
- Maintain a modular architecture that supports future expansion
```text
News Feeds (RSS / APIs)
        ↓
Ingestion Layer
        ↓
Raw Articles (Database)
        ↓
Cleaning & Normalisation
        ↓
Sentiment Scoring (NLP)
        ↓
Narrative Classification
        ↓
Daily Aggregations
        ↓
Dashboards / Visualisations
```
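The flow above is a linear chain of stages, each taking a batch of articles and returning an enriched batch. A minimal sketch of that idea, assuming articles are plain dicts and using hypothetical stand-in stages (`clean`, `score`) rather than the project's actual modules:

```python
from collections.abc import Callable, Iterable

# Assumption: each article is a plain dict; the real project stores rows via SQLAlchemy.
Article = dict
Stage = Callable[[list[Article]], list[Article]]


def run_stages(articles: list[Article], stages: Iterable[Stage]) -> list[Article]:
    """Pass the article batch through each stage in order."""
    for stage in stages:
        articles = stage(articles)
    return articles


# Hypothetical stand-in stages, for illustration only.
def clean(articles: list[Article]) -> list[Article]:
    # Collapse whitespace in the raw text into a new "clean_text" field.
    return [{**a, "clean_text": " ".join(a["raw_text"].split())} for a in articles]


def score(articles: list[Article]) -> list[Article]:
    # Placeholder scorer that always returns 0.0; the project uses VADER at this step.
    return [{**a, "compound": 0.0} for a in articles]


if __name__ == "__main__":
    batch = [{"link": "https://example.com/a", "raw_text": "  Rates   held  "}]
    print(run_stages(batch, [clean, score]))
```

Keeping each stage a plain function over a batch means a stage can be swapped (for example, a transformer scorer in place of VADER) without touching its neighbours.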
- RSS feeds from major financial and global news providers (e.g. Bloomberg Markets, BBC World; availability may vary depending on network restrictions)
These sources are used for the MVP to avoid licensing and credential dependencies while validating the full analytical pipeline.
The system is architected to support authenticated or subscription-based news feeds for users who have valid access rights. This includes, but is not limited to:
- Enterprise financial news APIs
- Authenticated RSS feeds
- Licensed data platforms providing structured news content
Integration of these feeds is not enabled by default in the MVP, but the ingestion layer is deliberately modular so that premium sources can be added without refactoring the core pipeline.
This design allows the project to scale from an educational MVP to an institutional-style research tool.
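One way to get that modularity is to have every feed adapter expose the same small interface. The sketch below is hypothetical: `NewsSource`, `RawArticle`, `RssSource`, `StaticSource` and `ingest_all` are illustrative names, not the project's API. `RssSource` relies on `feedparser.parse`; a premium adapter would simply be another class with a `name` and a `fetch()` method that handles its own credentials.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class RawArticle:
    source: str
    title: str
    link: str
    published: str  # raw timestamp string exactly as supplied by the feed
    summary: str


class NewsSource(Protocol):
    """Hypothetical interface that every feed adapter implements."""

    name: str

    def fetch(self) -> list[RawArticle]: ...


@dataclass
class RssSource:
    name: str
    url: str

    def fetch(self) -> list[RawArticle]:
        # Imported here so the rest of this sketch runs without feedparser installed.
        import feedparser

        feed = feedparser.parse(self.url)
        return [
            RawArticle(
                source=self.name,
                title=e.get("title", ""),
                link=e.get("link", ""),
                published=e.get("published", ""),
                summary=e.get("summary", ""),
            )
            for e in feed.entries
        ]


@dataclass
class StaticSource:
    """In-memory source, handy for tests and offline demos."""

    name: str
    items: list[RawArticle]

    def fetch(self) -> list[RawArticle]:
        return list(self.items)


def ingest_all(sources: list[NewsSource]) -> list[RawArticle]:
    """Collect articles from every registered source; new sources need no pipeline change."""
    articles: list[RawArticle] = []
    for src in sources:
        articles.extend(src.fetch())
    return articles
```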
- Modular ingestion pipeline
- Support for multiple feed endpoints
- Deduplication by article link
- Preservation of source and publication timestamp
- HTML stripping and text normalisation
- Consistent datetime handling
- Separation of raw and cleaned data layers
- Baseline NLP sentiment scoring using a transparent, interpretable model
- Compound sentiment scores per article
- Model version recorded alongside outputs
Articles are classified into high-level macro narratives using a rule-based taxonomy prioritising interpretability:
- Inflation
- Monetary policy / interest rates
- Growth & recession
- Geopolitics
- Energy & commodities
- Risk sentiment
This approach enables clear reasoning about why an article is associated with a narrative, which is essential in financial analysis contexts.
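A minimal sketch of rule-based tagging, assuming a keyword list per narrative. The keywords and the `tag_narratives` function are hypothetical examples, not the taxonomy defined in `pipeline/narratives.py`. Returning the matched keywords alongside each narrative keeps the reason for every tag visible, and an article can match several narratives at once.

```python
import re

# Hypothetical keyword taxonomy, for illustration only.
NARRATIVE_KEYWORDS: dict[str, list[str]] = {
    "inflation": ["inflation", "cpi", "consumer prices", "price pressures"],
    "monetary_policy": ["interest rate", "rate hike", "rate cut", "central bank", "fed", "ecb"],
    "growth_recession": ["recession", "gdp", "slowdown", "unemployment"],
    "geopolitics": ["sanctions", "war", "conflict", "election", "tariff"],
    "energy_commodities": ["oil", "opec", "natural gas", "gold", "copper"],
    "risk_sentiment": ["sell-off", "rally", "volatility", "safe haven", "risk appetite"],
}

# One case-insensitive, word-bounded pattern per narrative, so "fed" does not match "federal".
_PATTERNS = {
    name: re.compile(r"\b(?:" + "|".join(re.escape(k) for k in kws) + r")\b", re.IGNORECASE)
    for name, kws in NARRATIVE_KEYWORDS.items()
}


def tag_narratives(text: str) -> dict[str, list[str]]:
    """Return each matched narrative with the sorted keywords that triggered it."""
    hits: dict[str, list[str]] = {}
    for name, pattern in _PATTERNS.items():
        found = sorted({m.group(0).lower() for m in pattern.finditer(text)})
        if found:
            hits[name] = found
    return hits
```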
- Daily sentiment by source
- Daily sentiment by narrative
- Article counts and averages suitable for dashboards and reporting
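Assuming a table with one row per scored article (columns `published_utc`, `source`, `compound`), the daily aggregations can be sketched with a pandas group-by. For the by-narrative view the same function would run on a long table with one row per article/narrative pair, since an article can carry several tags. The column names and the `daily_sentiment` helper are assumptions.

```python
import pandas as pd


def daily_sentiment(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Count scored articles and average the compound score per UTC day and `by` column."""
    out = (
        df.assign(date=pd.to_datetime(df["published_utc"], utc=True).dt.date)
        .groupby(["date", by], as_index=False)
        .agg(article_count=("compound", "count"), avg_compound=("compound", "mean"))
    )
    return out.sort_values(["date", by], ignore_index=True)
```

Calling `daily_sentiment(df, "source").to_csv("data_out/daily_sentiment_by_source.csv", index=False)` would then write a file along the lines of the one listed in `data_out/`.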
- Python 3.12
- SQLAlchemy (database modelling and access)
- SQLite (MVP persistence layer)
- pandas / numpy (data processing)
- feedparser / BeautifulSoup (ingestion and cleaning)
- VADER Sentiment (baseline NLP sentiment analysis)
- GitHub Codespaces (cloud development environment)
- Streamlit (interactive demo application)
- Power BI (external dashboarding and visual analytics)
```text
macro-sentiment-engine/
├── app/                          # Streamlit demo app
├── db/
│   ├── models.py                 # Database schema
│   └── init_db.py                # Database initialisation
├── pipeline/
│   ├── ingest_rss.py             # News ingestion
│   ├── clean.py                  # Text cleaning
│   ├── score_sentiment.py        # NLP sentiment scoring
│   ├── narratives.py             # Narrative taxonomy
│   ├── tag_narratives.py         # Narrative tagging
│   ├── aggregate.py              # Aggregation by source
│   └── run_pipeline.py           # End-to-end pipeline runner
├── data_out/
│   ├── sentiment.db
│   ├── daily_sentiment_by_source.csv
│   └── daily_sentiment_by_narrative.csv
├── requirements.txt
└── README.md
```
```bash
python -m pip install -r requirements.txt
python -m db.init_db
python pipeline/run_pipeline.py
```
- SQLite database containing raw articles, cleaned text, sentiment scores, and narrative tags
- CSV files for dashboarding:
  - Daily sentiment by source
  - Daily sentiment by narrative
- Interactive Streamlit dashboard (optional deployment)
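The exact table layout is not described here; the following is a hypothetical schema using the standard-library `sqlite3` module instead of the project's SQLAlchemy models. It shows how a `UNIQUE` constraint on `link` can enforce deduplication and how a `model_version` column keeps each score traceable to the model that produced it. Table, column, and function names are assumptions.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id            INTEGER PRIMARY KEY,
    source        TEXT NOT NULL,
    link          TEXT NOT NULL UNIQUE,   -- deduplication key
    title         TEXT,
    raw_text      TEXT,
    clean_text    TEXT,
    published_utc TEXT                    -- ISO 8601 string in UTC
);
CREATE TABLE IF NOT EXISTS sentiment (
    article_id    INTEGER NOT NULL REFERENCES articles(id),
    model_version TEXT NOT NULL,          -- identifies the scoring model
    compound      REAL NOT NULL,
    PRIMARY KEY (article_id, model_version)
);
CREATE TABLE IF NOT EXISTS narrative_tags (
    article_id INTEGER NOT NULL REFERENCES articles(id),
    narrative  TEXT NOT NULL,
    PRIMARY KEY (article_id, narrative)
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_article(conn: sqlite3.Connection, source: str, link: str, title: str) -> bool:
    """Insert an article; return False if an article with the same link already exists."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO articles (source, link, title) VALUES (?, ?, ?)",
        (source, link, title),
    )
    conn.commit()
    return cur.rowcount == 1
```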
- RSS feed availability may vary depending on network restrictions
- Rule-based narrative tagging may not capture nuanced or implicit themes
- Baseline sentiment models may struggle with sarcasm or complex financial language
- The system does not attempt to infer causality between sentiment and market prices
- Integration of licensed, subscription-based news feeds via authenticated APIs
- Replacement of baseline sentiment models with finance-specific transformer models
- Event-based sentiment and “surprise” scoring
- Integration of market price data for sentiment–price divergence analysis
- Source credibility weighting and time-decay modelling
- Deployment as a hosted API or research tool