MusaIslamFahad/newssense-ai


📰 NewsSense AI

A full-stack NLP web app that classifies any news headline into one of 10 categories, detects its sentiment, and extracts named entities in ~2–3 seconds. It is powered by three HuggingFace transformer models called in parallel, built with Next.js 14, and deployed on Vercel.


🔴 Live Demo

→ Try NewsSense AI Live

🖼️ Screenshot

NewsSense AI - Homepage

The editorial newspaper-themed UI - paste any headline into the input box and hit Analyse


📖 Overview

NewsSense AI is an intelligent news understanding system that brings together three state-of-the-art NLP models in a single, fast web interface. Submit any news headline or short article and within seconds you'll see:

  • What category it belongs to - with a confidence chart across all 10 classes
  • How it feels - positive, neutral, or negative sentiment with a confidence score
  • Who and what is mentioned - named entities (people, organisations, locations) highlighted inline

The backend calls all three HuggingFace models in parallel via a serverless Next.js API route, so total latency stays at ~2–3 seconds: it is bounded by the slowest model rather than by the sum of all three. The ML research pipeline (notebook/training_pipeline.py) covers the full journey from raw data to a fine-tuned DistilBERT model pushed to the HuggingFace Hub.


✨ Features

  • 🏷️ 10-class news classification: zero-shot via facebook/bart-large-mnli, no training data needed
  • Sentiment analysis: positive / neutral / negative via cardiffnlp/twitter-roberta-base-sentiment-latest
  • 🔍 Named Entity Recognition: PER, ORG, LOC, MISC via dslim/bert-base-NER
  • ⚡ Parallel inference: all 3 models run simultaneously, total latency ~2–3s
  • 📊 Animated confidence chart: bar chart showing scores across all 10 categories
  • 🎨 Newspaper editorial theme: custom CSS with live news ticker in the masthead
  • 🌗 Entity highlighting: inline coloured spans with tooltips for each entity type
  • 🛡️ Rate limiting: 10 requests/min per IP (configurable via RATE_LIMIT_RPM)
  • ☁️ One-click Vercel deploy: serverless, no GPU required, free HuggingFace token sufficient
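The README does not say how the per-IP rate limit is implemented. As a minimal sketch, assuming an in-memory sliding window keyed by client IP (the class `SlidingWindowLimiter` and its `allow` method are hypothetical names, not taken from the repository):

```typescript
// Hypothetical in-memory sliding-window limiter; not the repository's actual code.
// State lives in a single process, so each serverless instance counts separately.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number = 10, windowMs: number = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if `ip` is under the limit, else false.
  allow(ip: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.hits.get(ip) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(ip, recent);
    return allowed;
  }
}
```

A route handler built this way would answer with status 429 whenever `allow` returns false, and could take `limit` from the RATE_LIMIT_RPM setting instead of the default of 10.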

🤖 NLP Models

| Task | Model | Avg. Latency |
|---|---|---|
| News classification (10 categories) | facebook/bart-large-mnli (zero-shot) | ~2s |
| Sentiment analysis | cardiffnlp/twitter-roberta-base-sentiment-latest | ~1s |
| Named Entity Recognition | dslim/bert-base-NER | ~1s |
| All 3 combined (parallel) | - | ~2–3s total |

News Categories (10 Classes)

Business · Technology · Politics · Sports · Entertainment · Health · Science · World · Environment · Education


🔧 How It Works

flowchart TD
    A([👤 User submits headline via web UI]) --> B
    B[POST /api/analyze\nNext.js Serverless Route] --> C
    C{Promise.all\nAll 3 models fire simultaneously}
    C --> D[🟦 BART-Large-MNLI\nNews Classification]
    C --> E[🟩 RoBERTa\nSentiment Analysis]
    C --> F[🟧 BERT-NER\nNamed Entity Recognition]
    D --> G
    E --> G
    F --> G
    G[Results aggregated\ninto JSON response] --> H
    H([🖥️ Frontend renders results])
    H --> I[📊 Confidence Chart\n10-category scores]
    H --> J[🎭 Sentiment Badge\nPositive / Neutral / Negative]
    H --> K[🔍 Entity Highlights\nPER · ORG · LOC · MISC]

Promise.all() fires all three HuggingFace API calls simultaneously rather than sequentially, cutting total latency from ~4s (the sum of the three calls) to ~2–3s, bounded by the slowest model (BART classification).
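The parallel pattern can be sketched as follows. This is an illustration under assumptions, not the code in route.ts: the function `analyzeInParallel`, the `ModelCall` type, and the three callback parameters are hypothetical stand-ins for the real HuggingFace calls.

```typescript
// Hypothetical signature for one model call: takes the text, resolves to a result.
type ModelCall<T> = (text: string) => Promise<T>;

interface ParallelResult<C, S, E> {
  classification: C;
  sentiment: S;
  entities: E;
  processingTime: number; // wall-clock milliseconds for the whole batch
}

async function analyzeInParallel<C, S, E>(
  text: string,
  classify: ModelCall<C>,
  detectSentiment: ModelCall<S>,
  extractEntities: ModelCall<E>,
): Promise<ParallelResult<C, S, E>> {
  const startedAt = Date.now();
  // All three calls are started before any is awaited, so they overlap.
  // Promise.all rejects as soon as any one of them rejects.
  const [classification, sentiment, entities] = await Promise.all([
    classify(text),
    detectSentiment(text),
    extractEntities(text),
  ]);
  return { classification, sentiment, entities, processingTime: Date.now() - startedAt };
}
```

Because `Promise.all` fails fast, one failing model fails the whole request. If partial results were preferred, `Promise.allSettled` would be the alternative to consider.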


๐Ÿ—๏ธ Architecture

Languages: TypeScript 60.8% ยท Python 30.5% ยท CSS 8.1% ยท JavaScript 0.6%

Frontend Components

Component File Purpose
Masthead + Ticker Header.tsx Newspaper-style header with a live scrolling news ticker
Input + State NewsAnalyzer.tsx Main textarea, submit button, loading state orchestration
Results Layout ResultsPanel.tsx Container that arranges all result panels side by side
Confidence Chart ConfidenceChart.tsx Animated bar chart showing scores for all 10 categories
Sentiment Badge SentimentBadge.tsx Colour-coded badge with positive / neutral / negative + confidence
Entity Highlights EntityHighlighter.tsx Inline coloured spans with type tooltips (PER / ORG / LOC / MISC)
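Inline entity highlighting comes down to splitting the input text at the character offsets the NER model returns. A minimal sketch of that step, assuming the `start`/`end` offsets shown in the API response (the `Segment` type and `toSegments` function are hypothetical and not necessarily how EntityHighlighter.tsx does it):

```typescript
// Entity shape as it appears in the API response.
interface Entity {
  word: string;
  entity_group: string;
  score: number;
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
}

// Hypothetical render unit: plain text, or text tagged with an entity type.
interface Segment {
  text: string;
  entityGroup?: string;
}

function toSegments(text: string, entities: Entity[]): Segment[] {
  const sorted = [...entities].sort((a, b) => a.start - b.start);
  const segments: Segment[] = [];
  let cursor = 0;
  for (const e of sorted) {
    // Skip spans that overlap an earlier one or fall outside the text.
    if (e.start < cursor || e.end > text.length || e.end <= e.start) continue;
    if (e.start > cursor) segments.push({ text: text.slice(cursor, e.start) });
    segments.push({ text: text.slice(e.start, e.end), entityGroup: e.entity_group });
    cursor = e.end;
  }
  if (cursor < text.length) segments.push({ text: text.slice(cursor) });
  return segments;
}
```

A component can then map each segment to either a plain text node or a coloured span whose tooltip shows `entityGroup`.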

Backend

| File | Purpose |
|---|---|
| src/app/api/analyze/route.ts | Serverless POST handler - calls all 3 HF models in parallel, validates input, applies rate limiting |
| src/lib/hf-client.ts | HuggingFace Inference API client (server-only, never exposed to browser) |
| src/types/index.ts | Shared TypeScript interfaces for request/response shapes |

🧪 API Reference

POST /api/analyze

Request body

{ "text": "Federal Reserve raises interest rates amid inflation fears" }

Successful response

{
  "input": "Federal Reserve raises interest rates amid inflation fears",
  "topCategory": { "label": "Business", "score": 0.87 },
  "classification": [
    { "label": "Business",  "score": 0.87 },
    { "label": "Politics",  "score": 0.09 },
    { "label": "World",     "score": 0.02 }
  ],
  "sentiment": { "label": "negative", "score": 0.74 },
  "entities": [
    {
      "word": "Federal Reserve",
      "entity_group": "ORG",
      "score": 0.99,
      "start": 0,
      "end": 15
    }
  ],
  "processingTime": 2341
}
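The response above could be typed along these lines. The interface names and the runtime guard are illustrative assumptions; the actual definitions in src/types/index.ts may differ, and `processingTime` is assumed to be in milliseconds based on the sample value.

```typescript
interface LabelScore {
  label: string;
  score: number;
}

interface Entity {
  word: string;
  entity_group: string;
  score: number;
  start: number;
  end: number;
}

interface AnalyzeResponse {
  input: string;
  topCategory: LabelScore;
  classification: LabelScore[];
  sentiment: LabelScore;
  entities: Entity[];
  processingTime: number; // assumed milliseconds
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

function isLabelScore(v: unknown): v is LabelScore {
  return isRecord(v) && typeof v["label"] === "string" && typeof v["score"] === "number";
}

function isEntity(v: unknown): v is Entity {
  return (
    isRecord(v) &&
    typeof v["word"] === "string" &&
    typeof v["entity_group"] === "string" &&
    typeof v["score"] === "number" &&
    typeof v["start"] === "number" &&
    typeof v["end"] === "number"
  );
}

// Runtime check so the frontend does not trust the JSON shape blindly.
function isAnalyzeResponse(v: unknown): v is AnalyzeResponse {
  if (!isRecord(v)) return false;
  const classification = v["classification"];
  const entities = v["entities"];
  return (
    typeof v["input"] === "string" &&
    isLabelScore(v["topCategory"]) &&
    Array.isArray(classification) &&
    classification.every(isLabelScore) &&
    isLabelScore(v["sentiment"]) &&
    Array.isArray(entities) &&
    entities.every(isEntity) &&
    typeof v["processingTime"] === "number"
  );
}
```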

Error responses

| Status | Cause |
|---|---|
| 400 | Text too short (< 10 chars), too long (> 2000 chars), or invalid JSON |
| 429 | Rate limited - 10 requests/min per IP |
| 502 | HuggingFace model cold-starting - retry after 20–30 seconds |
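The 400 cases in the table can be sketched as a small validation step. This is an assumption-laden illustration: `validateAnalyzeBody` and `ValidationResult` are hypothetical names, the error messages are invented, and trimming whitespace before measuring length is a choice made here rather than documented behaviour.

```typescript
type ValidationResult =
  | { ok: true; text: string }
  | { ok: false; status: 400; error: string };

const MIN_CHARS = 10;   // limits taken from the error table
const MAX_CHARS = 2000;

function validateAnalyzeBody(rawBody: string): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return { ok: false, status: 400, error: "Invalid JSON" };
  }
  const text =
    typeof parsed === "object" && parsed !== null
      ? (parsed as Record<string, unknown>)["text"]
      : undefined;
  if (typeof text !== "string") {
    return { ok: false, status: 400, error: "Field 'text' must be a string" };
  }
  // Assumption: surrounding whitespace does not count toward the length.
  const trimmed = text.trim();
  if (trimmed.length < MIN_CHARS) {
    return { ok: false, status: 400, error: "Text too short" };
  }
  if (trimmed.length > MAX_CHARS) {
    return { ok: false, status: 400, error: "Text too long" };
  }
  return { ok: true, text: trimmed };
}
```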

🔬 ML Training Pipeline

notebook/training_pipeline.py is a full research pipeline designed to run on Kaggle (GPU) or Google Colab:

| Step | What It Does |
|---|---|
| 1. Data loading | 90K+ news headlines, null checks, class distribution validation |
| 2. Text cleaning | HTML stripping, URL removal, lowercasing, whitespace normalisation |
| 3. EDA | Class distribution, word count stats, per-category word clouds |
| 4. Feature engineering | TF-IDF (15K features, bigrams, sublinear TF) |
| 5. Classical ML | Logistic Regression + Linear SVM with 5-fold stratified CV |
| 6. Model comparison | Per-class F1 bar charts, confusion matrices |
| 7. Error analysis | Misclassified examples sorted by confidence |
| 8. DistilBERT fine-tuning | fp16, early stopping, load_best_model_at_end |
| 9. Explainability | LIME with batched predict_proba_fn |
| 10. Hub push | Deploy model artifact for production inference |

Training Results

| Model | Macro F1 | Notes |
|---|---|---|
| Logistic Regression | ~0.91 | 5-fold CV, balanced class weights |
| Linear SVM | ~0.90 | Fast, competitive baseline |
| DistilBERT (fine-tuned) | ~0.95–0.97 | 4 epochs, early stopping |
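For reference, macro F1 is the unweighted mean of the per-class F1 scores, so every category counts equally regardless of how many examples it has. With K classes and per-class precision P_k and recall R_k (symbols introduced here for illustration, not taken from the pipeline):

```latex
F1_k = \frac{2\,P_k\,R_k}{P_k + R_k},
\qquad
\text{Macro F1} = \frac{1}{K} \sum_{k=1}^{K} F1_k
```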

The production app uses facebook/bart-large-mnli zero-shot (no training required), making it generalisable to new categories without re-training.
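Zero-shot classification works by passing the candidate labels along with the text, which is why new categories need no re-training. A hedged sketch of building such a request: the `inputs` / `parameters.candidate_labels` / `multi_label` fields follow the HuggingFace Inference API's zero-shot task, while the helper names `buildZeroShotRequest` and `rankLabels` are hypothetical, and the response shape assumed by `rankLabels` should be checked against the API version in use.

```typescript
// The 10 categories listed in this README.
const CATEGORIES = [
  "Business", "Technology", "Politics", "Sports", "Entertainment",
  "Health", "Science", "World", "Environment", "Education",
] as const;

interface ZeroShotRequest {
  inputs: string;
  parameters: { candidate_labels: string[]; multi_label: boolean };
}

// Builds the JSON payload; changing `labels` changes the category set with no training.
function buildZeroShotRequest(
  text: string,
  labels: readonly string[] = CATEGORIES,
): ZeroShotRequest {
  return {
    inputs: text,
    parameters: { candidate_labels: [...labels], multi_label: false },
  };
}

// Assumes the response carries parallel `labels` and `scores` arrays.
function rankLabels(labels: string[], scores: number[]): { label: string; score: number }[] {
  return labels
    .map((label, i) => ({ label, score: scores[i] ?? 0 }))
    .sort((a, b) => b.score - a.score);
}
```

For example, `buildZeroShotRequest(headline, ["Crime", "Weather"])` would classify against two categories that appear nowhere in the default list.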


🛠️ Tech Stack

| Layer | Technology |
|---|---|
| Frontend | Next.js 14, TypeScript, Tailwind CSS |
| Backend | Next.js API Routes (serverless), Node.js |
| NLP | HuggingFace Inference API (BART, RoBERTa, BERT-NER) |
| ML Research | Python, HuggingFace Transformers, scikit-learn, LIME |
| Deployment | Vercel (frontend + serverless API) |
| Styling | Tailwind CSS, custom newspaper editorial CSS |

⚙️ Requirements

  • Node.js 18+
  • A free HuggingFace API token - read access is sufficient, no GPU needed

🚀 Getting Started (Local)

1. Clone the repository

git clone https://github.com/MusaIslamFahad/NewsSense-AI.git
cd NewsSense-AI

2. Install dependencies

npm install

3. Configure your HuggingFace token

cp .env.example .env.local

Open .env.local and add your token:

HF_TOKEN=your_huggingface_token_here

⚠️ Security: .env.local is gitignored - never commit your token. The hf-client.ts module is marked server-only so the token is never exposed to the browser.
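On the server side, the token is typically read once and sent as a bearer header. A small sketch under assumptions: `getHfToken` and `buildAuthHeaders` are hypothetical helpers, not functions from hf-client.ts, and the environment is passed in as a plain object so the example stays self-contained.

```typescript
// Reads HF_TOKEN from an environment-like object and fails early if it is missing.
function getHfToken(env: Record<string, string | undefined>): string {
  const token = env["HF_TOKEN"]?.trim();
  if (!token) {
    throw new Error("HF_TOKEN is not set; add it to .env.local");
  }
  return token;
}

// Bearer authentication header for requests to the HuggingFace Inference API.
function buildAuthHeaders(token: string): Record<string, string> {
  return {
    Authorization: `Bearer ${token}`,
    "Content-Type": "application/json",
  };
}
```

In a Next.js route this would be called as `getHfToken(process.env)`. Because the variable name has no `NEXT_PUBLIC_` prefix, Next.js does not inline it into browser bundles.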

4. Start the development server

npm run dev

Open http://localhost:3000


☁️ Deploy to Vercel

One-click deploy

Deploy with Vercel

Manual deploy

npm i -g vercel
vercel

Add the required environment variable in your Vercel dashboard:

| Key | Value |
|---|---|
| HF_TOKEN | Your HuggingFace API token |

📂 Project Structure

newssense-ai/
│
├── src/
│   ├── app/
│   │   ├── page.tsx                  # Home page
│   │   ├── layout.tsx                # Root layout + metadata
│   │   ├── globals.css               # Editorial newspaper theme
│   │   └── api/analyze/
│   │       └── route.ts              # POST /api/analyze - parallel HF inference
│   │
│   ├── components/
│   │   ├── Header.tsx                # Newspaper masthead + live news ticker
│   │   ├── NewsAnalyzer.tsx          # Main input + state orchestration
│   │   ├── ResultsPanel.tsx          # Results layout container
│   │   ├── ConfidenceChart.tsx       # Animated confidence bar chart (10 categories)
│   │   ├── SentimentBadge.tsx        # Sentiment indicator with confidence score
│   │   └── EntityHighlighter.tsx     # Inline entity spans with type tooltips
│   │
│   ├── lib/
│   │   └── hf-client.ts              # HuggingFace Inference API (server-only)
│   │
│   └── types/
│       └── index.ts                  # Shared TypeScript interfaces
│
├── notebook/
│   └── training_pipeline.py          # Full ML pipeline (Kaggle/Colab, GPU)
│
├── .env.example                      # Template - copy to .env.local
├── .gitignore
├── next.config.mjs
├── tailwind.config.ts
├── tsconfig.json
├── package.json
└── README.md

⚠️ Known Limitations

  • Cold start: HuggingFace free-tier models hibernate after inactivity. The first request after a period of no use may take 20–30 seconds while the models warm up.
  • Rate limit: 10 requests/min per IP (configurable via RATE_LIMIT_RPM in the API route)
  • Text length: optimised for headlines and short paragraphs (10–2000 characters)
  • Zero-shot tradeoff: BART zero-shot classification is highly flexible but slightly less accurate than a fine-tuned model on a fixed category set
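A client can soften the cold-start limitation by retrying when the API answers 502. A minimal sketch, assuming a fixed wait between attempts (the helper `retryOnColdStart`, its parameters, and the default of three attempts are hypothetical; only the 502 status and the 20–30 second warm-up come from this README):

```typescript
interface HttpResult {
  status: number;
}

const realSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls `call` up to `maxAttempts` times, waiting `waitMs` after each 502.
// Any other status, success or failure, is returned immediately.
async function retryOnColdStart<T extends HttpResult>(
  call: () => Promise<T>,
  maxAttempts: number = 3,
  waitMs: number = 20_000,
  sleep: (ms: number) => Promise<void> = realSleep,
): Promise<T> {
  let result = await call();
  for (let attempt = 1; attempt < maxAttempts && result.status === 502; attempt++) {
    await sleep(waitMs);
    result = await call();
  }
  return result;
}
```

The `sleep` parameter is injectable so the wait can be replaced in tests. Note that retries also count against the 10 requests/min rate limit.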

📚 What You'll Learn

This project is a strong portfolio reference for:

  • Parallel serverless inference: Promise.all() across multiple HuggingFace models in a Next.js API route
  • Full-stack NLP integration: connecting a TypeScript frontend to transformer model APIs without a dedicated Python backend
  • Zero-shot classification: using BART-MNLI to classify into categories without any labelled training data
  • End-to-end ML pipeline: from raw data cleaning to DistilBERT fine-tuning to HuggingFace Hub deployment
  • LIME explainability: understanding which words drive model predictions
  • Serverless deployment: shipping an NLP product on Vercel with no GPU or server management

🔮 Future Enhancements

  • 📰 Live news feed: integrate a news API (NewsAPI, GNews) for real-time headline analysis
  • Multi-language support: add language detection and multilingual NER models
  • 📊 Batch analysis: accept multiple headlines at once and return a summary dashboard
  • 💾 History & bookmarks: save past analyses locally with the Web Storage API
  • 🤖 Custom fine-tuned classifier: replace zero-shot BART with the DistilBERT model trained in the pipeline for higher accuracy
  • 📱 Mobile PWA: add a web app manifest for installable mobile experience

🤝 Contributing

Contributions are welcome!

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/your-feature)
  3. Commit your changes (git commit -m 'Add your feature')
  4. Push to the branch (git push origin feature/your-feature)
  5. Open a Pull Request

📄 License

MIT - use freely, attribution appreciated. See LICENSE for details.


🙏 Acknowledgements

  • HuggingFace - for the Inference API and the transformer models
  • Vercel - for serverless deployment and edge functions
  • Next.js - for the full-stack React framework

👨‍💻 Author

Musa Islam Fahad


โญ If you found this useful or built on top of it, a star goes a long way. Thank you!
