Universe Today Knowledge Graph Experiment

This experiment aims to build an open-standards knowledge graph from the content at www.universetoday.com. I am not affiliated with Universe Today, except as a member of their Patreon channel.

I am using Universe Today content per its Creative Commons Attribution 4.0 International License.

Quick Start

# Install dependencies
pip install -r requirements.txt

# Test the pipeline
python test_pipeline.py

# Classify all articles (saves to article_topics.json)
python classify_all_articles.py

# Explore classification results
python explore_topics.py

# Run full demo (requires gistCore.ttl)
python pipeline_demo.py

Project Structure

Documentation

PIPELINE.md - Full architecture and pipeline documentation
STATUS.md - Current progress and next steps
QUICKSTART.md - Quick start guide
download_articles.md - Article scraper docs
core_entities.md - Core entity reference
gistCore_llm_reference.md - LLM-friendly gist ontology reference

Pipeline Scripts

gist_schema.py - Schema loading and subsetting
topic_classifier.py - Article topic classification (keyword + LLM modes)
classify_all_articles.py - Batch classify entire corpus
explore_topics.py - Analyze and explore classification results
analyze_entities.py - Extract and count entities across corpus
find_people.py - Extract person name mentions from articles
pipeline_demo.py - End-to-end demonstration
test_pipeline.py - Validation tests
download_articles.py - Article scraper (see download_articles.md)
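
To make the keyword mode of topic classification concrete, here is a minimal sketch of how keyword-based classification could work. It is not the code in topic_classifier.py: the TOPIC_KEYWORDS map, the classify_by_keywords function, and the min_hits parameter are all hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword map; the real topic list in topic_classifier.py may differ.
TOPIC_KEYWORDS = {
    "exoplanets": {"exoplanet", "exoplanets", "habitable", "transit"},
    "mars": {"mars", "martian", "perseverance", "curiosity"},
    "black_holes": {"quasar", "accretion", "singularity"},
}


def classify_by_keywords(text, min_hits=1):
    """Return topics with at least min_hits keyword matches, best match first."""
    # Lowercase the text and count each alphabetic token once per occurrence.
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        topic: sum(tokens[word] for word in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    # Sort by descending hit count, then alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [topic for topic, hits in ranked if hits >= min_hits]


sample = "The Perseverance rover found Martian rocks. Mars once had water."
print(classify_by_keywords(sample))
```

A plain token count like this is cheap enough to run over a whole corpus, which is presumably why a keyword mode exists alongside the LLM mode; the trade-off is that it misses multi-word phrases and synonyms.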

Data Files

article_topics.json - Topic classification results for all articles
article_topics_with_people.json - Classification results including people mentions
entity_analysis.json - Entity frequency analysis across corpus
people_mentions.json - Person name extraction results
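
As a rough illustration of how the classification results might be summarized, the sketch below counts how often each topic appears. The JSON shape (article id mapped to an object with a "topics" list) is an assumption and may not match the real article_topics.json; the inline SAMPLE_JSON and the count_topics helper are hypothetical.

```python
import json
from collections import Counter

# Inline stand-in for article_topics.json. The shape used here
# (article id -> {"topics": [...]}) is an assumption, not the real schema.
SAMPLE_JSON = """
{
  "article-001": {"topics": ["mars", "rovers"]},
  "article-002": {"topics": ["exoplanets"]},
  "article-003": {"topics": ["mars"]}
}
"""


def count_topics(results):
    """Count how many articles were tagged with each topic."""
    counts = Counter()
    for record in results.values():
        # Articles without a "topics" key contribute nothing.
        counts.update(record.get("topics", []))
    return counts


topic_counts = count_topics(json.loads(SAMPLE_JSON))
for topic, n in topic_counts.most_common():
    print(f"{topic}: {n}")
```

To try this against the real file, replace json.loads(SAMPLE_JSON) with json.load on the opened file and adjust the key names to whatever the file actually contains.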

What's Working

✅ Topic classification (keyword + LLM modes)
✅ Schema subsetting from gist ontology
✅ Batch corpus classification (~30K articles)
✅ Entity frequency analysis across corpus
✅ People/person name extraction
✅ Pipeline integration
🚧 Entity extraction engine (structured, per-article)
🚧 Relationship extraction
🚧 Entity resolution
🚧 Review interface

See STATUS.md for details.
