This project is an experiment in building an open-standards knowledge graph from the content at www.universetoday.com. I am not affiliated with Universe Today except as a member of their Patreon channel.
I am using Universe Today content under its Creative Commons Attribution 4.0 International License.
# Install dependencies
pip install -r requirements.txt
# Test the pipeline
python test_pipeline.py
# Classify all articles (saves to article_topics.json)
python classify_all_articles.py
# Explore classification results
python explore_topics.py
# Run full demo (requires gistCore.ttl)
python pipeline_demo.py

- PIPELINE.md - Full architecture and pipeline documentation
- STATUS.md - Current progress and next steps
- QUICKSTART.md - Quick start guide
- download_articles.md - Article scraper docs
- core_entities.md - Core entity reference
- gistCore_llm_reference.md - LLM-friendly gist ontology reference
- gist_schema.py - Schema loading and subsetting
- topic_classifier.py - Article topic classification (keyword + LLM modes)
- classify_all_articles.py - Batch classify entire corpus
- explore_topics.py - Analyze and explore classification results
- analyze_entities.py - Extract and count entities across corpus
- find_people.py - Extract person name mentions from articles
- pipeline_demo.py - End-to-end demonstration
- test_pipeline.py - Validation tests
- download_articles.py - Article scraper (see download_articles.md)
- article_topics.json - Topic classification results for all articles
- article_topics_with_people.json - Classification results including people mentions
- entity_analysis.json - Entity frequency analysis across corpus
- people_mentions.json - Person name extraction results
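
This README does not show the keyword mode of topic_classifier.py, so the following is only a rough illustration of the general idea, not the project's actual code. It is a minimal sketch that assumes a hand-written keyword table: the `TOPIC_KEYWORDS` table, the topic names, the `classify_by_keywords` function, and the `min_hits` threshold are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword table for illustration only; the real topics and
# keywords are defined by the project in topic_classifier.py.
TOPIC_KEYWORDS = {
    "exoplanets": {"exoplanet", "exoplanets", "transit", "habitable"},
    "mars": {"mars", "rover", "perseverance"},
    "black_holes": {"accretion", "singularity", "quasar"},
}


def classify_by_keywords(text, min_hits=2):
    """Return topics whose keywords appear at least min_hits times.

    Topics are ordered by descending hit count, with ties broken
    alphabetically so the result is deterministic.
    """
    # Lowercase the text and split it into alphabetic tokens.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each topic by the total number of keyword occurrences.
    scores = {
        topic: sum(counts[keyword] for keyword in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, score in ranked if score >= min_hits]


if __name__ == "__main__":
    sample = "The Perseverance rover on Mars found a rock. Mars is cold."
    print(classify_by_keywords(sample))
```

A plain keyword count like this is cheap enough to run over a whole corpus, which is presumably why a keyword mode sits alongside the LLM mode; the trade-off is that it misses articles that discuss a topic without using the listed words.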
- ✅ Topic classification (keyword + LLM modes)
- ✅ Schema subsetting from gist ontology
- ✅ Batch corpus classification (~30K articles)
- ✅ Entity frequency analysis across corpus
- ✅ People/person name extraction
- ✅ Pipeline integration
- 🚧 Entity extraction engine (structured, per-article)
- 🚧 Relationship extraction
- 🚧 Entity resolution
- 🚧 Review interface
See STATUS.md for details.