An integration package connecting Google Classroom and LangChain.
Load courses, assignments, announcements, materials, student submissions, rubrics,
topics, rosters, and file attachments as structured LangChain Document objects
— ready for RAG pipelines, AI teaching assistants, and educational analytics.
Install the package:

    pip install langchain-google-classroom

With optional parsers for PDF and DOCX attachments:

    pip install "langchain-google-classroom[parsers]"

Load all content from a course and print each document's type and title:

    from langchain_google_classroom import GoogleClassroomLoader

    loader = GoogleClassroomLoader(
        course_ids=["123456789"],
    )
    docs = loader.load()
    for doc in docs:
        print(f"[{doc.metadata['content_type']}] {doc.metadata.get('title', '')}")

| Content Type | Flag | Default |
|---|---|---|
| Assignments (courseWork) | load_assignments | True |
| Announcements | load_announcements | True |
| Course materials | load_materials | True |
| Drive attachments (PDF, DOCX, CSV, text, images) | load_attachments | True |
| YouTube and link attachments | load_attachments | True |
| Student submissions | load_submissions | False |
| Rubrics (per courseWork) | Always fetched with assignments | n/a |
| Topics | load_topics | False |
| Student and teacher roster | load_roster | False |

Authenticate with an OAuth client-secrets file and a cached token:

    loader = GoogleClassroomLoader(
        client_secrets_file="credentials.json",
        token_file="token.json",
    )

With a service-account key file:

    loader = GoogleClassroomLoader(
        service_account_file="service_account.json",
    )

Or with a pre-built credentials object:

    from google.oauth2 import service_account

    creds = service_account.Credentials.from_service_account_file(
        "service_account.json",
        scopes=["https://www.googleapis.com/auth/classroom.courses.readonly"],
    )
    loader = GoogleClassroomLoader(credentials=creds)

Note: Never commit credential files to version control. Use environment variables or secret managers in production.

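
One way to follow the note above is to read the credential path from the environment instead of hard-coding it. The sketch below assumes a hypothetical environment variable named `CLASSROOM_SERVICE_ACCOUNT_FILE`, which the package itself does not define. It only builds the `service_account_file` keyword argument documented in this README.

```python
import os


def loader_auth_kwargs(env=None):
    """Build auth keyword arguments from environment variables.

    CLASSROOM_SERVICE_ACCOUNT_FILE is a hypothetical variable name chosen
    for this sketch; the package does not define it.
    """
    env = os.environ if env is None else env
    path = env.get("CLASSROOM_SERVICE_ACCOUNT_FILE")
    if not path:
        raise RuntimeError("CLASSROOM_SERVICE_ACCOUNT_FILE is not set")
    # service_account_file is the loader parameter documented in this README.
    return {"service_account_file": path}
```

The result could then be passed along as `GoogleClassroomLoader(course_ids=[...], **loader_auth_kwargs())`.
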
Drive files attached to classroom items are automatically downloaded and parsed:

    loader = GoogleClassroomLoader(
        course_ids=["123456789"],
        load_attachments=True,
        parse_attachments=True,
    )

Google-native files are exported to parseable formats:

| Google Format | Exported As |
|---|---|
| Google Docs | DOCX |
| Google Slides | |
| Google Sheets | CSV |
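
The export step can be pictured as a lookup from a Google-native MIME type to an export MIME type. This is a minimal sketch, not the package's actual code. The MIME strings are the standard Drive ones, while the dictionary and function names are hypothetical. Slides is left out because its export target is not given in the table above.

```python
# Hypothetical lookup table; the names here are illustrative only.
EXPORT_MIME_TYPES = {
    # Google Docs -> DOCX
    "application/vnd.google-apps.document": (
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    ),
    # Google Sheets -> CSV
    "application/vnd.google-apps.spreadsheet": "text/csv",
}


def export_mime_type(source_mime_type):
    """Return the export MIME type for a Google-native file, else None."""
    return EXPORT_MIME_TYPES.get(source_mime_type)
```

In this sketch a `None` result simply means the file has no entry in the lookup table, for example a regular PDF upload.
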
Override the built-in parser registry with any BaseBlobParser subclass:

    from langchain_community.document_loaders.parsers.pdf import PyMuPDFParser

    loader = GoogleClassroomLoader(
        course_ids=["123456789"],
        file_parser_cls=PyMuPDFParser,
    )

Skip oversized attachments to prevent memory issues:

    loader = GoogleClassroomLoader(
        max_file_size=10_000_000,  # 10 MB (default: 50 MB)
    )

Extract and describe images embedded in PDFs and image attachments using any vision-capable chat model:

    from langchain_google_genai import ChatGoogleGenerativeAI

    loader = GoogleClassroomLoader(
        course_ids=["123456789"],
        load_attachments=True,
        vision_model=ChatGoogleGenerativeAI(model="gemini-2.0-flash"),
    )
    docs = loader.load()
    # PDF pages include: "[Image: chart.png]\nA bar chart showing student grades..."

Load submission state, grades, and student responses:

    loader = GoogleClassroomLoader(
        course_ids=["123456789"],
        load_submissions=True,
    )

Each submission includes state, late, assigned_grade, draft_grade, short-answer text, multiple-choice answers, and attachment file names.
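
These fields make it straightforward to pick out submissions that need attention. The sketch below assumes the fields are exposed as metadata keys named `late` and `assigned_grade`; the text above lists the fields but does not say where they are stored. Plain dictionaries stand in for `doc.metadata` so the example runs without credentials.

```python
def needs_grading(metadata_items):
    """Return metadata for late submissions that have no assigned grade yet."""
    return [
        meta
        for meta in metadata_items
        if meta.get("content_type") == "submission"
        and meta.get("late")
        and meta.get("assigned_grade") is None
    ]


# Stand-ins for doc.metadata; real values would come from loader.load().
sample = [
    {"content_type": "submission", "late": True, "assigned_grade": None},
    {"content_type": "submission", "late": False, "assigned_grade": 90.0},
    {"content_type": "assignment", "title": "Homework 3"},
]
pending = needs_grading(sample)
```
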

Load course topics and the student and teacher roster:

    loader = GoogleClassroomLoader(
        course_ids=["123456789"],
        load_topics=True,
        load_roster=True,
    )

Required OAuth scopes are automatically added when these features are enabled.
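
Scope auto-extension could work roughly as follows. This is a sketch, not the package's implementation. The scope URLs are real Classroom API read-only scopes, but which scopes the loader requests for each flag is an assumption, as are the names `BASE_SCOPES`, `FEATURE_SCOPES`, and `resolve_scopes`.

```python
BASE_SCOPES = [
    "https://www.googleapis.com/auth/classroom.courses.readonly",
]

# Assumed mapping from opt-in flag to the extra scope it needs.
FEATURE_SCOPES = {
    "load_topics": "https://www.googleapis.com/auth/classroom.topics.readonly",
    "load_roster": "https://www.googleapis.com/auth/classroom.rosters.readonly",
}


def resolve_scopes(**flags):
    """Return the base scopes plus one scope per enabled opt-in flag."""
    scopes = list(BASE_SCOPES)
    for flag, scope in FEATURE_SCOPES.items():
        if flags.get(flag) and scope not in scopes:
            scopes.append(scope)
    return scopes
```
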

Stream documents asynchronously with alazy_load():

    import asyncio

    from langchain_google_classroom import GoogleClassroomLoader


    async def main():
        loader = GoogleClassroomLoader(course_ids=["123456789"])
        docs = []
        async for doc in loader.alazy_load():
            docs.append(doc)
        return docs


    docs = asyncio.run(main())

Every document includes structured metadata for filtering and retrieval:

    Document(
        page_content="Assignment: Homework 3\n\nComplete exercises 1-5...",
        metadata={
            "source": "google_classroom",
            "course_id": "12345",
            "course_name": "Machine Learning",
            "content_type": "assignment",
            "title": "Homework 3",
            "item_id": "67890",
            "created_time": "2024-01-15T10:00:00Z",
            "updated_time": "2024-01-15T10:00:00Z",
            "due_date": "2024-01-22T23:59:00",
            "max_points": 100.0,
            "alternate_link": "https://classroom.google.com/...",
        },
    )

Content types: assignment, announcement, material, submission, rubric, topic, student, teacher, assignment_attachment, assignment_youtube, assignment_link.
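
Because every document carries a `content_type`, results can be bucketed before indexing. The small sketch below uses plain dictionaries as stand-ins for `doc.metadata`; with real documents, pass `doc.metadata` for each item instead. The helper name is hypothetical.

```python
from collections import defaultdict


def group_titles_by_type(metadata_items):
    """Map each content_type to the titles of the items of that type."""
    groups = defaultdict(list)
    for meta in metadata_items:
        groups[meta.get("content_type", "unknown")].append(meta.get("title", ""))
    return dict(groups)


# Stand-ins for doc.metadata; real values would come from loader.load().
sample = [
    {"content_type": "assignment", "title": "Homework 3"},
    {"content_type": "announcement", "title": "Exam schedule"},
    {"content_type": "assignment", "title": "Homework 4"},
]
grouped = group_titles_by_type(sample)
```
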
| Parameter | Type | Default | Description |
|---|---|---|---|
| course_ids | list[str] | None | Specific course IDs to load (None = all accessible) |
| load_assignments | bool | True | Load courseWork items |
| load_announcements | bool | True | Load announcements |
| load_materials | bool | True | Load courseWorkMaterials |
| load_attachments | bool | True | Download and parse Drive file attachments |
| parse_attachments | bool | True | Parse files via BaseBlobParser (otherwise raw UTF-8) |
| load_images | bool | False | Process image MIME types |
| load_submissions | bool | False | Load student submissions |
| load_topics | bool | False | Load course topics |
| load_roster | bool | False | Load student and teacher profiles |
| max_file_size | int | 50000000 | Max attachment size in bytes (0 = no limit) |
| vision_model | BaseChatModel | None | Vision LLM for image understanding |
| image_prompt | str | None | Custom prompt for the vision model |
| file_parser_cls | type | None | Custom BaseBlobParser subclass for all attachments |
| file_parser_kwargs | dict | {} | Keyword arguments forwarded to file_parser_cls |
| credentials | Credentials | None | Pre-built Google credentials object |
| service_account_file | str | None | Path to service-account key JSON |
| token_file | str | None | Path to cached OAuth token JSON |
| client_secrets_file | str | None | Path to OAuth client-secrets JSON |
| scopes | list[str] | Read-only | API scopes (auto-extended when opt-in features are enabled) |

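
The `max_file_size` rule in the table (a byte limit, with 0 meaning no limit) amounts to a check like the one below. This is a sketch only: `within_size_limit` is a hypothetical helper, and treating a file exactly at the limit as acceptable is an assumption.

```python
def within_size_limit(size_bytes, max_file_size=50_000_000):
    """Return True if an attachment of size_bytes should be processed.

    A max_file_size of 0 disables the check, as the parameter table states.
    """
    if max_file_size == 0:
        return True
    return size_bytes <= max_file_size
```
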

    GoogleClassroomLoader (BaseLoader + BaseModel)
    ├── _utilities.py        - auth, retry/backoff, guard_import
    ├── _vision.py           - shared vision LLM helper
    ├── classroom_api.py     - paginated Classroom API fetcher
    ├── document_builder.py  - raw API responses → LangChain Document
    ├── drive_resolver.py    - Drive download/export + link extraction
    ├── normalizer.py        - text cleanup (Unicode NFC, whitespace)
    └── parsers/
        ├── __init__.py      - MIME registry + get_parser()
        ├── pdf_parser.py    - pypdf + vision LLM
        ├── docx_parser.py   - python-docx + vision LLM
        ├── csv_parser.py    - header-aware CSV (Google Sheets exports)
        ├── text_parser.py   - built-in UTF-8
        └── image_parser.py  - vision LLM + base64 fallback

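
As an illustration of the text cleanup that the `normalizer.py` entry describes (Unicode NFC, whitespace), here is a minimal sketch. It is not the module's actual code; the function name and the exact whitespace rules are assumptions.

```python
import re
import unicodedata


def normalize_text(text):
    """Apply NFC normalization, then tidy whitespace."""
    text = unicodedata.normalize("NFC", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Strip spaces around line breaks and cap blank lines at one.
    text = re.sub(r" ?\n ?", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

NFC matters here because the same visible character can arrive as one code point or as a base letter plus a combining mark, and the two forms would otherwise compare as different strings.
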

To set up a development environment:

    git clone https://github.com/ayanokojix21/langchain-google-classroom.git
    cd langchain-google-classroom/libs/google-classroom
    pip install -e ".[dev]"

    # Tests
    pytest tests/unit_tests/ -v --disable-socket --allow-unix-socket

    # Lint and format
    ruff check .
    ruff format .

Licensed under MIT; see LICENSE for details.