A CLI for analyzing and exporting ChatGPT conversations.json files.
The project focuses on two things:
- analyzing the structure of a ChatGPT export without loading the whole file into memory
- exporting conversations to text or JSON with structural field filtering and metadata filtering
It uses streaming JSON parsing with ijson and is organized around small core modules so filtering, formatting, split behavior, and path generation can be changed independently.
Persistent defaults can be stored in a single TOML config file. The repo ships a template at chatgpt_export.toml.example.
This project currently targets Python 3.10+.
git clone https://github.com/voidfreud/chatgpt-export-tool.git
cd chatgpt-export-tool
uv sync
For development tooling too:
uv sync --group dev
You can then run the CLI with:
uv run chatgpt-export --help
To apply config defaults, copy the template and pass --config PATH.
Analyze an export:
uv run chatgpt-export analyze path/to/conversations.json
Include field coverage:
uv run chatgpt-export analyze path/to/conversations.json --fields
Export everything as text to stdout:
uv run chatgpt-export export path/to/conversations.json
Export everything as JSON to one file:
uv run chatgpt-export export path/to/conversations.json --format json --output conversations.json
Export one file per conversation:
uv run chatgpt-export export path/to/conversations.json --split subject --output-dir exports
analyze reports high-level structure and statistics for a conversations.json file.
It includes:
- conversation count
- message count
- file size
- date range
- optional field coverage with --fields
Examples:
uv run chatgpt-export analyze data.json
uv run chatgpt-export analyze data.json --fields
uv run chatgpt-export analyze data.json --verbose --output analysis.txt
uv run chatgpt-export analyze data.json --debug
export writes conversations in either text or JSON format.
It supports:
- structural field filtering through --fields
- metadata filtering through --include and --exclude
- transcript-oriented text export that follows the active branch
- split modes for one output, one file per conversation, date folders, or ID-based files
Examples:
uv run chatgpt-export export data.json
uv run chatgpt-export export data.json --output conversations.txt
uv run chatgpt-export export data.json --format json --output conversations.json
uv run chatgpt-export export data.json --split subject --output-dir exports
uv run chatgpt-export export data.json --fields "groups minimal" --split subject --output-dir exports
uv run chatgpt-export export data.json --fields "include title,mapping" --include "model*" --exclude plugin_ids
cp chatgpt_export.toml.example chatgpt_export.toml
uv run chatgpt-export export data.json --config chatgpt_export.toml
The --fields option controls which structural fields are retained before formatting.
Supported forms:
- all
- none
- include field1,field2
- exclude field1,field2
- groups group1,group2
Examples:
uv run chatgpt-export export data.json --fields all
uv run chatgpt-export export data.json --fields none
uv run chatgpt-export export data.json --fields "include title,create_time,mapping"
uv run chatgpt-export export data.json --fields "exclude moderation_results,plugin_ids"
uv run chatgpt-export export data.json --fields "groups minimal"
Available field groups:
- conversation
- message
- metadata
- minimal
See Fields.md for the current field-selection reference.
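As a hypothetical sketch of how the supported forms could be interpreted (group membership below is made up for illustration; Fields.md is authoritative):

```python
# Hypothetical group membership, for illustration only.
FIELD_GROUPS = {
    "minimal": {"title", "create_time", "mapping"},
}

def select_fields(conversation, spec):
    """Apply an 'all' / 'none' / 'include a,b' / 'exclude a,b' / 'groups g' spec."""
    if spec == "all":
        return dict(conversation)
    if spec == "none":
        return {}
    mode, _, rest = spec.partition(" ")
    names = {n.strip() for n in rest.split(",") if n.strip()}
    if mode == "groups":
        # Expand each named group into its member fields.
        names = set().union(*(FIELD_GROUPS.get(g, set()) for g in names))
    if mode in ("include", "groups"):
        return {k: v for k, v in conversation.items() if k in names}
    if mode == "exclude":
        return {k: v for k, v in conversation.items() if k not in names}
    raise ValueError(f"unknown --fields spec: {spec!r}")
```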
The metadata filter runs after structural field filtering and applies only to keys inside nested message.metadata dictionaries.
Examples:
uv run chatgpt-export export data.json --include model_slug
uv run chatgpt-export export data.json --include "model*" --exclude plugin_ids
uv run chatgpt-export export data.json --fields "groups message" --include is_archived
Currently supported metadata names include:
- model_slug
- message_type
- plugin_ids
- is_archived
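Since --include accepts glob patterns like "model*", the filter can be sketched with stdlib fnmatch (this helper is illustrative, not the tool's actual implementation):

```python
from fnmatch import fnmatch

def filter_metadata(metadata, include=None, exclude=None):
    """Keep metadata keys matching any include glob, then drop exclude matches."""
    def matches(key, patterns):
        return any(fnmatch(key, pat) for pat in patterns)

    kept = {}
    for key, value in metadata.items():
        if include and not matches(key, include):
            continue  # an include list acts as an allowlist
        if exclude and matches(key, exclude):
            continue  # excludes are applied after includes
        kept[key] = value
    return kept
```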
export supports four split modes:
- single: one combined output stream or one output file
- subject: one file per conversation, named from title plus identifier
- date: date folders with one file per conversation
- id: one file per conversation, named from conversation ID
Important output behavior:
- --split single with no --output writes to stdout
- --split single --output FILE writes one file
- split modes like subject, date, and id write into --output-dir
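A subject-mode filename built "from title plus identifier" might look like the following sketch (the slug rules and suffix length are assumptions, not the tool's actual naming policy):

```python
import re

def subject_filename(title, conversation_id, max_len=60):
    """Build a filesystem-safe name from a conversation title plus a short ID suffix."""
    # Drop characters that are unsafe in filenames, then collapse spaces.
    slug = re.sub(r"[^\w\s-]", "", title).strip()
    slug = re.sub(r"\s+", "_", slug)[:max_len] or "untitled"
    return f"{slug}_{conversation_id[:8]}.txt"
```

The ID suffix keeps names unique when two conversations share a title.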
Supported formats:
- txt
- json
txt is a transcript-oriented export that follows the active branch of the conversation tree.
json writes the filtered conversation objects directly.
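Following the active branch means walking parent links backward from current_node, which a ChatGPT export stores alongside the mapping of nodes. A minimal sketch, assuming the documented current_node/parent structure:

```python
def active_branch(conversation):
    """Walk parent links from current_node back to the root, then reverse."""
    mapping = conversation["mapping"]
    node_id = conversation.get("current_node")
    path = []
    while node_id:
        node = mapping[node_id]
        msg = node.get("message")
        if msg is not None:  # some nodes (e.g. the root) carry no message
            path.append(msg)
        node_id = node.get("parent")
    path.reverse()  # collected leaf-to-root; return root-to-leaf order
    return path
```

Side branches from edited or regenerated messages are simply never visited, which is why the transcript reads as one linear thread.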
By default, text export includes user text, assistant text, assistant thoughts, and user editable context when present. User editable context is rendered in a compact preview by default so transcripts stay readable. Text export hides tool plumbing, assistant code, reasoning recap, and blank/internal nodes unless the transcript policy is changed in config.
Text output defaults now favor reading clarity:
- conversation context is rendered as a separate preamble block
- visible turns are grouped into clearer chat-style User/Assistant sections
- turn counts can be shown in the header
- ChatGPT citation/navigation artifacts can be stripped from text output
- long paragraphs can be wrapped for easier reading
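Paragraph wrapping of the kind wrap_width controls can be sketched with stdlib textwrap (this helper is illustrative; the tool's actual wrapping logic may differ):

```python
import textwrap

def wrap_turn(text, wrap_width=88):
    """Re-wrap each paragraph; a wrap_width of 0 disables wrapping entirely."""
    if not wrap_width:
        return text
    paragraphs = text.split("\n\n")
    return "\n\n".join(
        "\n".join(textwrap.wrap(p, width=wrap_width)) if p.strip() else p
        for p in paragraphs
    )
```

Wrapping per paragraph (rather than over the whole text) preserves the blank lines that separate turns.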
Important transcript policy options include:
- user_editable_context_mode
- show_visually_hidden_content_types
- include_content_types
- exclude_content_types
export accepts --config PATH and resolves defaults from one TOML file.
The repo ships chatgpt_export.toml.example as a template. Copy it to a local file such as chatgpt_export.toml and pass that path explicitly.
The config file is TOML and is intentionally kept to one file with sections such as:
- [defaults] for format, split mode, field selection, and output directory
- [transcript] for active-branch reconstruction and visibility rules
- [text_output] for header fields, transcript layout, and date/time formats
Notable [text_output] options include:
- layout_mode = "reading" | "compact"
- heading_style = "plain" | "markdown"
- include_turn_count_in_header = true | false
- include_turn_numbers = true | false
- turn_separator = "---"
- strip_chatgpt_artifacts = true | false
- wrap_width = 88
Practical transcript presets:
Reading-first transcript:
[text_output]
layout_mode = "reading"
heading_style = "plain"
include_turn_count_in_header = true
turn_separator = "---"
strip_chatgpt_artifacts = true
wrap_width = 88
Compact scanning transcript:
[text_output]
layout_mode = "compact"
include_turn_count_in_header = false
turn_separator = ""
wrap_width = 0
Markdown/notes transcript:
[text_output]
layout_mode = "reading"
heading_style = "markdown"
turn_separator = "---"
CLI arguments override TOML values. analyze does not currently use export config defaults.
The structure is intentionally modular at the subsystem level:
- command wiring and user-facing behavior live in chatgpt_export_tool/commands/
- streaming parse and analysis are separate from export formatting and writing
- structural field filtering and metadata filtering are separate concerns
- split-key resolution, filename policy, and writing are isolated from export orchestration
The core package is also grouped into shallow subpackages by concern:
- core/config/ for runtime config models, loading, and validation
- core/transcript/ for branch reconstruction and transcript extraction
- core/validation/ for field and metadata validation
- core/output/ for formatting, naming, path resolution, and writing
That separation is deliberate: most behavior changes can be made in one small subsystem instead of in one large control file.
Run the checks used during refactoring:
uv run pytest
uv run pytest --cov=chatgpt_export_tool --cov-report=term-missing
uv run ruff check chatgpt_export_tool tests pyproject.toml
uv run ruff format --check chatgpt_export_tool tests
If you need to format files:
uv run ruff format chatgpt_export_tool tests
- Input handling is streaming, so large exports do not need to be loaded into memory just to analyze or iterate conversations.
- Single-file JSON export writes one valid JSON document.
- Split exports write one conversation per output file.
- Text export follows the active thread path using current_node and parent links.
- The field-selection and metadata-selection surface is documented in Fields.md.