Refresh 'Import' documentation #114
base: main
Conversation
Walkthrough
Rewrote import documentation to center on file-based imports (local file, URL, AWS S3, Azure, MongoDB) with a unified "File Import" flow, updated images and form terminology, clarified S3/Azure/glob guidance, renamed and clarified schema-evolution behavior, and added File Format Limitations and sample-data guidance.

Sequence Diagram(s): (omitted)

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/cluster/import.md (1)
111-123: Fix double space in Schema Evolution section.
Line 115 contains a formatting inconsistency with a double space before "evolution" in "Allow schema  evolution".
✏️ Proposed fix

```diff
-import process. It can be toggled via the 'Allow schema  evolution' checkbox
+import process. It can be toggled via the 'Allow schema evolution' checkbox
```
🤖 Fix all issues with AI agents
In @docs/cluster/import.md:
- Around lines 11-15: Fix the typo in the import formats list by replacing the incorrect word "Paquet" with "Parquet" (the line that currently reads "Paquet"); ensure the list remains: CSV, JSON (JSON-Lines, JSON Arrays and JSON Documents), Parquet, MongoDB collection. (Sketch after this list.)
- Around lines 52-58: Update the sentence in the S3 import documentation to correct the typo "file form bucket" to "file from a bucket" (in the paragraph describing CrateDB Cloud imports in docs/cluster/import.md) so the sentence reads "To import a file from a bucket, provide the name of your bucket, and path to the file."; ensure only the typo is changed and punctuation remains consistent. (Sketch after this list.)
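Taken together, the two prompts above describe one-line fixes. As sketches against the quoted doc text (surrounding lines assumed unchanged):

```diff
-- Paquet
+- Parquet
```

```diff
-To import a file form bucket, provide the name of your bucket, and path to the file.
+To import a file from a bucket, provide the name of your bucket, and path to the file.
```

The second sketch follows the corrected sentence quoted in the prompt, which also adds the article "a".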
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (9)
- `docs/_assets/img/cluster-import-file-form.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import-globbing.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import-tab-azure.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import-tab-choice.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import-tab-file.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import-tab-s3.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import-tab-url.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import-tab.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import.png` is excluded by `!**/*.png`
📒 Files selected for processing (1)
docs/cluster/import.md
🧰 Additional context used
🪛 LanguageTool
docs/cluster/import.md
[grammar] ~13-~13: Ensure spelling is correct
Context: ...N-Lines, JSON Arrays and JSON Documents) - Paquet - MongoDB collection :::{note} If you don...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~53-~53: Ensure spelling is correct
Context: ...S3-compatible storage. To import a file form bucket, provide the name of your bucket...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (4)
docs/cluster/import.md (4)
30-47: File Import section looks good.
Clear and concise explanation of the import process with appropriate image reference. The schema evolution toggle description is accurate and helpful.

86-100: Azure subsection is well-structured.
Clear documentation of Azure Blob Storage imports with appropriate guidance on secrets and file path configuration. Wildcard notation and file size limits are clearly documented.

102-105: Integration section reference is appropriate.
Brief section appropriately directs users to more comprehensive integration documentation. Cross-reference syntax is correct.

126-165: File Format Limitations section is well-documented.
Comprehensive coverage of format-specific limitations with clear code examples for JSON variants. Documentation effectively demonstrates expected input formats.
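For readers without the diff open, the three JSON variants the section's examples cover conventionally look like this (illustrative records, not copied from the PR):

```text
JSON-Lines (one object per line):
{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}

JSON Array:
[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

JSON Document (a single object):
{"id": 1, "name": "Alice"}
```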
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @docs/cluster/import.md:
- Line 115: Fix the double space in the checkbox label by replacing "Allow schema  evolution" with "Allow schema evolution", and make capitalization consistent across the doc (e.g., use "Schema Evolution" or "schema evolution" everywhere); update the string used in the text to the chosen capitalization so the checkbox reference matches other occurrences.
🧹 Nitpick comments (2)
docs/cluster/import.md (2)
11-15: Consolidate duplicate data format listings.
The supported data formats are listed twice: once in the introduction (lines 12-14) and again in the File Import section (lines 41-43). Consider removing the duplicate in the File Import section or consolidating into a single reference, as this may confuse users about whether there are different format constraints for different import methods.
Also applies to: 39-45
126-165: File Format Limitations section is well-documented for JSON and CSV.
The examples are clear and helpful. However, Parquet format is listed as supported but has no details or examples, unlike CSV and JSON. Consider adding Parquet documentation for completeness.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/cluster/import.md
🔇 Additional comments (2)
docs/cluster/import.md (2)
30-47: File Import section is clear and well-structured.
The simplified flow with file format, source, and table selection is straightforward. The schema evolution explanation is helpful context at this level.

100-100: Add missing period.
Line 100 ends without a period.

Fix missing punctuation

```diff
-File size limitation for imports is 10 GiB per file
+File size limitation for imports is 10 GiB per file.
```

Likely an incorrect or invalid review comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/cluster/import.md (1)
126-165: File Format Limitations section is incomplete; missing Parquet and MongoDB documentation.
The section documents CSV and JSON formats well with helpful examples, but is missing documentation for two formats that are prominently promoted earlier:
- Parquet is mentioned in the introduction (Line 14) and File Import section (Line 43) but has no entry in File Format Limitations.
- MongoDB collection is mentioned in the introduction (Line 15) but has no entry in File Format Limitations.
Additionally, the JSON code examples use `:::{code} console` language tags (Lines 138, 147, 162), which may be semantically incorrect; these should likely use `:::{code} json` for proper syntax highlighting.

Please add documentation for Parquet and MongoDB collection formats, explaining:
- Parquet: any schema/type handling specifics, nested structure behavior
- MongoDB collection: connection requirements, field mapping behavior
Also review the code block language tags for semantic accuracy.
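Applied to one of the cited fences, the suggested tag change is a one-line diff (the fence position is per the comment's line numbers; the fence body stays unchanged):

```diff
-:::{code} console
+:::{code} json
```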
🤖 Fix all issues with AI agents
In @docs/cluster/import.md:
- Around line 4-9: Add three new subsections under the "File Import" area
mirroring the scope of the existing "S3" and "Azure" sections: "Local file",
"URL", and "MongoDB". For "Local file" document the upload process (steps to
upload via UI/CLI/API), accepted file formats, size limits, and any
preprocessing or format requirements; for "URL" document supported URL schemes
(http/https), authentication options (basic, bearer, signed URLs), timeout/retry
behavior, and how to reference the URL in import commands; for "MongoDB"
document connection string format, required drivers/versions, auth mechanisms,
how to select a database/collection, field mapping/transform examples, and any
batching/throughput limits. Follow the same tone/structure and examples used in
the "S3 Import" and "Azure Import" sections so the new subsections are
consistent and include sample commands, configuration keys, and known
limitations.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/cluster/import.md
🔇 Additional comments (6)
docs/cluster/import.md (6)
4-21: Verify import history UI reference and image asset.
Lines 23-24 reference an "Import history" tab, but this feature description appears after the introduction and before explaining how to perform imports. For new users, the logical flow would benefit from explaining the basic import process first before mentioning historical references. Additionally, verify that the referenced image at Line 28 (`cluster-import.png`) exists and correctly depicts the current UI.
30-48: Verify image asset and source documentation completeness.
The File Import section provides a clear overview of the unified workflow. However, verify that:

- The image `cluster-import-file-form.png` (Line 47) exists and correctly depicts the current file import form.
- All five import sources mentioned in the introduction (local file, URL, AWS S3, Azure, MongoDB) are documented in dedicated sections. Currently, only S3 (Lines 49-85) and Azure (Lines 86-101) have subsections; local file, URL, and MongoDB guidance is missing.
49-85: S3 guidance is comprehensive.
The AWS S3 section provides clear instructions including bucket/path requirements, authentication, wildcard support for multi-file imports, and relevant IAM policy examples. The 10 GiB file size limit and egress cost warning are appropriately documented.

Please verify that the IAM policy example (Lines 68-83) reflects current AWS S3 best practices and that no additional S3-specific permissions (e.g., `s3:ListBucket` for prefix matching) are required for wildcard imports to function correctly.
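For reference, a read-only policy covering both object reads and the prefix listing the comment asks about could look like the sketch below; the bucket name is a placeholder, and this is not the policy quoted in the docs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
```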
86-101: Azure guidance is clear and consistent with S3 structure.
The Azure section appropriately documents secret-based authentication, path format, wildcard support, and file size limits. The mention of admin-level secret management is important operational guidance.
102-106: Clarify the Integration section purpose.
The Integration section defers entirely to another documentation page via cross-reference. This is acceptable if comprehensive data integration guidance exists elsewhere, but the section feels incomplete for users reading the import documentation. Consider adding 1-2 sentences explaining what integrations are (e.g., "Integrations allow connecting external data sources for continuous sync") before the cross-reference to provide better context.

Also, verify that the reference `` {ref}`cluster-integrations` `` is correct and that the target page exists and is appropriately maintained.
108-124: Schema evolution section is well-documented with good examples.
The explanation of schema evolution behavior is clear, limitations are explicit, and the type-mismatch example effectively illustrates edge cases. The toggle naming is consistent with the File Import section (Line 36).
Confirm that the described schema evolution behavior (automatic column addition only, type mismatch failures) matches the current product implementation. Also verify whether there are additional limitations (e.g., constraints on column types, handling of nested JSON structures) that should be documented.
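To make the described behavior concrete: with an existing integer `id` column and schema evolution enabled, an import along these lines would add the new `name` column but fail on the mismatched `id` value (illustrative JSON-Lines, not taken from the doc):

```text
{"id": 1, "name": "Alice"}    -- accepted: "name" is added as a new column
{"id": "two", "name": "Bob"}  -- rejected: "two" conflicts with the integer type of "id"
```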
> You can import data into your CrateDB directly from sources like:
>
> - local file
> - URL
> - AWS S3 bucket
> - Azure storage
> - MongoDB
Missing documentation for three import sources: local file, URL, and MongoDB.
The introduction lists five import sources (local file, URL, AWS S3, Azure, MongoDB), and the File Import section implies all are equally accessible. However, only S3 (Lines 49-85) and Azure (Lines 86-101) have dedicated documentation sections. Local file, URL, and MongoDB sources lack any guidance.
For users trying to use these sources, there is no information about:
- Local file: upload process, file size limits, format requirements
- URL: authentication (if needed), supported URL schemes, timeout behavior
- MongoDB: connection string format, collection selection, authentication, field mapping
Please add subsections (similar in scope to the S3 and Azure sections) for each missing source, documenting their specific requirements, limitations, and any relevant configuration details.
Also applies to: 30-48
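If those subsections get written, a concrete command per source would help. For instance, a URL import via the croud CLI might look roughly like the following; the subcommand shape and flags are assumptions to verify against the current croud documentation:

```console
croud clusters import-jobs create from-url \
  --cluster-id "<cluster-id>" \
  --url "https://example.com/data.csv" \
  --file-format csv \
  --table my_table
```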
What's Inside
The import flow has been revised, and the documentation page is therefore no longer accurate as it stands.
The new version of the page has new screenshots, and the text matches what users see on the screen and the available features and settings.
Preview
See https://crate-cloud--114.org.readthedocs.build/en/114/
Highlights
Checklist
Summary by CodeRabbit