
Conversation


@plaharanne plaharanne commented Jan 8, 2026

What's Inside

The import flow has been revised, so the documentation page is no longer accurate as it stands.
The new version of the page has new screenshots, and the text now matches what users see on screen as well as the available features and settings.

Preview

See https://crate-cloud--114.org.readthedocs.build/en/114/

Highlights

Checklist

Summary by CodeRabbit

  • Documentation
    • Rewrote import docs into a streamlined, file-centric flow covering local files, URLs, S3, Azure, and MongoDB sources.
    • Added a sample-data note and simplified import steps (select format → source → target table).
    • Updated visuals and terminology to file-based imports; clarified S3/Azure permission notes and multi-file wildcard (globbing) usage.
    • Expanded “Allow schema evolution” guidance with type-mismatch examples and a new File Format Limitations section.



coderabbitai bot commented Jan 8, 2026

Walkthrough

Rewrote import documentation to center on file-based imports (local file, URL, AWS S3, Azure, MongoDB) with a unified "File Import" flow, updated images and form terminology, clarified S3/Azure/glob guidance, renamed and clarified schema-evolution behavior, and added File Format Limitations and sample-data guidance.

Changes

  • Primary doc (docs/cluster/import.md): Full rewrite from URL/history-centric to file-centric import guide: new "File Import" flow, prompts (format, source, target table), sample-data note, updated images and filenames, removed per-source URL examples, consolidated integration references.
  • S3 & Globbing (docs/cluster/import.md, S3 sections): Renamed sections to file-import context, clarified AWS permissions and wildcard (glob) usage for multi-file imports (see the sketch after this list), removed some example specifics while keeping wildcard examples.
  • Azure guidance (docs/cluster/import.md, Azure sections): Renamed to file-import-azure context, retained secret-based access guidance and path/size notes; wording aligned with file-centric flow.
  • Schema & Formats (docs/cluster/import.md, schema/formats): "Schema evolution" renamed to "Allow schema evolution" with behavior clarification and a type-mismatch example; added File Format Limitations for CSV, JSON (Documents/Arrays/JSON-Lines), and Parquet.
  • Integrations & Cross-references (docs/cluster/import.md): Replaced previous Integration/File fragments with cross-references to cluster-integrations and removed redundant per-source integration examples.
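
For orientation, the wildcard (glob) usage mentioned above typically addresses multiple files with a pattern in the bucket path. A minimal sketch, assuming a hypothetical bucket name and prefix (not taken from the doc):

```text
# Import every CSV under the data/ prefix; bucket and layout are illustrative
s3://my-import-bucket/data/*.csv
```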

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Poem

🐰 I hopped through pages, nibbling through text,
From URL vines to file-based next,
S3 leaves and Azure dew,
Schema grows if you say it's true,
A happy hop for docs anew ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check (✅ Passed): The title 'Refresh 'Import' documentation' directly describes the main change—updating the Import documentation page to align with revised product features.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Description check (✅ Passed): The PR description follows the required template structure with all key sections present: What's Inside, Preview, Highlights, and Checklist with both items completed.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/cluster/import.md (1)

111-123: Fix double space in Schema Evolution section.

Line 115 contains a formatting inconsistency: a double space before "evolution" in "Allow schema  evolution".

✏️ Proposed fix
- import process. It can be toggled via the 'Allow schema  evolution' checkbox
+ import process. It can be toggled via the 'Allow schema evolution' checkbox
🤖 Fix all issues with AI agents
In @docs/cluster/import.md:
- Around line 11-15: Fix the typo in the import formats list by replacing the
incorrect word "Paquet" with "Parquet" in the bullet list (the line that
currently reads "Paquet"); ensure the list remains: CSV, JSON (JSON-Lines, JSON
Arrays and JSON Documents), Parquet, MongoDB collection.
- Around line 52-58: Update the sentence in the S3 import documentation to
correct the typo "file form bucket" to "file from bucket" (in the paragraph
describing CrateDB Cloud imports in docs/cluster/import.md) so the sentence
reads "To import a file from a bucket, provide the name of your bucket, and path
to the file."; ensure only the typo is changed and punctuation remains
consistent.
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54a7928 and 94a841d.

⛔ Files ignored due to path filters (9)
  • docs/_assets/img/cluster-import-file-form.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import-globbing.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import-tab-azure.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import-tab-choice.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import-tab-file.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import-tab-s3.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import-tab-url.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import-tab.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import.png is excluded by !**/*.png
📒 Files selected for processing (1)
  • docs/cluster/import.md
🧰 Additional context used
🪛 LanguageTool
docs/cluster/import.md

[grammar] ~13-~13: Ensure spelling is correct
Context: ...N-Lines, JSON Arrays and JSON Documents) - Paquet - MongoDB collection :::{note} If you don...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~53-~53: Ensure spelling is correct
Context: ...S3-compatible storage. To import a file form bucket, provide the name of your bucket...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (4)
docs/cluster/import.md (4)

30-47: File Import section looks good.

Clear and concise explanation of the import process with appropriate image reference. The schema evolution toggle description is accurate and helpful.


86-100: Azure subsection is well-structured.

Clear documentation of Azure Blob Storage imports with appropriate guidance on secrets and file path configuration. Wildcard notation and file size limits are clearly documented.


102-105: Integration section reference is appropriate.

Brief section appropriately directs users to more comprehensive integration documentation. Cross-reference syntax is correct.


126-165: File Format Limitations section is well-documented.

Comprehensive coverage of format-specific limitations with clear code examples for JSON variants. Documentation effectively demonstrates expected input formats.
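
To make the JSON variants concrete, they differ roughly as follows; the records below are illustrative, not taken from the doc:

```text
JSON-Lines (one object per line):
{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}

JSON Array (one top-level array of objects):
[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

JSON Document (one top-level object):
{"id": 1, "name": "Alice"}
```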


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @docs/cluster/import.md:
- Line 115: Fix the double space in the checkbox label by replacing "Allow
schema  evolution" with "Allow schema evolution" and make capitalization
consistent across the doc (e.g., use "Schema Evolution" or "schema evolution"
everywhere); update the string used in the text to the chosen capitalization so
the checkbox reference matches other occurrences.
🧹 Nitpick comments (2)
docs/cluster/import.md (2)

11-15: Consolidate duplicate data format listings.

The supported data formats are listed twice: once in the introduction (lines 12-14) and again in the File Import section (lines 41-43). Consider removing the duplicate in the File Import section or consolidating into a single reference, as this may confuse users about whether there are different format constraints for different import methods.

Also applies to: 39-45


126-165: File Format Limitations section is well-documented for JSON and CSV.

The examples are clear and helpful. However, Parquet format is listed as supported but has no details or examples, unlike CSV and JSON. Consider adding Parquet documentation for completeness.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 94a841d and a256a0b.

📒 Files selected for processing (1)
  • docs/cluster/import.md
🔇 Additional comments (2)
docs/cluster/import.md (2)

30-47: File Import section is clear and well-structured.

The simplified flow with file format, source, and table selection is straightforward. The schema evolution explanation is helpful context at this level.


100-100: Add missing period.

Line 100 ends without a period.

Fix missing punctuation
-File size limitation for imports is 10 GiB per file.
+File size limitation for imports is 10 GiB per file.

Likely an incorrect or invalid review comment.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/cluster/import.md (1)

126-165: File Format Limitations section is incomplete; missing Parquet and MongoDB documentation.

The section documents CSV and JSON formats well with helpful examples, but is missing documentation for two formats that are prominently promoted earlier:

  1. Parquet is mentioned in the introduction (Line 14) and File Import section (Line 43) but has no entry in File Format Limitations.
  2. MongoDB collection is mentioned in the introduction (Line 15) but has no entry in File Format Limitations.

Additionally, the JSON code examples use :::{code} console language tags (Lines 138, 147, 162), which may be semantically incorrect; these should likely use :::{code} json for proper syntax highlighting.

Please add documentation for Parquet and MongoDB collection formats, explaining:

  • Parquet: any schema/type handling specifics, nested structure behavior
  • MongoDB collection: connection requirements, field mapping behavior

Also review the code block language tags for semantic accuracy.
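
For illustration, the suggested tag change would look roughly like this in MyST colon-fence syntax; the sample payload is hypothetical:

```markdown
:::{code} json
{"id": 1, "name": "Alice"}
:::
```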

🤖 Fix all issues with AI agents
In @docs/cluster/import.md:
- Around line 4-9: Add three new subsections under the "File Import" area
mirroring the scope of the existing "S3" and "Azure" sections: "Local file",
"URL", and "MongoDB". For "Local file" document the upload process (steps to
upload via UI/CLI/API), accepted file formats, size limits, and any
preprocessing or format requirements; for "URL" document supported URL schemes
(http/https), authentication options (basic, bearer, signed URLs), timeout/retry
behavior, and how to reference the URL in import commands; for "MongoDB"
document connection string format, required drivers/versions, auth mechanisms,
how to select a database/collection, field mapping/transform examples, and any
batching/throughput limits. Follow the same tone/structure and examples used in
the "S3 Import" and "Azure Import" sections so the new subsections are
consistent and include sample commands, configuration keys, and known
limitations.
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a256a0b and 70e7909.

📒 Files selected for processing (1)
  • docs/cluster/import.md
🔇 Additional comments (6)
docs/cluster/import.md (6)

4-21: Verify import history UI reference and image asset.

Lines 23-24 reference an "Import history" tab, but this feature description appears after the introduction and before the explanation of how to perform imports. For new users, the logical flow would benefit from explaining the basic import process first before mentioning historical references. Additionally, verify that the referenced image at Line 28 (cluster-import.png) exists and correctly depicts the current UI.


30-48: Verify image asset and source documentation completeness.

The File Import section provides a clear overview of the unified workflow. However, verify that:

  1. The image cluster-import-file-form.png (Line 47) exists and correctly depicts the current file import form.
  2. All five import sources mentioned in the introduction (local file, URL, AWS S3, Azure, MongoDB) are documented in dedicated sections. Currently, only S3 (Lines 49-85) and Azure (Lines 86-101) have subsections; local file, URL, and MongoDB guidance is missing.

49-85: S3 guidance is comprehensive.

The AWS S3 section provides clear instructions including bucket/path requirements, authentication, wildcard support for multi-file imports, and relevant IAM policy examples. The 10 GiB file size limit and egress cost warning are appropriately documented.

Please verify that the IAM policy example (Lines 68-83) reflects current AWS S3 best practices and that no additional S3-specific permissions (e.g., s3:ListBucket for prefix matching) are required for wildcard imports to function correctly.
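
As a point of reference, a minimal policy granting both object reads and prefix listing might look like the following sketch; the bucket name is hypothetical and the doc's actual example may differ:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-import-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-import-bucket"
    }
  ]
}
```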


86-101: Azure guidance is clear and consistent with S3 structure.

The Azure section appropriately documents secret-based authentication, path format, wildcard support, and file size limits. The mention of admin-level secret management is important operational guidance.


102-106: Clarify the Integration section purpose.

The Integration section defers entirely to another documentation page via cross-reference. This is acceptable if comprehensive data integration guidance exists elsewhere, but the section feels incomplete for users reading the import documentation. Consider adding 1-2 sentences explaining what integrations are (e.g., "Integrations allow connecting external data sources for continuous sync") before the cross-reference to provide better context.

Also, verify that the reference {ref}`cluster-integrations` is correct and that the target page exists and is appropriately maintained.


108-124: Schema evolution section is well-documented with good examples.

The explanation of schema evolution behavior is clear, limitations are explicit, and the type-mismatch example effectively illustrates edge cases. The toggle naming is consistent with the File Import section (Line 36).

Confirm that the described schema evolution behavior (automatic column addition only, type mismatch failures) matches the current product implementation. Also verify whether there are additional limitations (e.g., constraints on column types, handling of nested JSON structures) that should be documented.
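
To make the described semantics concrete: with the toggle enabled, a record carrying a previously unseen field adds a column, while a value of the wrong type still fails. A sketch with hypothetical data, assuming an existing table with an integer column id:

```text
{"id": 3, "city": "Berlin"}   -- new column "city" is added when schema evolution is allowed
{"id": "three"}               -- fails: "three" does not match the existing integer type of "id"
```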

Comment on lines +4 to +9
You can import data into your CrateDB directly from sources like:
- local file
- URL
- AWS S3 bucket
- Azure storage
- MongoDB


⚠️ Potential issue | 🟠 Major

Missing documentation for three import sources: local file, URL, and MongoDB.

The introduction lists five import sources (local file, URL, AWS S3, Azure, MongoDB), and the File Import section implies all are equally accessible. However, only S3 (Lines 49-85) and Azure (Lines 86-101) have dedicated documentation sections. Local file, URL, and MongoDB sources lack any guidance.

For users trying to use these sources, there is no information about:

  • Local file: upload process, file size limits, format requirements
  • URL: authentication (if needed), supported URL schemes, timeout behavior
  • MongoDB: connection string format, collection selection, authentication, field mapping

Please add subsections (similar in scope to the S3 and Azure sections) for each missing source, documenting their specific requirements, limitations, and any relevant configuration details.

Also applies to: 30-48

🤖 Prompt for AI Agents
In @docs/cluster/import.md around lines 4 - 9, Add three new subsections under
the "File Import" area mirroring the scope of the existing "S3" and "Azure"
sections: "Local file", "URL", and "MongoDB". For "Local file" document the
upload process (steps to upload via UI/CLI/API), accepted file formats, size
limits, and any preprocessing or format requirements; for "URL" document
supported URL schemes (http/https), authentication options (basic, bearer,
signed URLs), timeout/retry behavior, and how to reference the URL in import
commands; for "MongoDB" document connection string format, required
drivers/versions, auth mechanisms, how to select a database/collection, field
mapping/transform examples, and any batching/throughput limits. Follow the same
tone/structure and examples used in the "S3 Import" and "Azure Import" sections
so the new subsections are consistent and include sample commands, configuration
keys, and known limitations.
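
For illustration, a MongoDB subsection of that scope would presumably show a connection string in the standard MongoDB URI format, along these lines (all values hypothetical):

```text
mongodb+srv://user:password@cluster0.example.mongodb.net/mydb
```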

@plaharanne plaharanne requested a review from joerg84 January 9, 2026 14:32