Refresh 'Import' documentation #114
base: main
Conversation
Walkthrough
Rewrote import documentation to center on file-based imports (local file, URL, AWS S3, Azure, MongoDB) with a unified "File Import" flow, updated images and form terminology, clarified S3/Azure/glob guidance, renamed and clarified schema-evolution behavior, and added File Format Limitations and sample-data guidance.

Sequence Diagram(s): (omitted)

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/cluster/import.md (1)
111-123: Fix double space in Schema Evolution section.
Line 115 contains a formatting inconsistency with a double space before "evolution" in "Allow schema  evolution".
✏️ Proposed fix

```diff
-import process. It can be toggled via the 'Allow schema  evolution' checkbox
+import process. It can be toggled via the 'Allow schema evolution' checkbox
```
🤖 Fix all issues with AI agents
In @docs/cluster/import.md:
- Around lines 11-15: Fix the typo in the import formats list by replacing the incorrect word "Paquet" with "Parquet" (the line that currently reads "Paquet"); ensure the list remains: CSV, JSON (JSON-Lines, JSON Arrays and JSON Documents), Parquet, MongoDB collection. (Sketch after this list.)
- Around lines 52-58: Update the sentence in the S3 import documentation to correct the typo "file form bucket" to "file from a bucket" (in the paragraph describing CrateDB Cloud imports in docs/cluster/import.md) so the sentence reads "To import a file from a bucket, provide the name of your bucket, and path to the file."; ensure only the typo is changed and punctuation remains consistent. (Sketch after this list.)
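Taken together, the two prompts above describe one-line fixes. As sketches against the quoted doc text (surrounding lines assumed unchanged):

```diff
-- Paquet
+- Parquet
```

```diff
-To import a file form bucket, provide the name of your bucket, and path to the file.
+To import a file from a bucket, provide the name of your bucket, and path to the file.
```

The second sketch follows the corrected sentence quoted in the prompt, which also adds the article "a".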
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (9)
- `docs/_assets/img/cluster-import-file-form.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import-globbing.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import-tab-azure.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import-tab-choice.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import-tab-file.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import-tab-s3.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import-tab-url.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import-tab.png` is excluded by `!**/*.png`
- `docs/_assets/img/cluster-import.png` is excluded by `!**/*.png`
📒 Files selected for processing (1)
docs/cluster/import.md
🧰 Additional context used
🪛 LanguageTool
docs/cluster/import.md
[grammar] ~13-~13: Ensure spelling is correct
Context: ...N-Lines, JSON Arrays and JSON Documents) - Paquet - MongoDB collection :::{note} If you don...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~53-~53: Ensure spelling is correct
Context: ...S3-compatible storage. To import a file form bucket, provide the name of your bucket...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (4)
docs/cluster/import.md (4)
30-47: File Import section looks good.
Clear and concise explanation of the import process with appropriate image reference. The schema evolution toggle description is accurate and helpful.

86-100: Azure subsection is well-structured.
Clear documentation of Azure Blob Storage imports with appropriate guidance on secrets and file path configuration. Wildcard notation and file size limits are clearly documented.

102-105: Integration section reference is appropriate.
Brief section appropriately directs users to more comprehensive integration documentation. Cross-reference syntax is correct.

126-165: File Format Limitations section is well-documented.
Comprehensive coverage of format-specific limitations with clear code examples for JSON variants. Documentation effectively demonstrates expected input formats.
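For readers without the diff open, the three JSON variants the section's examples cover conventionally look like this (illustrative records, not copied from the PR):

```text
JSON-Lines (one object per line):
{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}

JSON Array:
[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

JSON Document (a single object):
{"id": 1, "name": "Alice"}
```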
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @docs/cluster/import.md:
- Line 115: Fix the double space in the checkbox label by replacing "Allow schema  evolution" with "Allow schema evolution", and make capitalization consistent across the doc (e.g., use "Schema Evolution" or "schema evolution" everywhere); update the string used in the text to the chosen capitalization so the checkbox reference matches other occurrences.
🧹 Nitpick comments (2)
docs/cluster/import.md (2)
11-15: Consolidate duplicate data format listings.
The supported data formats are listed twice: once in the introduction (lines 12-14) and again in the File Import section (lines 41-43). Consider removing the duplicate in the File Import section or consolidating into a single reference, as this may confuse users about whether there are different format constraints for different import methods.
Also applies to: 39-45
126-165: File Format Limitations section is well-documented for JSON and CSV.
The examples are clear and helpful. However, Parquet format is listed as supported but has no details or examples, unlike CSV and JSON. Consider adding Parquet documentation for completeness.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/cluster/import.md
🔇 Additional comments (2)
docs/cluster/import.md (2)
30-47: File Import section is clear and well-structured.
The simplified flow with file format, source, and table selection is straightforward. The schema evolution explanation is helpful context at this level.

100-100: Add missing period.
Line 100 ends without a period.

Fix missing punctuation

```diff
-File size limitation for imports is 10 GiB per file
+File size limitation for imports is 10 GiB per file.
```

Likely an incorrect or invalid review comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/cluster/import.md (1)
126-165: File Format Limitations section is incomplete; missing Parquet and MongoDB documentation.
The section documents CSV and JSON formats well with helpful examples, but is missing documentation for two formats that are prominently promoted earlier:
- Parquet is mentioned in the introduction (Line 14) and File Import section (Line 43) but has no entry in File Format Limitations.
- MongoDB collection is mentioned in the introduction (Line 15) but has no entry in File Format Limitations.
Additionally, the JSON code examples use `:::{code} console` language tags (Lines 138, 147, 162), which may be semantically incorrect; these should likely use `:::{code} json` for proper syntax highlighting.

Please add documentation for Parquet and MongoDB collection formats, explaining:
- Parquet: any schema/type handling specifics, nested structure behavior
- MongoDB collection: connection requirements, field mapping behavior
Also review the code block language tags for semantic accuracy.
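Applied to one of the cited fences, the suggested tag change is a one-line diff (the fence position is per the comment's line numbers; the fence body stays unchanged):

```diff
-:::{code} console
+:::{code} json
```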
🤖 Fix all issues with AI agents
In @docs/cluster/import.md:
- Around line 4-9: Add three new subsections under the "File Import" area
mirroring the scope of the existing "S3" and "Azure" sections: "Local file",
"URL", and "MongoDB". For "Local file" document the upload process (steps to
upload via UI/CLI/API), accepted file formats, size limits, and any
preprocessing or format requirements; for "URL" document supported URL schemes
(http/https), authentication options (basic, bearer, signed URLs), timeout/retry
behavior, and how to reference the URL in import commands; for "MongoDB"
document connection string format, required drivers/versions, auth mechanisms,
how to select a database/collection, field mapping/transform examples, and any
batching/throughput limits. Follow the same tone/structure and examples used in
the "S3 Import" and "Azure Import" sections so the new subsections are
consistent and include sample commands, configuration keys, and known
limitations.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/cluster/import.md
🔇 Additional comments (6)
docs/cluster/import.md (6)
4-21: Verify import history UI reference and image asset.
Lines 23-24 reference an "Import history" tab, but this feature description appears after the introduction and before explaining how to perform imports. For new users, the logical flow would benefit from explaining the basic import process first before mentioning historical references. Additionally, verify that the referenced image at Line 28 (`cluster-import.png`) exists and correctly depicts the current UI.
30-48: Verify image asset and source documentation completeness.
The File Import section provides a clear overview of the unified workflow. However, verify that:

- The image `cluster-import-file-form.png` (Line 47) exists and correctly depicts the current file import form.
- All five import sources mentioned in the introduction (local file, URL, AWS S3, Azure, MongoDB) are documented in dedicated sections. Currently, only S3 (Lines 49-85) and Azure (Lines 86-101) have subsections; local file, URL, and MongoDB guidance is missing.
49-85: S3 guidance is comprehensive.
The AWS S3 section provides clear instructions including bucket/path requirements, authentication, wildcard support for multi-file imports, and relevant IAM policy examples. The 10 GiB file size limit and egress cost warning are appropriately documented.

Please verify that the IAM policy example (Lines 68-83) reflects current AWS S3 best practices and that no additional S3-specific permissions (e.g., `s3:ListBucket` for prefix matching) are required for wildcard imports to function correctly.
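For reference, a read-only policy covering both object reads and the prefix listing the comment asks about could look like the sketch below; the bucket name is a placeholder, and this is not the policy quoted in the docs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
```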
86-101: Azure guidance is clear and consistent with S3 structure.
The Azure section appropriately documents secret-based authentication, path format, wildcard support, and file size limits. The mention of admin-level secret management is important operational guidance.
102-106: Clarify the Integration section purpose.
The Integration section defers entirely to another documentation page via cross-reference. This is acceptable if comprehensive data integration guidance exists elsewhere, but the section feels incomplete for users reading the import documentation. Consider adding 1-2 sentences explaining what integrations are (e.g., "Integrations allow connecting external data sources for continuous sync") before the cross-reference to provide better context.

Also, verify that the reference `` {ref}`cluster-integrations` `` is correct and that the target page exists and is appropriately maintained.
108-124: Schema evolution section is well-documented with good examples.
The explanation of schema evolution behavior is clear, limitations are explicit, and the type-mismatch example effectively illustrates edge cases. The toggle naming is consistent with the File Import section (Line 36).
Confirm that the described schema evolution behavior (automatic column addition only, type mismatch failures) matches the current product implementation. Also verify whether there are additional limitations (e.g., constraints on column types, handling of nested JSON structures) that should be documented.
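To make the described behavior concrete: with an existing integer `id` column and schema evolution enabled, an import along these lines would add the new `name` column but fail on the mismatched `id` value (illustrative JSON-Lines, not taken from the doc):

```text
{"id": 1, "name": "Alice"}    -- accepted: "name" is added as a new column
{"id": "two", "name": "Bob"}  -- rejected: "two" conflicts with the integer type of "id"
```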
> You can import data into your CrateDB directly from sources like:
>
> - local file
> - URL
> - AWS S3 bucket
> - Azure storage
> - MongoDB
Missing documentation for three import sources: local file, URL, and MongoDB.
The introduction lists five import sources (local file, URL, AWS S3, Azure, MongoDB), and the File Import section implies all are equally accessible. However, only S3 (Lines 49-85) and Azure (Lines 86-101) have dedicated documentation sections. Local file, URL, and MongoDB sources lack any guidance.
For users trying to use these sources, there is no information about:
- Local file: upload process, file size limits, format requirements
- URL: authentication (if needed), supported URL schemes, timeout behavior
- MongoDB: connection string format, collection selection, authentication, field mapping
Please add subsections (similar in scope to the S3 and Azure sections) for each missing source, documenting their specific requirements, limitations, and any relevant configuration details.
Also applies to: 30-48
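If those subsections get written, a concrete command per source would help. For instance, a URL import via the croud CLI might look roughly like the following; the subcommand shape and flags are assumptions to verify against the current croud documentation:

```console
croud clusters import-jobs create from-url \
  --cluster-id "<cluster-id>" \
  --url "https://example.com/data.csv" \
  --file-format csv \
  --table my_table
```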
What's Inside
The import flow has been revised, and the documentation page is therefore no longer accurate as it stands.
The new version of the page has new screenshots, and the text matches what users see on the screen and the available features and settings.
Preview
See https://crate-cloud--114.org.readthedocs.build/en/114/
Highlights
Checklist
Summary by CodeRabbit