Fix issues #4, #6, and #14: Documentation, error messages, and invalid dump generation#16
Conversation
Changed incorrect `-Fd` to `-Fc` in the docstring for the `load()` function. pgdumplib only supports custom format dumps created with `pg_dump -Fc`, not directory format (`-Fd`). Fixes #4 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Enhanced the error message when attempting to load invalid or incompatible dump files. The library now detects: - Plain SQL text files (starting with --, /*, CREATE, INSERT, etc.) - Invalid binary formats Instead of the generic "Invalid archive header" error, users now receive helpful guidance like: "This appears to be a plain SQL text file. pgdumplib only supports custom format dumps created with pg_dump -Fc" Added comprehensive test coverage for various invalid file formats. Fixes #14 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
WalkthroughDocstring corrected to state Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant L as pgdumplib.load()
participant R as dump._read_header/_read_entry
participant S as Dump.save()/_write_str
rect rgb(248,249,255)
U->>L: open file path
L->>R: read header bytes
alt archive magic detected
R->>R: parse entries\nnormalize '' -> None (tablespace/tableam)
R-->>L: return Dump
else plain SQL detected
R-->>L: raise ValueError("plain SQL text file" + hint "pg_dump -Fc")
else invalid binary
R-->>L: raise ValueError("Invalid archive" + hint "pg_dump -Fc")
end
end
rect rgb(249,255,250)
U->>S: save Dump to file
S->>S: write header\nfor optional strings: write -1 if None
S-->>U: output file
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
pgdumplib/dump.py (3)
656-680: Normalization oftablespace/tableamis good; consider aligningadd_entrydefaultsNormalizing empty
tablespaceandtableamtoNonein_read_entry()is a solid fix and pairs well with the new_write_str()semantics to avoid emitting invalid empty values.One minor inconsistency is that
add_entry()still coercesNoneto empty strings:tablespace=tablespace or '', tableam=tableam or '',so new entries created via the API will never have
Nonefor these fields, only''. That’s not a blocker for the load/save regression you’re fixing (since entries coming from real dumps are normalized toNone), but it does mean the public API can’t distinguish “unset” from “explicitly empty” when creating dumps.If you want the behavior to be fully consistent and future‑proof, you could let
Nonepass through unchanged:- tablespace=tablespace or '', - tableam=tableam or '', + tablespace=tablespace, + tableam=tableam,so that callers can choose between
None(not set) and''(explicitly empty), matching how_read_entry()and_write_str()now behave.Also applies to: 193-282
713-743: Header detection logic matches tests; optional: always mention-Fcin generic errorThe new
_read_header()logic cleanly separates three cases:
- Valid custom-format (
MAGICmatches) → normal path.- Plain SQL (comments or leading SQL keywords in the first 5 bytes) → “plain SQL text file … pg_dump -Fc” message.
- Non‑printable or empty prefix → “Invalid archive format … pg_dump -Fc” message.
This aligns with the new tests and gives much clearer guidance than the prior generic error.
If you’d like to make the UX even more consistent for other “printable but wrong” prefixes (e.g., a random text file starting with some other letters), consider updating the default
error_msgto also referencepg_dump -Fc, e.g.:- if magic_bytes != constants.MAGIC: - # Provide helpful error messages based on file content - error_msg = 'Invalid archive header' + if magic_bytes != constants.MAGIC: + # Provide helpful error messages based on file content + error_msg = ( + 'Invalid archive header. ' + 'pgdumplib only supports custom format dumps ' + 'created with pg_dump -Fc' + )leaving the more specific branches as-is. That way every “not a custom archive” error nudges users toward the correct
pg_dumpflag.
1082-1098:_write_strcorrectly distinguishesNonevs empty string; update docstring to reflectThe new
_write_str()behavior—writing-1forNoneand0for''—is exactly what’s needed to preserve the “not set vs empty” distinction in the archive format and resolves the invalid""tableam issue when combined with the normalization in_read_entry().The only small mismatch is the docstring, which still documents
valueasstronly. To keep the API self‑documenting, you could adjust it along these lines:- def _write_str(self, value: str | None) -> None: - """Write a string - - :param str value: The string to write + def _write_str(self, value: str | None) -> None: + """Write a string or a NULL marker. + + :param value: The string to write, or None to write a -1 length + (indicating an unset/NULL field in the archive format).Behavior itself looks good and is consistent with
_read_bytes().tests/test_save_dump.py (1)
3-6: Issue #6 regression test effectively covers emptytableambehaviorThe new
Issue6RegressionTestCasedoes a nice job of:
- Creating a dump for a Postgres version where
tableamis relevant (appear_as='13.1').- Ensuring at least one table has
tableam='heap'while another object (the sequence) has no tableam set.- Saving to disk and validating, via
pg_restore --schema-only, that:
- The process succeeds.
- No
SET default_table_access_method = ""appears.- A valid
SET default_table_access_method = heapis present.The use of
NamedTemporaryFile(delete=False)plus an explicitfinallycleanup is correct and mirrors existing pg_restore-based tests in this file. If you ever decide to extend coverage further, you might optionally consider adding a similar test that starts from a real fixture dump file to mirror the exact load→save scenario from the original issue, but what’s here is already a solid regression guard.Also applies to: 182-250
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
pgdumplib/__init__.py(1 hunks)pgdumplib/dump.py(3 hunks)tests/test_dump.py(1 hunks)tests/test_save_dump.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/test_dump.py (2)
pgdumplib/dump.py (1)
load(319-390)pgdumplib/__init__.py (1)
load(20-37)
tests/test_save_dump.py (2)
pgdumplib/__init__.py (1)
new(40-60)pgdumplib/dump.py (2)
add_entry(193-283)save(411-423)
pgdumplib/dump.py (1)
tests/test_edge_cases.py (1)
_write_int(21-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: test (3.14, 17)
- GitHub Check: test (3.14, 16)
- GitHub Check: test (3.14, 18)
- GitHub Check: test (3.13, 17)
- GitHub Check: test (3.13, 18)
- GitHub Check: test (3.12, 18)
- GitHub Check: test (3.11, 15)
- GitHub Check: test (3.13, 16)
- GitHub Check: test (3.11, 18)
- GitHub Check: test (3.12, 17)
- GitHub Check: test (3.11, 16)
- GitHub Check: test (3.12, 16)
- GitHub Check: test (3.11, 17)
- GitHub Check: test (3.13, 15)
- GitHub Check: test (3.12, 15)
- GitHub Check: test (3.13, 17)
- GitHub Check: test (3.13, 18)
- GitHub Check: test (3.12, 18)
- GitHub Check: test (3.13, 16)
- GitHub Check: test (3.13, 15)
- GitHub Check: test (3.12, 16)
- GitHub Check: test (3.11, 17)
- GitHub Check: test (3.12, 17)
- GitHub Check: test (3.11, 15)
- GitHub Check: test (3.12, 15)
- GitHub Check: test (3.11, 16)
- GitHub Check: test (3.11, 18)
🔇 Additional comments (2)
pgdumplib/__init__.py (1)
24-34: Docstring now correctly documents-Fccustom formatThe updated
load()docstring correctly states that onlypg_dump -Fccustom-format files are supported and is consistent with the new error messages elsewhere in the PR.tests/test_dump.py (1)
193-238: New error‑handling tests accurately capture header detection behaviorThe three added tests robustly cover plain SQL (commented and
CREATE-prefixed) and invalid binary inputs, assert against stable substrings ("plain SQL text file","Invalid archive","pg_dump -Fc"), and clean up temp files correctly. They give good regression coverage for the new_read_header()behavior.
When pgdumplib loaded dumps with PostgreSQL v12+ format (version 1.14.0+), entries with empty tableam fields were stored as empty strings. On save, these empty strings were written as zero-length fields, which pg_restore incorrectly interpreted as explicit empty values, generating invalid SQL: SET default_table_access_method = ""; This caused restoration failures with: ERROR: zero-length delimited identifier at or near "" Root cause: The _write_str() method treated both None and empty string identically, writing length 0 for both. PostgreSQL requires a distinction: - None (not set) should be represented as length -1 - Empty string should be represented as length 0 Changes made: 1. Normalize empty tableam/tablespace to None during load (_read_entry) - Prevents empty strings from entering the data model - Maintains consistency with the Entry dataclass defaults 2. Update add_entry() to default tableam/tablespace to None - Changed from `tableam or ''` to `tableam or None` - Ensures consistency across manual entry creation 3. Fix _write_str() to distinguish None from empty string - Write -1 for None (field not set) - Write 0 for empty string (field explicitly empty) - Critical fix: prevents pg_restore from generating invalid SQL Added comprehensive regression test that: - Creates dump with table (tableam='heap') and sequence (no tableam) - Verifies pg_restore doesn't generate invalid SET statements - Confirms valid SET statements are preserved Fixes #6 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2e66920 to
e0216be
Compare
- Update default error message to always mention pg_dump -Fc - Improve _write_str() docstring to document None handling
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/test_save_dump.py (1)
182-250: Consider adding a skip decorator for missing pg_restore.The test depends on
pg_restorebeing available in the system PATH, which may not be true in all CI/CD environments or developer machines. Withoutpg_restore, this test will fail rather than skip.Consider adding a check to skip the test gracefully:
import shutil import unittest class Issue6RegressionTestCase(unittest.TestCase): @unittest.skipIf( shutil.which('pg_restore') is None, 'pg_restore not found in PATH' ) def test_load_save_no_empty_tableam_sql(self): # ... existing test codeAlternatively, document in the test suite README that
pg_restoremust be installed for the full test suite to pass.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
pgdumplib/dump.py(4 hunks)tests/test_save_dump.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_save_dump.py (2)
pgdumplib/__init__.py (1)
new(40-60)pgdumplib/dump.py (2)
add_entry(193-283)save(411-423)
pgdumplib/dump.py (1)
tests/test_edge_cases.py (1)
_write_int(21-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test (3.11, 16)
- GitHub Check: test (3.13, 18)
- GitHub Check: test (3.12, 17)
- GitHub Check: test (3.11, 15)
- GitHub Check: test (3.13, 16)
- GitHub Check: test (3.11, 18)
- GitHub Check: test (3.11, 17)
- GitHub Check: test (3.12, 16)
- GitHub Check: test (3.12, 15)
- GitHub Check: test (3.13, 17)
- GitHub Check: test (3.13, 15)
- GitHub Check: test (3.12, 18)
🔇 Additional comments (8)
tests/test_save_dump.py (3)
3-6: LGTM!The added imports are all properly utilized in the new regression test.
222-226: LGTM!The subprocess call is secure (using list args without shell) and the noqa flags are appropriate. The command correctly uses
--schema-onlyto validate SQL generation without requiring a database connection.
229-247: LGTM!The test assertions comprehensively validate the fix:
- Ensures
pg_restoresucceeds- Confirms no invalid empty
tableamSQL is generated- Verifies correct
tableamSQL is present for tables with explicit access methodsThis properly tests the regression described in issue #6.
pgdumplib/dump.py (5)
275-276: LGTM! Consistent normalization to None.The change to use
or Noneensures that empty strings and None values are normalized to None, preventing the generation of invalid SQL statements likeSET default_table_access_method = "";. This is consistent with the corresponding changes in_read_entryand_write_str.
672-673: LGTM! Proper normalization of tablespace.The normalization ensures consistency by converting empty strings to None when reading entries, which aligns with the serialization logic in
_write_str.
674-680: LGTM! Critical fix for issue #6.The normalization of
tableamfrom empty string to None, combined with the explanatory comment, directly addresses the bug where emptytableamvalues causedpg_restoreto generate invalid SQL. The version gating is correct for backward compatibility.
721-743: LGTM! Excellent UX improvement for issue #14.The enhanced error detection provides clear, actionable error messages when users attempt to load incompatible file formats. The heuristics for detecting plain SQL files (checking for common SQL keywords and comment markers) are reasonable and will catch the vast majority of cases where users mistakenly try to load plain SQL dumps.
1090-1098: LGTM! Critical serialization fix for issue #6.The distinction between None (written as -1 length) and empty string (written as 0 length) is essential for the fix. Combined with the normalization in
_read_entryandadd_entry, this ensures thattablespaceandtableamare consistently represented as None (not set) rather than empty strings, preventing invalid SQL generation.The change maintains backward compatibility since both -1 and 0 length are read as empty bytes and normalized to None.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
pgdumplib/dump.py (1)
721-743: Consider case-insensitive plain-text detection.
Lowercase SQL headers (create table,insert into, etc.) won’t trigger the specialized message today because the prefix list is upper-case only. Normalizing once before thestartswithchecks would capture those cases without affecting the fallback path.- if file_start.startswith(('--', '/*', 'SE', 'CR', 'IN', 'DR')): + file_start_upper = file_start.upper() + if file_start_upper.startswith(('--', '/*', 'SE', 'CR', 'IN', 'DR')):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pgdumplib/dump.py(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pgdumplib/dump.py (1)
tests/test_edge_cases.py (1)
_write_int(21-27)
🪛 Ruff (0.14.5)
pgdumplib/dump.py
1094-1094: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
- GitHub Check: test (3.13, 17)
- GitHub Check: test (3.14, 17)
- GitHub Check: test (3.13, 18)
- GitHub Check: test (3.13, 16)
- GitHub Check: test (3.14, 16)
- GitHub Check: test (3.11, 15)
- GitHub Check: test (3.12, 18)
- GitHub Check: test (3.12, 15)
- GitHub Check: test (3.11, 18)
- GitHub Check: test (3.11, 16)
- GitHub Check: test (3.13, 15)
- GitHub Check: test (3.11, 17)
- GitHub Check: test (3.12, 16)
- GitHub Check: test (3.12, 17)
- GitHub Check: test (3.13, 16)
- GitHub Check: test (3.13, 15)
- GitHub Check: test (3.13, 17)
- GitHub Check: test (3.13, 18)
- GitHub Check: test (3.11, 18)
- GitHub Check: test (3.11, 15)
- GitHub Check: test (3.12, 17)
- GitHub Check: test (3.12, 15)
- GitHub Check: test (3.11, 17)
- GitHub Check: test (3.12, 18)
- GitHub Check: test (3.12, 16)
- GitHub Check: test (3.11, 16)
🔇 Additional comments (3)
pgdumplib/dump.py (3)
275-276: Thanks for normalizing optional storage fields.
PropagatingNoneinto the in-memorymodels.Entrykeeps_write_stremitting the null sentinel and prevents another round of zero-length identifiers when the dump is rewritten. Nicely aligned with the regression fix.
673-679: Solid read-path normalization.
Coalescing empty strings back toNoneon load keeps the model symmetric with the writer and closes the loop on thedefault_table_access_methodregression.
1087-1103: Appreciate the_write_strnull sentinel fix.
Writing-1for unset fields matches the archive format and directly addresses the zero-length identifier bug—great coordination with the read-path changes.
Summary
This PR fixes three GitHub issues with comprehensive solutions:
-Fd→-Fc)Issues Resolved
Issue #4: Documentation Typo
Fixed incorrect reference to
-Fd(directory format) in theload()function docstring. pgdumplib only supports custom format dumps created withpg_dump -Fc.Issue #14: Better Error Messaging
Enhanced error messages when attempting to load invalid or incompatible dump files. The library now detects:
--,/*,CREATE,INSERT, etc.)Users now receive helpful guidance instead of generic "Invalid archive header" errors, making it clear that pgdumplib requires custom format dumps created with
pg_dump -Fc.Issue #6: Invalid Dump After Load/Save (Critical Bug)
Fixed a critical bug where loading and saving dumps with PostgreSQL v12+ format caused
pg_restoreto fail with:Root Cause: The
_write_str()method treated bothNoneand empty string identically, writing length 0 for both. PostgreSQL requires a distinction between "not set" (length -1) and "empty string" (length 0).Solution:
tableam/tablespacestrings toNoneduring loadadd_entry()to default these fields toNoneinstead of empty strings_write_str()to write -1 forNoneand 0 for empty stringsChanges Made
Documentation (
pgdumplib/__init__.py)-Fd→-FcCore Library (
pgdumplib/dump.py)_read_header()with intelligent file type detection and helpful error messagestableam/tablespacetoNonein_read_entry()add_entry()to defaulttableam/tablespacetoNone_write_str()to properly distinguishNonefrom empty stringTests
tests/test_dump.py)pg_restorevalidation (tests/test_save_dump.py)Testing
pg_restoreno longer generates invalid SQLVerification
Tested the fix for issue #6 with the original dump file:
Resolves #4
Resolves #6
Resolves #14
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.