TOOLS-4073 Make mongorestore work when restoring a dump with replicated record IDs to a cluster that doesn't support this#875
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
07fe9a7 to
c769b37
Compare
b879434 to
03b7e3b
Compare
c769b37 to
9e7e2c5
Compare
03b7e3b to
85fb02a
Compare
9e7e2c5 to
12a6c61
Compare
85fb02a to
3dc8fcb
Compare
12a6c61 to
c224ed2
Compare
3dc8fcb to
bd21afe
Compare
c224ed2 to
f4746cf
Compare
bd21afe to
1c5b465
Compare
f4746cf to
865983d
Compare
1c5b465 to
0712cf7
Compare
865983d to
a73c37a
Compare
0712cf7 to
afedc4d
Compare
|
See this eng proposal for more context. |
a73c37a to
b351dd5
Compare
0f7684d to
5b02458
Compare
954005c to
ff4ac8f
Compare
5b02458 to
30faca2
Compare
30faca2 to
84ee53f
Compare
…ed record IDs to a cluster that doesn't support this
ff4ac8f to
8279a5a
Compare
FGasper
left a comment
There was a problem hiding this comment.
The record ID type issue must be fixed.
We should also add tests to catch this & any other new/unknown BSON types for record IDs.
| if result.Err() != nil { | ||
| // If the command fails, the feature flag doesn't exist or isn't enabled. Or maybe something | ||
| // is totally broken, in which case we will find that out when we attempt other DB | ||
| // operations. |
There was a problem hiding this comment.
Can we instead check for an error code?
The failure may be intermittent, so ISTM we really shouldn’t just throw it away. At the very least it should be logged, but even then we’d want a good comment about why we can’t reliably fail.
| log.DebugLow, | ||
| "removing recordIdsReplicated option (target server does not support this feature)", | ||
| ) | ||
| return append(options[:i], options[i+1:]...) |
There was a problem hiding this comment.
IMO slices.Delete() would be clearer.
| TxnNumber *int64 `bson:"txnNumber,omitempty"` | ||
| PrevOpTime bson.Raw `bson:"prevOpTime,omitempty"` | ||
| MultiOpType *int `bson:"multiOpType,omitempty"` | ||
| RecordId *int64 `bson:"rid,omitempty"` |
There was a problem hiding this comment.
Record IDs are not all int64s, though. In clustered collections they’re binary strings. And in 5.0 timeseries they’re strings.
|
|
||
| filtered := make([]db.Oplog, len(ops)) | ||
| for i, v := range ops { | ||
| filtered[i], err = restore.filterRecordIds(v) |
There was a problem hiding this comment.
Is there a test case which shows this logic is necessary? My understanding is HandleNonTxnOp() already unrolls top-level applyOps oplog entries. The oplog will not contain a nested applyOps oplog entry in any server version an "rid" field may be present.
I would have expected omitting the RecordId field from the Oplog struct to work as desired automatically. What am I missing?
mongo-tools/mongorestore/oplog.go
Lines 307 to 324 in 8279a5a
[note] Prior to SERVER-45033 (7.0), the server would record the applyOps command the client had specified which means that the applyOps oplog entry could have another applyOps oplog entry within it. There was a finite limit imposed by the server of 10 nested applyOps.
| // filterRecordIdsReplicatedOption removes the `recordIdsReplicated` option from the collection | ||
| // options if the target server doesn't have the `featureFlagRecordIdsReplicated` feature enabled. | ||
| func (restore *MongoRestore) filterRecordIdsReplicatedOption(options bson.D) bson.D { | ||
| if restore.recordIdsReplicatedEnabled { |
There was a problem hiding this comment.
If featureFlagRecordIdsReplicated is enabled by default on the target system, then the create command is going to create a collection which has all replica set members using the same mapping of record ID → BSON document. The target system will do so even if recordIdsReplicated: true is omitted from the create command request.
There is zero interest in preserving the absence of recordIdsReplicated in the collection metadata as recordIdsReplicated: false on a target system with featureFlagRecordIdsReplicated enabled by default. Why not always remove the recordIdsReplicated collection option and let the target system decide based on its own default? This means there would be no need to run the getParameter command earlier.
There was a problem hiding this comment.
@henrikedin @erreina would you please confirm that following SERVER-85589, the recordIdsReplicated option will no longer appear in op=c create collection oplog entries? If not, then there would be another place in mongorestore to always remove the recordIdsReplicated option from.
mongo-tools/mongorestore/oplog.go
Lines 363 to 364 in 8279a5a
There was a problem hiding this comment.
It will not be on the options in the create collection oplog entries, but it will now be on the o2 field.
Always removing the recordIdsReplicated field from the collection metadata and let the target decide seems reasonable to me.

This was written by Copilot using Claude Opus and reviewed by me. I also made some additional changes based on issues I saw when testing these changes.
There are a few things this does:
recordIdsReplicatedflag from collection options when they're being created.rid) when restoring them. Per discussion with Server folks working on this, the cluster will handle assigning record IDs and does not need this information.ciandcdops when restoring from the oplog. These are new ops specific to replicated record IDs and index creation and deletion. They are internal operations that do not need to be replicated when restoring. The cluster will always manage indexes appropriately.