
TOOLS-4073 Make mongorestore work when restoring a dump with replicated record IDs to a cluster that doesn't support this#875

Open
autarch wants to merge 1 commit into 02-03-add_a_tool_to_perform_manual_dump/restore_testing_across_two_clusters from 02-03-tools-4073_make_mongorestore_work_when_restoring_a_dump_with_replicated_record_ids_to_a_cluster_that_doesn_t_support_this

Conversation

autarch (Collaborator) commented Feb 3, 2026:

This was written by Copilot using Claude Opus and reviewed by me. I also made some additional changes based on issues I saw while testing.

This change does a few things:

  • If the cluster we're restoring to doesn't have replicated record IDs enabled, it removes the recordIdsReplicated flag from collection options when collections are created.
    • In the future, this collection option will go away entirely. Instead, if the cluster is configured for replicated record IDs, collections will always have it enabled. When that happens, we will remove this part of the code. See TOOLS-4076 for more details.
  • It always removes the record ID field (rid) from oplog entries when restoring them. Per discussion with the Server folks working on this, the cluster will handle assigning record IDs and does not need this information.
  • It always filters out ci and cd ops when restoring from the oplog. These are new ops, specific to replicated record IDs, for index creation and deletion. They are internal operations that do not need to be replayed when restoring; the cluster will always manage indexes appropriately.


autarch force-pushed the 02-03-tools-4073_make_mongorestore_work_when_restoring_a_dump_with_replicated_record_ids_to_a_cluster_that_doesn_t_support_this branch from 07fe9a7 to c769b37 (February 4, 2026 04:05)
autarch force-pushed the 02-03-add_a_tool_to_perform_manual_dump/restore_testing_across_two_clusters branch from b879434 to 03b7e3b (February 4, 2026 04:05)
autarch force-pushed the 02-03-tools-4073_make_mongorestore_work_when_restoring_a_dump_with_replicated_record_ids_to_a_cluster_that_doesn_t_support_this branch from c769b37 to 9e7e2c5 (February 4, 2026 04:13)
autarch force-pushed the 02-03-add_a_tool_to_perform_manual_dump/restore_testing_across_two_clusters branch from 03b7e3b to 85fb02a (February 4, 2026 04:13)
autarch force-pushed the 02-03-tools-4073_make_mongorestore_work_when_restoring_a_dump_with_replicated_record_ids_to_a_cluster_that_doesn_t_support_this branch from 9e7e2c5 to 12a6c61 (February 4, 2026 06:51)
autarch force-pushed the 02-03-add_a_tool_to_perform_manual_dump/restore_testing_across_two_clusters branch from 85fb02a to 3dc8fcb (February 4, 2026 06:51)
autarch force-pushed the 02-03-tools-4073_make_mongorestore_work_when_restoring_a_dump_with_replicated_record_ids_to_a_cluster_that_doesn_t_support_this branch from 12a6c61 to c224ed2 (February 4, 2026 06:55)
autarch force-pushed the 02-03-add_a_tool_to_perform_manual_dump/restore_testing_across_two_clusters branch from 3dc8fcb to bd21afe (February 4, 2026 06:55)
autarch force-pushed the 02-03-tools-4073_make_mongorestore_work_when_restoring_a_dump_with_replicated_record_ids_to_a_cluster_that_doesn_t_support_this branch from c224ed2 to f4746cf (February 4, 2026 14:13)
autarch force-pushed the 02-03-add_a_tool_to_perform_manual_dump/restore_testing_across_two_clusters branch from bd21afe to 1c5b465 (February 4, 2026 14:13)
autarch force-pushed the 02-03-tools-4073_make_mongorestore_work_when_restoring_a_dump_with_replicated_record_ids_to_a_cluster_that_doesn_t_support_this branch from f4746cf to 865983d (February 4, 2026 15:08)
autarch force-pushed the 02-03-add_a_tool_to_perform_manual_dump/restore_testing_across_two_clusters branch from 1c5b465 to 0712cf7 (February 4, 2026 15:08)
autarch force-pushed the 02-03-tools-4073_make_mongorestore_work_when_restoring_a_dump_with_replicated_record_ids_to_a_cluster_that_doesn_t_support_this branch from 865983d to a73c37a (February 5, 2026 03:11)
autarch force-pushed the 02-03-add_a_tool_to_perform_manual_dump/restore_testing_across_two_clusters branch from 0712cf7 to afedc4d (February 5, 2026 03:11)
autarch requested a review from FGasper (February 5, 2026 03:16)
autarch marked this pull request as ready for review (February 5, 2026 03:16)
autarch requested a review from a team as a code owner (February 5, 2026 03:16)
autarch (Collaborator) commented Feb 5, 2026:

See this eng proposal for more context.

autarch force-pushed the 02-03-tools-4073_make_mongorestore_work_when_restoring_a_dump_with_replicated_record_ids_to_a_cluster_that_doesn_t_support_this branch from a73c37a to b351dd5 (February 5, 2026 03:20)
autarch force-pushed the 02-03-add_a_tool_to_perform_manual_dump/restore_testing_across_two_clusters branch 2 times, most recently from 0f7684d to 5b02458 (February 5, 2026 15:52)
autarch force-pushed the 02-03-tools-4073_make_mongorestore_work_when_restoring_a_dump_with_replicated_record_ids_to_a_cluster_that_doesn_t_support_this branch 2 times, most recently from 954005c to ff4ac8f (February 5, 2026 15:58)
autarch force-pushed the 02-03-add_a_tool_to_perform_manual_dump/restore_testing_across_two_clusters branch from 5b02458 to 30faca2 (February 5, 2026 15:58)
autarch mentioned this pull request (Feb 5, 2026)
autarch force-pushed the 02-03-add_a_tool_to_perform_manual_dump/restore_testing_across_two_clusters branch from 30faca2 to 84ee53f (February 5, 2026 16:29)
autarch force-pushed the 02-03-tools-4073_make_mongorestore_work_when_restoring_a_dump_with_replicated_record_ids_to_a_cluster_that_doesn_t_support_this branch from ff4ac8f to 8279a5a (February 5, 2026 16:30)
FGasper (Collaborator) left a comment:
The record ID type issue must be fixed.

We should also add tests to catch this & any other new/unknown BSON types for record IDs.

if result.Err() != nil {
// If the command fails, the feature flag doesn't exist or isn't enabled. Or maybe something
// is totally broken, in which case we will find that out when we attempt other DB
// operations.
Collaborator comment:

Can we instead check for an error code?

The failure may be intermittent, so it seems to me we really shouldn't just throw it away. At the very least it should be logged, but even then we'd want a good comment about why we can't reliably fail.

log.DebugLow,
"removing recordIdsReplicated option (target server does not support this feature)",
)
return append(options[:i], options[i+1:]...)
Collaborator comment:

IMO slices.Delete() would be clearer.

Collaborator comment:

Or, lo.Filter.

TxnNumber *int64 `bson:"txnNumber,omitempty"`
PrevOpTime bson.Raw `bson:"prevOpTime,omitempty"`
MultiOpType *int `bson:"multiOpType,omitempty"`
RecordId *int64 `bson:"rid,omitempty"`
Collaborator comment:

Record IDs are not all int64s, though. In clustered collections they’re binary strings. And in 5.0 timeseries they’re strings.

Collaborator comment:

Suggest: *bson.RawValue

visemet (Member) left a comment:
Adding my own thoughts here.


filtered := make([]db.Oplog, len(ops))
for i, v := range ops {
	filtered[i], err = restore.filterRecordIds(v)
Member comment:

Is there a test case which shows this logic is necessary? My understanding is that HandleNonTxnOp() already unrolls top-level applyOps oplog entries, and the oplog will not contain a nested applyOps entry in any server version where an "rid" field may be present.

I would have expected omitting the RecordId field from the Oplog struct to work as desired automatically. What am I missing?

case "applyOps":
	rawOps, ok := op.Object[0].Value.(bson.A)
	if !ok {
		return fmt.Errorf("unknown format for applyOps: %#v", op.Object)
	}
	for _, rawOp := range rawOps {
		bytesOp, err := bson.Marshal(rawOp)
		if err != nil {
			return fmt.Errorf("could not marshal applyOps operation: %v: %v", rawOp, err)
		}
		var nestedOp db.Oplog
		err = bson.Unmarshal(bytesOp, &nestedOp)
		if err != nil {
			return fmt.Errorf("could not unmarshal applyOps command: %v: %v", rawOp, err)
		}
		err = restore.HandleOp(oplogCtx, nestedOp)

[note] Prior to SERVER-45033 (7.0), the server would record the applyOps command the client had specified which means that the applyOps oplog entry could have another applyOps oplog entry within it. There was a finite limit imposed by the server of 10 nested applyOps.

// filterRecordIdsReplicatedOption removes the `recordIdsReplicated` option from the collection
// options if the target server doesn't have the `featureFlagRecordIdsReplicated` feature enabled.
func (restore *MongoRestore) filterRecordIdsReplicatedOption(options bson.D) bson.D {
if restore.recordIdsReplicatedEnabled {
Member comment:

If featureFlagRecordIdsReplicated is enabled by default on the target system, then the create command is going to create a collection which has all replica set members using the same mapping of record ID → BSON document. The target system will do so even if recordIdsReplicated: true is omitted from the create command request.

There is zero interest in preserving the absence of recordIdsReplicated in the collection metadata as recordIdsReplicated: false on a target system with featureFlagRecordIdsReplicated enabled by default. Why not always remove the recordIdsReplicated collection option and let the target system decide based on its own default? This means there would be no need to run the getParameter command earlier.

Member comment:

@henrikedin @erreina would you please confirm that following SERVER-85589, the recordIdsReplicated option will no longer appear in op=c create collection oplog entries? If not, then there would be another place in mongorestore to always remove the recordIdsReplicated option from.

case "create":
collName, ok := op.Object[0].Value.(string)

erreina commented Feb 12, 2026:

It will not be in the options of the create collection oplog entries, but it will now be in the o2 field.

Always removing the recordIdsReplicated field from the collection metadata and letting the target decide seems reasonable to me.
