Conversation
Pull request overview
This PR adds Point-in-Time Recovery (PITR) support to the Percona Server for MySQL Operator by introducing a PITR restore workflow that (1) searches binlogs from the Binlog Server for a requested target (GTID or timestamp) and (2) runs a dedicated PITR restore Job after the base xtrabackup restore completes.
Changes:
- Introduces PITR restore orchestration in the restore controller (binlog search → ConfigMap → PITR Job).
- Adds a PITR restore Job builder (`pkg/pitr`) plus a new `pitr` CLI used inside the PITR Job.
- Extends Binlog Server configuration generation (TLS/replication/storage options) and adds binlog search exec helpers.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| pkg/pitr/pitr.go | Builds PITR restore Job + env/volume wiring for PITR execution. |
| pkg/controller/psrestore/controller.go | Adds PITR config creation, binlog search, and PITR Job lifecycle after base restore. |
| pkg/controller/ps/controller.go | Updates Binlog Server reconcile: endpoint parsing, S3 URI formatting, TLS/replication/storage config, cleanup. |
| pkg/binlogserver/search.go | Adds exec-based binlog search helpers returning structured entries. |
| pkg/binlogserver/config.go | Extends Binlog Server config schema (SSL/TLS, replication mode/rewrite, storage options). |
| pkg/binlogserver/binlog_server.go | Adds buffer volume/mount, uses shared binary const, changes replicas behavior. |
| api/v1/perconaservermysqlrestore_types.go | Adds .spec.pitr to Restore CRD API types. |
| api/v1/zz_generated.deepcopy.go | Deepcopy support for new PITR restore spec. |
| deploy/crd.yaml / deploy/bundle.yaml / deploy/cw-bundle.yaml / config/crd/bases/... | CRD schema updates for .spec.pitr. |
| deploy/cr.yaml | Example CR updates to include PiTR binlogServer configuration. |
| cmd/pitr/main.go | New PITR helper CLI (setup downloads relay logs; apply runs replication until target). |
| cmd/internal/db/db.go | Adds replication/PITR helper SQL methods for the PITR CLI. |
| build/run-pitr-restore.sh | New script executed by PITR Job container to run setup/apply around a local mysqld. |
| build/ps-init-entrypoint.sh | Installs pitr binary and PITR restore script into the shared /opt/percona volume. |
| build/Dockerfile | Builds and packages the new pitr binary into the operator image. |
| api/v1/perconaservermysql_types.go | Disables PITR when cluster is paused (defaulting behavior). |
Comments suppressed due to low confidence (1)
pkg/binlogserver/binlog_server.go:84
- StatefulSet replicas are hard-coded to 1, ignoring the user-provided spec.Size from BinlogServerSpec (PodSpec inline). This is a behavior change from using spec.Size and makes the size field effectively meaningless for this component. Either honor spec.Size (including allowing 0 to disable) or remove/ignore the field consistently at the API level.
```go
func StatefulSet(cr *apiv1.PerconaServerMySQL, initImage, configHash string) *appsv1.StatefulSet {
	spec := cr.Spec.Backup.PiTR.BinlogServer
	labels := MatchLabels(cr)
	annotations := make(map[string]string)
	if configHash != "" {
		annotations[string(naming.AnnotationConfigHash)] = configHash
	}
	return &appsv1.StatefulSet{
		TypeMeta: metav1.TypeMeta{
			APIVersion: "apps/v1",
			Kind:       "StatefulSet",
		},
		ObjectMeta: metav1.ObjectMeta{
			Name:        Name(cr),
			Namespace:   cr.Namespace,
			Labels:      labels,
			Annotations: cr.GlobalAnnotations(),
		},
		Spec: appsv1.StatefulSetSpec{
			Replicas: ptr.To(int32(1)),
			Selector: &metav1.LabelSelector{
				MatchLabels: labels,
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels:      labels,
					Annotations: util.SSMapMerge(cr.GlobalAnnotations(), annotations),
				},
				Spec: spec.Core(
					labels,
					volumes(cr),
```
pkg/controller/ps/controller.go
Outdated
```go
s3Uri := fmt.Sprintf("%s://%s:%s@%s/%s", protocol, accessKey, secretKey, host, s3.Bucket)
if len(s3.Prefix) > 0 {
	s3Uri += fmt.Sprintf("/%s", s3.Prefix)
```
reconcileBinlogServer builds s3Uri using s3.Bucket and s3.Prefix directly. BackupStorageS3Spec.Bucket can already include a prefix (BucketWithPrefix), so this can produce an invalid URI like "...///". Use BucketAndPrefix() and join the resulting prefix once to avoid double-prefixing and to keep bucket vs key separation correct.
Suggested change:

```diff
-s3Uri := fmt.Sprintf("%s://%s:%s@%s/%s", protocol, accessKey, secretKey, host, s3.Bucket)
-if len(s3.Prefix) > 0 {
-	s3Uri += fmt.Sprintf("/%s", s3.Prefix)
+bucket, prefix := s3.BucketAndPrefix()
+s3Uri := fmt.Sprintf("%s://%s:%s@%s/%s", protocol, accessKey, secretKey, host, bucket)
+if len(prefix) > 0 {
+	s3Uri += fmt.Sprintf("/%s", prefix)
```
```go
if binlogServer.Storage.S3 != nil {
	s3 := binlogServer.Storage.S3
	bucket, _ := s3.BucketAndPrefix()
	envs = append(envs,
		corev1.EnvVar{
			Name:  "STORAGE_TYPE",
			Value: "s3",
		},
		corev1.EnvVar{
			Name: "AWS_ACCESS_KEY_ID",
			ValueFrom: &corev1.EnvVarSource{
				SecretKeyRef: k8s.SecretKeySelector(s3.CredentialsSecret, secret.CredentialsAWSAccessKey),
			},
		},
		corev1.EnvVar{
			Name: "AWS_SECRET_ACCESS_KEY",
			ValueFrom: &corev1.EnvVarSource{
				SecretKeyRef: k8s.SecretKeySelector(s3.CredentialsSecret, secret.CredentialsAWSSecretKey),
			},
		},
		corev1.EnvVar{
			Name:  "AWS_DEFAULT_REGION",
			Value: s3.Region,
		},
		corev1.EnvVar{
			Name:  "AWS_ENDPOINT",
			Value: s3.EndpointURL,
		},
		corev1.EnvVar{
			Name:  "S3_BUCKET",
			Value: bucket,
		},
	)
}
```
restoreContainer() doesn’t propagate storage.VerifyTLS into the PITR job environment, but cmd/pitr defaults to VERIFY_TLS=true. This prevents disabling TLS verification for S3-compatible endpoints with custom/self-signed certs (a capability that existing backup/restore jobs support via VERIFY_TLS). Add VERIFY_TLS env based on storage.VerifyTLS (similar to xtrabackup restoreContainer).
cmd/pitr/main.go
Outdated
```go
// e.g. "https://minio-service:9000/binlogs/binlog.000001" -> "binlogs/binlog.000001"
func objectKeyFromURI(uri string) (string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", fmt.Errorf("parse URL: %w", err)
	}
	// Path starts with "/", trim the leading slash to get the object key
	return strings.TrimPrefix(u.Path, "/"), nil
```
BinlogEntry.URI is parsed and the entire URL path (minus leading "/") is used as the S3 object key. If the URI is path-style (common for s3://host/bucket/key or https://endpoint/bucket/key), this returns "bucket/key" and will not match when the S3 client is already scoped to that bucket. Consider making objectKeyFromURI aware of the bucket (strip a leading "//" segment when present) and support both virtual-host and path-style URIs robustly.
Suggested change:

```diff
-// e.g. "https://minio-service:9000/binlogs/binlog.000001" -> "binlogs/binlog.000001"
-func objectKeyFromURI(uri string) (string, error) {
-	u, err := url.Parse(uri)
-	if err != nil {
-		return "", fmt.Errorf("parse URL: %w", err)
-	}
-	// Path starts with "/", trim the leading slash to get the object key
-	return strings.TrimPrefix(u.Path, "/"), nil
+// It supports both virtual-hosted-style and path-style S3/MinIO URLs.
+// Examples:
+//	"https://minio-service:9000/binlogs/binlog.000001" -> "binlogs/binlog.000001"
+//	"https://bucket.s3.amazonaws.com/binlogs/binlog.000001" -> "binlogs/binlog.000001"
+func objectKeyFromURI(uri string) (string, error) {
+	u, err := url.Parse(uri)
+	if err != nil {
+		return "", fmt.Errorf("parse URL: %w", err)
+	}
+	// Normalize path by trimming the leading slash.
+	path := strings.TrimPrefix(u.Path, "/")
+	if path == "" {
+		return "", nil
+	}
+	// Try to infer the bucket from the hostname for virtual-hosted-style URLs,
+	// e.g. "bucket.s3.amazonaws.com" -> bucket "bucket".
+	host := u.Hostname()
+	if host != "" {
+		parts := strings.Split(host, ".")
+		// Heuristic: if the first label is not "s3" and there is more than one label,
+		// treat the first label as a bucket name.
+		if len(parts) > 1 && parts[0] != "s3" {
+			bucketFromHost := parts[0]
+			prefix := bucketFromHost + "/"
+			if strings.HasPrefix(path, prefix) {
+				// Strip a redundant leading "<bucket>/" segment when present.
+				path = strings.TrimPrefix(path, prefix)
+			}
+		}
+	}
+	return path, nil
```
Pull request overview
Copilot reviewed 37 out of 37 changed files in this pull request and generated 7 comments.
```diff
 Spec: appsv1.StatefulSetSpec{
-	Replicas: &spec.Size,
+	Replicas: ptr.To(int32(1)),
 	Selector: &metav1.LabelSelector{
```
StatefulSet replicas are hard-coded to 1, which ignores the configured .spec.backup.pitr.binlogServer.size (PodSpec.Size). This prevents users from scaling the binlog server and is inconsistent with other components that honor PodSpec.Size; use the spec.Size value here (and ensure it’s non-zero/defaulted) rather than forcing 1.
Pull request overview
Copilot reviewed 77 out of 77 changed files in this pull request and generated 7 comments.
we have conflicts with the e2e tests
```diff
 if cr.Spec.PITR != nil {
-	if !cluster.Spec.Backup.PiTR.Enabled || cluster.Spec.Backup.PiTR.BinlogServer == nil {
+	if cluster.Spec.Backup.PiTR.BinlogServer == nil {
```
Shouldn't we also check that the pitr is enabled? We may have the server fully configured, but have enabled set to false 🤔
Pull request overview
Copilot reviewed 106 out of 106 changed files in this pull request and generated 2 comments.
```sh
local ts=$(date +%Y-%m-%dT%H:%M:%S.%N%z --utc | sed 's/+0000/Z/g')
echo "${ts} 0 [Info] [K8SPS-642] [Job] $*" >&2
```
[shfmt] reported by reviewdog 🐶
```sh
--admin-address=127.0.0.1 \
--user=mysql \
--gtid-mode=ON \
--enforce-gtid-consistency=ON &
```
[shfmt] reported by reviewdog 🐶
```sh
log "waiting for mysqld to be ready"
until mysqladmin -u operator -p"$(</etc/mysql/mysql-users-secret/operator)" ping --silent 2>/dev/null; do
	sleep 1;
```
[shfmt] reported by reviewdog 🐶
Suggested change:

```diff
-	sleep 1;
+	sleep 1
```
```sh
SLEEP_FOREVER_FILE=/var/lib/mysql/sleep-forever
log "sleeping forever... remove ${SLEEP_FOREVER_FILE} to terminate."
touch ${SLEEP_FOREVER_FILE}
while [[ -f ${SLEEP_FOREVER_FILE} ]]; do
	sleep 10
done
exit 0
```
[shfmt] reported by reviewdog 🐶
```go
entries:  defaultEntries,
pitrType: "gtid",
pitrGTID: "uuid:1",
db:       &fakeDB{getGTIDExecutedResult: "uuid:1-5"},
applyErr: errors.New("fetch binlog binlogs/binlog.000001: download failed"),
```
[gofmt] reported by reviewdog 🐶
Pull request overview
Copilot reviewed 106 out of 106 changed files in this pull request and generated 7 comments.
```go
operatorPass, err := getSecret(apiv1.UserOperator)
if err != nil {
	return fmt.Errorf("get operator password: %w", err)
}
```
Why not just mount it as an env variable (from secret) in the restore job?
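For illustration, injecting the operator password straight from the users Secret could look like this hypothetical fragment of the restore Job's container spec (Secret and key names are assumed, not taken from this PR):

```yaml
# Hypothetical sketch: pull the operator password from the users Secret at
# pod start instead of reading a mounted file at runtime.
env:
  - name: OPERATOR_PASSWORD
    valueFrom:
      secretKeyRef:
        name: cluster1-secrets   # the cluster's users Secret (assumed name)
        key: operator
```

One trade-off worth noting: env vars are fixed at container start, while a mounted Secret file reflects rotations without a restart.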
```go
Orchestrator StatefulAppStatus `json:"orchestrator,omitempty"`
HAProxy      StatefulAppStatus `json:"haproxy,omitempty"`
Router       StatefulAppStatus `json:"router,omitempty"`
BinlogServer StatefulAppStatus `json:"binlogServer,omitempty"`
```
This is fine, but for future, let's try to add these as conditions
```go
if bls.SSLMode == "" {
	bls.SSLMode = "verify_identity"
}
if bls.VerifyChecksum == nil {
	t := true
	bls.VerifyChecksum = &t
}
if bls.RewriteFileSize == "" {
	bls.RewriteFileSize = "128M"
}
if bls.CheckpointSize == "" {
	bls.CheckpointSize = "16M"
}
if bls.CheckpointInterval == "" {
	bls.CheckpointInterval = "30s"
}
if bls.ConnectTimeout == 0 {
	bls.ConnectTimeout = 30
}
if bls.ReadTimeout == 0 {
	bls.ReadTimeout = 30
}
if bls.WriteTimeout == 0 {
	bls.WriteTimeout = 30
}
if bls.IdleTime == 0 {
	bls.IdleTime = 30
}
```
Let's set the defaults on the CRD as well.
CHANGE DESCRIPTION
Problem:
Jira: https://perconadev.atlassian.net/browse/K8SPS-642
Based on this proposal: #1251
Cause:
Short explanation of the root cause of the issue if applicable.
Solution:
Short explanation of the solution we are providing with this PR.
CHECKLIST
Jira
Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?

Tests
Config/Logging/Testability