Skip to content

QSFS: failed to update deployment #246

@mohamedamer453

Description

@mohamedamer453

I was testing the flow mentioned in TC242 to test the reliability and stability of qsfs.

Firstly i started with a (16+4+4) setup

locals {
  metas = ["meta1", "meta2", "meta3", "meta4"]
  datas = ["data1", "data2", "data3", "data4",
  "data5", "data6", "data7", "data8",
  "data9", "data10", "data11", "data12",
  "data13", "data14", "data15", "data16",
  "data17", "data18", "data19", "data20",
  "data21", "data22", "data23", "data24"]
}

minimal_shards = 16
expected_shards = 20

after deploying with this setup, i was able to ssh to the machine and write a 1gb file then i changed the setup as mentioned in the test case by removing 4 ZDBs.

new setup (16+4)

locals {
  metas = ["meta1", "meta2", "meta3", "meta4"]
  datas = ["data1", "data2", "data3", "data4",
  "data5", "data6", "data7", "data8",
  "data9", "data10", "data11", "data12",
  "data13", "data14", "data15", "data16",
  "data17", "data18", "data19", "data20"]
}

after updating the deployment i was still able to ssh to the machine and access the old files and then i created a 300mb file and changed the setup again by removing another 4 ZDBs

new setup

locals {
  metas = ["meta1", "meta2", "meta3", "meta4"]
  datas = ["data1", "data2", "data3", "data4",
  "data5", "data6", "data7", "data8",
  "data9", "data10", "data11", "data12",
  "data13", "data14", "data15", "data16"]
}

but when i tried to update the deployment this time i got the following error

╷
│ Error: failed to deploy deployments: error waiting deployment: workload 1 failed within deployment 5211 with error failed to update qsfs mount: failed to restart zstor process: non-zero exit code: 1; failed to revert deployments: error waiting deployment: workload 0 failed within deployment 5211 with error failed to update qsfs mount: failed to restart zstor process: non-zero exit code: 1; try again
│ 
│   with grid_deployment.qsfs,
│   on main.tf line 51, in resource "grid_deployment" "qsfs":
│   51: resource "grid_deployment" "qsfs" {
│ 
╵
  • main.tf
terraform {
  required_providers {
    grid = {
      source = "threefoldtech/grid"
    }
  }
}

provider "grid" {
}

locals {
  metas = ["meta1", "meta2", "meta3", "meta4"]
  datas = ["data1", "data2", "data3", "data4",
  "data5", "data6", "data7", "data8",
  "data9", "data10", "data11", "data12",
  "data13", "data14", "data15", "data16",
  "data17", "data18", "data19", "data20",
  "data21", "data22", "data23", "data24"]
}

resource "grid_network" "net1" {
    nodes = [7]
    ip_range = "10.1.0.0/16"
    name = "network"
    description = "newer network"
}

resource "grid_deployment" "d1" {
    node = 7
    dynamic "zdbs" {
        for_each = local.metas
        content {
            name = zdbs.value
            description = "description"
            password = "password"
            size = 10
            mode = "user"
        }
    }
    dynamic "zdbs" {
        for_each = local.datas
        content {
            name = zdbs.value
            description = "description"
            password = "password"
            size = 10
            mode = "seq"
        }
    }
}

resource "grid_deployment" "qsfs" {
  node = 7
  network_name = grid_network.net1.name
  ip_range = lookup(grid_network.net1.nodes_ip_range, 7, "")
  qsfs {
    name = "qsfs"
    description = "description6"
    cache = 10240 # 10 GB
    minimal_shards = 16
    expected_shards = 20
    redundant_groups = 0
    redundant_nodes = 0
    max_zdb_data_dir_size = 512 # 512 MB
    encryption_algorithm = "AES"
    encryption_key = "4d778ba3216e4da4231540c92a55f06157cabba802f9b68fb0f78375d2e825af"
    compression_algorithm = "snappy"
    metadata {
      type = "zdb"
      prefix = "hamada"
      encryption_algorithm = "AES"
      encryption_key = "4d778ba3216e4da4231540c92a55f06157cabba802f9b68fb0f78375d2e825af"
      dynamic "backends" {
          for_each = [for zdb in grid_deployment.d1.zdbs : zdb if zdb.mode != "seq"]
          content {
              address = format("[%s]:%d", backends.value.ips[1], backends.value.port)
              namespace = backends.value.namespace
              password = backends.value.password
          }
      }
    }
    groups {
      dynamic "backends" {
          for_each = [for zdb in grid_deployment.d1.zdbs : zdb if zdb.mode == "seq"]
          content {
              address = format("[%s]:%d", backends.value.ips[1], backends.value.port)
              namespace = backends.value.namespace
              password = backends.value.password
          }
      }
    }
  }
  vms {
    name = "vm"
    flist = "https://hub.grid.tf/tf-official-apps/base:latest.flist"
    cpu = 2
    memory = 1024
    entrypoint = "/sbin/zinit init"
    planetary = true
    env_vars = {
      SSH_KEY = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC533B35CELELtgg2d7Tsi5KelLxR0FYUlrcTmRRQuTNP9arP01JYD8iHKqh6naMbbzR8+M0gdPEeRK4oVqQtEcH1C47vLyRI/4DqahAE2nTW08wtJM5uiIvcQ9H2HMzZ3MXYWWlgyHMgW2QXQxzrRS0NXvsY+4wxe97MMZs9MDs+d+X15DfG6JffjMHydi+4tHB50WmHe5tFscBFxLbgDBUxNGiwi3BQc1nWIuYwMMV1GFwT3ndyLAp19KPkEa/dffiqLdzkgs2qpXtfBhTZ/lFeQRc60DHCMWExr9ySDbavIMuBFylf/ZQeJXm9dFXJN7bBTbflZIIuUMjmrI7cU5eSuZqAj5l+Yb1mLN8ljmKSIM3/tkKbzXNH5AUtRVKTn+aEPvJAEYtserAxAP5pjy6nmegn0UerEE3DWEV2kqDig3aPSNhi9WSCykvG2tz7DIr0UP6qEIWYMC/5OisnSGj8w8dAjyxS9B0Jlx7DEmqPDNBqp8UcwV75Cot8vtIac= root@mohamed-Inspiron-3576"
    }
    mounts {
        disk_name = "qsfs"
        mount_point = "/qsfs"
    }
  }
}
output "metrics" {
    value = grid_deployment.qsfs.qsfs[0].metrics_endpoint
}
output "ygg_ip" {
    value = grid_deployment.qsfs.vms[0].ygg_ip
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions