QSFS: failed to update deployment #246

Open
mohamedamer453 opened this issue May 23, 2022 · 3 comments
@mohamedamer453
Contributor

I was following the flow mentioned in TC242 to test the reliability and stability of QSFS.

First, I started with a (16+4+4) setup:

locals {
  metas = ["meta1", "meta2", "meta3", "meta4"]
  datas = ["data1", "data2", "data3", "data4",
  "data5", "data6", "data7", "data8",
  "data9", "data10", "data11", "data12",
  "data13", "data14", "data15", "data16",
  "data17", "data18", "data19", "data20",
  "data21", "data22", "data23", "data24"]
}

minimal_shards = 16
expected_shards = 20
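
My reading of the (16+4+4) notation (an assumption on my part, since the numbers are not spelled out here) is: 16 minimal shards, 4 extra shards written per file (so expected_shards = 20), and 4 spare data backends on top of that (24 data ZDBs in total). A purely illustrative restatement of that arithmetic in HCL:

locals {
  # Illustrative only - restating the (16+4+4) arithmetic, not part of the deployment.
  minimal_shards  = 16      # shards needed to rebuild a file
  expected_shards = 16 + 4  # shards written per file
  data_backends   = 20 + 4  # size of local.datas in the first setup

  shard_loss_tolerance = 20 - 16  # up to 4 shards of any file may be lost
  spare_backends       = 24 - 20  # backends beyond the 20 expected shards
}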

After deploying with this setup, I was able to SSH into the machine and write a 1 GB file. Then I changed the setup as described in the test case by removing 4 ZDBs.

New setup (16+4):

locals {
  metas = ["meta1", "meta2", "meta3", "meta4"]
  datas = ["data1", "data2", "data3", "data4",
  "data5", "data6", "data7", "data8",
  "data9", "data10", "data11", "data12",
  "data13", "data14", "data15", "data16",
  "data17", "data18", "data19", "data20"]
}

After updating the deployment, I was still able to SSH into the machine and access the old files. I then created a 300 MB file and changed the setup again by removing another 4 ZDBs.

New setup:

locals {
  metas = ["meta1", "meta2", "meta3", "meta4"]
  datas = ["data1", "data2", "data3", "data4",
  "data5", "data6", "data7", "data8",
  "data9", "data10", "data11", "data12",
  "data13", "data14", "data15", "data16"]
}
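
Worth noting (my own observation, not stated in the test case): after this change length(local.datas) is 16 while expected_shards is still 20, so if zstor needs a distinct data backend for every shard it writes (which I have not verified), there are no longer enough data backends for a full write. A trivial sketch of that check:

locals {
  # Illustrative check only - not part of the original main.tf.
  remaining_data_backends = length(local.datas)        # 16 after this change
  enough_for_expected     = length(local.datas) >= 20  # false: fewer backends than expected_shards
}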

But when I tried to update the deployment this time, I got the following error:

╷
│ Error: failed to deploy deployments: error waiting deployment: workload 1 failed within deployment 5211 with error failed to update qsfs mount: failed to restart zstor process: non-zero exit code: 1; failed to revert deployments: error waiting deployment: workload 0 failed within deployment 5211 with error failed to update qsfs mount: failed to restart zstor process: non-zero exit code: 1; try again
│ 
│   with grid_deployment.qsfs,
│   on main.tf line 51, in resource "grid_deployment" "qsfs":
│   51: resource "grid_deployment" "qsfs" {
│ 
╵
main.tf:
terraform {
  required_providers {
    grid = {
      source = "threefoldtech/grid"
    }
  }
}

provider "grid" {
}

locals {
  metas = ["meta1", "meta2", "meta3", "meta4"]
  datas = ["data1", "data2", "data3", "data4",
  "data5", "data6", "data7", "data8",
  "data9", "data10", "data11", "data12",
  "data13", "data14", "data15", "data16",
  "data17", "data18", "data19", "data20",
  "data21", "data22", "data23", "data24"]
}

resource "grid_network" "net1" {
    nodes = [7]
    ip_range = "10.1.0.0/16"
    name = "network"
    description = "newer network"
}

resource "grid_deployment" "d1" {
    node = 7
    dynamic "zdbs" {
        for_each = local.metas
        content {
            name = zdbs.value
            description = "description"
            password = "password"
            size = 10
            mode = "user"
        }
    }
    dynamic "zdbs" {
        for_each = local.datas
        content {
            name = zdbs.value
            description = "description"
            password = "password"
            size = 10
            mode = "seq"
        }
    }
}

resource "grid_deployment" "qsfs" {
  node = 7
  network_name = grid_network.net1.name
  ip_range = lookup(grid_network.net1.nodes_ip_range, 7, "")
  qsfs {
    name = "qsfs"
    description = "description6"
    cache = 10240 # 10 GB
    minimal_shards = 16
    expected_shards = 20
    redundant_groups = 0
    redundant_nodes = 0
    max_zdb_data_dir_size = 512 # 512 MB
    encryption_algorithm = "AES"
    encryption_key = "4d778ba3216e4da4231540c92a55f06157cabba802f9b68fb0f78375d2e825af"
    compression_algorithm = "snappy"
    metadata {
      type = "zdb"
      prefix = "hamada"
      encryption_algorithm = "AES"
      encryption_key = "4d778ba3216e4da4231540c92a55f06157cabba802f9b68fb0f78375d2e825af"
      dynamic "backends" {
          for_each = [for zdb in grid_deployment.d1.zdbs : zdb if zdb.mode != "seq"]
          content {
              address = format("[%s]:%d", backends.value.ips[1], backends.value.port)
              namespace = backends.value.namespace
              password = backends.value.password
          }
      }
    }
    groups {
      dynamic "backends" {
          for_each = [for zdb in grid_deployment.d1.zdbs : zdb if zdb.mode == "seq"]
          content {
              address = format("[%s]:%d", backends.value.ips[1], backends.value.port)
              namespace = backends.value.namespace
              password = backends.value.password
          }
      }
    }
  }
  vms {
    name = "vm"
    flist = "https://hub.grid.tf/tf-official-apps/base:latest.flist"
    cpu = 2
    memory = 1024
    entrypoint = "/sbin/zinit init"
    planetary = true
    env_vars = {
      SSH_KEY = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC533B35CELELtgg2d7Tsi5KelLxR0FYUlrcTmRRQuTNP9arP01JYD8iHKqh6naMbbzR8+M0gdPEeRK4oVqQtEcH1C47vLyRI/4DqahAE2nTW08wtJM5uiIvcQ9H2HMzZ3MXYWWlgyHMgW2QXQxzrRS0NXvsY+4wxe97MMZs9MDs+d+X15DfG6JffjMHydi+4tHB50WmHe5tFscBFxLbgDBUxNGiwi3BQc1nWIuYwMMV1GFwT3ndyLAp19KPkEa/dffiqLdzkgs2qpXtfBhTZ/lFeQRc60DHCMWExr9ySDbavIMuBFylf/ZQeJXm9dFXJN7bBTbflZIIuUMjmrI7cU5eSuZqAj5l+Yb1mLN8ljmKSIM3/tkKbzXNH5AUtRVKTn+aEPvJAEYtserAxAP5pjy6nmegn0UerEE3DWEV2kqDig3aPSNhi9WSCykvG2tz7DIr0UP6qEIWYMC/5OisnSGj8w8dAjyxS9B0Jlx7DEmqPDNBqp8UcwV75Cot8vtIac= root@mohamed-Inspiron-3576"
    }
    mounts {
        disk_name = "qsfs"
        mount_point = "/qsfs"
    }
  }
}
output "metrics" {
    value = grid_deployment.qsfs.qsfs[0].metrics_endpoint
}
output "ygg_ip" {
    value = grid_deployment.qsfs.vms[0].ygg_ip
}
@xmonader xmonader transferred this issue from threefoldtech/test_feedback May 24, 2022
@mohamedamer453
Contributor Author

I encountered this issue again in another scenario as described in TC354 & TC355.

The initial setup was (16+4+4):

locals {
  metas = ["meta1", "meta2", "meta3", "meta4"]
  datas = ["data1", "data2", "data3", "data4",
  "data5", "data6", "data7", "data8",
  "data9", "data10", "data11", "data12",
  "data13", "data14", "data15", "data16",
  "data17", "data18", "data19", "data20",
  "data21", "data22", "data23", "data24"]
}

minimal_shards = 16
expected_shards = 20

I then wrote some small and mid-size data files and updated the setup to (16+0+0):

locals {
  metas = ["meta1", "meta2", "meta3", "meta4"]
  datas = ["data1", "data2", "data3", "data4",
  "data5", "data6", "data7", "data8",
  "data9", "data10", "data11", "data12",
  "data13", "data14", "data15", "data16"]
}

After killing 8 ZDBs, the storage was still working and I was able to access the files I had created. I then re-added 4 ZDBs, but the update failed and I initially got these errors:

╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for grid_deployment.qsfs to include new values learned so far during apply, provider
│ "registry.terraform.io/threefoldtech/grid" produced an invalid new value for
│ .qsfs[0].groups[0].backends[16].namespace: was cty.StringVal(""), but now cty.StringVal("451-5297-data17").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for grid_deployment.qsfs to include new values learned so far during apply, provider
│ "registry.terraform.io/threefoldtech/grid" produced an invalid new value for
│ .qsfs[0].groups[0].backends[17].namespace: was cty.StringVal(""), but now cty.StringVal("451-5297-data18").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for grid_deployment.qsfs to include new values learned so far during apply, provider
│ "registry.terraform.io/threefoldtech/grid" produced an invalid new value for
│ .qsfs[0].groups[0].backends[18].namespace: was cty.StringVal(""), but now cty.StringVal("451-5297-data19").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for grid_deployment.qsfs to include new values learned so far during apply, provider
│ "registry.terraform.io/threefoldtech/grid" produced an invalid new value for
│ .qsfs[0].groups[0].backends[19].namespace: was cty.StringVal(""), but now cty.StringVal("451-5297-data20").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.

I then re-applied, and it finally produced this error:

╷
│ Error: failed to deploy deployments: error waiting deployment: workload 0 failed within deployment 5299 with error failed to update qsfs mount: failed to restart zstor process: non-zero exit code: 1; failed to revert deployments: error waiting deployment: workload 0 failed within deployment 5299 with error failed to update qsfs mount: failed to restart zstor process: non-zero exit code: 1; try again
│ 
│   with grid_deployment.qsfs,
│   on main.tf line 52, in resource "grid_deployment" "qsfs":
│   52: resource "grid_deployment" "qsfs" {
│ 
╵
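
For reference, the namespaces in the inconsistent-plan errors above (451-5297-data17 through 451-5297-data20) suggest the 4 re-added backends were data17 through data20, i.e. the datas list at this point presumably looked like the following (my reconstruction; the updated main.tf for this step is not shown here):

locals {
  metas = ["meta1", "meta2", "meta3", "meta4"]
  datas = ["data1", "data2", "data3", "data4",
  "data5", "data6", "data7", "data8",
  "data9", "data10", "data11", "data12",
  "data13", "data14", "data15", "data16",
  "data17", "data18", "data19", "data20"]
}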

@mohamedamer453
Contributor Author

The same error occurred with the setup in TC356.

  • Initial setup (16+4+4).
  • Wrote some small/mid-size files after deployment.
  • Killed 8 ZDBs; everything was still working fine and I could access the files I had created.
  • Re-added 8 ZDBs; the update failed and I got the same errors mentioned in the comment above.

@ad-astra-industries

Is there any update on this issue?

@xmonader xmonader added this to the 1.6.0 milestone Nov 14, 2022
@ashraffouda ashraffouda removed this from the 1.6.0 milestone Nov 14, 2022
@mariobassem mariobassem added this to the future milestone Dec 22, 2022
@xmonader xmonader added the type_bug Something isn't working label Feb 26, 2023
@rawdaGastan rawdaGastan removed this from 3.15.x Sep 5, 2024
@rawdaGastan rawdaGastan removed the type_bug Something isn't working label Jan 27, 2025