Minio as persistence fails during pre-aggregation #9206

Open
LeftoversTodayAppAdmin opened this issue Feb 9, 2025 · 3 comments
Labels: enhancement (New feature proposal), help wanted (Community contributions are welcome.)

Comments

LeftoversTodayAppAdmin commented Feb 9, 2025

Describe the bug
When using MinIO as storage for pre-aggregations, I can see Cube write files to the temp-uploads folder in the MinIO bucket, but it then fails with the error below, which comes from the line of code quoted here. Cube also generates many copies of the same file in temp-uploads.

Line of code emitting the error:

format!("File {} can't be listed after upload. Either there's Cube Store cluster misconfiguration, or storage can't provide the required consistency.", remote_path),

Error: Error during upload of dev_pre_aggregations.order_main20240928_ookkyqrs_dp11nujr_1jqgh50-0.csv.gz create table: CREATE TABLE dev_pre_aggregations.order_main20240928_ookkyqrs_dp11nujr_1jqgh50 (order__customfieldsvendoridentifier varchar(255), order__updatedat_day timestamp, order__count int) WITH (build_range_end = '2024-09-28T23:59:59.999'): Internal: File temp-uploads/dev_pre_aggregations.order_main20240928_ookkyqrs_dp11nujr_1jqgh50-0.csv.gz can't be listed after upload. Either there's Cube Store cluster misconfiguration, or storage can't provide the required consistency.

Minio integration was added here: #3738
cc: @PieterVanZyl-Dev @paveltiunov

To Reproduce
Steps to reproduce the behavior:

  1. Use the following config in docker:
  cubestore_router:
    restart: always
    image: cubejs/cubestore:v1.2.3-non-avx
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      - CUBESTORE_DATA_DIR=/cube/data
      - CUBESTORE_MINIO_SERVER_ENDPOINT=http://leftoverstoday-dev-01:9000
      - CUBESTORE_MINIO_BUCKET=cube
      - CUBESTORE_MINIO_REGION=''
      - CUBESTORE_MINIO_ACCESS_KEY_ID=minio
      - CUBESTORE_MINIO_SECRET_ACCESS_KEY=<KEY>
    volumes:
      - .cubestore:/cube/data
  2. Create a pre-aggregation.
  3. Open the playground and run the query.
  4. Cube will try to generate the pre-aggregation files.
  5. The files are created successfully in the MinIO bucket under the temp-uploads folder, but when Cube Store immediately tries to read a file back, it fails and creates another copy of the file. This repeats indefinitely.

Expected behavior
The file is created and then read back successfully.

Minimally reproducible Cube Schema

cubes:
  - name: order
    sql_table: vendure.order
    data_source: default

    joins: []

    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true

      - name: type
        sql: type
        type: string

      - name: code
        sql: code
        type: string

      - name: state
        sql: state
        type: string

      - name: couponcodes
        sql: "{CUBE}.`couponCodes`"
        type: string

      - name: shippingaddress
        sql: "{CUBE}.`shippingAddress`"
        type: string

      - name: billingaddress
        sql: "{CUBE}.`billingAddress`"
        type: string

      - name: currencycode
        sql: "{CUBE}.`currencyCode`"
        type: string

      - name: aggregateorderid
        sql: "{CUBE}.`aggregateOrderId`"
        type: string

      - name: customerid
        sql: "{CUBE}.`customerId`"
        type: string

      - name: taxzoneid
        sql: "{CUBE}.`taxZoneId`"
        type: string

      - name: customfieldstotalWeightLbs
        sql: "{CUBE}.`customFieldsTotalWeightLbs`"
        type: number

      - name: customfieldssavingsDollars
        sql: "{CUBE}.`customFieldsSavingsDollars`"
        type: number

      - name: customfieldsvendoridentifier
        sql: "{CUBE}.`customFieldsVendoridentifier`"
        type: string

      - name: customfieldssnapebt
        sql: "{CUBE}.`customFieldsSnapebt`"
        type: string

      - name: customfieldsdob
        sql: "{CUBE}.`customFieldsDob`"
        type: string

      - name: customfieldsphone
        sql: "{CUBE}.`customFieldsPhone`"
        type: string

      - name: createdat
        sql: "{CUBE}.`createdAt`"
        type: time

      - name: updatedat
        sql: "{CUBE}.`updatedAt`"
        type: time

      - name: orderplacedat
        sql: "{CUBE}.`orderPlacedAt`"
        type: time

    measures:
      - name: count
        type: count

      - name: subtotal
        sql: "{CUBE}.`subTotal`"
        type: sum
      
      - name: weight
        sql: "{CUBE}.`customfieldstotalWeightLbs`"
        type: sum
      
      - name: dollars
        sql: "{CUBE}.`customFieldsSavingsDollars`"
        type: sum

    pre_aggregations:
      # Pre-aggregation definitions go here.
      # Learn more in the documentation: https://cube.dev/docs/caching/pre-aggregations/getting-started
      - name: main
        measures:
          - order.count
          - order.weight
          - order.dollars
        dimensions:
          - order.customfieldsvendoridentifier
          - order.state
        refreshKey:
          every: 1 hour
          updateWindow: 3 day
          incremental: true
        partitionGranularity: day
        timeDimension: order.orderplacedat
        granularity: day

Version:
cubejs/cube:v1.2.3
cubejs/cubestore:v1.2.3-non-avx

LeftoversTodayAppAdmin (Author) commented

✅ - SOLVED THE ISSUE
To use MinIO, you have to set the following env variable. Without it, Cube Store fails to read the file that was just created:

  • CUBESTORE_MINIO_SUB_PATH=pre-aggregation
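
For reference, here is a minimal sketch of the router service from the reproduction steps with that variable added. Every value is copied from the report above, including the sub path `pre-aggregation`. The top-level `services:` key is added only so the fragment stands on its own, and the note about workers is an assumption rather than something confirmed in this thread.

```yaml
services:
  cubestore_router:
    restart: always
    image: cubejs/cubestore:v1.2.3-non-avx
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      - CUBESTORE_DATA_DIR=/cube/data
      - CUBESTORE_MINIO_SERVER_ENDPOINT=http://leftoverstoday-dev-01:9000
      - CUBESTORE_MINIO_BUCKET=cube
      - CUBESTORE_MINIO_REGION=''
      - CUBESTORE_MINIO_ACCESS_KEY_ID=minio
      - CUBESTORE_MINIO_SECRET_ACCESS_KEY=<KEY>
      # The variable that resolved the "can't be listed after upload" error in this report.
      - CUBESTORE_MINIO_SUB_PATH=pre-aggregation
    volumes:
      - .cubestore:/cube/data
  # Assumption: the worker services named in CUBESTORE_WORKERS (not shown in the
  # report) would need the same CUBESTORE_MINIO_* variables, sub path included.
```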

@igorlukanin, could you please update the MinIO documentation? The configuration page should mark this env variable as required, and this page, which currently shows the error I was seeing until I added the sub path, should be updated as well: https://cube.dev/docs/product/caching/running-in-production

igorlukanin self-assigned this Feb 12, 2025

igorlukanin (Member) commented

@LeftoversTodayAppAdmin Amazing discovery and report! As we discussed in Slack, if you'd be open to providing a PR that just sets a default value for that env variable, I believe it would be very helpful for MinIO users.

igorlukanin added the enhancement and help wanted labels Feb 13, 2025

If you are interested in working on this issue, please go ahead and provide a PR for it.
We'd be happy to review it and merge it.
If this is the first time you are contributing a Pull Request to Cube, please check our contribution guidelines.
You can also post any questions while contributing in the #contributors channel in the Cube Slack.
