[Bug] Snapshots defined in yaml fail if yaml file exists in target/run directory #11321

Open
2 tasks done
joshuanits opened this issue Feb 19, 2025 · 1 comment · May be fixed by #11323
Labels
bug Something isn't working triage

Comments


joshuanits commented Feb 19, 2025

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Snapshots defined in a yaml file such as dbt_project/models/schema.yml fail to build/run if a file dbt_project/target/run/dbt_project/models/schema.yml exists:

Unhandled error while executing target/run/dbt_project/models/schema.yml/schema.yml/snapshot.sql
[Errno 20] Not a directory: 'dbt_project/target/run/dbt_project/models/schema.yml/schema.yml'

I'm not sure exactly what causes the schema.yml to end up in the target directory, but it has happened multiple times.
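The error itself can be reproduced outside dbt. The snippet below is only a minimal sketch of the underlying filesystem behaviour on Linux, not dbt's actual write code: it puts a regular file where a directory is needed and then tries to create a directory beneath it. The temporary directory layout is made up for illustration.

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    models_dir = os.path.join(root, "target", "run", "dbt_project", "models")
    os.makedirs(models_dir)

    # Stand-in for the stray file: a regular file named schema.yml.
    stray_file = os.path.join(models_dir, "schema.yml")
    open(stray_file, "w").close()

    # The snapshot output needs .../schema.yml/schema.yml/ to be a directory,
    # but its parent component is a regular file.
    wanted_dir = os.path.join(stray_file, "schema.yml")
    try:
        os.makedirs(wanted_dir, exist_ok=True)
        caught = None
    except OSError as exc:
        caught = exc

# On Linux the failure is ENOTDIR, i.e. "[Errno 20] Not a directory".
print(type(caught).__name__, caught.errno == errno.ENOTDIR)
```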

Expected Behavior

  • Replace . with _ in folder names, e.g. target/run/models/schema_yml/... This is the ideal option because it is clearer than having folders named with file extensions; or
  • Check that each item in the target/run path is a folder, and handle it gracefully if it is not (e.g. remove the file)
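As a rough sketch of the second option, something like the helper below could run before the output directory is created. `remove_blocking_files` is a hypothetical name and is not part of dbt-core; it assumes that deleting a stray file under target/ is acceptable, which may not hold for every project.

```python
import os
import tempfile


def remove_blocking_files(directory: str) -> list[str]:
    """Remove regular files sitting where `directory` needs a directory.

    Hypothetical helper, not part of dbt-core. Returns the removed paths.
    """
    removed = []
    current = directory
    # Walk from the deepest component up towards the root.
    while current and current != os.path.dirname(current):
        if os.path.isfile(current):
            os.remove(current)
            removed.append(current)
        current = os.path.dirname(current)
    return removed


with tempfile.TemporaryDirectory() as root:
    models_dir = os.path.join(root, "models")
    os.makedirs(models_dir)
    stray = os.path.join(models_dir, "schema.yml")
    open(stray, "w").close()

    # Directory the snapshot output would need.
    wanted = os.path.join(stray, "schema.yml")
    removed = remove_blocking_files(wanted)
    os.makedirs(wanted, exist_ok=True)
    created = os.path.isdir(wanted)
```

Silently deleting files is a blunt fix, so a real implementation would probably want to log a warning or restrict itself to paths inside the target directory.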

Steps To Reproduce

# dbt_project.yml
name: 'dbt_project'

profile: 'dbt_project'

model-paths: ["models"]
snapshot-paths: ["snapshots"]

# models/schema.yml
snapshots:
  - name: 'snapshot'
    relation: ref('model')
    config:
      strategy: 'check'
      unique_key: 'col'
      check_cols: 'all'

-- models/model.sql
SELECT 1 as col

dbt build --select model
touch target/run/dbt_project/models/schema.yml
dbt build --select snapshot

Relevant log output

$ dbt build --select snapshot
07:15:25  Running with dbt=1.9.2
07:15:25  Registered adapter: duckdb=1.9.2
07:15:25  Found 1 model, 1 snapshot, 426 macros
07:15:25  
07:15:25  Concurrency: 1 threads (target='dev')
07:15:25  
07:15:26  1 of 1 START snapshot main.snapshot ............................................ [RUN]
07:15:26  Unhandled error while executing target/run/dbt_project/models/schema.yml/schema.yml/snapshot.sql
[Errno 20] Not a directory: '~/dbt_project/target/run/dbt_project/models/schema.yml/schema.yml'
07:15:26  1 of 1 ERROR snapshotting main.snapshot ........................................ [ERROR in 0.25s]
07:15:26  
07:15:26  Finished running 1 snapshot in 0 hours 0 minutes and 0.43 seconds (0.43s).
07:15:26  
07:15:26  Completed with 1 error, 0 partial successes, and 0 warnings:
07:15:26  
07:15:26    [Errno 20] Not a directory: '~/dbt_project/target/run/dbt_project/models/schema.yml/schema.yml'
07:15:26  
07:15:26  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Environment

- OS: Ubuntu 22.04.5
- Python: 3.12.7
- dbt: 1.9.2
- dbt-duckdb: 1.9.2

Which database adapter are you using with dbt?

Reproduced with duckdb and snowflake

Additional Context

If this is not already known, I'll look at fixing it myself and opening a PR.

@joshuanits joshuanits added bug Something isn't working triage labels Feb 19, 2025
@joshuanits
Author

For these YAML-defined snapshots, ParsedNode.get_target_write_path() runs with self.original_file_path = "models/schema.yml" and self.path = "schema.yml/snapshot.sql", which are joined to produce the path models/schema.yml/schema.yml/snapshot.sql.

def get_target_write_path(
    self, target_path: str, subdirectory: str, split_suffix: Optional[str] = None
):
    # This is called for both the "compiled" subdirectory of "target" and the "run" subdirectory
    if os.path.basename(self.path) == os.path.basename(self.original_file_path):
        # One-to-one relationship of nodes to files.
        path = self.original_file_path
    else:
        # Many-to-one relationship of nodes to files.
        path = os.path.join(self.original_file_path, self.path)
    if split_suffix:
        pathlib_path = Path(path)
        path = str(
            pathlib_path.parent
            / pathlib_path.stem
            / (pathlib_path.stem + f"_{split_suffix}" + pathlib_path.suffix)
        )
    target_write_path = os.path.join(target_path, subdirectory, self.package_name, path)
    return target_write_path
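To see the two branches side by side, here is a small standalone sketch. `FakeNode` is a hypothetical stand-in that carries only the attributes the method reads, and the `split_suffix` handling is left out. The snapshot values are the ones quoted above; the model's `path` of "model.sql" is my assumption.

```python
import os
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for ParsedNode, for illustration only."""

    package_name: str
    original_file_path: str
    path: str

    def get_target_write_path(self, target_path: str, subdirectory: str) -> str:
        # Same branching as the method quoted above, minus split_suffix.
        if os.path.basename(self.path) == os.path.basename(self.original_file_path):
            # One-to-one relationship of nodes to files.
            path = self.original_file_path
        else:
            # Many-to-one relationship of nodes to files.
            path = os.path.join(self.original_file_path, self.path)
        return os.path.join(target_path, subdirectory, self.package_name, path)


# Values reported in this comment for the YAML-defined snapshot.
snapshot = FakeNode("dbt_project", "models/schema.yml", "schema.yml/snapshot.sql")
# Assumed values for the plain SQL model.
model = FakeNode("dbt_project", "models/model.sql", "model.sql")

snapshot_path = snapshot.get_target_write_path("target", "run")
model_path = model.get_target_write_path("target", "run")
print(snapshot_path)
print(model_path)
```

With POSIX separators, the snapshot path comes out as the one in the error message, with schema.yml appearing twice as a directory name.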

Nesting the output under directories named schema.yml seems pretty inelegant. A path such as models/schema_yml/snapshot.sql would avoid directories that look like, and might collide with, files, and would remove the extra level of nesting.

Adding this elif branch produces that path, although it probably needs to be more robust.

if os.path.basename(self.path) == os.path.basename(self.original_file_path):
    # One-to-one relationship of nodes to files.
    path = self.original_file_path
elif os.path.dirname(self.path) == os.path.basename(self.original_file_path):
    parent_dirname = os.path.dirname(self.original_file_path)
    dirname = os.path.dirname(self.path).replace(".", "_")
    basename = os.path.basename(self.path)
    path = os.path.join(parent_dirname, dirname, basename)
else:
    #  Many-to-one relationship of nodes to files.
    path = os.path.join(self.original_file_path, self.path)

The resultant structure is much easier to understand:

.
├── dbt_project.yml
├── models
│   ├── model.sql
│   └── schema.yml
└── target
    └── run
        └── dbt_project
            └── models
                ├── model.sql
                └── schema_yml
                    └── snapshot.sql
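The proposed branching can be tried in isolation. The function below is a sketch that lifts the three branches into a free function (`relative_write_path` is a hypothetical name, not something in dbt-core). The third input is a made-up nested case, not a value confirmed from dbt, and is only there to show where the elif would not match.

```python
import os


def relative_write_path(original_file_path: str, path: str) -> str:
    """Hypothetical free-function version of the proposed branching."""
    if os.path.basename(path) == os.path.basename(original_file_path):
        # One-to-one relationship of nodes to files.
        return original_file_path
    if os.path.dirname(path) == os.path.basename(original_file_path):
        # YAML-defined node: swap "." for "_" so the directory name
        # cannot collide with the YAML file's own name.
        parent_dirname = os.path.dirname(original_file_path)
        dirname = os.path.dirname(path).replace(".", "_")
        return os.path.join(parent_dirname, dirname, os.path.basename(path))
    # Many-to-one relationship of nodes to files.
    return os.path.join(original_file_path, path)


one_to_one = relative_write_path("models/model.sql", "model.sql")
yaml_defined = relative_write_path("models/schema.yml", "schema.yml/snapshot.sql")
# Made-up nested input: dirname(path) is "staging/schema.yml", so the elif
# does not match and the old join is used.
nested = relative_write_path(
    "models/staging/schema.yml", "staging/schema.yml/snapshot.sql"
)
print(one_to_one)
print(yaml_defined)
print(nested)
```

If dbt ever produces a `path` like the nested one, the comparison in the elif would need to be loosened, which is part of what "more robust" means here.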

@joshuanits joshuanits linked a pull request Feb 19, 2025 that will close this issue
5 tasks