[Bug] `adapter.get_columns_in_relation` does not work cross database #639

jeremyyeo · 2024-03-21T02:53:41Z

Is this a new bug in dbt-redshift?

I believe this is a new bug in dbt-redshift
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Not too sure if this calls for it's own issue but pretty much an extension of #652 - adapter.get_columns_in_relation does not support introspecting a table that is in a different database.

Expected Behavior

Just like #652 - we need to make adapter.get_columns_in_relation work across databases.

Steps To Reproduce

Using an RA3 redshift cluster with 2 database (dev and sources_only) add 2 tables.

-- using a connection to database sources_only
create table sources_only.public.source_foo as select 1 id;

-- using a connection to database dev
create table dev.public.source_bar as select 1 id;

Connect your dbt project to database dev:

# ~/.dbt/profiles.yml
redshift:
  target: default
  outputs:
    default: 
        type: redshift
        host: ...
        port: 5439
        database: dev
        user: root
        password: ...
        schema: public
        ra3_node: true

Add 2 sources:

# models/sources.yml
version: 2
sources:
  - name: from_default_db
    schema: public
    tables:
      - name: source_bar
  - name: from_the_other_db
    database: sources_only
    schema: public
    tables:
      - name: source_foo

Add a model to test:

-- models/checker.sql
select * from {{ source('from_the_other_db', 'source_foo') }}
union all
select * from {{ source('from_default_db', 'source_bar') }}

$ dbt compile -s checker
02:42:54  Running with dbt=1.7.9
02:42:55  Registered adapter: redshift=1.7.4
02:42:57  Found 1 model, 2 sources, 0 exposures, 0 metrics, 606 macros, 0 groups, 0 semantic models
02:42:57  
02:43:03  Concurrency: 1 threads (target='rs')
02:43:03  
02:43:03  Compiled node 'checker' is:
select * from "sources_only"."public"."source_foo"
union all
select * from "dev"."public"."source_bar"

$ dbt show -s checker
02:44:36  Running with dbt=1.7.9
02:44:38  Registered adapter: redshift=1.7.4
02:44:40  Found 1 model, 2 sources, 0 exposures, 0 metrics, 606 macros, 0 groups, 0 semantic models
02:44:40  
02:44:46  Concurrency: 1 threads (target='rs')
02:44:46  
02:44:47  Previewing node 'checker':
| id |
| -- |
|  1 |
|  1 |

Test out get_columns_in_relation:

-- models/checker.sql
{% set c1 = adapter.get_columns_in_relation(source('from_the_other_db', 'source_foo')) %}
{% set c2 = adapter.get_columns_in_relation(source('from_default_db', 'source_bar')) %}
---------
{{ c1 }}
----------
{{ c2 }}

$ dbt compile -s checker
02:48:36  Running with dbt=1.7.9
02:48:37  Registered adapter: redshift=1.7.4
02:48:39  Found 1 model, 2 sources, 0 exposures, 0 metrics, 606 macros, 0 groups, 0 semantic models
02:48:39  
02:48:44  Concurrency: 1 threads (target='rs')
02:48:44  
02:48:48  Compiled node 'checker' is:


---------
[]
----------
[<Column id (integer)>]

^ We didn't manage to retrieve column id for the source that is in the other database (sources_only).

The reason for that is straightforward - if we look at the get_columns_in_relation implementation and try and run that query straight up in Redshift:

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.11
- dbt-core: 1.7.9
- dbt-redshift: 1.7.4

Additional Context

This results in dbt-labs/dbt-codegen#167 - which further materializes itself in dbt Cloud IDE as users try and use the "generate model" function and find that it doesn't work as expected:

The text was updated successfully, but these errors were encountered:

jeremyyeo · 2024-03-21T04:24:36Z

Looks like svv_redshift_columns (https://docs.aws.amazon.com/redshift/latest/dg/cross-database-overview.html) might be what we want to query here

VolkerSchiewe · 2024-03-25T14:58:59Z

We are running into the same issue. We are trying to create the dbt output in an other database than the source data is located.
adapter.get_columns_in_relation is not returning any columns for tables located in a different database.

I tracked it down to this line where information_schema."columns" is used. This table only returns information about the current database and therefore returns an empty list of columns for tables located in another database.

Maybe something like this could replace information_schema."coloumns" :

SELECT ordinal_position,
          table_name,
          column_name,
          data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale
FROM SVV_ALL_COLUMNS  WHERE database_name = '<database>' AND schema_name = '<schema>' AND table_name = '<table>'

dataders · 2024-03-25T15:56:50Z

We are running into the same issue

@VolkerSchiewe are you also using RA3 nodes?

VolkerSchiewe · 2024-03-25T16:17:04Z

We are running into the same issue

@VolkerSchiewe are you also using RA3 nodes?

yes

dataders · 2024-03-25T17:21:40Z

@VolkerSchiewe @jeremyyeo I just opened dbt-labs/dbt-redshift#738, do you want to it's version of redshift__get_columns_in_relation()? I'm also going to ask the Redshift team for advice

github-actions · 2024-06-25T01:52:03Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

VolkerSchiewe · 2024-06-25T09:39:15Z

@dataders any update on that topic? Have you talked to the redshift team?

dataders · 2024-06-26T17:14:05Z

hey @VolkerSchiewe yeah we've been in discussion with them. Have you looked over #637? The consensus between their team and ours right now is that rather than solve one-off issues like this one, we instead holistically address cross-database functionality.

Are you on a team currently that would benefit from this feature? If so I can log this internally so we can track interest over time.

bteh · 2024-07-18T00:59:35Z

Hey @dataders , my team and I are also facing this same exact issue. We were planning to data shares to get our production tables into a lower environment, but it comes in as a separate database.
Whenever I look at dbt logs, I see that it queries information_schema."columns" which doesn't return those data-shared (separate database) tables.
However, when I query svv_redshift_columns, we see those production tables (datashared) in this table.

This would help us tremendously and is a huge blocker for us to get our lower environment setup.

dlassanske-daxko · 2024-07-18T13:10:15Z

@bteh the solution for Redshift that I came up with in the meantime is to create a macro like this:

{% macro get_redshift_columns(database, schema, table) %}
  {% set columns = [] %}
  {% if execute %}
    {% set columns = run_query("SHOW COLUMNS FROM TABLE "~database~"."~schema~"."~table) %}
  {% endif %}
  {% do return(columns) %}
{% endmacro %}

and then call it in your model.

gmatheou366 · 2024-07-30T15:26:26Z

It seems that this issue also affects codegen's generate_source command (https://github.com/dbt-labs/dbt-codegen/blob/0.12.1/macros/generate_source.sql)

…639) * update `dev-requirements.txt` to point to the feature branch on `dbt-core` * organized materializations directory * organized materializations directory * added materialized view stubs * added materialized view test case * accounted for ordered lists in test case * changie * updated record type to List[tuple] * updated file structure to new macros structure * reverting to old structure for non-MV relations * reverting to old structure for non-MV relations * reverting to old structure for non-MV relations * updating to move towards adap-400 * remove db api files as shift was considered out of scope * remove strategy files as shift was considered out of scope * point back to main branch as adap-2 has been merged * start to build out base tests and modify input format for some macros * add very rough draft of bigquery version of materialization * continued work on bigquery mat view materialization * eod update * break out MV-relevant macros in adapters.sql into separate files in macros/relations * changie * update `dev-requirements.txt` to point to the feature branch on `dbt-core` * organized materializations directory * organized materializations directory * added materialized view stubs * added materialized view test case * accounted for ordered lists in test case * updated record type to List[tuple] * changie * updated file structure to new macros structure * reverting to old structure for non-MV relations * reverting to old structure for non-MV relations * reverting to old structure for non-MV relations * updating to move towards adap-400 * remove db api files as shift was considered out of scope * remove strategy files as shift was considered out of scope * point back to main branch as adap-2 has been merged * start to build out base tests and modify input format for some macros * add very rough draft of bigquery version of materialization * continued work on bigquery mat view materialization * eod update * fix whitespacing from rebase * point back to dbt-core/main since MVs are merged in dbt-core * fix rebasing misses * revert unrelated test file migration * move materialized view macros into the new macro file structure * refactor after pair, still failing for dropping uncreated backup * pull create or replace view into BQ, added drop statements for replace functionality, fixed signatures on some macros, added new type to adapter class * revert dev-requirements.txt change * implement rename for table and view * updated renameable and replaceable relations * updated renameable and replaceable relations to use default factory * updated renameable and replaceable relations to use frozenset * revert dev-requirements.txt to point back to main --------- Co-authored-by: Matthew McKnight <[email protected]> Co-authored-by: Matthew McKnight <[email protected]>

…rc1 (#639) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: nicor88 <[email protected]>

jeremyyeo added type:bug Something isn't working as documented triage:product In Product's queue labels Mar 21, 2024

jeremyyeo mentioned this issue Mar 21, 2024

generate_base_model() doesn't work across databases on Redshift dbt-labs/dbt-codegen#167

Closed

5 tasks

dataders self-assigned this Mar 22, 2024

dataders mentioned this issue Mar 25, 2024

enable crossdb column metadata with ra3 nodes dbt-labs/dbt-redshift#738

Closed

4 tasks

dataders added triage:awaiting-response Awaiting a response from the reporter feature:ra3-node Issues related to Redshift's ra3 node feature and removed triage:product In Product's queue labels Mar 25, 2024

dataders mentioned this issue Jan 23, 2025

[Feature] Datasharing: cross-database support #637

Open

dataders mentioned this issue Apr 19, 2024

[Feature] use redshift_connector for metadata wherever possible dbt-labs/dbt-redshift#776

Closed

3 tasks

github-actions bot added the Stale Mark an issue or PR as stale, to be closed label Jun 25, 2024

github-actions bot added triage:product In Product's queue and removed triage:awaiting-response Awaiting a response from the reporter Stale Mark an issue or PR as stale, to be closed labels Jun 25, 2024

amychen1776 removed the triage:product In Product's queue label Jul 25, 2024

mikealfare self-assigned this Sep 6, 2024

mikealfare linked a pull request Sep 9, 2024 that will close this issue

Support metadata for cross database refs dbt-labs/dbt-redshift#905

Draft

4 tasks

mikealfare removed their assignment Dec 11, 2024

mikealfare added the pkg:dbt-redshift Issue affects dbt-redshift label Jan 15, 2025

mikealfare transferred this issue from dbt-labs/dbt-redshift Jan 23, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] `adapter.get_columns_in_relation` does not work cross database #639

[Bug] `adapter.get_columns_in_relation` does not work cross database #639

jeremyyeo commented Mar 21, 2024 •

edited

Loading

jeremyyeo commented Mar 21, 2024

VolkerSchiewe commented Mar 25, 2024

dataders commented Mar 25, 2024

VolkerSchiewe commented Mar 25, 2024

dataders commented Mar 25, 2024

github-actions bot commented Jun 25, 2024

VolkerSchiewe commented Jun 25, 2024

dataders commented Jun 26, 2024 •

edited

Loading

bteh commented Jul 18, 2024 •

edited

Loading

dlassanske-daxko commented Jul 18, 2024 •

edited

Loading

gmatheou366 commented Jul 30, 2024

[Bug] adapter.get_columns_in_relation does not work cross database #639

[Bug] adapter.get_columns_in_relation does not work cross database #639

Comments

jeremyyeo commented Mar 21, 2024 • edited Loading

Is this a new bug in dbt-redshift?

Current Behavior

Expected Behavior

Steps To Reproduce

Relevant log output

Environment

Additional Context

jeremyyeo commented Mar 21, 2024

VolkerSchiewe commented Mar 25, 2024

dataders commented Mar 25, 2024

VolkerSchiewe commented Mar 25, 2024

dataders commented Mar 25, 2024

github-actions bot commented Jun 25, 2024

VolkerSchiewe commented Jun 25, 2024

dataders commented Jun 26, 2024 • edited Loading

bteh commented Jul 18, 2024 • edited Loading

dlassanske-daxko commented Jul 18, 2024 • edited Loading

gmatheou366 commented Jul 30, 2024

[Bug] `adapter.get_columns_in_relation` does not work cross database #639

[Bug] `adapter.get_columns_in_relation` does not work cross database #639

jeremyyeo commented Mar 21, 2024 •

edited

Loading

dataders commented Jun 26, 2024 •

edited

Loading

bteh commented Jul 18, 2024 •

edited

Loading

dlassanske-daxko commented Jul 18, 2024 •

edited

Loading