feature - Add identity resolution #30

Open
awoehrl opened this issue Dec 7, 2023 · 1 comment
Comments

@awoehrl
Contributor

awoehrl commented Dec 7, 2023

It would be great to have identity resolution logic in the package. This would consist of multiple steps:

  1. Have a stitching table that combines the user identifiers
    Depending on the use case, this could be a table defined in dbt or, in a more complex case, a Python implementation or a graph database.

Example code

--- user_identity_mapping.sql

select distinct
  anonymous_customer_id,
  last_value(customer_id) over(
    partition by anonymous_customer_id
    order by event_tstamp
    rows between unbounded preceding and unbounded following
  ) as customer_id,
  max(event_tstamp) over (partition by anonymous_customer_id) as end_tstamp

from events

where customer_id is not null
and anonymous_customer_id is not null

  2. Update the activities via a post hook
    With that table we could implement a post hook that updates the customer_id as well as the activity_occurrence and activity_repeated_at fields.

Example code for the customer_id update

      update activity as a
      set customer_id = ui.customer_id
      from user_identity_mapping as ui
      where a.anonymous_customer_id = ui.anonymous_customer_id;
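
The UPDATE above only covers customer_id. As a rough illustration of what the full post hook would have to do, here is a minimal Python sketch that backfills customer_id and then recomputes activity_occurrence and activity_repeated_at. The function name, the `activity` and `ts` keys, and the fallback of grouping unresolved rows by their anonymous id are assumptions made for this example, not part of the package.

```python
from collections import defaultdict
from datetime import datetime


def backfill_and_renumber(activities, identity_map):
    """Stitch customer_id onto rows, then recompute the derived columns.

    activities: list of dicts with keys activity, ts, customer_id,
        anonymous_customer_id (hypothetical row shape for this sketch).
    identity_map: dict of anonymous_customer_id -> customer_id,
        i.e. the contents of the stitching table.
    """
    # Step 1: same effect as the UPDATE; rows without a mapping are left alone.
    for row in activities:
        resolved = identity_map.get(row["anonymous_customer_id"])
        if resolved is not None:
            row["customer_id"] = resolved

    # Step 2: group rows per entity and activity.
    # Assumption: unresolved rows are grouped by their anonymous id instead.
    groups = defaultdict(list)
    for row in activities:
        if row["customer_id"] is not None:
            entity = ("customer", row["customer_id"])
        else:
            entity = ("anonymous", row["anonymous_customer_id"])
        groups[(entity, row["activity"])].append(row)

    # Step 3: renumber occurrences and point each row at the next occurrence.
    for rows in groups.values():
        rows.sort(key=lambda r: r["ts"])
        for i, row in enumerate(rows):
            row["activity_occurrence"] = i + 1
            has_next = i + 1 < len(rows)
            row["activity_repeated_at"] = rows[i + 1]["ts"] if has_next else None
    return activities


activities = [
    {"activity": "page_view", "ts": datetime(2023, 12, 1),
     "customer_id": None, "anonymous_customer_id": "anon-1"},
    {"activity": "page_view", "ts": datetime(2023, 12, 3),
     "customer_id": "cust-9", "anonymous_customer_id": "anon-1"},
]
result = backfill_and_renumber(activities, {"anon-1": "cust-9"})
```

Before stitching, the two rows would each count as a first occurrence for different entities; afterwards they form one sequence for cust-9, which is why the derived columns need to be recomputed rather than only customer_id.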

Alternatives to consider:

  • Is it really necessary to have two id columns in activities?
@bcodell
Owner

bcodell commented Jan 24, 2024

Possible implementations:

  • explicit/manual recursion (see rudderlabs/dbt-id-stitching)
  • recursive query (see dbt-graph-theory)
  • python dataproc/snowpark
  • python via duckdb process
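
For the Python options, one possible shape is to treat every pair of identifiers seen together as an edge and resolve identities as connected components. The sketch below uses a plain union-find; the function name and the choice of the smallest identifier as the canonical one are assumptions for illustration, and it is not taken from any of the projects listed above.

```python
def resolve_identities(edges):
    """Map every identifier to a canonical identifier for its cluster.

    edges: iterable of (id_a, id_b) pairs that were observed together,
        for example on the same event (hypothetical input shape).
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root.
        parent.setdefault(x, x)
        # Path halving keeps later lookups short.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return
        # Keep the smaller identifier as root so the result is deterministic.
        if root_b < root_a:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a

    for a, b in edges:
        union(a, b)
    return {node: find(node) for node in list(parent)}


edges = [("anon-1", "cust-9"), ("anon-2", "cust-9"), ("anon-3", "cust-4")]
mapping = resolve_identities(edges)
```

Unlike the single-hop mapping table, this also links anon-1 and anon-2 to each other because both were seen with cust-9, which is the case the recursive approaches are meant to handle.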
