Best way to check for duplicates of one column grouped by another? #394
What's the best way to check, for example, that ID numbers are not duplicated within years? I suspect there is a better way than what I'm trying to do:

```r
library(tidyverse)

df <- tibble(year = rep(2001:2003, each = 3),
             ID = c("A1", "A2", "A3",
                    "A1", "A1", "A3",
                    "A1", "A2", "A3"))
df
# row 5 should fail the test

library(pointblank)

# I think this worked in a previous version, but doesn't anymore?
create_agent(df) %>%
  col_vals_lte(vars(n), 1,
               preconditions = ~ . %>%
                 group_by(ID, year) %>%
                 count()) %>%
  interrogate()
```
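Before writing the pointblank validation, it can help to confirm which rows it ought to flag. The following is a minimal sketch using only dplyr and base R (not pointblank) on the same `df`; `dup_rows` and `dup_groups` are illustrative names, not part of the original question.

```r
library(dplyr)

df <- tibble(
  year = rep(2001:2003, each = 3),
  ID   = c("A1", "A2", "A3",
           "A1", "A1", "A3",
           "A1", "A2", "A3")
)

# duplicated() marks only second and later occurrences of a (year, ID)
# pair, so row 4 (the first 2002/A1) is not flagged but row 5 is.
dup_rows <- which(duplicated(df[c("year", "ID")]))
dup_rows  # 5

# Alternative view: count rows per (year, ID) group and keep groups
# that occur more than once.
dup_groups <- df %>%
  count(year, ID) %>%
  filter(n > 1)
dup_groups
```

Because `year` is part of the key, the 2003/A1 row is not treated as a duplicate of the 2001/A1 or 2002/A1 rows.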
Answered by rich-iannone on Feb 9, 2022
Replies: 2 comments 4 replies
Hey Eric, try using the relatively new segmentation feature. Here's an example:

```r
library(pointblank)
library(tidyverse)

df <-
  tibble(
    year = rep(2001:2003, each = 3),
    ID = c("A1", "A2", "A3",
           "A1", "A1", "A3",
           "A1", "A2", "A3")
  )

agent <-
  create_agent(
    tbl = df,
    actions = action_levels(warn_at = 1)
  ) %>%
  rows_distinct(segments = vars(year)) %>%
  interrogate()

agent
```

Here's a screen capture of the reporting:

[Screenshot of the agent report not preserved]
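To make the segmented check easier to reason about, here is a rough dplyr emulation of what `rows_distinct(segments = vars(year))` is checking: split the table by `year`, then count the rows in each segment that repeat an earlier row of that segment. This is a sketch of the logic only, not pointblank's implementation. The `warn` column assumes that `action_levels(warn_at = 1)` is read as "warn once at least one test unit fails" and that each segment is evaluated as its own step; treat both as assumptions.

```r
library(dplyr)

df <- tibble(
  year = rep(2001:2003, each = 3),
  ID   = c("A1", "A2", "A3",
           "A1", "A1", "A3",
           "A1", "A2", "A3")
)

# One row of output per segment (year). Inside a segment the only
# non-grouping column is ID, so duplicated(ID) is a whole-row check there.
segment_report <- df %>%
  group_by(year) %>%
  summarise(
    n_rows   = n(),
    n_failed = sum(duplicated(ID)),
    passed   = n_failed == 0,
    # Assumed reading of warn_at = 1: an absolute count of failing rows.
    warn     = n_failed >= 1
  )

segment_report
```

Only the 2002 segment should come out with a failing row, which matches the duplicated 2002/A1 pair in the question.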
Eric, I gotcha there too! With `rows_distinct()` you can focus on a subset of columns:

```r
library(pointblank)
library(tidyverse)

df <-
  tibble(
    year = rep(2001:2003, each = 3),
    ID = c("A1", "A2", "A3",
           "A1", "A1", "A3",
           "A1", "A2", "A3")
  )

agent <-
  create_agent(
    tbl = df,
    actions = action_levels(warn_at = 1)
  ) %>%
  rows_distinct(columns = "ID", segments = vars(year)) %>%
  interrogate()

agent
```

This yields this report:

[Screenshot of the agent report not preserved]
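Restricting the check to `ID` matters once the table has other columns. In the question's `df` every row is just `year` plus `ID`, so a whole-row check and an `ID`-only check agree. The sketch below adds a hypothetical `value` column (not in the original data) so that rows 4 and 5 share `year` and `ID` but differ elsewhere. It uses plain dplyr/base R rather than pointblank; including `year` in the key stands in for segmenting by year.

```r
library(dplyr)

# Hypothetical variant of df: `value` is made up for illustration and
# makes every row distinct as a whole.
df2 <- tibble(
  year  = rep(2001:2003, each = 3),
  ID    = c("A1", "A2", "A3",
            "A1", "A1", "A3",
            "A1", "A2", "A3"),
  value = 1:9
)

# Whole-row check within years: every column is part of the key, so the
# differing `value` hides the repeated 2002/A1 pair.
whole_row_dups <- sum(duplicated(df2[c("year", "ID", "value")]))

# Subset check within years: only the segment column and ID, which is
# the situation rows_distinct(columns = "ID", segments = vars(year)) targets.
id_only_dups <- sum(duplicated(df2[c("year", "ID")]))

c(whole_row = whole_row_dups, id_only = id_only_dups)
```

With the extra column, only the `ID`-focused check catches the duplicate, which is why passing `columns` is the safer choice on real tables.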
Answer selected by Aariq