`summarize_wqdata()` function throws with some datasets #163

aylapear · 2022-01-25T02:57:58Z

Two examples use the same EMS_ID/Station but different parameters/variables. In one case summarize_wqdata() works and returns a summary table; in the other case it throws an error.

Example where it works properly

data_works <- tibble::tibble(
            EMS_ID = c("0200016", "0200016", "0200016"),
           Station = c("ELK RIVER ABOVE HIGHWAY 93",
                       "ELK RIVER ABOVE HIGHWAY 93","ELK RIVER ABOVE HIGHWAY 93"),
          Variable = c("Nitrogen Total", "Nitrogen Total", "Nitrogen Total"),
              Code = c("0114", "0114", "0114"),
             Value = c(0.844, 0.949, 0.754),
             Units = c("mg/L", "mg/L", "mg/L"),
    DetectionLimit = c(0.03, 0.03, 0.03),
      ResultLetter = c(NA, NA, NA),
              Date = c("2021-11-07", "2021-11-21", "2021-12-05"),
           Outlier = c(FALSE, FALSE, FALSE),
      Site_Renamed = c("ELK RIVER ABOVE HIGHWAY 93",
                       "ELK RIVER ABOVE HIGHWAY 93","ELK RIVER ABOVE HIGHWAY 93"),
       UPPER_DEPTH = as.factor(c(NA, NA, NA)),
          Detected = as.factor(c("TRUE", "TRUE", "TRUE")),
         Timeframe = as.factor(c("2021", "2021", "2021"))
)


wqbc::summarise_wqdata(
  data_works,
  by = c("EMS_ID"),
  censored = TRUE,
  na.rm = TRUE
)

Output

# A tibble: 1 × 14
  Variable       EMS_ID      n  ncen   min   max  mean median lowerQ upperQ     sd     se lowerCL upperCL
  <chr>          <chr>   <int> <int> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>
1 Nitrogen Total 0200016     3     0 0.754 0.949 0.849  0.845  0.793  0.901 0.0799 0.0461   0.763   0.944

Example where it fails

data_fails <- tibble::tibble(
            EMS_ID = c("0200016", "0200016", "0200016", "0200016", "0200016"),
           Station = c("ELK RIVER ABOVE HIGHWAY 93",
                       "ELK RIVER ABOVE HIGHWAY 93","ELK RIVER ABOVE HIGHWAY 93",
                       "ELK RIVER ABOVE HIGHWAY 93",
                       "ELK RIVER ABOVE HIGHWAY 93"),
          Variable = c("Aluminum Total",
                       "Aluminum Total","Aluminum Total","Aluminum Total","Aluminum Total"),
              Code = c("AL-T", "AL-T", "AL-T", "AL-T", "AL-T"),
             Value = c(0.031, 0.0192, 0.397, 0.0183, 0.1),
             Units = c("mg/L", "mg/L", "mg/L", "mg/L", "mg/L"),
    DetectionLimit = c(0.5, 0.5, 0.5, 0.5, 0.5),
      ResultLetter = c(NA, NA, NA, NA, NA),
              Date = c("2020-01-05","2020-01-27",
                       "2020-02-02","2020-02-17","2020-03-01"),
           Outlier = c(FALSE, FALSE, FALSE, FALSE, FALSE),
      Site_Renamed = c("ELK RIVER ABOVE HIGHWAY 93",
                       "ELK RIVER ABOVE HIGHWAY 93","ELK RIVER ABOVE HIGHWAY 93",
                       "ELK RIVER ABOVE HIGHWAY 93",
                       "ELK RIVER ABOVE HIGHWAY 93"),
       UPPER_DEPTH = as.factor(c(NA, NA, NA, NA, NA)),
          Detected = as.factor(c("FALSE", "FALSE", "FALSE", "FALSE", "FALSE")),
         Timeframe = as.factor(c("2020", "2020", "2020", "2020", "2020"))
)

wqbc::summarise_wqdata(
  data_fails,
  by = c("EMS_ID"),
  censored = TRUE,
  na.rm = TRUE
)

Output

Error in names(ret) <- c("mean", "se", LCL(x), UCL(x)) : 
  'names' attribute [4] must be the same length as the vector [0]
In addition: Warning message:
In survreg.fit(X, Y, weights, offset, init = init, controlvals = control,  :
  Ran out of iterations and did not converge
Backtrace:
     ▆
  1. └─wqbc::summarise_wqdata(...)
  2.   └─plyr::ddply(...)
  3.     └─plyr::ldply(...)
  4.       └─plyr::llply(...)
  5.         ├─plyr:::loop_apply(n, do.ply)
  6.         └─plyr `<fn>`(1L)
  7.           └─wqbc .fun(piece, ...)
  8.             ├─base::mean(ml)
  9.             └─NADA::mean(ml)
 10.               └─NADA .local(x, ...)


aylapear · 2022-01-25T02:58:23Z

@joethorley

joethorley · 2022-01-25T04:58:11Z

Thanks @aylapear - I'll look into it.

HeatherGranger · 2022-11-28T22:24:25Z

@joethorley do you remember if this is still ongoing? If so, it may be worth examining what the dependencies are and what's worth updating in its current format.

aylapear · 2022-12-18T23:48:35Z

The error occurs when all the data points are censored, specifically when Censored = TRUE for every row.

# real data output
   EMS_ID                    Station       Variable Code Value Units DetectionLimit ResultLetter       Date
1 0200016 ELK RIVER ABOVE HIGHWAY 93 Aluminum Total AL-T   0.5  mg/L            0.5           NA 2020-01-05
2 0200016 ELK RIVER ABOVE HIGHWAY 93 Aluminum Total AL-T   0.5  mg/L            0.5           NA 2020-01-27
3 0200016 ELK RIVER ABOVE HIGHWAY 93 Aluminum Total AL-T   0.5  mg/L            0.5           NA 2020-02-02
4 0200016 ELK RIVER ABOVE HIGHWAY 93 Aluminum Total AL-T   0.5  mg/L            0.5           NA 2020-02-17
5 0200016 ELK RIVER ABOVE HIGHWAY 93 Aluminum Total AL-T   0.5  mg/L            0.5           NA 2020-03-01
  Outlier               Site_Renamed UPPER_DEPTH Detected Timeframe Censored
1   FALSE ELK RIVER ABOVE HIGHWAY 93        <NA>    FALSE      2020     TRUE
2   FALSE ELK RIVER ABOVE HIGHWAY 93        <NA>    FALSE      2020     TRUE
3   FALSE ELK RIVER ABOVE HIGHWAY 93        <NA>    FALSE      2020     TRUE
4   FALSE ELK RIVER ABOVE HIGHWAY 93        <NA>    FALSE      2020     TRUE
5   FALSE ELK RIVER ABOVE HIGHWAY 93        <NA>    FALSE      2020     TRUE

Example to show which code is throwing error

library(tibble)
library(NADA)

# this throws an error: every value is censored
df <- tibble(
  Value = c(0.844, 0.949, 0.754),
  Censored = c(TRUE, TRUE, TRUE)
)

ml <- with(
  df,
  cenmle(
    Value,
    Censored,
    dist = "lognormal",
    conf.int = 0.95
  )
)
ml

est <- mean(ml)

# As long as at least one value is uncensored (Censored = FALSE) it will run
df <- tibble(
  Value = c(0.844, 0.949, 0.754),
  Censored = c(FALSE, TRUE, TRUE)
)

ml <- with(
  df,
  cenmle(
    Value,
    Censored,
    dist = "lognormal",
    conf.int = 0.95
  )
)
ml

est <- mean(ml)

aylapear · 2022-12-18T23:49:48Z

This error originates in the internal function summarise_wqdata_by() in the summaries-wqdata.R file.

aylapear · 2022-12-19T00:26:53Z

It also fails on a single value

> df <- tibble(
+   Value = c(0.011),
+   Censored = c(TRUE)
+ )
> ml <- with(
+   df,
+   cenmle(
+     Value,
+     Censored,
+     dist = "lognormal",
+     conf.int = 0.95
+   )
+ )
> ml
Error in exp(x@survreg$coef[1]) : 
  non-numeric argument to mathematical function

# also fails: a single uncensored value
df <- tibble(
  Value = c(0.011),
  Censored = c(FALSE)
)

# passes with two values, even if one is censored
df <- tibble(
  Value = c(0.011, 0.0001),
  Censored = c(FALSE, TRUE)
)

aylapear · 2022-12-19T17:56:23Z

In these edge cases, when either all values are censored or only a single value is given, the function should return NAs instead of throwing an error.

aylapear · 2022-12-19T18:11:47Z

The summary table would then display NAs for a site that has no usable data instead of throwing an error:

# A tibble: 2 × 14
  Variable       EMS_ID      n  ncen   min   max  mean median lowerQ upperQ     sd     se lowerCL upperCL
  <chr>          <chr>   <int> <int> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>
1 Nitrogen Total 0200016     3     0 0.754 0.949 0.849  0.845  0.793  0.901 0.0799 0.0461   0.763   0.944
2 Nitrogen Total 0478416    NA    NA    NA    NA    NA     NA     NA     NA     NA     NA      NA      NA
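
One way the NA fallback could be sketched, in base R only. This is not the wqbc implementation: `na_estimates()`, `can_fit_censored()` and `safe_estimates()` are hypothetical helpers, and `fit` stands for whatever function produces the four estimates (for example a wrapper around `cenmle()` and `mean()`). The `demo_fit` below is only a placeholder that ignores censoring so the example runs without NADA.

```r
# Hypothetical helpers, not part of wqbc. Names mirror the summary columns
# shown in this issue.
na_estimates <- function() {
  c(mean = NA_real_, se = NA_real_, lowerCL = NA_real_, upperCL = NA_real_)
}

# TRUE when a censored fit is worth attempting: at least two non-missing
# values, and at least one of them uncensored.
can_fit_censored <- function(value, censored) {
  ok <- !is.na(value) & !is.na(censored)
  sum(ok) >= 2 && any(!censored[ok])
}

# `fit` is any function(value, censored) returning four numbers
# (mean, se, lower CL, upper CL). Any error, or a result of the wrong
# length, falls back to NAs instead of propagating.
safe_estimates <- function(value, censored, fit) {
  if (!can_fit_censored(value, censored)) {
    return(na_estimates())
  }
  tryCatch(
    {
      est <- fit(value, censored)
      stats::setNames(as.numeric(est), names(na_estimates()))
    },
    error = function(e) na_estimates()
  )
}

# Placeholder fit for demonstration only: it ignores censoring entirely.
demo_fit <- function(value, censored) {
  m <- mean(value)
  s <- stats::sd(value) / sqrt(length(value))
  c(m, s, m - 1.96 * s, m + 1.96 * s)
}

all_censored <- safe_estimates(c(0.844, 0.949, 0.754), c(TRUE, TRUE, TRUE), demo_fit)
single_value <- safe_estimates(0.011, FALSE, demo_fit)
one_detected <- safe_estimates(c(0.844, 0.949, 0.754), c(FALSE, TRUE, TRUE), demo_fit)
```

Here `all_censored` and `single_value` come back as four NAs, while `one_detected` goes through to `fit`. The up-front check covers the two edge cases reported in this thread; the `tryCatch()` is a second line of defence for fits that fail for other reasons.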

joethorley · 2022-12-19T18:39:23Z

I agree - and when there is no data it should return a table with the same columns and classes and no rows.
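
A minimal sketch of what such a zero-row result could look like, assuming the 14 columns and classes of the summary table printed earlier in this issue (with `by = "EMS_ID"`). `empty_summary()` is a hypothetical name, and base `data.frame()` is used here only to keep the example self-contained.

```r
# Hypothetical prototype: same columns and classes as the summary table
# in this issue, but zero rows.
empty_summary <- function() {
  data.frame(
    Variable = character(0),
    EMS_ID   = character(0),
    n        = integer(0),
    ncen     = integer(0),
    min      = numeric(0),
    max      = numeric(0),
    mean     = numeric(0),
    median   = numeric(0),
    lowerQ   = numeric(0),
    upperQ   = numeric(0),
    sd       = numeric(0),
    se       = numeric(0),
    lowerCL  = numeric(0),
    upperCL  = numeric(0),
    stringsAsFactors = FALSE
  )
}

empty <- empty_summary()
str(empty)
```

Because the column classes are fixed, row-binding this onto a non-empty summary leaves the other table's types unchanged.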

aylapear mentioned this issue Jan 25, 2022

Summary table bug on Plot/Statistics tab bcgov/shinyrems#139

Open

aylapear mentioned this issue Dec 31, 2022

Summarize bug #169

Merged
