Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bifactor analysis #73

Draft
wants to merge 17 commits into
base: master
Choose a base branch
from
Draft

Bifactor analysis #73

wants to merge 17 commits into from

Conversation

Gootjes
Copy link
Contributor

@Gootjes Gootjes commented Jun 13, 2023

This PR is a Proof of Concept. I would like to receive feedback on how to better integrate this with the tidySEM framework.

EDIT: finished

@cjvanlissa
Copy link
Owner

What a nice idea! I would make one top-down suggestion at this point, before getting into the details of the code, and that is to try to follow the tidySEM design philosophy as closely as possible. So the bifactor() function should work similarly to the measurement() function, and there really shouldn't be a run_bifactor() function, especially not one that only works with lavaan.

Smaller notes:

  1. Add an argument that sets the default factor name "G"
  2. Currently does not support cross loadings. Why? Should be possible with add_paths() no?
  3. "G is not an allowed scale name": See point 1. Instead, give an error that there is ALREADY a variable named G. Or change the name to "G.1" and continue as usual.
  4. Make the compute_omega function generic, so it works with arbitrary sem models and not just with lavaan. The way to do this is to have a function compute_omega.default() that accepts only exported coefficients, and to have functions like compute_omega.lavaan() grab those coefficients from lavaan output and then call the default. Also: rename to omega()?

@Gootjes
Copy link
Contributor Author

Gootjes commented Jun 19, 2023

I have removed run_bifactor()
And created a omega.lavaan and omega.default. The latter requires two arguments that are completely different classes (parameter table of class data.frame and covariance matrix of class matrix) than those of omega.lavaan (lavaan fit object). Is this the way to go? Or should the .default implementation call the data.frame implementation?

  1. Added the default factor name argument. A custom factor name needs to be passed to both bifactor and omega().
  2. I implemented cross loadings and made sure they align with the implementation of psych.

The syntax to add a cross-loading is a mistake-prone for an end-user because the defaults of add_path are not what is needed, so it needs custom defaults.
The current situation looks like the code below. As a solution, should I give the result of bifactor an custom class tidy_sem_bifactor such that I can implement a custom add_paths ?

bf %>% add_paths("a =~ c_2", "", std.lv = T,
                 auto.fix.single = T,
                 auto.fix.first = F,
                 orthogonal = T,
                 int.ov.free = F,
                 int.lv.free = F,
                 meanstructure = F) %>% run_lavaan() -> bf_fit2
  1. Fixed because of 1, but the error message wording can be debated.
  2. haven't worked on implementations outside of lavaan yet.

@cjvanlissa
Copy link
Owner

The syntax to add a cross-loading is a mistake-prone for an end-user because the defaults of add_path are not what is needed, so it needs custom defaults.
The current situation looks like the code below. As a solution, should I give the result of bifactor an custom class tidy_sem_bifactor such that I can implement a custom add_paths ?

bf %>% add_paths("a =~ c_2", "", std.lv = T,
auto.fix.single = T,
auto.fix.first = F,
orthogonal = T,
int.ov.free = F,
int.lv.free = F,
meanstructure = F) %>% run_lavaan() -> bf_fit2

Not sure what the problem is... are you trying to prevent model non-identification by fixing variances of indicators that load on too many factors?

@Gootjes
Copy link
Contributor Author

Gootjes commented Jun 22, 2023

# An issue occurs because I am not specifying a mean structure.
# The tidysem as_ram does not like that.
# So we specify a meanstructure = TRUE, but set int.ov.free and int.lv.free to FALSE
m <- measurement(xG,
std.lv = T,
auto.fix.single = T,
auto.fix.first = F,
orthogonal = T,
int.ov.free = F,
int.lv.free = F,
meanstructure = T,
...
)

The issue is that without any parameters to add_paths, it makes the cross-loading fixed at value 1, and estimates an intercept. I don't want either: latent variables have fixed variance 1, and no intercepts should be estimated (fixed to 0).
But what happens is that the implementation of add_paths adds them again to the model for the variable mentioned in the arguments.

tidy_sem(c("a_1", "a_2", "a_3", "b_1", "b_2", "b_3", "c_1", "c_2", "c_3", "c_4")) %>% 
   bifactor() %>%
  # This adds a cross-loading
   add_paths("a =~ c_4") %>% 
   as_lavaan() %>% 
   filter(lhs == "c_4" | rhs == "c_4")
#   lhs op rhs block group free label ustart plabel
# 1   G =~ c_4     1     1    1           NA  .p10.
# 2   c =~ c_4     1     1    1           NA  .p20.
# 3 c_4 ~~ c_4     1     1    1           NA  .p30.
# 4 c_4 ~1         1     1    0            0  .p50.
# 5   a =~ c_4     1     1    0            1  .p55.   

The options to add_paths that solve this are int.ov.free = F, auto.fix.first = F. But I find it kind of a hassle that respecifying this is necessary. Maybe add_paths should reuse the options that were specified earlier in the call to measurement?
An alternative solution is to make bifactor add a class to the tidy_sem object, e.g. tidy_sem_bifactor, and then:

add_paths.tidy_sem_bifactor <- function() {
  cl <- match.call()

  cl["int.ov.free"] <- FALSE
  cl["auto.fix.first"] <- FALSE

  # Copy pasted from add_paths.tidy_sem
  cl["model"] <- list(model$syntax)
  # I have to comment this line though otherwise R complains 
  # about a missing implementation for group_var for the tidy_sem_bifactor class.
  # I don't need it anyways.
  # if(!is.null(group_var(model))) cl[["ngroups"]] <- group_var(model, "ngroups")
  cl[[1L]] <- quote(add_paths)
  #Args <- c(list(model = model$syntax), as.list(match.call()[-c(1:2)]))
  model$syntax <- eval.parent(cl)
  return(model)
}

@Gootjes
Copy link
Contributor Author

Gootjes commented Jun 22, 2023

Maybe I am using measurement not what it was intended for, but I expect this issue to crop up again when someone uses add_paths.

Should I maybe change bifactor to include arguments for cross-loadings? Then calling add_paths of a bifactor tidy_sem is not necessary anymore: everything is set from the get go. But then I need to copy quite some code from what is already there for CFA.

Semi-related point: I can tweak tidy_sem$syntax to include an equality constraint by specifying .p1. == .p2., but this does not translate when using as_ram or as_mplus. Is there already something in place to do equality constraints in cfa models?

@cjvanlissa
Copy link
Owner

As per standard lavaan syntax, to specify equality constraints you could use:

F=~a*x1
F=~a*x2

I need to plan in some time to look into the issue you're experiencing with add_paths, can't speak to that now. The risk of implicit defaults (as in your add_paths.tidy_sem_bifactor solution) is that, as soon as someone has a different use case than you had in mind, they break down (ironically, this might be happening to you; I have to investigate to see). I agree that manually specifying these arguments to add_paths is awkward, but at least it's explicit!

@cjvanlissa
Copy link
Owner

PS The .p1. labels are, to my understanding, intended as lavaan internals.

@Gootjes
Copy link
Contributor Author

Gootjes commented Jun 22, 2023

As per standard lavaan syntax, to specify equality constraints you could use:

F=~a*x1
F=~a*x2

I need to plan in some time to look into the issue you're experiencing with add_paths, can't speak to that now. The risk of implicit defaults (as in your add_paths.tidy_sem_bifactor solution) is that, as soon as someone has a different use case than you had in mind, they break down (ironically, this might be happening to you; I have to investigate to see). I agree that manually specifying these arguments to add_paths is awkward, but at least it's explicit!

Haha yes it is happening to me. Yea so reimplementing some of the logic in measurement to make it apply to the bifactor situation might be the best approach then. Accidental misuse is far less likely, and bifactor models are so well defined that customisation options for the end user are not a priority.

Regarding equality constraints. Thanks for the quick reply, I was trying to do it all manually by adding rows to the syntax data.frame. I understand now that I can do:

add_paths(tidy_sem(x), "F=~a*x_1; F=~a*x_2; F=~a*x_3")

And that runs with mx and lavaan. Nice! That helps. I was trying to do add_paths on a tidy_sem with existing syntax (from measurement), but that doesn't produce valid syntax. So another reason to rewrite bifactor to do it all my way from the get go.

There is no need right now for you to invest time in reading this all. I will redesign the bifactor a little based on code you wrote for measurement and then rely heavily on add_paths to do the actual magic.

Tests already work, I can reproduce results from the psych::omegaFromSem function based on psych datasets. (Turns out that function has a slightly odd way of doing things regarding cross loadings and also mixes unstandardized loadings with correlation matrices to compute variances...).

So after this is done and I factor out tidyverse usage, I will press the request review button :)

@cjvanlissa
Copy link
Owner

That sounds good, thank you for being proactive and understanding! Don't forget to add yourself as "ctb" to the authors@R field in the DESCRIPTION file too.

@Gootjes Gootjes marked this pull request as ready for review June 23, 2023 09:00
@cjvanlissa
Copy link
Owner

I see you're making a commit here as well.. would you like me to incorporate this into 0.2.7 as well? If so: it still needs tests for the new functions, and I see that it's failing (for some reason) the integration tests.

Would you please make sure that these functions align as closely as possible to the interface of functions like add_paths() and measurement()?

@Gootjes
Copy link
Contributor Author

Gootjes commented Jan 17, 2024

The implementation is not entirely complete because I ran into this unsolved issue #74
Let's not merge this yet.

The mentioned issue is preventing me from imposing equality constraints in the case when a latent variable has only two indicators.

@cjvanlissa
Copy link
Owner

I can't get into this until after end March due to time constraints!

@Gootjes
Copy link
Contributor Author

Gootjes commented Jan 17, 2024

No problem!

@Gootjes Gootjes marked this pull request as draft March 12, 2024 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants