diff --git a/README.md b/README.md
index 5cb9947..8802d53 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ Giving reconstruction algorithms a warm start can speed up reconstruction time b
 We employ three different iterative algortihms.
 ### 1) EM-preconditioner, DOwG step size rule, SAGA gradient estimation (in branch: main)
-**Update rule**: SGD-like for first two epochs, then SAGA-like afterwards with full-gradients computed as 2nd, 6th, 10th and 14th epochs.
+**Update rule**: SGD-like for the first epochs, then SAGA-like afterwards with full-gradients computed as 2nd, 6th, 10th and 14th epochs. Here, we do not use random subsets, but rather accessing the subsets according to a Herman Meyer order.
 **Step-size rule**: All iterations use [DoWG](https://arxiv.org/abs/2305.16284) (Distance over Weighted Gradients) for the step size calculation.
@@ -25,7 +25,7 @@ We employ three different iterative algortihms.
 **Step-size rule**: All iterations use [DoWG](https://arxiv.org/abs/2305.16284) (Distance over Weighted Gradients) for the step size calculation.
-**Preconditioner**: EM-preconditioner the same as used in the BSREM example.
+**Preconditioner**: For the precondtioner, the ratio between the norm of RDP gradient vs. the norm of the full objective gradient is used to gauge the dominance of the RDP component of the objective function. If this fraction is larger than 0.5 (i.e. RDP is dominating) a preconditioner utilising an approximation of the Hessian of RDP is used. The preconditioner used is similar to [Tsai et al. (2018)](https://pubmed.ncbi.nlm.nih.gov/29610077/), however, the Hessian row sum of the likelihood term is not updated each iteration. Additionally, we found that the Hessian row sum of the RDP prior was instable, so instead we only used the diagonal approximation of the Hessian evaluated at each iterate. This defines a kind of strange preconditioner, where only the RDP component of the preconditioner is updated per iteration.
 For the case when the fraction was lower than 0.5, the EM-preconditioner was used. This was observed to provided better performance when the likelihood component is more dominant, also this avoid the costly computation of the diagonal of the RDP Hessian.
 ### 3) Adaptive preconditioner, full gradient descent, Barzilai-Borwein step size rule (in branch: full_gd)
@@ -34,6 +34,8 @@ The characteristics of the datasets varied a lot, i.e., we had low count data, d
 For the precondtioner, the ratio between the norm of RDP gradient vs. the norm of the full objective gradient is used to gauge the dominance of the RDP component of the objective function. If this fraction is larger than 0.5 (i.e. RDP is dominating) a preconditioner utilising an approximation of the Hessian of RDP is used. The preconditioner used is similar to [Tsai et al. (2018)](https://pubmed.ncbi.nlm.nih.gov/29610077/), however, the Hessian row sum of the likelihood term is not updated each iteration. Additionally, we found that the Hessian row sum of the RDP prior was instable, so instead we only used the diagonal approximation of the Hessian evaluated at each iterate. This defines a kind of strange preconditioner, where only the RDP component of the preconditioner is updated per iteration.
 For the case when the fraction was lower than 0.5, the EM-preconditioner was used. This was observed to provided better performance when the likelihood component is more dominant, also this avoid the costly computation of the diagonal of the RDP Hessian.
+Note, for TOF data, we do not use the full gradient, but instead a small number of subsets.
+
 **Update rule**: GD for all iterations.
 **Step-size rule**: Barzilai-Borwein long step size rule.
@@ -41,7 +43,7 @@ For the precondtioner, the ratio between the norm of RDP gradient vs. the norm o
 **Preconditioner**: Diagonal RDP Hessian + Row-sum of likelihood hessian, or EM-precondition based on a dominance of RDP component of the objective function.
 ### Number of subset choice
-To compute the number of subsets we use the functions in **utils/number_of_subsets.py**. This is a set of heuristic rules: 1) number of subsets has to be divisible by the number of views, 2) the number of subsets should have many prime factors (this results in a good herman meyer order), 3) I want at least 8 views in each subset and 4) I want at least 5 subsets. The function in **utils/number_of_subsets.py** is probably not really efficient.
+To compute the number of subsets we use the functions in **utils/number_of_subsets.py**. This is a set of heuristic rules: 1) number of subsets has to be divisible by the number of views, 2) the number of subsets should have many prime factors (this results in a good Herman Meyer order), 3) I want at least 8 views in each subset and 4) I want at least 5 subsets. The function in **utils/number_of_subsets.py** is probably not really efficient. For TOF flight data, we do not use rule 3) and 4). If several number of subsets have the same number of prime factors, we take the larger number of subsets.
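
Note on the last hunk: the four heuristic rules for the non-TOF case can be sketched roughly as below. This is not the code in **utils/number_of_subsets.py**; the function names `prime_factor_count` and `choose_num_subsets`, the parameter names, and the fallback to a single subset are all hypothetical. The sketch also assumes that rule 1 means the number of subsets must divide the number of views, and that "many prime factors" counts prime factors with multiplicity. The TOF variant (rules 3 and 4 dropped) is deliberately not modelled, since the README does not say what bounds the subset count in that case.

```python
def prime_factor_count(n: int) -> int:
    """Count the prime factors of n with multiplicity (assumed reading of rule 2)."""
    count, p = 0, 2
    while p * p <= n:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    # Whatever remains above 1 is itself a prime factor.
    return count + (1 if n > 1 else 0)


def choose_num_subsets(num_views: int,
                       min_views_per_subset: int = 8,
                       min_subsets: int = 5) -> int:
    """Hypothetical sketch of the non-TOF heuristic described in the README."""
    candidates = [
        s for s in range(1, num_views + 1)
        if num_views % s == 0                        # rule 1 (assumed: s divides num_views)
        and s >= min_subsets                         # rule 4: at least 5 subsets
        and num_views // s >= min_views_per_subset   # rule 3: at least 8 views per subset
    ]
    if not candidates:
        return 1  # assumption: fall back to a single subset when no candidate qualifies
    # Rule 2: prefer many prime factors; ties go to the larger number of subsets.
    return max(candidates, key=lambda s: (prime_factor_count(s), s))


if __name__ == "__main__":
    print(choose_num_subsets(252))
```

With 252 views, the candidates 12, 18 and 28 each have three prime factors, so under these assumptions the tie-break picks 28 subsets of 9 views each.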