
How to determine the matrix size of padded Hamiltonian (Padding Hamiltonian Matrix) #223

tsuyoshi38 opened this issue Jul 20, 2023 · 5 comments


tsuyoshi38 commented Jul 20, 2023

Once the code can handle a padded Hamiltonian matrix, we should choose good values for block_size_r (and block_size_c).

First:

  1. Manually given or default setting:
  • A default setting for block_size_r and block_size_c should be provided.
  • block_size_r ~ 20 (suggested by our test calculations, though this depends on the hardware).
  • Is it okay to assume that block_size_r = block_size_c?
  • How should proc_rows and proc_cols be set?
  • - We may be able to follow the present method.
  • - But see the constraints for ELPA shown below.

  2. Constraints in ScaLAPACK
  3. Constraints in ELPA:
    i. block_size_r = block_size_c
    ii. proc_cols = M (integer) x proc_rows
    iii. ?? needs more than 1 block in the corresponding column or row ??
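These constraints can be collected into a single check. The sketch below is illustrative Python only, not CONQUEST code; the function name and arguments are hypothetical.

```python
def check_elpa_constraints(block_size_r, block_size_c,
                           proc_rows, proc_cols, matrix_size_padH):
    # i) ELPA exposes a single block size ("nblk"), so rows == cols.
    if block_size_r != block_size_c:
        return False
    # ii) proc_cols should be an integer multiple M of proc_rows.
    if proc_cols % proc_rows != 0:
        return False
    # iii) each process row/column should own at least one block
    #      of the (square) padded matrix.
    blocks = matrix_size_padH // block_size_r
    return blocks >= proc_rows and blocks >= proc_cols
```

For example, a 1020 x 1020 padded matrix with block size 20 (51 blocks per side) passes on a 2 x 4 process grid, but a 20 x 20 matrix does not, since it has only one block per side.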
@tsuyoshi38 tsuyoshi38 changed the title How to determine the matrix size of padded Hamiltonian How to determine the matrix size of padded Hamiltonian (Padding Hamiltonian Matrix) Jul 20, 2023
@tsuyoshi38 tsuyoshi38 self-assigned this Jul 20, 2023

tsuyoshi38 commented Jul 31, 2023

Following up on the ELPA constraints from my last comment:

For constraint i) (block_size_r = block_size_c): judging from the following example shown on the ELPA page,

  call elpa%set("nblk", nblk, success) ! size of the BLACS block cyclic distribution

I guess ELPA has only one parameter for the block size.

On the other hand, constraints ii) and iii) come only from our benchmark tests: I heard that ELPA was very inefficient when these two were not satisfied. Since I could not find any documentation or other examples supporting them, it is probably better to ignore these constraints for a while.

Then we can simply set a default block size without considering the number of MPI processes or the dimension of the Hamiltonian matrix.

@tsuyoshi38

I think I have finished introducing "padding of the H and S matrices", which makes the matrix dimension a multiple of the block size.
Note that if the block size is small (1-5?), ScaLAPACK is very inefficient.
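The padding arithmetic itself is simple rounding-up; a sketch in illustrative Python (not the actual Fortran implementation, and the function name is hypothetical):

```python
def padded_size(matrix_size, block_size):
    # Round the matrix dimension up to the next multiple of the block
    # size; the extra rows/columns of H and S form the padding.
    return ((matrix_size + block_size - 1) // block_size) * block_size

# e.g. a 1003 x 1003 Hamiltonian with block size 20 pads to 1020 x 1020
```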

I think the code is already useful for many users, but the test calculations I have done so far may not be enough.
Later, I will explain more about the relationship between the number of MPI processes, the dimension of the matrices (H and S), and the block size. We basically want to set a good default value for the block size (Diag.BlockSizeR and Diag.BlockSizeC).
But setting an appropriate block size for a given number of MPI processes and matrix dimension is not as simple as I first thought. In addition, the appropriate block size may depend strongly on the hardware.

Considering this, I wonder whether it is better to introduce the changes in the following two stages.
Stage 1: we collect information from the users.
Default: use the present CQ setting (without padding).
Option: users can set Diag.BlockSizeR to pad H and S.

Stage 2: we provide a default value for the block size.
Default: use a default setting of Diag.BlockSizeR with padding.
If the user sets an inappropriate number of processes, CQ warns -> change BlockSize?
Option: if the user sets Diag.BlockSizeR, use the given value and
only warn about inappropriate settings.
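For Stage 1, the user-facing option might look like the fragment below. This is a hedged sketch only: the key names Diag.BlockSizeR and Diag.BlockSizeC come from the discussion above, but the exact input syntax should be checked against the CONQUEST manual.

```
# Hypothetical Conquest_input fragment: request padding with a 20x20 block
Diag.BlockSizeR 20
Diag.BlockSizeC 20
```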

If anyone has a comment or suggestion, please let me know.


tsuyoshi38 commented Aug 2, 2023

(( no. of processes, block size, dimension of the H and S matrices ))

  1. First, let me remind you that we have two sizes for the dimension of the Hamiltonian and overlap matrices:
  • matrix_size = actual dimension of the Hamiltonian
  • matrix_size_padH = dimension of the padded H or S matrix, which is a multiple of the block size
  2. Usually, proc_rows and proc_cols are determined from the number of MPI processes.
    It is also possible for users to set these parameters via Diag.ProcRows and Diag.ProcCols.
  • no. of processes => proc_rows, proc_cols
  • When (no. of processes) < 9 => proc_rows = 1, proc_cols = (no. of processes) / (no. of parallelisation for k-points)
  3. On the other hand, we want to set a default block_size_r (and _c) in the future, but the values can also be given via Diag.BlockSizeR and Diag.BlockSizeC. As mentioned above, CQ is very slow if block_size_r (and _c) is set to less than 5. Once block_size_r (and _c) is given by CQ or by a user, the number of blocks along each row or column is calculated:
  • block_size_r, block_size_c => blocks_r, blocks_c (no. of blocks along row or column)
  4. Here we have a restriction:
  • blocks_r must be equal to or larger than proc_rows
  • blocks_c must be equal to or larger than proc_cols
  5. Of course, users should not set a large number of processes when the matrix size is not large. Then we may be able to introduce a new rule or restriction:
  • proc_cols is equal to or larger than proc_rows
  • proc_cols must be smaller than blocks_c = (matrix_size_padH / block size)
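The rules in items 2-5 can be sketched as follows. This is illustrative Python only: the helper names are hypothetical, and the division by the number of k-point parallelisation groups is omitted for simplicity.

```python
import math

def choose_grid(n_proc):
    # Fewer than 9 processes -> a single process row (rule in item 2).
    if n_proc < 9:
        return 1, n_proc
    # Otherwise take proc_rows as the largest divisor of n_proc not
    # exceeding sqrt(n_proc), so that proc_cols >= proc_rows (item 5).
    proc_rows = math.isqrt(n_proc)
    while n_proc % proc_rows != 0:
        proc_rows -= 1
    return proc_rows, n_proc // proc_rows

def layout_ok(matrix_size_padH, block_size, proc_rows, proc_cols):
    # Restriction in item 4: blocks_r >= proc_rows and blocks_c >= proc_cols.
    # blocks_r == blocks_c here, since the padded matrix is square and
    # block_size_r == block_size_c.
    blocks = matrix_size_padH // block_size
    return blocks >= proc_rows and blocks >= proc_cols
```

For example, 16 processes give a 4 x 4 grid, and a 1020 x 1020 padded matrix with block size 20 (51 blocks per side) satisfies the restriction on that grid.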

For large systems, this should be okay. We usually use a large number of MPI processes, so proc_rows is proportional to the square root of the number of processes, while matrix_size is proportional to the number of atoms and the block size should be almost constant.

On the other hand, it may cause a problem for small systems: the number of processes can be smaller than 9, and matrix_size and the block size may be comparable.
But if we simply ignore efficiency for small systems, it becomes much easier to set the block size.


tsuyoshi38 commented Aug 3, 2023

I have made a branch f-proj_PHM_BlockSize.
In it, a subroutine checking the condition mentioned above (condition 4) is added just after matrix_size_padH is calculated.

At present, this check is in the subroutine allocate_arrays in ScalapackFormat.f90, but it could also go in readDiagInfo in initial_read_module.f90. I thought it better to keep initial_read_module smaller, for readability.

I think the code is now ready for Stage 1, and I would like to put it into the develop version.
It is probably better to release v1.2 first and then merge this version into develop.
(But I forgot how to merge the present version of f-proj_PadHamiltonianMatrix, which was made from an old version of develop, into the latest develop.)

And...
I think I can finish my project (implementing padding) for now. We will restart it after we collect data on appropriate block sizes.

@davidbowler

I agree that we should release version 1.2 first, so for now please don't try to merge this into develop.
