Update tn #74

Open
wants to merge 16 commits into main
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -25,7 +25,7 @@ repos:
additional_dependencies: [tomli]
args: [--in-place, --config, ./pyproject.toml]
- repo: https://github.com/asottile/pyupgrade
rev: v3.18.0
rev: v3.16.0
hooks:
- id: pyupgrade
- repo: https://github.com/hadialqattan/pycln
12 changes: 6 additions & 6 deletions src/qibotn/backends/cutensornet.py
@@ -12,6 +12,7 @@ class CuTensorNet(NumpyBackend): # pragma: no cover

def __init__(self, runcard):
super().__init__()
from cuquantum import cudaDataType, ComputeType, __version__ # pylint: disable=import-error
from cuquantum import cutensornet as cutn # pylint: disable=import-error

if runcard is not None:
@@ -58,22 +59,21 @@ def __init__(self, runcard):
self.expectation_enabled = False

self.name = "qibotn"
self.cuquantum = cuquantum
self.cutn = cutn
self.platform = "cutensornet"
self.versions["cuquantum"] = self.cuquantum.__version__
self.versions["cuquantum"] = __version__
self.supports_multigpu = True
self.handle = self.cutn.create()

global CUDA_TYPES
CUDA_TYPES = {
"complex64": (
self.cuquantum.cudaDataType.CUDA_C_32F,
self.cuquantum.ComputeType.COMPUTE_32F,
cudaDataType.CUDA_C_32F,
ComputeType.COMPUTE_32F,
),
"complex128": (
self.cuquantum.cudaDataType.CUDA_C_64F,
self.cuquantum.ComputeType.COMPUTE_64F,
cudaDataType.CUDA_C_64F,
ComputeType.COMPUTE_64F,
),
}

86 changes: 66 additions & 20 deletions src/qibotn/eval.py
@@ -62,6 +62,7 @@ def dense_vector_tn_MPI(qibo_circ, datatype, n_samples=8):
Dense vector of quantum circuit.
"""

import cuquantum.cutensornet as cutn
Member:

Any reason to keep the imports within the functions? (instead of top-level)

I know it was like this even before this PR...

Contributor Author:

The reason was that not all functions require the import; specifically, dense_vector_tn(), expectation_pauli_tn(), dense_vector_mps(), and pauli_string_gen() do not. Do you think it is better to bring them to the top level?
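
For readers skimming the thread, a minimal sketch of the deferred (in-function) import pattern being discussed; the function name and the cupy dependency are illustrative stand-ins, not qibotn code:

```python
# Sketch only: deferred import of an optional dependency (cupy here),
# assuming it is installed on the machines that actually call this path.
def sum_on_gpu(values):
    # Importing inside the function means importing the enclosing module
    # never requires the optional dependency; only callers of this function do.
    import cupy as cp

    return float(cp.asarray(values).sum())
```

The trade-off raised above is that top-level imports fail at module import time for users missing the optional packages, while in-function imports defer that requirement to the specific entry points that need it.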

from cuquantum import Network
from mpi4py import MPI

@@ -71,21 +72,30 @@ def dense_vector_tn_MPI(qibo_circ, datatype, n_samples=8):
size = comm.Get_size()

device_id = rank % getDeviceCount()
cp.cuda.Device(device_id).use()

# Perform circuit conversion
myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)
if rank == 0:
myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)

operands = myconvertor.state_vector_operands()
operands = myconvertor.state_vector_operands()
else:
operands = None
Member:

What's the actual purpose of this?

If rank != 0, qibo_circ is fully ignored...

Even if it is somehow meaningful (I'm not seeing how, but that may be my limitation), the result could only be trivial, so you could even return immediately, without executing all the other operations...

Contributor Author (@Tankya2), Oct 30, 2024:

Each rank needs the same initial set of operands for the computation. Here the operands are created on rank 0; for all other ranks the operands are just set to None. In line 86, the operands created on rank 0 are then broadcast to all other ranks.
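
To illustrate the broadcast step, here is a self-contained sketch under the assumption that mpi4py and numpy are available; the toy operand list stands in for the tensors produced by QiboCircuitToEinsum:

```python
# Run with e.g. `mpiexec -n 4 python script.py`.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
root = 0

if rank == root:
    # The heavy circuit-to-einsum conversion happens once, on rank 0 only.
    operands = [np.ones((2, 2), dtype=np.complex128) for _ in range(4)]
else:
    # Other ranks start with a placeholder that the broadcast overwrites.
    operands = None

# After the broadcast every rank holds an identical copy of the operands,
# so each rank can build the same network and contract its own slices.
operands = comm.bcast(operands, root=root)
print(f"rank {rank} received {len(operands)} operands")
```

This mirrors the `operands = comm.bcast(operands, root)` call in the diff, after which all ranks proceed identically.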


# Assign the device for each process.
device_id = rank % getDeviceCount()
Member, on lines -80 to -81:

Do you remember why it was repeated before?

Member:

The comment may still be useful, and you could lift it to the line above.

operands = comm.bcast(operands, root)

# Create network object.
network = Network(*operands, options={"device_id": device_id})

# Compute the path on all ranks with 8 samples for hyperoptimization. Force slicing to enable parallel contraction.
path, info = network.contract_path(
optimize={"samples": n_samples, "slicing": {"min_slices": max(32, size)}}
optimize={
"samples": n_samples,
"slicing": {
"min_slices": max(32, size),
"memory_model": cutn.MemoryModel.CUTENSOR,
},
}
)

# Select the best path from all ranks.
@@ -136,6 +146,7 @@ def dense_vector_tn_nccl(qibo_circ, datatype, n_samples=8):
Returns:
Dense vector of quantum circuit.
"""
import cuquantum.cutensornet as cutn
Member:

Same as above

from cupy.cuda import nccl
from cuquantum import Network
from mpi4py import MPI
@@ -155,14 +166,25 @@ def dense_vector_tn_nccl(qibo_circ, datatype, n_samples=8):
comm_nccl = nccl.NcclCommunicator(size, nccl_id, rank)

# Perform circuit conversion
myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)
operands = myconvertor.state_vector_operands()
if rank == 0:
myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)
operands = myconvertor.state_vector_operands()
else:
operands = None

operands = comm_mpi.bcast(operands, root)

network = Network(*operands)

# Compute the path on all ranks with 8 samples for hyperoptimization. Force slicing to enable parallel contraction.
path, info = network.contract_path(
optimize={"samples": n_samples, "slicing": {"min_slices": max(32, size)}}
optimize={
"samples": n_samples,
"slicing": {
"min_slices": max(32, size),
"memory_model": cutn.MemoryModel.CUTENSOR,
},
}
)

# Select the best path from all ranks.
@@ -226,6 +248,7 @@ def expectation_pauli_tn_nccl(qibo_circ, datatype, pauli_string_pattern, n_sampl
Returns:
Expectation of quantum circuit due to pauli string.
"""
import cuquantum.cutensornet as cutn
from cupy.cuda import nccl
from cuquantum import Network
from mpi4py import MPI
@@ -245,16 +268,28 @@ def expectation_pauli_tn_nccl(qibo_circ, datatype, pauli_string_pattern, n_sampl
comm_nccl = nccl.NcclCommunicator(size, nccl_id, rank)

# Perform circuit conversion
myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)
operands = myconvertor.expectation_operands(
pauli_string_gen(qibo_circ.nqubits, pauli_string_pattern)
)
if rank == 0:

myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)
operands = myconvertor.expectation_operands(
pauli_string_gen(qibo_circ.nqubits, pauli_string_pattern)
)
else:
operands = None

operands = comm_mpi.bcast(operands, root)

network = Network(*operands)

# Compute the path on all ranks with 8 samples for hyperoptimization. Force slicing to enable parallel contraction.
path, info = network.contract_path(
optimize={"samples": n_samples, "slicing": {"min_slices": max(32, size)}}
optimize={
"samples": n_samples,
"slicing": {
"min_slices": max(32, size),
"memory_model": cutn.MemoryModel.CUTENSOR,
},
}
)

# Select the best path from all ranks.
@@ -318,6 +353,7 @@ def expectation_pauli_tn_MPI(qibo_circ, datatype, pauli_string_pattern, n_sample
Returns:
Expectation of quantum circuit due to pauli string.
"""
import cuquantum.cutensornet as cutn
from cuquantum import Network
from mpi4py import MPI # this line initializes MPI

@@ -326,24 +362,34 @@ def expectation_pauli_tn_MPI(qibo_circ, datatype, pauli_string_pattern, n_sample
rank = comm.Get_rank()
size = comm.Get_size()

# Assign the device for each process.
device_id = rank % getDeviceCount()
cp.cuda.Device(device_id).use()

# Perform circuit conversion
myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)
if rank == 0:
myconvertor = QiboCircuitToEinsum(qibo_circ, dtype=datatype)

operands = myconvertor.expectation_operands(
pauli_string_gen(qibo_circ.nqubits, pauli_string_pattern)
)
operands = myconvertor.expectation_operands(
pauli_string_gen(qibo_circ.nqubits, pauli_string_pattern)
)
else:
operands = None

# Assign the device for each process.
device_id = rank % getDeviceCount()
operands = comm.bcast(operands, root)

# Create network object.
network = Network(*operands, options={"device_id": device_id})

# Compute the path on all ranks with 8 samples for hyperoptimization. Force slicing to enable parallel contraction.
path, info = network.contract_path(
optimize={"samples": n_samples, "slicing": {"min_slices": max(32, size)}}
optimize={
"samples": n_samples,
"slicing": {
"min_slices": max(32, size),
"memory_model": cutn.MemoryModel.CUTENSOR,
},
}
)

# Select the best path from all ranks.