ATLAS integration #1144

Open
7 of 14 tasks
sethrj opened this issue Mar 8, 2024 · 11 comments
Assignees
Labels
external Dependencies and framework-oriented features

Comments

@sethrj
Member

sethrj commented Mar 8, 2024

This is the primary tracking issue for integrating Celeritas with the simulation workflow for the ATLAS experiment. The primary Celeritas contact points are the assignees of this issue.

Athena integration

ATLAS JIRA issue is accessible to ATLAS collaboration members

Core capabilities

Extra capabilities

  • Woodcock tracking: improved overall performance by 17.5% in ATLAS because of its highly segmented detectors
sethrj added the "external" (Dependencies and framework-oriented features) label on Mar 8, 2024
@drbenmorgan
Contributor

To briefly follow up here: the internal ATLAS ticket on this isn't publicly visible, but as of today there is a basic CPU-only build of atlasexternals here:

https://gitlab.cern.ch/bmorgan/atlasexternals/-/tree/integrate-celeritas?ref_type=heads

and the corresponding Athena integration (compile and CPU only) here:

https://gitlab.cern.ch/bmorgan/athena/-/commits/integrate-celeritas/?ref_type=heads

So there's still a way to go, but it's a starting point.

sethrj changed the title from "Integrate Celeritas into ATLAS" to "ATLAS integration" on Jul 31, 2024
@sethrj
Member Author

sethrj commented Nov 5, 2024

Blocked by #1462: VecGeom issues

@esseivaju
Contributor

@sethrj Since ATLAS uses GCC 13.1/C++20, should we add CI jobs for this configuration?

@sethrj
Member Author

sethrj commented Nov 12, 2024

Good idea. Static build? Try linking against VecGeom with CUDA? 😅

@esseivaju
Contributor

A static build would be good to have. ATLAS is in the process of switching to GCC 14, so we could jump directly to that compiler. That would require updating the CI images to Ubuntu 24.04.
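One way a CI job could mimic the ATLAS toolchain is sketched below. This is a minimal sketch, not an existing Celeritas CI configuration: the helper name ci_cmake_args is hypothetical, GCC 14 is assumed to be installed as g++-14 (as on an Ubuntu 24.04 image), and it assumes the Celeritas build honours a user-supplied CMAKE_CXX_STANDARD. CMAKE_CXX_COMPILER, CMAKE_CXX_STANDARD and BUILD_SHARED_LIBS are standard CMake variables.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print CMake configure arguments for a CI job that mimics
# the ATLAS toolchain (C++20, optionally static libraries), one per line.
# Assumes GCC 14 is installed as g++-14, e.g. on an Ubuntu 24.04 image.
ci_cmake_args() {
  local cxx="${1:-g++-14}"    # C++ compiler under test
  local std="${2:-20}"        # C++ standard ATLAS builds with
  local shared="${3:-OFF}"    # OFF requests static libraries
  printf '%s\n' \
    -GNinja \
    "-DCMAKE_CXX_COMPILER=${cxx}" \
    "-DCMAKE_CXX_STANDARD=${std}" \
    "-DBUILD_SHARED_LIBS=${shared}"
}
```

For example: mapfile -t args < <(ci_cmake_args g++-14 20 OFF), then cmake "${args[@]}" -S . -B build-ci. Keeping the arguments in one function makes it easy to add a second job (for instance GCC 13.1 with shared libraries) without duplicating the whole configure line.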

@sethrj
Member Author

sethrj commented Nov 13, 2024

I think the main blocker to static builds was disk space... 🤔

@sethrj
Member Author

sethrj commented Dec 1, 2024

@esseivaju @drbenmorgan Are there any stupid-simple ATLAS validation problems that use geantinos (or charged geantinos) to check the behavior of stepping and doing feedback? I don't think it would be a huge lift to add conversion and offloading of celeritons/celerinos 😉 and, for testing purposes, a callback that interacts directly with the stepping user action, if that can be used to produce validation plots.

@esseivaju
Contributor

esseivaju commented Dec 16, 2024

Instructions for building and running AthSimulation+Celeritas GPU

Following a message on Slack, here are notes on my setup to build and run AthSimulation + Celeritas GPU. I have only tested this myself on Perlmutter, using a container to provide CUDA, so I might have missed something; let me know if something doesn't work.

Requirements

  • CVMFS
  • Celeritas develop (e.g., 6338b6c)
  • VecGeom 1.2.10
  • an alma9 system with CUDA 12.4+ and GPUs (e.g., lxplus). I'm using a container on Perlmutter.
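Before building, a quick pre-flight check of these requirements can save time. The sketch below is hypothetical (none of these function names come from ATLAS or Celeritas); it assumes the usual nvcc --version output format ("Cuda compilation tools, release 12.4, V12.4.131") and GNU sort -V, and it does not check the OS version.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight checks for the requirements listed above.

# Succeed if version $1 >= version $2 (relies on GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Read `nvcc --version` output on stdin and print the release, e.g. "12.4".
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Print each missing requirement; return non-zero if anything is missing.
check_requirements() {
  local status=0 cuda
  if [ ! -d /cvmfs/atlas.cern.ch/repo ]; then
    echo "CVMFS repository atlas.cern.ch is not mounted"; status=1
  fi
  if command -v nvcc >/dev/null; then
    cuda=$(nvcc --version | nvcc_release)
    if ! version_ge "$cuda" 12.4; then
      echo "CUDA ${cuda:-unknown} found, 12.4+ required"; status=1
    fi
  else
    echo "nvcc not found on PATH"; status=1
  fi
  if ! nvidia-smi -L >/dev/null 2>&1; then
    echo "no NVIDIA GPU visible"; status=1
  fi
  return "$status"
}
```

Run check_requirements after sourcing the container setup; sort -V is used so that, for example, 12.10 correctly compares as newer than 12.4.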

If using the container above, source this after starting it:

#!/bin/bash
# Make the container's CUDA toolkit and /opt/local software visible
export PATH=/usr/local/cuda/bin:/opt/local/bin:$PATH
export LD_LIBRARY_PATH=/opt/local/lib:/opt/local/lib64:$LD_LIBRARY_PATH
export LD_RUN_PATH=/opt/local/lib:/opt/local/lib64:$LD_RUN_PATH
export LIBRARY_PATH=/opt/local/lib:/opt/local/lib64:$LIBRARY_PATH
export CPATH=/opt/local/include:$CPATH
export CMAKE_PREFIX_PATH=/opt/local:$CMAKE_PREFIX_PATH
export PKG_CONFIG_PATH=/opt/local/lib/pkgconfig:$PKG_CONFIG_PATH
# Optional prompt customization; skipped if starship isn't installed
command -v starship >/dev/null && eval "$(starship init bash)"
# Standard ATLAS environment from CVMFS
export ATLAS_LOCAL_ROOT_BASE="/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase"
function setupATLAS {
  source "${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh"
}
setupATLAS
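The script above prepends the /opt/local directories unconditionally, so sourcing it twice duplicates entries, and if a variable such as LD_LIBRARY_PATH was previously empty it leaves a trailing colon, which PATH and LD_LIBRARY_PATH interpret as the current directory. A hypothetical prepend_path helper (not part of the original setup) that avoids both issues might look like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a directory to a colon-separated path variable
# only if it is not already present, without leaving a trailing ':'.
prepend_path() {
  local var="$1" dir="$2"
  local current="${!var:-}"          # current value via indirection (empty if unset)
  case ":${current}:" in
    *":${dir}:"*) ;;                 # already present: leave unchanged
    *) printf -v "$var" '%s' "${dir}${current:+:${current}}" ;;
  esac
  export "$var"
}
```

For example, prepend_path LD_LIBRARY_PATH /opt/local/lib64 followed by prepend_path LD_LIBRARY_PATH /opt/local/lib gives the same order as the original export line, and can be re-run safely.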

Building AtlasExternals

Since we haven't integrated all the changes into the repo yet, start from the branch atlassim-6635-cuda and update the Celeritas and VecGeom versions as specified above. This is my build script on Perlmutter (using the Docker container linked above):

# to configure asetup, see https://gitlab.cern.ch/bmorgan/atlassim-6635#option-1-development-of-athenaathsimulation-only
asetup none,gcc13,cmakesetup
lsetup git

export PATH=/cvmfs/sft.cern.ch/lcg/contrib/ninja/1.11.1/Linux-x86_64/bin:$PATH
export CUDACXX=/usr/local/cuda/bin/nvcc
BASE_DIR=$(pwd)
BUILD_DIR=$BASE_DIR/externals_build
# Set CMAKE_CUDA_ARCHITECTURES to your GPU architecture (80 = A100, as on Perlmutter)
cmake -GNinja \
  -DCMAKE_CUDA_ARCHITECTURES=80 \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DCTEST_USE_LAUNCHERS=TRUE \
  -S "$BASE_DIR/atlasexternals/Projects/AthSimulationExternals" \
  -B "$BUILD_DIR"
cmake --build "$BUILD_DIR"
DESTDIR="$BASE_DIR/install" cmake --install "$BUILD_DIR"
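The CMAKE_CUDA_ARCHITECTURES value has to match your GPU. A hypothetical helper to derive it from the GPU's compute capability is sketched below; it assumes an NVIDIA driver whose nvidia-smi supports the compute_cap query field. With CMake 3.24 or newer, -DCMAKE_CUDA_ARCHITECTURES=native is an alternative that lets CMake detect the installed GPUs itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a compute capability such as "8.0" into the
# form CMAKE_CUDA_ARCHITECTURES expects ("80") by dropping the dot.
cap_to_arch() {
  printf '%s\n' "${1//./}"
}

# Print the architecture of the first visible GPU. Assumes an NVIDIA driver
# whose nvidia-smi supports the compute_cap query field.
detect_cuda_arch() {
  local cap
  cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n1)
  cap="${cap// /}"                 # strip any stray spaces
  [ -n "$cap" ] || return 1        # no GPU visible or query unsupported
  cap_to_arch "$cap"
}
```

On a compute node you could then pass -DCMAKE_CUDA_ARCHITECTURES="$(detect_cuda_arch)"; on a login node without GPUs, keep the explicit value.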

Building AthSimulation

⚠️ If building outside lxplus, follow these instructions to configure G4PATH before building AthSimulation.

Use the branch atlassim-6635-cuda. On its own, this branch only supports Athena in single-threaded mode. To run with AthenaMT, include this PR in your build; with it, both multi-threaded and single-threaded modes work. This is the build script I use to build AthSimulation:

asetup none,gcc13,cmakesetup
lsetup git

BASE_DIR=$(pwd)
# Install area produced by the AtlasExternals build above
EXT_DIR=$BASE_DIR/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt
source "$EXT_DIR/setup.sh"
export PATH=/cvmfs/sft.cern.ch/lcg/contrib/ninja/1.11.1/Linux-x86_64/bin:$PATH
export CUDACXX=/usr/local/cuda/bin/nvcc
export AthSimulationExternals_DIR=$EXT_DIR
export CMAKE_PREFIX_PATH=$EXT_DIR:$CMAKE_PREFIX_PATH
# set to your G4DATA path if not building on lxplus
export G4PATH=/pscratch/sd/e/esseivaj/celer-athena/g4data/releases

BUILD_DIR=$BASE_DIR/athsim_build
cmake -GNinja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE \
  -S "$BASE_DIR/athena/Projects/AthSimulation/" -B "$BUILD_DIR"
cmake --build "$BUILD_DIR"

Running a transform

Once you have built the full stack, you can run AthSimulation+Celeritas. To set up the environment in a new shell:

# Again, set according to your environment
export G4PATH=/pscratch/sd/e/esseivaj/celer-athena/g4data/releases
asetup none,gcc13,cmakesetup
BASE_DIR=$(pwd)
source "$BASE_DIR/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/setup.sh"
source "$BASE_DIR/athsim_build/x86_64-el9-gcc13-opt/setup.sh"

# This should only be needed outside lxplus, the default points to an $AFS location
export DATAPATH=/cvmfs/atlas.cern.ch/repo/sw/software/25.0/atlas/offline/ReleaseData/v20:$DATAPATH
export ATLASCALDATA=/cvmfs/atlas.cern.ch/repo/sw/software/25.0/atlas/offline/ReleaseData/v20

export PATH=/cvmfs/sft.cern.ch/lcg/contrib/ninja/1.11.1/Linux-x86_64/bin:$PATH
export CUDACXX=/usr/local/cuda/bin/nvcc
export CMAKE_PREFIX_PATH=/cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/LCG_106_ATLAS_13/XercesC/3.2.4/x86_64-el9-gcc13-opt:$CMAKE_PREFIX_PATH

This is an example transform running AthenaMT+Celeritas GPU. For single-threaded running, remove export ATHENA_PROC_NUMBER, export ATHENA_CORE_NUMBER and --multithreaded True. If you didn't build AthSim with the change in the linked PR, remove the flags.Sim.GPU settings from the --preExec argument.

export ATHENA_PROC_NUMBER=16
export ATHENA_CORE_NUMBER=$ATHENA_PROC_NUMBER
INPUTFILE="/cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/CampaignInputs/mc23/EVNT/mc23_13p6TeV.601229.PhPy8EG_A14_ttbar_hdamp258p75_SingleLep.evgen.EVNT.e8514/EVNT.32288062._002040.pool.root.1"
AtlasG4_tf.py \
  --CA True \
  --multithreaded True \
  --perfmon none \
  --detectors 'Calo' \
  --conditionsTag 'OFLCOND-MC23-SDR-RUN3-01' \
  --postInclude 'PyJobTransforms.TransformUtils.UseFrontier' \
  --preInclude 'AtlasG4Tf:Campaigns.MC23SimulationSingleIoV,SimulationConfig.disablePhotonRussianRoulette,SimulationConfig.disableNeutronRussianRoulette,SimulationConfig.disableFrozenShowersFCalOnly' \
  --geometryVersion 'ATLAS-R3S-2021-03-02-00' \
  --inputEVNTFile "$INPUTFILE" \
  --outputHITSFile "mc23_13p6TeV.601229.PhPy8EG_A14_ttbar_hdamp258p75_SingleLep_Celer_gpu_Calo.HITS.pool.root" \
  --maxEvents '1000' \
  --skipEvents '0' \
  --randomSeed '10' \
  --preExec "flags.Sim.OptionalUserActionList +=[\"G4UserActions.G4UserActionsConfig.GPUOffloadToolCfg\"];flags.Exec.FPE=-2;flags.GeoModel.EMECStandard=True;from SimulationConfig.SimEnums import CalibrationRun;flags.Sim.CalibrationRun=CalibrationRun.Off;flags.Sim.GPU.StackSize=8192;flags.Sim.GPU.HeapSize=512*1024*1024" \
  --postExec 'cfg.getService("StandardField").UseSoleCurrent=0.;cfg.getService("StandardField").UseToroCurrent=0.' \
  --imf False
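The two variations described before the transform (single-threaded vs. AthenaMT, and with or without the flags.Sim.GPU settings) can be scripted instead of edited by hand. This is a minimal sketch with hypothetical helper names (set_thread_mode, make_preexec, MT_ARGS); the flag values are copied verbatim from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for switching the example transform between AthenaMT
# and single-threaded Athena, with or without the flags.Sim.GPU settings.

# Set (or clear) the thread environment and fill the global array MT_ARGS.
set_thread_mode() {
  local nthreads="${1:-1}"
  if (( nthreads > 1 )); then
    export ATHENA_PROC_NUMBER="$nthreads"
    export ATHENA_CORE_NUMBER="$nthreads"
    MT_ARGS=(--multithreaded True)
  else
    unset ATHENA_PROC_NUMBER ATHENA_CORE_NUMBER
    MT_ARGS=()
  fi
}

# Print the --preExec string; pass 0 if AthSim was built without the linked PR.
make_preexec() {
  local with_gpu_flags="${1:-1}"
  local parts=(
    'flags.Sim.OptionalUserActionList +=["G4UserActions.G4UserActionsConfig.GPUOffloadToolCfg"]'
    'flags.Exec.FPE=-2'
    'flags.GeoModel.EMECStandard=True'
    'from SimulationConfig.SimEnums import CalibrationRun'
    'flags.Sim.CalibrationRun=CalibrationRun.Off'
  )
  if [[ "$with_gpu_flags" == 1 ]]; then
    parts+=('flags.Sim.GPU.StackSize=8192' 'flags.Sim.GPU.HeapSize=512*1024*1024')
  fi
  local IFS=';'                      # join the statements with ';'
  printf '%s\n' "${parts[*]}"
}
```

For example, call set_thread_mode 16 (or set_thread_mode 1 for single-threaded), then pass "${MT_ARGS[@]}" and --preExec "$(make_preexec 1)" to AtlasG4_tf.py in place of --multithreaded True and the literal --preExec string. Building the string from a single-quoted array also avoids the escaped double quotes in the one-liner.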

@sethrj
Member Author

sethrj commented Jan 22, 2025

From @drbenmorgan today:

Very quickly: there is a new CVMFS install of the current Athena main branch, plus Celeritas, AdePT and VecGeom at the current tips of their develop/main branches. This can be set up via

$  asetup AthSimulation,local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt,2024-01-22T1700

This has both CPU and GPU Celeritas in it, so it should be possible to run in CPU-only mode by exporting CELER_DISABLE_DEVICE=1 before running any Athena transform. It doesn't seem to want to run on lxplus-gpu in GPU mode, though. I'll try to diagnose that further, including on my local GPU machine, but it's there for people to try out if you want. As far as I know, all of the scripts etc. from the hackathon, and the ones people have been running, should work.
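Instead of exporting CELER_DISABLE_DEVICE=1 for the whole shell session, the setting can be scoped to a single command. The wrapper name run_cpu_only is hypothetical; only the environment variable comes from the note above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run one command with CELER_DISABLE_DEVICE=1 in its
# environment, without leaving the variable set in the calling shell.
run_cpu_only() {
  CELER_DISABLE_DEVICE=1 "$@"
}
```

For example, run_cpu_only AtlasG4_tf.py followed by the usual transform arguments runs that job in CPU-only mode, while later commands in the same shell still see the GPU.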

@sethrj
Member Author

sethrj commented Jan 25, 2025

Update from Seth+Julien hackathon

  • Julien successfully reproduced matching results between Athena with and without Celeritas, with Tilecal enabled and a 50 GeV pion test beam. Yay!
  • Results with the LAr calorimeters (EMEC, barrel) do not match at all
  • To eliminate physics from the comparison, I thought we could use a test beam of electrons and reduce the material density to a tiny fraction, so that we're tracking almost through void and immediately sending the tracks to Celeritas. @tsulaiav helpfully gave us the code to set up the particle gun and reduce the material density.

LAr detectors

Here's the full raytrace with pseudorapidity lines from 0.5 to 3:
[Image: raytrace]

And the sensitive regions, via debug output:
[Image: detectors]

And the output @esseivaju produced clearly shows we're moving electrons to Celeritas and sending lots of hits back. So I think there's something not meshing in the LAr SD code.

Yep, the last one checks step->GetStepLength(), which we don't set!

@sethrj
Member Author

sethrj commented Jan 29, 2025

Note about EndOfRunAction not being called on worker threads from John Chapman via @drbenmorgan:

It is a feature of TBB that threads get created and destroyed seemingly at random during the event loop of an Athena job. This is why:
https://gitlab.cern.ch/bmorgan/athena/-/blob/atlassim-6635-cuda/Simulation/G4Atlas/G4AtlasTools/src/G4ThreadInitTool.cxx
is necessary. On the Athena side, "end of the event loop" implies the finalize() method:
https://gitlab.cern.ch/bmorgan/athena/-/blob/atlassim-6635-cuda/Simulation/G4Atlas/G4AtlasAlg/src/G4AtlasAlg.cxx#L288-315
Currently G4AtlasAlg is cloned once per thread, which means that G4AtlasAlg::finalize is also called once per thread. It wasn't a specific design choice not to call EndOfRunAction for all threads, but you can see that we are very careful to call runMgr->RunTermination(); only once. I can't remember the reason for this, but I suspect that calling it multiple times caused a crash in the past. We can add something to call the EndOfRunActions per thread, but it might need a bit of unpicking to avoid whatever issue made us avoid calling RunTermination() multiple times.
If not before, this will be something that we can address in Michael Duehrssen-Debling's rewrite of the Athena-Geant4 interface.
