Debugging Spark
You can connect a remote debugger to a Spark process.
Append the following to your spark-submit (or gatk-launch) options, replacing 5005 with a different available port if necessary:
--driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
This suspends the driver until it receives a remote connection from IntelliJ.
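If you change the port often, the option can be assembled from a variable rather than edited by hand. The following is a minimal sketch: the function name `driver_debug_opts` is a hypothetical helper and not part of Spark or GATK; only the JDWP agent string itself comes from the instructions above.

```shell
# Hypothetical helper (not part of Spark or GATK): builds the JDWP agent
# string for the driver. The port defaults to 5005 when no argument is given.
driver_debug_opts() {
  local port="${1:-5005}"
  # server=y: the driver JVM listens for the debugger.
  # suspend=y: the driver waits until a debugger attaches.
  printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${port}"
}

# Print the options to append to the spark-submit (or gatk-launch) command line.
printf '%s\n' "--driver-java-options $(driver_debug_opts 5005)"
# prints: --driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
```

Whichever port you pass here is the same one to enter in the IntelliJ configuration.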
Configure a new IntelliJ remote debugging configuration as follows:
- Select Run -> Edit Configurations
- Hit the + to add a new configuration.
- Choose Remote
- Set Mode to Attach
- Set Host to your driver node name, e.g. dataflow01.broadinstitute.org
- Set Port to the port you chose in the driver options above
- Click OK
Now start your Spark tool, then run your debug configuration.
# To debug an executor
Add the following to your gatk-launch command:
--num-executors 1 --executor-cores 1 --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,address=wm1b0-8ab.broadinstitute.org:5005,suspend=n"
Replace the given address with your local computer's address and port. (IntelliJ's remote debug configuration screen shows the address if you're not sure what it is.)
It's important to set --num-executors to 1; otherwise each executor will try to connect to your debugger, which causes problems.
Note that these options do not suspend the executor; suspending it would cause the Spark program to crash when run. Instead, set the Mode in your run configuration to Listen. Start your debug configuration before you start the Spark program, and it will wait for a connection from the executor.
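The executor setting can be built the same way from a host and port. This is a sketch under stated assumptions: `executor_debug_conf` is a hypothetical helper, and `my-laptop.example.org` is a placeholder for the machine where IntelliJ is listening. The option string mirrors the one given in this section.

```shell
# Hypothetical helper (not part of Spark or GATK): builds the --conf value
# that makes the executor JVM connect out to a listening debugger.
executor_debug_conf() {
  local host="$1"          # machine running the IntelliJ "Listen" configuration
  local port="${2:-5005}"  # port that IntelliJ is listening on
  # server=n: the executor connects to the debugger rather than listening.
  # suspend=n: the executor does not wait for the debugger before running.
  printf '%s\n' "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,address=${host}:${port},suspend=n"
}

# Print the options to add to the gatk-launch command line.
# my-laptop.example.org is a placeholder; substitute your own address.
printf '%s\n' "--num-executors 1 --executor-cores 1 --conf \"$(executor_debug_conf my-laptop.example.org 5005)\""
```

Keeping the host and port as arguments makes it harder for the address in the command to drift from the one in the debug configuration.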