Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not suspend JobManager during startup by default #2671

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jukzi
Copy link
Contributor

@jukzi jukzi commented Jan 7, 2025

Because of possible deadlock when Javaeditor from JDT calls PDE calls OOMPH calls P2 to download new target platform waiting for a Job that is not executed because JobManager is suspendend

eclipse-pde/eclipse.pde#1481

Was already suggested in
https://bugs.eclipse.org/bugs/show_bug.cgi?id=514090

Old behavior can be used by setting VM property
-Dorg.eclipse.ui.suspendJobManagerDuringStart=true

@eclipse-platform-bot
Copy link
Contributor

This pull request changes some projects for the first time in this development cycle.
Therefore the following files need a version increment:

bundles/org.eclipse.ui.ide.application/META-INF/MANIFEST.MF

An additional commit containing all the necessary changes was pushed to the top of this PR's branch. To obtain these changes (for example if you want to push more changes) either fetch from your fork or apply the git patch.

Git patch
From d86179bf2b2b1635f52f7f93cceb230eb5d051d6 Mon Sep 17 00:00:00 2001
From: Eclipse Platform Bot <[email protected]>
Date: Tue, 7 Jan 2025 15:25:20 +0000
Subject: [PATCH] Version bump(s) for 4.35 stream


diff --git a/bundles/org.eclipse.ui.ide.application/META-INF/MANIFEST.MF b/bundles/org.eclipse.ui.ide.application/META-INF/MANIFEST.MF
index 2a8197c69e..98a8744054 100644
--- a/bundles/org.eclipse.ui.ide.application/META-INF/MANIFEST.MF
+++ b/bundles/org.eclipse.ui.ide.application/META-INF/MANIFEST.MF
@@ -2,7 +2,7 @@ Manifest-Version: 1.0
 Bundle-ManifestVersion: 2
 Bundle-Name: %Plugin.name
 Bundle-SymbolicName: org.eclipse.ui.ide.application;singleton:=true
-Bundle-Version: 1.5.600.qualifier
+Bundle-Version: 1.5.700.qualifier
 Bundle-Vendor: %Plugin.providerName
 Bundle-Localization: plugin
 Require-Bundle: org.eclipse.ui.ide;bundle-version="[3.21.0,4.0.0)",
-- 
2.47.1

Further information are available in Common Build Issues - Missing version increments.

@sratz
Copy link
Member

sratz commented Jan 7, 2025

514090#2 also says:

Complete removal of job manager suspension is quite risky since it has a potential of unmasking many dormant bugs. A safer alternative would be to make sure the the job manager is suspended before the first job is scheduled.

@jukzi
Copy link
Contributor Author

jukzi commented Jan 7, 2025

Complete removal of job manager suspension is quite risky since it has a potential of unmasking many dormant bugs.

We can find and fix those, if any

@merks
Copy link
Contributor

merks commented Jan 7, 2025

If the bugs uncovered are downstream, someone else will need to find and fix them. It seems best to err on the side of caution.

Copy link
Contributor

github-actions bot commented Jan 7, 2025

Test Results

 1 818 files  ±0   1 818 suites  ±0   1h 35m 38s ⏱️ + 7m 1s
 7 732 tests ±0   7 504 ✅ ±0  228 💤 ±0  0 ❌ ±0 
24 357 runs  ±0  23 608 ✅ ±0  749 💤 ±0  0 ❌ ±0 

Results for commit 824b356. ± Comparison against base commit 690fac0.

♻️ This comment has been updated with latest results.

@jukzi
Copy link
Contributor Author

jukzi commented Jan 8, 2025

If there are no objections i would like to merge and see what happens.

@iloveeclipse
Copy link
Member

If there are no objections i would like to merge and see what happens.

Could you please wait a bit? I would like to run a build now to see effects of eclipse-jdt/eclipse.jdt.core#3531 only.

@jukzi
Copy link
Contributor Author

jukzi commented Jan 8, 2025

Could you please wait a bit?

sure

@iloveeclipse
Copy link
Member

I've triggered https://ci.eclipse.org/releng/job/Builds/job/I-build-4.35/80/ to get intermediate results.

@vogella
Copy link
Contributor

vogella commented Jan 8, 2025

I've triggered https://ci.eclipse.org/releng/job/Builds/job/I-build-4.35/80/ to get intermediate results.

the discovery UI problem has been fixed with eclipse-equinox/p2#604 , restarted build with https://ci.eclipse.org/releng/job/Builds/job/I-build-4.35/81/

@merks
Copy link
Contributor

merks commented Jan 8, 2025

Given my comment earlier I will explicitly state my objection for historical purposes. I want to ensure that in the future when things are broken and folks arrive at this issue about who it got broken it will be clear that when someone says "We can find and fix those, if any" that it means that someone here has taken responsibility and has volunteered to help fix this problem for the entire Eclipse ecosystem. Of course I may misinterpret this to mean we will fix the problems we find in our stuff, but is the implication that we don't care so much for anything that grows beyond our patch of the garden?

@merks
Copy link
Contributor

merks commented Jan 8, 2025

And, if I find any problems in our own stuff that needs fixing, then I will raise my objection to a strong objection and will ask that a revert be applied if we've steam rolled ahead anyway.

@jukzi
Copy link
Contributor Author

jukzi commented Jan 8, 2025

@merks i.e. you prefer to have a known deadlock over the vague possibility of some regression? Of cause we could revert this if we find anything that we can not solve.

@vogella
Copy link
Contributor

vogella commented Jan 8, 2025

I share the concerns of @merks (To avoid any misreading here: I agree with Eds statements).

@jukzi
Copy link
Contributor Author

jukzi commented Jan 8, 2025

would you accept a system property that controls the behavior for a possible emergency fall back?, defaulting to not suspending.

@iloveeclipse
Copy link
Member

would you accept a system property that controls the behavior for a possible emergency fall back?, defaulting to not suspending.

That is actually a good idea. Disable by default, but give a chance to "fix" that for product owners.

@merks
Copy link
Contributor

merks commented Jan 8, 2025

Would you consider @sratz suggestion?

The problem with product owners is that mostly the users use EPP packages but then install a whack of things from SimRel and from marketplace. Any one of those might be a problem and then where is the owner? The user will be clueless and will have one more reason why Eclipse sucks so much and in fact sucks more with each release. Err on the side of caution. Default to not changing the behavior giving product owners a chance to test what breaks. But of course no one is likely to do that because there seems not a big advantage to changing the behavior in general, i.e., it seems having a potential avalanche of jobs starting during startup, all of which have validly been able to assume that the workbench is fully initialized by the time they start running is more likely to do more harm than good.

Because of possible deadlock when Javaeditor from JDT calls PDE calls
OOMPH calls P2 to download new target platform waiting for a Job that is
not executed because JobManager is suspendend

eclipse-pde/eclipse.pde#1481

Was already suggested in
https://bugs.eclipse.org/bugs/show_bug.cgi?id=514090

Old behavior can be used by setting VM property
-Dorg.eclipse.ui.suspendJobManagerDuringStart=true
@jukzi
Copy link
Contributor Author

jukzi commented Jan 8, 2025

I do not understand what you actually suggest. To keep IDE as bad as is (deadlock)? The suspend was initially done for performance reasons without any measurements given. That there are issues with a not initialized UI is pure speculation at this point.

@jukzi jukzi changed the title Do not suspend JobManager during startup Do not suspend JobManager during startup by default Jan 8, 2025
@laeubi
Copy link
Contributor

laeubi commented Jan 8, 2025

Reading the old bugzilla it seems to me that actually the suspension was already placed earlier (but I might be wrong), from an IDE perspective it makes sense to only start jobs when there is an UI because then user can cancel them and instead show UI as soon as possible (might not be a big deal today with multi CPU systems being the default).

In any case code that relies on fire a job and then join it always looks suspicious to me (even though I was surprised about this join behavior) because it feels one better should use a JobListener instead in most cases.

So I'm not 100% sure what is the use case for firing a job and then join it (in the startup phase) even though it would help here but more in hiding the problem that we do heavy work in the startup phase.

@merks
Copy link
Contributor

merks commented Jan 8, 2025

The suspend state iorg.eclipse.core.internal.jobs.JobManager.suspended s initialized like this:

image

What happens if this is initialized to true? Of course the concern about the default behavior remains the same, but if the value is initialized via a system property that defaults to false, we change nothing in the default behavior, but the Eclipse SDK can set that property to true, as can EPP. This would prevent any job from starting before suspend is called. I've seen assertions to suggest this fixes the problem we're trying to fix. So given there is a historical desire to avoid jobs starting during startup, that would seem the best way to ensure the past target behavior.

@jukzi
Copy link
Contributor Author

jukzi commented Jan 8, 2025

heavy work in the startup phase.

I guess our multi core CPUs are annoyed by a single threaded startup. I agree that creating a job and joining it is an antipattern that could also be fixed.

@jukzi
Copy link
Contributor Author

jukzi commented Jan 8, 2025

What happens if this is initialized to true?

I do not understand which problem that should fix. it does not fix the deadlock nor i am aware of any job that starts before jobmanager is suspended

@vogella
Copy link
Contributor

vogella commented Jan 8, 2025

I've triggered https://ci.eclipse.org/releng/job/Builds/job/I-build-4.35/80/ to get intermediate results.

the discovery UI problem has been fixed with eclipse-equinox/p2#604 , restarted build with https://ci.eclipse.org/releng/job/Builds/job/I-build-4.35/81/

Build available for testing: https://download.eclipse.org/eclipse/downloads/drops4/I20250108-0430/

@merks
Copy link
Contributor

merks commented Jan 8, 2025

So let's do nothing until everyone fully understands the problem and what we are doing to fix it. Could someone please take a step back and fully describe the problem and how this change fixes it.

( I was traveling for a day, and am dealing with 150 emails, many related to this topic, and sorry to say this out loud, many with a tone that I find offensive and embarrassing. So now I'm beyond frustrated and am shocked to find things being pushed through as if a fire flared up and needed to be put out before the entire building is engulfed in frames. )

If I read this details correctly, some job is running and then the job manager is suspended but that already-running job is waiting for some other job (that it launched?) to complete, but it cannot because the job manager is not starting any jobs. So if we avoid starting any jobs at all, then we can avoid that problem. I could be wrong. Too much email, too much noise, and not enough simple factual detail. So please let's clarify the facts and the details.


@vogella

Is eclipse-equinox/p2#604 related to this PR. I'm so confused.

@vogella
Copy link
Contributor

vogella commented Jan 8, 2025

Is eclipse-equinox/p2#604 related to this PR. I'm so confused.

@iloveeclipse did ask @jukzi in #2671 (comment) to wait with this PR until an aggregator build is done with the change in JDT core. The build failed and eclipse-equinox/p2#604 fixed this failure. Now everyone who wants can test the JDT core change before this change. The result of the build can be found here: https://download.eclipse.org/eclipse/downloads/drops4/I20250108-0430/

@merks
Copy link
Contributor

merks commented Jan 8, 2025

In the interest of gathering facts, I did the following. I installed the Eierlegende Wollmilchsau, created a sample PDE project, opened the Java editor in that project, and closed the IDE. I enabled remote debug in the eclipse.ini, suspending the JVM on start. I set a breakpoint in org.eclipse.core.internal.jobs.JobManager.suspend() and when the debugger stopped there, I change the value back to false. Then I let the IDE complete the startup phase. This appears in the Error log:

java.lang.IllegalStateException: Workbench has not been created yet.
	at org.eclipse.ui.PlatformUI.getWorkbench(PlatformUI.java:119)
	at org.eclipse.php.internal.ui.text.correction.CorrectionCommandInstaller.registerCommands(CorrectionCommandInstaller.java:31)
	at org.eclipse.php.internal.ui.PHPUiPlugin$1.run(PHPUiPlugin.java:141)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)

So definitely jobs might assume that org.eclipse.ui.PlatformUI.getWorkbench() will work nicely to find that now it does not.


The next experiment I did with the same setup, but I stopped in the JobManager constructor and changed suspended to true in the debugger. The IDE starts with the open editor just fine with no logged errors.

@jukzi
Copy link
Contributor Author

jukzi commented Jan 8, 2025

In the interest of gathering facts

thats good.

So definitely jobs might assume that org.eclipse.ui.PlatformUI.getWorkbench() will work nicely to find that now it does not.

And could be fixed for example using org.eclipse.ui.PlatformUI.isWorkbenchRunning() https://bugs.eclipse.org/bugs/show_bug.cgi?id=49316

JDT's org.eclipse.jdt.internal.ui.text.correction.CorrectionCommandInstaller works fine.

Anyhow i will wait with any further investigation to find out if the deadlock still ever happens after eclipse-equinox/p2#600

@laeubi
Copy link
Contributor

laeubi commented Jan 8, 2025

heavy work in the startup phase.

I guess our multi core CPUs are annoyed by a single threaded startup. I agree that creating a job and joining it is an antipattern that could also be fixed.

I just wanted to note that we are talking about the UI Thread (!) so it is always single threaded and the goal seems to be keep that as short as possible so we show something to the user and then start the "heavy" work,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants