child_process.execFile may resolve with incomplete output #56430

Open
jcplist opened this issue Jan 2, 2025 · 2 comments

jcplist commented Jan 2, 2025

Version

v20.18.0

Platform

Linux .. 6.8.0-49-generic #49~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Nov  6 17:42:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

child_process

What steps will reproduce the bug?

script.js:

import cp from 'child_process';
import util from 'util';

// Number of child processes to start concurrently.
const throttle = Number(process.argv[2]);
const execFile = util.promisify(cp.execFile);

const ps = [];
for (let i = 0; i < throttle; i += 1) {
  // Each child sleeps for 0.9 s and then prints; the timeout is 1 s.
  const p = execFile(
    'python3',
    ['-c', 'import time; time.sleep(0.9); print(\'Hello, World!\')'],
    { timeout: 1000 },
  );
  ps.push(p);
}

ps.forEach(async (p) => {
  try {
    const { stdout } = await p;
    // The promise resolved without output: log whether the child was signalled.
    if (!stdout) {
      console.log(p.child.killed);
    }
  } catch (e) {
    console.log(e);
  }
});

On the command line, run:

node script.js 3000

How often does it reproduce? Is there a required condition?

The bug is always reproducible on a 32-core computer with low background CPU usage.

What is the expected behavior? Why is that the expected behavior?

Assuming no promise is rejected, nothing should be logged to the terminal, and the main program should exit with code 0, indicating that all subprocesses resolved successfully with complete stdout output.

What do you see instead?

Many subprocesses resolve with an empty stdout, and their child.killed attributes are logged in the terminal. The logged output consists of multiple consecutive true values, followed by multiple consecutive false values.

Additional information

Related Issue

The root cause of this bug is the same as in https://github.com/orgs/nodejs/discussions/47062. That discussion contains several mistakes, which led it to a different conclusion.

Root Cause

The internal kill function of execFile destroys stdout and stderr streams, leading to a race condition when the following requirement is met:

  • The child process exits with code 0, but some data remains unread by Node.js.

When this occurs, the race condition manifests as one of two scenarios:

  1. The timeout triggers before child._handle.onexit completes:

    • The kill function destroys the IO streams.
    • child._handle.onexit triggers with exitCode 0.
    • The Promise resolves with incomplete stdout, and child.killed is true since it is valid to send a signal to a zombie process.
  2. child._handle.onexit completes before the timeout triggers:

    • child._handle.onexit triggers with exitCode 0, but the close event is not emitted since streams are still open.
    • The timeout triggers and the kill function destroys the IO streams. The child.kill function is effectively a no-op because child._handle has already been cleared.
    • The Promise resolves with incomplete stdout, and child.killed is false since child.kill function did nothing.
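
The last step of scenario 2 can be checked in isolation. The following is a minimal sketch, not the reproduction from this issue: it assumes `process.execPath` as a stand-in child that exits immediately, waits for the `exit` event, and only then calls `child.kill()`. Under that assumption, the call should report that no signal was delivered, and `child.killed` should stay `false`.

```javascript
import { spawn } from 'node:child_process';
import { once } from 'node:events';

// Stand-in child (an assumption for this sketch): a Node.js process
// that exits right away with code 0.
const child = spawn(process.execPath, ['-e', '']);

// Wait for 'exit'. The process is gone at this point, even though
// its stdio streams may not have emitted 'close' yet.
const [exitCode] = await once(child, 'exit');

// Signalling after exit: kill() returns false and child.killed is not set.
const delivered = child.kill();

console.log({ exitCode, delivered, killed: child.killed });
```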

Why I think this is a bug

  1. Contrary to the conclusion of that discussion, client code has no way to know whether the resolved output is complete.
  2. The timeout (kill) behavior of execFile differs from that of spawn, which does not destroy the output streams. A valid implementation that resolves the promise with complete output is therefore possible. (In fact, this is our current workaround.)
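
A spawn-based workaround of the kind mentioned above might look roughly like the sketch below. It is not the reporter's actual code: `runWithTimeout` and the `timedOut` field are hypothetical names, and `process.execPath` is used as a stand-in child. The point is that the timer only signals the child and leaves stdout and stderr open, so buffered output can still be read before the promise settles.

```javascript
import { spawn } from 'node:child_process';

// Hypothetical helper (not a Node.js API): run a command with a timeout,
// but never destroy stdout/stderr when the timer fires.
function runWithTimeout(file, args, timeoutMs) {
  return new Promise((resolve, reject) => {
    const child = spawn(file, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    let stdout = '';
    let stderr = '';
    let timedOut = false;

    child.stdout.setEncoding('utf8');
    child.stderr.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.stderr.on('data', (chunk) => { stderr += chunk; });

    // On timeout, only signal the child; the streams stay open.
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill();
    }, timeoutMs);

    child.on('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });

    // 'close' fires once the process has ended and its stdio streams have closed.
    child.on('close', (code, signal) => {
      clearTimeout(timer);
      resolve({ stdout, stderr, code, signal, timedOut });
    });
  });
}

const result = await runWithTimeout(
  process.execPath,
  ['-e', 'console.log("Hello, World!")'],
  5000,
);
console.log(result);
```

Because the sketch waits for `close`, it can wait indefinitely if something else keeps the pipes open after the child exits. The `timedOut` flag is also conservative: it only records that the timer fired, which can happen after a child has already exited cleanly.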

bnoordhuis (Member) commented

> Nothing should be logged to the terminal, and the main program should exit with code 0, indicating all subprocesses resolved successfully with complete stdout output.

It feels like your unspoken assumption here is:

  • my child processes sleep for 0.9 seconds
  • my timeout is 1 second
  • therefore all my child processes wake up and exit before my timeout expires

That assumption is wrong though: your child processes sleep for at least 0.9 seconds but they can definitely sleep longer; and at that point there's intrinsically a race between waking up and getting killed.

Yes, node stops reading stdio when it receives the "child exited" signal from the operating system. What would you have it do instead? Sit around waiting for output that may never come? Even though the child process exited, its children may still be alive - and keeping the stdio alive. In the limit, the stdout and stderr streams may never see EOF.

Unless you have a specific suggestion how to improve execFile(), I think the conclusion here should be that there's nothing fundamentally wrong with it; it's working as designed. Maybe it's the wrong tool for the job for you but that's why spawn() exists.

jcplist (Author) commented Jan 3, 2025

> That assumption is wrong though: your child processes sleep for at least 0.9 seconds but they can definitely sleep longer; and at that point there's intrinsically a race between waking up and getting killed.

If the child process is killed by the signal, the promise takes the reject path instead.

> Yes, node stops reading stdio when it receives the "child exited" signal from the operating system.

This is not correct: each stream increments child._closesNeeded, so the close event is not emitted until every stream has closed.

> What would you have it do instead? Sit around waiting for output that may never come? Even though the child process exited, its children may still be alive - and keeping the stdio alive. In the limit, the stdout and stderr streams may never see EOF.

This is what other parts of node already do: spawn does it, and so does execFile without a timeout. If the timeout behavior of execFile is intentional, I would expect it to be well documented.

> Unless you have a specific suggestion how to improve execFile(), I think the conclusion here should be that there's nothing fundamentally wrong with it; it's working as designed. Maybe it's the wrong tool for the job for you but that's why spawn() exists.

In my opinion, node should do one of the following:

  1. Document that the resolved stdout and stderr are not guaranteed to be complete when execFile is used with a timeout.
  2. Stop destroying the output streams in execFile's internal kill, or make that behavior optional.
  3. Reject the promise, or expose an attribute indicating that the execFile timeout was triggered.
