This document provides basic information on how to use the Puppeteer-Cluster library along with Puppeteer to capture screenshots concurrently.
Install both packages with npm:

```shell
npm install puppeteer puppeteer-cluster
```
- Puppeteer is a Node library to control headless Chrome or Chromium.
- Puppeteer-Cluster manages a pool of Chromium instances to process tasks in parallel.
Import and Launch the Cluster
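```javascript
const { Cluster } = require("puppeteer-cluster");

(async () => {
  // Launch a pool of up to 3 workers; with CONCURRENCY_BROWSER,
  // each worker uses its own Chromium instance.
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_BROWSER,
    maxConcurrency: 3,
    puppeteerOptions: {
      headless: true,
    },
  });

  // Define the task, queue URLs, then wait and close (steps below).
})();
```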
Define a Task
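```javascript
// Inside the async function above: this runs once per queued item,
// with `data` set to the queued URL.
await cluster.task(async ({ page, data }) => {
  await page.goto(data);
  // Name each screenshot after the page's hostname, e.g. "example.com.png".
  const fileName = `${new URL(data).hostname}.png`;
  await page.screenshot({ path: fileName });
});
```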
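The task above can be filled out as a reusable function. This is a minimal sketch, assuming each queued `data` value is a URL string; `registerScreenshotTask`, `toScreenshotFileName`, and the `screenshots` output directory are hypothetical names chosen for illustration, not part of Puppeteer-Cluster. Taking `cluster` as a parameter keeps the function independent of how the cluster was launched, and the file-name helper avoids collisions when several pages share a hostname.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: turn a URL into a file-system-safe PNG name,
// e.g. "https://github.com/puppeteer/puppeteer" -> "github.com_puppeteer_puppeteer.png".
function toScreenshotFileName(url) {
  const { hostname, pathname } = new URL(url);
  const slug = `${hostname}${pathname}`
    .replace(/[^a-z0-9.-]+/gi, "_") // collapse unsafe characters into "_"
    .replace(/^_+|_+$/g, ""); // trim leading/trailing underscores
  return `${slug}.png`;
}

// Hypothetical helper: register a task that screenshots every queued URL.
async function registerScreenshotTask(cluster, outDir = "screenshots") {
  fs.mkdirSync(outDir, { recursive: true });
  await cluster.task(async ({ page, data: url }) => {
    // Wait until network activity settles so late-loading content is captured.
    await page.goto(url, { waitUntil: "networkidle2" });
    await page.screenshot({
      path: path.join(outDir, toScreenshotFileName(url)),
      fullPage: true,
    });
  });
}

module.exports = { toScreenshotFileName, registerScreenshotTask };
```

In a real script, call `await registerScreenshotTask(cluster)` right after `Cluster.launch()`.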
Queue Tasks
```javascript
// Each queued value is passed to the task function as `data`.
cluster.queue("https://example.com");
cluster.queue("https://github.com");
```
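`cluster.queue` is not limited to strings: whatever value you queue arrives in the task as `data`, so a job can carry extra information such as an output file name. The sketch below is illustrative only; `queueScreenshots` and the `{ url, fileName }` job shape are hypothetical. If you enable the `skipDuplicateUrls` launch option, Puppeteer-Cluster expects the queued data to be either the URL itself or an object with a `url` field, which this shape satisfies.

```javascript
// Hypothetical helper: queue one job object per URL. Each object arrives
// in the task function as `data`, e.g. async ({ page, data: { url, fileName } }) => ...
function queueScreenshots(cluster, urls) {
  urls.forEach((url, index) => {
    // Number the files so output names follow the input order.
    const fileName = `${String(index + 1).padStart(3, "0")}-${new URL(url).hostname}.png`;
    cluster.queue({ url, fileName });
  });
  return urls.length;
}

module.exports = { queueScreenshots };
```

The task function then reads `data.url` for `page.goto` and `data.fileName` for `page.screenshot`.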
Wait and Close
```javascript
// Wait until every queued task has finished, then shut down all browsers.
await cluster.idle();
await cluster.close();
```
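Putting the four steps together, a complete script should make sure `cluster.close()` runs even if an earlier step throws; otherwise Chromium processes can be left running. This is a minimal sketch under two assumptions: the function name `captureAll` is hypothetical, and the `Cluster` class is passed in as a parameter so the flow can be exercised without launching a real browser.

```javascript
// Hypothetical end-to-end helper: launch, register the task, queue, wait, close.
async function captureAll(Cluster, urls) {
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_BROWSER,
    maxConcurrency: 3,
    puppeteerOptions: { headless: true },
  });
  try {
    await cluster.task(async ({ page, data: url }) => {
      await page.goto(url);
      await page.screenshot({ path: `${new URL(url).hostname}.png` });
    });
    for (const url of urls) {
      cluster.queue(url);
    }
    // Resolves once every queued task has finished.
    await cluster.idle();
  } finally {
    // Always release the browsers, even if a step above threw.
    await cluster.close();
  }
}

module.exports = { captureAll };

// Usage in a real script (requires both packages to be installed):
// const { Cluster } = require("puppeteer-cluster");
// captureAll(Cluster, ["https://example.com", "https://github.com"]).catch(console.error);
```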
- Puppeteer Documentation: details on controlling headless Chrome/Chromium, the available page methods, and more advanced usage examples.
- Official Puppeteer-Cluster Package Docs: https://www.npmjs.com/package/puppeteer-cluster
- High-volume Screenshot Captures
- PDF Generation (via Puppeteer's `page.pdf()`, optionally post-processed with libraries like pdf-lib)
- Web Scraping for data collection
- Automation & Testing in parallel
- Experiment with different `maxConcurrency` values to see how throughput and resource usage change; each extra Chromium instance costs CPU and memory.
- Handle errors gracefully; some URLs might fail to load or time out.
- Keep your code well-structured and documented, especially in the `cluster.task(...)` part.
- Consider adding logging or debugging information when capturing many URLs.
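For the error-handling and logging tips, Puppeteer-Cluster emits a `taskerror` event with the error, the task's data, and a `willRetry` flag. Combined with the `retryLimit` and `retryDelay` launch options, failed URLs can be retried and then reported without stopping the rest of the queue. The sketch below assumes a hypothetical `attachErrorLogging` helper and a `logger` object with `warn`/`error` methods (such as `console`); it also collects the URLs that failed for good so they can be summarized at the end.

```javascript
// Hypothetical helper: log task failures and remember URLs that ultimately failed.
// `logger` is any object with warn/error methods, e.g. `console`.
function attachErrorLogging(cluster, logger = console) {
  const failed = [];
  cluster.on("taskerror", (err, data, willRetry) => {
    // `data` is whatever was queued: a URL string or an object with a `url` field.
    const url = typeof data === "string" ? data : data && data.url;
    if (willRetry) {
      logger.warn(`Error on ${url}: ${err.message} (will retry)`);
    } else {
      logger.error(`Failed on ${url}: ${err.message}`);
      failed.push(url);
    }
  });
  return failed; // filled in as tasks fail; read it after `await cluster.idle()`
}

module.exports = { attachErrorLogging };
```

In a real script you might launch with, for example, `retryLimit: 2` and `retryDelay: 1000`, call `attachErrorLogging(cluster)` before queueing, and print the returned array after `await cluster.idle()`.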