Archibald is a simple CLI utility for archiving web pages. It captures a page in four formats (screenshot, PDF, HTML, and Markdown) so that its content is preserved even if the page later disappears. It is aimed at researchers, journalists, and anyone who needs to keep a record of web content.
- Screenshot Capture: Save a full-page screenshot of the web page.
- PDF Export: Generate a PDF version of the web page.
- HTML Archiving: Save the web page’s HTML with inlined CSS styles.
- Markdown Conversion: Extract readable content and convert it to Markdown.
Archibald requires Node.js. With Node.js installed on your machine, install Archibald globally using npm:
npm install -g archibald
Once installed, you can use Archibald from the command line. Here’s the basic usage:
archibald <url> [options]
- <url>: The URL of the web page you want to archive.
- -n, --name <name>: Add your name or a custom identifier to the archive (optional).
- Archive a Web Page: Archive a web page with default settings:
archibald https://example.com
- Archive with a Custom Name: Include a custom name in the archive:
archibald https://example.com -n "JohnDoe"
Archibald creates a folder named with the domain and a sanitized version of the page title. Inside this folder, you will find:
- screenshot.png: A full-page screenshot of the web page.
- page.pdf: A PDF version of the web page.
- page.html: The HTML content with inlined styles.
- page.md: The readable content converted to Markdown.
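To make the folder naming concrete, here is a minimal sketch of how a name could be built from the domain and a sanitized page title. The helper names (`sanitizeTitle`, `archiveFolderName`), the underscore separator, and the exact sanitization rules are assumptions for illustration only; Archibald's actual rules may differ.

```typescript
// Hypothetical helper: the real sanitization rules used by Archibald are an assumption here.
function sanitizeTitle(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each run of unsafe characters into one hyphen
    .replace(/^-+|-+$/g, "");    // strip leading and trailing hyphens
}

// Hypothetical helper: joins the URL's hostname and the sanitized title.
// The "_" separator is an assumption, not documented behavior.
function archiveFolderName(pageUrl: string, title: string): string {
  const domain = new URL(pageUrl).hostname;
  const safeTitle = sanitizeTitle(title);
  return safeTitle ? `${domain}_${safeTitle}` : domain;
}

console.log(archiveFolderName("https://example.com/docs", "Example Domain: Home!"));
// prints "example.com_example-domain-home"
```

Restricting the title to lowercase letters, digits, and hyphens keeps the folder name valid on common file systems, at the cost of dropping non-ASCII characters.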
Archibald is licensed under the MIT License. Feel free to use, modify, and distribute it as you wish.
For any issues or questions, please open an issue on the GitHub repository.