Capturing two URLs are not being properly read by Webrecorder Player? #27
My guess is that you are not writing a single warc info record, using writeWebrecorderBookmarksInfoRecord, that contains all the URLs of the pages you wish to be viewable via WR Player. To fix that, you can wait to append that record until the very end, after capturing all the pages, or view them using pywb, which has no such restriction.
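As a rough illustration of that suggestion (a sketch only, not node-warc's API): the capture code can remember every page it navigates to and hand the complete list to the info record once, at the very end. `VisitedPages` is a hypothetical helper invented for this example, and the actual call to writeWebrecorderBookmarksInfoRecord is left as a comment because its exact arguments should be checked against the node-warc docs.

```javascript
// Hypothetical helper: remembers every top-level page visited during one
// capture session, in first-visit order, ignoring repeat visits.
class VisitedPages {
  constructor () {
    this._urls = new Set()
  }

  add (url) {
    this._urls.add(url)
  }

  toArray () {
    return Array.from(this._urls)
  }
}

const visited = new VisitedPages()

// In a real capture these calls would come from navigation events.
visited.add('https://www.drupal.org/')
visited.add('https://www.drupal.org/developers')
visited.add('https://www.drupal.org/') // revisit, not added twice

// Only once all pages are captured: build the full list and write a
// single info record from it (library call omitted on purpose).
const pages = visited.toArray()
console.log(pages)
```

The point of the design is that the list is built incrementally but consumed exactly once, so the info record always reflects every captured page.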
Hmm, oddly I also tried pywb, but it didn't display anything. Will look. I am basically just capturing everything from a webview tag. I navigated to just one URL and then stored all packages to a WARC file, almost the same as with the remoteChromeGenerator.
Have you tried using puppeteer rather than electron?
Ultimately the best advice I can give without seeing how you are doing the capturing (either src or minimal working example) is to treat each page as a standalone WARC that is either appended to a single WARC or written to its own WARC with concatenation done afterwards.
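The "concatenation done afterwards" option works because a WARC file is just a sequence of records, so per-page files can be joined byte for byte. A minimal sketch using only Node's `fs` module, assuming uncompressed per-page WARCs; the file names and the placeholder record strings are made up for illustration and are not real WARC records:

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Scratch directory so the sketch does not touch real captures.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'warc-concat-'))

// Placeholder contents standing in for real per-page WARC files.
const perPage = ['home.warc', 'developers.warc'].map((name, i) => {
  const file = path.join(dir, name)
  fs.writeFileSync(file, `placeholder records for page ${i + 1}\r\n\r\n`)
  return file
})

// Join the per-page files into one WARC, appending in capture order.
// appendFileSync creates the target on the first call and never truncates it.
const combined = path.join(dir, 'combined.warc')
for (const file of perPage) {
  fs.appendFileSync(combined, fs.readFileSync(file))
}

const combinedText = fs.readFileSync(combined, 'utf8')
console.log(combinedText)
```

Writing to the combined file with a truncating write instead of an append would leave only the last page in it, which is the failure mode to watch for when capturing more than one page.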
If you can, here's my source: I'll eventually wrap this better; for now it is a PoC. I took your RemoteChromeWARCGenerator and RemoteChromeRequestCapturer and swapped the network interface for Electron's Debugger, which gave me access to the same events, so it should be basically the same. The writing of the WARC file follows your Chrome example on the project's page. I only tried puppeteer for a quick test and might do a better one next week, but I would have expected it to work.
😱 I didn't see them or know they were there! Sorry. From a quick look at the code, it looks like I ended up doing something very similar. Will try it anyway to see if I get the same WARC. I am not capturing
This is the WARC file I got: warc.zip
It should have both https://www.drupal.org/ and https://www.drupal.org/developers. I see them in the WARC file. Will try yours anyway and see what I do. Thanks, might get back properly next week.
I am doing that manually from a context menu, so basically I just wait a reasonable while and trigger it:
// cap (the request capturer) and debug (the webview's debugger) are created elsewhere
const menuItem2 = new MenuItem({
  label: 'Warc it yo!',
  click: (menuItem, browserWindow, event) => {
    const warcGen = new DebuggerWARCGenerator()
    console.log(cap)
    warcGen.generateWARC(cap, debug, {
      warcOpts: {
        warcPath: 'myWARC.warc'
      },
      winfo: {
        description: 'I created a warc!',
        isPartOf: 'My awesome electron1 collection'
      }
    })
  }
})
I just tried this and got the exact same behavior. Maybe I am missing something about the WARC file that is currently beyond me, but I will probably get to it soon. This was mainly to make sure this is a workable solution, which it definitely is. If there's something to follow up on here, or something you'd like me to help debug to get to the root of this, let me know; I'd rather have this small thing working.
You are not adding the pages array, and the WARC is not being written in appending mode. See the electron generator docs for more details. Correcting those issues should help you get your desired results.
See also https://github.com/N0taN3rd/Squidwarc/blob/next/lib/crawler/chrome.js for an example of WARC generation.
I successfully (I think) captured and generated a warc file using https://electronjs.org/docs/api/debugger.
I tried a simple site: www.drupal.org
If I capture the first load, it seems to work nicely; Webrecorder Player shows it perfectly.
However, if I navigate to "Developers" and then store both the homepage and this page in the WARC file, it doesn't seem to work. I do see the data in the WARC file, though.
I guess something is missing from the WARC file, or I am missing something. Any ideas?
Other than that, I am super happy to see this working. It might even be worth contributing this WARC generator to this package.