Capturing two URLs are not being properly read by Webrecorder Player? #27

hanoii · 2019-01-18T21:41:42Z

I successfully (I think) captured and generated a warc file using https://electronjs.org/docs/api/debugger.

I tried a simple site: www.drupal.org

If I capture the first load, it seems to work nicely; Webrecorder Player shows it perfectly.

However, if I navigate to "Developers" and then store both the homepage and this page in the WARC file, it doesn't seem to work. I can see the data in the WARC file, though.

I guess something is missing from the WARC file, or I am missing something. Any ideas?

Other than that, I am super happy to see this working. It might even be worth contributing this WARC generator to this package.

N0taN3rd · 2019-01-18T22:14:45Z

node-warc welcomes all contributions!

My guess is that you are not writing a single warcinfo record, using writeWebrecorderBookmarksInfoRecord, that contains all the URLs of the pages you wish to be viewable via WR Player.

To fix that, you can wait to append that record until the very end, after capturing all the pages, or view them using pywb, which has no such restriction.
Ultimately, WR Player and WR itself use pywb as the replay system.

hanoii · 2019-01-18T22:17:50Z

Hmm, oddly I also tried pywb, but it didn't display anything. Will look. I am basically just capturing everything from a webview tag; I navigated to just one URL and then stored all the captured requests in a WARC file, almost the same way as with the remoteChromeGenerator.

N0taN3rd · 2019-01-18T22:21:21Z

Have you tried using puppeteer rather than electron?
I have found that using a full browser, either Chrome or Chromium (brought in via puppeteer), controlled via puppeteer or chrome-remote-interface, produces better results and is easier to use.

N0taN3rd · 2019-01-18T22:24:14Z

Ultimately the best advice I can give without seeing how you are doing the capturing (either src or minimal working example) is to treat each page as a standalone WARC that is either appended to a single WARC or written to its own WARC with concatenation done afterwards.
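The "own WARC per page, concatenate afterwards" option can be sketched as below. This is a minimal illustration, assuming each page has already been written to its own file; `concatWarcs` is a hypothetical helper, not part of node-warc. It relies on the fact that WARC records are self-delimiting, so files can be joined byte for byte.

```javascript
const fs = require('fs')

/**
 * Join per-page WARC files into one combined WARC, in the order given.
 * Hypothetical helper for illustration; not a node-warc API.
 */
function concatWarcs (pageWarcPaths, combinedPath) {
  // Start from an empty file so re-running does not duplicate records
  fs.writeFileSync(combinedPath, '')
  for (const pagePath of pageWarcPaths) {
    // Append the raw bytes of each per-page WARC unchanged
    fs.appendFileSync(combinedPath, fs.readFileSync(pagePath))
  }
  return combinedPath
}
```

For example, `concatWarcs(['home.warc', 'developers.warc'], 'combined.warc')` would produce a single file holding the records of both pages (the file names here are placeholders).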

hanoii · 2019-01-18T22:39:42Z

If you can, here's my source:

https://pastebin.com/61bBUiyg

I'll eventually wrap this better; for now it is a PoC.

I took out your RemoteChromeWARCGenerator and RemoteChromeRequestCapturer and changed the network interface to Electron's Debugger, which gave me access to the same events. So it should be basically the same.

The writing of the warc file is as per your example for chrome on the project's page.

I only tried puppeteer for a quick test; I might do a better one next week, but I would have expected this to work.

N0taN3rd · 2019-01-18T22:47:18Z

Did the electron request capturer and writer not work for you?

hanoii · 2019-01-18T22:56:21Z

😱 I didn't see them or know they were there! Sorry. From a quick look at the code, it looks like I ended up doing something very similar.

Will try it anyway to see if I get the same Warc.

I am not capturing maybeNetworkMessage though.

This is the warc file I got: warc.zip

It should have both https://www.drupal.org/ and https://www.drupal.org/developers

I see them on the warc file

Will try yours anyway and see what I do. Thanks, might get back properly next week.

N0taN3rd · 2019-01-18T23:05:53Z

maybeNetworkMessage is a utility function that saves you from having to add an additional message listener to the debugger 😄
As for the source code you shared, I cannot tell when you are writing to the WARC, and from what I can infer from the discussion here, the timing of that write is likely the reason for your issues.
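To illustrate the idea of a single entry point for debugger messages, here is a rough stand-in. `SketchCapturer` is hypothetical and is not node-warc's implementation; only the `maybeNetworkMessage(method, params)` shape and the two DevTools Protocol event names are taken as given.

```javascript
// Stand-in capturer for illustration only; NOT node-warc's implementation.
class SketchCapturer {
  constructor () {
    // requestId -> { request, response }
    this.requests = new Map()
  }

  // Single entry point: anything that is not a handled Network.* event is ignored
  maybeNetworkMessage (method, params) {
    if (method === 'Network.requestWillBeSent') {
      this.requests.set(params.requestId, { request: params.request })
    } else if (method === 'Network.responseReceived') {
      const entry = this.requests.get(params.requestId)
      if (entry) entry.response = params.response
    }
  }
}

// In Electron, one existing 'message' listener could forward everything:
//   webContents.debugger.on('message', (event, method, params) => {
//     cap.maybeNetworkMessage(method, params)
//   })
```

The point of the design is that the caller keeps one listener for all protocol messages and lets the capturer decide which ones matter.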

hanoii · 2019-01-18T23:12:14Z

I am doing that manually on a context menu, so basically I just wait a reasonable while and trigger it:

    const menuItem2 = new MenuItem({
      label: 'Warc it yo!',
      click: (menuItem, browserWindow, event) => {
        const warcGen = new DebuggerWARCGenerator()
        console.log(cap)
        warcGen.generateWARC(cap, debug, {
          warcOpts: {
            warcPath: 'myWARC.warc'
          },
          winfo: {
            description: 'I created a warc!',
            isPartOf: 'My awesome electron1 collection'
          }
        })
      }
    })

hanoii · 2019-01-21T19:12:22Z

> Did the electron request capturer and writer not work for you?

I just tried this and got the exact same behavior. Maybe I am missing something about the WARC file that is currently beyond me, but I will probably get to it soon. This was mainly to make sure this is a workable solution, which it definitely is.

If there's anything you want to suggest as a follow-up, or anything I can do to help debug this and get to the root of it, let me know; I'd rather have this small thing working.

N0taN3rd · 2019-01-22T00:04:38Z

You are not adding the pages array, and the warc is not being written to in appending mode.

See the electron generator docs for more details.

Correcting those issues should help you get your desired results ☺️
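A rough sketch of what "adding the pages array" and "appending mode" could look like on the caller's side. The `CaptureSession` helper is hypothetical, and the `appending` and `pages` key names are assumptions based on the comment above; the electron generator docs have the exact option names.

```javascript
/**
 * Tracks every page URL visited during a capture and builds the options
 * object handed to the WARC generator. Hypothetical helper, not node-warc API.
 */
class CaptureSession {
  constructor (warcPath) {
    this.warcPath = warcPath
    this.pages = []
  }

  // Record a navigated URL once, preserving visit order
  addPage (url) {
    if (!this.pages.includes(url)) this.pages.push(url)
  }

  buildGenOpts () {
    return {
      warcOpts: {
        warcPath: this.warcPath,
        appending: true // assumed key name: append rather than overwrite
      },
      winfo: {
        description: 'I created a warc!',
        isPartOf: 'My awesome electron1 collection'
      },
      pages: [...this.pages] // assumed key name: all URLs to list for the player
    }
  }
}
```

Calling `addPage` on each navigation and then passing `session.buildGenOpts()` as the last argument to `warcGen.generateWARC(cap, debug, ...)` would mirror the context-menu snippet earlier in the thread.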

N0taN3rd · 2019-01-22T00:24:16Z

See also https://github.com/N0taN3rd/Squidwarc/blob/next/lib/crawler/chrome.js for an example of warc generation

hanoii changed the title ~~Browsing away from an URL~~ Capturing two URLs are not being properly read by Webrecorder Player? Jan 18, 2019

hanoii mentioned this issue Jan 22, 2019

Things worth continuing working on thebrickfactory/electron-browser-archive#1

Open
