
Postprocessor options

Sohom Datta edited this page May 22, 2024 · 1 revision

Postprocessors are a set of programs written in Go that parse VisibleV8 logs and write the resulting data into a PostgreSQL database.

By default, the crawler does not apply postprocessors to the VisibleV8 logs, which is useful if you plan to process the logs independently after the experiment. In most scenarios, however, you will want the crawler to run postprocessors on the resulting logs in parallel. To do so, pass the -pp '<postprocessor>' flag to the crawl command. For example,

python3 ./scripts/vv8-cli.py crawl -u 'https://google.com' -pp 'Mfeatures'

will run the mega-features postprocessor (Mfeatures) on the logs. It is one of the main postprocessors and organizes the scripts and associated APIs obtained from a log into separate tables in the PostgreSQL instance.

To run multiple postprocessors, you can use:

python3 ./scripts/vv8-cli.py crawl -u 'https://google.com' -pp 'Mfeatures+adblock'

This will run the mega-features and adblock postprocessors on the logs at the same time. Postprocessor names are joined with a + sign.
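When crawls are launched from a script, the -pp value can be assembled from a list of postprocessor names. The following is a minimal sketch, not part of the VisibleV8 tooling: the helper name build_crawl_command is hypothetical, and it assumes only the -u and -pp flags and the + separator shown in the commands above.

```python
import shlex


def build_crawl_command(url, postprocessors=()):
    """Return the argument list for a crawl (hypothetical helper).

    Postprocessor names are joined with '+', matching the -pp syntax
    shown on this page. With no names, -pp is omitted so the crawler
    keeps its default behavior of not postprocessing the logs.
    """
    cmd = ["python3", "./scripts/vv8-cli.py", "crawl", "-u", url]
    if postprocessors:
        cmd += ["-pp", "+".join(postprocessors)]
    return cmd


if __name__ == "__main__":
    cmd = build_crawl_command("https://google.com", ["Mfeatures", "adblock"])
    # Print the shell-quoted command instead of running it.
    print(shlex.join(cmd))
```

The list returned by the helper can be passed to subprocess.run from the crawler checkout directory if the crawl should be started directly.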

A list of all currently available postprocessors is maintained in the VisibleV8 repository, where they are called aggregators.
