To help you get started quickly with `zq`, this repository contains small sample sets of Zeek data. There are four different log formats available, all representing events based on the same network traffic:
Directory | Format |
---|---|
zeek-default/ | Zeek default output format |
zeek-ndjson/ | Newline-delimited JSON (NDJSON), as output by the Zeek package for JSON Streaming Logs |
zng/ | ZNG, the default output format of zq |
bzng/ | BZNG, the binary output format of zq |
The examples in the `zq` documentation are based on this sample data.
This sample data set was generated from a subset of the packet capture archives that are distributed by the WRCCDC.
This sample data is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, as it is built upon the WRCCDC PCAP data that is distributed under the same license.
We would like to express our thanks to the WRCCDC for generously making their packet capture archives available to the public and for commercial use. The terabytes of "real world" data have been invaluable to us in testing the foundations of `zq` at scale.
The data set was made from several PCAP files in the 2018 set. Zeek v3.0.0 was used in its default configuration, with the only change being the addition and enabling of the JSON Streaming Logs package. The packet captures were then processed with the following command line:
# zeek -r wrccdc.2018-03-24.101533000000000.pcap -r wrccdc.2018-03-24.101551000000000.pcap -r wrccdc.2018-03-24.101610000000000.pcap -r wrccdc.2018-03-24.101629000000000.pcap -r wrccdc.2018-03-24.101737000000000.pcap -r wrccdc.2018-03-24.101939000000000.pcap -r wrccdc.2018-03-24.102051000000000.pcap -r wrccdc.2018-03-24.102126000000000.pcap -r wrccdc.2018-03-24.102233000000000.pcap -r wrccdc.2018-03-24.102443000000000.pcap -r wrccdc.2018-03-24.102602000000000.pcap -r wrccdc.2018-03-24.102643000000000.pcap -r wrccdc.2018-03-24.102717000000000.pcap -r wrccdc.2018-03-24.102733000000000.pcap -r wrccdc.2018-03-24.102747000000000.pcap -r wrccdc.2018-03-24.102831000000000.pcap -r wrccdc.2018-03-24.102920000000000.pcap -r wrccdc.2018-03-24.103009000000000.pcap -r wrccdc.2018-03-24.103049000000000.pcap -r wrccdc.2018-03-24.103117000000000.pcap -r wrccdc.2018-03-24.103152000000000.pcap -r wrccdc.2018-03-24.103210000000000.pcap -r wrccdc.2018-03-24.103224000000000.pcap -r wrccdc.2018-03-24.103256000000000.pcap -r wrccdc.2018-03-24.103420000000000.pcap -r wrccdc.2018-03-24.103630000000000.pcap local
This produced the logs in Zeek default and NDJSON formats. As ZNG/BZNG are not yet output directly by Zeek, these logs were created by sending each Zeek default log through `zq`, e.g.:
# mkdir -p zng && \
for file in zeek-default/*
do
  zq -f zng "$file" \
    | gzip -n > zng/"$(basename "$file" | sed 's/\.log\.gz//')".zng.gz
done
# mkdir -p bzng && \
for file in zeek-default/*
do
  zq -f bzng "$file" \
    | gzip -n > bzng/"$(basename "$file" | sed 's/\.log\.gz//')".bzng.gz
done
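The two loops above differ only in the format name, so they could be folded into a single helper. The following is a minimal sketch and not a script from this repository: the function name `convert_logs` is hypothetical, and it assumes the same `zeek-default/*.log.gz` layout and `zq -f` usage shown above.

```shell
# Hypothetical helper: convert every Zeek default log into the given zq
# output format (e.g. zng or bzng), mirroring the loops shown above.
convert_logs() {
  format="$1"
  mkdir -p "$format" || return 1
  for file in zeek-default/*.log.gz
  do
    # Skip the unexpanded glob pattern if no logs are present.
    [ -e "$file" ] || continue
    name="$(basename "$file" .log.gz)"
    # gzip -n leaves the original name and timestamp out of the gzip header.
    zq -f "$format" "$file" | gzip -n > "$format/$name.$format.gz"
  done
}
```

With this in place, `convert_logs zng && convert_logs bzng` would be expected to produce the same directory layout as the two loops.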
Since the sample ZNG/BZNG logs are generated by `zq`, regenerating these outputs is a useful `zq` test. Assuming `zq` is in your `$PATH`, a script is provided to regenerate the hash for each ZNG/BZNG log and compare it to a last known "good" hash stored in the `md5sums/` directory.
# scripts/check_md5sums.sh zng
capture_loss:157aea9c046a836d94361327d84e0747
...
x509:6f1ba1e08588282b32e6d28b562686af
diff'ing current "zq -f zng" output hashes vs. committed hashes:
======> No diffs found. zng outputs have not changed.
# scripts/check_md5sums.sh bzng
capture_loss:62949d22a0a557342d28ee5ee4b64d50
...
x509:10333d3d004c718b04cbedb8ee195cca
diff'ing current "zq -f bzng" output hashes vs. committed hashes:
7c7
< ftp:c84824c8114df4db745399ff875b0d92
---
> ftp:2d8d90df3c4b84eb9e281a3f10767aa5
======> diffs detected! Check for a zq bug or intentional bzng format change.
Current hashes are in /var/folders/yn/jbkxxkpd4vg142pc3_bd_krc0000gn/T/tmp.9X7Gab9I
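The check performed in the runs above can be pictured roughly as follows. This is only a sketch of the general idea and not the contents of `scripts/check_md5sums.sh`: the function names are hypothetical, the `name:hash` line layout is inferred from the output above, and the use of `md5sum` and of a single file of expected hashes are assumptions.

```shell
# Hypothetical sketch: print "name:md5" for the zq output of each Zeek
# default log. Assumes zq is in $PATH and the md5sum utility is available.
hash_logs() {
  format="$1"
  for file in zeek-default/*.log.gz
  do
    # Skip the unexpanded glob pattern if no logs are present.
    [ -e "$file" ] || continue
    name="$(basename "$file" .log.gz)"
    sum="$(zq -f "$format" "$file" | md5sum | cut -d' ' -f1)"
    echo "$name:$sum"
  done
}

# Compare freshly computed hashes with a saved list of known "good" hashes.
# Returns 0 when they match and 1 when diff reports a difference.
compare_hashes() {
  format="$1"
  expected="$2"   # assumed: a file holding one name:hash pair per line
  current="$(mktemp)"
  hash_logs "$format" > "$current"
  if diff "$expected" "$current"
  then
    echo "No diffs found. $format outputs have not changed."
    rm -f "$current"
  else
    echo "diffs detected! Current hashes are in $current"
    return 1
  fi
}
```

Leaving the temporary file in place on failure mirrors the behavior seen above, where the path to the current hashes is printed so they can be inspected.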