Sample Data

To help you get started quickly with zq, this repository contains small sample sets of Zeek data. There are five different log formats available, all representing events based on the same network traffic:

| Directory | Format |
|---|---|
| zeek-default/ | Zeek default output format |
| zeek-ndjson/ | Newline-delimited JSON (NDJSON), as output by the Zeek package for JSON Streaming Logs |
| zng/ | Binary ZNG, output with zq's default LZ4-compressed format |
| zng-uncompressed/ | Binary ZNG, output with zq's option -znglz4blocksize 0 to disable compression |
| tzng/ | TZNG, the ZNG text output format of zq |

The examples in the zq documentation are based on this sample data.
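
For a first look at the text-based logs without zq, standard tools are enough. The sketch below assumes the files are gzip-compressed, as the `.log.gz` suffix handled in the Creation section suggests. The helper name `peek_log` and the `conn.log.gz` file name in the usage comment are illustrative and not part of this repository.

```shell
# peek_log FILE [COUNT]: print the first COUNT lines (default 5) of a
# gzip-compressed text log. Hypothetical helper, not part of this repository.
peek_log() {
  gzip -dc "$1" | head -n "${2:-5}"
}

# Example (file name assumed for illustration):
#   peek_log zeek-default/conn.log.gz 10
```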

Downloading

Because prior changes to the ZNG/TZNG output formats have added some bulk to the revision history, you'll typically want to save time by just downloading the latest revision:

# git clone --depth=1 https://github.com/brimsec/zq-sample-data.git

Origin/License

This sample data set was generated from a subset of the packet capture archives that are distributed by the WRCCDC.

This sample data is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, as it is built upon the WRCCDC PCAP data that is distributed under the same license.

Acknowledgement

We would like to express our thanks to the WRCCDC for generously making their packet capture archives available to the public and for commercial use. The terabytes of "real world" data have been invaluable to us in testing the foundations of zq at scale.

Creation

The data set was made from several PCAP files in the 2018 set. Zeek v3.0.0 was used in its default configuration, with the only change being the addition and enabling of the JSON Streaming Logs package. The packet captures were then processed with the following command line:

# zeek -r wrccdc.2018-03-24.101533000000000.pcap -r wrccdc.2018-03-24.101551000000000.pcap -r wrccdc.2018-03-24.101610000000000.pcap -r wrccdc.2018-03-24.101629000000000.pcap -r wrccdc.2018-03-24.101737000000000.pcap -r wrccdc.2018-03-24.101939000000000.pcap -r wrccdc.2018-03-24.102051000000000.pcap -r wrccdc.2018-03-24.102126000000000.pcap -r wrccdc.2018-03-24.102233000000000.pcap -r wrccdc.2018-03-24.102443000000000.pcap -r wrccdc.2018-03-24.102602000000000.pcap -r wrccdc.2018-03-24.102643000000000.pcap -r wrccdc.2018-03-24.102717000000000.pcap -r wrccdc.2018-03-24.102733000000000.pcap -r wrccdc.2018-03-24.102747000000000.pcap -r wrccdc.2018-03-24.102831000000000.pcap -r wrccdc.2018-03-24.102920000000000.pcap -r wrccdc.2018-03-24.103009000000000.pcap -r wrccdc.2018-03-24.103049000000000.pcap -r wrccdc.2018-03-24.103117000000000.pcap -r wrccdc.2018-03-24.103152000000000.pcap -r wrccdc.2018-03-24.103210000000000.pcap -r wrccdc.2018-03-24.103224000000000.pcap -r wrccdc.2018-03-24.103256000000000.pcap -r wrccdc.2018-03-24.103420000000000.pcap -r wrccdc.2018-03-24.103630000000000.pcap local

This produced the logs in Zeek default and NDJSON formats. Because Zeek does not yet output ZNG/TZNG directly, the ZNG/TZNG logs were created by sending each Zeek default log through zq, e.g.:

# mkdir -p zng && \
for file in zeek-default/*
do
  zq -f zng "$file" \
      | gzip -n > zng/"$(basename "$file" | sed 's/\.log\.gz//')".zng.gz
done

# mkdir -p zng-uncompressed && \
for file in zeek-default/*
do
  zq -f zng -znglz4blocksize 0 "$file" \
      | gzip -n > zng-uncompressed/"$(basename "$file" | sed 's/\.log\.gz//')".zng.gz
done

# mkdir -p tzng && \
for file in zeek-default/*
do
  zq -f tzng "$file" \
      | gzip -n > tzng/"$(basename "$file" | sed 's/\.log\.gz//')".tzng.gz
done
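
The three loops above differ only in the output directory, the file extension, and the zq flags, so they could be folded into one helper. The following is a minimal sketch of that idea, not the procedure actually used to build the data set. The function names `log_stem` and `convert_logs` are hypothetical, and the sed expression gains a `$` anchor so that only a trailing `.log.gz` is stripped.

```shell
# log_stem FILE: strip the directory and a trailing ".log.gz",
# e.g. zeek-default/conn.log.gz -> conn
log_stem() {
  basename "$1" | sed 's/\.log\.gz$//'
}

# convert_logs OUTDIR EXT [ZQ_ARGS...]: hypothetical wrapper around the loops
# above. Runs zq with ZQ_ARGS on every file in zeek-default/ and writes the
# gzip-compressed result to OUTDIR/<stem>.EXT.gz.
convert_logs() {
  outdir=$1
  ext=$2
  shift 2
  mkdir -p "$outdir" || return 1
  for file in zeek-default/*; do
    zq "$@" "$file" | gzip -n > "$outdir/$(log_stem "$file").$ext.gz"
  done
}

# Usage, mirroring the three invocations above:
#   convert_logs zng zng -f zng
#   convert_logs zng-uncompressed zng -f zng -znglz4blocksize 0
#   convert_logs tzng tzng -f tzng
```

Passing the zq flags through as trailing arguments keeps the helper independent of any particular output format.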

Testing

Since the sample ZNG/TZNG logs are generated by zq, regenerating these outputs is a useful zq test. Assuming zq is in your $PATH, a script is provided to regenerate the hash for each ZNG/TZNG log and compare it to a last known "good" hash stored in the md5sums/ directory.

Example successful output:

# scripts/check_md5sums.sh tzng
capture_loss:157aea9c046a836d94361327d84e0747
...
x509:6f1ba1e08588282b32e6d28b562686af

diff'ing current "zq -f tzng" output hashes vs. committed hashes:

  ======> No diffs found. tzng outputs have not changed.

Example output when a format change has been flagged:

# scripts/check_md5sums.sh zng
capture_loss:62949d22a0a557342d28ee5ee4b64d50
...
x509:10333d3d004c718b04cbedb8ee195cca

diff'ing current "zq -f zng" output hashes vs. committed hashes:
7c7
< ftp:c84824c8114df4db745399ff875b0d92
---
> ftp:2d8d90df3c4b84eb9e281a3f10767aa5

  ======> diffs detected! Check for a zq bug or intentional zng format change.
          Current hashes are in /var/folders/yn/jbkxxkpd4vg142pc3_bd_krc0000gn/T/tmp.9X7Gab9I
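
The general idea behind such a check can be sketched in a few lines of shell. This is not the contents of scripts/check_md5sums.sh, only an illustration under stated assumptions: the function names are hypothetical, and the file names and the layout of md5sums/ in the usage comment are guesses.

```shell
# md5_hex: print the MD5 digest of stdin as bare hex.
# Uses md5sum where available (Linux), otherwise md5 -q (macOS/BSD).
md5_hex() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum | cut -d' ' -f1
  else
    md5 -q
  fi
}

# hash_line NAME: read log data on stdin and print "NAME:<md5>",
# matching the style of the output shown above.
hash_line() {
  printf '%s:%s\n' "$1" "$(md5_hex)"
}

# compare_hashes CURRENT COMMITTED: diff two files of NAME:<md5> lines
# and report the result. Returns non-zero when they differ.
compare_hashes() {
  if diff "$1" "$2"; then
    echo "No diffs found."
  else
    echo "diffs detected!"
    return 1
  fi
}

# Usage (file names and the layout of md5sums/ are assumptions):
#   zq -f tzng zeek-default/conn.log.gz | hash_line conn >> current.txt
#   compare_hashes current.txt md5sums/tzng
```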