A list of generic tools for parsing binary data structures, such as file formats, network protocols or bitstreams

dloss/binary-parsing

Awesome Binary Parsing

A curated collection of tools and resources for parsing and analyzing binary data structures, such as file formats, network protocols or bitstreams.

Libraries and Tools by Programming Language

Python

  • Construct: library for parsing and building of data structures (binary or textual). Define your data structures in a declarative manner
  • Hachoir: view and edit a binary stream field by field. Long list of parsers for all kinds of formats
  • Caterpillar: Python 3.12+ library to pack and unpack structured binary data
  • Scapy: send, sniff, dissect and forge network packets. Usable interactively or as a library
  • Mr. Crowbar: Django-esque model framework for reading and writing binary file formats. Includes a suite of command-line tools for visualising and digging through binary data
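As a rough illustration of the declarative style that libraries in this list aim for, the sketch below describes a fixed-size header once and derives both the parser and the builder from that single description. It uses only Python's standard `struct` module, not the API of any library listed above. The header layout (a 4-byte magic, a 16-bit version and a 16-bit payload length, little-endian) and the names `HEADER_FORMAT`, `Header`, `parse_header` and `build_header` are hypothetical, chosen for this example only.

```python
import struct
from collections import namedtuple

# Hypothetical 8-byte header, little-endian:
#   4s -> 4-byte magic, H -> unsigned 16-bit version, H -> unsigned 16-bit length
HEADER_FORMAT = "<4sHH"
Header = namedtuple("Header", ["magic", "version", "length"])


def parse_header(data: bytes) -> Header:
    """Unpack the header fields from the start of `data`."""
    size = struct.calcsize(HEADER_FORMAT)
    if len(data) < size:
        raise ValueError(f"need at least {size} bytes, got {len(data)}")
    return Header(*struct.unpack_from(HEADER_FORMAT, data))


def build_header(header: Header) -> bytes:
    """Pack a Header back into bytes using the same format description."""
    return struct.pack(HEADER_FORMAT, *header)


sample = b"DEMO\x01\x00\x10\x00" + b"payload..."
header = parse_header(sample)
print(header)
```

Because parsing and building share one format string, a round trip through `build_header(parse_header(data))` reproduces the original header bytes. The libraries above generalize this idea to nested, variable-length and bit-level structures.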

JavaScript

  • Binary-parser: binary parser builder library which enables you to write efficient parsers in a simple & declarative way
  • jBinary: High-level API for working with binary data.

C/C++

  • Hammer (C): bit-oriented parsing library
  • FormatFuzzer (C++): framework for high-efficiency, high-quality generation and parsing of binary inputs
  • Marpa (C/C++, Perl, Go): libmarpa (C)
  • Wuffs: a memory-safe programming language (and a standard library written in that language) for Wrangling Untrusted File Formats Safely. Wrangling includes parsing, decoding and encoding.
  • libtins (C++): crafting, sending, sniffing and interpreting raw network packets
  • libcrafter (C++): high-level C++ library for creating and decoding network packets

Java

Go

  • restruct: library for reading and writing binary data

Rust

  • Nom: Rust parser combinator framework
  • Deku: bit-level, symmetric, serialization/deserialization implementations for structs and enums
  • binrw: binrw helps you write maintainable & easy-to-read declarative binary data readers and writers using ✨macro magic✨.

Ruby

  • BinData: provides a declarative way to read and write structured binary data

Other Programming Languages

  • FlexT (Delphi): a DSL and a tool for generating parsers in Delphi
  • Haka (Lua): open source, security-oriented language that lets you describe protocols and apply security policies to (live) captured traffic
  • binarylang (Nim): extensible Nim DSL for creating binary parsers/encoders in a symmetric fashion
  • binaryparse (Nim): In-language DSL for reading and writing binary data supporting all sorts of patterns. Generates an efficient stream based reader and writer for the runtime execution.
  • Gloss (Clojure): turn complicated byte formats into Clojure data structures and Clojure data structures into compact byte representations
  • scodec (Scala): Combinator library for working with binary data
  • attoparsec and attoparsec-binary (Haskell): fast parser combinator library, aimed particularly at dealing efficiently with network protocols and complicated text/binary file formats
  • Parsifal (OCaml): OCaml-based parsing engine. Paper: A pragmatic solution to the binary parsing problem. Olivier Levillain

Language-Agnostic Tools

Binary Format Description Languages

  • Kaitai Struct (DSL): declarative language used to describe various binary data structures, laid out in files or in memory
  • RecordFlux: toolset for the formal specification of messages and the generation of verifiable binary parsers and message generators (Ada-inspired).
  • Spicy (DSL, C/C++, Zeek): a next-generation parser generator for network protocols and file formats
  • DataScript Tools (DSL): DataScript is a formal language for modelling binary datatypes, bitstreams or file formats. PDF
  • Dogma (DSL): human-friendly metalanguage for describing data formats in documentation using the familiar patterns of Backus-Naur Form.
  • EverParse: a framework for generating verified secure parsers and formatters from domain-specific format specification languages

Standalone Applications

Hex Editors with Grammars

  • Synalyze It! (macOS): hex editor with grammar-based binary format parsing
  • Hexinator (Windows): hex editor with grammar-based binary format parsing
  • 010 Editor (Windows/macOS/Linux): hex editor with C-style binary templates and large template library
  • Kiewtai: plugin for the Hiew hex editor that makes the Kaitai parsers available
  • Hobbits: multi-platform GUI for bit-based analysis, processing, and visualization. Has a Kaitai plugin.
  • ImHex (Windows/macOS/Linux): A Hex Editor for Reverse Engineers, Programmers and people who value their retinas when working at 3 AM.

Binary Analysis Tools

  • GNU poke: The extensible editor for structured binary data
  • fq: jq for binary formats - tool, language and decoders for working with binary and text formats
  • radare2 (C, with bindings/pipe for almost all languages): Unix-like reverse engineering framework and command-line tools. See Parsing a fileformat with radare2 and Types.
  • Veles: open source tool for binary analysis

Network Protocol Analysis

  • Wireshark: network protocol analyzer that includes dissectors for over two thousand protocols.

    • TShark: command line version, can easily be called from shell scripts.
    • Wireshark Generic Dissector: add-on, allows dissection of a protocol based on a text description of the protocol elements
    • Wireshark Lua: dissectors can be written in Lua (Examples)
    • pyreshark: plugin providing a simple interface for writing Wireshark dissectors in Python
    • Sharktools (Python, Matlab): Tools for programmatic parsing of packet captures using Wireshark functionality
  • Netzob: open source tool for reverse engineering, traffic generation and fuzzing of communication protocols

  • Cat Karat Packet Builder: packet generation tool that lets you build custom packets for firewall or target testing

  • Scapy: send, sniff, dissect and forge network packets.

Research papers

  • Interval Parsing Grammars for File Format Parsing (2023): Jialun Zhang, Greg Morrisett, Gang Tan
  • LangSec Platform (2021): Towards a Platform to Compare Binary Parser Generators. Olivier Levillain, Sébastien Naud, Aina Toky Rasoamanana (Video)
  • Narcissus (2019): Correct-By-Construction Derivation of Decoders and Encoders from Binary Formats. Benjamin Delaware, Sorawit Suriyakarn, Clément Pit-Claudel, Qianchuan Ye, Adam Chlipala
  • EverParse (2019): Verified Secure Zero-Copy Parsers for Authenticated Message Formats. Tahina Ramananandro et al.
  • Generic packet descriptions (2017): Verified parsing and pretty printing of low-level data. Marcell van Geest, Wouter Swierstra
  • Nail (2014): A Practical Tool for Parsing and Generating Data Formats. Julian Bangert and Nickolai Zeldovich
  • FlowSifter (2014): High-Speed Application Protocol Parsing and Extraction for Deep Flow Inspection. Alex X. Liu, Chad R. Meiners, Eric Norige, and Eric Torng
  • Zebra (2013): Improving the Performance of Message Parsers for Embedded Systems. Jigar Solanki et al.
  • W. Underwood (2012): Grammar-Based Specification and Parsing of Binary File Formats. William Underwood
  • Yakker (2010): Semantics and Algorithms for Data-dependent Grammars. Trevor Jim, Yitzhak Mandelbaum, David Walker
  • Zebu (2009): A Language-Based Approach for Improving the Robustness of Network Application Protocol Implementations. Laurent Burgy et al.
  • z2z (2009): Automatic Generation of Network Protocol Gateways. Yerom-David Bromberg, Laurent Reveillere, Julia L. Lawall, Gilles Muller
  • Tupni (2008): Automatic Reverse Engineering of Input Formats. Weidong Cui et al.
  • PADS/ML (2007): a functional data description language. Y. Mandelbaum, K. Fisher, D. Walker, M. F. Fernandez, and A. Gleyzer.
  • BinPAC (2006): Superseded by BinPAC++, which is now known as Spicy
  • NetPDL (2006): Markup Language that aims to describe Protocols from OSI layer 2 to OSI layer 7
  • TSN.1 (2005): Transfer Syntax Notation One (TSN.1). A formal notation for describing messages in binary protocols
  • GAPA (2005): Generic Application-Level Protocol Analyzer and its Language. Nikita Borisov, David J. Brumley, Helen J. Wang, Chuanxiong Guo
  • PacketTypes (2000): P. J. McCann and S. Chandra. Packet types: Abstract specification of network protocol messages.

Binary Format References

Related Topics
