In small block forensics (SBF), the goal is to determine whether any content from a small dataset of known content exists on a large target drive.
This project approximates the SBF technique. It takes two directories as input (a target directory and a known content directory) and uses randomized small-block sampling to detect whether any file from the known content directory is present in the target directory. For a visual intro to small block forensics, see this PDF deck.
View a video explanation of the project here: demo.mp4
This application supports three tasks:
The gen_hash task generates a SQLite DB containing a hash of every block in a source directory.
- Known Content Directory: A directory containing the files/folders of known content.
- Output SQL Path: The path at which to save the SQLite DB of known content hashes.
- Block Size: The block size in bytes to be used for hashing. Defaults to 4096.
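
As a rough illustration of what this task involves, the sketch below walks a directory, splits each file into fixed-size blocks, hashes each block, and stores the distinct hashes in a SQLite table. The function name `build_hash_db`, the table name `known_hashes`, its single `hash` column, and the choice of SHA-256 are assumptions made for this example and are not taken from the project's code.

```python
import hashlib
import os
import sqlite3


def build_hash_db(known_dir: str, output_sql: str, block_size: int = 4096) -> int:
    """Hash every block of every file under known_dir into a SQLite DB.

    Returns the number of distinct block hashes stored.
    """
    conn = sqlite3.connect(output_sql)
    try:
        # Hypothetical schema: one row per distinct block hash.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS known_hashes (hash TEXT PRIMARY KEY)"
        )
        for root, _dirs, files in os.walk(known_dir):
            for name in files:
                with open(os.path.join(root, name), "rb") as fh:
                    # Read one block at a time; the final block may be short.
                    while block := fh.read(block_size):
                        digest = hashlib.sha256(block).hexdigest()
                        # Duplicate blocks are stored only once.
                        conn.execute(
                            "INSERT OR IGNORE INTO known_hashes (hash) VALUES (?)",
                            (digest,),
                        )
        conn.commit()
        (count,) = conn.execute("SELECT COUNT(*) FROM known_hashes").fetchone()
        return count
    finally:
        conn.close()
```

Storing each hash as a primary key keeps lookups fast and means repeated blocks (for example, runs of zero bytes) take up a single row.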
The hash_random task hashes the blocks of a target directory and compares them with the hashes stored in an existing SQLite DB.
- Target Directory: The directory containing files/folders of the content to analyze.
- Input SQL: The path to the existing SQLite DB containing hashes of known content.
- Block Size: The block size in bytes to be used for hashing. Defaults to 4096.
- Target Probability: The desired probability of detecting known content. A higher value means more of the target drive is scanned. Defaults to 0.95.
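
To make the link between the target probability and the amount of scanning concrete, here is a minimal sketch under a stated assumption: blocks are sampled uniformly at random without replacement, and `known_blocks` of the `total_blocks` on the target match known content. Under that model, the chance of missing every matching block after n samples is the product of (N - M - i) / (N - i) for i from 0 to n - 1. The function name and parameters are hypothetical, and the project may compute its sample size differently.

```python
def blocks_to_sample(
    total_blocks: int, known_blocks: int, target_probability: float = 0.95
) -> int:
    """Smallest number of sampled blocks that reaches the target probability.

    Assumes uniform sampling without replacement from total_blocks,
    of which known_blocks match known content.
    """
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must be strictly between 0 and 1")
    if not 0 < known_blocks <= total_blocks:
        raise ValueError("known_blocks must be in the range 1..total_blocks")

    miss = 1.0  # probability that every sample so far missed known content
    for n in range(1, total_blocks + 1):
        already_drawn = n - 1
        # Chance that the n-th draw also lands on a non-matching block.
        miss *= (total_blocks - known_blocks - already_drawn) / (
            total_blocks - already_drawn
        )
        if 1.0 - miss >= target_probability:
            return n
    return total_blocks
```

Under this model, a target holding only a few matching blocks requires reading most of the drive, while known content that occupies many blocks is likely to be hit after comparatively few samples.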
The gen_hash_random task hashes the blocks of the known content and compares them with hashes generated from the target directory.
- Target Directory: The directory containing files/folders of the content to analyze.
- Known Content Directory: The directory containing the files/folders of known content.
- Output SQL Path: The path at which to save the SQLite DB of known content hashes.
- Block Size: The block size in bytes to be used for hashing. Defaults to 4096.
- Target Probability: The desired probability of detecting known content. A higher value means more of the target drive is scanned. Defaults to 0.95.
- Runtime: Because of the experimental nature of this project, the runtime is not guaranteed. Please make a backup of your data before running this application.
- Install pipenv
pip install pipenv
- Activate the venv
pipenv shell
- Install dependencies
pipenv install
python -m small_blk_forensics.backend.server
Pre-requisite: start the server in the background.
python client_example.py
Run SBF on a known content directory and target directory
python cmd_interface.py gen_hash_random \
--output_sql ./examples/out/known_content_hashes.sqlite \
--target_directory ./examples/target_directory \
--known_content_directory ./examples/known_content_directory \
--block_size 4
Generate a SQLite DB containing hashes of all the blocks within a source directory
python cmd_interface.py gen_hash \
--output_sql ./examples/out/known_content_hashes.sqlite \
--known_content_directory ./examples/known_content_directory \
--block_size 4
Run SBF on a pre-generated SQLite DB of known content hashes and a target directory
python cmd_interface.py hash_random \
--input_sql ./examples/out/known_content_hashes.sqlite \
--target_directory ./examples/target_directory \
--block_size 4
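
For readers curious about what a run like the one above might do internally, the following sketch samples random blocks from a single target file and looks each hash up in a SQLite DB. It is an illustration only: the function name, the `known_hashes` table with a `hash` column of SHA-256 hex digests, and the per-file sampling strategy are all assumptions, not the project's actual implementation.

```python
import hashlib
import os
import random
import sqlite3
from typing import Optional


def file_contains_known_block(
    target_file: str,
    input_sql: str,
    block_size: int = 4096,
    sample_count: int = 100,
    seed: Optional[int] = None,
) -> bool:
    """Return True if any randomly sampled block of target_file is in the DB."""
    rng = random.Random(seed)
    # Number of blocks in the file, counting a short final block (ceiling division).
    total_blocks = -(-os.path.getsize(target_file) // block_size)
    if total_blocks == 0:
        return False
    # Sample block indices without replacement.
    picks = rng.sample(range(total_blocks), min(sample_count, total_blocks))

    conn = sqlite3.connect(input_sql)
    try:
        with open(target_file, "rb") as fh:
            for index in picks:
                fh.seek(index * block_size)
                digest = hashlib.sha256(fh.read(block_size)).hexdigest()
                # Hypothetical schema: known_hashes(hash TEXT PRIMARY KEY).
                row = conn.execute(
                    "SELECT 1 FROM known_hashes WHERE hash = ? LIMIT 1",
                    (digest,),
                ).fetchone()
                if row is not None:
                    return True
        return False
    finally:
        conn.close()
```

Returning on the first hit reflects the goal stated at the top of this document: establishing that some known content exists, not enumerating every match.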
Running black, isort, flake8 and mypy:
pipenv install --dev
make format