add mypy type checker #746

Draft: wants to merge 9 commits into base: main
26 changes: 4 additions & 22 deletions .github/workflows/ci.yml
@@ -1,98 +1,80 @@
name: CI

# only run tests for pull requests cause no file has to be changed without review
# open -> open the pull request
# synchronize -> push to branch of pull request
on:
pull_request:
types: [opened, synchronize]

jobs:
test:
uses: ./.github/workflows/testing.yml

build-docs:
runs-on: ubuntu-24.04

strategy:
matrix:
python-version: ["3.11"]

steps:
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: "pip"

- name: Install dependencies
run: |
sudo apt-get update && sudo apt-get -y install pandoc
pip install --upgrade pip wheel
pip install .[doc]

- name: build docs
run: |
cd doc
sphinx-apidoc -fT -o source/module_reference ../logprep
make clean html

code-quality:
runs-on: ubuntu-24.04

strategy:
matrix:
python-version: ["3.11"]

steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0

- uses: azure/[email protected]
with:
version: "latest"
id: install

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: "pip"

- name: Get changed python files
id: changed-files
uses: tj-actions/changed-files@v41
with:
files: |
**/*.py

- name: Install dependencies
run: |
pip install --upgrade pip wheel
pip install .[dev]

- name: check black formatting
run: |
black --check --diff --config ./pyproject.toml .

- name: lint helm charts
run: |
helm lint --strict ./charts/logprep

- name: lint changed and added files
if: steps.changed-files.outputs.all_changed_files
run: |
pylint --rcfile=.pylintrc --fail-under 9.5 ${{ steps.changed-files.outputs.all_changed_files }}

pylint --fail-under 9.5 ${{ steps.changed-files.outputs.all_changed_files }}
- name: mypy type checking
if: steps.changed-files.outputs.all_changed_files
run: mypy --follow-imports=skip ${{ steps.changed-files.outputs.all_changed_files }}
- name: Run tests and collect coverage
run: pytest tests/unit --cov=logprep --cov-report=xml

- name: Upload coverage reports to Codecov with GitHub Action
uses: codecov/codecov-action@v2

containerbuild:
uses: ./.github/workflows/container-build.yml
secrets: inherit
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -24,3 +24,8 @@ repos:
rev: v0.15.0
hooks:
- id: yamlfmt
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.14.1
hooks:
- id: mypy
args: [--follow-imports=skip]
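For context, the kind of mistake this hook and the CI step are meant to catch can be sketched as follows. The function `shout` is hypothetical and not taken from logprep. If the `None` check were removed, mypy would be expected to complain that the `None` part of `Optional[str]` has no attribute `upper`, while the code would only fail at runtime when `None` is actually passed.

```python
from typing import Optional


def shout(text: Optional[str]) -> str:
    """Return the upper-cased text, or an empty string for None."""
    # Hypothetical helper for illustration, not part of logprep.
    # The explicit None check narrows Optional[str] to str for the type checker.
    if text is None:
        return ""
    return text.upper()
```

Note that with `--follow-imports=skip`, imported modules that are not passed on the command line are not analysed, so types coming from them are effectively treated as `Any`.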
1 change: 1 addition & 0 deletions .vscode/extensions.json
@@ -9,6 +9,7 @@
"ms-python.pylint",
"ms-python.isort",
"ms-toolsai.jupyter",
"ms-python.mypy-type-checker",
"njpwerner.autodocstring",
"ryanluker.vscode-coverage-gutters",
"streetsidesoftware.code-spell-checker"
6 changes: 3 additions & 3 deletions logprep/abc/input.py
@@ -8,7 +8,7 @@
import zlib
from abc import abstractmethod
from copy import deepcopy
- from functools import partial, cached_property
+ from functools import cached_property, partial
from hmac import HMAC
from typing import Optional, Tuple
from zoneinfo import ZoneInfo
@@ -91,9 +91,9 @@ class TimeDeltaConfig:
"""TimeDelta Configurations
Works only if the preprocessor log_arrival_time_target_field is set."""

- target_field: field(validator=[validators.instance_of(str), lambda _, __, x: bool(x)])
+ target_field: field(validator=(validators.instance_of(str), lambda _, __, x: bool(x)))
"""Defines the fieldname to which the time difference should be written to."""
- reference_field: field(validator=[validators.instance_of(str), lambda _, __, x: bool(x)])
+ reference_field: field(validator=(validators.instance_of(str), lambda _, __, x: bool(x)))
"""Defines a field with a timestamp that should be used for the time difference.
The calculation will be the arrival time minus the time of this reference field."""
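The calculation described in the docstring above (arrival time minus reference time) can be sketched with the standard library. This is a minimal illustration, not logprep's implementation: the function name `time_delta_seconds` is hypothetical, and expressing the result in seconds is an assumption.

```python
from datetime import datetime, timezone


def time_delta_seconds(arrival: datetime, reference: datetime) -> float:
    """Return arrival minus reference as seconds (unit is an assumption)."""
    # Hypothetical helper; both timestamps are expected to be timezone-aware.
    return (arrival - reference).total_seconds()


arrival = datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc)
reference = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
delta = time_delta_seconds(arrival, reference)  # 5.0
```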

6 changes: 3 additions & 3 deletions logprep/abc/processor.py
@@ -6,7 +6,7 @@
from pathlib import Path
from typing import TYPE_CHECKING, List, Optional

- from attr import define, field, validators
+ from attrs import define, field, validators

from logprep.abc.component import Component
from logprep.framework.rule_tree.rule_tree import RuleTree
@@ -96,12 +96,12 @@ class Config(Component.Config):
As last option it is possible to define entire rules with all their configuration parameters as list elements.
"""
tree_config: Optional[str] = field(
- default=None, validator=[validators.optional(validators.instance_of(str))]
+ default=None, validator=(validators.optional(validators.instance_of(str)))
)
"""Path to a JSON file with a valid :ref:`Rule Tree Configuration`.
For string format see :ref:`getters`."""
apply_multiple_times: Optional[bool] = field(
- default=False, validator=[validators.optional(validators.instance_of(bool))]
+ default=False, validator=(validators.optional(validators.instance_of(bool)))
)
"""Set if the processor should be applied multiple times. This enables further processing
of an output with the same processor."""
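A note on the replacement syntax in this hunk: Python treats parentheses around a single expression as plain grouping, so `validator=(validators.optional(...))` passes the validator object itself rather than a one-element tuple. Only a trailing comma or several items produce a tuple. A minimal sketch using a hypothetical stand-in function instead of a real validator:

```python
def is_positive(value: int) -> bool:
    """Hypothetical stand-in for a validator callable."""
    return value > 0


# Parentheses around one expression only group it; no tuple is created.
single = (is_positive)

# A trailing comma makes a one-element tuple.
one_tuple = (is_positive,)

# Two or more comma-separated items make a tuple as well.
pair = (is_positive, callable)

single_is_plain_function = single is is_positive
one_tuple_is_tuple = isinstance(one_tuple, tuple)
pair_is_tuple = isinstance(pair, tuple)
```

For the single-validator fields this means the parentheses are purely cosmetic, while the multi-validator fields really do change from a list to a tuple.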
2 changes: 1 addition & 1 deletion logprep/connector/file/input.py
@@ -212,7 +212,7 @@ class Config(Input.Config):
format. Needs to be parsed with dissector or another processor"""

start: str = field(
- validator=[validators.instance_of(str), validators.in_(("begin", "end"))],
+ validator=(validators.instance_of(str), validators.in_(("begin", "end"))),
default="begin",
)
"""Defines the behaviour of the file monitor with the following options:
12 changes: 6 additions & 6 deletions logprep/connector/opensearch/output.py
@@ -109,23 +109,23 @@ class Config(Output.Config):
"""(Optional) Timeout after :code:`message_backlog` is flushed if
:code:`message_backlog_size` is not reached."""
thread_count: int = field(
- default=4, validator=[validators.instance_of(int), validators.gt(1)]
+ default=4, validator=(validators.instance_of(int), validators.gt(1))
)
"""Number of threads to use for bulk requests."""
queue_size: int = field(
- default=4, validator=[validators.instance_of(int), validators.gt(1)]
+ default=4, validator=(validators.instance_of(int), validators.gt(1))
)
"""Number of queue size to use for bulk requests."""
chunk_size: int = field(
- default=500, validator=[validators.instance_of(int), validators.gt(1)]
+ default=500, validator=(validators.instance_of(int), validators.gt(1))
)
"""Chunk size to use for bulk requests."""
max_chunk_bytes: int = field(
- default=100 * 1024 * 1024, validator=[validators.instance_of(int), validators.gt(1)]
+ default=100 * 1024 * 1024, validator=(validators.instance_of(int), validators.gt(1))
)
"""Max chunk size to use for bulk requests. The default is 100MB."""
max_retries: int = field(
- default=3, validator=[validators.instance_of(int), validators.gt(0)]
+ default=3, validator=(validators.instance_of(int), validators.gt(0))
)
"""Max retries for all requests. Default is 3."""
desired_cluster_status: list = field(
@@ -134,7 +134,7 @@ class Config(Output.Config):
"""Desired cluster status for health check as list of strings. Default is ["green"]"""
default_op_type: str = field(
default="index",
- validator=[validators.instance_of(str), validators.in_(["create", "index"])],
+ validator=(validators.instance_of(str), validators.in_(["create", "index"])),
)
"""Default op_type for indexing documents. Default is 'index',
Consider using 'create' for data streams or to prevent overwriting existing documents."""
12 changes: 7 additions & 5 deletions logprep/generator/http/input.py
@@ -14,21 +14,23 @@
from typing import Generator, List

import msgspec
- import yaml
- from attr import define, field, validators
+ from attrs import define, field, validators
+ from ruamel.yaml import YAML

from logprep.generator.http.manipulator import Manipulator

+ yaml = YAML(typ="safe")


@define(kw_only=True)
class TimestampReplacementConfig:
"""Configuration Class for TimestampReplacement"""

- key: str = field(validator=[validators.instance_of(str)])
+ key: str = field(validator=(validators.instance_of(str)))
format: str = field(validator=validators.instance_of(str))
time_shift: str = field(
default="+0000",
- validator=[validators.instance_of(str), validators.matches_re(r"[+-]\d{4}")],
+ validator=(validators.instance_of(str), validators.matches_re(r"[+-]\d{4}")),
)
time_delta: timedelta = field(
default=None, validator=validators.optional(validators.instance_of(timedelta))
@@ -148,7 +150,7 @@ def _load_event_class_config(self, event_class_dir_path: str) -> EventClassConfi
"""Load the event class specific configuration"""
config_path = os.path.join(event_class_dir_path, "config.yaml")
with open(config_path, "r", encoding="utf8") as file:
- event_class_config = yaml.safe_load(file)
+ event_class_config = yaml.load(file)
self.log.debug("Following class config was loaded: %s", event_class_config)
event_class_config = EventClassConfig(**event_class_config)
if "," in event_class_config.target_path:
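As a side note on the `time_shift` validator in the first hunk of this file: the pattern `[+-]\d{4}` accepts a sign followed by exactly four digits. The sketch below reproduces that check with the standard library `re` module as a stand-in. It assumes full-string matching, which is, as far as I know, what attrs `matches_re` does unless another match function is passed. The helper name `is_valid_time_shift` is hypothetical.

```python
import re

# Same pattern as in the time_shift validator of TimestampReplacementConfig.
TIME_SHIFT_PATTERN = re.compile(r"[+-]\d{4}")


def is_valid_time_shift(value: str) -> bool:
    """Return True if the whole string is a sign followed by four digits."""
    # Hypothetical helper; fullmatch requires the entire string to match.
    return TIME_SHIFT_PATTERN.fullmatch(value) is not None


accepted = [is_valid_time_shift(v) for v in ("+0000", "-0130")]
rejected = [is_valid_time_shift(v) for v in ("0000", "+00000", "+01:30")]
```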
4 changes: 1 addition & 3 deletions logprep/processor/calculator/rule.py
@@ -101,9 +101,7 @@ class CalculatorRule(FieldManagerRule):
class Config(FieldManagerRule.Config):
"""Config for Calculator"""

- calc: str = field(
- validator=[validators.instance_of(str), validators.min_len(3)],
- )
+ calc: str = field(validator=(validators.instance_of(str), validators.min_len(3)))
"""The calculation expression. Fields from the event can be used by
surrounding them with :code:`${` and :code:`}`."""
source_fields: list = field(factory=list, init=False, repr=False, eq=False)
2 changes: 1 addition & 1 deletion logprep/processor/generic_resolver/rule.py
@@ -116,7 +116,7 @@ class Config(FieldManagerRule.Config):
]
)
"""Mapping in form of :code:`{SOURCE_FIELD: DESTINATION_FIELD}`"""
- resolve_list: dict = field(validator=[validators.instance_of(dict)], factory=dict)
+ resolve_list: dict = field(validator=(validators.instance_of(dict)), factory=dict)
"""lookup mapping in form of
:code:`{REGEX_PATTERN_0: ADDED_VALUE_0, ..., REGEX_PATTERN_N: ADDED_VALUE_N}`"""
resolve_from_file: dict = field(
4 changes: 2 additions & 2 deletions logprep/processor/labeler/processor.py
@@ -26,7 +26,7 @@

from typing import Optional

- from attr import define, field, validators
+ from attrs import define, field, validators

from logprep.abc.processor import Processor
from logprep.processor.labeler.labeling_schema import LabelingSchema
@@ -41,7 +41,7 @@ class Labeler(Processor):
class Config(Processor.Config):
"""Labeler Configurations"""

- schema: str = field(validator=[validators.instance_of(str)])
+ schema: str = field(validator=(validators.instance_of(str)))
"""Path to a labeling schema file. For string format see :ref:`getters`."""
include_parent_labels: Optional[bool] = field(
default=False, validator=validators.optional(validator=validators.instance_of(bool))
4 changes: 2 additions & 2 deletions logprep/processor/pre_detector/rule.py
@@ -176,11 +176,11 @@ class Config(Rule.Config): # pylint: disable=too-many-instance-attributes
timestamp_field: str = field(validator=validators.instance_of(str), default="@timestamp")
"""the field which has the given timestamp to be normalized defaults to :code:`@timestamp`"""
source_timezone: ZoneInfo = field(
- validator=[validators.instance_of(ZoneInfo)], converter=ZoneInfo, default="UTC"
+ validator=(validators.instance_of(ZoneInfo)), converter=ZoneInfo, default="UTC"
)
""" timezone of source_fields defaults to :code:`UTC`"""
target_timezone: ZoneInfo = field(
- validator=[validators.instance_of(ZoneInfo)], converter=ZoneInfo, default="UTC"
+ validator=(validators.instance_of(ZoneInfo)), converter=ZoneInfo, default="UTC"
)
""" timezone for target_field defaults to :code:`UTC`"""
failure_tags: list = field(
6 changes: 3 additions & 3 deletions logprep/processor/pseudonymizer/processor.py
@@ -115,7 +115,7 @@ class Config(FieldManager.Config):
* /var/git/logprep-rules/pseudonymizer_rules/regex_mapping.json
"""
max_cached_pseudonyms: int = field(
- validator=[validators.instance_of(int), validators.gt(0)]
+ validator=(validators.instance_of(int), validators.gt(0))
)
"""
The maximum number of cached pseudonyms. One cache entry requires ~250 Byte, thus 10
@@ -127,12 +127,12 @@ class Config(FieldManager.Config):
entry is deleted. Has to be greater than 0.
"""
max_cached_pseudonymized_urls: int = field(
- validator=[validators.instance_of(int), validators.gt(0)], default=10000
+ validator=(validators.instance_of(int), validators.gt(0)), default=10000
)
"""The maximum number of cached pseudonymized urls. Default is 10000.
Behaves similarly to the max_cached_pseudonyms. Has to be greater than 0."""
mode: str = field(
- validator=[validators.instance_of(str), validators.in_(("GCM", "CTR"))], default="GCM"
+ validator=(validators.instance_of(str), validators.in_(("GCM", "CTR"))), default="GCM"
)
"""Optional mode of operation for the encryption. Can be either 'GCM' or 'CTR'.
Default is 'GCM'.
2 changes: 1 addition & 1 deletion logprep/processor/requester/rule.py
@@ -150,7 +150,7 @@ class Config(FieldManagerRule.Config):
)
""" (Optional) The http headers as dictionary."""
auth: tuple = field(
- validator=[validators.instance_of(tuple)],
+ validator=(validators.instance_of(tuple)),
converter=tuple,
factory=tuple,
)
4 changes: 2 additions & 2 deletions logprep/processor/timestamper/rule.py
@@ -122,11 +122,11 @@ class Config(FieldManagerRule.Config):
a tag :code:`_timestamper_failure` will be added to the event.
"""
source_timezone: ZoneInfo = field(
- validator=[validators.instance_of(ZoneInfo)], converter=ZoneInfo, default="UTC"
+ validator=(validators.instance_of(ZoneInfo)), converter=ZoneInfo, default="UTC"
)
""" timezone of source_fields. defaults to :code:`UTC`"""
target_timezone: ZoneInfo = field(
- validator=[validators.instance_of(ZoneInfo)], converter=ZoneInfo, default="UTC"
+ validator=(validators.instance_of(ZoneInfo)), converter=ZoneInfo, default="UTC"
)
""" timezone for target_field. defaults to :code:`UTC`"""
mapping: dict = field(default="", init=False, repr=False, eq=False)
8 changes: 4 additions & 4 deletions logprep/util/configuration.py
@@ -346,15 +346,15 @@ class LoggerConfig:
The log level of the root logger should be set to :code:`INFO` or higher in production environments
to avoid exposing sensitive information in the logs.
"""
- format: str = field(default="", validator=[validators.instance_of(str)], eq=False)
+ format: str = field(default="", validator=(validators.instance_of(str)), eq=False)
"""The format of the log message as supported by the :code:`LogprepFormatter`.
Defaults to :code:`"%(asctime)-15s %(name)-10s %(levelname)-8s: %(message)s"`.
.. autoclass:: logprep.util.logging.LogprepFormatter
:no-index:
"""
- datefmt: str = field(default="", validator=[validators.instance_of(str)], eq=False)
+ datefmt: str = field(default="", validator=(validators.instance_of(str)), eq=False)
"""The date format of the log message. Defaults to :code:`"%Y-%m-%d %H:%M:%S"`."""
loggers: dict = field(validator=validators.instance_of(dict), factory=dict)
"""The loggers loglevel configuration. Defaults to:
@@ -469,7 +469,7 @@ class Configuration:
Because of that ensure that the configuration endpoint is always available.
"""
process_count: int = field(
- validator=[validators.instance_of(int), validators.ge(1)], default=1, eq=False
+ validator=(validators.instance_of(int), validators.ge(1)), default=1, eq=False
)
"""Number of logprep processes to start. Defaults to :code:`1`."""
restart_count: int = field(
@@ -478,7 +478,7 @@ class Configuration:
"""Number of restarts before logprep exits. Defaults to :code:`5`.
If this value is set to a negative number, logprep will always restart immediately."""
timeout: float = field(
- validator=[validators.instance_of(float), validators.gt(0)], default=5.0, eq=False
+ validator=(validators.instance_of(float), validators.gt(0)), default=5.0, eq=False
)
"""Logprep tries to react to signals (like sent by CTRL+C) within the given time.
The time taken for some processing steps is not always predictable, thus it is not possible to