Merge branch 'master' into 370-create-proper-enums-for-job-statuses
michalkrzem authored Oct 17, 2024
2 parents e909efe + 0861024 commit b3ddd37
Showing 17 changed files with 120 additions and 106 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test_code.yml
Expand Up @@ -86,7 +86,7 @@ jobs:
with:
node-version: '12.x'
- name: install dependencies
run: npm install
run: yarn install
- name: build
run: npm build
test_unit:
Expand Down
4 changes: 0 additions & 4 deletions INSTALL.md
Expand Up @@ -20,8 +20,6 @@ mkdir samples
# expect files in ./samples directory, and keep index in ./index.
vim .env
docker compose up --scale daemon=3 # this will take a while
docker compose exec web python3 -m mquery.db
docker compose exec web python3 -m alembic upgrade head
```

- Good for testing mquery and production deployments on a single server
Expand All @@ -39,8 +37,6 @@ cd mquery
# expect files in ./samples directory, and keep index in ./index.
vim .env
docker compose -f docker-compose.dev.yml up # this will take a while
docker compose exec dev-web python3 -m mquery.db
docker compose exec dev-web python3 -m alembic upgrade head
```

- Good for development - all file changes will be picked up automatically.
Expand Down
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1 +1,2 @@
graft src/mqueryfront/dist
include src/alembic.ini
2 changes: 0 additions & 2 deletions README.md
Expand Up @@ -25,8 +25,6 @@ git clone https://github.com/CERT-Polska/mquery.git
cd mquery
vim .env # optional - change samples and index directory locations
docker compose up --scale daemon=3 # building the images will take a while
docker compose exec web python3 -m mquery.db
docker compose exec web python3 -m alembic upgrade head
```

The web interface should be available at `http://localhost`.
Expand Down
2 changes: 1 addition & 1 deletion deploy/docker/dev.frontend.Dockerfile
Expand Up @@ -3,5 +3,5 @@ FROM node:18 AS build
RUN npm install -g serve
COPY src/mqueryfront /app
WORKDIR /app
RUN npm install --legacy-peer-deps
RUN yarn install --legacy-peer-deps
CMD ["npm", "start"]
2 changes: 1 addition & 1 deletion deploy/docker/web.Dockerfile
Expand Up @@ -3,7 +3,7 @@ FROM node:18 AS build
RUN npm install -g serve
COPY src/mqueryfront /app
WORKDIR /app
RUN npm install --legacy-peer-deps && npm run build
RUN yarn install --legacy-peer-deps && npm run build

FROM python:3.10

Expand Down
3 changes: 2 additions & 1 deletion docs/README.md
Expand Up @@ -49,6 +49,7 @@ Relevant for people who want to run mquery in production or on a bigger scale.
- [On-disk format](./ondiskformat.md): Read if you want to understand ursadb's on
disk format (spoiler: many files are just JSON and can be inspected with vim).
- [Plugin system](./plugins.md): For filtering, processing and tagging files.
- [Database format](./redis.md): Information about the data stored in redis.
- [Database format](./database.md): Information about the data stored in the database.
- [Redis applications](./redis.md): Of historical interest, redis is used only for [rq](https://python-rq.org/) now.
- [User management](./users.md): Control and manage access to your mquery instance.
- [API](./api.md): Mquery exposes a simple API that you may use for your automation.
54 changes: 54 additions & 0 deletions docs/database.md
@@ -0,0 +1,54 @@
# How the data is stored in the database

Currently, a Postgres database stores the entities used by mquery.

With the default docker configuration, you can connect to the database
using the following oneliner:

```
sudo docker compose exec postgres psql -U postgres --dbname mquery
```

The following tables are defined:

### Job table (`job`)

Jobs are stored in the `job` table.

Every job has an ID, which is a random 12-character string like 2OV8UP4DUOWK (the
same string that is visible in URLs like http://mquery.net/query/2OV8UP4DUOWK).

Possible job statuses are:

* "new" - Completely new job.
* "inprogress" - Job that is in progress.
* "done" - Job that was finished
* "cancelled" - Job was cancelled by the user or failed
* "removed" - Job is hidden in the UI (TODO: remove this status in the future)
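
The statuses listed above could be modelled as a string-backed enum. The following is only a minimal sketch: the class name `JobStatus`, the helper `is_finished`, and the choice to treat "done" and "cancelled" as finished are assumptions made for illustration, not code taken from mquery.

```python
from enum import Enum


class JobStatus(str, Enum):
    """Hypothetical enum mirroring the status strings documented above."""

    NEW = "new"
    INPROGRESS = "inprogress"
    DONE = "done"
    CANCELLED = "cancelled"
    REMOVED = "removed"


def is_finished(status: JobStatus) -> bool:
    # Assumption: only "done" and "cancelled" mean that no more work is pending.
    return status in (JobStatus.DONE, JobStatus.CANCELLED)


# Parsing a raw string (e.g. a value read from the `job` table) validates it:
status = JobStatus("done")
```

Because the enum subclasses `str`, members compare equal to their plain string values, and an unknown string raises `ValueError` at parse time instead of spreading through the code.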

### Job agent table (`jobagent`)

It is a simple mapping between job_id and agent_id. Additionally, it keeps track
of how many tasks are still in progress for a given agent assigned to this job.

### Match table (`match`)

Matches represent files matched to a job.

Every match represents a single yara rule match (along with optional attributes
from plugins).

### AgentGroup table (`agentgroup`)

When scheduling jobs, mquery needs to know how many agent groups are
waiting for tasks. In most cases there is only one, but in a distributed environment
there may be more.

### Configuration table (`configentry`)

Represented by the `models.configentry.ConfigEntry` class.

For example, `plugin:TestPlugin` will store configuration for `TestPlugin` as a
dictionary. All plugins can expose their own arbitrary config options.

As a special case, `plugin:Mquery` keeps the configuration of mquery itself.
3 changes: 1 addition & 2 deletions docs/how-to/install-native.md
Expand Up @@ -46,7 +46,7 @@ we can start a web server.

```shell
cd /opt/mquery/src/mqueryfront
npm install --legacy-peer-deps
yarn install --legacy-peer-deps
npm run build
```

Expand Down Expand Up @@ -118,7 +118,6 @@ Now you need to create and configure a database
```shell
psql -c "CREATE DATABASE mquery"
source /opt/mquery/venv/bin/activate # remember, we need virtualenv
python3 -m mquery.db # initialize the mquery database
```

### Start everything
Expand Down
71 changes: 5 additions & 66 deletions docs/redis.md
@@ -1,77 +1,16 @@
# How the data is stored in redis

In older mquery versions, data was stored in Redis. In mquery
version 1.4.0 the data was migrated to PostgreSQL - see [database](./database.md).

Please note that all this is 100% internal, and shouldn't be relied on.
Data format in redis can and does change between mquery releases.

Right now mquery is in the process of migrating internal storage to Postgres.

### Why redis?

Because very early daemon was a trivial piece of code, and Redis as a job
queue was the easiest solution. Since then mquery got extended with (in
no particular order) batching, users, jobs, commands, task cancellations,
distributed agents, configuration, and more.

I have thus learned the hard way that Redis is not a good database.

Nevertheless, that ship has sailed. There are no plans of migrating mquery
to another database. What we can do is to document the current data format.

### Redis quickstart

To connect to redis use `redis-cli`. For docker compose use
`docker compose exec redis redis-cli`.
You can use `redis-cli` to connect to redis. With the default docker compose configuration,
use `docker compose exec redis redis-cli`.

Redis command documentation is pretty good and available at https://redis.io/commands/.

### Job table (`job`)

Jobs are stored in the `job` table.

Every job has ID, which is a random 12 character string like 2OV8UP4DUOWK (the
same string that is visible in urls like http://mquery.net/query/2OV8UP4DUOWK).

Possible job statuses are:

* "new" - Completely new job.
* "inprogress" - Job that is in progress.
* "done" - Job that was finished
* "cancelled" - Job was cancelled by the user or failed
* "removed" - Job is hidden in the UI (TODO: remove this status in the future)

### Match table (`match`)

Matches represent files matched to a job.

Every match represents a single yara rule match (along with optional attributes
from plugins).

### Agentjob objects (`agentjob:*`)

Agentjob is a simple String (but only used as an integer).

In distributed environment it's sometimes hard to say when exactly agent's job
is finished. To work around this, each agent keeps a number of pending tasks
using agentjob key. For example, for job `123456123456` and agent `default`, redis key
`agentjob:default:123456123456` will contain the number of pending tasks.

This only matters during the task execution and can be discarded after task is done.

### AgentGroup table (`agentgroup`)

When scheduling jobs, mquery needs to know how many agent groups are
waiting for tasks. In most cases there is only one, but in distributed environment
there may be more.

### Configuration table (`configentry`)

Represented by models.configentry.ConfigEntry class.

For example, `plugin:TestPlugin` will store configuration for `TestPlugin` as a
dictionary. All plugins can expose their own arbitrary config options.

As a special case `plugin:Mquery` keeps configuration of the mquery itself.

### Rq objects (`rq:*`)

Objects used internally by https://python-rq.org/, task scheduler used by mquery.
Expand Down
23 changes: 18 additions & 5 deletions docs/users.md
Expand Up @@ -9,7 +9,7 @@ Optional user management in mquery is role-based, and handled by OIDC.

## Role-based permissions

There are two predefined permission sets that can be assigned to users:
There are three predefined permission sets that can be assigned to users:

- `admin`: has access to everything, including management features.
Can change the service configuration, manage datasets, etc.
Expand All @@ -18,6 +18,8 @@ There are two predefined permission sets that can be assigned to users:
create new search jobs, see and cancel every job, and download
matched files. In current version, users can see and browse
all jobs in the system.
- `nobody`: empty role that gives no access to anything. Useful
for anonymous users.

Role names are considered stable, and will continue to work in the future.

Expand All @@ -35,6 +37,17 @@ may change in some future new version.
(**Note**: in the current version there is no isolation between users, and
users can view/stop/delete each other queries. This may change in the future)

## OIDC quickstart

In the `/config` section set:
* `auth_default_roles` to "nobody" or "user" (this is the role for anonymous users)
* `openid_client_id`, `openid_url`, `openid_secret` as required for your OIDC server
(secret should be an RS256 key)
* `auth_enabled` to "true" (do this last, to avoid locking yourself out).

If something goes wrong, you need to manually fix the config in the database
(to disable auth: `delete from configentry where key='auth_enabled'`).

## OIDC integration

Mquery doesn't implement user management. Instead, this is delegated
Expand Down Expand Up @@ -97,8 +110,8 @@ as necessary for your deployment.
**Warning**: the process is tricky, and right now it lacks proper validation.
It's possible to lock yourself out (by enabling auth before configuring it
correctly). If you do this, you have to disable auth manually, by running
`redis-cli` (`sudo docker compose exec redis redis-cli` for docker) and
executing `HMSET plugin:Mquery auth_enabled ""`.
`psql` (`sudo docker compose exec postgres psql -U postgres --dbname mquery` for docker) and
executing `delete from configentry where key='auth_enabled';`.

**Step 0 (optional): enable auth in non-enforcing mode**

Expand Down Expand Up @@ -146,8 +159,8 @@ Get it from `http://localhost:8080/auth/admin/master/console/#/realms/myrealm/ke

**Step 3: enable auth in enforcing mode**

- Go to the `config` page in mquery. Ensure `auth_default_roles` is
an empty string.
- Go to the `config` page in mquery. Change `auth_default_roles` to "user" or "nobody", depending on your needs.
- **Don't leave `auth_default_roles` empty**: for compatibility reasons, an empty value gives admin permissions to every user.
- Set `auth_enabled` to `true`

Final result:
Expand Down
2 changes: 2 additions & 0 deletions setup.py
Expand Up @@ -11,6 +11,8 @@
"mquery.lib",
"mquery.plugins",
"mquery.models",
"mquery.migrations",
"mquery.migrations.versions",
],
package_dir={"mquery": "src"},
include_package_data=True,
Expand Down
2 changes: 1 addition & 1 deletion src/alembic.ini
@@ -1,5 +1,5 @@
[alembic]
script_location = migrations
script_location = %(here)s/migrations
prepend_sys_path = .
version_path_separator = os # Use os.pathsep. Default configuration used for new projects.
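
The `%(here)s` token in the new `script_location` uses the standard `configparser` interpolation syntax. The sketch below reproduces the idea with the standard library only; it assumes that the `here` variable holds the directory containing the ini file, and the helper `resolve_script_location` and the path `/opt/mquery/src/alembic.ini` are hypothetical and used only for illustration.

```python
from configparser import ConfigParser
from pathlib import PurePosixPath


def resolve_script_location(ini_text: str, ini_path: PurePosixPath) -> str:
    """Resolve script_location with `here` set to the ini file's directory."""
    # Assumption: `here` is supplied as a parser default, so that
    # %(here)s expands to the directory of the config file.
    parser = ConfigParser(defaults={"here": str(ini_path.parent)})
    parser.read_string(ini_text)
    return parser.get("alembic", "script_location")


INI_TEXT = "[alembic]\nscript_location = %(here)s/migrations\n"

location = resolve_script_location(
    INI_TEXT, PurePosixPath("/opt/mquery/src/alembic.ini")
)
```

With a bare `migrations` value the location would depend on the current working directory; anchoring it to the ini file makes it independent of where the process is started.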

Expand Down
13 changes: 11 additions & 2 deletions src/app.py
@@ -1,3 +1,4 @@
from contextlib import asynccontextmanager
import os

import uvicorn # type: ignore
Expand Down Expand Up @@ -44,8 +45,15 @@
ServerSchema,
)


@asynccontextmanager
async def lifespan(app: FastAPI):
db.alembic_upgrade()
yield


db = Database(app_config.redis.host, app_config.redis.port)
app = FastAPI()
app = FastAPI(lifespan=lifespan)
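
The ordering that the lifespan hook provides can be illustrated without FastAPI. This is a simplified sketch using only the standard library: `FakeDatabase`, `events`, and `serve` are hypothetical stand-ins, and the `async with` block plays the role that the web framework would play around the application's lifetime.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import List

events: List[str] = []


class FakeDatabase:
    """Hypothetical stand-in for the Database class in the diff."""

    def alembic_upgrade(self) -> None:
        events.append("upgrade")


@asynccontextmanager
async def lifespan(app: object):
    # Runs once before requests are served; `db` is looked up at call time,
    # so it may be defined below this function, as in the diff.
    db.alembic_upgrade()
    yield
    # Code after the yield would run on shutdown.


db = FakeDatabase()


async def serve() -> None:
    # Stand-in for the framework entering and leaving the lifespan context.
    async with lifespan(app=None):
        events.append("serving")


asyncio.run(serve())
```

The migration step is recorded before the serving step, which is the property that lets the manual `alembic upgrade head` commands be dropped from the install instructions.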


def with_plugins() -> Iterable[PluginManager]:
Expand Down Expand Up @@ -180,7 +188,8 @@ def expand_role(role: str) -> List[str]:
"""Some roles imply other roles or permissions. For example, admin role
also gives permissions for all user permissions.
"""
role_implications = {
role_implications: Dict = {
"nobody": [],
"admin": [
"user",
"can_list_all_queries",
Expand Down
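
How an implication table like the one above might be expanded can be sketched as follows. This is an illustrative assumption, not the function body from `app.py` (which is truncated in this view): the `user` entry and the permission name `can_view_queries` are hypothetical, while `nobody`, `admin`, `user` and `can_list_all_queries` appear in the diff.

```python
from typing import Dict, List

ROLE_IMPLICATIONS: Dict[str, List[str]] = {
    "nobody": [],  # grants nothing beyond the role name itself
    "admin": ["user", "can_list_all_queries"],
    "user": ["can_view_queries"],  # hypothetical permission name
}


def expand_role(role: str) -> List[str]:
    """Return the role plus everything it implies, recursively."""
    implied = [role]
    for child in ROLE_IMPLICATIONS.get(role, []):
        implied.extend(expand_role(child))
    return implied


admin_permissions = expand_role("admin")
nobody_permissions = expand_role("nobody")
```

In this sketch `nobody` expands to itself only, which matches the documented intent of an empty role for anonymous users.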
18 changes: 8 additions & 10 deletions src/db.py
@@ -1,3 +1,6 @@
from alembic.config import Config
from alembic import command
from pathlib import Path
from collections import defaultdict
from contextlib import contextmanager
from typing import List, Optional, Dict, Any
Expand All @@ -9,7 +12,6 @@
from rq import Queue # type: ignore
from sqlmodel import (
Session,
SQLModel,
create_engine,
select,
and_,
Expand Down Expand Up @@ -337,7 +339,7 @@ def get_core_config(self) -> Dict[str, str]:
return {
# Authentication-related config
"auth_enabled": "Enable and force authentication for all users ('true' or 'false')",
"auth_default_roles": "Comma separated list of roles available to everyone (available roles: admin, user)",
"auth_default_roles": "Roles assigned to everyone - including anonymous users (available roles: admin, user, nobody)",
# OpenID Authentication config
"openid_url": "OpenID Connect base url",
"openid_client_id": "OpenID client ID",
Expand Down Expand Up @@ -410,11 +412,7 @@ def set_config_key(self, plugin_name: str, key: str, value: str) -> None:
session.add(entry)
session.commit()


def init_db() -> None:
engine = create_engine(app_config.database.url, echo=True)
SQLModel.metadata.create_all(engine)


if __name__ == "__main__":
init_db()
def alembic_upgrade(self) -> None:
config_file = Path(__file__).parent / "alembic.ini"
alembic_cfg = Config(str(config_file))
command.upgrade(alembic_cfg, "head")
2 changes: 1 addition & 1 deletion src/mqueryfront/src/config/ConfigEntries.js
Expand Up @@ -4,7 +4,7 @@ import api from "../api";

const R_BOOL = /^(|true|false)$/;
const R_URL = /^(https?:\/\/.*)$/;
const R_ROLES = /^((admin|user)(,(admin|user))*)?$/;
const R_ROLES = /^((admin|user|nobody)(,(admin|user|nobody))*)?$/;

const KNOWN_RULES = {
openid_url: R_URL,
Expand Down
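
The updated `R_ROLES` pattern can be exercised outside the frontend. Below is a hedged Python sketch of the same expression; the function name `is_valid_roles` is hypothetical, and Python's `re.fullmatch` is used in place of the explicit `^` and `$` anchors of the JavaScript version.

```python
import re

# Same alternation as the JavaScript R_ROLES; fullmatch supplies the anchoring.
ROLES_PATTERN = re.compile(r"((admin|user|nobody)(,(admin|user|nobody))*)?")


def is_valid_roles(value: str) -> bool:
    """Accept an empty string or a comma-separated list of known role names."""
    return ROLES_PATTERN.fullmatch(value) is not None


checks = {
    "": is_valid_roles(""),
    "nobody": is_valid_roles("nobody"),
    "admin,user": is_valid_roles("admin,user"),
    "user,": is_valid_roles("user,"),
    "root": is_valid_roles("root"),
}
```

Note that the pattern still accepts an empty string, so the form validation alone does not prevent the empty `auth_default_roles` value that the documentation warns against.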