
Open Proces Huis (microservices stack)

This repo holds the Docker Compose setup and all configuration files necessary to get Open Proces Huis running. It was started from the mu-project template and has since been heavily expanded.

Open Proces Huis is one of many applications developed under the Agentschap Binnenlands Bestuur (ABB), an agency of the Flemish Government. It allows lokale besturen (local governments) to create and share processes. A process is created by uploading a BPMN file, which is stored by the file-service and processed by the bpmn-service. The latter was developed specifically for Open Proces Huis: it extracts the BPMN elements from a given BPMN file and stores them as triples in the Virtuoso triplestore. To make all functionality available as a web application, a frontend was developed as well.

In addition to the aforementioned services, a range of others are also essential to the stack. All of them are listed in this overview, and can of course also be found in docker-compose.yml.

Getting started

First run

  1. Clone this repository
git clone https://github.com/lblod/app-openproceshuis.git
  2. Run the project
cd /path/to/project
docker compose -f docker-compose.yml -f docker-compose.dev.yml up
  3. Wait for the op-consumer to finish its initial ingest
docker compose logs -f op-consumer

When the logs show delta-sync-queue: Remaining number of tasks 0, you can move on.

  4. In your browser, go to localhost:8890/sparql and run the SPARQL query found in manual-query-reasoning-service.sparql.

Usage

  • You can access the frontend in your browser by going to localhost.
  • You can log in using a mock account by going to localhost/mock-login.
  • You can shut everything down by running docker compose down.
  • When restarting the project without having emptied the data/ folder, you can skip steps 3 and 4 under First run.
  • You can empty the database and file storage by running sudo rm -rf data/ (after which restarting the project requires steps 3 and 4 under First run again).

Data domain

All data is stored as triples in the Virtuoso triplestore. The default way of accessing this data, however, is through the REST API provided by the mu-cl-resources service. This service is driven by domain.lisp, which stipulates how the API classes map onto the resources in the triplestore. What follows is a visualization of the domain, made up of the different API classes, alongside the underlying RDF triples as they can be found in the triplestore.

Open Proces Huis data domain

The prefixes used in the diagram are equivalent to the ones used throughout the project. Their definitions can be found in repository.lisp.

The definition of Group can be found in auth.json, alongside all other classes that are necessary for user management (not visible in the diagram).

The BpmnElement class is in fact only an interface for the concrete BPMN element classes. These are mapped onto RDF resources that comply with the BPMN Based Ontology (BBO).
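For reference, resource definitions in a mu-cl-resources domain.lisp generally take the following shape. This is a hypothetical sketch, not the actual contents of domain.lisp: the class, properties, and URIs below are invented for illustration.

```lisp
;; Hypothetical sketch of a mu-cl-resources resource definition.
;; Names, predicates, and URIs are invented for illustration; see the
;; real domain.lisp for the actual Open Proces Huis domain.
(define-resource bpmn-file ()
  :class (s-prefix "nfo:FileDataObject")
  :properties `((:name :string ,(s-prefix "nfo:fileName"))
                (:format :string ,(s-prefix "dct:format")))
  :resource-base (s-url "http://data.example.org/files/")
  :on-path "bpmn-files")
```

Each definition ties a JSON:API resource type (exposed under :on-path) to an RDF class and a set of predicates, which is how the API classes in the diagram map onto the triples in the triplestore.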

Dispatching

Different services from the stack handle different HTTP requests. The mu-dispatcher service makes sure each request gets dispatched to the correct service. The exact dispatching rules are described in dispatcher.ex.
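Dispatching rules in a mu-dispatcher configuration generally look as follows. This is a hypothetical sketch; the paths and target services are invented, and the actual rules live in dispatcher.ex.

```elixir
# Hypothetical sketch of mu-dispatcher rules; paths and target
# services are invented for illustration.
match "/bpmn-elements/*path" do
  Proxy.forward conn, path, "http://resource/bpmn-elements/"
end

match "/files/*path" do
  Proxy.forward conn, path, "http://file/files/"
end
```

Each match clause forwards requests for a given path prefix to the internal hostname of the service responsible for it.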

Mu-search

Organizations typically have multiple processes, with each process covering one or more BPMN files, and each BPMN file holding multiple process steps. This results in a large number of BpmnElement resources being stored in the triplestore. Moreover, since all data is public, retrieving a list of process steps forces the query engine to go over all organizational graphs, which naturally leads to long waiting times for users. To make matters worse, the frontend's process steps route introduces filters (e.g. filtering out the process steps of archived processes) that translate to SPARQL queries with very long RDF paths, which the query engine needs to traverse for every BpmnElement it has found. To improve the user experience, it was therefore decided to use mu-search for the retrieval of process steps.

In essence, mu-search serves as a bridge between a triplestore and an Elasticsearch cluster. The triplestore remains the storage medium for the original linked data, but part of that data is also indexed by an Elasticsearch cluster, which allows for better retrieval performance as well as more expressive search queries. Which part of the original data should be indexed is defined in a configuration file.

Configuration

The OPH mu-search configuration file determines, on the one hand, how process steps and their necessary attributes should be stored, and on the other, which indexes should be created on start-up.

Types

Each element found in a BPMN process is considered a process step. In other words, multiple RDF types fall under the BpmnElement denominator. The configuration's types section defines that any resource of a given RDF type (all of which are BBO types) can be considered a bpmn-element and indexed as such. Based on what the frontend's process steps route needs, a series of properties is subsequently defined. This includes not only direct properties, but also nested ones:

  • name
  • type.label
  • type.key
  • bpmn-process.bpmn-file.name
  • bpmn-process.bpmn-file.status
  • bpmn-process.bpmn-file.processes.title
  • bpmn-process.bpmn-file.processes.status
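A types entry declaring such properties could look roughly as follows. This is a hypothetical, simplified sketch: the RDF type and predicate URIs are invented, nested paths like bpmn-process.bpmn-file.name are declared through further (nested) property definitions in the real file, and the exact syntax is determined by mu-search itself.

```json
{
  "types": [
    {
      "type": "bpmn-element",
      "on_path": "bpmn-elements",
      "rdf_type": "http://example.org/bbo#Task",
      "properties": {
        "name": "http://example.org/bbo#name"
      }
    }
  ]
}
```

In short, every resource of the listed rdf_type is indexed as a bpmn-element document whose fields are filled by following the given predicates.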

Indexes

Mu-search creates an Elasticsearch index for each authorization group. When it receives a request from an authorization group it already has an index for, it retrieves the requested data from that index. Otherwise, it first creates a new index by querying the triplestore for the data defined in the types section. However, since the triplestore typically holds an enormous number of process steps, this indexing step would take a long time, again hurting the user experience. For that reason, eager_indexing_groups are defined, forcing Elasticsearch to create an index for each of the given authorization groups when the service is spun up for the first time.

Since a large number of organizations can access OPH, this eager indexing step takes a long time to complete and takes up a considerable amount of storage space. This would needlessly bother developers spinning up an instance of the application locally. For that reason, the main configuration file (config/search/dev/config.json) only defines the public authorization groups. For production purposes, on the other hand, an extended configuration file (config/search/prd/config.json) is provided, containing all authorization groups that might at some point access OPH.
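In mu-search configurations, eager indexing groups are typically listed along these lines. This is a hypothetical sketch: the group names and variables are invented, and the real lists live in the configuration files mentioned above.

```json
{
  "eager_indexing_groups": [
    [ { "name": "public", "variables": [] } ],
    [ { "name": "org", "variables": ["ORG-UUID-1"] } ]
  ]
}
```

Each inner list corresponds to one authorization group combination for which an index is built up front instead of on first request.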

The production-intended configuration was generated by the generate-elastic-config-prd.rb script, and can be regenerated that way whenever necessary.

At the time of writing, 3541 indices were necessary for the production-intended configuration. Since Elasticsearch by default creates two shards per index, the required number of shards (2 × 3541 = 7082) significantly surpasses Elasticsearch's default maximum of 1000 shards. To raise this maximum to a sufficient 8000, the increase-shards-elastic.sh script can be run on the machine that needs to hold this large number of indexes.
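Raising the shard limit comes down to updating the cluster-wide setting cluster.max_shards_per_node via Elasticsearch's cluster settings API. The script presumably issues a request with a body along these lines (an assumption; consult the script itself for the exact call):

```json
{
  "persistent": {
    "cluster.max_shards_per_node": 8000
  }
}
```

Sent as a PUT to the _cluster/settings endpoint, this persistently raises the per-node shard ceiling from its default of 1000.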

The reset-elastic.sh script can be run to properly remove all elasticsearch indexes and associated data.

Overview of services