GOKb Data Model Explanation

Moritz Horn edited this page Sep 26, 2018 · 6 revisions

Fundamental

Parts Explosion

The GOKb datamodel is based upon the parts explosion pattern, and in its simplest form can be represented as follows:

The fundamental unit in the parts explosion (PE) model is the component (KBComponent), and the fundamental connectivity is provided by the Combo. A component may be recursively "exploded" by navigating any of its 1:M combo properties. Combo objects carry a discriminator property so that the relationship can be labelled (e.g. a Title -> Publisher combo might have the type "Title-publisher"). Combo relationships are considered to be transitive.
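The component/combo pairing can be sketched in a few lines of Python. This is only an in-memory illustration, not the GOKb implementation (which uses Hibernate domain classes): the attribute names `outgoing`, `from_component` and `to_component`, and the helpers `link` and `explode`, are assumptions made for this example. The sketch also assumes the combo graph has no cycles.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class KBComponent:
    name: str
    # Hypothetical name for the 1:M combo property of a component.
    outgoing: List["Combo"] = field(default_factory=list)


@dataclass(eq=False)
class Combo:
    type: str                      # discriminator, e.g. "Title-publisher"
    from_component: KBComponent
    to_component: KBComponent


def link(combo_type: str, from_c: KBComponent, to_c: KBComponent) -> Combo:
    """Connect two components without touching either class definition."""
    combo = Combo(combo_type, from_c, to_c)
    from_c.outgoing.append(combo)
    return combo


def explode(component: KBComponent, depth: int = 0) -> List[str]:
    """Recursively walk the combos of a component (assumes no cycles)."""
    lines = ["  " * depth + component.name]
    for combo in component.outgoing:
        lines.append("  " * (depth + 1) + f"[{combo.type}]")
        lines.extend(explode(combo.to_component, depth + 2))
    return lines


title = KBComponent("Title#1")
publisher = KBComponent("Publisher#1")
link("Title-publisher", title, publisher)
print("\n".join(explode(title)))
```

Because the relationship lives in the `Combo` rather than in a field on `KBComponent`, a new kind of relationship only needs a new discriminator value.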

Extended Entity - Relational / Object Relational Concepts

GOKb exploits the underlying Hibernate ORM by subclassing components as needed for specific application types. For example, a component can represent a TitleInstance, a Package, a Platform, a TIPP (see later), an Organisation, or many other fundamental system concepts. Class hierarchies are generally not used in the data model outside the KBComponent structure. There are some exceptions, though: just watch for the extends <> marker on objects in the domain directory.

When does GOKb use a Component vs a normal Object/Table?

Good question, and I'm afraid the answer is: it depends. Each case is a judgement call. The Component-Combo pair is extremely expressive, and it allows us to dynamically connect any component to any other component without changing the data model. This flexibility is extremely helpful in a dynamic situation. For example, the datamodel was recently extended to include eBook items. This was achieved with minimum impact because packages are composed fundamentally of components, so adding a component of "eBook" made it easy for collections of titles to include both books and journals. However, this flexibility comes at a cost: more complex queries, and the need to join through the combo table even when a simple join would have sufficed. That said, our experience with ERM is that most properties you think are scalar (e.g. the publisher of a journal) actually turn out to be collections over time. What we think of as the publisher is really a marker on the organisation currently publishing the title, and publishers can change over time.

Therefore, the choice to use the more expressive, slightly slower combo approach is not made lightly; it depends on the deeper case being modelled. If there is even a hint that what we think is a scalar is actually a collection, we usually opt for the combo method, unless performance indicates otherwise.
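The "scalar that is really a collection" case can be illustrated with the publisher example. The sketch below is an assumption-laden illustration, not GOKb code: the `status` field and its "Active"/"Expired" values are hypothetical stand-ins for whatever marker identifies the current publisher, and components are reduced to plain strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Combo:
    type: str
    from_component: str
    to_component: str
    status: str = "Active"  # hypothetical marker, not taken from the text above


# One title, two publishers over time: the "scalar" is really a collection.
COMBOS = [
    Combo("Title-publisher", "Title#1", "Publisher#1", status="Expired"),
    Combo("Title-publisher", "Title#1", "Publisher#2"),
]


def publishers_of(title: str, combos: List[Combo]) -> List[str]:
    """Every organisation that has ever been linked as publisher of the title."""
    return [c.to_component for c in combos
            if c.type == "Title-publisher" and c.from_component == title]


def current_publisher(title: str, combos: List[Combo]) -> Optional[str]:
    """The 'scalar' view: the publisher whose combo is still marked active."""
    active = [c.to_component for c in combos
              if c.type == "Title-publisher"
              and c.from_component == title
              and c.status == "Active"]
    return active[0] if active else None


print(publishers_of("Title#1", COMBOS))
print(current_publisher("Title#1", COMBOS))
```

Had the publisher been a plain column on the title, recording the change of publisher would have meant overwriting history or altering the schema.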

What's a TIPP?

A TIPP is a ternary relation that joins a Title Instance, a Package and a Platform. We have a business rule that TIPP objects are immutable: they can be replaced, but not (usually) edited. This rule was later softened to say that the fundamental relations in a TIPP are immutable, but properties can be amended or extended without creating a new TIPP. The concept was borrowed from the KB+ (Originally KB+) datamodel, where the arrangement WITHOUT the combo mechanism looks like this:
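The softened business rule (fixed relations, amendable properties) could be expressed along the following lines. This is a minimal sketch and not how GOKb enforces the rule: the class layout, the `properties` dictionary, the `replace` helper and the `url` property are all assumptions made for illustration.

```python
class Tipp:
    """Ternary relation joining a title, a package and a platform."""

    def __init__(self, title, package, platform, **properties):
        self._title = title
        self._package = package
        self._platform = platform
        # Non-relational properties may be amended or extended freely.
        self.properties = dict(properties)

    # Read-only properties: assigning to them raises AttributeError.
    @property
    def title(self):
        return self._title

    @property
    def package(self):
        return self._package

    @property
    def platform(self):
        return self._platform

    def replace(self, title=None, package=None, platform=None):
        """Changing a relation means creating a new TIPP, not editing this one."""
        return Tipp(title or self._title,
                    package or self._package,
                    platform or self._platform,
                    **self.properties)


tipp = Tipp("Title#1", "Package#1", "Platform#1", url="https://example.org/a")
tipp.properties["url"] = "https://example.org/b"   # allowed: amend a property
moved = tipp.replace(platform="Platform#2")        # allowed: replace the TIPP
```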

Why/How does this datamodel relate to other implementations / datastore technologies

The functional requirements of an ERM require access paths through the data that can start from any of the three entities that join to create the TIPP. Users want to see what titles are in a package or on a platform (Package -> TIPP -> Title or Platform -> TIPP -> Title). Sometimes users will ask "Show me all the packages this title is available in" (Title -> TIPP -> Package). ERMs commonly provide two core functions. The first is feeding configuration files to link resolvers, usually "Packages I have bought -> Packages -> TIPP -> Title", which gives a file of everything the link resolver should be able to look up. The other functions relate to back-office issues and to answering questions like "Find me all the titles I am paying for under two different subscriptions".

The need to approach the ternary relationship from three different starting points is a good match for the capabilities of relational engines (we've found PostgreSQL to perform really well, and MySQL to be perfectly adequate for sub-million-item setups).
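As a rough illustration of those access paths, the sketch below builds a plain (non-combo) TIPP table in an in-memory SQLite database and queries it from each of the three roots. The table and column names are hypothetical and are not the GOKb schema. The last query uses packages as a stand-in for subscriptions, which is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE title    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE package  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE platform (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tipp (
    id          INTEGER PRIMARY KEY,
    title_id    INTEGER NOT NULL REFERENCES title(id),
    package_id  INTEGER NOT NULL REFERENCES package(id),
    platform_id INTEGER NOT NULL REFERENCES platform(id)
);
-- One index per root, so each access path can start cheaply.
CREATE INDEX tipp_by_title    ON tipp(title_id);
CREATE INDEX tipp_by_package  ON tipp(package_id);
CREATE INDEX tipp_by_platform ON tipp(platform_id);

INSERT INTO title    VALUES (1, 'Journal A'), (2, 'Journal B');
INSERT INTO package  VALUES (1, 'Package X'), (2, 'Package Y');
INSERT INTO platform VALUES (1, 'Platform P');
INSERT INTO tipp     VALUES (1, 1, 1, 1), (2, 2, 1, 1), (3, 1, 2, 1);
""")


def names(sql, *params):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql, params)]


def titles_in_package(package):      # Package -> TIPP -> Title
    return names("""SELECT t.name FROM package p
                    JOIN tipp ON tipp.package_id = p.id
                    JOIN title t ON t.id = tipp.title_id
                    WHERE p.name = ? ORDER BY t.name""", package)


def titles_on_platform(platform):    # Platform -> TIPP -> Title
    return names("""SELECT DISTINCT t.name FROM platform pl
                    JOIN tipp ON tipp.platform_id = pl.id
                    JOIN title t ON t.id = tipp.title_id
                    WHERE pl.name = ? ORDER BY t.name""", platform)


def packages_for_title(title):       # Title -> TIPP -> Package
    return names("""SELECT p.name FROM title t
                    JOIN tipp ON tipp.title_id = t.id
                    JOIN package p ON p.id = tipp.package_id
                    WHERE t.name = ? ORDER BY p.name""", title)


def titles_in_multiple_packages():   # back-office style duplicate check
    return names("""SELECT t.name FROM title t
                    JOIN tipp ON tipp.title_id = t.id
                    GROUP BY t.id, t.name
                    HAVING COUNT(DISTINCT tipp.package_id) > 1
                    ORDER BY t.name""")


print(titles_in_package("Package X"))
print(packages_for_title("Journal A"))
print(titles_in_multiple_packages())
```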

Converting this model to a document store might not be friction free. The assumptions are relational and atomic in nature, and it's not clear to us how well this approach might exploit a document store. In particular, queries which SQL makes possible (if not trivially easy) will involve traversing and linking many collections together, or deduplicating large portions of the data structure. YMMV. Worth a note, though, are post-relational stores like Neo4j. The Combo approach is a slight fudge of the relational model, but something the newer semantic stores seem to take to heart. Given a clean sheet, the team is tempted by the appeal of newer semantic databases like Neo4j.

Examples

This section illustrates some of the basic relations.

Titles in a package

A package is formed of many TIPP objects, and a TIPP is a ternary relation of a Package, Title and Platform. To navigate from Package to Title, the following access path is needed:

<<KBComponent>> Package#1
  Combo - Type::Package-TIPP, fromComponent=Package#1, toComponent=TIPP#1
    <<KBComponent>> TIPP#1
      Combo - Type::Title-TIPP, fromComponent=Title#1, toComponent=TIPP#1
        <<KBComponent>> Title#1

Obviously, a package consists of MANY TIPPs, so the lines after the package are repeated for each item in the package. Relational tech makes this perform well in a single operation. Iterating over each item in the package and assembling the data by hand would not perform well.
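The access path above can be followed in a single query by joining through the combo table twice. The sketch below uses an in-memory SQLite database. The table and column names (`kbcomponent`, `combo`, `kind`, `from_component`, `to_component`) are assumptions chosen to mirror the notation above and are not the actual GOKb schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kbcomponent (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,          -- hypothetical subclass discriminator
    name TEXT NOT NULL
);
CREATE TABLE combo (
    id             INTEGER PRIMARY KEY,
    type           TEXT NOT NULL,
    from_component INTEGER NOT NULL REFERENCES kbcomponent(id),
    to_component   INTEGER NOT NULL REFERENCES kbcomponent(id)
);

INSERT INTO kbcomponent VALUES
    (1, 'Package', 'Package#1'),
    (2, 'TIPP',    'TIPP#1'),
    (3, 'Title',   'Title#1'),
    (4, 'TIPP',    'TIPP#2'),
    (5, 'Title',   'Title#2');

INSERT INTO combo (type, from_component, to_component) VALUES
    ('Package-TIPP', 1, 2),
    ('Title-TIPP',   3, 2),
    ('Package-TIPP', 1, 4),
    ('Title-TIPP',   5, 4);
""")

# Package -> (Package-TIPP combo) -> TIPP <- (Title-TIPP combo) <- Title
TITLES_IN_PACKAGE = """
SELECT title.name
FROM kbcomponent AS pkg
JOIN combo AS pkg_tipp
  ON pkg_tipp.from_component = pkg.id
 AND pkg_tipp.type = 'Package-TIPP'
JOIN combo AS title_tipp
  ON title_tipp.to_component = pkg_tipp.to_component
 AND title_tipp.type = 'Title-TIPP'
JOIN kbcomponent AS title
  ON title.id = title_tipp.from_component
WHERE pkg.name = ?
ORDER BY title.name
"""


def titles_in_package(package_name):
    """Resolve every title in a package with one round trip to the database."""
    return [row[0] for row in conn.execute(TITLES_IN_PACKAGE, (package_name,))]


print(titles_in_package("Package#1"))
```

The alternative, loading the package, then each TIPP, then each title in application code, issues one query per item and is the by-hand assembly the paragraph above warns against.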