5.5. Stream Specification (STR)
A Stream Specification file (also called an STR file) is a CSV file containing the values for a stream's properties. This section describes the elements of an STR file, followed by an example.
Stream specifications identify the resources needed to ingest the content of CSV files and the messages of messaging systems into HADatAc's Knowledge Graph. An STR is made up of three tables (sheets) inside a spreadsheet: Infosheet, FileStream, and MessageStream.
Infosheet
The Infosheet is used to identify a study, an optional global (semantic) data dictionary, and a pointer to the data acquisition table. An example Infosheet template is shown below.
Attribute | Value |
---|---|
Study_ID | study name (without the "STD-" prefix) |
Global_SDD | SDD name (without the "SDD-" prefix) |
Stream_protocol | (To be used with message streams) Type of message stream: "mqtt" or "http" |
Stream_IP | (To be used with message streams) IP address of the stream |
Stream_Port | (To be used with message streams) Port number of the stream |
Stream_Topic | (To be used with message streams) Top-level label used to retrieve the full payload of a message stream |
FileStream | Fixed value of "#FILESTREAM" |
MessageStream | Fixed value of "#MESSAGESTREAM" |
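As a rough illustration of the attributes above, the following Python sketch reads an Infosheet saved as a two-column CSV and checks it against the rules stated in the table. The sample values (study ID, IP address, port, topic) and the function names `read_infosheet` and `check_infosheet` are hypothetical, and the checks reflect only one reading of the table, not HADatAc's own validation logic.

```python
import csv
import io

# Hypothetical Infosheet content; every value here is a made-up example.
INFOSHEET_CSV = """Attribute,Value
Study_ID,2016-1234
Global_SDD,demographics
Stream_protocol,mqtt
Stream_IP,192.0.2.10
Stream_Port,1883
Stream_Topic,sensors
FileStream,#FILESTREAM
MessageStream,#MESSAGESTREAM
"""

# Fixed values and allowed protocols as listed in the Infosheet table.
FIXED_VALUES = {"FileStream": "#FILESTREAM", "MessageStream": "#MESSAGESTREAM"}
ALLOWED_PROTOCOLS = {"mqtt", "http"}


def read_infosheet(text):
    """Return the Infosheet as a dict mapping attribute names to values."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Attribute"].strip(): row["Value"].strip() for row in reader}


def check_infosheet(info):
    """Return a list of problems; an empty list means none were detected."""
    problems = []
    # Names are given without their "STD-" / "SDD-" prefixes.
    if info.get("Study_ID", "").startswith("STD-"):
        problems.append("Study_ID should not include the 'STD-' prefix")
    if info.get("Global_SDD", "").startswith("SDD-"):
        problems.append("Global_SDD should not include the 'SDD-' prefix")
    # When present, the sheet pointers must carry their fixed values.
    for attribute, expected in FIXED_VALUES.items():
        if attribute in info and info[attribute] != expected:
            problems.append(f"{attribute} should be {expected}")
    # The protocol only applies to message streams.
    protocol = info.get("Stream_protocol", "")
    if protocol and protocol not in ALLOWED_PROTOCOLS:
        problems.append("Stream_protocol should be 'mqtt' or 'http'")
    return problems


info = read_infosheet(INFOSHEET_CSV)
problems = check_infosheet(info)
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a sheet at once.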
File Stream
Column Header | Column Description |
---|---|
da name | a name-template for selecting incoming files or the IP address of a messaging broadcaster |
data dict | the data dictionary of choice for processing the file(s) |
deployment uri | URI of deployment originating data file's content |
cell scope | cell level scope (see explanation below in this section) |
owner email | the email address of the person who is the actual owner of the data |
permission url | permission policy for accessing the data |
da name: This field is used when specifying file streams. An STR file identifies a common prefix that is assigned to a collection of data files that have the same owner, belong to the same study, and, most importantly, are ingested using a common SDD. For instance, let us say that we have files named "File1.csv" and "File2.csv", and that both files need to be ingested using SDD-ABC.xls. In this case, we can do all of the following:
- Create an STR file with da name='Z' and data dict='SDD-ABC';
- Rename File1.csv to "DA-Z-1.csv";
- Rename File2.csv to "DA-Z-2.csv";
- Submit SDD-ABC.xls for ingestion;
- Submit DA-Z-1.csv for ingestion;
- Submit DA-Z-2.csv for ingestion.
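The naming convention in the steps above can be sketched in Python as a simple prefix test. This is only an illustration of the rule "DA-" followed by the da name; the function names `matches_stream` and `select_files` and the list of incoming files are hypothetical, and HADatAc's actual matching (for example, its handling of letter case) may differ.

```python
def matches_stream(file_name, da_name):
    """Return True when file_name starts with 'DA-' followed by the da name."""
    return file_name.startswith("DA-" + da_name)


def select_files(file_names, da_name):
    """Keep only the files that a stream with this da name would pick up."""
    return [name for name in file_names if matches_stream(name, da_name)]


# Hypothetical set of incoming files; only the renamed ones match da name 'Z'.
incoming = ["DA-Z-1.csv", "DA-Z-2.csv", "DA-Y-1.csv", "File3.csv"]
selected = select_files(incoming, "Z")
```

With the list above, `selected` contains only `DA-Z-1.csv` and `DA-Z-2.csv`, which are the two files the STR would associate with SDD-ABC.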
data dict: The data dictionary of the STR identifies the SDD document that describes how objects identified during the parsing of the data file content map to object types. The precise identification of which object is mapped to each value is specified by the row-scope and cell-scope attributes; cell scope is described below.
deployment uri: The URI of a deployment (as specified in a DPL file).
cell scope: Objects associated with a data file must either already be instantiated in HADatAc's knowledge base or be created during the ingestion of a file by an SDD that specifies dynamic objects. The cell scope is used when the values in a single row belong to multiple objects that are not associated with each other through known semantic relationships, for instance, when a single row has information from two or more subjects instead of a single object. If all the attributes of interest in a given file come from a single object, and one of the cells of the row is the identifier of that object, there is no need to specify a cell scope (it can be left blank).
owner email: Email of the registered user who will be assigned ownership of the content extracted from the file being ingested.
Message Stream
Column Header | Column Description |
---|---|
topic | the label used to retrieve the payload of a message stream |
deployment uri | URI of deployment originating data file's content |
cell scope | cell level scope (see explanation below in this section) |
owner email | the email address of the person who is the actual owner of the data |
permission url | permission policy for accessing the data |
topic: The topic that, in combination with the stream's IP address and port number, retrieves the corresponding payload from the message stream.
deployment uri: The URI of a deployment (as specified in a DPL file, written with a namespace prefix).
cell scope: Objects associated with a data file must either already be instantiated in HADatAc's knowledge base or be created during the ingestion of a file by an SDD that specifies dynamic objects. The cell scope is used when the values in a single row belong to multiple objects that are not associated with each other through known semantic relationships, for instance, when a single row has information from two or more subjects instead of a single object. If all the attributes of interest in a given file come from a single object, and one of the cells of the row is the identifier of that object, there is no need to specify a cell scope (it can be left blank).
owner email: Email of the registered user who will be assigned ownership of the content extracted from the file being ingested.
Let us assume that we have three files named DA-demographics001.csv, DA-demographics002.csv and DA-demographics003.csv, that the three files need to be processed by a common SDD named SDD-demographics.xls, and that the data files are related to a common set of objects. Under these assumptions, a single STR file can specify how content extracted from the files is assigned to that common set of objects.
Study ID | da name | data dict | deployment uri | cell scope | owner email | permission uri |
---|---|---|---|---|---|---|
2016-1234 | demographics | SDD-demographics | proj:quest | | [email protected] | http://example#team |
The table above shows an example of how the semantic data dictionary SDD-demographics is assigned to parse the content of any file whose name starts with DA-demographics. In fact, the STR specification assumes that any data file name starts with DA- followed by the da name value in the STR.
The STR above further specifies that [email protected] is the email of the owner of the data content of any file whose name starts with DA-demographics, and that anyone with permission http://example#team (according to HADatAc's data access policy) has access to content extracted from any file matching the da name identified in the STR.
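For readers who prepare STR content programmatically, here is a minimal Python sketch that writes the example row as CSV using the standard `csv` module. It assumes the column order shown in the example table; the owner address `owner@example.org` is a placeholder (the address in the example is not legible in this page), and the helper name `to_csv` is hypothetical.

```python
import csv
import io

# Column headers taken from the example table, in the same order.
COLUMNS = [
    "Study ID", "da name", "data dict", "deployment uri",
    "cell scope", "owner email", "permission uri",
]

# The example row; cell scope is left blank, as in the table.
row = {
    "Study ID": "2016-1234",
    "da name": "demographics",
    "data dict": "SDD-demographics",
    "deployment uri": "proj:quest",
    "cell scope": "",
    "owner email": "owner@example.org",  # placeholder address, not from the source
    "permission uri": "http://example#team",
}


def to_csv(rows):
    """Serialize a list of row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv([row])
```

The blank cell scope is written as an empty field between two commas, so the row keeps the same number of columns as the header.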
Copyright (c) 2019, HADatAc.org