diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..e43b0f9 --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +.DS_Store diff --git a/README.md b/README.md index b498833..d0e6e61 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,43 @@ -# E-ARK-SIP -E-ARK SIP specification +# E-ARK General SIP specification -This page describes the SIP package structure and minimum set of required metadata for SIP delivery to the archive. It is fully compliant with the Common Specification for Information Packages. +This Git repository aims to describe the E-ARK SIP package structure and minimum set of required metadata for SIP delivery to the archive. It is fully compliant with the Common Specification for Information Packages. + +## Target audience The target group for this document are records creators, archival institutions and software providers creating or updating their SIP format specifications. - -About history: +## The specification + +### Final versions + +Final versions of the specification are published on the [DILCIS Board web site](http://dilcis.eu/specifications/sip) in PDF format. + + +### Draft versions + +The most up-to-date version of the SIP specification is being managed in markdown format in this GitHub repository. + +This is a draft version of the specification that is being collaboratively edited by multiple experts. + +An HTML version of the E-ARK Submission Information Package Specification is available in the +[specification folder](./specification/) of this repository. + +See [Markdown documentation](https://guides.github.com/features/mastering-markdown/) for a deeper understanding of how to edit Markdown documents. + + + +## Previous versions of the specification + +Previous versions of the specification are available in the [archive](./archive/) folder. + + +## History + +In 2014, the E-ARK project conducted a survey and published a report on available best practices. 
The report provided, among other outcomes, an overview of SIP formats used in memory institutions and supported by tools. -In 2014, the E-ARK project conducted a survey and published a report on available best practices. The report provided, among other outcomes, an overview of SIP formats used in memory institutions and supported by tools. The E-ARK project analysed the formats and then delivered a first version of a harmonised SIP format based on that - Deliverable 3.2 E-ARK SIP Draft Specification. That deliverable gave an overview of the structure and main metadata elements for the SIP and provided initial input for the technical implementations of pre-ingest and ingest tools. It was followed by Deliverable 3.3 which extended the previous one by providing a revised version of the D3.2 content, adding more details relevant for tool development and implementation, and describing specific profiles for the transfer of relational databases, electronic records management systems (ERMS) and simple file system based records (SFSB). The version 0.14 is based on the deliverable 3.3 and the feedback received from pilot projects. +The E-ARK project analysed the formats and then delivered a first version of a harmonised SIP format based on that - Deliverable 3.2 E-ARK SIP Draft Specification. +That deliverable gave an overview of the structure and main metadata elements for the SIP and provided initial input for the technical implementations of pre-ingest and ingest tools. It was followed by Deliverable 3.3 which extended the previous one by providing a revised version of the D3.2 content, adding more details relevant for tool development and implementation, and describing specific profiles for the transfer of relational databases, electronic records management systems (ERMS) and simple file system based records (SFSB). +The version 1.4 is based on the deliverable 3.3 and the feedback received from pilot projects. 
diff --git a/archive/README.md b/archive/README.md new file mode 100644 index 0000000..82a6245 --- /dev/null +++ b/archive/README.md @@ -0,0 +1,3 @@ +# General SIP Specification + +In this folder you will find the published versions of the General SIP specification. diff --git a/archive/v1.4/General_SIP Specification_v1.4.docx b/archive/v1.4/General_SIP Specification_v1.4.docx new file mode 100644 index 0000000..33cf0ad Binary files /dev/null and b/archive/v1.4/General_SIP Specification_v1.4.docx differ diff --git a/archive/v1.4/General_SIP Specification_v1.4.pdf b/archive/v1.4/General_SIP Specification_v1.4.pdf new file mode 100644 index 0000000..31b9ce8 Binary files /dev/null and b/archive/v1.4/General_SIP Specification_v1.4.pdf differ diff --git a/archive/v1.4/images/image1.png b/archive/v1.4/images/image1.png new file mode 100644 index 0000000..4364eb3 Binary files /dev/null and b/archive/v1.4/images/image1.png differ diff --git a/archive/v1.4/images/image10.png b/archive/v1.4/images/image10.png new file mode 100644 index 0000000..0c65af3 Binary files /dev/null and b/archive/v1.4/images/image10.png differ diff --git a/archive/v1.4/images/image11.png b/archive/v1.4/images/image11.png new file mode 100644 index 0000000..1008f1a Binary files /dev/null and b/archive/v1.4/images/image11.png differ diff --git a/archive/v1.4/images/image12.png b/archive/v1.4/images/image12.png new file mode 100644 index 0000000..ed1c116 Binary files /dev/null and b/archive/v1.4/images/image12.png differ diff --git a/archive/v1.4/images/image13.png b/archive/v1.4/images/image13.png new file mode 100644 index 0000000..4a17cd2 Binary files /dev/null and b/archive/v1.4/images/image13.png differ diff --git a/archive/v1.4/images/image14.png b/archive/v1.4/images/image14.png new file mode 100644 index 0000000..0440465 Binary files /dev/null and b/archive/v1.4/images/image14.png differ diff --git a/archive/v1.4/images/image15.png b/archive/v1.4/images/image15.png new file mode 100644 index 
0000000..eda6871 Binary files /dev/null and b/archive/v1.4/images/image15.png differ diff --git a/archive/v1.4/images/image2.png b/archive/v1.4/images/image2.png new file mode 100644 index 0000000..b34e319 Binary files /dev/null and b/archive/v1.4/images/image2.png differ diff --git a/archive/v1.4/images/image3.png b/archive/v1.4/images/image3.png new file mode 100644 index 0000000..a280b44 Binary files /dev/null and b/archive/v1.4/images/image3.png differ diff --git a/archive/v1.4/images/image4.png b/archive/v1.4/images/image4.png new file mode 100644 index 0000000..464688f Binary files /dev/null and b/archive/v1.4/images/image4.png differ diff --git a/archive/v1.4/images/image5.emf b/archive/v1.4/images/image5.emf new file mode 100644 index 0000000..e2a19be Binary files /dev/null and b/archive/v1.4/images/image5.emf differ diff --git a/archive/v1.4/images/image5.png b/archive/v1.4/images/image5.png new file mode 100644 index 0000000..768b819 Binary files /dev/null and b/archive/v1.4/images/image5.png differ diff --git a/archive/v1.4/images/image6.emf b/archive/v1.4/images/image6.emf new file mode 100644 index 0000000..127d89f Binary files /dev/null and b/archive/v1.4/images/image6.emf differ diff --git a/archive/v1.4/images/image6.png b/archive/v1.4/images/image6.png new file mode 100644 index 0000000..2e32a59 Binary files /dev/null and b/archive/v1.4/images/image6.png differ diff --git a/archive/v1.4/images/image7.emf b/archive/v1.4/images/image7.emf new file mode 100644 index 0000000..75fe647 Binary files /dev/null and b/archive/v1.4/images/image7.emf differ diff --git a/archive/v1.4/images/image7.png b/archive/v1.4/images/image7.png new file mode 100644 index 0000000..c433edd Binary files /dev/null and b/archive/v1.4/images/image7.png differ diff --git a/archive/v1.4/images/image8.emf b/archive/v1.4/images/image8.emf new file mode 100644 index 0000000..1943d44 Binary files /dev/null and b/archive/v1.4/images/image8.emf differ diff --git 
a/archive/v1.4/images/image8.png b/archive/v1.4/images/image8.png new file mode 100644 index 0000000..b779fdc Binary files /dev/null and b/archive/v1.4/images/image8.png differ diff --git a/archive/v1.4/images/image9.png b/archive/v1.4/images/image9.png new file mode 100644 index 0000000..b0aa7e3 Binary files /dev/null and b/archive/v1.4/images/image9.png differ diff --git a/archive/v1.4/~$neral_SIP Specification_v1.4.txt b/archive/v1.4/~$neral_SIP Specification_v1.4.txt new file mode 100644 index 0000000..80f358c Binary files /dev/null and b/archive/v1.4/~$neral_SIP Specification_v1.4.txt differ diff --git a/specification/00.01-authors/index.md b/specification/00.01-authors/index.md new file mode 100644 index 0000000..d202574 --- /dev/null +++ b/specification/00.01-authors/index.md @@ -0,0 +1,15 @@ +Authors +------- + +| Name | Organisation | +| -------------------------------- | -------------------------------------------------- | +| Tarvo Kärberg | National Archives of Estonia | +| Anders Bo Nielsen | Danish National Archives | +| Björn Skog | ES Solutions | +| Gregor Zavrsnik | Slovenian National Archives | +| Hélder Silva | KEEP SOLUTIONS | +| Karin Bredenberg | National Archives of Sweden | +| Kathrine Hougaard Edsen Johansen | Danish National Archives | +| Levente Szilágyi | National Archives of Hungary | +| Phillip Mike Tømmerholt | Danish National Archives | +| Miguel Ferreira | KEEP SOLUTIONS | diff --git a/specification/00.02-history/index.md b/specification/00.02-history/index.md new file mode 100644 index 0000000..098e01c --- /dev/null +++ b/specification/00.02-history/index.md @@ -0,0 +1,38 @@ +Revision History +---------------- + +| Revision No. | Date | Authors(s) | Organisation | Description | +|--------------|------------|----------------------------------|------------------------|-----------------------------------------------------------------------| +| 0.1 | 20.10.2014 | Tarvo Kärberg | NAE | First draft. 
| +| 0.2 | 13.11.2014 | Tarvo Kärberg | NAE | Updating content. | +| 0.3 | 02.12.2014 | Tarvo Kärberg | NAE | Updating content. | +| 0.4 | 17.01.2015 | Tarvo Kärberg | NAE | Updating content. | +| 0.5 | 21.01.2015 | Karin Bredenberg | ESS | Updating content. | +| 0.6 | 23.01.2015 | Anders Bo Nielsen | DNA | Updating content. | +| 0.7 | 23.01.2015 | Kathrine Hougaard Edsen | DNA | Updating content. | +| 0.71 | 26.01.2015 | Björn Skog | ESS | Updating content. | +| 0.72 | 27.01.2015 | Hélder Silva | KEEPS | Updating content. | +| 0.8 | 27.01.2015 | Angela Dappert | DLM/UPHEC | Quality assurance and proof-reading. | +| 0.9 | 29.01.2017 | Kuldar Aas | NAE | Quality assurance and proof-reading. | +| 0.91 | 30.01.2015 | David Anderson | UPHEC | Quality assurance and proof-reading. | +| 1.0 | 30.01.2015 | Tarvo Kärberg | NAE | Final version (D3.2). | +| 0.1 | 11.05.2015 | Karin Bredenberg | ESS/NAS | Updating content. | +| 0.2 | 30.06.2015 | Tarvo Kärberg | NAE | Updating content. | +| 0.3 | 27.07.2015 | Tarvo Kärberg | NAE | Updating content. | +| 0.4 | 23.10.2015 | Tarvo Kärberg | NAE | Updating content, synchronising with the SMURF profile. | +| 0.41 | 17.11.2015 | Tarvo Kärberg | NAE | Integrating the feedback. | +| 0.42 | 07.12.2015 | Tarvo Kärberg | NAE | Updating content. | +| 0.5 | 12.01.2016 | Tarvo Kärberg | NAE | Updating content, synchronising with the Common Specification. | +| 0.6 | 15.01.2016 | Anders Bo Nielsen | DNA | Updating content. | +| 0.61 | 15.01.2016 | Gregor Zavrsnik | SNA | Updating content. | +| 0.62 | 18.01.2016 | Tarvo Kärberg | NAE | Updating content. | +| 0.63 | 20.01.2016 | Phillip Mike Tømmerholt | DNA | Updating content. | +| 0.64 | 25.01.2016 | Phillip Mike Tømmerholt | DNA | Updating content. | +| 0.7 | 26.01.2016 | Sven Schlarb | AIT | Quality assurance and proof-reading. | +| 0.8 | 27.01.2016 | Kuldar Aas | NAE | Quality assurance and proof-reading. 
| +| 0.9 | 29.01.2016 | Andrew Wilson and David Anderson | University of Brighton | Quality assurance and proof-reading. | +| 1.0 | 29.01.2016 | Tarvo Kärberg | NAE | Final version (general part of D3.3) | +| 1.1 | 14.07.2016 | Tarvo Kärberg | NAE | Incorporating agreements made in the Common Specification work group. | +| 1.2 | 12.12.2016 | Tarvo Kärberg | NAE | Incorporating agreements made in the Common Specification work group. | +| 1.3 | 13.01.2017 | Tarvo Kärberg | NAE | Small updates. | +| 1.4 | 31.01.2017 | Tarvo Kärberg | NAE | Finalising the specification. | \ No newline at end of file diff --git a/specification/00.03-summary/index.md b/specification/00.03-summary/index.md new file mode 100644 index 0000000..7a85ae7 --- /dev/null +++ b/specification/00.03-summary/index.md @@ -0,0 +1,16 @@ +# Executive summary + +According to the Open Archival Information System Reference Model (OAIS) every submission of information to an archive by a producer occurs as one or more discrete transmissions of submission information packages. Unfortunately there is currently no central SIP format which would cover all national and business needs as identified in the E-ARK Report on Available Best Practices. The E-ARK project acknowledged this problem and developed a solution in the form of the SIP format which is described in this document. + +The first outcome of this work was Deliverable 3.2: E-ARK SIP Draft Specification. This gives an overview of the structure and main metadata elements for the SIP and provides initial input for the technical implementations of pre-ingest and ingest tools. It was followed by Deliverable 3.3 which extends the previous one by providing a revised version of the D3.2 content, adding more details relevant for tool development and implementation, and describing specific profiles for the transfer of relational databases, electronic records management systems (ERMS) and simple file system based records (SFSB). 
+ +The target groups for this document are records creators, archival institutions and software providers creating or updating their SIP format specifications. This document is also important for electronic records management systems (ERMS) providers as it presents a standardised profile for exporting records and metadata out of their systems. + +This document provides an overview of: + +- **The general structure for Submission Information Packages.** +This chapter explains how records creators should structure their SIPs in order to meet the requirements of the SIP specification and achieve interoperability by following the common rules for all information packages (SIPs, AIPs, DIPs) as described in the Common Specification for Information Packages. +- **General SIP metadata.** This chapter provides a detailed overview of metadata sections and the metadata elements in these sections. The tables with all metadata elements may be of interest to technical stakeholders who wish to implement the SIP. +- **Content Information Type Specifications.** This section introduces profiles for SMURF (Semantically Marked Up Records Format) and relational databases. The profiles themselves are separate documents. +- **The submission agreement.** This chapter provides an overview of submission agreement usages and recommended metadata elements. + diff --git a/specification/01-introduction/image1.png b/specification/01-introduction/image1.png new file mode 100644 index 0000000..4364eb3 Binary files /dev/null and b/specification/01-introduction/image1.png differ diff --git a/specification/01-introduction/index.md b/specification/01-introduction/index.md new file mode 100644 index 0000000..a6c3795 --- /dev/null +++ b/specification/01-introduction/index.md @@ -0,0 +1,39 @@ +# 1. Introduction + + +## 1.1. 
Scope and purpose + +This document is a core / general SIP specification which is guided by the following hierarchical model (see Figure 1): + +![Relations between specifications](image1.png) + + +- Common Specification for Information Packages (CSIP) identifies and standardises the common aspects of information packages (SIP/AIP/DIP) which are equally relevant and implemented by any of the functional entities of the overall digital preservation process (i.e. pre-ingest, ingest, long-term preservation and access). CSIP is a separate document. Therefore, the current specification does not aim to repeat the information presented there – only the information that is absolutely necessary to understand the SIP specification will be mentioned here. +- General SIP Specification. This is the current document which describes the SIP package structure and minimum set of required metadata for SIP delivery to the archive. +- Content Information Type Specifications are content-dependent specifications which include detailed information on how content, metadata, and documentation for specific content types (for example ERMS or relational databases) can be handled within the SIP. At the moment, there are 3 such specifications: + - SIARD 2.0 for relational databases (The SIARD 2.0 specification for relational databases can be found at http://eark-project.com/resources/specificationdocs/32-specification-for-siard-format-v20) + - SMURF ERMS for electronic records management systems (The SMURF profile for ERMS can be found at https://github.com/DLMArchivalStandardsBoard/SMURF/tree/master/spec.) + - SMURF SFSB for simple file system based records (The SMURF profile for SFSB can be found at https://github.com/DLMArchivalStandardsBoard/SMURF/tree/master/spec.) + + +## 1.2. 
Related work + +This document is based on or influenced by the following documents and best practices: + +- **Deliverable D3.1** - E-ARK Report on Available Best Practices, 2014, http://eark-project.com/resources/project-deliverables/6-d31-e-ark-report-on-available-best-practices +D3.1 was one of the inputs to the deliverable D3.2 and the D3.2 to the D3.3. +- **Deliverable D2.1** - General pilot model and use case definition, 2014, http://eark-project.com/resources/project-deliverables/5-d21-e-ark-general-pilot-model-and-use-case-definition. +We have developed the SIP specification to support the workflows defined in the general model. +- **FGS package structure**, 2013, https://riksarkivet.se/Media/pdf-filer/Projekt/FGS_Earkiv_Paket.pdf +This specification was one of the main inputs for the first draft SIP specification. The newest version (https://riksarkivet.se/Media/pdf-filer/doi-t/FGS_Paketstruktur_RAFGS1V1.pdf) was also investigated in the SIP definition process. +- **Reference Model for an Open Archival Information System** (OAIS), 2012, public.ccsds.org/publications/archive/650x0m2.pdf +We have used the same terminology as introduced in the OAIS model and also the same division of information package types: Submission Information Package (SIP), Archival Information Package (AIP), Dissemination Information Package (DIP). +- **Producer-Archive Interface Methodology Abstract Standard** (PAIMAS), 2004, public.ccsds.org/publications/archive/651x0m1.pdf +We have looked at the four phases (Preliminary, Formal Definition, Transfer, Validation) of PAIMAS, their aims and expected results and decided to support the phases as far as possible with the current specification. Furthermore, the requirements for the submission agreement were influenced by the PAIMAS standard. 
+- **Producer-Archive Interface Specification (PAIS)** – CCSDS, 2014, public.ccsds.org/publications/archive/651x1b1.pdf +We have investigated the structure of a SIP presented in PAIS, but as the implementation of this specification is far from comprehensive (only few prototypes exist), we decided to rely more on the best practices introduced in the best practice report. +- **e-SENS** (Electronic Simple European Networked Services) project, http://www.esens.eu/ +We have investigated the e-Delivery and e-Documents related work in e-SENS and made sure that our work is neither duplicating the work done there nor producing any conflicts between deliverables. +- **Deliverables D3.2** - E-ARK SIP Draft Specification, 2015, http://eark-project.com/resources/project-deliverables/17-d32-e-ark-sip-draft-specification and D3.3 E-ARK SIP Pilot Specification, 2016, http://eark-project.com/resources/project-deliverables/51-d33pilotspec + + diff --git a/specification/02-general_structure/image2.png b/specification/02-general_structure/image2.png new file mode 100644 index 0000000..b34e319 Binary files /dev/null and b/specification/02-general_structure/image2.png differ diff --git a/specification/02-general_structure/image3.png b/specification/02-general_structure/image3.png new file mode 100644 index 0000000..a280b44 Binary files /dev/null and b/specification/02-general_structure/image3.png differ diff --git a/specification/02-general_structure/index.md b/specification/02-general_structure/index.md new file mode 100644 index 0000000..3205b08 --- /dev/null +++ b/specification/02-general_structure/index.md @@ -0,0 +1,18 @@ +# 2. GENERAL STRUCTURE AND DATA MODEL FOR SUBMISSION INFORMATION PACKAGES + +The SIP specification follows the general structure which is common for all information packages. 
The SIP data model describes the package submitted to the archive, which consists of representations (submitted data and metadata) and metadata as seen in Figure 2 (This is a conceptual model and does not describe the actual implementation structure.) and mandated/required by the SIP, AIP and DIP formats. + +![SIP data model](image2.png) + +Because one SIP can contain more than one representation (Digital Object or physical object instantiating or embodying an Intellectual Entity. A Representation that is a Digital Object is the set of stored Files and Structural Metadata needed to provide a complete rendition of the Intellectual Entity. PREMIS Data Dictionary (full document), Version 3.0, 2015, http://www.loc.gov/standards/premis/v3/premis-3-0-final.pdf) of the same intellectual entity, it is reasonable to separate different representations (e.g. Rep-001 and Rep-002 under Representations). This requires additional metadata about the SIP. If we store all metadata (even about representations) at the IP level then we do not need to use the Metadata folder at the representation level. In this case, the Metadata directory under representations is considered optional, as are: + +- Documentation folder – for including additional documents that explain the content or its use (e.g. user manual). +- Schemas folder – for adding schemas for the XML files in the data/metadata directly into the package. + +![Minimal SIP structure](image3.png) + +If needed, a METS.xml file can be present under representations as well to handle scalability issues. This proposed extended IP structure using divided METS files is introduced in the Common Specification for Information Packages and in deliverable D4.3 E-ARK AIP pilot specification (E-ARK AIP pilot specification, released January 2016, http://eark-project.com/resources/project-deliverables) to more easily manage the splitting of large packages. 
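
The layout described above can be summarised with a small sketch. It is illustrative only and not part of the specification: the function name `summarise_layout` is hypothetical, and the lower-case folder names (`representations`, `metadata`, `documentation`, `schemas`) are assumptions, since the exact spelling used on disk is defined by the figures and the Common Specification rather than by this text. The sketch works on a plain listing of relative paths and reports, for each representation, whether it carries its own METS.xml and which of the optional folders it uses.

```python
from pathlib import PurePosixPath

# Assumed folder names: the text mentions Metadata, Documentation and Schemas
# folders, but their exact spelling and case on disk are an assumption here.
OPTIONAL_FOLDERS = ("metadata", "documentation", "schemas")


def summarise_layout(paths):
    """Summarise a SIP listing given as relative POSIX-style paths."""
    parts = [PurePosixPath(p).parts for p in paths]
    top_level = {p[0] for p in parts if p}

    # A representation is any folder directly below "representations".
    representations = sorted(
        {p[1] for p in parts if len(p) > 2 and p[0] == "representations"}
    )

    per_representation = {}
    for rep in representations:
        children = {
            p[2]
            for p in parts
            if len(p) > 2 and p[0] == "representations" and p[1] == rep
        }
        per_representation[rep] = {
            # A divided METS structure places a METS.xml in the representation.
            "has_own_mets": "METS.xml" in children,
            "optional_folders": sorted(children & set(OPTIONAL_FOLDERS)),
        }

    return {
        "has_root_mets": "METS.xml" in top_level,
        "representations": per_representation,
    }


listing = [
    "METS.xml",
    "metadata/descriptive/ead.xml",
    "representations/Rep-001/data/report.pdf",
    "representations/Rep-002/METS.xml",
    "representations/Rep-002/data/report.docx",
    "representations/Rep-002/metadata/premis.xml",
]
summary = summarise_layout(listing)
```

In this made-up listing, Rep-001 keeps all of its metadata at the IP level, while Rep-002 uses a representation-level METS.xml and its own metadata folder.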
+ +The detailed folder structure of a SIP will also be present and agreed upon in the submission agreement (page 41) by indicating the data model for the submission. Also the details of the internal structure of the data and metadata folders can be further specified in submission agreements. + +The metadata model for the SIP will be multi-layered by starting from general common metadata elements and finishing with optional local elements as explained previously (Please note that the business specific (e.g. healthcare records) or local implementation based metadata is not discussed in this specification. As the specifications can be undertaken at different scales, with different types of data and locations, with their constituent technical components, more detailed or localised specifications may be needed.). diff --git a/specification/03-metadata/image4.png b/specification/03-metadata/image4.png new file mode 100644 index 0000000..464688f Binary files /dev/null and b/specification/03-metadata/image4.png differ diff --git a/specification/03-metadata/image5.png b/specification/03-metadata/image5.png new file mode 100644 index 0000000..768b819 Binary files /dev/null and b/specification/03-metadata/image5.png differ diff --git a/specification/03-metadata/image6.png b/specification/03-metadata/image6.png new file mode 100644 index 0000000..2e32a59 Binary files /dev/null and b/specification/03-metadata/image6.png differ diff --git a/specification/03-metadata/image7.png b/specification/03-metadata/image7.png new file mode 100644 index 0000000..c433edd Binary files /dev/null and b/specification/03-metadata/image7.png differ diff --git a/specification/03-metadata/image8.png b/specification/03-metadata/image8.png new file mode 100644 index 0000000..b779fdc Binary files /dev/null and b/specification/03-metadata/image8.png differ diff --git a/specification/03-metadata/index.md b/specification/03-metadata/index.md new file mode 100644 index 0000000..5c4121b --- /dev/null +++ 
b/specification/03-metadata/index.md @@ -0,0 +1,299 @@ +# 3. GENERAL SIP METADATA + +The general SIP metadata is based on the METS standard and presented as a profile. METS profiles are intended to describe a class of METS documents in sufficient detail to provide both document authors and programmers with the guidance they need to create and process METS documents conforming to a particular [profile](http://www.loc.gov/standards/mets/mets-profiles.html). + +Creating a METS profile requires a good understanding of the METS Profile components. An overview of these components can be found in the [METS online documentation](http://www.loc.gov/standards/mets/profile_docs/components.html) and in Appendix D on page 44 in the [D3.2 specification](http://eark-project.com/resources/project-deliverables/17-d32-e-ark-sip-draft-specification). + +There are 5 main sections in this METS profile: + +- `<metsHdr>` - METS header (metadata about the creator, contact persons, etc. of the IP). +- `<dmdSec>` - descriptive metadata (references to EAD, EAC-CPF, etc.). +- `<amdSec>` - administrative metadata (information about how files were created and stored, intellectual property rights, etc.). +- `<fileSec>` - file section, lists all files containing content (may also contain metadata about files). +- `<structMap>` - structural map, describes the hierarchical structure of the digital object and the whole IP (i.e. object + metadata). + +These sections will be described in more detail in sections 3.1 to 3.6. All these sections present the SIP requirements for METS elements in table form according to the following structure: + +- Element - The name of the element in plain text used in the accompanying schema for elements or attributes. For more information regarding elements and attributes in XML see WWW Consortium (http://www.w3.org/). +- Definition - Defines the functions of the element. Contains an explanation of the element and some example values. 
+- Cardinality – Represents the number of occurrences of an element (see below). + - 0..1 – The element is optional and cannot be repeated. + - 0..* – The element is optional and can be repeated. + - 1 – The element is mandatory and can only be stated once. + - 1..* – The element is mandatory and has one or more occurrences. +- METS - Defines the element in the METS standard used for designing the SIP element. The column uses XML-syntax. [ ] defines where the value is placed. + +## 3.1. Root + +The root of a METS document can contain a number of optional attributes, namespaces (xmlns:) and schema instance locations (xsi:) of the external standards referenced in the METS record and a number of other elements as seen in Table 1. + +[Table 1 missing] + +Example: + +```xml + +``` + + +## 3.2. Header + +The METS header element describes metadata about the creator, contact persons, etc. of the submission information package as seen in Figure 4. + +![METS header](image4.png) + +These are the elements that give information about the submission of the SIP in the METS header element. + +[Table 2: Metadata about the information package missing] + + +Example: + +```xml + + + The Hungarian Ministry of Healthcare + ORG:HU121345098701 + + + The Hungarian Health Agency + ORG:HU891345098701 + + + National Archives of Hungary + ORG:HU2010340987 + + + SIP Creator + VERSION=0.0.2 + + +``` + +## 3.3. Descriptive metadata + +The METS descriptive metadata element references to archival description metadata (EAD, EAC-CPF, etc.) as seen in Figure 5. + +![METS descriptive metadata](image5.png) + +Archival information can be included in the METS package. Usually, for the archival institutions this information is delivered in EAD and EAC-CPF formats. + +To include EAD and EAC-CPF in a METS profile the use of is to be preferred according to the METS implementation guide referenced above. 
The complete rules for all elements and attributes in the are stated in the profile, the specific elements used when referencing and embedding is shown below. + +Other metadata standards for description and administrative purposes can be used and referencing them must adhere to the and rules stated in the profile. + +[Table 3: EAD metadata missing] + +Example: + +```xml + + + + + + + +``` + +## 3.4. Administrative metadata + +The METS administrative metadata element references to technical and preservation metadata as seen in Figure 6. + +![METS administrative metadata](image6.png) + +Preservation metadata can be included in the METS package. It is recommended that PREMIS is used for preservation metadata. For further reading: + +- More information about PREMIS can be found at: http://www.loc.gov/standards/premis/ . +- A guide on using PREMIS with METS may be found at: http://www.loc.gov/standards/premis/guidelines-premismets.pdf. +- Decisions made during the use of PREMIS can be recorded using this document: http://www.loc.gov/standards/premis/premis_mets_checklist.pdf + +The guide on using PREMIS with METS (referenced above) recommends using the `` in order to reference PREMIS metadata. The complete rules for all elements and attributes in the `` are stated in the profile, the specific elements used when referencing are shown below. However, please note that preservation metadata varies for different content types and therefore best practice guidelines should be applied as required. + +[Table 4: PREMIS metadata missing] + +Example: + +```xml + + + + + +``` + + +## 3.5. Files + +The METS file section element lists all files containing content (may also contain metadata about files) as seen in Figure 7. + +![METS files](image7.png) + +All files found in the submission package should be referenced once and only once in the METS-document describing the submission. The elements and attributes are the same regardless of the content type submitted. 
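
The "once and only once" rule above lends itself to a simple consistency check. The following is a minimal sketch under stated assumptions, not a validator defined by this specification: the function name `check_file_references` is hypothetical, both inputs are assumed to be lists of package-relative paths that the caller has already collected (the files found in the package, excluding the METS document itself, and the file locations referenced from the METS document), and no METS parsing is attempted.

```python
from collections import Counter


def check_file_references(package_files, mets_references):
    """Compare files in a package with the file paths referenced from METS.

    Both arguments are iterables of package-relative paths. How they are
    collected (directory walk, METS parsing) is outside this sketch.
    """
    counts = Counter(mets_references)
    files = set(package_files)
    return {
        # Files present in the package but never referenced.
        "unreferenced": sorted(files - set(counts)),
        # Files referenced more than once.
        "duplicated": sorted(path for path, n in counts.items() if n > 1),
        # References that point at files missing from the package.
        "dangling": sorted(set(counts) - files),
    }


report = check_file_references(
    package_files=["data/a.pdf", "data/b.pdf", "documentation/manual.pdf"],
    mets_references=["data/a.pdf", "data/a.pdf", "data/c.pdf"],
)
```

A package satisfies the rule when all three lists in the report are empty.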
+ +When describing the content and documentation files in METS they are placed in the fileSec element in one or more fileGrp elements. The fileGrp element can be used for grouping files together in different ways. In this profile we do not group files in different groups, we only use one mandatory fileGrp. Use of more fileGrp’s must be decided in every implementation and described in a METS profile. + +[Table 5: Files metadata missing] + + +Example of the element (root METS file): + +```xml + + + + + + + + + + + + + + + + + + + + ... + + + +``` + +Example of the element (representation METS file): + +```xml + + + + + + + + + + + + + ... + + + + ... + + + + +``` + +## 3.6. Structure + +The mandatory METS structural map element describes the hierarchical structure for the digital object as seen in Figure 8 and follows completely the requirements set in the Common Specification for Information Packages. + +![METS structural section](image8.png) + +Example: + +```xml + +
+
+
+ + +
+
+ + +
+
+
+ +
+
+
+ +
+
+ +
+
+ +``` + + + diff --git a/specification/04-content_types/image10.png b/specification/04-content_types/image10.png new file mode 100644 index 0000000..0c65af3 Binary files /dev/null and b/specification/04-content_types/image10.png differ diff --git a/specification/04-content_types/image11.png b/specification/04-content_types/image11.png new file mode 100644 index 0000000..1008f1a Binary files /dev/null and b/specification/04-content_types/image11.png differ diff --git a/specification/04-content_types/image12.png b/specification/04-content_types/image12.png new file mode 100644 index 0000000..ed1c116 Binary files /dev/null and b/specification/04-content_types/image12.png differ diff --git a/specification/04-content_types/image13.png b/specification/04-content_types/image13.png new file mode 100644 index 0000000..4a17cd2 Binary files /dev/null and b/specification/04-content_types/image13.png differ diff --git a/specification/04-content_types/image14.png b/specification/04-content_types/image14.png new file mode 100644 index 0000000..0440465 Binary files /dev/null and b/specification/04-content_types/image14.png differ diff --git a/specification/04-content_types/image15.png b/specification/04-content_types/image15.png new file mode 100644 index 0000000..eda6871 Binary files /dev/null and b/specification/04-content_types/image15.png differ diff --git a/specification/04-content_types/image9.png b/specification/04-content_types/image9.png new file mode 100644 index 0000000..b0aa7e3 Binary files /dev/null and b/specification/04-content_types/image9.png differ diff --git a/specification/04-content_types/index.md b/specification/04-content_types/index.md new file mode 100644 index 0000000..1203d15 --- /dev/null +++ b/specification/04-content_types/index.md @@ -0,0 +1,114 @@ +# 4. CONTENT INFORMATION TYPE SPECIFICATIONS + +As discussed above (Chapter 2), an SIP can include content-type specific data and metadata. 
The types of data files, their structural relationships and the metadata elements vary between content types. Metadata is submitted to an archive so that it can support functions in the archive. The metadata created by business systems can come in different structures and formats. The amount and type of available metadata depends very much on the type of system and on its owner or developer. As a result, systems also differ in how much metadata they can export and in which formats. To deal with these differences, content type profiles can define detailed metadata requirements beyond those of the Common Specification for Information Packages and the general SIP. + +This specification does not offer a single structure in which the content-type specific metadata could be stored as a whole. To use the metadata efficiently in support of archival functions, the SIP defines separate SIP METS sections as containers for the various metadata functions, such as the METS header for the package management function, the `` for the EAD metadata standard (i.e. using `` for package discovery) and other descriptive metadata standards, and the `` for preservation (PREMIS), technical and other functions and standards. To be usable, the submitted metadata has to be mapped to and referenced from the SIP METS sections. This means that the content-type specific metadata elements need to be mapped to those containers and implemented in the agreed standards. Therefore, complementary metadata profiles are needed for content types. This document refers to three profiles which define how the submitted content-specific metadata should be mapped to the SIP structure: + +- The SMURF (semantically marked up record format) for ERMS, which will contain mappings for ERMS (electronic records management systems) based on MoReq2010, as described in 4.1. +- The SMURF for SFSB (simple file-system based) records, as described in 4.2.
+- The SIARD 2.0 profile for relational databases, as described in 4.3. +All SIPs will need to be transformed into AIPs in the archival ingest process. The SIP to AIP conversion is described in the AIP specification. + +## 4.1. Electronic records management systems (ERMS) + +The first case represents ERMS records encapsulated in the SIP. This profile aims to standardise the export from records management systems into a single, easy-to-use model. The basic workflow is shown in Figure 9. + +![Extraction at pre-ingest](image9.png) + +For ERMS we distinguish two scenarios – MCRS and non-MCRS (1, 4). A non-MCRS is assumed to be able to export metadata and records in a native export format (5); an MCRS additionally supports the specific MoReq2010 export format (2). Further, the export for archival purposes can differ from the original export (3). + +The SMURF ERMS profile (6) defines a set of Extended EAD metadata (7) which are created during the pre-ingest phase. In some cases it may not be possible to map all relevant original elements to the set of Extended EAD metadata; some MoReq2010 elements (8) are therefore allowed, to guarantee that all required elements are included in the SIP. + +Footnotes: + +- This chapter only gives short introductions; more details are available in the separate document SMURF (semantically marked up record format) for ERMS. +- The metadata extracted from a non-MCRS system should be mapped and transformed into the SMURF format by using external mechanisms (e.g. XSL transformation) or by updating the export format to support the SMURF profile. +- The EAD extraction will be created automatically by an MCRS. +- We do not recommend using MoReq2010 elements in the SMURF profile and therefore only the mapping from MoReq2010 elements to EAD will be provided.
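As a rough illustration of the mapping step mentioned in the footnotes above, the following Python sketch turns one record from a hypothetical non-MCRS export into an EAD-style component. The input field names (`record_id`, `title`, `created`), the function name and the choice of `level="item"` are assumptions made for this example only; the SMURF profile itself defines which elements are actually required, and a real mapping could equally be written as an XSL transformation.

```python
import xml.etree.ElementTree as ET


def record_to_ead_component(record: dict) -> ET.Element:
    """Map one exported record (hypothetical field names) to an EAD <c> element.

    Namespaces and the Extended EAD additions are deliberately left out
    to keep the sketch short.
    """
    component = ET.Element("c", {"level": "item"})
    did = ET.SubElement(component, "did")
    ET.SubElement(did, "unitid").text = record["record_id"]
    ET.SubElement(did, "unittitle").text = record["title"]
    # The creation date is treated as optional here; a real profile may require it.
    if record.get("created"):
        ET.SubElement(did, "unitdate").text = record["created"]
    return component


# Hypothetical record as it might come out of a native ERMS export.
example = {"record_id": "R-0001", "title": "Meeting minutes", "created": "2016-05-12"}
xml_text = ET.tostring(record_to_ead_component(example), encoding="unicode")
print(xml_text)
```

Keeping the mapping in one small function per record makes it straightforward to replace the hypothetical field names with those of a concrete export format.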
+ + +The SMURF extraction should be complemented with more general information about the information package and could be complemented with PREMIS and EAC-CPF metadata as well (Figure 10). + +![Creation at Pre-Ingest](image10.png) + +The SMURF profile (1) includes MoReq2010 metadata that has been mapped to EAD (2) and some additional elements required by archives. The structural metadata for the submission information package (represented as a METS file) will be added (4) during the SIP preparation process. If possible, the EAC-CPF metadata (6) should be created and SIP creation events logged as PREMIS metadata (5). The full SIP will consist of items (1) and (4) and, optionally, (5) and (6). + + + +## 4.2. Simple file system based records (SFSB) + +The second case represents an encapsulation of computer files into the SIP. It is based on the assumption that the files can be described in an extended EAD format (Figure 11). + + +![SFSB metadata and computer files](image11.png) + +The blocks in the diagram refer to the following. Computer files reside in some file system (e.g. shared drives, 3). The metadata (2) about the files needed for long-term preservation may or may not exist. If the metadata exists, it has to be transformed into the EAD metadata (5). If the metadata does not exist, it has to be created and included in the SIP. + +The SMURF metadata should be complemented with more general information about the information package and could be complemented with PREMIS and EAC-CPF metadata as well to build a full SIP (Figure 12). + +![SFSB SIP](image12.png) + + +The blocks in the diagram refer to the following: + +1. The SMURF profile for SFSB records. +2. Archival descriptions following the EAD extended schema for SFSB records. +3. Structural metadata for the submission information package (represented as a METS file). +4. If possible then SIP creation events should be logged as PREMIS metadata. +5.
If possible then EAC-CPF metadata should be created during the SIP creation process. +6. The SIP consists of items 1, 3 and optionally (4), (5). + +## 4.3. Relational databases + +The third case represents a relational database encapsulated in the SIP. This case presumes that the database has previously been exported to the SIARD 2.0 format (a harmonised format for database archiving based on SIARD, Figure 13). + +![Export to SIARD 2.0](image13.png) + +Various relational databases (e.g. Oracle, PostgreSQL) exist (1). These databases contain the metadata and records in their native format (2), which can be extracted into a standardised format (4) by following SIARD 2.0 (3). The SIARD extraction should be complemented with more general information about the information package and could be complemented with PREMIS, EAC-CPF and EAD metadata as well (Figure 14). + + +![SIARD 2.0 to SIP](image14.png) + +### BLOBs and CLOBs in relational databases + +Figure 13 and Figure 14 show the most common profile for relational databases with metadata and records. However, in some cases a relational database contains binary data which will be exported as external files during SIP creation. In that situation it may be necessary to consult “RECOMMENDATION for storing large objects outside the SIARD file”, which is a specific and technical recommendation that is not included in the SIARD 2.0 specification. + +In the context of relational databases, binary data is defined as information which is stored in the database as a bit stream following a specific file format. The potentially huge size of binary data within a database can lead to problems in the handling and archival processing of the database. Binary data is usually referred to as a binary large object (BLOB). Similarly, large amounts of character data are named CLOB (character large object). CLOBs pose a problem because of their size rather than for lack of a proper data type. For the rest of this section CLOBs will be treated as BLOBs.
+ +An example of a relational database with BLOBs could be a database where images are stored. + +Handling binary data in databases has always been a challenge, regardless of whether the handling was based on: + +1. Internal BLOBs – where data is contained in the records. +2. External direct references (path and filename) – where BLOBs are stored as files. +3. External indirect references (file ID) – where BLOBs are stored as files. +4. Other methods which may exist. + + +The first method, using internal BLOBs, is supported in the SIARD 2.0 format, but if a table contains BLOBs that are more than 2000 bytes or 2000 characters in size, the BLOBs are written as separate files and a reference to the location of each file is stored in the cell content. The SIARD 2.0 format therefore also supports external references to BLOBs stored as files inside the SIARD table folder structure (i.e. inside the SIARD ZIP package file). + +The above scenario therefore has no consequences for the profile shown in Figure 13 and Figure 14. + +The SIARD 2.0 format, however, also supports methods using external files outside the SIARD table folder structure (i.e. outside the SIARD ZIP package file), but it does not describe in detail how to handle BLOBs in this case. It is in this particular scenario that it is advisable to consult the detailed recommendations in the “RECOMMENDATION for storing large objects outside the SIARD file” document. + +When SIP creation includes BLOBs stored as external files outside the table folder structure, this affects the SIP package: there is then not only one SIARD file containing data from the database, but a SIARD file and one or several other folders containing the external BLOB files. + +A diagram for external files outside the SIARD table folder structure is presented in Figure 15: + +![Relational databases with BLOBs/CLOBs stored as external files](image15.png) + + +1.
Various relational databases (e.g. Oracle, PostgreSQL). +2. The metadata and records in a relational database. +3. The SIARD 2.0 specification. +4. The metadata and records in the SIARD 2.0 format. +5. Recommendations for external file structure of binary data for the SIARD 2.0 format. +6. BLOBs and/or CLOBs stored as external files outside the table structure. + +### External BLOBs influence on METS file + +If there are several data files and folders in the SIP package, this affects the IP metadata (METS file). The document “RECOMMENDATION for segmenting IP using METS” describes how to represent the files in METS. + +Further information can be found in the SIARD 2.0 Profile document. + + diff --git a/specification/05-submission_agreement/index.md b/specification/05-submission_agreement/index.md new file mode 100644 index 0000000..3e43a09 --- /dev/null +++ b/specification/05-submission_agreement/index.md @@ -0,0 +1,16 @@ +# 5. SUBMISSION AGREEMENT + +Interaction between the Archive and Producers is often formalized and guided by a Submission Agreement, which establishes specific details of the interaction such as the type of information submitted, the metadata the Producer is expected to provide, the logistics of the actual transfer of custody from the Producer to the archive, and any access restrictions attached to the submitted material. (Lavoie B, The Open Archival Information System (OAIS) Reference Model: Introductory Guide (2nd Edition), 2014, www.dpconline.org/component/docman/doc_download/1359-dpctw14-02) According to the OAIS model the submission agreement is an agreement reached between an Archive and the Producer that specifies a data model, and any other arrangements needed, for the Data Submission Session. This data model identifies format/contents and the logical constructs used by the Producer and how they are represented on each media delivery or in a telecommunication session.
(Reference Model for an Open Archival Information System (OAIS), 2012, public.ccsds.org/publications/archive/650x0m2.pdf) + +The E-ARK project acknowledged the importance of submission agreements and provided a way of referencing one in METS.xml regardless of its form. (A submission agreement can be delivered in digital form (e.g. PDF or XML file) or in analogue form (e.g. paper document).) This document describes a recommended format for a Submission Agreement (Appendix B: Submission Agreement), but does not forbid the use of any other Submission Agreement format. + +According to the [PAIMAS, 2004](http://public.ccsds.org/publications/archive/651x0m1.pdf) standard the submission agreement should include a complete and precise definition of: + +- information to be transferred (e.g., SIP contents, SIP packaging, data models, Designated Community, legal and contractual aspects); +- transfer definition (e.g. specification of the Data Submission Sessions); +- validation definition; +- change management (e.g. conditions for modification of the agreement, for breaking the agreement); +- schedule (submission timetable). + + +The submission agreement is inspired by the PAIMAS requirements and the submission agreement template provided by the National Oceanic and Atmospheric Administration (NOAA). This document proposes a list of elements which are recommended to be recorded in the submission agreement (8.2). diff --git a/specification/07-references/index.md b/specification/07-references/index.md new file mode 100644 index 0000000..bb6005f --- /dev/null +++ b/specification/07-references/index.md @@ -0,0 +1,24 @@ +# 7. REFERENCES + +1. A Checklist for Documenting PREMIS-METS Decisions in a METS Profile, 2010, +URL: http://www.loc.gov/standards/premis/premis_mets_checklist.pdf +2. E-ARK Report on Available Best Practices, 2014, URL: http://eark-project.com/resources/project-deliverables/6-d31-e-ark-report-on-available-best-practices +3.
e-SENS (Electronic Simple European Networked Services) project, 2015, +URL: http://www.esens.eu/ +4. Encoded Archival Context for Corporate Bodies, Persons, and Families, 2015, URL: http://eac.staatsbibliothek-berlin.de/ +5. FGS packet structure, 2013, +URL: https://riksarkivet.se/Media/pdf-filer/Projekt/FGS_Earkiv_Paket.pdf +6. Guidelines for using PREMIS with METS for exchange, Revised September 17, 2008, +URL: http://www.loc.gov/standards/premis/guidelines-premismets.pdf +7. Media Types, 2015, URL: https://www.iana.org/assignments/media-types/media-types.xhtml +8. METS, 2015, URL: http://www.loc.gov/standards/mets/ +9. METS Profile Components, 2011, URL: http://www.loc.gov/standards/mets/profile_docs/components.html +10. METS Profiles, 2012, URL: http://www.loc.gov/standards/mets/mets-profiles.html +11. Producer, Submission Agreements: Glossary of Terms, 2015, URL: http://sites.tufts.edu/dca/about-us/research-initiatives/taper-tufts-accessioning-program-for-electronic-records/project-documentation/submission-agreements-glossary-of-terms/ +12. Producer-Archive Interface Methodology Abstract Standard (PAIMAS), 2004, +URL: public.ccsds.org/publications/archive/651x0m1.pdf +13. Producer-Archive Interface Specification (PAIS) – CCSDS, 2014, +URL: public.ccsds.org/publications/archive/651x1b1.pdf +14. Records Creator, Submission Agreements: Glossary of Terms, 2015, URL: http://sites.tufts.edu/dca/about-us/research-initiatives/taper-tufts-accessioning-program-for-electronic-records/project-documentation/submission-agreements-glossary-of-terms/ +15. Reference Model for an Open Archival Information System (OAIS), 2012, +URL: public.ccsds.org/publications/archive/650x0m2.pdf diff --git a/specification/08-appendixes/index.md b/specification/08-appendixes/index.md new file mode 100644 index 0000000..6def6b1 --- /dev/null +++ b/specification/08-appendixes/index.md @@ -0,0 +1,58 @@ +# 8. APPENDICES + +## 8.1.
Appendix A: Quality requirements for a submission information package + +Every SIP should follow the requirements set out in the Common Specification for Information Packages. + +### 8.1.1. General requirements + +- Requirement 1.1: It MUST be possible to include any data or metadata, regardless of its type or format, in a Submission Information Package. +- Requirement 1.2: A Submission Information Package Specification MUST NOT restrict the means, methods or tools for exchanging it. +- Requirement 1.3: The Submission Information Package Specification MUST NOT define the scope of data and metadata which constitutes an Information Package. +- Requirement 1.4: A Submission Information Package SHOULD be highly scalable. +- Requirement 1.5: A Submission Information Package MUST be machine-readable. +- Requirement 1.6: A Submission Information Package SHOULD be human-readable. +- Requirement 1.7: A Submission Information Package MUST support the preservation method best suited for the data. + +### 8.1.2. Identification of the Information Package + +- Requirement 2.1: The Information Package type (SIP, AIP or DIP) MUST be clearly indicated. +- Requirement 2.2: The Submission Information Package MUST clearly indicate the Content Information Type(s) of its data and metadata. +- Requirement 2.3: A Submission Information Package MUST bear an identifier which is unique and persistent in the scope of the repository. +- Requirement 2.4: A Submission Information Package SHOULD bear an identifier which is globally unique and persistent. +- Requirement 2.5: All components of a Submission Information Package MUST bear an identifier which is unique and persistent within the repository. + +### 8.1.3. Structure of the Information Package + +- Requirement 3.1: A Submission Information Package MUST be built in such a way that its data and metadata can be logically and physically separated from one another.
+- Requirement 3.2: The structure of the Submission Information Package SHOULD allow for the separation of different types of metadata. +- Requirement 3.3: The structure of the Submission Information Package MUST allow for the separation of data and metadata representations. +- Requirement 3.4: The structure of a Submission Information Package MUST explicitly define the possibilities for adding additional logical components into the Information Package. +- Requirement 3.5: A Submission Information Package MUST follow a common conceptual structure regardless of its technical implementation. +- Requirement 3.6: A Submission Information Package MUST be implemented by one and only one implementation at any point in time. + +### 8.1.4. Information Package Metadata + +- Requirement 4.1: Metadata in a Submission Information Package MUST be based on standards. +- Requirement 4.2: Metadata in a Submission Information Package MUST allow for unambiguous use. +- Requirement 4.3: A Submission Information Package MUST NOT restrict the addition of any additional metadata. + + + +## 8.2. Appendix B: Submission Agreement + +[Table 6 missing] + +## 8.3. Appendix C: Terminology + +| Term | Definition | +|---|---| +| Archival creator | An organization unit or individual that creates records and/or manages those records during their active use. | +| Archive* | An Organisation that intends to preserve information for Access and use by a Designated Community. | +| Delivering organisation | The organisation delivering the package to the archive. For stating and extending the information use of the “Producer organisation name” and “Submitting organisation name” elements is recommended. | +| ERMS | A type of content management system known as an electronic records management system. | +| Information Package* | A logical container composed of optional Content Information and optional associated Preservation Description Information. Associated with this Information Package is Packaging Information used to delimit and identify the Content Information and Package Description information used to facilitate searches for the Content Information.
| +| Ingest Functional Entity* | The OAIS functional entity that contains the services and functions that accept Submission Information Packages from Producers, prepares Archival Information Packages for storage, and ensures that Archival Information Packages and their supporting Descriptive Information become established within the OAIS. | +| OAIS* | The Open Archival Information System is an archive (and a standard: ISO 14721:2003), consisting of an organisation of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community. | +| Producing organisation | The organizational unit or individual that has the authority to transfer records to an archive. Usually the producer is also the records creator, the organizational unit or individual that created and managed the records during their active use. This is not always the case; sometimes the producer is different from the records creator. For example: An author dies and her literary executor gains the authority to transfer her papers to an archive. The author is the records creator and the literary executor is the producer. For example: Department X gets reorganized out of existence and Department Y, which takes over the functional responsibilities of Department X, gains the authority to transfer the records of Department X to the archive. Department X is the records creator and Department Y is the producer. Counter example: The Department of Widget Science transfers some of its own records to the archive. The Department of Widget Science is the records creator and the producer. | +| Submission Information Package (SIP)* | An Information Package that is delivered by the Producer to the OAIS for use in the construction or update of one or more AIPs and/or the associated Descriptive Information. | +| Submitting organisation | Name of the organisation submitting the package to the archive.
Extends the delivery information since it may be the case that the content of a creator is held by another part of the organisation. | diff --git a/specification/index.md b/specification/index.md new file mode 100644 index 0000000..2247119 --- /dev/null +++ b/specification/index.md @@ -0,0 +1,34 @@ +GENERAL SIP SPECIFICATION +============================================= + +Version: 1.4 + +January 31, 2017 + +Front Matter +------------ +1. [Authors](authors) +2. [Revision History](history) + +Contents +-------- + + +TBD + +Acknowledgements +---------------- +The General SIP Specification was first developed within the E-ARK project in 2014 – 2017. E-ARK was an EC-funded pilot action project in the Competitiveness and Innovation Programme 2007-2013, Grant Agreement no. 620998 under the Policy Support Programme. + +The authors of this deliverable would like to thank all national archives, tool developers, the Advisory Board of the E-ARK project and other stakeholders who provided valuable knowledge about their submission information packages and feedback to E-ARK deliverables. + +Special gratitude goes to the National Archives of Sweden, whose FGS (Förvaltningsgemensam Specifikation) structure significantly influenced the development of the first version of the SIP METS profile. + +The authors would also like to express their gratitude to the team behind the Common Specification for Information Packages document for their enormous effort in agreeing common principles for submission, archival and dissemination packages. + + +Contact & Feedback +------------------ +The General SIP Specification is maintained by the Digital Information LifeCycle Interoperability Standard Board (DILCIS Board). For further information about the DILCIS Board or feedback on the current document please consult the website http://www.dilcis.eu/ or contact us at + +