add existing project files
DaltonAlves committed May 9, 2024
0 parents commit e38e4b4
Showing 34 changed files with 1,179 additions and 0 deletions.
15 changes: 15 additions & 0 deletions .gitignore
@@ -0,0 +1,15 @@
# Not sure what a .gitignore is?
# See: https://git-scm.com/docs/gitignore

# These are directly copied from Jekyll's first-party docs on `.gitignore` files:
# https://jekyllrb.com/tutorials/using-jekyll-with-bundler/#commit-to-source-control

# Ignore the default location of the built site, and caches and metadata generated by Jekyll
_site/
.sass-cache/
.jekyll-cache/
.jekyll-metadata

# Ignore folders generated by Bundler
.bundle/
vendor/
7 changes: 7 additions & 0 deletions Gemfile
@@ -0,0 +1,7 @@
source 'https://rubygems.org'

gem "jekyll", "~> 4.3.3" # installed by `gem jekyll`
# gem "webrick" # required when using Ruby >= 3 and Jekyll <= 4.2.2

gem "just-the-docs", "0.8.1" # pinned to the current release
# gem "just-the-docs" # always download the latest release
90 changes: 90 additions & 0 deletions Gemfile.lock
@@ -0,0 +1,90 @@
GEM
  remote: https://rubygems.org/
  specs:
    addressable (2.8.6)
      public_suffix (>= 2.0.2, < 6.0)
    colorator (1.1.0)
    concurrent-ruby (1.2.2)
    em-websocket (0.5.3)
      eventmachine (>= 0.12.9)
      http_parser.rb (~> 0)
    eventmachine (1.2.7)
    ffi (1.16.3)
    ffi (1.16.3-x64-mingw-ucrt)
    forwardable-extended (2.6.0)
    google-protobuf (3.25.1-arm64-darwin)
    google-protobuf (3.25.1-x64-mingw-ucrt)
    google-protobuf (3.25.1-x86_64-linux)
    http_parser.rb (0.8.0)
    i18n (1.14.1)
      concurrent-ruby (~> 1.0)
    jekyll (4.3.3)
      addressable (~> 2.4)
      colorator (~> 1.0)
      em-websocket (~> 0.5)
      i18n (~> 1.0)
      jekyll-sass-converter (>= 2.0, < 4.0)
      jekyll-watch (~> 2.0)
      kramdown (~> 2.3, >= 2.3.1)
      kramdown-parser-gfm (~> 1.0)
      liquid (~> 4.0)
      mercenary (>= 0.3.6, < 0.5)
      pathutil (~> 0.9)
      rouge (>= 3.0, < 5.0)
      safe_yaml (~> 1.0)
      terminal-table (>= 1.8, < 4.0)
      webrick (~> 1.7)
    jekyll-include-cache (0.2.1)
      jekyll (>= 3.7, < 5.0)
    jekyll-sass-converter (3.0.0)
      sass-embedded (~> 1.54)
    jekyll-seo-tag (2.8.0)
      jekyll (>= 3.8, < 5.0)
    jekyll-watch (2.2.1)
      listen (~> 3.0)
    just-the-docs (0.8.1)
      jekyll (>= 3.8.5)
      jekyll-include-cache
      jekyll-seo-tag (>= 2.0)
      rake (>= 12.3.1)
    kramdown (2.4.0)
      rexml
    kramdown-parser-gfm (1.1.0)
      kramdown (~> 2.0)
    liquid (4.0.4)
    listen (3.8.0)
      rb-fsevent (~> 0.10, >= 0.10.3)
      rb-inotify (~> 0.9, >= 0.9.10)
    mercenary (0.4.0)
    pathutil (0.16.2)
      forwardable-extended (~> 2.6)
    public_suffix (5.0.4)
    rake (13.1.0)
    rb-fsevent (0.11.2)
    rb-inotify (0.10.1)
      ffi (~> 1.0)
    rexml (3.2.6)
    rouge (4.2.0)
    safe_yaml (1.0.5)
    sass-embedded (1.69.5-arm64-darwin)
      google-protobuf (~> 3.23)
    sass-embedded (1.69.5-x64-mingw-ucrt)
      google-protobuf (~> 3.23)
    sass-embedded (1.69.5-x86_64-linux-gnu)
      google-protobuf (~> 3.23)
    terminal-table (3.0.2)
      unicode-display_width (>= 1.1.1, < 3)
    unicode-display_width (2.5.0)
    webrick (1.8.1)

PLATFORMS
  arm64-darwin-23
  x64-mingw-ucrt
  x86_64-linux

DEPENDENCIES
  jekyll (~> 4.3.3)
  just-the-docs (= 0.8.1)

BUNDLED WITH
   2.3.26
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2022 just-the-docs

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
2 changes: 2 additions & 0 deletions README.md
@@ -0,0 +1,2 @@
# Digital Services Documentation
Work in Progress
4 changes: 4 additions & 0 deletions _config.yml
@@ -0,0 +1,4 @@
title: GW SCRC - Digital Services
description: Documentation for GW LAI's Digital Services unit of the Special Collections Research Center.
theme: just-the-docs

Binary file not shown.
Binary file not shown.
Binary file added assets/images/sidecar_upload.PNG
8 changes: 8 additions & 0 deletions borndigital/accessioning.md
@@ -0,0 +1,8 @@
---
layout: page
title: "Accessioning and Ingesting Born-Digital Records"
parent: Supporting Born-Digital Records
has_children: true
nav_order: 1
has_toc: true
---
104 changes: 104 additions & 0 deletions borndigital/bestpractices.md
@@ -0,0 +1,104 @@
---
layout: page
title: "Best Practices for Born-Digital Accessioning: Appraisal and Migration"
permalink: /borndigital/bestpractice/
parent: Accessioning and Ingesting Born-Digital Records
grand_parent: Supporting Born-Digital Records
nav_order: 1
---

# Best Practices for Born-Digital Accessioning: Appraisal and Migration

LAI Digital Services

Draft: 2023-11-30

## Background

High-quality appraisal and migration of digital records provide benefits that increase the long-term effectiveness of storage, preservation, and access. The costs of managing digital records give practitioners an incentive to appraise records before ingestion into a repository’s digital collections whenever possible. Ultimately, the level of access that users can expect for born-digital records depends on the level of appraisal and migration afforded at the time of accessioning.

The appraisal and migration of born-digital records cannot be a wholly prescriptive practice. Practitioners must make decisions throughout the process based on the context of the records, their unique characteristics, and the limitations of technology, labor, and resources at hand.

### Appraisal

Not every record has value worthy of inclusion in an archival collection. Therefore, practitioners must assess value on some level to make decisions about what to accession. To do this, the practitioner must develop an understanding of the records, the context of their creation and use, and the broader information ecosystem in which they exist. In a digital context, appraisal offers benefits similar to those it offers with physical records. Providing long-term preservation and access to archival records has significant costs. Appraisal makes archives easier to use and manage.

It should be noted that with digital records there is often a desire to fall back on more traditional modes of appraisal. [Jenkinsonian](https://www.jstor.org/stable/40294205)-inspired approaches are often taken, in part, because digital records take up no physical space. It is perceived as easy to save everything that is transferred or offered to a repository in digital form. This perception is misleading: these passive approaches to appraisal accrue significant environmental, labor, and resource costs over time. There may be value in Jenkinsonian-inspired methods for preserving the [evidential value](https://dictionary.archivists.org/entry/evidential-value.html) of digital records, but the benefits are often outweighed by the costs.

### Migration

Migration encompasses the actions of acquiring, imaging, and transferring records into a repository's care. Migration may further develop the repository's understanding of the records. Decisions made during the migration process directly affect the experience that future users will have with the records.

## Decision Matrices

The matrices below describe three possible scenarios that represent different levels of appraisal and migration. Practitioners might mix and match aspects of multiple levels depending on the characteristics of the records and the resources available. A variety of tools, software, and hardware might be used to accomplish each task.

### Level 1. Full Migration

<table><thead><tr><th><p>Transfer</p></th><th><p>Understand/Review/Act</p></th><th><p>Documentation</p></th><th><p>Storage</p></th></tr><tr><th><ul><li>Disk image (physical media)</li><li>File Transfer</li><li>Robocopy</li><li>Virus scan</li></ul></th><th><ul><li>Search for and remediate PII/PPI</li><li>Perform file identification and review file formats. Migrate to preferred formats as necessary</li><li>Review for bad file names or junk files.</li><li>Review for duplicate files.</li></ul></th><th><ul><li>Create and/or update ArchivesSpace accession record</li><li>Capture information about physical media.</li><li>BagIt</li></ul></th><th><ul><li>BagIt</li><li><a href="https://github.com/gwu-libraries/digital-stewardship-services">Tier 2 storage</a></li></ul></th></tr></thead></table>

### Level 2. Intermediate Migration

<table><thead><tr><th><p>Transfer</p></th><th><p>Understand/Review/Act</p></th><th><p>Documentation</p></th><th><p>Storage</p></th></tr><tr><th><ul><li>Disk image (physical media)</li><li>File Transfer</li></ul></th><th><ul><li>Perform file identification</li><li>Review for junk files</li></ul></th><th><ul><li>Create and update ArchivesSpace accession record</li><li>Capture information about physical media</li></ul></th><th><ul><li>BagIt</li><li><a href="https://github.com/gwu-libraries/digital-stewardship-services">Tier 2 storage</a></li></ul></th></tr></thead></table>

### Level 3. Base-line Migration

<table><thead><tr><th><p>Transfer</p></th><th><p>Understand/Review/Act</p></th><th><p>Documentation</p></th><th><p>Storage</p></th></tr><tr><th><ul><li>Disk Image</li><li>File Transfer</li></ul></th><th></th><th><ul><li>Create and update ArchivesSpace accession record</li></ul></th><th><ul><li><a href="https://github.com/gwu-libraries/digital-stewardship-services">Tier 2 storage</a></li></ul></th></tr></thead></table>

## Example Narratives

### Example 1. University Office transferring records to the University Archives

**Transfer**: The Office of the University President has transferred 40 GB (1,008 files) of records to the University Archives via file sharing platforms (Box, Google Drive, etc.) and internal systems (the Office of Media Relations’ Digital Asset Management system). The University Archivist and Digital Services Manager compiled the files from the various sources provided by the transferring office into a single submission information package (SIP). A virus scan was not completed since the files previously existed solely within GWU systems.

**Understand/Review/Act:** [DROID](https://github.com/digital-preservation/droid) was used to perform file format identification. No proprietary file formats were found, so no files were migrated or reformatted. [Simple Python scripts](https://github.com/DaltonAlves/GWU/tree/main/DigitalAccesioning) were run over the directory to identify files with bad file names (invalid or non-ASCII characters), 0-byte (empty) files, duplicate files, and hidden subdirectories/files. Duplicate files were reviewed by the practitioner and removed from the SIP. No bad/junk files were identified, so no further action was taken.
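
The checks described above can be approximated in a few lines of Python. The sketch below is not the linked scripts; it is a minimal illustration, and the names `sha256_of` and `review_directory` are hypothetical. It assumes that a leading dot marks a hidden file (the Unix convention, which would miss files hidden through Windows attributes) and that files with identical SHA-256 digests are duplicates.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def review_directory(root: str) -> dict:
    """Walk `root` and flag files a practitioner may want to review."""
    report = {"non_ascii": [], "empty": [], "hidden": [], "duplicates": []}
    by_hash = defaultdict(list)

    for dirpath, dirnames, filenames in os.walk(root):
        # Dot-prefixed directories are reported and still walked.
        for name in dirnames:
            if name.startswith("."):
                report["hidden"].append(Path(dirpath, name))
        for name in filenames:
            path = Path(dirpath, name)
            if name.startswith("."):
                report["hidden"].append(path)
            if not name.isascii():
                report["non_ascii"].append(path)
            if path.stat().st_size == 0:
                # Empty files are listed separately, not as duplicates.
                report["empty"].append(path)
            else:
                by_hash[sha256_of(path)].append(path)

    report["duplicates"] = [
        sorted(paths) for paths in by_hash.values() if len(paths) > 1
    ]
    return report
```

The function only reports; deciding what to remove from the SIP stays with the practitioner, as in the narrative.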

Through discussions with the transferring office, it was determined that no restrictions apply to the records. In addition, the majority of the files were images with limited embedded metadata. Therefore, a PII search was not conducted.

The arrangement of the files as they were transferred was relatively organized and provided some evidential value related to the creation of the records, so the arrangement was maintained.

**Documentation:** An accession record was created in ArchivesSpace. The accession record includes information about restriction status, extent of the accession, and the source and date of the transfer.

**Storage:** The SIP was ingested into the SCRC preservation environment as an archival information package (AIP). The contents of the AIP are registered in an inventory by the [simple audit tool](https://github.com/gwu-libraries/audit-tool). The URI of the AIP is linked to the accession record in ArchivesSpace.

### Example 2. University Office transferring records with restrictions to University Archives

**Transfer**: The Office of the University President has transferred 4 MB (50 files) of records to the University Archives via a file sharing platform (Box). The files were neither created nor used in the transfer environment (Box), so Robocopy (or a similar tool) was not used to transfer them.

**Understand/Review/Act:** Per the University’s record retention schedule, the records transferred are closed for a set period of time (50 years from the date of record creation). An examination of the files indicated that the earliest file was dated 2021. Date information was appended to file names by the record creator, but in a sporadic manner. A Python script was run over the files to standardize the date in each file name to YYYY-MM-DD formatting. This was done to facilitate access based upon the restriction. Spaces in file names were also replaced with underscores. No PII search was conducted since the records will be closed for 50 years. A search for duplicate files was conducted and none were found.
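
A renaming pass of this kind might look like the following sketch. The source does not say which date patterns the record creator used, so the two patterns handled here (`MM-DD-YYYY` and `YYYYMMDD`) are assumptions for illustration, as are the names `normalized_name` and `plan_renames`. The sketch only plans renames so that the result can be reviewed before any file is touched.

```python
import re
from pathlib import Path

# Assumed input patterns (hypothetical): MM-DD-YYYY / MM_DD_YYYY and YYYYMMDD.
US_DATE = re.compile(r"(?<!\d)(\d{2})[-_](\d{2})[-_](\d{4})(?!\d)")
COMPACT_DATE = re.compile(r"(?<!\d)(\d{4})(\d{2})(\d{2})(?!\d)")


def normalized_name(name: str) -> str:
    """Return `name` with dates as YYYY-MM-DD and spaces as underscores."""
    name = US_DATE.sub(r"\3-\1-\2", name)
    name = COMPACT_DATE.sub(r"\1-\2-\3", name)
    return name.replace(" ", "_")


def plan_renames(root: str) -> list:
    """Return (old_path, new_path) pairs without renaming anything."""
    planned = []
    claimed = set()
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        target = path.with_name(normalized_name(path.name))
        if target == path:
            continue
        # Refuse to plan a rename that would overwrite another file.
        if target.exists() or target in claimed:
            raise FileExistsError(f"{target} would collide with another file")
        claimed.add(target)
        planned.append((path, target))
    return planned
```

After reviewing the plan, the renames can be applied with `for old, new in plan_renames(root): old.rename(new)`. Keeping the original names in a log alongside the accession may be worthwhile, since renaming changes what the creator supplied.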

**Documentation:** An accession record was created in ArchivesSpace. The accession record includes information about the restriction, the extent of the accession, and the source and date of the transfer. BagIt was used to package the submission information package (SIP). This also allowed some basic rights metadata to be attached to the AIP. The 50-year restriction and the lack of a PII search were noted in the BagIt info file.
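
To make the packaging step concrete, the sketch below writes the core pieces of a BagIt bag by hand: the `bagit.txt` declaration, a `data/` payload directory, a `manifest-sha256.txt`, and a `bag-info.txt`. It is an illustration of the layout only, not the tool used for this accession; in practice an established BagIt implementation would normally be used. The function name `make_minimal_bag` and the `Access-Restriction` and `PII-Review` labels in the usage note are hypothetical.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path


def make_minimal_bag(source: str, bag_root: str, info: dict) -> Path:
    """Copy `source` into `bag_root`/data and write BagIt tag files."""
    bag = Path(bag_root)
    data = bag / "data"
    shutil.copytree(source, data)  # raises if the bag already exists

    manifest_lines = []
    total_bytes = 0
    file_count = 0
    for path in sorted(p for p in data.rglob("*") if p.is_file()):
        payload = path.read_bytes()  # acceptable for a sketch; stream large files
        total_bytes += len(payload)
        file_count += 1
        # Manifest paths are relative to the bag root and use forward slashes.
        # Names containing CR, LF, or "%" would need percent-encoding (not handled).
        relative = path.relative_to(bag).as_posix()
        manifest_lines.append(f"{hashlib.sha256(payload).hexdigest()}  {relative}")

    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )

    tags = {
        "Bagging-Date": date.today().isoformat(),
        "Payload-Oxum": f"{total_bytes}.{file_count}",
        **info,  # caller-supplied labels, e.g. restriction notes
    }
    (bag / "bag-info.txt").write_text(
        "".join(f"{label}: {value}\n" for label, value in tags.items()),
        encoding="utf-8",
    )
    return bag
```

A call such as `make_minimal_bag("sip", "bag", {"Access-Restriction": "Closed for 50 years", "PII-Review": "Not performed"})` would record the restriction and the skipped PII search in `bag-info.txt`.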

**Storage:** The SIP was ingested into the SCRC preservation environment as an archival information package (AIP). BagIt was used to package the SIP with relevant metadata, including information about the restriction. The contents of the AIP are registered in an inventory by the [simple audit tool](https://github.com/gwu-libraries/audit-tool). The URI of the AIP is linked to the accession record in ArchivesSpace.

### Example 3. Record creator electronically transferring records as part of personal papers collection

**Transfer:** A record creator has transferred 450 MB (168 files) of records as an accretion to an existing fonds. The records were transmitted via a file sharing platform (Box). A virus scan was completed on the files before ingestion into GW systems. Robocopy was not used because no physical media was involved in the transfer and the files were neither used nor created by the record creator in the environment from which they were transferred.

**Understand/Review/Act:** Through discussions with the record creator and examination of the files, it was understood that the files were not systematically arranged by the record creator. Rather, they were accumulated over time in a file sharing platform in an unorganized and inconsistent manner. NARA’s [Electronic Records Accessioning Support Tools](https://github.com/usnationalarchives/Electronic-Records-Accessioning-Support-Tools) were used to identify file formats, find duplicate files, and search for “funny” file names (non-ASCII characters). Many files were present in multiple file formats, for example PDFs and text documents (.pages) containing the same content. Pages files are a proprietary document format supported only on Apple devices. To facilitate future use, unique Pages files (files with no extant PDF equivalent) were migrated to the PDF/A file format. Duplicate files were not retained once it was confirmed that they existed in another preferred file format (PDF/A).
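
Finding the Pages files that still need migration can be partly automated. The sketch below assumes, purely for illustration, that a PDF equivalent sits in the same folder and shares the same base name; the function name `pages_without_pdf` is hypothetical. A name match does not prove the content is identical, so flagged and unflagged files alike would still need a practitioner's review.

```python
from pathlib import Path


def pages_without_pdf(root: str) -> list:
    """List .pages items that have no same-named PDF in the same folder."""
    unmatched = []
    for candidate in sorted(Path(root).rglob("*")):
        if candidate.suffix.lower() != ".pages":
            continue
        # Compare base names case-insensitively so "Notes.PDF" matches "notes.pages".
        pdf_stems = {
            sibling.stem.lower()
            for sibling in candidate.parent.iterdir()
            if sibling.suffix.lower() == ".pdf"
        }
        if candidate.stem.lower() not in pdf_stems:
            unmatched.append(candidate)
    return unmatched
```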

Some arrangement was carried out to better structure the files after consultation with the record creator. Files were organized into a more consistent hierarchical structure (grouping together project files, organizing correspondence into subdirectories, etc.).

**Documentation:** An accession record was created in ArchivesSpace. The accession record includes information about restriction status, extent of the accession, and the source and date of the transfer.

**Storage:** The SIP was ingested into the SCRC preservation environment as an archival information package (AIP). The contents of the AIP are registered in an inventory by the [simple audit tool](https://github.com/gwu-libraries/audit-tool). The URI of the AIP is linked to the accession record in ArchivesSpace.

### Example 4. Physical Media transferred with non-digital records

**Transfer:** A record creator has transferred 2 linear feet of records as an [accretion](https://dictionary.archivists.org/entry/accretion.html) to an existing fonds. Within this material are physical media carriers containing born-digital records. A virus scan of the physical media was performed. Robocopy was used to migrate the files off of the physical media to retain file and folder attributes (timestamps, ownership information, and other attributes). A write-blocker device was used when handling the digital files on the physical media carriers.

**Understand/Review/Act:** The physical media contained less than 1 MB of data in about 20 files across approximately 10 media carriers (3.5-inch floppy disks). The files on each floppy had no file extensions, but upon closer examination they were identified as plain text files. The physical media did contain important contextual and descriptive information. This information was used in the archival description. It was decided not to photograph the physical media since the media would be retained within the collections. Unlike in the previous examples, these files were reviewed mostly by hand because they were simple and few in number.
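
The review in this example was manual, but a first-pass check for extensionless files can be scripted. The sketch below is a rough heuristic under stated assumptions, not a substitute for a format identification tool such as DROID: it treats a file as probably plain text when a sample contains no NUL bytes and at least 95% printable ASCII or common whitespace. The name `looks_like_plain_text` and the threshold are hypothetical choices.

```python
# Printable ASCII plus tab, line feed, and carriage return.
TEXT_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def looks_like_plain_text(path, sample_size=8192, threshold=0.95):
    """Heuristically guess whether the file at `path` is plain text."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    # Empty files give nothing to judge; NUL bytes strongly suggest binary data.
    if not sample or b"\x00" in sample:
        return False
    printable = sum(byte in TEXT_BYTES for byte in sample)
    return printable / len(sample) >= threshold
```

Text in older encodings with many non-ASCII bytes could fall below the threshold, so a negative result is a prompt for closer inspection rather than a conclusion.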

**Documentation:** The accession record was updated to reflect the information learned about the digital files: the extent (size and # of files) and dates (from the original timestamps).
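
The extent and date range recorded in the accession record can be derived from the copied files. The sketch below assumes that modification timestamps survived the transfer (as they would with an attribute-preserving copy) and reports dates in UTC; the function name `summarize_extent` and the dictionary keys are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def summarize_extent(root: str) -> dict:
    """Return file count, total size in bytes, and the modification date range."""
    stats = [p.stat() for p in Path(root).rglob("*") if p.is_file()]
    if not stats:
        return {"file_count": 0, "total_bytes": 0, "earliest": None, "latest": None}

    def as_date(timestamp: float) -> str:
        # Convert a POSIX timestamp to an ISO 8601 date in UTC.
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()

    mtimes = [s.st_mtime for s in stats]
    return {
        "file_count": len(stats),
        "total_bytes": sum(s.st_size for s in stats),
        "earliest": as_date(min(mtimes)),
        "latest": as_date(max(mtimes)),
    }
```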

**Storage:** The original files, which lacked file extensions, and the migrated copies (.txt files) were ingested as a single SIP. The contents of the AIP are registered in an inventory by the [simple audit tool](https://github.com/gwu-libraries/audit-tool). The URI of the AIP is linked to the accession record in ArchivesSpace.

## Thoughts for the Future

While many appraisal and migration activities require the expertise of archivists and other practitioners, the overall process relies on many tools that make it scalable. These tools should be explored further to identify what works best for the needs of LAI and is usable given the available resources and skill sets of staff.

Packaging of tools as microservices in a broader system to support digital accessioning has many benefits. Examples of these types of services include: [Archivematica](https://www.archivematica.org/en/), [Aurora](https://github.com/RockefellerArchiveCenter/aurora) (and the broader [Project Electron](https://github.com/RockefellerArchiveCenter/project_electron)), [FITS](https://projects.iq.harvard.edu/fits), [Preservica](https://preservica.com/), and [BitCurator](https://bitcurator.net/).

In addition, ArchivesSpace and BagIt info files are currently the only available resources for recording and storing important administrative and preservation metadata for AIPs. ArchivesSpace (and Encoded Archival Description) as a system of record is not ideal for these types of metadata. LAI’s digital stewardship program does not presently support broader use of XML-based metadata for management of digital objects outside of EAD.

Further exploration of packaged microservices and more robust support for metadata is crucial as the scale and complexity of digital accessions increase.
13 changes: 13 additions & 0 deletions borndigital/borndigital.md
@@ -0,0 +1,13 @@
---
layout: page
title: "Supporting Born-Digital Records"
permalink: /borndigital/
has_children: true
nav_order: 2

---
# Supporting Born-Digital Special Collections

The Digital Services unit supports the accessioning and transfer of digital records into the collections of GW LAI. This work is focused on addressing the backlog of digital media found in GW LAI’s Specialized Collections and supporting the accessioning of new digital media and born-digital content.

This work relates to workflows and tools that are necessary for ensuring the proper stewardship of digital records and is therefore closely related to the unit’s digital stewardship work.