Commit

Merge pull request #8 from gwu-libraries/fall2024
daos_file-names
DaltonAlves authored Feb 11, 2025
2 parents 61bcd35 + 1a8499a commit f6f84fd
Showing 5 changed files with 70 additions and 112 deletions.
115 changes: 40 additions & 75 deletions digitization/filenames.md
@@ -1,82 +1,47 @@
---
layout: page
title: "Digitization - File Names"
title: "Naming and Organizing Digital Surrogate Files"
permalink: /digitization/filenames/
parent: Digitization
nav_order: 1
---
# File Naming Conventions
# Naming and Organizing Digital Surrogate Files

## General Rules
- Be consistent in your file naming for a project. Look at past related projects to help determine a structure for file names. Consistency within and across projects will allow for easier management and manipulation of files.
- Use leading zeroes when necessary. Never use a single digit without a leading zero.
- "1" = bad
- "01" = good
Before starting any digitization project, decide on a structure and scheme for organizing and naming the files you produce.

## Suggested and Example File Name Structures

These are suggested file names. Depending on your project there may be valid reasons to deviate from these structures.

### Manuscript/Archival Material
- ms####_s##_c##_f##_i##_p###
- Breakdown of elements:
- ms = collection identifier
- s = series
- c = container (ignore the container type indicator in ArchivesSpace, use "c" regardless of type)
- f = folder
- i = item
- p = page
- Example = ms2123_s11_ss01_c31_f01.pdf
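
The manuscript pattern above can be assembled programmatically so that leading zeroes are never forgotten. The following is a minimal sketch, not part of the original guidance: the function name `manuscript_file_name` and its parameters are hypothetical, and it leaves out the subseries (`ss`) element that appears in the example.

```python
def manuscript_file_name(collection, series, container, folder,
                         item=None, page=None, ext="tif"):
    """Build a name following ms####_s##_c##_f##_i##_p### (hypothetical helper)."""
    parts = [
        f"ms{collection:04d}",   # collection identifier, four digits
        f"s{series:02d}",        # series, zero-padded to two digits
        f"c{container:02d}",     # container, always "c" regardless of type
        f"f{folder:02d}",        # folder
    ]
    # Item and page are optional and only appended when supplied.
    if item is not None:
        parts.append(f"i{item:02d}")
    if page is not None:
        parts.append(f"p{page:03d}")
    return "_".join(parts) + "." + ext


print(manuscript_file_name(2123, 11, 31, 1, ext="pdf"))
print(manuscript_file_name(2123, 11, 31, 1, item=2, page=7))
```

Zero-padding through format specifiers keeps file names sorting correctly in directory listings.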


### File names for materials from Corcoran Archives
For materials from the Corcoran Archives collections, the following structure should be used.

#### Collection IDs
The collection identifier should be simplified as follows:
- COR0001.0-RG -> cor1-0
- COR0003.1-RG -> cor3-1
- COR0013-MS -> cor13

#### Containers

Container numbers should be simplified as follows:
- Box RG2-2008.018 -> rg2-2008-018
- Box RG5.0-2008.029 -> rg5-0-2008-029

#### Full file name examples:
cor2-0_s01_ss01_rg2-2008-001_f11_i01.tiff
cor5-0_s06_ss01_rg5-2008-020_f01.pdf
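
The collection ID and container simplifications listed above could be automated along these lines. This is a sketch under assumptions: the function names are hypothetical, and the rules are inferred only from the five examples given (strip leading zeroes and the `-RG`/`-MS` suffix, lower-case, and replace periods with hyphens).

```python
import re


def simplify_collection_id(identifier):
    """Turn e.g. 'COR0001.0-RG' into 'cor1-0' (rule inferred from the examples)."""
    match = re.fullmatch(r"COR0*(\d+)(?:\.(\d+))?-[A-Z]+", identifier)
    if match is None:
        raise ValueError(f"Unrecognized collection identifier: {identifier!r}")
    number, sub = match.groups()
    # Keep the sub-number only when the identifier has one.
    return f"cor{number}-{sub}" if sub is not None else f"cor{number}"


def simplify_container(label):
    """Turn e.g. 'Box RG2-2008.018' into 'rg2-2008-018'."""
    if label.startswith("Box "):
        label = label[len("Box "):]
    return label.strip().lower().replace(".", "-")


print(simplify_collection_id("COR0003.1-RG"))
print(simplify_container("Box RG5.0-2008.029"))
```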

### Cataloged Books & Pamphlets (rarebooks)
- CallNumber_PageNumber
- Example = spec_ps3544_h56_page34.tif
- 'spec' may be replaced with other collection areas depending on call number in catalog (ex. mei, kiev)

### Serials (cataloged and from manuscript collections)
In general, it is best to create file names for serials that reflect the items' volume/sequential designation (ex. vol. 1, no. 1). For serials that exist within an archival collection, this information can be easily confused with the top container type "volume." Therefore, the volume designation of the actual work should be used over the instance record volume.

Example:
- RG0044_s39_vol12_no03
- GWNews_vol12_no03

It is also appropriate to use date information to form file names. This may be relevant if the volume/sequential designations are not present or irregular.
- FBNews_1965_10

Example:
- GWNews_199712 (Title_YYYYMM)
- GWTimes_199712-199801 (Title_YYYYMM-YYYYMM)
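
Both serial naming styles can be generated from a title plus either volume/number or date information. The helpers below are a hypothetical sketch; their names and the `(year, month)` tuple convention are assumptions made for illustration.

```python
def serial_name_by_issue(title, volume, number):
    """Title_vol##_no##, zero-padded to two digits."""
    return f"{title}_vol{volume:02d}_no{number:02d}"


def serial_name_by_date(title, start, end=None):
    """Title_YYYYMM, or Title_YYYYMM-YYYYMM when an issue spans a date range.

    `start` and `end` are (year, month) tuples.
    """
    name = f"{title}_{start[0]:04d}{start[1]:02d}"
    if end is not None:
        name += f"-{end[0]:04d}{end[1]:02d}"
    return name


print(serial_name_by_issue("GWNews", 12, 3))
print(serial_name_by_date("GWTimes", (1997, 12), (1998, 1)))
```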

### Audiovisual Material
In certain cases, an audiovisual work may have multiple parts. Maintaining consistency within the project and including as much collection identifying information as possible is essential.

In addition, audiovisual material is often minimally described and processed in an archival collection. Often a single archival object may represent many audiovisual items.

Examples of file names:
- collectionID_s#_c#_f#_i#
- collectionID_s#_c#_title_of_video
- collectionID_c#_title_of_video
- collectionID_s#_c#_f#_i#_part1
- collectionID_c#_title_of_video_part2

### Born Digital Material
While this section covers file naming conventions for digitized content, note that it is often inappropriate to change the file names of born-digital records. The original file names, as given by the record creator(s), should be respected when possible. Born-digital file names can be normalized (removing bad characters, spaces, etc.), but trying to make them fit any of the above schemes is not recommended.
## Using RefIDs for File Names

For material that is represented in ArchivesSpace by a corresponding archival object record, the **refid** of the record should be used as the basis of the file name and organization.

Start by creating a directory (folder) with the refid as the name. Any files that you produce should use the refid as the basis of the file name.

### Example: 2-sided cassette

```
root_folder/
├── ref9916/
│   ├── ref9916_001.wav
│   ├── ref9916_002.wav
│   └── derivatives/
│       ├── ref9916_001.mp3
│       ├── ref9916_001_caption_eng.vtt
│       ├── ref9916_002.mp3
│       └── ref9916_002_caption_eng.vtt
```
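
The refid-based layout can be planned in code before any files are written. The sketch below is illustrative only: `refid_layout` is a hypothetical helper, and it assumes the `derivatives/` folder sits inside the refid directory and that parts are numbered with three digits, as the cassette example suggests.

```python
from pathlib import PurePosixPath


def refid_layout(root, refid, parts, master_ext, derivative_suffixes):
    """List the expected paths for one archival object (hypothetical helper)."""
    item_dir = PurePosixPath(root) / refid
    paths = []
    for n in range(1, parts + 1):
        stem = f"{refid}_{n:03d}"
        # Master file lives directly in the refid directory.
        paths.append(item_dir / f"{stem}.{master_ext}")
        # Each derivative reuses the same stem inside derivatives/.
        for suffix in derivative_suffixes:
            paths.append(item_dir / "derivatives" / f"{stem}{suffix}")
    return [str(p) for p in paths]


for path in refid_layout("root_folder", "ref9916", 2, "wav",
                         [".mp3", "_caption_eng.vtt"]):
    print(path)
```

Because only path strings are built, nothing is created on disk; the list can be used to check a delivered directory against the expected structure.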

### Example: text-based document
```
root_folder/
├── e203d9a24f90f062871a72fe359c7900/
│   ├── e203d9a24f90f062871a72fe359c7900_001.tif
│   ├── e203d9a24f90f062871a72fe359c7900_002.tif
│   ├── e203d9a24f90f062871a72fe359c7900_003.tif
│   ├── e203d9a24f90f062871a72fe359c7900_004.tif
│   ├── e203d9a24f90f062871a72fe359c7900_005.tif
│   ├── ...
│   └── derivatives/
│       └── e203d9a24f90f062871a72fe359c7900.pdf
```

## Born Digital Material
While this section covers conventions for digital surrogates, note that it is often inappropriate to change the file names of born-digital records. The original file names, as given by the record creator(s), should be respected when possible. Born-digital file names can be normalized (removing bad characters, spaces, etc.), but trying to make them fit any of the above schemes is not recommended.
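
As one possible reading of "normalization", the sketch below replaces whitespace with underscores and drops characters outside a conservative set. Which characters count as "bad" is an assumption here, not something the guidance specifies, and `normalize_file_name` is a hypothetical name.

```python
import re


def normalize_file_name(name):
    """Light-touch cleanup that keeps the creator's name recognizable."""
    # Collapse runs of whitespace into a single underscore.
    name = re.sub(r"\s+", "_", name.strip())
    # Assumed safe set: letters, digits, period, underscore, hyphen.
    return re.sub(r"[^A-Za-z0-9._-]", "", name)


print(normalize_file_name("Annual Report (final)?.docx"))
```

Recording the original name alongside the normalized one preserves the creator's naming as evidence.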
2 changes: 1 addition & 1 deletion digitization/imaging/imaging.md
@@ -4,5 +4,5 @@ title: "Digitization: Imaging Text and Graphics"
parent: Digitization
has_children: true
has_toc: true
nav_order: 1
nav_order: 2
---
36 changes: 0 additions & 36 deletions managing/audittool.md

This file was deleted.

24 changes: 24 additions & 0 deletions managing/bagit_profile.md
@@ -0,0 +1,24 @@
---
layout: page
title: "Bag Profiles"
permalink: /bag/
parent: SCRC Digital Collection Storage
grand_parent: Managing Digital Collections - Access and Preservation
---
BagIt profiles used to package digital content for preservation storage should attempt to conform with single-level requirements for descriptions prescribed by DACS.

# Digitized Content

```
ArchivesSpace-URI:
Bag-Software-Agent:
BagIt-Profile-Identifier: scrc-digitization-profile.json
Bagging-Date:
Collection-ID:
End-Date:
Origin: digitization
Payload-Oxum:
Rights-ID:
Start-Date:
Title:
```
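
A filled-in version of these tags might be rendered as follows. This is a sketch with placeholder values: every value below except `BagIt-Profile-Identifier` and `Origin` is invented for illustration, and the helper names are hypothetical. The `Payload-Oxum` format (total octet count, a period, then the file count) follows the BagIt specification.

```python
def payload_oxum(file_sizes):
    """Payload-Oxum is '<total bytes>.<number of files>' per the BagIt spec."""
    return f"{sum(file_sizes)}.{len(file_sizes)}"


def render_bag_info(tags):
    """Render tag/value pairs as 'Label: value' lines for bag-info.txt."""
    return "".join(f"{label}: {value}\n" for label, value in tags.items())


# Placeholder values for illustration only.
tags = {
    "ArchivesSpace-URI": "/repositories/2/archival_objects/0000",
    "BagIt-Profile-Identifier": "scrc-digitization-profile.json",
    "Bagging-Date": "2025-02-11",
    "Origin": "digitization",
    "Payload-Oxum": payload_oxum([1048576, 2097152]),
    "Title": "Example title",
}
print(render_bag_info(tags), end="")
```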
5 changes: 5 additions & 0 deletions managing/creatingdaos.md
@@ -9,6 +9,11 @@ nav_order: 1

These instructions are a work in progress. Efforts are underway to integrate the creation of digital object records within broader workflows. These new workflows would generate digital object records upon ingest into the preservation environment or access systems.

# Using package_and_ship
[package_and_ship](https://github.com/gwu-libraries/package_and_ship)

DAO records are automatically created when using the package_and_ship tool to ingest content into SCRC's digital collection storage. A DAO record created by this tool will hold `File URI` values that point to the content in the storage area.

# Using Digital Object Creator (Google Colab Notebook)
[Aspace Digital Object Creator](https://drive.google.com/drive/folders/1br8rcrGZlsoAOBGiLDVIG12c8szJwXuQ?usp=drive_link)
