[originally hosted at AMIA-DLF Hack Day 2013]
- Date: November 6, 2013
- Time: ~9am-5pm (with the option of continued work on projects throughout the conference in our Developer Lounge at the Richmond Marriott, Apple Boardroom, available all day Thursday and Friday)
- Location: Salon B at the Crowne Plaza Richmond Downtown in Richmond, VA
- hashtag: #AVhack13
- IRC: #curatecamp_avpres_1. If using an IRC client, the server is chat.freenode.net, or you can use your browser and connect to webchat.freenode.net. If you are unfamiliar with IRC, take a look at this brief introduction.
- Light breakfast, snacks and coffee will be provided throughout the day!
Sign up! As this will be a highly participatory event, registration is limited to those willing to get their hands dirty, so no onlookers please.
If you are unsure whether you can or want to participate in the hack day itself, you can still see the results by attending the AMIA closing plenary, where hack day projects will be presented, and the audience will have an opportunity to vote on their favorites.
In advance of the hack day, project ideas will be collected through the registration form and the event wiki, and participants will review and discuss the submitted ideas. We’ll then break into groups of technologists and practitioners, each selecting an idea to work on together for the day and (if desired) throughout the rest of the AMIA conference in the developers lounge.
The day itself will be structured something like this. Breakfast, coffee/tea, and snacks will be provided. Lunch is on your own.
9am – Welcome, introductions, and breakfast
9:30 - noon - Hacking. Snacks and coffee to be served.
Noon-1pm – Lunch on your own.
1 - 4:30 - Hacking. Snacks and coffee will be served.
4:30 - 5 - Wrap up.
Projects will be presented during the conference closing plenary, Saturday November 9 at 9:30am. Projects will be judged by a panel as well as by conference attendees.
In association with the annual conference, the Association of Moving Image Archivists will host its first ever hack day on November 6, 2013 in Richmond, VA. The event will be a unique opportunity for practitioners and managers of digital audiovisual collections to join with developers and engineers for an intense day of collaboration to develop solutions for digital audiovisual preservation and access. It will be fun and practical…and there will be prizes!
This year's hack day is a partnership between AMIA and the Digital Library Federation. A robust and diverse community of practitioners who advance research, teaching and learning through the application of digital library research, technology and services, DLF brings years of experience creating and hosting events designed to foster collaboration and develop shared solutions for common challenges.
Content managers and preservation practitioners are as central to the success of the event as having keen developers. YOU will be responsible for setting the agenda and the outcomes. The goal is to foster collaboration between audiovisual preservation specialists and technologists, to solve problems together and share expertise.
A hack day or hackathon is an event that brings together computer technologists and practitioners for an intense period of problem solving through computer programming. Within digital preservation and curation communities, hack days provide an opportunity for archivists, collection managers, and others to work together with technologists to develop software solutions for digital collections management needs. Hack days have been held independently by groups such as the Open Planets Foundation, as well as in association with preservation and access oriented conferences including Open Repositories and Museums and the Web.
The manifesto of a recent event at the Open Repositories conference framed the benefits this way: “Transparent, fun, open collaboration in diversely constituted teams...The creation of new professional networks over the ossification of old ones. Effective engagement of non-developers (researchers, repository managers) in development...Work done at the conference over presentation of something prepared earlier.”
An audiovisual preservation-themed CURATEcamp was held in April 2013, drawing over 120 registrants from at least 3 continents for a day of great conversations and lightning talks. CURATEcamp is a series of unconference-style events focused on connecting practitioners and technologists interested in digital curation. The event generated a lot of documentation and articulated many shared concerns. Topics covered included digitization of video, film scanning, digital storage strategies, proprietary digital video files in collections, and technical metadata for preservation. The participants of the event agreed that more work needed to be done and action taken, so the idea for an AMIA hack day was born.
Discussions between managers of audiovisual collections and solutions developers provided a fruitful starting point for hack day project ideas, including:
- Simple fixity tools to use when transferring files from one storage medium to another
- Technical metadata extraction and making use of these reports (MediaInfo, ffprobe)
- Simple cataloging tools for AV, with eye towards contemporary frameworks/schema
- Discovery tools/UX for audiovisual collections, access at scale
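As a sense of scale for the first idea on this list, a simple fixity tool can be sketched in a few lines of Python: checksum the file before the transfer, copy it, and checksum it again at the destination. This is only an illustrative sketch, not an endorsed workflow; function and parameter names are invented here.

```python
import hashlib
import shutil

def md5sum(path, chunk_size=1024 * 1024):
    """Compute an MD5 checksum by streaming the file in chunks,
    so large video files never need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_with_fixity(source, destination):
    """Copy a file and verify that source and destination checksums match."""
    before = md5sum(source)
    shutil.copy2(source, destination)
    after = md5sum(destination)
    if before != after:
        raise IOError("Fixity check failed for %s" % source)
    return after
```

A real tool would add logging, batch operation, and a manifest format, but the core of "simple fixity" is just this.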
Manifesto:
- Transparent, fun, open collaboration in diversely constituted teams over individual brilliance and/or groups of like individuals in cut-throat competition.
- The creation of new professional networks over the ossification of old ones.
- Effective engagement of non-developers (researchers, repository managers) in development over purely developer driven projects.
- Work done at the conference over presentation of something prepared earlier.
Please register for the hack day (we're currently at capacity, but forming a wait list) and we will start adding your ideas here for voting in advance of the Hack Day!
Possible topics projects could touch on: fixity checking; transcoding; metadata validation; automating file movement; altering fdupes so that it shows the user the MD5 checksum hash; altering Archivematica 1.0 code to bypass zipping the AIP.
Loose metadata project ideas: segmentation and time-based annotation of video segments on the web (maybe leveraging Media Fragments?); XSLT mapping; turning CSV fields into PREMIS XML; using geolocation information to facilitate new access pathways to video; RDFing PBCore, potentially to leverage in Fedora 4
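To make the CSV-to-PREMIS idea concrete, here is a minimal sketch. The column names (`identifier`, `md5`) are invented for illustration, and the output is deliberately simplified (real PREMIS nests fixity inside objectCharacteristics, so this would not validate against the schema as written):

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed namespace for PREMIS 2.x; adjust for the version you target.
PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)

def csv_to_premis(csv_text):
    """Turn CSV rows (hypothetical columns: identifier, md5) into a
    minimal, simplified PREMIS-style XML string."""
    root = ET.Element("{%s}premis" % PREMIS_NS)
    for row in csv.DictReader(io.StringIO(csv_text)):
        obj = ET.SubElement(root, "{%s}object" % PREMIS_NS)
        oid = ET.SubElement(obj, "{%s}objectIdentifier" % PREMIS_NS)
        ET.SubElement(oid, "{%s}objectIdentifierType" % PREMIS_NS).text = "local"
        ET.SubElement(oid, "{%s}objectIdentifierValue" % PREMIS_NS).text = row["identifier"]
        fixity = ET.SubElement(obj, "{%s}fixity" % PREMIS_NS)
        ET.SubElement(fixity, "{%s}messageDigestAlgorithm" % PREMIS_NS).text = "MD5"
        ET.SubElement(fixity, "{%s}messageDigest" % PREMIS_NS).text = row["md5"]
    return ET.tostring(root, encoding="unicode")
```

A hack day version would map real spreadsheet columns onto the full PREMIS semantic units from the Data Dictionary.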
Loose non-code project ideas: editing/adding Wikipedia pages, creating a manual for a tool or a workflow, creating a webpage
Please submit your project ideas using the format below. Remember, the more specific the better. Have a look at the project descriptions from Open Repositories 2013 for inspiration.
Sign up for projects you are interested in here
Signing up in advance does not mean you are committed to work on that project. And it does not mean these are the only projects. There will still be an opportunity to add additional projects on the day of the event and sign up for those as well.
The projects below were discussed during a Google Hangout on November 1, 2013. For more information, please see the notes from that conversation.
1. The 608ers: Timebased transcript/caption display
Two proposals have merged into one:
Extraction of EIA-608/line 21 closed caption information: Ability to extract and reuse closed caption information from NTSC video.
Interactive Video/Transcript Streaming: This project would use the open source Interactive Video/Transcript viewer package as a baseline for streaming video and transcripts. This package has weak support and is becoming increasingly difficult to maintain. The hope is to come up with an approach to build or improve upon the existing system to reliably stream video files with their time coded transcripts across multiple browser and OS types.
Original input is mp4 interviews spoken in Inuktitut with their English transcripts in doc format. The video and transcript need to be streamed/played simultaneously.
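If the doc-format transcripts can first be parsed into timed segments, converting them to WebVTT (the HTML5 caption format) is straightforward. This sketch assumes the transcripts have already been reduced to (start_seconds, end_seconds, text) tuples; that preprocessing step is the real work and is not shown here:

```python
def transcript_to_webvtt(cues):
    """Convert (start_seconds, end_seconds, text) tuples into a WebVTT
    string suitable for an HTML5 <track> element."""
    def ts(seconds):
        # WebVTT timestamps look like 00:00:02.500
        h = int(seconds // 3600)
        m = int(seconds % 3600 // 60)
        s = seconds % 60
        return "%02d:%02d:%06.3f" % (h, m, s)

    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append("%s --> %s" % (ts(start), ts(end)))
        lines.append(text)
        lines.append("")
    return "\n".join(lines)
```

Pairing a WebVTT track with an HTML5 video element could reduce the project's dependence on the aging IVT package for browsers that support it.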
Notes from the Nov 1 planning call
Maybe: http://ccextractor.sourceforge.net/ Also: http://dev.w3.org/html5/webvtt/
The original IVT package is here: IVT.zip
Uncompressed video files that contain line 21 closed caption information
Sample Data: CanadaVideoTranscripts.zip
The existing IVT player is running here: Live Site
Steven Villereal
Chris McNeave
Interested team members/participant roles Who wants to work on this project?
This project would generate and include MediaInfo key/value pairs in DFXML for forensic disk images that contain audio or video files. This could be accomplished through the fiwalk utility's DGI plugin interface.
Use case: An archive has acquired hard drives containing mixed file formats, including media formats. In order to prevent any further modifications of the drives, they have been forensically imaged. To plan the work necessary to preserve, process, and provide access to the media files in the future, the repository would like the ability to generate a report on the types and extents of media files within disk images.
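A plugin for this could look roughly like the sketch below. It assumes (based on our reading of the fiwalk plugin docs; verify against your fiwalk version) that the DGI interface passes the carved file's path as the first argument and reads "name: value" lines from stdout. The parsing helper handles MediaInfo's default "Key : Value" text layout:

```python
import subprocess
import sys

def parse_mediainfo(text):
    """Parse MediaInfo's default 'Key : Value' text output into a dict,
    skipping section headers like 'General' that have no colon."""
    pairs = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if key and value:
                pairs[key] = value
    return pairs

if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed DGI contract: file path in argv[1], key/value pairs on stdout,
    # which fiwalk then folds into the DFXML <fileobject>.
    out = subprocess.check_output(["mediainfo", sys.argv[1]])
    for key, value in sorted(parse_mediainfo(out.decode("utf-8")).items()):
        print("%s: %s" % (key, value))
```

The project's actual mediainfo.py in the GitHub repository is the authoritative version; this is only a shape-of-the-solution sketch.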
Reconciling filenames with embedded technical metadata/named parameters: I'd like to explore whether it would be possible to compare embedded technical metadata (file/MIME type/external signature) to existing media filenames, to ensure that all files in a given directory are what they are supposed to be according to the extension. There could be messages or a report if any files do not match the named parameters.
Potential User Story: As a CONTENT MANAGER, I need to verify that files with an "mov" extension in a named directory (*.mov) are Quicktime files so that I can ensure filenames accurately represent embedded technical metadata.
Pre-conditions: Specifications of files already determined (i.e., all access files are QuickTime-wrapped .mov); associated utilities available to read metadata
Post-conditions: Filenames include accurate extensions; the content manager is delivered a report of any/all inaccurately named files in the directory.
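A first pass at this user story can be done with nothing but magic-byte sniffing. The atom list below is an assumption about what commonly opens a QuickTime/MP4 container; a production tool should consult a signature registry such as PRONOM instead of hard-coding signatures:

```python
import os

# Atom types that commonly appear at byte offset 4 in a QuickTime/MP4
# container (an assumption for this sketch, not an exhaustive list).
QUICKTIME_ATOMS = {b"ftyp", b"moov", b"mdat", b"wide", b"free", b"skip"}

def looks_like_quicktime(path):
    """Check the first container atom instead of trusting the extension."""
    with open(path, "rb") as f:
        header = f.read(8)
    return len(header) == 8 and header[4:8] in QUICKTIME_ATOMS

def report_mismatches(directory, extension=".mov"):
    """Return files whose extension claims QuickTime but whose bytes disagree."""
    mismatches = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.lower().endswith(extension) and os.path.isfile(path):
            if not looks_like_quicktime(path):
                mismatches.append(name)
    return mismatches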
MediaWalker (Texas Ranger)
Here is our GitHub repository: https://github.com/dmmd/AMIA_HACK/
Includes the Python script and sample files.
dmmd kgrons walterforsberg yvonneng groakus mistydemeo
Notes from the Nov 1 planning call
MediaWalker Documentation (public): https://docs.google.com/spreadsheet/ccc?key=0ArMWuWMTUNRgdGNMbDYyRzZPOTBpTDJsU2R6cFZWRnc&usp=sharing
a pdf of what dfxml looks like + mocked up mediainfo: https://docs.google.com/file/d/0B1hVT_M0h1f_VnVqZnV4R0J1amc/edit
FFprobe output description: http://stackoverflow.com/questions/3199489/meaning-of-ffmpeg-output-tbc-tbn-tbr
Registries for extension associations (ex. PRONOM: http://www.nationalarchives.gov.uk/PRONOM/Default.aspx)
MediaInfo: http://mediaarea.net/en/MediaInfo
Exiftool: http://www.sno.phy.queensu.ca/~phil/exiftool/
Georgetown University Lib File Analyzer?: https://github.com/Georgetown-University-Libraries/File-Analyzer
http://www.sleuthkit.org/sleuthkit/
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.149.5362&rep=rep1&type=pdf
https://raw.github.com/dfxml-working-group/dfxml_schema/v1.1.0/dfxml.xsd
FFMPEG: http://www.ffmpeg.org/download.html
Donald Mennerich
Kathryn Gronsbell
Forensic disk images containing audio and video files
Misty De Meo Jason Evans Groth Walter Forsberg Kathryn Gronsbell Donald Mennerich Yvonne Ng
https://github.com/dmmd/AMIA_HACK/blob/master/fiwalk/amia.xml
https://github.com/dmmd/AMIA_HACK/blob/master/fiwalk/mediainfo.py
https://github.com/dmmd/AMIA_HACK/blob/master/fiwalk/ficonfig.txt
Ensure that the following are installed on your computer:
- Xcode: https://developer.apple.com/xcode/
- Homebrew: http://brew.sh/
- mediainfo: on the command line, enter $ brew install mediainfo
- sleuthkit: on the command line, enter $ brew install sleuthkit
- python: on the command line, enter $ brew install python
I don't have Python packages installed on my computer!!! Don't freak: pip is a Python package installer, and you need it to install lxml, a popular XML parser for Python. Here's the command line:
$ sudo easy_install pip
Then, type the command below (it takes a while):
$ sudo pip install lxml
While you're waiting, you need to customize the config file!
Open "ficonfig.txt" in a text editor, then update the location of the script on your local drive. It initially looks like:
* dgi python fiwalk/mediainfo.py
Insert your filepath before the fiwalk/mediainfo.py portion, so it becomes (for example):
* dgi python /Users/yvonne/Desktop/amia_hack/fiwalk/mediainfo.py
Whew! Save your doc, and close.
Pre-config of .txt
Post-config sample of .txt
Then you want to run MediaWalker in fiwalk. This command will create your DFXML file with audio/video metadata. Make sure you redirect the standard output with ">" to a filename ending in ".xml":
$ fiwalk -xc /FILEPATH/TO/ficonfig.txt /FILEPATH/TO/yourDiskImage > /PREFERRED/DESTINATION/FOR/THE/DFXML.xml
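Once the DFXML file exists, you can pull the per-file metadata back out for reporting. The sketch below deliberately ignores XML namespaces (DFXML's namespace URI has varied across versions) and simply collects each fileobject's leaf elements, including any MediaInfo keys the plugin injected. Element names beyond "fileobject" are whatever your fiwalk output contains:

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip any XML namespace prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]

def list_fileobjects(dfxml_text):
    """Collect each <fileobject>'s direct child elements as a plain dict,
    ignoring namespaces so the sketch works across DFXML versions."""
    results = []
    root = ET.fromstring(dfxml_text)
    for elem in root.iter():
        if local(elem.tag) == "fileobject":
            record = {}
            for child in elem:
                if child.text and child.text.strip():
                    record[local(child.tag)] = child.text.strip()
            results.append(record)
    return results
```

Filtering those dicts by format keys would give the "types and extents of media files" report described in the use case.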
Metadata is becoming more and more important for various aspects of video archiving (conservation, management, access, etc.), but there is little help for non-AV-specialist practitioners. An easy-to-use tool with a simple graphical interface could be one valuable element. The project could be to develop a tool for editing existing (or self-developed) metadata schemas/standards, with export functionality producing schemas in useful formats (XML, stylesheets, etc.) usable in widespread programs used for collection management/description (FileMaker, Excel, Access, etc.). An additional part of such a tool could be a mapping and data transformation element, allowing users to map one existing schema (in file formats like XML or CSV) to a target schema (like EBUCore) and transform existing data. An online version of such a tool could collect and disseminate edited schemas, crosswalks, mapping schemas, etc., and serve as an exchange platform.
-
MINT is an open source online tool used in several AV-heritage projects. The source code is not available yet, but there is info here: http://mint.image.ece.ntua.gr/redmine/projects/mint/wiki/Introduction
-
Any interest in creating mappings to allow DPLA (http://dp.la/info/about/faq/) to expose richer metadata about sound/moving image content? DPLA crosswalks are here: https://docs.google.com/spreadsheet/ccc?key=0ApDps8nOS9g5dHp5ZE10YS10amljdXRVS3dqYXRvTXc#gid=0 (more info as needed).
Yves Niederhäuser
Esha DattaMeghan Fitzgerald Yves Niederhäuser Lai-Tee Phang Nick Richardson Neale Stokes Pamela Vizner
M.O.D.E.M. (metadata organising, developing, editing and mapping)
Your Path to a Shiny New Schema!
Problem
Many schemas and frameworks exist for AV metadata, yet they rarely fully meet the needs of any given organization or collection; the use of standards is not widespread in AV conservation, leading to problems for sharing and aggregating on access platforms. Additionally, data is often supplied in forms that are not suitable for direct upload into the collection management system/database, requiring time and IT skills to correct manually prior to ingest.
Project Scope
This application will be developed iteratively. Right now, it will create a custom metadata schema derived from an uploaded data set and existing metadata schemas.
In its first phase, this application will not perform data mapping and transformation; these functions might be available in a later phase. (See Further Work.) It will be a web-based application using Javascript. In its first iteration, it will only be able to import files in XML or CSV formats, and will only export schemas in XML and CSV formats.
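The first-phase core (deriving candidate schema elements from an uploaded data set) can be prototyped in a few lines. This sketch uses invented function and field names; it reads a CSV's header row and reports how populated each column is, which is a useful hint when deciding which elements belong in the custom schema:

```python
import csv
import io

def extract_elements(csv_text):
    """Read the header row of an uploaded CSV and propose one candidate
    schema element per column, with a 'populated' count as a quality hint."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    rows = 0
    for row in reader:
        rows += 1
        for name, value in row.items():
            if value and value.strip():
                counts[name] += 1
    return [{"element": name, "populated": counts[name], "rows": rows}
            for name in counts]
```

The web application would run the same logic client-side in JavaScript, then let the user rearrange, split, merge, and map the extracted elements.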
Intended User Base
Librarians, archivists, and other professionals with limited knowledge of AV metadata standards, who may nevertheless be tasked with the creation, management, or transformation of moving image metadata, and who have limited programming proficiency or advanced IT knowledge. The application is source-agnostic and can be used to create custom schemas from any metadata standard.
Benefits
• Provides a generic interface to develop a custom metadata schema based on existing data sets (extracting elements) and (standard) schemas.
• Provides a generic interface to map disparate data sources to various metadata schemas.
• Is scalable to different projects and different types of metadata.
• Creates efficiencies as staff time previously used to manually manage metadata can be used for other tasks.
• Allows for ease of metadata standardization across databases and systems.
High Level Functional Requirements
• Ability to import metadata schemas for AV materials
• Ability to extract fields from an uploaded data set as data elements
• Ability to rearrange, split, merge, rename, compare and map data elements from imported schemas and data set, to create a custom metadata model
• Ability to export custom metadata model as XML and CSV
• Ability to keep track of what is being mapped and the sources of the constituent data elements
User Stories
• As a metadata manager, I work with multiple streams of non-standardized data that need to be collated and made to conform to my organization’s data model. I would like a system where I can integrate the multiple streams of metadata, map them to my model, and create a custom schema that correctly describes the asset to an accepted standard according to that model.
• As a metadata manager, I work with legacy systems that do not all use the same metadata schema. I would like a system that will allow me to easily map the data in those legacy systems to a single standard.
• As a metadata analyst, I have user generated metadata coming in from various projects. I work with books that are digitized, e-books, postcard digitization and digital video projects. I want a system where I can map user generated metadata to various fields from other metadata standards.
• As an archivist, I have to develop a metadata schema suitable for AV media, which can be integrated with existing tools (finding aids etc.).
• As an archivist, I receive information from content generators that does not conform to a format that I can use. It seems unlikely that they can be initially "trained" to provide data in a more standardised format. I would like a tool that will allow me to make my work more efficient while more easily providing feedback to data suppliers on how this data might be improved for my use.
Further Work
Suggestions for future development
• Creation of data entry form based on custom schema and rules determined in data mapping step so that content providers can conform to proper data formatting
• Ability to save custom models previously created as templates
• Ability to collect and disseminate edited schemas, crosswalks, mapping schemas, etc. and serve as a sharing platform
• Create RDF functionality from generated metadata schemas.
• Explore data mapping and transformation functionality: the ability to import sets of data and perform mapping and transformation to conform to the custom metadata model/schema; user-friendly interface to create rules for simple data transformation, e.g. standardise date format, provide default value for fields with no data, etc.
4. The Amazing METS: Creating a Sample METS (Addressing METS Specification) for Digitization Project of Analog Audiovisual Collections
Several sets of specifications are already available for creating METS documents, but I have not heard of any complete METS example that is boilerplated to work for a real digitization project. The University of Michigan, after several attempts to find an existing schema to piggyback on, is currently creating an example METS for an outsourced digitization project that can be used end to end. The application programmer in the Digital Library Production Service department has created it out of the existing audio METS XML, VideoMD, and other spreadsheets that U of M has been using as interim means, and several of the people involved are now discussing and examining that sample section by section. I would like a group to sit down, investigate this current sample, and give comments/feedback about its possible limitations/errors/issues so we can make a better version of it. If the whole sample is too big to work on in a day, I propose reviewing the process history/provenance section only, since that could be the most challenging section to tackle due to the complicated video digitization process itself. If we can come up with anything that seems to work as a working sample, it can be shared/distributed and used in this standard-less age.
And here again, an online version of such a tool could collect and disseminate edited schemas, crosswalks, mapping schemas etc. and serve as exchange platform.
Notes from the Nov 1 planning call Day of notes
Here is the very drafty draft that the UM programmer created. There are many notes and it does not quite look complete, but I believe it can be a starting point. More than anything, we need outside reviewers who can look at this with fresh eyes and a range of different experiences.
Both the video process history schema and example METS are located in the directory: http://www-personal.umich.edu/~grosscol/vprocesshistory/
- reVTMD possibly useful to incorporate?
- interest in PREMIS event entity w. this info?
Knowledge of video/audio metadata, familiarity with audiovisual digitization project?
Existing metadata set that are created from the digitization project at each institution
JungYun Oh
The Amazing METS! JungYun Oh Hannah Frost Kara Van Malssen Emily Nabasny
Use Case: Our use case for project is a single content item, on analog video tape, which is reformatted to a digital file set. The source object, digitization process, and resulting file set should be described in one METS file.
Solution: Our objective for the purposes of the hack day is to articulate a content model within METS for reformatted video content. Our goals include:
- Identify existing schemas that can be used to express the various components of the METS file
- Determine how those schemas should be expressed within METS containers
- Identify minimal fields that should be captured using those schemas within METS
- Articulate controlled vocabularies when applicable
- Create a sample METS file for a U-matic source object, which is reformatted and results in a preservation master and mezzanine file, document the sample describing element usage, and enumerating recommended controlled vocabularies where appropriate
- Refine the model with input from the community with the goal of eventually creating a METS profile which can be added to the profile registry maintained by the standards office at the Library of Congress.
The project is maintained in a GitHub repository. Documentation and notes are available here.
5. Fast Forward: Produce easy-to-follow documentation for the installation and use of FFMPEG transcoding software
Specific usage topics might include batch transcoding, metadata extraction, common output profiles, and FFMPEG version upgrades. Evaluation of available GUIs might also be included as a secondary goal.
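For the batch-transcoding topic, documentation often benefits from a scriptable example alongside the raw commands. The sketch below wraps FFmpeg in Python; the H.264/AAC access-copy arguments are one common profile, not an official recommendation, and older FFmpeg builds may additionally need "-strict experimental" for the aac encoder:

```python
import os
import subprocess

def transcode_command(source, dest_dir, ext=".mp4",
                      args=("-c:v", "libx264", "-pix_fmt", "yuv420p",
                            "-c:a", "aac")):
    """Build an ffmpeg command line for one file. The codec arguments are
    a common H.264 access-copy profile (an assumption, adjust as needed)."""
    base = os.path.splitext(os.path.basename(source))[0]
    target = os.path.join(dest_dir, base + ext)
    return ["ffmpeg", "-i", source] + list(args) + [target]

def batch_transcode(sources, dest_dir):
    """Run ffmpeg over a batch of files, collecting failures for a report."""
    failed = []
    for source in sources:
        if subprocess.call(transcode_command(source, dest_dir)) != 0:
            failed.append(source)
    return failed
```

Keeping the command construction separate from execution makes the profile easy to document and test without actually invoking FFmpeg.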
http://sourceforge.net/projects/ffmpeg-gui/
check also: http://www.reto.ch/training/2013/20130503/ (it's in German, but commands are commands...)
Kathryn Gronsbell: Helpful hints for basic FFMPEG from Kelly Haydon https://docs.google.com/document/d/1zbThoqnEl50Yw_fG9prHSptlIjo6tdteieVq4XP4K_E/edit?usp=sharing
Windows/MAC/Linux Operating Systems, Document Writing, Digital Media Transcoding
Sample media files for transcode tests
Nash Bly
Software Testers, Media Transcoders, Document Writers - Who wants to work on this project?
ffmpeg hackday notes - https://docs.google.com/document/d/1RFlXJGXChbIwNXs3Ka01sHj-RXNEAt1h9yWPpFvZUJ4/edit?usp=sharing
Merged with Timecoded transcripts and FFMPEG documentation: Moving Image Research Collections Digital Video Repository
Several potential ideas for improving this DVR that can hopefully be integrated into other sites…
- Timecode-based tagging in videos or other ways to allow for user-generated metadata
- A way to connect related video material
- scripts for transcoding video (modifying an existing script)
- Issues in XACML restrictions / easy way to make records public/non-public
Possible starting points:
DVR: http://mirc.sc.edu Git: https://github.com/DGI-USC
Drupal knowledge, Fedora/Islandora, ffmpeg, Python
Video files, records, scripts, the DVR itself? (Providable.)
Ashley Blewer
Where do I start once I have decided digitization is the right thing to do for my video collection? How do I decide whether to build up infrastructure/know-how in-house or to outsource digitization? How do I need to prepare analog tapes for best results and minimal risk? What information do I need, and which requirements do I have to ask for in a call for tenders? What do I have to do, and how do I control the quality of digitization? How do I store the new archive masters and access copies? Which codecs/formats are best in my case?
A little stand-alone or online tool for video collections and non-specialist practitioners, maybe something like an interactive flow-chart or decision path, that helps to ask the right questions and produces an automatic report after running, could be a big help for lots of non-video-specialist collection managers, and could serve as a starting point for consultations, evaluation of tenders, convincing decision makers, etc. A possible online version of a tool like this could integrate a "similar projects" functionality, pointing collection managers to other projects/people with experience in similar cases, thereby building up and strengthening a network for exchange. I think there is still big potential in bringing people in this field together!
There are tons of online survey tools that could maybe be used as a technical starting point; the right set of questions could be collected, prioritized, and structured during the hack day.
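The "decision path" idea is essentially a small tree of questions. As a purely hypothetical sketch (the questions, answers, and recommendations here are invented placeholders, not actual advice), the data structure and the walk through it could look like this:

```python
# A hypothetical decision tree: each node is a question whose answers lead
# either to another node or to a recommendation string (a leaf).
DECISION_TREE = {
    "start": ("Do you have in-house video engineering expertise?",
              {"yes": "formats", "no": "outsource"}),
    "outsource": ("Is your collection larger than ~500 tapes?",
                  {"yes": "Recommendation: issue a call for tenders to a vendor.",
                   "no": "Recommendation: approach a regional digitization service."}),
    "formats": ("Do you need lossless preservation masters?",
                {"yes": "Recommendation: consider an uncompressed or lossless codec.",
                 "no": "Recommendation: a high-bitrate lossy master may suffice."}),
}

def walk(tree, answers, node="start"):
    """Follow prerecorded answers through the tree; return the question/answer
    path taken and the final recommendation, for the automatic report."""
    path = []
    while True:
        question, branches = tree[node]
        answer = answers[node]
        path.append((question, answer))
        nxt = branches[answer]
        if nxt not in tree:  # a leaf: recommendation text, not a node key
            return path, nxt
        node = nxt
```

The recorded path doubles as the "automatic report" the idea calls for; an online version would render each node as a survey page.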
The technical/developer skills needed are unknown; some video digitization and collection management expertise is needed for this project.
None.
Yves Niederhäuser
==4. Format/codec evaluation/selection tool==
"What format should I use when digitizing my videos?" This is by far the most frequently heard question for video archiving consultants, I would guess. But the possible answers are complicated and very context-dependent, and thus often frustrating for the asking non-specialist practitioners as well as for the consultants. For a possible hack day project, see the description of the digitization workflow development tool idea above; a format/codec evaluation/selection tool could be part of, or a first element of, that bigger tool.
Notes from the Nov 1 planning call
See idea for a digitization workflow development tool above.
See idea for a digitization workflow development tool above.
See idea for a digitization workflow development tool above.
Yves Niederhäuser
Who wants to work on this project?
==8. CURATEcamp-style discussion==
For those more interested in meeting up with other folks for discussion and brainstorming on specific topics, we are setting aside an area for CURATEcamp-style "unconference" breakout groups. Those interested should come prepared with potential topics for discussion; topics will be gathered on the morning of the event and voted on by registrants in the CURATEcamp stream. For more information, please visit the CURATEcamp website, and see the documentation from CURATEcamp AVpres 2013, held in April 2013.
Please note that while discussion groups are welcome, they will not be eligible for awards.
RDFing PBCore: let's see if we can come up with an RDF expression for PBCore. It could be useful for things like the upcoming Fedora 4.
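As a very rough warm-up for this idea, a PBCore XML record can be flattened into RDF triples by using each element's local name as a predicate. The namespace URI below is invented (no official PBCore RDF vocabulary exists at the time of writing), so the output is only a strawman for discussion:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for PBCore-as-RDF properties; this prefix is
# invented for the sketch and would need to be agreed on by the community.
PBCORE = "http://pbcore.org/vocab/"

def pbcore_to_turtle(xml_text, subject_uri):
    """Flatten a PBCore record's leaf elements into Turtle triples,
    using each element's local name as the predicate."""
    root = ET.fromstring(xml_text)
    triples = []
    for elem in root.iter():
        if len(elem) == 0 and elem.text and elem.text.strip():
            name = elem.tag.rsplit("}", 1)[-1]
            value = elem.text.strip().replace('"', '\\"')
            triples.append('    <%s%s> "%s"' % (PBCORE, name, value))
    return "<%s>\n%s ." % (subject_uri, " ;\n".join(triples))
```

A real modeling effort would have to decide which PBCore elements become resources versus literals; flattening everything to literals, as here, loses structure but makes the mapping questions concrete.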
Notes from the Nov 1 planning call
http://pbcore.org/index.php
http://www.w3.org/TR/REC-rdf-syntax/
http://dublincore.org/documents/dc-rdf/
Bawstun app from WGBH can output PBCore XML from EBUCore RDF; could it be reverse engineered? https://github.com/curationexperts/bawstun/tree/master/app/models
Any of: knowledge of pbcore, XML/RDF, OWL, metadata schema in general
Sample PBCore (to be provided)
Kara Van Malssen (idea by Karen Cariani)
Who wants to work on this project?