Fundamentals of AV Preservation - Chapter 4

Section 5: Metadata

5.1 Introduction to Metadata

We recommend reading this section together with “Metadata,” in Chapter 3, Section 4.1, for a complete understanding of the types of metadata pertinent to audiovisual collections, how metadata is generated, and standards for capturing it.

Metadata is the information about a digital file that allows us to understand, use, manage, and preserve it. Without it, we would not know basic facts about the file (e.g., its title, who created it, and on what date), what the file is (e.g., the wrapper or codec in use, data rate, pixel dimensions, duration), how it relates to other files (e.g., part one of three), or how it has been monitored over its lifetime (e.g., fixity checks). Without the appropriate metadata, a file becomes inaccessible and unusable, ultimately losing its value.

Metadata is produced at various times during a file’s lifespan. Descriptive and technical metadata are often captured at the point of creation, such as during digitization. Preservation metadata, on the other hand, is generated on an ongoing basis to log actions performed on the file, such as transferring it from one storage environment to another, performing fixity checks, or migrating it from one format to another. Logging preservation metadata over time creates an audit trail that helps ensure a file remains authentic and accessible over the long term.

5.2 Types of Metadata

Each type of metadata plays a different role that, along with a digital file, makes up the information package that ensures long-term access. It is useful, then, to describe the different types of metadata and how they affect a digital file.

Descriptive Metadata

Descriptive metadata is the information about a file or files that enables identification and discovery. Descriptive metadata includes the title, creator, date of creation, and keywords that document the subject of the file’s content.

Structural Metadata

Structural metadata is the information that describes how a set of files relate to one another, such as the songs on a CD, or how the parts of a single file are organized. For example, it could tell you that a file is a container holding one video track, two audio tracks, and two subtitle tracks.
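As a rough illustration of the container example above, the Python sketch below models each track as a small record and counts the tracks by kind. The `Track` class, its fields, and the language codes are hypothetical, chosen only for illustration; in practice, structural metadata would be recorded in a schema such as METS or PBCore rather than in ad hoc code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """One stream inside a container file (hypothetical minimal model)."""
    kind: str      # e.g. "video", "audio", "subtitle"
    language: str  # language code, e.g. "eng"; "und" means undetermined


def summarize_structure(tracks: list[Track]) -> dict[str, int]:
    """Count the tracks of each kind: a tiny piece of structural metadata."""
    return dict(Counter(track.kind for track in tracks))


# The example from the text: one video, two audio, and two subtitle tracks.
container = [
    Track("video", "und"),
    Track("audio", "eng"),
    Track("audio", "spa"),
    Track("subtitle", "eng"),
    Track("subtitle", "spa"),
]
print(summarize_structure(container))  # {'video': 1, 'audio': 2, 'subtitle': 2}
```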

Administrative Metadata

Administrative metadata includes information about how to manage a digital file and track its process history. This ranges from rights metadata, which indicates who owns or holds copyright for a file and how it can be used and accessed, to technical and preservation metadata, which are described in detail below. 

While all forms of metadata help provide long-term access to digital collections, technical metadata has particular significance for audiovisual content, and preservation metadata is key to ensuring that digital content can be managed over time.

Technical Metadata

Technical metadata captures the essence of a digital file. It is the technical information that describes how a file functions and enables a computer to interpret it at the bit level, so that it can be played back in a way that is useful to a viewer or listener. Technical metadata includes information such as wrapper, codec, compression, and aspect ratio. Technical information is often embedded in the file itself and read directly by compatible software and hardware. For preservation purposes, technical metadata is also extracted and stored outside the file so that, as formats become obsolete and compatibility fades, the file’s basic structure can still be understood and the file can be migrated to a new format that keeps it accessible. Many audiovisual metadata schemes rely on technical metadata extracted from embedded metadata as one way of keeping digital content usable over time.
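A minimal sketch of extracting technical metadata and storing it outside the file, assuming an uncompressed PCM WAVE file and Python's standard `wave` module (which only opens PCM WAVE files). The JSON field names and the `.wav.json` sidecar naming convention are hypothetical choices, not part of any standard.

```python
import json
import tempfile
import wave
from pathlib import Path


def extract_wave_technical_metadata(path: Path) -> dict:
    """Read basic technical metadata from a PCM WAVE file's embedded header."""
    with wave.open(str(path), "rb") as wav:   # raises wave.Error for non-PCM files
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()     # bytes per sample
        sample_rate = wav.getframerate()      # frames per second
        frames = wav.getnframes()
    return {
        "container": "WAVE",
        "codec": "PCM",
        "channels": channels,
        "bit_depth": sample_width * 8,
        "sample_rate_hz": sample_rate,
        "duration_s": frames / sample_rate,
    }


def write_sidecar(path: Path) -> Path:
    """Store the extracted metadata outside the file, in a JSON sidecar next to it."""
    sidecar = path.with_name(path.name + ".json")
    sidecar.write_text(json.dumps(extract_wave_technical_metadata(path), indent=2))
    return sidecar


# Usage: write one second of silence (48 kHz, 24-bit, stereo), then extract.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "silence.wav"
    with wave.open(str(sample), "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(3)
        out.setframerate(48000)
        out.writeframes(b"\x00" * (48000 * 2 * 3))
    metadata = json.loads(write_sidecar(sample).read_text())
print(metadata)
```

Keeping the sidecar as plain JSON is only one option; the same values could be mapped into a schema such as PBCore or reVTMD.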

Preservation Metadata

Preservation metadata is the information necessary to support the management, long-term accessibility, and usability of an object. It tracks the processes needed to manage a file in a digital environment over time, including monitoring fixity and making any repairs identified during fixity checks, auditing logs to identify who has interacted with an object and when, monitoring obsolescence information, and documenting provenance to support the authenticity of an object. Examples of preservation metadata include checksums, storage locations, and records of process activities and dates (for example, that a file was moved from one location to another and the date the move occurred).
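A fixity check compares a checksum computed now with one recorded earlier. The sketch below assumes SHA-256 as the checksum algorithm and represents each check as a simple dictionary; the record's field names and the file name are hypothetical, and a real system would usually store such records using a standard such as PREMIS.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum, reading in chunks so large media files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def fixity_check(path: Path, expected: str) -> dict:
    """Compare the current checksum with the stored one and return a dated event record."""
    actual = sha256_of(path)
    return {
        "event": "fixity check",
        "file": str(path),
        "date": datetime.now(timezone.utc).isoformat(),
        "algorithm": "SHA-256",
        "expected": expected,
        "actual": actual,
        "outcome": "pass" if actual == expected else "fail",
    }


with tempfile.TemporaryDirectory() as tmp:
    asset = Path(tmp) / "reel_01.mov"        # hypothetical file name
    asset.write_bytes(b"example content")
    stored = sha256_of(asset)                # checksum recorded at ingest
    first = fixity_check(asset, stored)      # later check: file unchanged
    asset.write_bytes(b"example c0ntent")    # simulate silent corruption
    second = fixity_check(asset, stored)
print(first["outcome"], second["outcome"])   # pass fail
```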

5.3 Embedded Metadata

Embedded metadata is information stored within the same file as the content it describes. For example, a WAVE file contains both the audio and the technical information needed to play it. Embedded metadata can also include descriptive information, as in MP3 files, which lets playback applications display the artist, album, and title.[12] Put another way, embedded metadata is the digital equivalent of physical labels, annotations, and written documentation stored inside a material housing, or of the slates at the head of a video recording.
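To see where embedded technical metadata lives in a WAVE file, the sketch below walks the top-level RIFF chunks. It assumes a standard little-endian RIFF/WAVE file. In the test file written by Python's `wave` module, the technical information sits in the `fmt ` chunk and the audio in the `data` chunk; files produced by other tools may contain additional chunks, such as `LIST` or `bext`.

```python
import struct
import tempfile
import wave
from pathlib import Path


def list_riff_chunks(path: Path) -> list[tuple[str, int]]:
    """Return (chunk ID, size in bytes) for each top-level chunk of a RIFF/WAVE file."""
    data = path.read_bytes()
    if len(data) < 12:
        raise ValueError("file too short to be RIFF/WAVE")
    riff_id, riff_size, form_type = struct.unpack("<4sI4s", data[:12])
    if riff_id != b"RIFF" or form_type != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    end = min(len(data), 8 + riff_size)   # RIFF size counts everything after its own field
    chunks = []
    offset = 12
    while offset + 8 <= end:
        chunk_id, size = struct.unpack("<4sI", data[offset:offset + 8])
        chunks.append((chunk_id.decode("ascii", errors="replace"), size))
        offset += 8 + size + (size % 2)   # chunk bodies are padded to an even length
    return chunks


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "tone.wav"
    with wave.open(str(sample), "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(44100)
        out.writeframes(b"\x00\x00" * 4)  # four 16-bit mono frames
    chunks = list_riff_chunks(sample)
print(chunks)  # [('fmt ', 16), ('data', 8)]
```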

Embedding information about the holding organization (the data source that holds information about the object) and the copyright status also helps identify the file if it becomes dissociated from the metadata in its information package. The Federal Agencies Digitization Guidelines Initiative (FADGI) publishes guidelines for digitization processes, including guidance on the use of embedded metadata in WAVE files.[13] For example, the guidelines recommend how to store embedded metadata in the WAVE files that result from digitization.
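One way to embed identifying information is a RIFF `LIST`/`INFO` chunk. The sketch below uses the INFO fields `IARL` (archival location) for the holding organization and `ICOP` (copyright); whether these fields match a given institution's FADGI-based practice is an assumption to check against the guidelines themselves. The function appends a new chunk, so a file that already has an INFO chunk would end up with two, and the organization and rights values are placeholders.

```python
import struct
import tempfile
import wave
from pathlib import Path


def info_subchunk(tag: bytes, text: str) -> bytes:
    """Build one INFO sub-chunk: 4-byte ID, size, NUL-terminated text, even-length padding."""
    body = text.encode("ascii") + b"\x00"
    chunk = tag + struct.pack("<I", len(body)) + body
    return chunk + (b"\x00" if len(body) % 2 else b"")


def append_info_chunk(path: Path, fields: dict[bytes, str]) -> None:
    """Append a LIST/INFO chunk to a RIFF/WAVE file and update the RIFF size field."""
    data = bytearray(path.read_bytes())
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    if len(data) % 2:
        data += b"\x00"  # restore the pad byte after an odd-sized final chunk
    payload = b"INFO" + b"".join(info_subchunk(tag, text) for tag, text in fields.items())
    data += b"LIST" + struct.pack("<I", len(payload)) + payload
    struct.pack_into("<I", data, 4, len(data) - 8)  # RIFF size excludes the first 8 bytes
    path.write_bytes(bytes(data))


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "tape_0042.wav"  # hypothetical file name
    with wave.open(str(sample), "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(44100)
        out.writeframes(b"\x00\x00" * 100)
    append_info_chunk(sample, {b"IARL": "Example Archive", b"ICOP": "Example Rights Holder"})
    result = sample.read_bytes()
    with wave.open(str(sample), "rb") as check:
        frames_after = check.getnframes()  # the audio is still readable
```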

Extraction Tools

Because of the embedded nature of much technical metadata, tools have been developed to automate the extraction of this information from the files in which it is held. Two of those are: 

  • FITS, https://projects.iq.harvard.edu/fits, is a command-line tool developed by Harvard University Library that identifies, validates, and extracts technical metadata for digital objects, including some audiovisual formats. The metadata is exported into an XML file.

  • MediaInfo, https://mediaarea.net/en/MediaInfo, is an open-source program that extracts technical metadata about media assets and exports it in a variety of formats, including plain text, EBUCore, PBCore, and reVTMD (the last three are described below). It works with many audio and video formats, runs on many operating systems, and has a graphical user interface (GUI), so command-line knowledge is not necessary.
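MediaInfo can also be scripted. The sketch below assumes the `mediainfo` command-line version is installed and recent enough to support `--Output=JSON`; `summarize_tracks` reads the `track` list under `media` in that report. The sample report is a hand-written, abbreviated stand-in rather than real MediaInfo output, and the file name in it is hypothetical.

```python
import json
import subprocess


def run_mediainfo(path: str) -> str:
    """Run the MediaInfo CLI and return its JSON report (requires MediaInfo to be installed)."""
    result = subprocess.run(
        ["mediainfo", "--Output=JSON", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def summarize_tracks(report: str) -> list[tuple[str, str]]:
    """Pull (track type, format) pairs out of a MediaInfo JSON report."""
    tracks = json.loads(report)["media"]["track"]
    return [(track.get("@type", "?"), track.get("Format", "?")) for track in tracks]


# Abbreviated, hand-written stand-in for a report; real output has many more fields.
sample_report = json.dumps({"media": {"@ref": "tape_0042.mkv", "track": [
    {"@type": "General", "Format": "Matroska"},
    {"@type": "Video", "Format": "FFV1"},
    {"@type": "Audio", "Format": "PCM"},
]}})
print(summarize_tracks(sample_report))
```

With MediaInfo installed, `summarize_tracks(run_mediainfo("some_file.mkv"))` would apply the same summary to a real file.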

5.4 Standards, Schemas, and Guidelines

Metadata standards, schemas, and guidelines are invaluable to the creation, management, and sharing of information. They tell us how and why certain metadata should be captured, which makes it easier to understand metadata created by others and reduces the obstacles to sharing information between systems. Metadata can be stored in Excel spreadsheets, as XML files, in databases such as content management systems and institutional repositories, or in other formats. However metadata is stored, using standards to create and structure it makes it more broadly understood and interoperable.

The standards and guidelines briefly described below are just a few of the most recognized and recommended for the management of audiovisual collections.

1. EBUCore

EBUCore is a descriptive and technical metadata schema based on the Dublin Core standard and adapted to broadcast media. It is developed and maintained by the European Broadcasting Union (EBU), the largest professional association of broadcasters in the world, and captures the minimum information needed to describe radio and television content.

Link to more information about EBUCore: https://tech.ebu.ch/MetadataEbuCore

Link to the EBUCore metadata specification: https://tech.ebu.ch/docs/tech/tech3293.pdf

2. FADGI (Federal Agencies Digitization Guidelines Initiative)

Begun in 2007, FADGI is a collaborative effort by US federal agencies to define common technical guidelines, methods, and practices for digitizing historical content and capturing technical metadata. The audiovisual working group, in particular, focuses on identifying, establishing, and disseminating information about standards and practices for the digital reformatting of historical and cultural audiovisual materials by federal agencies, although its guidelines are also widely used outside the US government. The effort covers sound recordings, video recordings, motion picture film, and born-digital content.

Link to more information about the FADGI audiovisual working group and its guidelines: http://www.digitizationguidelines.gov/audio-visual/

3. METS (Metadata Encoding and Transmission Standard)

The METS schema is a standard, expressed in XML, for encoding descriptive, administrative, and structural metadata about objects within a digital library. METS provides an XML document format for the metadata needed both to manage digital library objects within a repository and to exchange them between repositories, or between repositories and their users. METS is a Digital Library Federation initiative maintained by the Library of Congress.

Link to more information about METS: https://www.loc.gov/standards/mets/mets-home.html
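A minimal METS sketch for a recording split across three files ("part one of three"): a `fileSec` lists the files and a `structMap` orders them as parts of one whole. It is built with Python's standard `xml.etree.ElementTree`; the label and file names are hypothetical, and the output has not been validated against the METS XSD.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns: str, tag: str) -> str:
    """Build a namespace-qualified name in ElementTree's {uri}tag form."""
    return f"{{{ns}}}{tag}"


def build_mets(label: str, part_files: list[str]) -> ET.Element:
    """Describe a multi-part recording: one file group plus a structMap ordering the parts."""
    mets = ET.Element(q(METS_NS, "mets"))
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), USE="preservation")
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"), TYPE="physical")
    root_div = ET.SubElement(struct_map, q(METS_NS, "div"), LABEL=label)
    for order, href in enumerate(part_files, start=1):
        file_id = f"FILE{order:03d}"
        file_el = ET.SubElement(file_grp, q(METS_NS, "file"), ID=file_id)
        ET.SubElement(file_el, q(METS_NS, "FLocat"),
                      {"LOCTYPE": "URL", q(XLINK_NS, "href"): href})
        part_div = ET.SubElement(root_div, q(METS_NS, "div"),
                                 ORDER=str(order), LABEL=f"Part {order} of {len(part_files)}")
        ET.SubElement(part_div, q(METS_NS, "fptr"), FILEID=file_id)
    return mets


doc = build_mets("Oral history interview, 1978", ["part1.wav", "part2.wav", "part3.wav"])
print(ET.tostring(doc, encoding="unicode"))
```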

4. PBCore (Public Broadcasting Metadata Dictionary Project)

PBCore is a metadata schema designed for sound and moving images. It can serve as a guideline for cataloging descriptive and administrative information about audiovisual content, and as an exchange mechanism for sharing that information between institutions or applications. PBCore expands on the Dublin Core standard. It was created by the US public broadcasting community and is maintained by WGBH in Boston.

Link to more information about PBCore: http://pbcore.org/
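A minimal PBCore description document with an identifier, title, and description, built with Python's standard library. The identifier, identifier source, and text values are hypothetical, and the result has not been validated against the official PBCore XSD; a full record would usually include more elements, such as dates and instantiation details.

```python
import xml.etree.ElementTree as ET

PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"


def pbcore_record(identifier: str, id_source: str, title: str, description: str) -> str:
    """Serialize a minimal PBCore description document for one asset."""
    # The default namespace is written as a plain xmlns attribute on the root;
    # the unprefixed child elements inherit it when the XML is parsed.
    doc = ET.Element("pbcoreDescriptionDocument", xmlns=PBCORE_NS)
    ET.SubElement(doc, "pbcoreIdentifier", source=id_source).text = identifier
    ET.SubElement(doc, "pbcoreTitle").text = title
    ET.SubElement(doc, "pbcoreDescription").text = description
    return ET.tostring(doc, encoding="unicode")


xml_text = pbcore_record(
    identifier="tape_0042",          # hypothetical local identifier
    id_source="Example Archive",     # hypothetical identifier source
    title="Oral history interview, reel 1",
    description="Open-reel audio recording digitized to WAVE.",
)
print(xml_text)
```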

5. PREMIS (Preservation Metadata: Implementation Strategies)

The PREMIS Data Dictionary for Preservation Metadata is the international standard for metadata to support the preservation of digital objects and ensure their long-term usability. PREMIS is a comprehensive, practical resource for implementing preservation metadata in digital archiving systems.[14] It is maintained as a standard by the Library of Congress.

Link to more information about PREMIS: http://www.loc.gov/standards/premis/

6. reVTMD

reVTMD is an XML schema tailored to include fields that address the creation and long-term management of reformatted videos, especially for the cultural heritage community. It is a concise subset of the large array of technical metadata available for digital media, structured in a way to make it highly usable for accessing and managing all types of video files. The captureHistory section is especially helpful in capturing process history for preservation purposes. reVTMD was developed by the US National Archives and Records Administration in collaboration with AVPreserve. 

Link to more information about reVTMD: https://www.weareavp.com/tag/revtmd/

Link to the reVTMD XML schema: https://www.archives.gov/preservation/products/reVTMD.xsd

5.5 Resources

Baca, Murtha, ed. Introduction to Metadata, 3rd edition. http://www.getty.edu/publications/intrometadata/

Greisinger, Peggy. “A brief overview of metadata for audiovisual materials,” November 6, 2014. http://ndsr.nycdigital.org/a-brief-overview-of-metadata-for-audiovisual-materials/

International Association of Sound and Audiovisual Archives. “Metadata.” In Task Force to Establish Selection Criteria. http://www.iasa-web.org/task-force/7-metadata

Zeng, Marcia L. Metadata basics. http://metadataetc.org/metadatabasics/

 

 