Metadata and Describing Data

The Who, What, When, Where, Why, and How of Your Research

Metadata is documentation that describes data. Properly describing and documenting data allows users (yourself included) to understand and track important details of the work. Having metadata about the data also facilitates search and retrieval of the data when deposited in a data repository.

Metadata can include content such as contact information, geographic locations, details about units of measure, abbreviations or codes used in the dataset, instrument and protocol information, survey tool details, provenance and version information, and much more. In a lab setting, much of the content used to describe data is initially collected in a notebook. When possible, structure your metadata using an appropriate, agreed-upon metadata standard format.

When no appropriate metadata standard exists, you may consider composing a "readme" style metadata document, as described in this guide from Cornell University.
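As a minimal sketch of what a "readme" style metadata document might contain, the Python snippet below assembles one as plain text. The function name, field labels, and sample values are hypothetical placeholders chosen for illustration; they are not taken from the Cornell guide, which should be consulted for its recommended content.

```python
# Hypothetical helper: builds a plain-text README for a dataset.
# The field labels and sample values are illustrative assumptions,
# not a prescribed template.
def build_readme(title, contact, collected, location, variables, notes=""):
    """Return README text covering who, what, when, where, and how."""
    lines = [
        f"Title: {title}",
        f"Contact: {contact}",
        f"Date collected: {collected}",
        f"Location: {location}",
        "",
        "Variables (name [unit]: description):",
    ]
    for name, unit, description in variables:
        lines.append(f"  {name} [{unit}]: {description}")
    if notes:
        lines += ["", "Notes:", f"  {notes}"]
    return "\n".join(lines) + "\n"


readme_text = build_readme(
    title="Stream temperature survey (example)",
    contact="Jane Doe <jane.doe@example.org>",
    collected="2023-06-01 to 2023-08-31",
    location="Example Creek, site A",
    variables=[
        ("temp_c", "degrees Celsius", "water temperature at 0.5 m depth"),
        ("site_id", "none", "site code; A = upstream, B = downstream"),
    ],
    notes="Missing values are coded as -999.",
)
print(readme_text)
```

Saving the resulting text as a README file alongside the data keeps units, codes, and contact details with the dataset they describe.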

Metadata Standards by Discipline

To find an appropriate metadata standard for your discipline, consider the Disciplinary Metadata guide (via the Digital Curation Centre).

Additionally, a community-driven project manages an open directory of metadata standards.

Metadata Formats and Standards

Metadata can take many different forms, from free text to standardized, structured, machine-readable, extensible content. Specific disciplines, repositories, or data centers may guide or even dictate the content and format of metadata, possibly using a formal standard. Because creating standardized metadata can be difficult and time-consuming, another consideration when selecting a standard is the availability of tools that can help generate the metadata (e.g., Morpho allows for easy creation of EML).
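To illustrate the difference between free text and structured, machine-readable metadata, here is a small sketch that stores a record as JSON and checks it for required fields. The field names, the list of required fields, and the sample values are assumptions made for this example and do not follow any particular standard.

```python
import json

# Hypothetical metadata record; keys and values are illustrative only.
dataset_record = {
    "title": "Stream temperature survey (example)",
    "creator": "Jane Doe",
    "units": {"temp_c": "degrees Celsius"},
    "codes": {"-999": "missing value"},
    "instrument": "hypothetical temperature logger",
    "version": "1.0",
}

# Assumed minimum set of fields; a repository may require others.
REQUIRED_FIELDS = ("title", "creator", "version")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required field names that are absent or empty."""
    return [name for name in required if not record.get(name)]


# Serialize to JSON text, then read it back to show it is machine-readable.
serialized = json.dumps(dataset_record, indent=2, sort_keys=True)
restored = json.loads(serialized)

print(serialized)
print("Missing fields:", missing_fields(restored))
```

Because the record is structured, software can validate, search, and convert it, which is much harder to do reliably with free text.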

Some specific examples of metadata standards, both general and domain-specific, are:

  • Dublin Core - domain agnostic, basic and widely used metadata standard
  • Darwin Core - facilitates the sharing of information about biological diversity
  • DDI (Data Documentation Initiative) - common standard for social, behavioral and economic sciences, including survey data
  • EML (Ecological Metadata Language) - specific to ecology disciplines
  • ISO 19115 and FGDC-CSDGM (Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata) - for describing geospatial information
  • FITS (Flexible Image Transport System) - Astronomy digital file standard that includes structured, embedded metadata
  • TEI (Text Encoding Initiative) - for textual data
  • CDWA (Categories for the Description of Works of Art) - for audio/visual data
  • Cataloging Cultural Objects (Visual Resources Association) - for art history and architecture
  • PBCore - for public broadcasting and media
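As one illustration of what a standardized record can look like, the sketch below writes a few Dublin Core elements as XML using Python's standard library. The namespace URI is the Dublin Core Metadata Element Set namespace; the wrapping "metadata" element, the helper function name, and all sample values are assumptions for this example, and a given repository may expect a different wrapper or additional elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML element holding (element name, value) pairs.

    The "metadata" wrapper element is a hypothetical choice for this sketch.
    """
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Sample values are placeholders, not a real dataset.
dc_record = dublin_core_record([
    ("title", "Stream temperature survey (example)"),
    ("creator", "Jane Doe"),
    ("date", "2023-09-15"),
    ("format", "text/csv"),
    ("rights", "CC BY 4.0"),
])

xml_text = ET.tostring(dc_record, encoding="unicode")
print(xml_text)
```

Domain-specific standards such as EML or DDI define far richer structures, so dedicated tools are usually a better fit for them than hand-written code.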

This guide was adapted from the Cornell Research Data Management Service Group’s “Metadata and describing data,” licensed under a Creative Commons Attribution 4.0 International License.

 

Additional Information

If you need assistance, please contact Research Data Services at researchdata@rice.edu