Introduction to Metadata


Reproducible Research Data and Project Management in R

Dr Anna Krystalli

R-RSE


https://acce-rrresearch.netlify.app/

You got data. Is it enough?

#otherpeoplesdata dream match!

Thought experiment:

Imagine a dream open data set. How would you locate it?
  • what details would you need to know to determine relevance?
  • what information would you need to know to use it?

metadata = data about data

Metadata

“Information that describes, explains, locates, or in some way makes it easier to find, access, and use a resource (in this case, data).”

Metadata

Backbone of digital curation

Without it, a digital resource may be irretrievable, unidentifiable or unusable

Metadata Types

Descriptive

  • enables identification, location and retrieval of data, often includes use of controlled vocabularies for classification and indexing.

Technical

  • describes the technical processes used to produce, or required to use, a digital data object.

Administrative

  • used to manage administrative aspects of the digital object e.g. intellectual property rights and acquisition.

Elements of metadata

  • Structured data files:

    • readable by machines and humans, accessible through the web
  • Controlled vocabularies, e.g. NERC Vocabulary Server

    • allow data to be connected


KEY TO SEARCH FUNCTION

  • By structuring & adhering to controlled vocabularies, data can be combined, accessed and searched!

  • Different communities develop different standards which define both the structure and content of metadata
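
A minimal Python sketch of why controlled vocabularies matter for search: two data sets label the same variables differently, and mapping each label onto a preferred term is what lets them be matched. The vocabulary below and the `normalise_term` helper are hypothetical illustrations, not terms or functions from the NERC Vocabulary Server.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted synonyms.
VOCABULARY = {
    "sea_surface_temperature": {"sst", "sea surface temp", "sea surface temperature"},
    "salinity": {"sal", "salinity"},
}


def normalise_term(label):
    """Return the preferred vocabulary term for a free-text label, or None."""
    cleaned = label.strip().lower().replace("_", " ")
    for preferred, synonyms in VOCABULARY.items():
        if cleaned == preferred.replace("_", " ") or cleaned in synonyms:
            return preferred
    return None


# Column labels as two different research groups might have written them.
labels_a = ["SST", "Salinity"]
labels_b = ["sea surface temperature", "sal", "chlorophyll"]

# Variables both data sets share once labels are mapped to preferred terms.
shared = {normalise_term(l) for l in labels_a} & {normalise_term(l) for l in labels_b}

# Labels that fall outside the vocabulary and need attention.
unmatched = [l for l in labels_a + labels_b if normalise_term(l) is None]

print(sorted(shared))
print(unmatched)
```

Without the shared vocabulary, "SST" and "sea surface temperature" would look like unrelated variables to a search tool.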

Metadata in research

Identifying the right metadata standard

Seek help from support teams

Most university libraries have assistants dedicated to Research Data Management:

@tomjwebb @ScientificData Talk to their librarian for data management strategies #datainfolit

— Yasmeen Shorish (@yasmeen_azadi) January 16, 2015

Key metadata

The bare minimum

Document data coverage information

  • taxonomic coverage: a table containing taxonomic information on the species in the data.
    • also record authority / source
  • temporal coverage: temporal range and resolution details
  • spatial coverage:
    • a human readable geographic description of the study area
    • spatial range and resolution details
    • include depth (marine/freshwater) or altitudinal (terrestrial) information

Make sure to record units!
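
As a rough sketch, coverage information can be derived directly from the data rather than typed by hand. The Python example below assumes a small set of made-up field records and a hypothetical `summarise_coverage` helper; the units are written into the field names so they cannot be lost.

```python
from datetime import date

# Hypothetical field records:
# (species, sampling date, latitude, longitude, elevation in metres).
records = [
    ("Parus major", date(2017, 5, 2), 53.38, -1.47, 120.0),
    ("Cyanistes caeruleus", date(2017, 6, 14), 53.41, -1.52, 95.0),
    ("Parus major", date(2018, 5, 9), 53.36, -1.49, 140.0),
]


def summarise_coverage(rows):
    """Summarise taxonomic, temporal and spatial coverage of the records."""
    dates = [r[1] for r in rows]
    lats = [r[2] for r in rows]
    lons = [r[3] for r in rows]
    elevations = [r[4] for r in rows]
    return {
        "taxonomic": sorted({r[0] for r in rows}),
        "temporal": {
            "start": min(dates).isoformat(),
            "end": max(dates).isoformat(),
        },
        "spatial": {
            # Units are part of each key: decimal degrees and metres.
            "south_deg": min(lats),
            "north_deg": max(lats),
            "west_deg": min(lons),
            "east_deg": max(lons),
            "min_elevation_m": min(elevations),
            "max_elevation_m": max(elevations),
        },
    }


coverage = summarise_coverage(records)
print(coverage)
```

The taxonomic authority and a human readable description of the study area still need to be recorded by hand; they cannot be computed from the records.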

Methods metadata

Document protocols in a methods document

Keep a dynamic document used to plan, record and write up methods.

Is there any additional information other users would need to combine your data with theirs? Record it.

Practical metadata

ACCE DTP RDM course


Teaching this course has always felt challenging in terms of practical exercises

  • Defining metadata & explaining its importance
  • Advising on domain specific Controlled Vocabularies & structure

  • How can we practice creating metadata?

rOpenSci Unconf 18

May 21 - 22, 2018. Seattle

rOpenSci Unconf mission

Bringing together scientists, developers, and open data enthusiasts from academia, industry, government, and non-profits for a few days to hack on various projects.


Ideas for projects submitted through GitHub issues in the runconf18 repo

issue #72 🙋‍♀️

Metadata team!

Luckily, a whole bunch of other awesome folks were also thinking about these topics and interested in working on them! 🤩

(in alphabetical order):

pkg dataspice

The dataspice package makes it easier for researchers to create basic, lightweight and concise metadata files for their datasets.


  • Metadata are collected in CSV files
  • Metadata fields are based on schema.org
    • which underlies the Google Dataset Search metadata specification
  • Helper functions and Shiny apps to extract and edit metadata files.
  • Ability to produce:
    • a structured JSON-LD metadata file.
    • a helpful dataset README webpage.
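
To give a feel for what a structured JSON-LD metadata file looks like, here is a minimal Python sketch of a schema.org `Dataset` record. It is not the output of dataspice: the property names are standard schema.org terms, but every value (dataset name, creator, coverage, variable) is made up for illustration.

```python
import json

# Hypothetical dataset description using schema.org Dataset properties.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example field survey",
    "description": "Illustrative bird survey records with made-up values.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": [{"@type": "Person", "name": "A. Researcher"}],
    # ISO 8601 interval: start date / end date.
    "temporalCoverage": "2017-05-02/2018-05-09",
    "spatialCoverage": {
        "@type": "Place",
        "geo": {
            "@type": "GeoShape",
            # Bounding box as "south west north east" in decimal degrees.
            "box": "53.36 -1.52 53.41 -1.47",
        },
    },
    "variableMeasured": [
        {"@type": "PropertyValue", "name": "elevation", "unitText": "metre"}
    ],
}

# Serialise to a JSON-LD string that could be saved alongside the data.
json_ld = json.dumps(dataset, indent=2)
print(json_ld)
```

Because the fields follow schema.org, the same file is readable by people and by machines such as dataset search engines.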

Google unveils search engine for open data

The tool, called Google Dataset Search, should help researchers to find the data they need more easily.

Nature NEWS - 05 SEPTEMBER 2018


https://toolbox.google.com/datasetsearch

dataspice tutorial


The goal of this section is to provide a practical exercise in creating metadata for an example field-collected data product using the dataspice package.

  • Understand basic metadata and why it is important
  • Understand where and how to store them
  • Understand how they can feed into more complex metadata objects.

dataspice workflow

Practical

time for some live coding 😱

head to the dataspice tutorial