Using Metadata

Metadata allows users to annotate a QIIME 2 Result with study-specific values: age, elevation, body site, pH, etc. QIIME 2 offers a consistent API for developers to expose their Methods and Visualizers to user-defined metadata. For more details about how users might create and utilize metadata in their studies, check out the Metadata In QIIME 2 tutorial.

Metadata

Actions may request an entire Metadata object to work on. At its core, Metadata is a wrapper around a pandas.DataFrame, but the Metadata object provides many convenience methods and properties, and it centralizes the code needed to handle metadata. Examples of Actions that consume and operate on Metadata include:

Plugins may work with metadata directly, or they may filter, regroup, partition, or pivot it; it all depends on the intended outcome of the method or visualizer in question.
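As an illustration of partitioning, the sketch below groups IDs by their value in one column. It is plain Python over a hypothetical `{id: {column: value}}` dict, not QIIME 2 code. In a real Action you would more likely start from `md.to_dataframe()` or from a `CategoricalMetadataColumn`. The function name `partition_ids` is made up for this example.

```python
from collections import defaultdict


def partition_ids(rows, column):
    """Group IDs by their value in `column`; IDs with a missing value are skipped.

    `rows` is a hypothetical stand-in for metadata: {id: {column: value}}.
    """
    groups = defaultdict(list)
    for sample_id, row in rows.items():
        value = row.get(column)
        if value is None:  # this sketch treats None as missing
            continue
        groups[value].append(sample_id)
    return dict(groups)


rows = {
    "S1": {"body-site": "gut", "ph": 6.5},
    "S2": {"body-site": "tongue", "ph": 7.1},
    "S3": {"body-site": "gut", "ph": None},
    "S4": {"body-site": None, "ph": 6.9},
}
print(partition_ids(rows, "body-site"))  # {'gut': ['S1', 'S3'], 'tongue': ['S2']}
```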

Metadata is subject to framework-level validations, normalization, and verification. We recommend familiarizing yourself with this behavior before utilizing Metadata in your Action. We think having this kind of behavior available via a centralized API helps ensure consistency for all users of Metadata.

import qiime2

def my_viz(output_dir: str, md: qiime2.Metadata) -> None:
    # Materialize the Metadata as a pandas.DataFrame indexed by ID.
    df = md.to_dataframe()
    ...

Metadata Columns

Plugin Actions may also request one or more MetadataColumn parameters to operate on. A good example is identifying which column of metadata contains barcodes when using demux emp-single or cutadapt demux-paired. The exciting aspect of this is that there are no longer hard-coded column-naming requirements, so users can choose a naming convention appropriate to their study.

Instances of MetadataColumn exist as one of two concrete classes: NumericMetadataColumn and CategoricalMetadataColumn.

By default, QIIME 2 will attempt to infer the type of each metadata column: if the column consists only of numbers or missing data, the column is inferred to be numeric. Otherwise, if the column contains any non-numeric values, the column is inferred to be categorical. Missing data (i.e. empty cells) are supported in categorical columns as well as numeric columns.

...
# Keep only the columns of one type; each call returns a new Metadata object.
numeric_md_cols = metadata.filter_columns(column_type='numeric')
categorical_md_cols = metadata.filter_columns(column_type='categorical')
...
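The inference rule described above can be sketched in a few lines of plain Python. This is a simplified illustration, not QIIME 2's parser. It assumes missing values arrive as None, empty strings, or NaN, and it accepts anything Python's `float()` can parse. QIIME 2's own parsing may differ in details, such as which strings it accepts as numbers. The function name `infer_column_type` is hypothetical.

```python
import math


def infer_column_type(values):
    """Return 'numeric' or 'categorical' for a list of raw column values.

    Simplified rule: if every non-missing value parses as a number, the
    column is numeric; otherwise it is categorical.
    """
    def is_missing(v):
        return v is None or v == "" or (isinstance(v, float) and math.isnan(v))

    def is_number(v):
        if isinstance(v, bool):  # don't treat True/False as numbers
            return False
        if isinstance(v, (int, float)):
            return True
        try:
            float(v)
        except (TypeError, ValueError):
            return False
        return True

    present = [v for v in values if not is_missing(v)]
    return "numeric" if all(is_number(v) for v in present) else "categorical"


print(infer_column_type(["6.5", "7.1", ""]))      # numeric
print(infer_column_type(["gut", "tongue", "6"]))  # categorical
```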

If your Action always needs one type of column or another, you can simply register that type in your plugin registration:

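from qiime2.plugin import MetadataColumn, Numeric

plugin.methods.register_function(
    ...
    # Only numeric columns will be accepted for this parameter.
    parameters={'metadata': MetadataColumn[Numeric]},
    parameter_descriptions={'metadata': 'Numeric metadata column to '
                            'compute pairwise Euclidean distances from'},
    ...
)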

This will ensure that all the necessary type-checking is performed by the framework before these data are passed into the Action utilizing it.

Numeric Metadata Columns

Columns that consist only of numeric (or missing) values can be instantiated as NumericMetadataColumn, although they can also be loaded as CategoricalMetadataColumn.

Categorical Metadata Columns

Columns of any type can be instantiated as CategoricalMetadataColumn; their values will be cast to strings.

How can the Metadata API Help Me?

The Metadata API has many interesting features. Here are some of the elements most commonly utilized by the core plugins.

Merging Metadata

Interfaces can allow users to specify more than one metadata file at a time. The framework handles merging the files or objects before handing the final merged Metadata to your Action.
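In the API this merge is `Metadata.merge(*others)`. It keeps only the IDs shared by every input, which makes it an inner join, and it rejects inputs with overlapping column names. The plain-Python sketch below mimics those two rules over hypothetical `{id: {column: value}}` dicts so the behavior is easy to see. It is not QIIME 2's implementation, and `merge_metadata` is a made-up name.

```python
def merge_metadata(*tables):
    """Inner-join hypothetical {id: {column: value}} tables on ID.

    Raises ValueError if two tables share a column name.
    """
    seen_columns = set()
    for table in tables:
        columns = {col for row in table.values() for col in row}
        overlap = seen_columns & columns
        if overlap:
            raise ValueError(f"overlapping columns: {sorted(overlap)}")
        seen_columns |= columns

    # Keep only IDs present in every table, in the first table's order.
    shared = [i for i in tables[0] if all(i in t for t in tables[1:])]
    merged = {}
    for sample_id in shared:
        row = {}
        for table in tables:
            row.update(table[sample_id])
        merged[sample_id] = row
    return merged


site = {"S1": {"body-site": "gut"}, "S2": {"body-site": "tongue"}, "S3": {"body-site": "gut"}}
chem = {"S2": {"ph": 7.1}, "S3": {"ph": 6.5}, "S4": {"ph": 6.9}}
print(merge_metadata(site, chem))
# {'S2': {'body-site': 'tongue', 'ph': 7.1}, 'S3': {'body-site': 'gut', 'ph': 6.5}}
```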

Dropping Missing Values

When working with a single metadata column, plugin code can check whether the column has any missing values and, if so, drop the IDs with missing values from the column.
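On a real MetadataColumn this is typically `column.has_missing_values()` followed by `column.drop_missing_values()`, which returns a new column. The sketch below mimics that pair in plain Python over a hypothetical `{id: value}` dict, treating None and NaN as missing. It illustrates the idea only and is not QIIME 2's code.

```python
import math


def is_missing(value):
    """This sketch's convention: None or a float NaN counts as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def has_missing_values(column):
    """True if any ID in the hypothetical {id: value} column is missing."""
    return any(is_missing(v) for v in column.values())


def drop_missing_values(column):
    """Return a new column without the IDs whose value is missing."""
    return {i: v for i, v in column.items() if not is_missing(v)}


ph = {"S1": 6.5, "S2": float("nan"), "S3": 7.1, "S4": None}
if has_missing_values(ph):
    ph = drop_missing_values(ph)
print(ph)  # {'S1': 6.5, 'S3': 7.1}
```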

Normalizing TSV Files

Visualizations that want to provide data exports can save a materialized Metadata instance, so the exported files follow a consistent, normalized TSV layout (see, e.g., the longitudinal volatility visualizer and its source code).
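In the API, saving is done with `Metadata.save(filepath)`. To make the resulting layout visible, the sketch below writes a similar file with the standard library. The layout is an ID header row, then a `#q2:types` directive row declaring each column's type, then one row per ID, with missing values left as empty cells. Treat it as an approximation of the layout, not a byte-for-byte reproduction of QIIME 2's writer. `write_metadata_tsv` is a hypothetical helper.

```python
import csv
import io


def write_metadata_tsv(rows, column_types, handle):
    """Write hypothetical {id: {column: value}} rows as a metadata-style TSV.

    `column_types` maps column name -> 'numeric' or 'categorical'; its order
    sets the column order. Missing values (None) become empty cells.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    columns = list(column_types)
    writer.writerow(["id", *columns])
    writer.writerow(["#q2:types", *(column_types[c] for c in columns)])
    for sample_id, row in rows.items():
        cells = ["" if row.get(c) is None else row[c] for c in columns]
        writer.writerow([sample_id, *cells])


buf = io.StringIO()
write_metadata_tsv(
    {"S1": {"body-site": "gut", "ph": 6.5}, "S2": {"body-site": "tongue", "ph": None}},
    {"body-site": "categorical", "ph": "numeric"},
    buf,
)
print(buf.getvalue())
```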

Advanced Filtering

The filter_columns method can be used to restrict column types, drop empty columns, or remove columns made entirely of unique values.
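On a Metadata object these options are keyword arguments of `filter_columns`, for example `md.filter_columns(column_type='categorical', drop_all_missing=True, drop_all_unique=True)`. The plain-Python sketch below applies the same three rules to a hypothetical column-oriented dict so their combined effect is easy to follow. It uses None for missing values and treats a column as "all unique" when its distinct non-missing values number as many as the IDs. These are assumptions of this sketch, not a copy of QIIME 2's implementation.

```python
def filter_columns(columns, types, column_type=None,
                   drop_all_missing=False, drop_all_unique=False):
    """Return the subset of `columns` that survives the requested filters.

    `columns` maps column name -> list of values (one per ID, None = missing);
    `types` maps column name -> 'numeric' or 'categorical'.
    """
    kept = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if column_type is not None and types[name] != column_type:
            continue  # wrong column type
        if drop_all_missing and not present:
            continue  # every value is missing
        if drop_all_unique and len(set(present)) == len(values):
            continue  # every ID has its own distinct value
        kept[name] = values
    return kept


columns = {
    "subject": ["a", "b", "c"],             # unique per ID
    "body-site": ["gut", "gut", "tongue"],
    "ph": [6.5, 7.1, 6.5],
    "notes": [None, None, None],            # entirely missing
}
types = {"subject": "categorical", "body-site": "categorical",
         "ph": "numeric", "notes": "categorical"}
print(sorted(filter_columns(columns, types, column_type="categorical",
                            drop_all_missing=True, drop_all_unique=True)))
# ['body-site']
```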

SQL Filtering

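Advanced metadata querying is enabled by SQL-based filtering: Metadata.get_ids accepts a where argument containing a SQLite WHERE clause and returns the IDs that satisfy it.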
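A query such as `md.get_ids(where="[body-site]='gut' AND [ph] < 7")` returns the set of matching IDs. The square brackets are SQLite's way of quoting column names that contain characters like hyphens. To show what such a clause does, the sketch below loads a hypothetical table into an in-memory SQLite database with the standard-library `sqlite3` module and runs the same WHERE clause. It illustrates the query semantics only and is not a copy of QIIME 2's code. Note that missing values become NULL, so they never satisfy a comparison.

```python
import sqlite3


def get_ids(rows, columns, where):
    """Return the set of IDs whose row satisfies a SQLite WHERE clause.

    `rows` is a hypothetical {id: {column: value}} dict and `columns` lists
    its column names. Values are bound as parameters; only `where` is
    interpolated as SQL, since it is the user-supplied query.
    """
    conn = sqlite3.connect(":memory:")
    try:
        quoted = ", ".join(f"[{c}]" for c in columns)
        conn.execute(f"CREATE TABLE metadata ([id] TEXT PRIMARY KEY, {quoted})")
        placeholders = ", ".join("?" for _ in range(len(columns) + 1))
        conn.executemany(
            f"INSERT INTO metadata VALUES ({placeholders})",
            [(i, *(row.get(c) for c in columns)) for i, row in rows.items()],
        )
        cursor = conn.execute(f"SELECT [id] FROM metadata WHERE {where}")
        return {r[0] for r in cursor}
    finally:
        conn.close()


rows = {
    "S1": {"body-site": "gut", "ph": 6.5},
    "S2": {"body-site": "tongue", "ph": 7.1},
    "S3": {"body-site": "gut", "ph": 7.4},
    "S4": {"body-site": "gut", "ph": None},
}
print(sorted(get_ids(rows, ["body-site", "ph"], "[body-site]='gut' AND [ph] < 7")))
# ['S1']
```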

Making Artifacts Viewable as Metadata

Once a transformer from a particular format to qiime2.Metadata is registered, the framework allows the type represented by that format to be viewed as Metadata. This can open up all kinds of exciting opportunities for plugins!

import pandas as pd
import qiime2

@plugin.register_transformer
def _1(data: cool_project.InterestingDataFormat) -> qiime2.Metadata:
    df = pd.DataFrame(data)
    df.index.name = 'id'  # Metadata requires a recognized ID header
    return qiime2.Metadata(df)

A visualizer for free!

If your type is viewable as Metadata (that is, the necessary transformers are registered), you can use the general-purpose metadata visualizer, metadata tabulate, which renders an interactive table of the metadata in question. Cool!