Metadata

Metadata supplies supplementary annotations for the outputs of QIIME 2, and the Metadata API gives QIIME 2 developers a convenient, consistent interface for working with this class of data.


class qiime2.metadata.Metadata(dataframe)

Store metadata associated with identifiers in a study.

Metadata is tabular in nature, mapping study identifiers (e.g. sample or feature IDs) to columns of metadata associated with each ID.

For more details about metadata in QIIME 2, including the TSV metadata file format, see the Metadata Tutorial at https://docs.qiime2.org.

The following text focuses on design and considerations when working with Metadata objects at the API level.

A Metadata object is composed of zero or more MetadataColumn objects. A Metadata object always contains at least one ID, regardless of the number of columns. Each column in the Metadata object has an associated column type representing either categorical or numeric data. Each metadata column is represented by an object corresponding to the column’s type: CategoricalMetadataColumn or NumericMetadataColumn, respectively.

A Metadata object is closely linked to its corresponding TSV metadata file format described at https://docs.qiime2.org. Therefore, certain requirements present in the file format are also enforced on the in-memory object in order to make serialized Metadata objects roundtrippable when loaded from disk again. For example, IDs cannot begin with a pound character (#) because those IDs would be interpreted as comment rows when written to disk as TSV. See the metadata file format spec for more details about data formatting requirements.

In addition to being loaded from or saved to disk, a Metadata object can be constructed from a pandas.DataFrame object. See the Parameters section below for details on how to construct Metadata objects from dataframes.

Metadata objects have various methods to access, filter, and merge data. A dataframe can be retrieved from the Metadata object for further data manipulation using the pandas API. Individual MetadataColumn objects can be retrieved to gain access to APIs applicable to a single metadata column.

Parameters

dataframe (pandas.DataFrame) – Dataframe containing metadata. The dataframe’s index defines the IDs, and the index name (Index.name) must match one of the required ID headers described in the metadata file format spec. Each column in the dataframe defines a metadata column, and the metadata column’s type (i.e. categorical or numeric) is determined based on the column’s dtype.

If a column has dtype=object, it may contain strings or pandas missing values (e.g. np.nan, None). Columns matching this requirement are assumed to be categorical.

If a column in the dataframe has dtype=float or dtype=int, it may contain floating point numbers or integers, as well as pandas missing values (e.g. np.nan). Columns matching this requirement are assumed to be numeric.

Regardless of column type (categorical vs numeric), the dataframe stored within the Metadata object will have any missing values normalized to np.nan. Columns with dtype=int will be cast to dtype=float. To obtain a dataframe from the Metadata object containing these normalized data types and values, use Metadata.to_dataframe().

property artifacts

Artifacts that are the source of the metadata.

This property is read-only.

Returns

Source artifacts of the metadata.

Return type

tuple of qiime2.Artifact

property column_count

Number of metadata columns.

This property is read-only.

Returns

Number of metadata columns.

Return type

int

Notes

Zero metadata columns are allowed.

See also

id_count

property columns

Ordered mapping of column names to ColumnProperties.

The mapping that is returned is read-only. This property is also read-only.

Returns

Ordered mapping of column names to ColumnProperties.

Return type

types.MappingProxyType

filter_columns(*, column_type=None, drop_all_unique=False, drop_zero_variance=False, drop_all_missing=False)

Filter metadata by columns.

Parameters
  • column_type (str, optional) – If supplied, will retain only columns of this type. The currently supported column types are ‘numeric’ and ‘categorical’.

  • drop_all_unique (bool, optional) – If True, columns that contain a unique value for every ID will be dropped. Missing data (np.nan) are ignored when determining unique values. If a column consists solely of missing data, it will be dropped.

  • drop_zero_variance (bool, optional) – If True, columns that contain the same value for every ID will be dropped. Missing data (np.nan) are ignored when determining variance. If a column consists solely of missing data, it will be dropped.

  • drop_all_missing (bool, optional) – If True, columns that have a missing value (np.nan) for every ID will be dropped.

Returns

The metadata filtered by columns.

Return type

Metadata

See also

filter_ids()

filter_ids(ids_to_keep)

Filter metadata by IDs.

Parameters

ids_to_keep (iterable of str) – IDs that should be retained in the filtered Metadata object. If any IDs in ids_to_keep are not contained in this Metadata object, a ValueError will be raised. The filtered Metadata object preserves the relative ordering of IDs found in this Metadata object, so the ordering of IDs in ids_to_keep has no effect on the ordering of IDs in the result.

Returns

The metadata filtered by IDs.

Return type

Metadata

get_column(name)

Retrieve metadata column based on column name.

Parameters

name (str) – Name of the metadata column to retrieve.

Returns

Requested metadata column (CategoricalMetadataColumn or NumericMetadataColumn).

Return type

MetadataColumn

See also

get_ids()

get_ids(where=None)

Retrieve IDs matching search criteria.

Parameters

where (str, optional) – SQLite WHERE clause specifying criteria IDs must meet to be included in the results. All IDs are included by default.

Returns

IDs matching search criteria specified in where.

Return type

set

Notes

The ID header (Metadata.id_header) may be used in the where clause to query the table’s ID column.

property id_count

Number of metadata IDs.

This property is read-only.

Returns

Number of metadata IDs.

Return type

int

property id_header

Name identifying the IDs associated with the metadata.

This property is read-only.

Returns

Name of IDs associated with the metadata.

Return type

str

property ids

IDs associated with the metadata.

This property is read-only.

Returns

Metadata IDs.

Return type

tuple of str

classmethod load(filepath, column_types=None)

Load a TSV metadata file.

The TSV metadata file format is described at https://docs.qiime2.org in the Metadata Tutorial.

Parameters
  • filepath (str) – Path to TSV metadata file to be loaded.

  • column_types (dict, optional) – Override metadata column types specified or inferred in the file. This is a dict mapping column names (str) to column types (str). Valid column types are ‘categorical’ and ‘numeric’. Column names may be omitted from this dict to use the column types read from the file.

Returns

Metadata object loaded from filepath.

Return type

Metadata

Raises

MetadataFileError – If the metadata file is invalid in any way (e.g. doesn’t meet the file format’s requirements).

See also

save()

merge(*others)

Merge this Metadata object with other Metadata objects.

Returns a new Metadata object containing the merged contents of this Metadata object and others. The merge is not in-place and will always return a new merged Metadata object.

The merge will include only those IDs that are shared across all Metadata objects being merged (i.e. the merge is an inner join).

Each metadata column being merged must have a unique name; merging metadata with overlapping column names will result in an error.

Parameters

others (tuple) – One or more Metadata objects to merge with this Metadata object.

Returns

New object containing merged metadata. The merged IDs will be in the same relative order as the IDs in this Metadata object after performing the inner join. The merged column order will match the column order of Metadata objects being merged from left to right.

Return type

Metadata

Raises

ValueError – If zero Metadata objects are provided in others (there is nothing to merge in this case).

Notes

The merged Metadata object will always have its id_header property set to 'id', regardless of the id_header values on the Metadata objects being merged.

The merged Metadata object tracks all source artifacts that it was built from to preserve provenance (i.e. the .artifacts property on all Metadata objects is merged).

save(filepath)

Save a TSV metadata file.

The TSV metadata file format is described at https://docs.qiime2.org in the Metadata Tutorial.

The file will always include the #q2:types directive in order to make the file roundtrippable without relying on column type inference.

Parameters

filepath (str) – Path where the TSV metadata file will be saved.

See also

load()

to_dataframe()

Create a pandas dataframe from the metadata.

The dataframe’s index name (Index.name) will match this metadata object’s id_header, and the index will contain this metadata object’s IDs. The dataframe’s column names will match the column names in this metadata. Categorical columns will be stored as dtype=object (containing strings), and numeric columns will be stored as dtype=float.

Returns

Dataframe constructed from the metadata.

Return type

pandas.DataFrame

class qiime2.metadata.MetadataColumn(series)

Abstract base class representing a single metadata column.

Concrete subclasses represent specific metadata column types, e.g. CategoricalMetadataColumn and NumericMetadataColumn.

See the Metadata class docstring for details about Metadata and MetadataColumn objects, including a description of column types.

The main difference in constructing MetadataColumn vs Metadata objects is that MetadataColumn objects are constructed from a pandas.Series object instead of a pandas.DataFrame. Otherwise, the same restrictions, considerations, and data normalization are applied as with Metadata objects.

property artifacts

Artifacts that are the source of the metadata.

This property is read-only.

Returns

Source artifacts of the metadata.

Return type

tuple of qiime2.Artifact

drop_missing_values()

Filter out missing values from the metadata column.

Returns

Metadata column with missing values removed.

Return type

MetadataColumn

filter_ids(ids_to_keep)

Filter metadata column by IDs.

Parameters

ids_to_keep (iterable of str) – IDs that should be retained in the filtered MetadataColumn object. If any IDs in ids_to_keep are not contained in this MetadataColumn object, a ValueError will be raised. The filtered MetadataColumn object preserves the relative ordering of IDs found in this MetadataColumn object, so the ordering of IDs in ids_to_keep has no effect on the ordering of IDs in the result.

Returns

The metadata column filtered by IDs.

Return type

MetadataColumn

See also

get_ids()

get_ids(where_values_missing=False)

Retrieve IDs matching search criteria.

Parameters

where_values_missing (bool, optional) – If True, only return IDs that are associated with missing values (np.nan). If False (the default), return all IDs in the metadata column.

Returns

IDs matching search criteria.

Return type

set

get_value(id)

Retrieve metadata column value associated with an ID.

Parameters

id (str) – ID corresponding to the metadata column value to retrieve.

Returns

Value associated with the provided id.

Return type

object

has_missing_values()

Determine if the metadata column has one or more missing values.

Returns

True if the metadata column has one or more missing values (np.nan), False otherwise.

Return type

bool

property id_count

Number of metadata IDs.

This property is read-only.

Returns

Number of metadata IDs.

Return type

int

property id_header

Name identifying the IDs associated with the metadata.

This property is read-only.

Returns

Name of IDs associated with the metadata.

Return type

str

property ids

IDs associated with the metadata.

This property is read-only.

Returns

Metadata IDs.

Return type

tuple of str

property name

Metadata column name.

This property is read-only.

Returns

Metadata column name.

Return type

str

save(filepath)

Save a TSV metadata file containing this metadata column.

The TSV metadata file format is described at https://docs.qiime2.org in the Metadata Tutorial.

The file will always include the #q2:types directive in order to make the file roundtrippable without relying on column type inference.

Parameters

filepath (str) – Path where the TSV metadata file will be saved.

to_dataframe()

Create a pandas dataframe from the metadata column.

The dataframe will contain exactly one column. The dataframe’s index name (Index.name) will match this metadata column’s id_header, and the index will contain this metadata column’s IDs. The dataframe’s column name will match this metadata column’s name.

Returns

Dataframe constructed from the metadata column.

Return type

pandas.DataFrame

See also

to_series()

to_series()

Create a pandas series from the metadata column.

The series index name (Index.name) will match this metadata column’s id_header, and the index will contain this metadata column’s IDs. The series name will match this metadata column’s name.

Returns

Series constructed from the metadata column.

Return type

pandas.Series

See also

to_dataframe()

class qiime2.metadata.NumericMetadataColumn(series)

A single metadata column containing numeric data.

See the Metadata class docstring for details about Metadata and MetadataColumn objects, including a description of column types and supported data formats.

property artifacts

Artifacts that are the source of the metadata.

This property is read-only.

Returns

Source artifacts of the metadata.

Return type

tuple of qiime2.Artifact

drop_missing_values()

Filter out missing values from the metadata column.

Returns

Metadata column with missing values removed.

Return type

MetadataColumn

filter_ids(ids_to_keep)

Filter metadata column by IDs.

Parameters

ids_to_keep (iterable of str) – IDs that should be retained in the filtered MetadataColumn object. If any IDs in ids_to_keep are not contained in this MetadataColumn object, a ValueError will be raised. The filtered MetadataColumn object preserves the relative ordering of IDs found in this MetadataColumn object, so the ordering of IDs in ids_to_keep has no effect on the ordering of IDs in the result.

Returns

The metadata column filtered by IDs.

Return type

MetadataColumn

See also

get_ids()

get_ids(where_values_missing=False)

Retrieve IDs matching search criteria.

Parameters

where_values_missing (bool, optional) – If True, only return IDs that are associated with missing values (np.nan). If False (the default), return all IDs in the metadata column.

Returns

IDs matching search criteria.

Return type

set

get_value(id)

Retrieve metadata column value associated with an ID.

Parameters

id (str) – ID corresponding to the metadata column value to retrieve.

Returns

Value associated with the provided id.

Return type

object

has_missing_values()

Determine if the metadata column has one or more missing values.

Returns

True if the metadata column has one or more missing values (np.nan), False otherwise.

Return type

bool

property id_count

Number of metadata IDs.

This property is read-only.

Returns

Number of metadata IDs.

Return type

int

property id_header

Name identifying the IDs associated with the metadata.

This property is read-only.

Returns

Name of IDs associated with the metadata.

Return type

str

property ids

IDs associated with the metadata.

This property is read-only.

Returns

Metadata IDs.

Return type

tuple of str

property name

Metadata column name.

This property is read-only.

Returns

Metadata column name.

Return type

str

save(filepath)

Save a TSV metadata file containing this metadata column.

The TSV metadata file format is described at https://docs.qiime2.org in the Metadata Tutorial.

The file will always include the #q2:types directive in order to make the file roundtrippable without relying on column type inference.

Parameters

filepath (str) – Path where the TSV metadata file will be saved.

to_dataframe()

Create a pandas dataframe from the metadata column.

The dataframe will contain exactly one column. The dataframe’s index name (Index.name) will match this metadata column’s id_header, and the index will contain this metadata column’s IDs. The dataframe’s column name will match this metadata column’s name.

Returns

Dataframe constructed from the metadata column.

Return type

pandas.DataFrame

See also

to_series()

to_series()

Create a pandas series from the metadata column.

The series index name (Index.name) will match this metadata column’s id_header, and the index will contain this metadata column’s IDs. The series name will match this metadata column’s name.

Returns

Series constructed from the metadata column.

Return type

pandas.Series

See also

to_dataframe()

class qiime2.metadata.CategoricalMetadataColumn(series)

A single metadata column containing categorical data.

See the Metadata class docstring for details about Metadata and MetadataColumn objects, including a description of column types and supported data formats.

property artifacts

Artifacts that are the source of the metadata.

This property is read-only.

Returns

Source artifacts of the metadata.

Return type

tuple of qiime2.Artifact

drop_missing_values()

Filter out missing values from the metadata column.

Returns

Metadata column with missing values removed.

Return type

MetadataColumn

filter_ids(ids_to_keep)

Filter metadata column by IDs.

Parameters

ids_to_keep (iterable of str) – IDs that should be retained in the filtered MetadataColumn object. If any IDs in ids_to_keep are not contained in this MetadataColumn object, a ValueError will be raised. The filtered MetadataColumn object preserves the relative ordering of IDs found in this MetadataColumn object, so the ordering of IDs in ids_to_keep has no effect on the ordering of IDs in the result.

Returns

The metadata column filtered by IDs.

Return type

MetadataColumn

See also

get_ids()

get_ids(where_values_missing=False)

Retrieve IDs matching search criteria.

Parameters

where_values_missing (bool, optional) – If True, only return IDs that are associated with missing values (np.nan). If False (the default), return all IDs in the metadata column.

Returns

IDs matching search criteria.

Return type

set

get_value(id)

Retrieve metadata column value associated with an ID.

Parameters

id (str) – ID corresponding to the metadata column value to retrieve.

Returns

Value associated with the provided id.

Return type

object

has_missing_values()

Determine if the metadata column has one or more missing values.

Returns

True if the metadata column has one or more missing values (np.nan), False otherwise.

Return type

bool

property id_count

Number of metadata IDs.

This property is read-only.

Returns

Number of metadata IDs.

Return type

int

property id_header

Name identifying the IDs associated with the metadata.

This property is read-only.

Returns

Name of IDs associated with the metadata.

Return type

str

property ids

IDs associated with the metadata.

This property is read-only.

Returns

Metadata IDs.

Return type

tuple of str

property name

Metadata column name.

This property is read-only.

Returns

Metadata column name.

Return type

str

save(filepath)

Save a TSV metadata file containing this metadata column.

The TSV metadata file format is described at https://docs.qiime2.org in the Metadata Tutorial.

The file will always include the #q2:types directive in order to make the file roundtrippable without relying on column type inference.

Parameters

filepath (str) – Path where the TSV metadata file will be saved.

to_dataframe()

Create a pandas dataframe from the metadata column.

The dataframe will contain exactly one column. The dataframe’s index name (Index.name) will match this metadata column’s id_header, and the index will contain this metadata column’s IDs. The dataframe’s column name will match this metadata column’s name.

Returns

Dataframe constructed from the metadata column.

Return type

pandas.DataFrame

See also

to_series()

to_series()

Create a pandas series from the metadata column.

The series index name (Index.name) will match this metadata column’s id_header, and the index will contain this metadata column’s IDs. The series name will match this metadata column’s name.

Returns

Series constructed from the metadata column.

Return type

pandas.Series

See also

to_dataframe()

exception qiime2.metadata.MetadataFileError(message, include_suffix=True)

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.