Metadata¶
Metadata is used to attach supplementary data to the outputs of QIIME 2, and provides a convenient interface for QIIME 2 developers to interact with this class of data in a consistent manner.
- class qiime2.metadata.CategoricalMetadataColumn(series)¶
A single metadata column containing categorical data.
See the Metadata class docstring for details about Metadata and MetadataColumn objects, including a description of column types and supported data formats.
- property artifacts¶
Artifacts that are the source of the metadata.
This property is read-only.
- Returns
Source artifacts of the metadata.
- Return type
tuple of qiime2.Artifact
- drop_missing_values()¶
Filter out missing values from the metadata column.
- Returns
Metadata column with missing values removed.
- Return type
MetadataColumn
- filter_ids(ids_to_keep)¶
Filter metadata column by IDs.
- Parameters
ids_to_keep (iterable of str) – IDs that should be retained in the filtered MetadataColumn object. If any IDs in ids_to_keep are not contained in this MetadataColumn object, a ValueError will be raised. The filtered MetadataColumn object will retain the same relative ordering of IDs in this MetadataColumn object. Thus, the ordering of IDs in ids_to_keep does not determine the ordering of IDs in the filtered MetadataColumn object.
- Returns
The metadata column filtered by IDs.
- Return type
MetadataColumn
- get_ids(where_values_missing=False)¶
Retrieve IDs matching search criteria.
- Parameters
where_values_missing (bool, optional) – If True, only return IDs that are associated with missing values (np.nan). If False (the default), return all IDs in the metadata column.
- Returns
IDs matching search criteria.
- Return type
set
- get_value(id)¶
Retrieve metadata column value associated with an ID.
- Parameters
id (str) – ID corresponding to the metadata column value to retrieve.
- Returns
Value associated with the provided id.
- Return type
object
- has_missing_values()¶
Determine if the metadata column has one or more missing values.
- Returns
True if the metadata column has one or more missing values (np.nan), False otherwise.
- Return type
bool
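The three missing-value methods above (drop_missing_values, get_ids with where_values_missing, and has_missing_values) can be modelled with a few lines of plain Python. This is only a sketch of the documented behavior, not QIIME 2 code: the column is a hypothetical dict of ID to value, and the functions merely borrow the method names for readability.

```python
import math

# Hypothetical stand-in for a categorical column: ID -> value, NaN = missing.
column = {"s1": "gut", "s2": float("nan"), "s3": "skin"}


def is_missing(value):
    """np.nan is a float NaN, so a NaN check is enough for this model."""
    return isinstance(value, float) and math.isnan(value)


def has_missing_values(col):
    """True if at least one value in the column is missing."""
    return any(is_missing(v) for v in col.values())


def get_ids(col, where_values_missing=False):
    """Return all IDs, or only the IDs whose value is missing."""
    if where_values_missing:
        return {i for i, v in col.items() if is_missing(v)}
    return set(col)


def drop_missing_values(col):
    """Return a new column without the missing entries; the input is untouched."""
    return {i: v for i, v in col.items() if not is_missing(v)}
```

In this model, get_ids(column, where_values_missing=True) yields only "s2", and the column returned by drop_missing_values no longer reports missing values.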
- property id_count¶
Number of metadata IDs.
This property is read-only.
- Returns
Number of metadata IDs.
- Return type
int
- property id_header¶
Name identifying the IDs associated with the metadata.
This property is read-only.
- Returns
Name of IDs associated with the metadata.
- Return type
str
- property ids¶
IDs associated with the metadata.
This property is read-only.
- Returns
Metadata IDs.
- Return type
tuple of str
- property name¶
Metadata column name.
This property is read-only.
- Returns
Metadata column name.
- Return type
str
- save(filepath, ext=None)¶
Save a TSV metadata file.
The TSV metadata file format is described at https://docs.qiime2.org in the Metadata Tutorial.
The file will always include the #q2:types directive in order to make the file roundtrippable without relying on column type inference.
- Parameters
filepath (str) – Path to save TSV metadata file at.
ext (str) – Preferred file extension (.tsv, .txt, etc). Will be left blank if no extension is included. Including a period in the extension is optional, and any additional periods delimiting the filepath and the extension will be reduced to a single period.
- Returns
Filepath and extension (if provided) that the file was saved to.
- Return type
str
- to_dataframe()¶
Create a pandas dataframe from the metadata column.
The dataframe will contain exactly one column. The dataframe’s index name (Index.name) will match this metadata column’s id_header, and the index will contain this metadata column’s IDs. The dataframe’s column name will match this metadata column’s name.
- Returns
Dataframe constructed from the metadata column.
- Return type
pandas.DataFrame
- to_series()¶
Create a pandas series from the metadata column.
The series index name (Index.name) will match this metadata column’s id_header, and the index will contain this metadata column’s IDs. The series name will match this metadata column’s name.
- Returns
Series constructed from the metadata column.
- Return type
pandas.Series
- class qiime2.metadata.Metadata(dataframe)¶
Store metadata associated with identifiers in a study.
Metadata is tabular in nature, mapping study identifiers (e.g. sample or feature IDs) to columns of metadata associated with each ID.
For more details about metadata in QIIME 2, including the TSV metadata file format, see the Metadata Tutorial at https://docs.qiime2.org.
The following text focuses on design and considerations when working with Metadata objects at the API level.
A Metadata object is composed of zero or more MetadataColumn objects. A Metadata object always contains at least one ID, regardless of the number of columns. Each column in the Metadata object has an associated column type representing either categorical or numeric data. Each metadata column is represented by an object corresponding to the column’s type: CategoricalMetadataColumn or NumericMetadataColumn, respectively.
A Metadata object is closely linked to its corresponding TSV metadata file format described at https://docs.qiime2.org. Therefore, certain requirements present in the file format are also enforced on the in-memory object in order to make serialized Metadata objects roundtrippable when loaded from disk again. For example, IDs cannot begin with a pound character (#) because those IDs would be interpreted as comment rows when written to disk as TSV. See the metadata file format spec for more details about data formatting requirements.
In addition to being loaded from or saved to disk, a Metadata object can be constructed from a pandas.DataFrame object. See the Parameters section below for details on how to construct Metadata objects from dataframes.
Metadata objects have various methods to access, filter, and merge data. A dataframe can be retrieved from the Metadata object for further data manipulation using the pandas API. Individual MetadataColumn objects can be retrieved to gain access to APIs applicable to a single metadata column.
- Parameters
dataframe (pandas.DataFrame) – Dataframe containing metadata. The dataframe’s index defines the IDs, and the index name (Index.name) must match one of the required ID headers described in the metadata file format spec. Each column in the dataframe defines a metadata column, and the metadata column’s type (i.e. categorical or numeric) is determined based on the column’s dtype. If a column has dtype=object, it may contain strings or pandas missing values (e.g. np.nan, None). Columns matching this requirement are assumed to be categorical. If a column in the dataframe has dtype=float or dtype=int, it may contain floating point numbers or integers, as well as pandas missing values (e.g. np.nan). Columns matching this requirement are assumed to be numeric. Regardless of column type (categorical vs numeric), the dataframe stored within the Metadata object will have any missing values normalized to np.nan. Columns with dtype=int will be cast to dtype=float. To obtain a dataframe from the Metadata object containing these normalized data types and values, use Metadata.to_dataframe().
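The typing and normalization rules in the dataframe parameter can be illustrated without pandas. The sketch below is a simplified model: normalize_column is a hypothetical helper that inspects plain Python values, whereas the real classification is driven by the pandas dtype of each column. Columns that are entirely missing or that mix strings with numbers are rejected here as an assumption of the model, not a statement about QIIME 2.

```python
import math


def normalize_column(values):
    """Return (column_type, normalized_values) for a list of raw values.

    Strings become a categorical column, ints and floats become a numeric
    column stored as floats, and None or NaN becomes NaN in both cases.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    def number(v):
        # bool is a subclass of int, so it is excluded explicitly.
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    present = [v for v in values if not missing(v)]
    if present and all(isinstance(v, str) for v in present):
        kind = "categorical"
        normalized = [float("nan") if missing(v) else v for v in values]
    elif present and all(number(v) for v in present):
        kind = "numeric"
        normalized = [float("nan") if missing(v) else float(v) for v in values]
    else:
        # Assumption of this sketch: anything else is out of scope.
        raise ValueError("expected all strings or all numbers, plus missing values")
    return kind, normalized
```

For example, normalize_column([1, 2, None]) classifies the column as numeric, casts the integers to floats, and turns None into NaN, which mirrors the int-to-float cast and missing-value normalization described above.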
- property artifacts¶
Artifacts that are the source of the metadata.
This property is read-only.
- Returns
Source artifacts of the metadata.
- Return type
tuple of qiime2.Artifact
- property column_count¶
Number of metadata columns.
This property is read-only.
- Returns
Number of metadata columns.
- Return type
int
Notes
Zero metadata columns are allowed.
- property columns¶
Ordered mapping of column names to ColumnProperties.
The mapping that is returned is read-only. This property is also read-only.
- Returns
Ordered mapping of column names to ColumnProperties.
- Return type
types.MappingProxyType
- filter_columns(*, column_type=None, drop_all_unique=False, drop_zero_variance=False, drop_all_missing=False)¶
Filter metadata by columns.
- Parameters
column_type (str, optional) – If supplied, will retain only columns of this type. The currently supported column types are ‘numeric’ and ‘categorical’.
drop_all_unique (bool, optional) – If True, columns that contain a unique value for every ID will be dropped. Missing data (np.nan) are ignored when determining unique values. If a column consists solely of missing data, it will be dropped.
drop_zero_variance (bool, optional) – If True, columns that contain the same value for every ID will be dropped. Missing data (np.nan) are ignored when determining variance. If a column consists solely of missing data, it will be dropped.
drop_all_missing (bool, optional) – If True, columns that have a missing value (np.nan) for every ID will be dropped.
- Returns
The metadata filtered by columns.
- Return type
Metadata
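The three drop_* flags of filter_columns interact with missing data in slightly different ways, which a small model makes easier to see. The function below is a hypothetical plain-Python stand-in (a dict of column name to list of values), not the QIIME 2 implementation, and it leaves out the column_type parameter.

```python
import math


def _present(values):
    """Values with missing entries (NaN) removed."""
    return [v for v in values if not (isinstance(v, float) and math.isnan(v))]


def filter_columns(table, *, drop_all_unique=False, drop_zero_variance=False,
                   drop_all_missing=False):
    """table maps column name -> list of values, one value per ID."""
    kept = {}
    for name, values in table.items():
        present = _present(values)
        distinct = set(present)
        if drop_all_missing and not present:
            continue
        # Every non-missing value is different; also true when nothing is present.
        if drop_all_unique and len(distinct) == len(present):
            continue
        # At most one distinct non-missing value; also true when nothing is present.
        if drop_zero_variance and len(distinct) <= 1:
            continue
        kept[name] = values
    return kept


nan = float("nan")
table = {
    "barcode": ["AAC", "GGT", "TTA"],  # unique value for every ID
    "site": ["gut", "gut", "gut"],     # same value for every ID
    "notes": [nan, nan, nan],          # missing for every ID
    "ph": [6.5, 7.0, 6.5],             # survives all three filters
}
```

With all three flags set, only the "ph" column remains. Note that the all-missing "notes" column is removed by any one of the flags, as the parameter descriptions state.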
- filter_ids(ids_to_keep)¶
Filter metadata by IDs.
- Parameters
ids_to_keep (iterable of str) – IDs that should be retained in the filtered Metadata object. If any IDs in ids_to_keep are not contained in this Metadata object, a ValueError will be raised. The filtered Metadata object will retain the same relative ordering of IDs in this Metadata object. Thus, the ordering of IDs in ids_to_keep does not determine the ordering of IDs in the filtered Metadata object.
- Returns
The metadata filtered by IDs.
- Return type
Metadata
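The ordering rule for filter_ids is easy to misread, so here is a minimal sketch of it in plain Python. The function is a hypothetical model operating on a tuple of IDs. It shows only the two documented behaviors: unknown IDs raise ValueError, and the result follows the original ID order rather than the order of ids_to_keep.

```python
def filter_ids(ids, ids_to_keep):
    """Return the IDs to retain, in the order they appear in ids."""
    keep = set(ids_to_keep)
    unknown = keep - set(ids)
    if unknown:
        raise ValueError(f"IDs not present in the metadata: {sorted(unknown)}")
    return tuple(i for i in ids if i in keep)


original_ids = ("s1", "s2", "s3")
# The request lists s3 first, but the result keeps the original order.
filtered = filter_ids(original_ids, ["s3", "s1"])
```

Here filtered is ("s1", "s3"), and asking for an ID such as "s9" raises ValueError.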
- get_column(name)¶
Retrieve metadata column based on column name.
- Parameters
name (str) – Name of the metadata column to retrieve.
- Returns
Requested metadata column (CategoricalMetadataColumn or NumericMetadataColumn).
- Return type
MetadataColumn
- get_ids(where=None)¶
Retrieve IDs matching search criteria.
- Parameters
where (str, optional) – SQLite WHERE clause specifying criteria IDs must meet to be included in the results. All IDs are included by default.
- Returns
IDs matching search criteria specified in where.
- Return type
set
Notes
The ID header (Metadata.id_header) may be used in the where clause to query the table’s ID column.
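To show what a SQLite WHERE clause over metadata looks like, the sketch below builds a small in-memory table with the standard sqlite3 module and appends the clause to a SELECT on the ID column. The table name, column names, and rows are invented for illustration, and this is a model of the idea rather than how QIIME 2 evaluates the clause internally.

```python
import sqlite3

# Hypothetical metadata: (id, site, ph). "id" plays the role of the ID header.
rows = [("s1", "gut", 6.5), ("s2", "skin", 5.5), ("s3", "gut", 7.2)]


def get_ids(where=None):
    """Return the set of IDs matching a SQLite WHERE clause (all IDs if None)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE metadata ("id" TEXT PRIMARY KEY, "site" TEXT, "ph" REAL)'
        )
        conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
        query = 'SELECT "id" FROM metadata'
        if where is not None:
            # The clause is pasted in verbatim, so never do this with untrusted input.
            query += " WHERE " + where
        return {row[0] for row in conn.execute(query)}
    finally:
        conn.close()
```

A call such as get_ids("\"site\" = 'gut' AND \"ph\" > 7") selects only "s3", and the ID column itself can be queried, for example with "\"id\" IN ('s1', 's2')".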
- property id_count¶
Number of metadata IDs.
This property is read-only.
- Returns
Number of metadata IDs.
- Return type
int
- property id_header¶
Name identifying the IDs associated with the metadata.
This property is read-only.
- Returns
Name of IDs associated with the metadata.
- Return type
str
- property ids¶
IDs associated with the metadata.
This property is read-only.
- Returns
Metadata IDs.
- Return type
tuple of str
- classmethod load(filepath, column_types=None)¶
Load a TSV metadata file.
The TSV metadata file format is described at https://docs.qiime2.org in the Metadata Tutorial.
- Parameters
filepath (str) – Path to TSV metadata file to be loaded.
column_types (dict, optional) – Override metadata column types specified or inferred in the file. This is a dict mapping column names (str) to column types (str). Valid column types are ‘categorical’ and ‘numeric’. Column names may be omitted from this dict to use the column types read from the file.
- Returns
Metadata object loaded from filepath.
- Return type
Metadata
- Raises
MetadataFileError – If the metadata file is invalid in any way (e.g. doesn’t meet the file format’s requirements).
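The column_types override can be pictured as a dict layered over the types recorded in the file. The sketch below parses a tiny TSV that carries a #q2:types directive row directly under the header and then applies overrides. The sample file, the read_column_types helper, and its error handling are assumptions made for illustration; real files may omit the directive and rely on inference, which this sketch does not model.

```python
import csv
import io

# A small TSV in the shape described by the metadata file format.
TSV = (
    "id\tsite\tsubject\n"
    "#q2:types\tcategorical\tnumeric\n"
    "s1\tgut\t1\n"
    "s2\tskin\t2\n"
)


def read_column_types(text, column_types=None):
    """Return column name -> type, with optional per-column overrides."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, directive = rows[0], rows[1]
    if directive[0] != "#q2:types":
        raise ValueError("this sketch expects a #q2:types row under the header")
    types = dict(zip(header[1:], directive[1:]))
    for name, kind in (column_types or {}).items():
        # Assumption of this sketch: unknown columns and types are errors.
        if name not in types:
            raise ValueError(f"no such column: {name}")
        if kind not in ("categorical", "numeric"):
            raise ValueError(f"unsupported column type: {kind}")
        types[name] = kind
    return types
```

Columns left out of the override dict keep the type read from the file, so read_column_types(TSV, {"subject": "categorical"}) changes only the "subject" column.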
- merge(*others)¶
Merge this Metadata object with other Metadata objects.
Returns a new Metadata object containing the merged contents of this Metadata object and others. The merge is not in-place and will always return a new merged Metadata object.
The merge will include only those IDs that are shared across all Metadata objects being merged (i.e. the merge is an inner join).
Each metadata column being merged must have a unique name; merging metadata with overlapping column names will result in an error.
- Parameters
others (tuple) – One or more Metadata objects to merge with this Metadata object.
- Returns
New object containing merged metadata. The merged IDs will be in the same relative order as the IDs in this Metadata object after performing the inner join. The merged column order will match the column order of Metadata objects being merged from left to right.
- Return type
Metadata
- Raises
ValueError – If zero Metadata objects are provided in others (there is nothing to merge in this case).
Notes
The merged Metadata object will always have its id_header property set to 'id', regardless of the id_header values on the Metadata objects being merged.
The merged Metadata object tracks all source artifacts that it was built from to preserve provenance (i.e. the .artifacts property on all Metadata objects is merged).
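The merge semantics described above (inner join on IDs, ID order taken from the first object, columns appended left to right, no overlapping column names) can be sketched with nested dicts. This is a hypothetical model for illustration only: each table maps an ID to a dict of column values, and provenance tracking and the id_header reset are not modelled.

```python
def merge(first, *others):
    """Inner-join tables of the form {id: {column: value}}."""
    if not others:
        raise ValueError("at least one other table is required to merge")
    tables = (first, *others)

    # Column names must be unique across all tables being merged.
    seen = set()
    for table in tables:
        columns = set().union(*(row.keys() for row in table.values()))
        overlap = seen & columns
        if overlap:
            raise ValueError(f"overlapping column names: {sorted(overlap)}")
        seen |= columns

    # Keep only IDs shared by every table, in the order of the first table.
    shared = set(first).intersection(*others)
    merged = {}
    for id_ in first:
        if id_ in shared:
            row = {}
            for table in tables:  # left-to-right column order
                row.update(table[id_])
            merged[id_] = row
    return merged


a = {"s1": {"site": "gut"}, "s2": {"site": "skin"}, "s3": {"site": "gut"}}
b = {"s3": {"ph": 7.2}, "s1": {"ph": 6.5}}
merged = merge(a, b)
```

In this example "s2" is dropped because it is absent from b, and the merged IDs come out as s1 then s3, following a rather than b.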
- save(filepath, ext=None)¶
Save a TSV metadata file.
The TSV metadata file format is described at https://docs.qiime2.org in the Metadata Tutorial.
The file will always include the #q2:types directive in order to make the file roundtrippable without relying on column type inference.
- Parameters
filepath (str) – Path to save TSV metadata file at.
ext (str) – Preferred file extension (.tsv, .txt, etc). Will be left blank if no extension is included. Including a period in the extension is optional, and any additional periods delimiting the filepath and the extension will be reduced to a single period.
- Returns
Filepath and extension (if provided) that the file was saved to.
- Return type
str
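The period handling described for the ext parameter amounts to a small string rule. The helper below is a hypothetical sketch of that rule as worded above, not the code save() runs. It does not address cases the description leaves open, such as a filepath that already ends with the requested extension.

```python
def with_extension(filepath, ext=None):
    """Join filepath and ext with exactly one period between them."""
    if not ext:
        # No extension requested: the path is used as given.
        return filepath
    # Strip periods on both sides of the join, then add a single one back.
    return filepath.rstrip(".") + "." + ext.lstrip(".")
```

Under this rule, with_extension("out/metadata", "tsv") and with_extension("out/metadata.", ".tsv") both give "out/metadata.tsv".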
- to_dataframe()¶
Create a pandas dataframe from the metadata.
The dataframe’s index name (Index.name) will match this metadata object’s id_header, and the index will contain this metadata object’s IDs. The dataframe’s column names will match the column names in this metadata. Categorical columns will be stored as dtype=object (containing strings), and numeric columns will be stored as dtype=float.
- Returns
Dataframe constructed from the metadata.
- Return type
pandas.DataFrame
- class qiime2.metadata.MetadataColumn(series)¶
Abstract base class representing a single metadata column.
Concrete subclasses represent specific metadata column types, e.g. CategoricalMetadataColumn and NumericMetadataColumn.
See the Metadata class docstring for details about Metadata and MetadataColumn objects, including a description of column types.
The main difference in constructing MetadataColumn vs Metadata objects is that MetadataColumn objects are constructed from a pandas.Series object instead of a pandas.DataFrame. Otherwise, the same restrictions, considerations, and data normalization are applied as with Metadata objects.
- property artifacts¶
Artifacts that are the source of the metadata.
This property is read-only.
- Returns
Source artifacts of the metadata.
- Return type
tuple of qiime2.Artifact
- drop_missing_values()¶
Filter out missing values from the metadata column.
- Returns
Metadata column with missing values removed.
- Return type
MetadataColumn
- filter_ids(ids_to_keep)¶
Filter metadata column by IDs.
- Parameters
ids_to_keep (iterable of str) – IDs that should be retained in the filtered MetadataColumn object. If any IDs in ids_to_keep are not contained in this MetadataColumn object, a ValueError will be raised. The filtered MetadataColumn object will retain the same relative ordering of IDs in this MetadataColumn object. Thus, the ordering of IDs in ids_to_keep does not determine the ordering of IDs in the filtered MetadataColumn object.
- Returns
The metadata column filtered by IDs.
- Return type
MetadataColumn
- get_ids(where_values_missing=False)¶
Retrieve IDs matching search criteria.
- Parameters
where_values_missing (bool, optional) – If True, only return IDs that are associated with missing values (np.nan). If False (the default), return all IDs in the metadata column.
- Returns
IDs matching search criteria.
- Return type
set
- get_value(id)¶
Retrieve metadata column value associated with an ID.
- Parameters
id (str) – ID corresponding to the metadata column value to retrieve.
- Returns
Value associated with the provided id.
- Return type
object
- has_missing_values()¶
Determine if the metadata column has one or more missing values.
- Returns
True if the metadata column has one or more missing values (np.nan), False otherwise.
- Return type
bool
- property id_count¶
Number of metadata IDs.
This property is read-only.
- Returns
Number of metadata IDs.
- Return type
int
- property id_header¶
Name identifying the IDs associated with the metadata.
This property is read-only.
- Returns
Name of IDs associated with the metadata.
- Return type
str
- property ids¶
IDs associated with the metadata.
This property is read-only.
- Returns
Metadata IDs.
- Return type
tuple of str
- property name¶
Metadata column name.
This property is read-only.
- Returns
Metadata column name.
- Return type
str
- save(filepath, ext=None)¶
Save a TSV metadata file.
The TSV metadata file format is described at https://docs.qiime2.org in the Metadata Tutorial.
The file will always include the #q2:types directive in order to make the file roundtrippable without relying on column type inference.
- Parameters
filepath (str) – Path to save TSV metadata file at.
ext (str) – Preferred file extension (.tsv, .txt, etc). Will be left blank if no extension is included. Including a period in the extension is optional, and any additional periods delimiting the filepath and the extension will be reduced to a single period.
- Returns
Filepath and extension (if provided) that the file was saved to.
- Return type
str
- to_dataframe()¶
Create a pandas dataframe from the metadata column.
The dataframe will contain exactly one column. The dataframe’s index name (Index.name) will match this metadata column’s id_header, and the index will contain this metadata column’s IDs. The dataframe’s column name will match this metadata column’s name.
- Returns
Dataframe constructed from the metadata column.
- Return type
pandas.DataFrame
- to_series()¶
Create a pandas series from the metadata column.
The series index name (Index.name) will match this metadata column’s id_header, and the index will contain this metadata column’s IDs. The series name will match this metadata column’s name.
- Returns
Series constructed from the metadata column.
- Return type
pandas.Series
- exception qiime2.metadata.MetadataFileError(message, include_suffix=True)¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class qiime2.metadata.NumericMetadataColumn(series)¶
A single metadata column containing numeric data.
See the Metadata class docstring for details about Metadata and MetadataColumn objects, including a description of column types and supported data formats.
- property artifacts¶
Artifacts that are the source of the metadata.
This property is read-only.
- Returns
Source artifacts of the metadata.
- Return type
tuple of qiime2.Artifact
- drop_missing_values()¶
Filter out missing values from the metadata column.
- Returns
Metadata column with missing values removed.
- Return type
MetadataColumn
- filter_ids(ids_to_keep)¶
Filter metadata column by IDs.
- Parameters
ids_to_keep (iterable of str) – IDs that should be retained in the filtered MetadataColumn object. If any IDs in ids_to_keep are not contained in this MetadataColumn object, a ValueError will be raised. The filtered MetadataColumn object will retain the same relative ordering of IDs in this MetadataColumn object. Thus, the ordering of IDs in ids_to_keep does not determine the ordering of IDs in the filtered MetadataColumn object.
- Returns
The metadata column filtered by IDs.
- Return type
MetadataColumn
- get_ids(where_values_missing=False)¶
Retrieve IDs matching search criteria.
- Parameters
where_values_missing (bool, optional) – If True, only return IDs that are associated with missing values (np.nan). If False (the default), return all IDs in the metadata column.
- Returns
IDs matching search criteria.
- Return type
set
- get_value(id)¶
Retrieve metadata column value associated with an ID.
- Parameters
id (str) – ID corresponding to the metadata column value to retrieve.
- Returns
Value associated with the provided id.
- Return type
object
- has_missing_values()¶
Determine if the metadata column has one or more missing values.
- Returns
True if the metadata column has one or more missing values (np.nan), False otherwise.
- Return type
bool
- property id_count¶
Number of metadata IDs.
This property is read-only.
- Returns
Number of metadata IDs.
- Return type
int
- property id_header¶
Name identifying the IDs associated with the metadata.
This property is read-only.
- Returns
Name of IDs associated with the metadata.
- Return type
str
- property ids¶
IDs associated with the metadata.
This property is read-only.
- Returns
Metadata IDs.
- Return type
tuple of str
- property name¶
Metadata column name.
This property is read-only.
- Returns
Metadata column name.
- Return type
str
- save(filepath, ext=None)¶
Save a TSV metadata file.
The TSV metadata file format is described at https://docs.qiime2.org in the Metadata Tutorial.
The file will always include the #q2:types directive in order to make the file roundtrippable without relying on column type inference.
- Parameters
filepath (str) – Path to save TSV metadata file at.
ext (str) – Preferred file extension (.tsv, .txt, etc). Will be left blank if no extension is included. Including a period in the extension is optional, and any additional periods delimiting the filepath and the extension will be reduced to a single period.
- Returns
Filepath and extension (if provided) that the file was saved to.
- Return type
str
- to_dataframe()¶
Create a pandas dataframe from the metadata column.
The dataframe will contain exactly one column. The dataframe’s index name (Index.name) will match this metadata column’s id_header, and the index will contain this metadata column’s IDs. The dataframe’s column name will match this metadata column’s name.
- Returns
Dataframe constructed from the metadata column.
- Return type
pandas.DataFrame
- to_series()¶
Create a pandas series from the metadata column.
The series index name (Index.name) will match this metadata column’s id_header, and the index will contain this metadata column’s IDs. The series name will match this metadata column’s name.
- Returns
Series constructed from the metadata column.
- Return type
pandas.Series