Metabase
Important Capabilities
Capability | Status | Notes |
---|---|---|
Detect Deleted Entities | ✅ | Enabled by default via stateful ingestion. |
Platform Instance | ✅ | Enabled by default. |
Table-Level Lineage | ✅ | Supported by default. |
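Deleted-entity detection relies on stateful ingestion, which is configured through the stateful_ingestion options listed under Config Details. The sketch below is a simplified model of that idea, not DataHub's implementation. It makes two assumptions. First, the saved state is just a set of entity URNs. Second, the "relative change" compared against fail_safe_threshold is the percentage of previously seen entities that are missing from the current run. The function name compute_stale_entities is hypothetical.

```python
from typing import Set, Tuple


def compute_stale_entities(
    previous_urns: Set[str],
    current_urns: Set[str],
    fail_safe_threshold: float = 75.0,
) -> Tuple[Set[str], bool]:
    """Return (URNs to soft-delete, whether the fail-safe tripped).

    Hypothetical helper; the change metric below is an assumption.
    """
    if not previous_urns:
        # Nothing was seen before, so nothing can be stale.
        return set(), False
    # Entities present in the last successful run but missing now.
    stale = previous_urns - current_urns
    change_percent = 100.0 * len(stale) / len(previous_urns)
    if change_percent > fail_safe_threshold:
        # Too much disappeared at once (for example, a config mistake):
        # skip the soft deletes instead of wiping large parts of the catalog.
        return set(), True
    return stale, False


previous = {"urn:a", "urn:b", "urn:c", "urn:d"}
print(compute_stale_entities(previous, {"urn:a", "urn:b", "urn:c"}))
print(compute_stale_entities(previous, set()))
```

With the default threshold of 75.0, losing three of four entities still counts as a normal change, while losing all of them trips the fail-safe.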
This plugin extracts charts, dashboards, and associated metadata. It is in beta and has only been tested with PostgreSQL and H2 databases.
Collection
The /api/collection endpoint is used to retrieve the available collections.
The /api/collection/<COLLECTION_ID>/items?models=dashboard endpoint is used to list the dashboards in a given collection.
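The following is a minimal sketch of how the two collection endpoints above could be addressed. It assumes connect_uri includes a scheme (e.g. http://localhost:3000). The helper names are hypothetical, and the connector's own request code may differ.

```python
from urllib.parse import urlencode


def collections_url(connect_uri: str) -> str:
    """URL listing all collections visible to the authenticated user."""
    return f"{connect_uri.rstrip('/')}/api/collection"


def collection_dashboards_url(connect_uri: str, collection_id: int) -> str:
    """URL listing only the dashboards stored in one collection."""
    query = urlencode({"models": "dashboard"})
    return f"{connect_uri.rstrip('/')}/api/collection/{collection_id}/items?{query}"


print(collections_url("http://localhost:3000/"))
print(collection_dashboards_url("http://localhost:3000", 7))
```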
Dashboard
The /api/dashboard/<DASHBOARD_ID> endpoint is used to retrieve a given dashboard and the following information:
- Title and description
- Last edited by
- Owner
- Link to the dashboard in Metabase
- Associated charts
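The sketch below builds the dashboard request and the dashboard link using only the Python standard library. It rests on three assumptions:

- Requests authenticate with x-api-key, the header Metabase uses for API keys.
- The link has the form <base>/dashboard/<DASHBOARD_ID>, matching Metabase's dashboard URLs.
- The link base is display_uri when set, falling back to connect_uri (see the display_uri option under Config Details).

The function names are hypothetical. Building a Request does not send it; urllib.request.urlopen would.

```python
from typing import Dict, Optional
from urllib.request import Request


def dashboard_request(connect_uri: str, dashboard_id: int, headers: Dict[str, str]) -> Request:
    """Prepare (but do not send) GET /api/dashboard/<DASHBOARD_ID>."""
    url = f"{connect_uri.rstrip('/')}/api/dashboard/{dashboard_id}"
    return Request(url, headers=headers, method="GET")


def dashboard_link(connect_uri: str, dashboard_id: int, display_uri: Optional[str] = None) -> str:
    """Link shown in DataHub; prefer the user-facing display_uri if configured."""
    base = (display_uri or connect_uri).rstrip("/")
    return f"{base}/dashboard/{dashboard_id}"


req = dashboard_request("http://metabase:3000", 5, {"x-api-key": "<API_KEY>"})
print(req.get_method(), req.full_url)
print(dashboard_link("http://metabase:3000", 5, display_uri="https://bi.example.com"))
```

Separating the API host from the link base lets ingestion use an internal address while DataHub shows links users can actually open.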
Chart
The /api/card endpoint is used to retrieve the following information:
- Title and description
- Last edited by
- Owner
- Link to the chart in Metabase
- Datasource and lineage
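For the datasource and lineage item, the relevant part of a card is its dataset_query. The sketch assumes Metabase's usual card layout:

- The database ID is stored under database.
- For SQL questions, type is "native" and the SQL text is under native.query.
- For questions built in the query builder, type is "query" and the source table ID is under query.source-table.

It only shows where that information lives and does not parse SQL. The helper names and the /question/<id> link format are assumptions, not the connector's code.

```python
from typing import Any, Dict


def card_link(base_uri: str, card_id: int) -> str:
    """Link to a chart; Metabase's UI calls charts "questions"."""
    return f"{base_uri.rstrip('/')}/question/{card_id}"


def card_source(card: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize where a card's data comes from (hypothetical helper)."""
    query = card.get("dataset_query") or {}
    source: Dict[str, Any] = {
        "database_id": query.get("database"),
        "sql": None,
        "source_table": None,
    }
    if query.get("type") == "native":
        # Hand-written SQL; table lineage would need a SQL parser.
        source["sql"] = (query.get("native") or {}).get("query")
    elif query.get("type") == "query":
        # Query-builder question; points at a Metabase table ID.
        source["source_table"] = (query.get("query") or {}).get("source-table")
    return source


native_card = {"dataset_query": {"type": "native", "database": 2,
                                 "native": {"query": "SELECT * FROM orders"}}}
print(card_source(native_card))
print(card_link("https://bi.example.com/", 9))
```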
The following chart properties are ingested into DataHub:
Name | Description |
---|---|
Dimensions | Names of the columns that are not aggregated |
Filters | Filters applied to the chart |
Metrics | Columns used in aggregations |
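The following sketch shows how these three properties could be derived from a card returned by /api/card. It makes three assumptions:

- Each entry in the card's result_metadata has a display_name and a field_ref.
- Aggregated columns have a field_ref of the form ["aggregation", <index>], as in Metabase's query references.
- Filters live under dataset_query.query.filter.

The function name is hypothetical. Treat this as an illustration of the table above rather than the connector's exact logic.

```python
from typing import Any, Dict, List


def chart_properties(card: Dict[str, Any]) -> Dict[str, str]:
    """Build Dimensions / Filters / Metrics strings for one card."""
    metrics: List[str] = []
    dimensions: List[str] = []
    for column in card.get("result_metadata") or []:
        name = column.get("display_name") or ""
        field_ref = column.get("field_ref") or []
        # Assumption: aggregated columns are referenced as ["aggregation", <index>].
        if field_ref and field_ref[0] == "aggregation":
            metrics.append(name)
        else:
            dimensions.append(name)
    query = (card.get("dataset_query") or {}).get("query") or {}
    filters = query.get("filter") or []
    return {
        "Dimensions": ", ".join(dimensions),
        "Filters": str(filters) if filters else "",
        "Metrics": ", ".join(metrics),
    }


card = {
    "result_metadata": [
        {"display_name": "Category", "field_ref": ["field", 3, None]},
        {"display_name": "Count", "field_ref": ["aggregation", 0]},
    ],
    "dataset_query": {"query": {"filter": ["=", ["field", 3, None], "Widget"]}},
}
print(chart_properties(card))
```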
CLI-based Ingestion
Config Details
Note that a dot (.) is used to denote nested fields in the YAML recipe.
Field | Type | Description | Default |
---|---|---|---|
api_key | string (password) or null | Metabase API key. If provided, the username and password are ignored. Recommended method. | None |
connect_uri | string | Metabase host URL. | localhost:3000 |
convert_urns_to_lowercase | boolean | Whether to convert dataset URNs to lowercase. | False |
database_alias_map | object or null | Database name map to use when constructing dataset URNs. | None |
database_id_to_instance_map | map of string to string, or null | Custom mappings between Metabase database IDs and DataHub platform instances. | None |
default_schema | string | Default schema name to use when a SQL query does not specify a schema. | public |
display_uri | string or null | Optional URL to use in links (if connect_uri is only used for ingestion). | None |
engine_platform_map | map of string to string, or null | Custom mappings between Metabase database engines and DataHub platforms. | None |
exclude_other_user_collections | boolean | If true, exclude collections belonging to other users. | False |
password | string (password) or null | Metabase password, used when an API key is not provided. | None |
platform_instance_map | map of string to string, or null | Mapping from platform to platform instance, used to generate correct dataset URNs. | None |
username | string or null | Metabase username, used when an API key is not provided. | None |
env | string | The environment that all assets produced by this connector belong to. | PROD |
stateful_ingestion | StatefulStaleMetadataRemovalConfig or null | Stateful ingestion configuration. | None |
stateful_ingestion.enabled | boolean | Whether to enable stateful ingestion. Defaults to True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False. | False (see description) |
stateful_ingestion.fail_safe_threshold | number (0 to 100) | Percentage threshold that guards against accidental changes to the source configuration. If the relative change in entities compared to the previous state exceeds it, large numbers of soft deletes are prevented and the state is not committed. | 75.0 |
stateful_ingestion.remove_stale_metadata | boolean | Soft-deletes entities that were present in the last successful run but are missing from the current run. Applies when stateful ingestion is enabled. | True |
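The api_key, username and password fields interact: an API key takes precedence, and the username and password are then ignored. A minimal sketch of that precedence follows.

The header and login details come from Metabase's public API. API keys are sent in an x-api-key header. A username/password login is a POST to /api/session, whose returned id is then sent as X-Metabase-Session. The function itself is hypothetical, and raising an error when no credentials are set is an assumption.

```python
from typing import Any, Dict, Optional


def build_auth(
    api_key: Optional[str],
    username: Optional[str],
    password: Optional[str],
) -> Dict[str, Any]:
    """Describe how to authenticate, following the documented precedence."""
    if api_key:
        # Recommended: the key alone is enough; username/password are ignored.
        return {"mode": "api_key", "headers": {"x-api-key": api_key}}
    if username and password:
        # Session login: POST the body to /api/session, then send the returned
        # "id" on later requests as the X-Metabase-Session header.
        return {
            "mode": "session",
            "login_path": "/api/session",
            "login_body": {"username": username, "password": password},
        }
    # Assumption: treat missing credentials as a configuration error.
    raise ValueError("Set api_key, or both username and password")


print(build_auth("<API_KEY>", "alice", "secret")["mode"])
print(build_auth(None, "alice", "secret")["mode"])
```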
The JSONSchema for this configuration is inlined below.
{
"$defs": {
"StatefulStaleMetadataRemovalConfig": {
"additionalProperties": false,
"description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
"properties": {
"enabled": {
"default": false,
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"title": "Enabled",
"type": "boolean"
},
"remove_stale_metadata": {
"default": true,
"description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
"title": "Remove Stale Metadata",
"type": "boolean"
},
"fail_safe_threshold": {
"default": 75.0,
"description": "Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'.",
"maximum": 100.0,
"minimum": 0.0,
"title": "Fail Safe Threshold",
"type": "number"
}
},
"title": "StatefulStaleMetadataRemovalConfig",
"type": "object"
}
},
"additionalProperties": false,
"properties": {
"convert_urns_to_lowercase": {
"default": false,
"description": "Whether to convert dataset urns to lowercase.",
"title": "Convert Urns To Lowercase",
"type": "boolean"
},
"stateful_ingestion": {
"anyOf": [
{
"$ref": "#/$defs/StatefulStaleMetadataRemovalConfig"
},
{
"type": "null"
}
],
"default": null
},
"env": {
"default": "PROD",
"description": "The environment that all assets produced by this connector belong to",
"title": "Env",
"type": "string"
},
"platform_instance_map": {
"anyOf": [
{
"additionalProperties": {
"type": "string"
},
"type": "object"
},
{
"type": "null"
}
],
"default": null,
"description": "A holder for platform -> platform_instance mappings to generate correct dataset urns",
"title": "Platform Instance Map"
},
"connect_uri": {
"default": "localhost:3000",
"description": "Metabase host URL.",
"title": "Connect Uri",
"type": "string"
},
"display_uri": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "optional URL to use in links (if `connect_uri` is only for ingestion)",
"title": "Display Uri"
},
"username": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Metabase username, used when an API key is not provided.",
"title": "Username"
},
"password": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "Metabase password, used when an API key is not provided.",
"title": "Password"
},
"api_key": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "Metabase API key. If provided, the username and password will be ignored. Recommended method.",
"title": "Api Key"
},
"database_alias_map": {
"anyOf": [
{
"additionalProperties": true,
"type": "object"
},
{
"type": "null"
}
],
"default": null,
"description": "Database name map to use when constructing dataset URN.",
"title": "Database Alias Map"
},
"engine_platform_map": {
"anyOf": [
{
"additionalProperties": {
"type": "string"
},
"type": "object"
},
{
"type": "null"
}
],
"default": null,
"description": "Custom mappings between metabase database engines and DataHub platforms",
"title": "Engine Platform Map"
},
"database_id_to_instance_map": {
"anyOf": [
{
"additionalProperties": {
"type": "string"
},
"type": "object"
},
{
"type": "null"
}
],
"default": null,
"description": "Custom mappings between metabase database id and DataHub platform instance",
"title": "Database Id To Instance Map"
},
"default_schema": {
"default": "public",
"description": "Default schema name to use when schema is not provided in an SQL query",
"title": "Default Schema",
"type": "string"
},
"exclude_other_user_collections": {
"default": false,
"description": "Flag that if true, exclude other user collections",
"title": "Exclude Other User Collections",
"type": "boolean"
}
},
"title": "MetabaseConfig",
"type": "object"
}
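The schema sets "additionalProperties": false on both objects, so a misspelled key in a recipe is rejected rather than silently ignored. It also bounds fail_safe_threshold to the 0 to 100 range. The partial check below illustrates those constraints, plus the string-valued maps, using only the standard library. It is a hypothetical helper for reading the schema, not how DataHub validates recipes; a full JSON Schema validator would enforce every rule.

```python
from typing import Any, Dict, List

# Top-level properties of MetabaseConfig, copied from the schema above.
ALLOWED_KEYS = {
    "api_key", "connect_uri", "convert_urns_to_lowercase", "database_alias_map",
    "database_id_to_instance_map", "default_schema", "display_uri",
    "engine_platform_map", "env", "exclude_other_user_collections", "password",
    "platform_instance_map", "stateful_ingestion", "username",
}
STATEFUL_KEYS = {"enabled", "remove_stale_metadata", "fail_safe_threshold"}
STRING_MAPS = ("database_id_to_instance_map", "engine_platform_map", "platform_instance_map")


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means these checks pass."""
    errors = [f"unknown key: {key}" for key in sorted(set(config) - ALLOWED_KEYS)]
    stateful = config.get("stateful_ingestion")
    if isinstance(stateful, dict):
        for key in sorted(set(stateful) - STATEFUL_KEYS):
            errors.append(f"unknown key: stateful_ingestion.{key}")
        threshold = stateful.get("fail_safe_threshold", 75.0)
        if not 0.0 <= threshold <= 100.0:
            errors.append("stateful_ingestion.fail_safe_threshold must be in [0, 100]")
    for name in STRING_MAPS:
        mapping = config.get(name)
        if mapping is None:
            continue
        # JSON object keys are strings; YAML turns an unquoted 42 into an int.
        if not all(isinstance(k, str) and isinstance(v, str) for k, v in mapping.items()):
            errors.append(f"{name} keys and values must be strings")
    return errors


print(check_config({"connect_uri": "http://localhost:3000", "conect_uri": "typo"}))
```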
Metabase databases are mapped to a DataHub platform based on the engine listed in the api/database response. This mapping can be customized with the engine_platform_map config option. For example, to map databases that use the athena engine to the underlying datasets in the glue platform, use the following snippet:
engine_platform_map:
  athena: glue
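The following sketches the lookup order for engine_platform_map. It assumes a configured entry always wins and that an unmapped engine keeps its own name as the platform. The connector may also have built-in defaults for some engines that are not reproduced here, so the fallback is a simplification. The function name is hypothetical.

```python
from typing import Dict, Optional


def platform_for_engine(engine: str, engine_platform_map: Optional[Dict[str, str]] = None) -> str:
    """Pick the DataHub platform for a Metabase database engine."""
    if engine_platform_map and engine in engine_platform_map:
        # An explicit recipe entry, e.g. athena -> glue, takes priority.
        return engine_platform_map[engine]
    # Simplifying assumption: otherwise reuse the engine name as the platform.
    return engine


overrides = {"athena": "glue"}
print(platform_for_engine("athena", overrides))
print(platform_for_engine("postgres", overrides))
```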
DataHub tries to determine the database name from the Metabase api/database payload. For a given database connected to Metabase, the name can be overridden with database_alias_map.
If DataHub contains several platform instances of the same platform (e.g. from several distinct ClickHouse clusters), the mapping between Metabase database IDs and DataHub platform instances can be configured with the following map:
database_id_to_instance_map:
  "42": platform_instance_in_datahub
The keys in this map must be strings, not integers, even though the Metabase API returns database IDs as numbers.
If database_id_to_instance_map is not specified, platform_instance_map is used for platform instance mapping. If neither is specified, no platform instance is used when constructing dataset URNs while searching for dataset relations.
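The lookup order above can be sketched as follows. The URN string follows DataHub's dataset URN format, with the platform instance prefixed to the dataset name.

One assumption to flag: when database_id_to_instance_map is set but has no entry for a database, this sketch returns no instance rather than falling back to platform_instance_map. That is one reading of the text above. The function names are hypothetical.

```python
from typing import Dict, Optional


def resolve_platform_instance(
    database_id: int,
    platform: str,
    database_id_to_instance_map: Optional[Dict[str, str]] = None,
    platform_instance_map: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Choose the platform instance for a Metabase database."""
    if database_id_to_instance_map is not None:
        # Map keys are strings, but Metabase returns the id as a number.
        return database_id_to_instance_map.get(str(database_id))
    if platform_instance_map is not None:
        return platform_instance_map.get(platform)
    return None  # No instance is used in the URN.


def dataset_urn(platform: str, name: str, env: str = "PROD", instance: Optional[str] = None) -> str:
    """Dataset URN, with the platform instance (if any) prefixed to the name."""
    full_name = f"{instance}.{name}" if instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{full_name},{env})"


instance = resolve_platform_instance(42, "clickhouse", {"42": "cluster_a"})
print(dataset_urn("clickhouse", "analytics.events", instance=instance))
```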
If needed, collections belonging to other users can be excluded with the following configuration:
exclude_other_user_collections: true
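The following sketches how this flag could reach Metabase. It assumes the flag is forwarded as the exclude-other-user-collections query parameter of GET /api/collection, which Metabase accepts for this purpose. The function name is hypothetical.

```python
import json
from urllib.parse import urlencode


def filtered_collections_url(connect_uri: str, exclude_other_user_collections: bool = False) -> str:
    """GET /api/collection URL carrying the exclusion flag."""
    # json.dumps renders Python booleans as lowercase "true"/"false".
    query = urlencode({"exclude-other-user-collections": json.dumps(exclude_other_user_collections)})
    return f"{connect_uri.rstrip('/')}/api/collection?{query}"


print(filtered_collections_url("http://localhost:3000", True))
```

Passing the flag to the server means personal collections of other users never leave Metabase, instead of being downloaded and discarded client-side.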
Compatibility
Metabase version v0.48.3
Code Coordinates
- Class Name: datahub.ingestion.source.metabase.MetabaseSource
Questions
If you've got any questions on configuring ingestion for Metabase, feel free to ping us on our Slack.