Version: Next

Snowflake

Snowflake Ingestion through the UI

The following video shows you how to ingest Snowflake metadata through the UI.

Read on if you are interested in ingesting Snowflake metadata using the datahub cli, or want to learn about all the configuration parameters that are supported by the connectors.

Important Capabilities

Capability	Status	Notes
Asset Containers	✅	Enabled by default. Supported for types - Database, Schema.
Classification	✅	Optionally enabled via `classification.enabled`.
Column-level Lineage	✅	Enabled by default, can be disabled via configuration `include_column_lineage`.
Data Profiling	✅	Optionally enabled via configuration `profiling.enabled`.
Dataset Usage	✅	Enabled by default, can be disabled via configuration `include_usage_stats`.
Descriptions	✅	Enabled by default.
Detect Deleted Entities	✅	Enabled by default via stateful ingestion.
Domains	✅	Supported via the `domain` config field.
Extract Tags	✅	Optionally enabled via `extract_tags`.
Platform Instance	✅	Enabled by default.
Schema Metadata	✅	Enabled by default.
Table-Level Lineage	✅	Enabled by default, can be disabled via configuration `include_table_lineage`.
Test Connection	✅	Enabled by default.

Prerequisites

In order to execute this source, your Snowflake user will need to have specific privileges granted to it for reading metadata from your warehouse.

Snowflake system admin can follow this guide to create a DataHub-specific role, assign it the required privileges, and assign it to a new DataHub user by executing the following Snowflake commands from a user with the ACCOUNTADMIN role or MANAGE GRANTS privilege.

create or replace role datahub_role;

// Grant access to a warehouse to run queries to view metadata
grant operate, usage on warehouse "<your-warehouse>" to role datahub_role;

// Grant access to view database and schema in which your tables/views/dynamic tables exist
grant usage on DATABASE "<your-database>" to role datahub_role;
grant usage on all schemas in database "<your-database>" to role datahub_role;
grant usage on future schemas in database "<your-database>" to role datahub_role;
grant select on all streams in database "<your-database>" to role datahub_role;
grant select on future streams in database "<your-database>" to role datahub_role;

// If you are NOT using Snowflake Profiling or Classification feature: Grant references privileges to your tables and views
grant references on all tables in database "<your-database>" to role datahub_role;
grant references on future tables in database "<your-database>" to role datahub_role;
grant references on all external tables in database "<your-database>" to role datahub_role;
grant references on future external tables in database "<your-database>" to role datahub_role;
grant references on all views in database "<your-database>" to role datahub_role;
grant references on future views in database "<your-database>" to role datahub_role;

// Grant monitor privileges for dynamic tables (Enterprise Edition feature)
grant monitor on all dynamic tables in database "<your-database>" to role datahub_role;
grant monitor on future dynamic tables in database "<your-database>" to role datahub_role;

// If you ARE using Snowflake Profiling or Classification feature: Grant select privileges to your tables
grant select on all tables in database "<your-database>" to role datahub_role;
grant select on future tables in database "<your-database>" to role datahub_role;
grant select on all external tables in database "<your-database>" to role datahub_role;
grant select on future external tables in database "<your-database>" to role datahub_role;
grant select on all dynamic tables in database "<your-database>" to role datahub_role;
grant select on future dynamic tables in database "<your-database>" to role datahub_role;

// Create a new DataHub user and assign the DataHub role to it
create user datahub_user display_name = 'DataHub' password='' default_role = datahub_role default_warehouse = '<your-warehouse>';

// Grant the datahub_role to the new DataHub user.
grant role datahub_role to user datahub_user;

// Optional - required if extracting lineage, usage or tags (without lineage)
grant imported privileges on database snowflake to role datahub_role;

The details of each granted privilege can be viewed in snowflake docs. A summarization of each privilege, and why it is required for this connector:

operate is required only to start the warehouse. If the warehouse is already running during ingestion or has auto-resume enabled, this permission is not required.
usage is required for us to run queries using the warehouse
usage on database and schema are required because without it tables, views, and streams inside them are not accessible. If an admin does the required grants on table but misses the grants on schema or the database in which the table/view/stream exists then we will not be able to get metadata for the table/view/stream.
If metadata is required only on some schemas then you can grant the usage privileges only on a particular schema like

grant usage on schema "<your-database>"."<your-schema>" to role datahub_role;

select on streams is required in order for stream definitions to be available. This does not allow selecting of the data (not required) unless the underlying dataset has select access as well.

grant usage on schema "<your-database>"."<your-schema>" to role datahub_role;

This represents the bare minimum privileges required to extract databases, schemas, views, tables from Snowflake.

If you plan to enable extraction of table lineage, via the include_table_lineage config flag, extraction of usage statistics, via the include_usage_stats config, or extraction of tags (without lineage), via the extract_tags config, you'll also need to grant access to the Account Usage system tables, using which the DataHub source extracts information. This can be done by granting access to the snowflake database.

grant imported privileges on database snowflake to role datahub_role;

Authentication

Authentication is most simply done via a Snowflake user and password.

Alternatively, other authentication methods are supported via the authentication_type config option.

Key Pair Authentication

To set up Key Pair authentication, follow the three steps in this guide

Generate the private key
Generate the public key
Assign the public key to datahub user to be configured in recipe.

Pass in the following values in recipe config instead of password, ensuring the private key maintains proper PEM format with line breaks at the beginning, end, and approximately every 64 characters within the key:

authentication_type: KEY_PAIR_AUTHENTICATOR
private_key: <Private key in a form of '-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----'>

# Optional - if using encrypted private key
private_key_password: <Password for your private key>

Okta OAuth

To set up Okta OAuth authentication, roughly follow the four steps in this guide.

Pass in the following values, as described in the article, for your recipe's oauth_config:

provider: okta
client_id: <OAUTH_CLIENT_ID>
client_secret: <OAUTH_CLIENT_SECRET>
authority_url: <OKTA_OAUTH_TOKEN_ENDPOINT>
scopes: The list of your Okta scopes, i.e. with the session:role: prefix

Datahub only supports two OAuth grant types: client_credentials and password. The steps slightly differ based on which you decide to use.

Client Credentials Grant Type (Simpler)

When creating an Okta App Integration, choose type API Services
- Ensure client authentication method is Client secret
- Note your Client ID
Create a Snowflake user to correspond to your newly created Okta client credentials
- Ensure the user's Login Name matches your Okta application's Client ID
- Ensure the user has been granted your datahub role

Password Grant Type

When creating an Okta App Integration, choose type OIDC -> Native Application
- Add Grant Type Resource Owner Password
- Ensure client authentication method is Client secret
Create an Okta user to sign into, noting the Username and Password
Create a Snowflake user to correspond to your newly created Okta client credentials
- Ensure the user's Login Name matches your Okta user's Username (likely an email)
- Ensure the user has been granted your datahub role
When running ingestion, provide the required oauth_config fields, including client_id and client_secret, plus your Okta user's Username and Password
- Note: the username and password config options are not nested under oauth_config

Snowflake Shares

If you are using Snowflake Shares to share data across different snowflake accounts, and you have set up DataHub recipes for ingesting metadata from all these accounts, you may end up having multiple similar dataset entities corresponding to virtual versions of same table in different snowflake accounts. DataHub Snowflake connector can automatically link such tables together through Siblings and Lineage relationship if user provides information necessary to establish the relationship using configuration shares in recipe.

Example

Snowflake account account1 (ingested as platform_instance instance1) owns a database db1. A share X is created in account1 that includes database db1 along with schemas and tables inside it.
Now, X is shared with snowflake account account2 (ingested as platform_instance instance2). A database db1_from_X is created from inbound share X in account2. In this case, all tables and views included in share X will also be present in instance2.db1_from_X.

This can be represented in shares configuration section as

shares:
  X: # name of the share
    database: db1
    platform_instance: instance1
    consumers: # list of all databases created from share X
      - database: db1_from_X
        platform_instance: instance2

If share X is shared with more snowflake accounts and database is created from share X in those account then additional entries need to be added in consumers list for share X, one per snowflake account. The same shares config can then be copied across recipes of all accounts.

Caveats

Some of the features are only available in the Snowflake Enterprise Edition. This includes dynamic tables, advanced lineage features, and tags. This doc has notes mentioning where this applies.
Dynamic tables require the monitor privilege for metadata extraction. Without this privilege, dynamic tables will not be visible to DataHub.
The underlying Snowflake views that we use to get metadata have a latency of 45 minutes to 3 hours. So we would not be able to get very recent metadata in some cases like queries you ran within that time period etc. This is applicable particularly for lineage, usage and tags (without lineage) extraction.
If there is any incident going on for Snowflake we will not be able to get the metadata until that incident is resolved.

CLI based Ingestion

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: snowflake
  config:
    # This option is recommended to be used to ingest all lineage on the first run.
    ignore_start_time_lineage: true

    # Coordinates
    account_id: "abc48144"
    warehouse: "COMPUTE_WH"

    # Credentials
    username: "${SNOWFLAKE_USER}"
    password: "${SNOWFLAKE_PASS}"
    role: "datahub_role"

    # (Optional) Uncomment and update this section to filter ingested datasets 
    # database_pattern:
    #   allow:
    #     - "^ACCOUNTING_DB$"
    #     - "^MARKETING_DB$"

    profiling:
      # Change to false to disable profiling
      enabled: true
      # This option is recommended to reduce profiling time and costs.
      turn_off_expensive_profiling_metrics: true

    # (Optional) Uncomment and update this section to filter profiled tables
    # profile_pattern:
    #   allow:
    #   - "ACCOUNTING_DB.*.*"
    #   - "MARKETING_DB.*.*"

# Default sink is datahub-rest and doesn't need to be configured
# See https://docs.datahub.com/docs/metadata-ingestion/sink_docs/datahub for customization options

Config Details

Options
Schema

Note that a . is used to denote nested fields in the YAML recipe.

Field	Description
account_id ✅ string	Snowflake account identifier. e.g. xy12345, xy12345.us-east-2.aws, xy12345.us-central1.gcp, xy12345.central-us.azure, xy12345.us-west-2.privatelink. Refer Account Identifiers for more details.
apply_view_usage_to_tables boolean	Whether to apply view's usage to its base tables. If set to True, usage is applied to base tables only. Default: False
authentication_type string	The type of authenticator to use when connecting to Snowflake. Supports "DEFAULT_AUTHENTICATOR", "OAUTH_AUTHENTICATOR", "EXTERNAL_BROWSER_AUTHENTICATOR" and "KEY_PAIR_AUTHENTICATOR". Default: DEFAULT_AUTHENTICATOR
bucket_duration Enum	One of: "DAY", "HOUR"
connect_args One of object, null	Connect args to pass to Snowflake SqlAlchemy driver Default: None
convert_urns_to_lowercase boolean	Whether to convert dataset urns to lowercase. Default: True
email_domain One of string, null	Email domain of your organization so users can be displayed on UI appropriately. This is used only if we cannot infer email ID. Default: None
enable_stateful_lineage_ingestion boolean	Enable stateful lineage ingestion. This will store lineage window timestamps after successful lineage ingestion. and will not run lineage ingestion for same timestamps in subsequent run. Default: True
enable_stateful_profiling boolean	Enable stateful profiling. This will store profiling timestamps per dataset after successful profiling. and will not run profiling again in subsequent run if table has not been updated. Default: True
enable_stateful_usage_ingestion boolean	Enable stateful lineage ingestion. This will store usage window timestamps after successful usage ingestion. and will not run usage ingestion for same timestamps in subsequent run. Default: True
end_time string(date-time)	Latest date of lineage/usage to consider. Default: Current time in UTC
extract_tags Enum	One of: "with_lineage", "without_lineage", "skip"
extract_tags_as_structured_properties boolean	If enabled along with `extract_tags`, extracts snowflake's key-value tags as DataHub structured properties instead of DataHub tags. Default: False
fetch_views_from_information_schema boolean	If enabled, uses information_schema.views to fetch view definitions instead of SHOW VIEWS command. This alternative method can be more reliable for databases with large numbers of views (> 10K views), as the SHOW VIEWS approach has proven unreliable and can lead to missing views in such scenarios. However, this method requires OWNERSHIP privileges on views to retrieve their definitions. For views without ownership permissions (where VIEW_DEFINITION is null/empty), the system will automatically fall back to using batched SHOW VIEWS queries to populate the missing definitions. Default: False
format_sql_queries boolean	Whether to format sql queries Default: False
ignore_start_time_lineage boolean	Default: False
include_assertion_results boolean	Whether to ingest assertion run results for assertions created using Datahub assertions CLI in snowflake Default: False
include_column_lineage boolean	Populates table->table and view->table column lineage. Requires appropriate grants given to the role and the Snowflake Enterprise Edition or above. Default: True
include_external_url boolean	Whether to populate Snowsight url for Snowflake Objects Default: True
include_foreign_keys boolean	If enabled, populates the snowflake foreign keys. Default: True
include_operational_stats boolean	Whether to display operational stats. Default: True
include_primary_keys boolean	If enabled, populates the snowflake primary keys. Default: True
include_procedures boolean	If enabled, procedures will be ingested as pipelines/tasks. Default: True
include_queries boolean	If enabled, generate query entities associated with lineage edges. Only applicable if `use_queries_v2` is enabled. Default: True
include_query_usage_statistics boolean	If enabled, generate query popularity statistics. Only applicable if `use_queries_v2` is enabled. Default: True
include_read_operational_stats boolean	Whether to report read operational stats. Experimental. Default: False
include_streams boolean	If enabled, streams will be ingested as separate entities from tables/views. Default: True
include_table_lineage boolean	If enabled, populates the snowflake table-to-table and s3-to-snowflake table lineage. Requires appropriate grants given to the role and Snowflake Enterprise Edition or above. Default: True
include_table_location_lineage boolean	If the source supports it, include table lineage to the underlying storage location. Default: True
include_tables boolean	Whether tables should be ingested. Default: True
include_technical_schema boolean	If enabled, populates the snowflake technical schema and descriptions. Default: True
include_top_n_queries boolean	Whether to ingest the top_n_queries. Default: True
include_usage_stats boolean	If enabled, populates the snowflake usage statistics. Requires appropriate grants given to the role. Default: True
include_view_column_lineage boolean	Populates column-level lineage for view->view and table->view lineage using DataHub's sql parser. Requires `include_view_lineage` to be enabled. Default: True
include_view_definitions boolean	If enabled, populates the ingested views' definitions. Default: True
include_view_lineage boolean	Populates view->view and table->view lineage using DataHub's sql parser. Default: True
include_views boolean	Whether views should be ingested. Default: True
incremental_lineage boolean	When enabled, emits lineage as incremental to existing lineage already in DataHub. When disabled, re-states lineage on each run. Default: False
incremental_properties boolean	When enabled, emits dataset properties as incremental to existing dataset properties in DataHub. When disabled, re-states dataset properties on each run. Default: False
known_snowflake_edition One of Enum, null	Explicitly specify the Snowflake edition (STANDARD or ENTERPRISE). If unset, the edition will be inferred automatically using 'SHOW TAGS'. Default: None
lazy_schema_resolver boolean	If enabled, uses lazy schema resolver to resolve schemas for tables and views. This is useful if you have a large number of schemas and want to avoid bulk fetching the schema for each table/view. Default: True
match_fully_qualified_names boolean	Whether `schema_pattern` is matched against fully qualified schema name `<catalog>.<schema>`. Default: False
options object	Any options specified here will be passed to SQLAlchemy.create_engine as kwargs.
password One of string(password), null	Snowflake password. Default: None
platform_instance One of string, null	The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details. Default: None
private_key One of string, null	Private key in a form of '-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n' if using key pair authentication. Encrypted version of private key will be in a form of '-----BEGIN ENCRYPTED PRIVATE KEY-----\nencrypted-private-key\n-----END ENCRYPTED PRIVATE KEY-----\n' See: https://docs.snowflake.com/en/user-guide/key-pair-auth.html Default: None
private_key_password One of string(password), null	Password for your private key. Required if using key pair authentication with encrypted private key. Default: None
private_key_path One of string, null	The path to the private key if using key pair authentication. Ignored if `private_key` is set. See: https://docs.snowflake.com/en/user-guide/key-pair-auth.html Default: None
push_down_database_pattern_access_history boolean	If enabled, pushes down database pattern filtering to the access_history table for improved performance. This filters on the accessed objects in access_history. Default: False
query_dedup_strategy Enum	One of: "STANDARD", "NONE"
role One of string, null	Snowflake role. Default: None
snowflake_domain string	Snowflake domain. Use 'snowflakecomputing.com' for most regions or 'snowflakecomputing.cn' for China (cn-northwest-1) region. Default: snowflakecomputing.com
start_time string(date-time)	Earliest date of lineage/usage to consider. Default: Last full day in UTC (or hour, depending on `bucket_duration`). You can also specify relative time with respect to end_time such as '-7 days' Or '-7d'. Default: None
token One of string, null	OAuth token from external identity provider. Not recommended for most use cases because it will not be able to refresh once expired. Default: None
top_n_queries integer	Number of top queries to save to each table. Default: 10
upstream_lineage_in_report boolean	Default: False
use_file_backed_cache boolean	Whether to use a file backed cache for the view definitions. Default: True
use_queries_v2 boolean	If enabled, uses the new queries extractor to extract queries from snowflake. Default: True
username One of string, null	Snowflake username. Default: None
validate_upstreams_against_patterns boolean	Whether to validate upstream snowflake tables against allow-deny patterns Default: True
warehouse One of string, null	Snowflake warehouse. Default: None
env string	The environment that all assets produced by this connector belong to Default: PROD
additional_database_names_allowlist array	Additional database names (no pattern matching) to be included in the accesshistory filter. Only applies if push_down_database_pattern_access_history=True. These databases will be included in the filter being pushed down regardless of database_pattern settings.This may be required in the case of _eg temporary tables being created in a different database than the ones in the database_name patterns. Default: []
additional_database_names_allowlist.string string
database_pattern AllowDenyPattern	A class to store allow deny regexes
database_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
domain map(str,AllowDenyPattern)	A class to store allow deny regexes
domain.`key`.allow array	List of regex patterns to include in ingestion Default: ['.*']
domain.`key`.allow.string string
domain.`key`.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
domain.`key`.deny array	List of regex patterns to exclude from ingestion. Default: []
domain.`key`.deny.string string
oauth_config One of OAuthConfiguration, null	oauth configuration - https://docs.snowflake.com/en/user-guide/python-connector-example.html#connecting-with-oauth Default: None
oauth_config.authority_url ❓ string	Authority url of your identity provider
oauth_config.client_id ❓ string	client id of your registered application
oauth_config.provider ❓ Enum	One of: "microsoft", "okta"
oauth_config.scopes ❓ array	scopes required to connect to snowflake
oauth_config.scopes.string string
oauth_config.client_secret One of string(password), null	client secret of the application if use_certificate = false Default: None
oauth_config.encoded_oauth_private_key One of string, null	base64 encoded private key content if use_certificate = true Default: None
oauth_config.encoded_oauth_public_key One of string, null	base64 encoded certificate content if use_certificate = true Default: None
oauth_config.use_certificate boolean	Do you want to use certificate and private key to authenticate using oauth Default: False
procedure_pattern AllowDenyPattern	A class to store allow deny regexes
procedure_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
profile_pattern AllowDenyPattern	A class to store allow deny regexes
profile_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
pushdown_allow_usernames array	List of snowflake usernames (SQL LIKE patterns, e.g., 'ANALYST_%', '%_USER', 'MAIN_ACCOUNT') which WILL be considered for lineage/usage/queries extraction. This is primarily useful for improving performance by filtering in only specific users. Only applicable if `use_queries_v2` is enabled. If not specified, all users not in deny list are included. Default: []
pushdown_allow_usernames.string string
pushdown_deny_usernames array	List of snowflake usernames (SQL LIKE patterns, e.g., 'SERVICE_%', '%_PROD', 'TEST_USER') which will NOT be considered for lineage/usage/queries extraction. This is primarily useful for improving performance by filtering out users with extremely high query volumes. Only applicable if `use_queries_v2` is enabled. Default: []
pushdown_deny_usernames.string string
schema_pattern AllowDenyPattern	A class to store allow deny regexes
schema_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
shares One of SnowflakeShareConfig, null	Required if current account owns or consumes snowflake share.If specified, connector creates lineage and siblings relationship between current account's database tables and consumer/producer account's database tables. Map of share name -> details of share. Default: None
shares.`key`.database ❓ string	Database from which share is created.
shares.`key`.consumers ❓ array	List of databases created in consumer accounts.
shares.`key`.consumers.DatabaseId DatabaseId
shares.`key`.consumers.DatabaseId.database ❓ string
shares.`key`.consumers.DatabaseId.platform_instance One of string, null	Default: None
shares.`key`.platform_instance One of string, null	Platform instance for snowflake account in which share is created. Default: None
stream_pattern AllowDenyPattern	A class to store allow deny regexes
stream_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
structured_property_pattern AllowDenyPattern	A class to store allow deny regexes
structured_property_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
table_pattern AllowDenyPattern	A class to store allow deny regexes
table_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
tag_pattern AllowDenyPattern	A class to store allow deny regexes
tag_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
temporary_tables_pattern array	[Advanced] Regex patterns for temporary tables to filter in lineage ingestion. Specify regex to match the entire table name in database.schema.table format. Defaults are to set in such a way to ignore the temporary staging tables created by known ETL tools. Default: ['.\.FIVETRAN_._STAGING\..', '.__DBT_TMP$', ...
temporary_tables_pattern.string string
user_email_pattern AllowDenyPattern	A class to store allow deny regexes
user_email_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
view_pattern AllowDenyPattern	A class to store allow deny regexes
view_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
classification ClassificationConfig
classification.enabled boolean	Whether classification should be used to auto-detect glossary terms Default: False
classification.info_type_to_term map(str,string)
classification.max_workers integer	Number of worker processes to use for classification. Set to 1 to disable. Default: 4
classification.sample_size integer	Number of sample values used for classification. Default: 100
classification.classifiers array	Classifiers to use to auto-detect glossary terms. If more than one classifier, infotype predictions from the classifier defined later in sequence take precedance. Default: [{'type': 'datahub', 'config': None}]
classification.classifiers.DynamicTypedClassifierConfig DynamicTypedClassifierConfig
classification.classifiers.DynamicTypedClassifierConfig.type ❓ string	The type of the classifier to use. For DataHub, use `datahub`
classification.classifiers.DynamicTypedClassifierConfig.config One of object, null	The configuration required for initializing the classifier. If not specified, uses defaults for classifer type. Default: None
classification.column_pattern AllowDenyPattern	A class to store allow deny regexes
classification.column_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
classification.table_pattern AllowDenyPattern	A class to store allow deny regexes
classification.table_pattern.ignoreCase One of boolean, null	Whether to ignore case sensitivity during pattern matching. Default: True
profiling GEProfilingConfig
profiling.catch_exceptions boolean	Default: True
profiling.enabled boolean	Whether profiling should be done. Default: False
profiling.field_sample_values_limit integer	Upper limit for number of sample values to collect for all columns. Default: 20
profiling.include_field_distinct_count boolean	Whether to profile for the number of distinct values for each column. Default: True
profiling.include_field_distinct_value_frequencies boolean	Whether to profile for distinct value frequencies. Default: False
profiling.include_field_histogram boolean	Whether to profile for the histogram for numeric fields. Default: False
profiling.include_field_max_value boolean	Whether to profile for the max value of numeric columns. Default: True
profiling.include_field_mean_value boolean	Whether to profile for the mean value of numeric columns. Default: True
profiling.include_field_median_value boolean	Whether to profile for the median value of numeric columns. Default: True
profiling.include_field_min_value boolean	Whether to profile for the min value of numeric columns. Default: True
profiling.include_field_null_count boolean	Whether to profile for the number of nulls for each column. Default: True
profiling.include_field_quantiles boolean	Whether to profile for the quantiles of numeric columns. Default: False
profiling.include_field_sample_values boolean	Whether to profile for the sample values for all columns. Default: True
profiling.include_field_stddev_value boolean	Whether to profile for the standard deviation of numeric columns. Default: True
profiling.limit One of integer, null	Max number of documents to profile. By default, profiles all documents. Default: None
profiling.max_number_of_fields_to_profile One of integer, null	A positive integer that specifies the maximum number of columns to profile for any table. `None` implies all columns. The cost of profiling goes up significantly as the number of columns to profile goes up. Default: None
profiling.max_workers integer	Number of worker threads to use for profiling. Set to 1 to disable. Default: 20
profiling.offset One of integer, null	Offset in documents to profile. By default, uses no offset. Default: None
profiling.partition_datetime One of string(date-time), null	If specified, profile only the partition which matches this datetime. If not specified, profile the latest partition. Only Bigquery supports this. Default: None
profiling.partition_profiling_enabled boolean	Whether to profile partitioned tables. Only BigQuery and Aws Athena supports this. If enabled, latest partition data is used for profiling. Default: True
profiling.profile_external_tables boolean	Whether to profile external tables. Only Snowflake and Redshift supports this. Default: False
profiling.profile_if_updated_since_days One of number, null	Profile table only if it has been updated since these many number of days. If set to `null`, no constraint of last modified time for tables to profile. Supported only in `snowflake` and `BigQuery`. Default: None
profiling.profile_nested_fields boolean	Whether to profile complex types like structs, arrays and maps. Default: False
profiling.profile_table_level_only boolean	Whether to perform profiling at table-level only, or include column-level profiling as well. Default: False
profiling.profile_table_row_count_estimate_only boolean	Use an approximate query for row count. This will be much faster but slightly less accurate. Only supported for Postgres and MySQL. Default: False
profiling.profile_table_row_limit One of integer, null	Profile tables only if their row count is less than specified count. If set to `null`, no limit on the row count of tables to profile. Supported only in `Snowflake`, `BigQuery`. Supported for `Oracle` based on gathered stats. Default: 5000000
profiling.profile_table_size_limit One of integer, null	Profile tables only if their size is less than specified GBs. If set to `null`, no limit on the size of tables to profile. Supported only in `Snowflake`, `BigQuery` and `Databricks`. Supported for `Oracle` based on calculated size from gathered stats. Default: 5
profiling.query_combiner_enabled boolean	This feature is still experimental and can be disabled if it causes issues. Reduces the total number of queries issued and speeds up profiling by dynamically combining SQL queries where possible. Default: True
profiling.report_dropped_profiles boolean	Whether to report datasets or dataset columns which were not profiled. Set to `True` for debugging purposes. Default: False
profiling.sample_size integer	Number of rows to be sampled from table for column level profiling.Applicable only if `use_sampling` is set to True. Default: 10000
profiling.turn_off_expensive_profiling_metrics boolean	Whether to turn off expensive profiling or not. This turns off profiling for quantiles, distinct_value_frequencies, histogram & sample_values. This also limits maximum number of fields being profiled to 10. Default: False
profiling.use_sampling boolean	Whether to profile column level stats on sample of table. Only BigQuery and Snowflake support this. If enabled, profiling is done on rows sampled from table. Sampling is not done for smaller tables. Default: True
profiling.operation_config OperationConfig
profiling.operation_config.lower_freq_profile_enabled boolean	Whether to do profiling at lower freq or not. This does not do any scheduling just adds additional checks to when not to run profiling. Default: False
profiling.operation_config.profile_date_of_month One of integer, null	Number between 1 to 31 for date of month (both inclusive). If not specified, defaults to Nothing and this field does not take affect. Default: None
profiling.operation_config.profile_day_of_week One of integer, null	Number between 0 to 6 for day of week (both inclusive). 0 is Monday and 6 is Sunday. If not specified, defaults to Nothing and this field does not take affect. Default: None
profiling.tags_to_ignore_sampling One of array, null	Fixed list of tags to ignore sampling. If not specified, tables will be sampled based on `use_sampling`. Default: None
profiling.tags_to_ignore_sampling.string string
stateful_ingestion One of StatefulStaleMetadataRemovalConfig, null	Default: None
stateful_ingestion.enabled boolean	Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False Default: False
stateful_ingestion.fail_safe_threshold number	Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'. Default: 75.0
stateful_ingestion.remove_stale_metadata boolean	Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True

The JSONSchema for this configuration is inlined below.

{
  "$defs": {
    "AllowDenyPattern": {
      "additionalProperties": false,
      "description": "A class to store allow deny regexes",
      "properties": {
        "allow": {
          "default": [
            ".*"
          ],
          "description": "List of regex patterns to include in ingestion",
          "items": {
            "type": "string"
          },
          "title": "Allow",
          "type": "array"
        },
        "deny": {
          "default": [],
          "description": "List of regex patterns to exclude from ingestion.",
          "items": {
            "type": "string"
          },
          "title": "Deny",
          "type": "array"
        },
        "ignoreCase": {
          "anyOf": [
            {
              "type": "boolean"
            },
            {
              "type": "null"
            }
          ],
          "default": true,
          "description": "Whether to ignore case sensitivity during pattern matching.",
          "title": "Ignorecase"
        }
      },
      "title": "AllowDenyPattern",
      "type": "object"
    },
    "BucketDuration": {
      "enum": [
        "DAY",
        "HOUR"
      ],
      "title": "BucketDuration",
      "type": "string"
    },
    "ClassificationConfig": {
      "additionalProperties": false,
      "properties": {
        "enabled": {
          "default": false,
          "description": "Whether classification should be used to auto-detect glossary terms",
          "title": "Enabled",
          "type": "boolean"
        },
        "sample_size": {
          "default": 100,
          "description": "Number of sample values used for classification.",
          "title": "Sample Size",
          "type": "integer"
        },
        "max_workers": {
          "default": 4,
          "description": "Number of worker processes to use for classification. Set to 1 to disable.",
          "title": "Max Workers",
          "type": "integer"
        },
        "table_pattern": {
          "$ref": "#/$defs/AllowDenyPattern",
          "default": {
            "allow": [
              ".*"
            ],
            "deny": [],
            "ignoreCase": true
          },
          "description": "Regex patterns to filter tables for classification. This is used in combination with other patterns in parent config. Specify regex to match the entire table name in `database.schema.table` format. e.g. to match all tables starting with customer in Customer database and public schema, use the regex 'Customer.public.customer.*'"
        },
        "column_pattern": {
          "$ref": "#/$defs/AllowDenyPattern",
          "default": {
            "allow": [
              ".*"
            ],
            "deny": [],
            "ignoreCase": true
          },
          "description": "Regex patterns to filter columns for classification. This is used in combination with other patterns in parent config. Specify regex to match the column name in `database.schema.table.column` format."
        },
        "info_type_to_term": {
          "additionalProperties": {
            "type": "string"
          },
          "default": {},
          "description": "Optional mapping to provide glossary term identifier for info type",
          "title": "Info Type To Term",
          "type": "object"
        },
        "classifiers": {
          "default": [
            {
              "type": "datahub",
              "config": null
            }
          ],
          "description": "Classifiers to use to auto-detect glossary terms. If more than one classifier, infotype predictions from the classifier defined later in sequence take precedance.",
          "items": {
            "$ref": "#/$defs/DynamicTypedClassifierConfig"
          },
          "title": "Classifiers",
          "type": "array"
        }
      },
      "title": "ClassificationConfig",
      "type": "object"
    },
    "DatabaseId": {
      "properties": {
        "database": {
          "title": "Database",
          "type": "string"
        },
        "platform_instance": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Platform Instance"
        }
      },
      "required": [
        "database"
      ],
      "title": "DatabaseId",
      "type": "object"
    },
    "DynamicTypedClassifierConfig": {
      "additionalProperties": false,
      "properties": {
        "type": {
          "description": "The type of the classifier to use. For DataHub,  use `datahub`",
          "title": "Type",
          "type": "string"
        },
        "config": {
          "anyOf": [
            {},
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "The configuration required for initializing the classifier. If not specified, uses defaults for classifer type.",
          "title": "Config"
        }
      },
      "required": [
        "type"
      ],
      "title": "DynamicTypedClassifierConfig",
      "type": "object"
    },
    "GEProfilingConfig": {
      "additionalProperties": false,
      "properties": {
        "enabled": {
          "default": false,
          "description": "Whether profiling should be done.",
          "title": "Enabled",
          "type": "boolean"
        },
        "operation_config": {
          "$ref": "#/$defs/OperationConfig",
          "description": "Experimental feature. To specify operation configs."
        },
        "limit": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Max number of documents to profile. By default, profiles all documents.",
          "title": "Limit"
        },
        "offset": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Offset in documents to profile. By default, uses no offset.",
          "title": "Offset"
        },
        "profile_table_level_only": {
          "default": false,
          "description": "Whether to perform profiling at table-level only, or include column-level profiling as well.",
          "title": "Profile Table Level Only",
          "type": "boolean"
        },
        "include_field_null_count": {
          "default": true,
          "description": "Whether to profile for the number of nulls for each column.",
          "title": "Include Field Null Count",
          "type": "boolean"
        },
        "include_field_distinct_count": {
          "default": true,
          "description": "Whether to profile for the number of distinct values for each column.",
          "title": "Include Field Distinct Count",
          "type": "boolean"
        },
        "include_field_min_value": {
          "default": true,
          "description": "Whether to profile for the min value of numeric columns.",
          "title": "Include Field Min Value",
          "type": "boolean"
        },
        "include_field_max_value": {
          "default": true,
          "description": "Whether to profile for the max value of numeric columns.",
          "title": "Include Field Max Value",
          "type": "boolean"
        },
        "include_field_mean_value": {
          "default": true,
          "description": "Whether to profile for the mean value of numeric columns.",
          "title": "Include Field Mean Value",
          "type": "boolean"
        },
        "include_field_median_value": {
          "default": true,
          "description": "Whether to profile for the median value of numeric columns.",
          "title": "Include Field Median Value",
          "type": "boolean"
        },
        "include_field_stddev_value": {
          "default": true,
          "description": "Whether to profile for the standard deviation of numeric columns.",
          "title": "Include Field Stddev Value",
          "type": "boolean"
        },
        "include_field_quantiles": {
          "default": false,
          "description": "Whether to profile for the quantiles of numeric columns.",
          "title": "Include Field Quantiles",
          "type": "boolean"
        },
        "include_field_distinct_value_frequencies": {
          "default": false,
          "description": "Whether to profile for distinct value frequencies.",
          "title": "Include Field Distinct Value Frequencies",
          "type": "boolean"
        },
        "include_field_histogram": {
          "default": false,
          "description": "Whether to profile for the histogram for numeric fields.",
          "title": "Include Field Histogram",
          "type": "boolean"
        },
        "include_field_sample_values": {
          "default": true,
          "description": "Whether to profile for the sample values for all columns.",
          "title": "Include Field Sample Values",
          "type": "boolean"
        },
        "max_workers": {
          "default": 20,
          "description": "Number of worker threads to use for profiling. Set to 1 to disable.",
          "title": "Max Workers",
          "type": "integer"
        },
        "report_dropped_profiles": {
          "default": false,
          "description": "Whether to report datasets or dataset columns which were not profiled. Set to `True` for debugging purposes.",
          "title": "Report Dropped Profiles",
          "type": "boolean"
        },
        "turn_off_expensive_profiling_metrics": {
          "default": false,
          "description": "Whether to turn off expensive profiling or not. This turns off profiling for quantiles, distinct_value_frequencies, histogram & sample_values. This also limits maximum number of fields being profiled to 10.",
          "title": "Turn Off Expensive Profiling Metrics",
          "type": "boolean"
        },
        "field_sample_values_limit": {
          "default": 20,
          "description": "Upper limit for number of sample values to collect for all columns.",
          "title": "Field Sample Values Limit",
          "type": "integer"
        },
        "max_number_of_fields_to_profile": {
          "anyOf": [
            {
              "exclusiveMinimum": 0,
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "A positive integer that specifies the maximum number of columns to profile for any table. `None` implies all columns. The cost of profiling goes up significantly as the number of columns to profile goes up.",
          "title": "Max Number Of Fields To Profile"
        },
        "profile_if_updated_since_days": {
          "anyOf": [
            {
              "exclusiveMinimum": 0,
              "type": "number"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Profile table only if it has been updated since these many number of days. If set to `null`, no constraint of last modified time for tables to profile. Supported only in `snowflake` and `BigQuery`.",
          "schema_extra": {
            "supported_sources": [
              "snowflake",
              "bigquery"
            ]
          },
          "title": "Profile If Updated Since Days"
        },
        "profile_table_size_limit": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "default": 5,
          "description": "Profile tables only if their size is less than specified GBs. If set to `null`, no limit on the size of tables to profile. Supported only in `Snowflake`, `BigQuery` and `Databricks`. Supported for `Oracle` based on calculated size from gathered stats.",
          "schema_extra": {
            "supported_sources": [
              "snowflake",
              "bigquery",
              "unity-catalog",
              "oracle"
            ]
          },
          "title": "Profile Table Size Limit"
        },
        "profile_table_row_limit": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "default": 5000000,
          "description": "Profile tables only if their row count is less than specified count. If set to `null`, no limit on the row count of tables to profile. Supported only in `Snowflake`, `BigQuery`. Supported for `Oracle` based on gathered stats.",
          "schema_extra": {
            "supported_sources": [
              "snowflake",
              "bigquery",
              "oracle"
            ]
          },
          "title": "Profile Table Row Limit"
        },
        "profile_table_row_count_estimate_only": {
          "default": false,
          "description": "Use an approximate query for row count. This will be much faster but slightly less accurate. Only supported for Postgres and MySQL. ",
          "schema_extra": {
            "supported_sources": [
              "postgres",
              "mysql"
            ]
          },
          "title": "Profile Table Row Count Estimate Only",
          "type": "boolean"
        },
        "query_combiner_enabled": {
          "default": true,
          "description": "*This feature is still experimental and can be disabled if it causes issues.* Reduces the total number of queries issued and speeds up profiling by dynamically combining SQL queries where possible.",
          "title": "Query Combiner Enabled",
          "type": "boolean"
        },
        "catch_exceptions": {
          "default": true,
          "description": "",
          "title": "Catch Exceptions",
          "type": "boolean"
        },
        "partition_profiling_enabled": {
          "default": true,
          "description": "Whether to profile partitioned tables. Only BigQuery and Aws Athena supports this. If enabled, latest partition data is used for profiling.",
          "schema_extra": {
            "supported_sources": [
              "athena",
              "bigquery"
            ]
          },
          "title": "Partition Profiling Enabled",
          "type": "boolean"
        },
        "partition_datetime": {
          "anyOf": [
            {
              "format": "date-time",
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "If specified, profile only the partition which matches this datetime. If not specified, profile the latest partition. Only Bigquery supports this.",
          "schema_extra": {
            "supported_sources": [
              "bigquery"
            ]
          },
          "title": "Partition Datetime"
        },
        "use_sampling": {
          "default": true,
          "description": "Whether to profile column level stats on sample of table. Only BigQuery and Snowflake support this. If enabled, profiling is done on rows sampled from table. Sampling is not done for smaller tables. ",
          "schema_extra": {
            "supported_sources": [
              "bigquery",
              "snowflake"
            ]
          },
          "title": "Use Sampling",
          "type": "boolean"
        },
        "sample_size": {
          "default": 10000,
          "description": "Number of rows to be sampled from table for column level profiling.Applicable only if `use_sampling` is set to True.",
          "schema_extra": {
            "supported_sources": [
              "bigquery",
              "snowflake"
            ]
          },
          "title": "Sample Size",
          "type": "integer"
        },
        "profile_external_tables": {
          "default": false,
          "description": "Whether to profile external tables. Only Snowflake and Redshift supports this.",
          "schema_extra": {
            "supported_sources": [
              "redshift",
              "snowflake"
            ]
          },
          "title": "Profile External Tables",
          "type": "boolean"
        },
        "tags_to_ignore_sampling": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Fixed list of tags to ignore sampling. If not specified, tables will be sampled based on `use_sampling`.",
          "title": "Tags To Ignore Sampling"
        },
        "profile_nested_fields": {
          "default": false,
          "description": "Whether to profile complex types like structs, arrays and maps. ",
          "title": "Profile Nested Fields",
          "type": "boolean"
        }
      },
      "title": "GEProfilingConfig",
      "type": "object"
    },
    "OAuthConfiguration": {
      "additionalProperties": false,
      "properties": {
        "provider": {
          "$ref": "#/$defs/OAuthIdentityProvider",
          "description": "Identity provider for oauth.Supported providers are microsoft and okta."
        },
        "authority_url": {
          "description": "Authority url of your identity provider",
          "title": "Authority Url",
          "type": "string"
        },
        "client_id": {
          "description": "client id of your registered application",
          "title": "Client Id",
          "type": "string"
        },
        "scopes": {
          "description": "scopes required to connect to snowflake",
          "items": {
            "type": "string"
          },
          "title": "Scopes",
          "type": "array"
        },
        "use_certificate": {
          "default": false,
          "description": "Do you want to use certificate and private key to authenticate using oauth",
          "title": "Use Certificate",
          "type": "boolean"
        },
        "client_secret": {
          "anyOf": [
            {
              "format": "password",
              "type": "string",
              "writeOnly": true
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "client secret of the application if use_certificate = false",
          "title": "Client Secret"
        },
        "encoded_oauth_public_key": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "base64 encoded certificate content if use_certificate = true",
          "title": "Encoded Oauth Public Key"
        },
        "encoded_oauth_private_key": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "base64 encoded private key content if use_certificate = true",
          "title": "Encoded Oauth Private Key"
        }
      },
      "required": [
        "provider",
        "authority_url",
        "client_id",
        "scopes"
      ],
      "title": "OAuthConfiguration",
      "type": "object"
    },
    "OAuthIdentityProvider": {
      "enum": [
        "microsoft",
        "okta"
      ],
      "title": "OAuthIdentityProvider",
      "type": "string"
    },
    "OperationConfig": {
      "additionalProperties": false,
      "properties": {
        "lower_freq_profile_enabled": {
          "default": false,
          "description": "Whether to do profiling at lower freq or not. This does not do any scheduling just adds additional checks to when not to run profiling.",
          "title": "Lower Freq Profile Enabled",
          "type": "boolean"
        },
        "profile_day_of_week": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Number between 0 to 6 for day of week (both inclusive). 0 is Monday and 6 is Sunday. If not specified, defaults to Nothing and this field does not take affect.",
          "title": "Profile Day Of Week"
        },
        "profile_date_of_month": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Number between 1 to 31 for date of month (both inclusive). If not specified, defaults to Nothing and this field does not take affect.",
          "title": "Profile Date Of Month"
        }
      },
      "title": "OperationConfig",
      "type": "object"
    },
    "QueryDedupStrategyType": {
      "enum": [
        "STANDARD",
        "NONE"
      ],
      "title": "QueryDedupStrategyType",
      "type": "string"
    },
    "SnowflakeEdition": {
      "enum": [
        "Standard",
        "Enterprise or above"
      ],
      "title": "SnowflakeEdition",
      "type": "string"
    },
    "SnowflakeShareConfig": {
      "additionalProperties": false,
      "properties": {
        "database": {
          "description": "Database from which share is created.",
          "title": "Database",
          "type": "string"
        },
        "platform_instance": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Platform instance for snowflake account in which share is created.",
          "title": "Platform Instance"
        },
        "consumers": {
          "description": "List of databases created in consumer accounts.",
          "items": {
            "$ref": "#/$defs/DatabaseId"
          },
          "title": "Consumers",
          "type": "array",
          "uniqueItems": true
        }
      },
      "required": [
        "database",
        "consumers"
      ],
      "title": "SnowflakeShareConfig",
      "type": "object"
    },
    "StatefulStaleMetadataRemovalConfig": {
      "additionalProperties": false,
      "description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
      "properties": {
        "enabled": {
          "default": false,
          "description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
          "title": "Enabled",
          "type": "boolean"
        },
        "remove_stale_metadata": {
          "default": true,
          "description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
          "title": "Remove Stale Metadata",
          "type": "boolean"
        },
        "fail_safe_threshold": {
          "default": 75.0,
          "description": "Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'.",
          "maximum": 100.0,
          "minimum": 0.0,
          "title": "Fail Safe Threshold",
          "type": "number"
        }
      },
      "title": "StatefulStaleMetadataRemovalConfig",
      "type": "object"
    },
    "TagOption": {
      "enum": [
        "with_lineage",
        "without_lineage",
        "skip"
      ],
      "title": "TagOption",
      "type": "string"
    }
  },
  "additionalProperties": false,
  "properties": {
    "incremental_properties": {
      "default": false,
      "description": "When enabled, emits dataset properties as incremental to existing dataset properties in DataHub. When disabled, re-states dataset properties on each run.",
      "title": "Incremental Properties",
      "type": "boolean"
    },
    "schema_pattern": {
      "$ref": "#/$defs/AllowDenyPattern",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "description": "Regex patterns for schemas to filter in ingestion. Will match against the full `database.schema` name if `match_fully_qualified_names` is enabled."
    },
    "table_pattern": {
      "$ref": "#/$defs/AllowDenyPattern",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "description": "Regex patterns for tables to filter in ingestion. Specify regex to match the entire table name in database.schema.table format. e.g. to match all tables starting with customer in Customer database and public schema, use the regex 'Customer.public.customer.*'"
    },
    "view_pattern": {
      "$ref": "#/$defs/AllowDenyPattern",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "description": "Regex patterns for views to filter in ingestion. Note: Defaults to table_pattern if not specified. Specify regex to match the entire view name in database.schema.view format. e.g. to match all views starting with customer in Customer database and public schema, use the regex 'Customer.public.customer.*'"
    },
    "classification": {
      "$ref": "#/$defs/ClassificationConfig",
      "default": {
        "enabled": false,
        "sample_size": 100,
        "max_workers": 4,
        "table_pattern": {
          "allow": [
            ".*"
          ],
          "deny": [],
          "ignoreCase": true
        },
        "column_pattern": {
          "allow": [
            ".*"
          ],
          "deny": [],
          "ignoreCase": true
        },
        "info_type_to_term": {},
        "classifiers": [
          {
            "config": null,
            "type": "datahub"
          }
        ]
      },
      "description": "For details, refer to [Classification](../../../../metadata-ingestion/docs/dev_guides/classification.md)."
    },
    "enable_stateful_profiling": {
      "default": true,
      "description": "Enable stateful profiling. This will store profiling timestamps per dataset after successful profiling. and will not run profiling again in subsequent run if table has not been updated. ",
      "title": "Enable Stateful Profiling",
      "type": "boolean"
    },
    "incremental_lineage": {
      "default": false,
      "description": "When enabled, emits lineage as incremental to existing lineage already in DataHub. When disabled, re-states lineage on each run.",
      "title": "Incremental Lineage",
      "type": "boolean"
    },
    "convert_urns_to_lowercase": {
      "default": true,
      "description": "Whether to convert dataset urns to lowercase.",
      "title": "Convert Urns To Lowercase",
      "type": "boolean"
    },
    "env": {
      "default": "PROD",
      "description": "The environment that all assets produced by this connector belong to",
      "title": "Env",
      "type": "string"
    },
    "platform_instance": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.",
      "title": "Platform Instance"
    },
    "stateful_ingestion": {
      "anyOf": [
        {
          "$ref": "#/$defs/StatefulStaleMetadataRemovalConfig"
        },
        {
          "type": "null"
        }
      ],
      "default": null
    },
    "options": {
      "additionalProperties": true,
      "description": "Any options specified here will be passed to [SQLAlchemy.create_engine](https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine) as kwargs.",
      "title": "Options",
      "type": "object"
    },
    "profile_pattern": {
      "$ref": "#/$defs/AllowDenyPattern",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "description": "Regex patterns to filter tables (or specific columns) for profiling during ingestion. Note that only tables allowed by the `table_pattern` will be considered."
    },
    "domain": {
      "additionalProperties": {
        "$ref": "#/$defs/AllowDenyPattern"
      },
      "default": {},
      "description": "Attach domains to databases, schemas or tables during ingestion using regex patterns. Domain key can be a guid like *urn:li:domain:ec428203-ce86-4db3-985d-5a8ee6df32ba* or a string like \"Marketing\".) If you provide strings, then datahub will attempt to resolve this name to a guid, and will error out if this fails. There can be multiple domain keys specified.",
      "title": "Domain",
      "type": "object"
    },
    "include_views": {
      "default": true,
      "description": "Whether views should be ingested.",
      "title": "Include Views",
      "type": "boolean"
    },
    "include_tables": {
      "default": true,
      "description": "Whether tables should be ingested.",
      "title": "Include Tables",
      "type": "boolean"
    },
    "include_table_location_lineage": {
      "default": true,
      "description": "If the source supports it, include table lineage to the underlying storage location.",
      "title": "Include Table Location Lineage",
      "type": "boolean"
    },
    "include_view_lineage": {
      "default": true,
      "description": "Populates view->view and table->view lineage using DataHub's sql parser.",
      "title": "Include View Lineage",
      "type": "boolean"
    },
    "include_view_column_lineage": {
      "default": true,
      "description": "Populates column-level lineage for  view->view and table->view lineage using DataHub's sql parser. Requires `include_view_lineage` to be enabled.",
      "title": "Include View Column Lineage",
      "type": "boolean"
    },
    "use_file_backed_cache": {
      "default": true,
      "description": "Whether to use a file backed cache for the view definitions.",
      "title": "Use File Backed Cache",
      "type": "boolean"
    },
    "profiling": {
      "$ref": "#/$defs/GEProfilingConfig",
      "default": {
        "enabled": false,
        "operation_config": {
          "lower_freq_profile_enabled": false,
          "profile_date_of_month": null,
          "profile_day_of_week": null
        },
        "limit": null,
        "offset": null,
        "profile_table_level_only": false,
        "include_field_null_count": true,
        "include_field_distinct_count": true,
        "include_field_min_value": true,
        "include_field_max_value": true,
        "include_field_mean_value": true,
        "include_field_median_value": true,
        "include_field_stddev_value": true,
        "include_field_quantiles": false,
        "include_field_distinct_value_frequencies": false,
        "include_field_histogram": false,
        "include_field_sample_values": true,
        "max_workers": 20,
        "report_dropped_profiles": false,
        "turn_off_expensive_profiling_metrics": false,
        "field_sample_values_limit": 20,
        "max_number_of_fields_to_profile": null,
        "profile_if_updated_since_days": null,
        "profile_table_size_limit": 5,
        "profile_table_row_limit": 5000000,
        "profile_table_row_count_estimate_only": false,
        "query_combiner_enabled": true,
        "catch_exceptions": true,
        "partition_profiling_enabled": true,
        "partition_datetime": null,
        "use_sampling": true,
        "sample_size": 10000,
        "profile_external_tables": false,
        "tags_to_ignore_sampling": null,
        "profile_nested_fields": false
      }
    },
    "bucket_duration": {
      "$ref": "#/$defs/BucketDuration",
      "default": "DAY",
      "description": "Size of the time window to aggregate usage stats."
    },
    "end_time": {
      "description": "Latest date of lineage/usage to consider. Default: Current time in UTC",
      "format": "date-time",
      "title": "End Time",
      "type": "string"
    },
    "start_time": {
      "default": null,
      "description": "Earliest date of lineage/usage to consider. Default: Last full day in UTC (or hour, depending on `bucket_duration`). You can also specify relative time with respect to end_time such as '-7 days' Or '-7d'.",
      "format": "date-time",
      "title": "Start Time",
      "type": "string"
    },
    "enable_stateful_usage_ingestion": {
      "default": true,
      "description": "Enable stateful lineage ingestion. This will store usage window timestamps after successful usage ingestion. and will not run usage ingestion for same timestamps in subsequent run. ",
      "title": "Enable Stateful Usage Ingestion",
      "type": "boolean"
    },
    "enable_stateful_lineage_ingestion": {
      "default": true,
      "description": "Enable stateful lineage ingestion. This will store lineage window timestamps after successful lineage ingestion. and will not run lineage ingestion for same timestamps in subsequent run. ",
      "title": "Enable Stateful Lineage Ingestion",
      "type": "boolean"
    },
    "top_n_queries": {
      "default": 10,
      "description": "Number of top queries to save to each table.",
      "exclusiveMinimum": 0,
      "title": "Top N Queries",
      "type": "integer"
    },
    "user_email_pattern": {
      "$ref": "#/$defs/AllowDenyPattern",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "description": "regex patterns for user emails to filter in usage."
    },
    "include_operational_stats": {
      "default": true,
      "description": "Whether to display operational stats.",
      "title": "Include Operational Stats",
      "type": "boolean"
    },
    "include_read_operational_stats": {
      "default": false,
      "description": "Whether to report read operational stats. Experimental.",
      "title": "Include Read Operational Stats",
      "type": "boolean"
    },
    "format_sql_queries": {
      "default": false,
      "description": "Whether to format sql queries",
      "title": "Format Sql Queries",
      "type": "boolean"
    },
    "include_top_n_queries": {
      "default": true,
      "description": "Whether to ingest the top_n_queries.",
      "title": "Include Top N Queries",
      "type": "boolean"
    },
    "apply_view_usage_to_tables": {
      "default": false,
      "description": "Whether to apply view's usage to its base tables. If set to True, usage is applied to base tables only.",
      "title": "Apply View Usage To Tables",
      "type": "boolean"
    },
    "username": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Snowflake username.",
      "title": "Username"
    },
    "password": {
      "anyOf": [
        {
          "format": "password",
          "type": "string",
          "writeOnly": true
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Snowflake password.",
      "title": "Password"
    },
    "private_key": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Private key in a form of '-----BEGIN PRIVATE KEY-----\\nprivate-key\\n-----END PRIVATE KEY-----\\n' if using key pair authentication. Encrypted version of private key will be in a form of '-----BEGIN ENCRYPTED PRIVATE KEY-----\\nencrypted-private-key\\n-----END ENCRYPTED PRIVATE KEY-----\\n' See: https://docs.snowflake.com/en/user-guide/key-pair-auth.html",
      "title": "Private Key"
    },
    "private_key_path": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "The path to the private key if using key pair authentication. Ignored if `private_key` is set. See: https://docs.snowflake.com/en/user-guide/key-pair-auth.html",
      "title": "Private Key Path"
    },
    "private_key_password": {
      "anyOf": [
        {
          "format": "password",
          "type": "string",
          "writeOnly": true
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Password for your private key. Required if using key pair authentication with encrypted private key.",
      "title": "Private Key Password"
    },
    "oauth_config": {
      "anyOf": [
        {
          "$ref": "#/$defs/OAuthConfiguration"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "oauth configuration - https://docs.snowflake.com/en/user-guide/python-connector-example.html#connecting-with-oauth"
    },
    "authentication_type": {
      "default": "DEFAULT_AUTHENTICATOR",
      "description": "The type of authenticator to use when connecting to Snowflake. Supports \"DEFAULT_AUTHENTICATOR\", \"OAUTH_AUTHENTICATOR\", \"EXTERNAL_BROWSER_AUTHENTICATOR\" and \"KEY_PAIR_AUTHENTICATOR\".",
      "title": "Authentication Type",
      "type": "string"
    },
    "account_id": {
      "description": "Snowflake account identifier. e.g. xy12345,  xy12345.us-east-2.aws, xy12345.us-central1.gcp, xy12345.central-us.azure, xy12345.us-west-2.privatelink. Refer [Account Identifiers](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html#format-2-legacy-account-locator-in-a-region) for more details.",
      "title": "Account Id",
      "type": "string"
    },
    "warehouse": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Snowflake warehouse.",
      "title": "Warehouse"
    },
    "role": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Snowflake role.",
      "title": "Role"
    },
    "connect_args": {
      "anyOf": [
        {
          "additionalProperties": true,
          "type": "object"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Connect args to pass to Snowflake SqlAlchemy driver",
      "title": "Connect Args"
    },
    "token": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "OAuth token from external identity provider. Not recommended for most use cases because it will not be able to refresh once expired.",
      "title": "Token"
    },
    "snowflake_domain": {
      "default": "snowflakecomputing.com",
      "description": "Snowflake domain. Use 'snowflakecomputing.com' for most regions or 'snowflakecomputing.cn' for China (cn-northwest-1) region.",
      "title": "Snowflake Domain",
      "type": "string"
    },
    "database_pattern": {
      "$ref": "#/$defs/AllowDenyPattern",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [
          "^UTIL_DB$",
          "^SNOWFLAKE$",
          "^SNOWFLAKE_SAMPLE_DATA$"
        ],
        "ignoreCase": true
      },
      "description": "Regex patterns for databases to filter in ingestion."
    },
    "stream_pattern": {
      "$ref": "#/$defs/AllowDenyPattern",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "description": "Regex patterns for streams to filter in ingestion. Specify regex to match the entire view name in database.schema.view format. e.g. to match all views starting with customer in Customer database and public schema, use the regex 'Customer.public.customer.*'"
    },
    "procedure_pattern": {
      "$ref": "#/$defs/AllowDenyPattern",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "description": "Regex patterns for procedures to filter in ingestion. Specify regex to match the entire procedure name in database.schema.procedure format. e.g. to match all procedures starting with customer in Customer database and public schema, use the regex 'Customer.public.customer.*'"
    },
    "match_fully_qualified_names": {
      "default": false,
      "description": "Whether `schema_pattern` is matched against fully qualified schema name `<catalog>.<schema>`.",
      "title": "Match Fully Qualified Names",
      "type": "boolean"
    },
    "email_domain": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Email domain of your organization so users can be displayed on UI appropriately. This is used only if we cannot infer email ID.",
      "title": "Email Domain"
    },
    "include_table_lineage": {
      "default": true,
      "description": "If enabled, populates the snowflake table-to-table and s3-to-snowflake table lineage. Requires appropriate grants given to the role and Snowflake Enterprise Edition or above.",
      "title": "Include Table Lineage",
      "type": "boolean"
    },
    "ignore_start_time_lineage": {
      "default": false,
      "title": "Ignore Start Time Lineage",
      "type": "boolean"
    },
    "upstream_lineage_in_report": {
      "default": false,
      "title": "Upstream Lineage In Report",
      "type": "boolean"
    },
    "include_usage_stats": {
      "default": true,
      "description": "If enabled, populates the snowflake usage statistics. Requires appropriate grants given to the role.",
      "title": "Include Usage Stats",
      "type": "boolean"
    },
    "include_view_definitions": {
      "default": true,
      "description": "If enabled, populates the ingested views' definitions.",
      "title": "Include View Definitions",
      "type": "boolean"
    },
    "fetch_views_from_information_schema": {
      "default": false,
      "description": "If enabled, uses information_schema.views to fetch view definitions instead of SHOW VIEWS command. This alternative method can be more reliable for databases with large numbers of views (> 10K views), as the SHOW VIEWS approach has proven unreliable and can lead to missing views in such scenarios. However, this method requires OWNERSHIP privileges on views to retrieve their definitions. For views without ownership permissions (where VIEW_DEFINITION is null/empty), the system will automatically fall back to using batched SHOW VIEWS queries to populate the missing definitions.",
      "title": "Fetch Views From Information Schema",
      "type": "boolean"
    },
    "include_technical_schema": {
      "default": true,
      "description": "If enabled, populates the snowflake technical schema and descriptions.",
      "title": "Include Technical Schema",
      "type": "boolean"
    },
    "include_primary_keys": {
      "default": true,
      "description": "If enabled, populates the snowflake primary keys.",
      "title": "Include Primary Keys",
      "type": "boolean"
    },
    "include_foreign_keys": {
      "default": true,
      "description": "If enabled, populates the snowflake foreign keys.",
      "title": "Include Foreign Keys",
      "type": "boolean"
    },
    "include_column_lineage": {
      "default": true,
      "description": "Populates table->table and view->table column lineage. Requires appropriate grants given to the role and the Snowflake Enterprise Edition or above.",
      "title": "Include Column Lineage",
      "type": "boolean"
    },
    "use_queries_v2": {
      "default": true,
      "description": "If enabled, uses the new queries extractor to extract queries from snowflake.",
      "title": "Use Queries V2",
      "type": "boolean"
    },
    "include_queries": {
      "default": true,
      "description": "If enabled, generate query entities associated with lineage edges. Only applicable if `use_queries_v2` is enabled.",
      "title": "Include Queries",
      "type": "boolean"
    },
    "include_query_usage_statistics": {
      "default": true,
      "description": "If enabled, generate query popularity statistics. Only applicable if `use_queries_v2` is enabled.",
      "title": "Include Query Usage Statistics",
      "type": "boolean"
    },
    "lazy_schema_resolver": {
      "default": true,
      "description": "If enabled, uses lazy schema resolver to resolve schemas for tables and views. This is useful if you have a large number of schemas and want to avoid bulk fetching the schema for each table/view.",
      "title": "Lazy Schema Resolver",
      "type": "boolean"
    },
    "query_dedup_strategy": {
      "$ref": "#/$defs/QueryDedupStrategyType",
      "default": "STANDARD",
      "description": "Experimental: Choose the strategy for query deduplication (default value is appropriate for most use-cases; make sure you understand performance implications before changing it). Allowed values are: STANDARD, NONE"
    },
    "extract_tags": {
      "$ref": "#/$defs/TagOption",
      "default": "skip",
      "description": "Optional. Allowed values are `without_lineage`, `with_lineage`, and `skip` (default). `without_lineage` only extracts tags that have been applied directly to the given entity. `with_lineage` extracts both directly applied and propagated tags, but will be significantly slower. See the [Snowflake documentation](https://docs.snowflake.com/en/user-guide/object-tagging.html#tag-lineage) for information about tag lineage/propagation. "
    },
    "extract_tags_as_structured_properties": {
      "default": false,
      "description": "If enabled along with `extract_tags`, extracts snowflake's key-value tags as DataHub structured properties instead of DataHub tags.",
      "title": "Extract Tags As Structured Properties",
      "type": "boolean"
    },
    "include_external_url": {
      "default": true,
      "description": "Whether to populate Snowsight url for Snowflake Objects",
      "title": "Include External Url",
      "type": "boolean"
    },
    "validate_upstreams_against_patterns": {
      "default": true,
      "description": "Whether to validate upstream snowflake tables against allow-deny patterns",
      "title": "Validate Upstreams Against Patterns",
      "type": "boolean"
    },
    "tag_pattern": {
      "$ref": "#/$defs/AllowDenyPattern",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "description": "List of regex patterns for tags to include in ingestion. Only used if `extract_tags` is enabled."
    },
    "include_streams": {
      "default": true,
      "description": "If enabled, streams will be ingested as separate entities from tables/views.",
      "title": "Include Streams",
      "type": "boolean"
    },
    "include_procedures": {
      "default": true,
      "description": "If enabled, procedures will be ingested as pipelines/tasks.",
      "title": "Include Procedures",
      "type": "boolean"
    },
    "structured_property_pattern": {
      "$ref": "#/$defs/AllowDenyPattern",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "description": "List of regex patterns for structured properties to include in ingestion. Applied to tags with form `<database>.<schema>.<tag_name>`. Only used if `extract_tags` and `extract_tags_as_structured_properties` are enabled."
    },
    "temporary_tables_pattern": {
      "default": [
        ".*\\.FIVETRAN_.*_STAGING\\..*",
        ".*__DBT_TMP$",
        ".*\\.SEGMENT_[a-f0-9]{8}[-_][a-f0-9]{4}[-_][a-f0-9]{4}[-_][a-f0-9]{4}[-_][a-f0-9]{12}",
        ".*\\.STAGING_.*_[a-f0-9]{8}[-_][a-f0-9]{4}[-_][a-f0-9]{4}[-_][a-f0-9]{4}[-_][a-f0-9]{12}",
        ".*\\.(GE_TMP_|GE_TEMP_|GX_TEMP_)[0-9A-F]{8}",
        ".*\\.SNOWPARK_TEMP_TABLE_.+"
      ],
      "description": "[Advanced] Regex patterns for temporary tables to filter in lineage ingestion. Specify regex to match the entire table name in database.schema.table format. Defaults are to set in such a way to ignore the temporary staging tables created by known ETL tools.",
      "items": {
        "type": "string"
      },
      "title": "Temporary Tables Pattern",
      "type": "array"
    },
    "shares": {
      "anyOf": [
        {
          "additionalProperties": {
            "$ref": "#/$defs/SnowflakeShareConfig"
          },
          "type": "object"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Required if current account owns or consumes snowflake share.If specified, connector creates lineage and siblings relationship between current account's database tables and consumer/producer account's database tables. Map of share name -> details of share.",
      "title": "Shares"
    },
    "known_snowflake_edition": {
      "anyOf": [
        {
          "$ref": "#/$defs/SnowflakeEdition"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Explicitly specify the Snowflake edition (STANDARD or ENTERPRISE). If unset, the edition will be inferred automatically using 'SHOW TAGS'."
    },
    "include_assertion_results": {
      "default": false,
      "description": "Whether to ingest assertion run results for assertions created using Datahub assertions CLI in snowflake",
      "title": "Include Assertion Results",
      "type": "boolean"
    },
    "pushdown_deny_usernames": {
      "default": [],
      "description": "List of snowflake usernames (SQL LIKE patterns, e.g., 'SERVICE_%', '%_PROD', 'TEST_USER') which will NOT be considered for lineage/usage/queries extraction. This is primarily useful for improving performance by filtering out users with extremely high query volumes. Only applicable if `use_queries_v2` is enabled.",
      "items": {
        "type": "string"
      },
      "title": "Pushdown Deny Usernames",
      "type": "array"
    },
    "pushdown_allow_usernames": {
      "default": [],
      "description": "List of snowflake usernames (SQL LIKE patterns, e.g., 'ANALYST_%', '%_USER', 'MAIN_ACCOUNT') which WILL be considered for lineage/usage/queries extraction. This is primarily useful for improving performance by filtering in only specific users. Only applicable if `use_queries_v2` is enabled. If not specified, all users not in deny list are included.",
      "items": {
        "type": "string"
      },
      "title": "Pushdown Allow Usernames",
      "type": "array"
    },
    "push_down_database_pattern_access_history": {
      "default": false,
      "description": "If enabled, pushes down database pattern filtering to the access_history table for improved performance. This filters on the accessed objects in access_history.",
      "title": "Push Down Database Pattern Access History",
      "type": "boolean"
    },
    "additional_database_names_allowlist": {
      "default": [],
      "description": "Additional database names (no pattern matching) to be included in the access_history filter. Only applies if push_down_database_pattern_access_history=True. These databases will be included in the filter being pushed down regardless of database_pattern settings.This may be required in the case of _eg_ temporary tables being created in a different database than the ones in the database_name patterns.",
      "items": {
        "type": "string"
      },
      "title": "Additional Database Names Allowlist",
      "type": "array"
    }
  },
  "required": [
    "account_id"
  ],
  "title": "SnowflakeV2Config",
  "type": "object"
}

Code Coordinates

Class Name: datahub.ingestion.source.snowflake.snowflake_v2.SnowflakeV2Source
Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Snowflake, feel free to ping us on our Slack.

Is this page helpful?

Snowflake

Snowflake Ingestion through the UI​

Important Capabilities​

Prerequisites​

Authentication​

Key Pair Authentication​

Okta OAuth​

Client Credentials Grant Type (Simpler)​

Password Grant Type​

Snowflake Shares​

Example​

Caveats​

CLI based Ingestion​

Starter Recipe​

Config Details​

Code Coordinates​