Fivetran
Important Capabilities
Capability | Status | Notes |
---|---|---|
Column-level Lineage | ✅ | Enabled by default; can be disabled via the `include_column_lineage` configuration flag. |
Detect Deleted Entities | ✅ | Enabled by default when stateful ingestion is turned on. |
Platform Instance | ✅ | Enabled by default. |
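For example, a minimal recipe sketch for turning off column-level lineage (all other required fields omitted for brevity):

```yaml
source:
  type: fivetran
  config:
    # Column-level lineage is enabled by default; set to false to disable it.
    include_column_lineage: false
```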
This plugin extracts Fivetran users, connectors, destinations, and sync history.
Integration Details
This source extracts the following:
- Connectors in Fivetran as Data Pipelines and Data Jobs, representing lineage between source and destination.
- Connector sources as DataJob input Datasets.
- Connector destinations as DataJob output Datasets.
- Connector runs as DataProcessInstances (i.e., DataJob runs).
Configuration Notes
- Fivetran provides the Fivetran Platform Connector, which dumps log events and metadata about connectors, destinations, users, and roles into your destination.
- You need to set up and start the initial sync of the Fivetran Platform Connector before using this source. Refer to the Fivetran Platform Connector setup guide.
- Once the initial sync of your Fivetran Platform Connector completes, provide the connector's destination platform and its configuration in the recipe.
- We expect users to keep automatic schema updates enabled (the default) in the Fivetran Platform Connector configured for DataHub; this ensures the latest schema changes are applied and avoids inconsistent data syncs.
Concept mapping
Fivetran | DataHub |
---|---|
Connector | DataJob |
Source | Dataset |
Destination | Dataset |
Connector Run | DataProcessInstance |
The source and destination of a connector are mapped to Datasets, as the input and output of the corresponding DataJob.
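To make this mapping concrete, here is an illustrative sketch of the entities one connector might produce. The connector id `my_connector_id` and the exact URN shapes are assumptions for illustration; actual URNs depend on your connector, platform instance, and env configuration:

```yaml
# Hypothetical entities for a single Fivetran connector (illustrative only):
# DataFlow (pipeline):     urn:li:dataFlow:(fivetran,my_connector_id,PROD)
# DataJob (connector):     urn:li:dataJob:(urn:li:dataFlow:(fivetran,my_connector_id,PROD),my_connector_id)
# Input Dataset (source):  the source table, e.g. a Postgres dataset URN
# Output Dataset (dest):   the destination table, e.g. a Snowflake dataset URN
# Each sync run is emitted as a DataProcessInstance attached to the DataJob.
```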
Current limitations
Works only for the following destinations:
- Snowflake
- BigQuery
- Databricks
Snowflake destination Configuration Guide
- If your Fivetran Platform Connector destination is Snowflake, you need to provide user details and a role with the correct privileges in order to fetch metadata.
- A Snowflake system admin can create a fivetran_datahub role, grant it the required privileges, and assign it to a user by executing the following Snowflake commands from a user with the ACCOUNTADMIN role or MANAGE GRANTS privilege.
```sql
create or replace role fivetran_datahub;

// Grant access to a warehouse to run queries to view metadata
grant operate, usage on warehouse "<your-warehouse>" to role fivetran_datahub;

// Grant access to view the database and schema in which your log and metadata tables exist
grant usage on DATABASE "<fivetran-log-database>" to role fivetran_datahub;
grant usage on SCHEMA "<fivetran-log-database>"."<fivetran-log-schema>" to role fivetran_datahub;

// Grant access to execute select queries on the schema in which your log and metadata tables exist
grant select on all tables in SCHEMA "<fivetran-log-database>"."<fivetran-log-schema>" to role fivetran_datahub;

// Grant the fivetran_datahub role to the Snowflake user
grant role fivetran_datahub to user snowflake_user;
```
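With the role in place, the matching snowflake_destination_config in the recipe would look roughly like the sketch below; the account, warehouse, database, and schema values are placeholders to replace with your own:

```yaml
fivetran_log_config:
  destination_platform: snowflake
  snowflake_destination_config:
    account_id: "<your-account-id>"
    warehouse: "<your-warehouse>"
    database: "<fivetran-log-database>"
    log_schema: "<fivetran-log-schema>"
    username: "${SNOWFLAKE_USER}"
    password: "${SNOWFLAKE_PASS}"
    role: fivetran_datahub # the role created above
```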
Bigquery destination Configuration Guide
- If your Fivetran Platform Connector destination is BigQuery, you need to set up a service account as per the BigQuery docs and grant it the BigQuery Data Viewer and BigQuery Job User IAM roles.
- Create and download a service account JSON keyfile, and provide its credentials in the bigquery destination config; the keyfile-to-recipe mapping is sketched below.
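As a sketch, the fields of the downloaded JSON keyfile map onto the recipe's credential block like this (all values are placeholders):

```yaml
fivetran_log_config:
  destination_platform: bigquery
  bigquery_destination_config:
    credential:
      # Copy these values from the service account JSON keyfile.
      project_id: "<project_id>"
      private_key_id: "<private_key_id>"
      private_key: "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
      client_email: "<client_email>"
      client_id: "<client_id>"
    dataset: "<fivetran_log_dataset>"
```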
Databricks destination Configuration Guide
- Get your Databricks instance's workspace URL.
- Create a Databricks service principal.
  - You can skip this step and use your own account to get things running quickly, but we strongly recommend creating a dedicated service principal for production use.
- Generate a Databricks personal access token for the service principal (see the Databricks guides).
- Provision your service principal: to ingest your workspace's metadata and lineage, it must have all of the following:
  - One of: metastore admin role, ownership of, or `USE CATALOG` privilege on any catalogs you want to ingest
  - One of: metastore admin role, ownership of, or `USE SCHEMA` privilege on any schemas you want to ingest
  - Ownership of or `SELECT` privilege on any tables and views you want to ingest
  - Refer to the Databricks Ownership and Privileges documentation for details.
- Check the starter recipe below and replace `workspace_url` and `token` with your information from the previous steps; a focused sketch of the destination config follows this list.
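A minimal sketch of the resulting destination config, using placeholder values from the steps above:

```yaml
fivetran_log_config:
  destination_platform: databricks
  databricks_destination_config:
    workspace_url: "https://<your-workspace>.cloud.databricks.com"
    token: "${DATABRICKS_TOKEN}" # personal access token generated above
    warehouse_id: "<warehouse_id>" # SQL warehouse used to run metadata queries
    catalog: "<fivetran_catalog>"
    log_schema: "<fivetran_log_schema>"
```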
Advanced Configurations
Working with Platform Instances
If you have multiple instances of source or destination systems referenced in your Fivetran setup, you need to configure platform instances for these systems in the Fivetran recipe to generate correct lineage edges. Refer to the document Working with Platform Instances to learn more.
While configuring the platform instance for a source system, provide the connector id as the key; for a destination system, provide the destination id as the key.
When creating the connection details in the Fivetran UI, make a note of the destination Group ID, as it will need to be used in the destination_to_platform_instance configuration. For example, for a BigQuery destination whose Group ID is greyish_positive, the configuration would be something like:
```yaml
destination_to_platform_instance:
  greyish_positive: # the destination Group ID from the Fivetran UI
    database: <BigQuery project ID>
    env: PROD
```
Example - Multiple Postgres source connectors each reading from a different Postgres instance

```yaml
# Map of connector source to platform instance
sources_to_platform_instance:
  postgres_connector_id1:
    platform_instance: cloud_postgres_instance
    env: PROD
  postgres_connector_id2:
    platform_instance: local_postgres_instance
    env: DEV
```
Example - Multiple Snowflake destinations each writing to a different Snowflake instance

```yaml
# Map of destination to platform instance
destination_to_platform_instance:
  snowflake_destination_id1:
    platform_instance: prod_snowflake_instance
    env: PROD
  snowflake_destination_id2:
    platform_instance: dev_snowflake_instance
    env: PROD
```
CLI based Ingestion
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
```yaml
source:
  type: fivetran
  config:
    # Fivetran log connector destination server configurations
    fivetran_log_config:
      destination_platform: snowflake
      # Optional - If destination platform is 'snowflake', provide snowflake configuration.
      snowflake_destination_config:
        # Coordinates
        account_id: "abc48144"
        warehouse: "COMPUTE_WH"
        database: "MY_SNOWFLAKE_DB"
        log_schema: "FIVETRAN_LOG"
        # Credentials
        username: "${SNOWFLAKE_USER}"
        password: "${SNOWFLAKE_PASS}"
        role: "snowflake_role"
      # Optional - If destination platform is 'bigquery', provide bigquery configuration.
      bigquery_destination_config:
        # Credentials
        credential:
          private_key_id: "project_key_id"
          project_id: "project_id"
          client_email: "client_email"
          client_id: "client_id"
          private_key: "private_key"
        dataset: "fivetran_log_dataset"
      # Optional - If destination platform is 'databricks', provide databricks configuration.
      databricks_destination_config:
        # Credentials
        token: "token"
        workspace_url: "workspace_url"
        warehouse_id: "warehouse_id"
        # Coordinates
        catalog: "fivetran_catalog"
        log_schema: "fivetran_log"

    # Optional - filter for certain connector names instead of ingesting everything.
    # connector_patterns:
    #   allow:
    #     - connector_name

    # Optional -- A mapping of each of the connector's sources to its database.
    # sources_to_database:
    #   connector_id: source_db

    # Optional -- This mapping is only required to configure a platform instance for sources.
    # A mapping of Fivetran connector id to data platform instance
    # sources_to_platform_instance:
    #   connector_id:
    #     platform_instance: cloud_instance
    #     env: DEV

    # Optional -- This mapping is only required to configure a platform instance for destinations.
    # A mapping of Fivetran destination id to data platform instance
    # destination_to_platform_instance:
    #   destination_id:
    #     platform_instance: cloud_instance
    #     env: DEV

sink:
  # sink configs
```
Config Details
Note that a `.` is used to denote nested fields in the YAML recipe.
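For instance, the flattened field name `fivetran_log_config.snowflake_destination_config.database` corresponds to this nesting in the recipe:

```yaml
fivetran_log_config:
  snowflake_destination_config:
    database: "MY_SNOWFLAKE_DB"
```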
Field | Description |
---|---|
fivetran_log_config ✅ FivetranLogConfig | Fivetran log connector destination server configurations. |
fivetran_log_config.destination_platform Enum | One of: "snowflake", "bigquery", "databricks" Default: snowflake |
fivetran_log_config.bigquery_destination_config One of BigQueryDestinationConfig, null | If destination platform is 'bigquery', provide bigquery configuration. Default: None |
fivetran_log_config.bigquery_destination_config.dataset ❓ string | The fivetran connector log dataset. |
fivetran_log_config.bigquery_destination_config.extra_client_options object | Additional options to pass to google.cloud.logging_v2.client.Client. Default: {} |
fivetran_log_config.bigquery_destination_config.project_on_behalf One of string, null | [Advanced] The BigQuery project in which queries are executed. Will be passed when creating a job. If not passed, falls back to the project associated with the service account. Default: None |
fivetran_log_config.bigquery_destination_config.credential One of GCPCredential, null | BigQuery credential information. Default: None |
fivetran_log_config.bigquery_destination_config.credential.client_email ❓ string | Client email |
fivetran_log_config.bigquery_destination_config.credential.client_id ❓ string | Client Id |
fivetran_log_config.bigquery_destination_config.credential.private_key ❓ string | Private key in a form of '-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n' |
fivetran_log_config.bigquery_destination_config.credential.private_key_id ❓ string | Private key id |
fivetran_log_config.bigquery_destination_config.credential.auth_provider_x509_cert_url string | Auth provider x509 certificate url |
fivetran_log_config.bigquery_destination_config.credential.auth_uri string | Authentication uri |
fivetran_log_config.bigquery_destination_config.credential.client_x509_cert_url One of string, null | If not set it will be default to https://www.googleapis.com/robot/v1/metadata/x509/client_email Default: None |
fivetran_log_config.bigquery_destination_config.credential.project_id One of string, null | Project id to set the credentials Default: None |
fivetran_log_config.bigquery_destination_config.credential.token_uri string | Token uri Default: https://oauth2.googleapis.com/token |
fivetran_log_config.bigquery_destination_config.credential.type string | Authentication type Default: service_account |
fivetran_log_config.databricks_destination_config One of DatabricksDestinationConfig, null | If destination platform is 'databricks', provide databricks configuration. Default: None |
fivetran_log_config.databricks_destination_config.catalog ❓ string | The fivetran connector log catalog. |
fivetran_log_config.databricks_destination_config.log_schema ❓ string | The fivetran connector log schema. |
fivetran_log_config.databricks_destination_config.token ❓ string | Databricks personal access token |
fivetran_log_config.databricks_destination_config.workspace_url ❓ string | Databricks workspace url. e.g. https://my-workspace.cloud.databricks.com |
fivetran_log_config.databricks_destination_config.extra_client_options object | Additional options to pass to Databricks SQLAlchemy client. Default: {} |
fivetran_log_config.databricks_destination_config.scheme string | Default: databricks |
fivetran_log_config.databricks_destination_config.warehouse_id One of string, null | SQL Warehouse id, for running queries. Must be explicitly provided to enable SQL-based features. Required for the following features that need SQL access: 1) Tag extraction (include_tags=True) - queries system.information_schema.tags 2) Hive Metastore catalog (include_hive_metastore=True) - queries legacy hive_metastore catalog 3) System table lineage (lineage_data_source=SYSTEM_TABLES) - queries system.access.table_lineage/column_lineage 4) Data profiling (profiling.enabled=True) - runs SELECT/ANALYZE queries on tables. When warehouse_id is missing, these features will be automatically disabled (with warnings) to allow ingestion to continue. Default: None |
fivetran_log_config.snowflake_destination_config One of SnowflakeDestinationConfig, null | If destination platform is 'snowflake', provide snowflake configuration. Default: None |
fivetran_log_config.snowflake_destination_config.account_id ❓ string | Snowflake account identifier. e.g. xy12345, xy12345.us-east-2.aws, xy12345.us-central1.gcp, xy12345.central-us.azure, xy12345.us-west-2.privatelink. Refer Account Identifiers for more details. |
fivetran_log_config.snowflake_destination_config.database ❓ string | The fivetran connector log database. |
fivetran_log_config.snowflake_destination_config.log_schema ❓ string | The fivetran connector log schema. |
fivetran_log_config.snowflake_destination_config.authentication_type string | The type of authenticator to use when connecting to Snowflake. Supports "DEFAULT_AUTHENTICATOR", "OAUTH_AUTHENTICATOR", "EXTERNAL_BROWSER_AUTHENTICATOR" and "KEY_PAIR_AUTHENTICATOR". Default: DEFAULT_AUTHENTICATOR |
fivetran_log_config.snowflake_destination_config.connect_args One of object, null | Connect args to pass to Snowflake SqlAlchemy driver Default: None |
fivetran_log_config.snowflake_destination_config.options object | Any options specified here will be passed to SQLAlchemy.create_engine as kwargs. |
fivetran_log_config.snowflake_destination_config.password One of string(password), null | Snowflake password. Default: None |
fivetran_log_config.snowflake_destination_config.private_key One of string, null | Private key in a form of '-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n' if using key pair authentication. Encrypted version of private key will be in a form of '-----BEGIN ENCRYPTED PRIVATE KEY-----\nencrypted-private-key\n-----END ENCRYPTED PRIVATE KEY-----\n' See: https://docs.snowflake.com/en/user-guide/key-pair-auth.html Default: None |
fivetran_log_config.snowflake_destination_config.private_key_password One of string(password), null | Password for your private key. Required if using key pair authentication with encrypted private key. Default: None |
fivetran_log_config.snowflake_destination_config.private_key_path One of string, null | The path to the private key if using key pair authentication. Ignored if private_key is set. See: https://docs.snowflake.com/en/user-guide/key-pair-auth.html Default: None |
fivetran_log_config.snowflake_destination_config.role One of string, null | Snowflake role. Default: None |
fivetran_log_config.snowflake_destination_config.snowflake_domain string | Snowflake domain. Use 'snowflakecomputing.com' for most regions or 'snowflakecomputing.cn' for China (cn-northwest-1) region. Default: snowflakecomputing.com |
fivetran_log_config.snowflake_destination_config.token One of string, null | OAuth token from external identity provider. Not recommended for most use cases because it will not be able to refresh once expired. Default: None |
fivetran_log_config.snowflake_destination_config.username One of string, null | Snowflake username. Default: None |
fivetran_log_config.snowflake_destination_config.warehouse One of string, null | Snowflake warehouse. Default: None |
fivetran_log_config.snowflake_destination_config.oauth_config One of OAuthConfiguration, null | oauth configuration - https://docs.snowflake.com/en/user-guide/python-connector-example.html#connecting-with-oauth Default: None |
fivetran_log_config.snowflake_destination_config.oauth_config.authority_url ❓ string | Authority url of your identity provider |
fivetran_log_config.snowflake_destination_config.oauth_config.client_id ❓ string | client id of your registered application |
fivetran_log_config.snowflake_destination_config.oauth_config.provider ❓ Enum | One of: "microsoft", "okta" |
fivetran_log_config.snowflake_destination_config.oauth_config.scopes ❓ array | scopes required to connect to snowflake |
fivetran_log_config.snowflake_destination_config.oauth_config.scopes.string string | |
fivetran_log_config.snowflake_destination_config.oauth_config.client_secret One of string(password), null | client secret of the application if use_certificate = false Default: None |
fivetran_log_config.snowflake_destination_config.oauth_config.encoded_oauth_private_key One of string, null | base64 encoded private key content if use_certificate = true Default: None |
fivetran_log_config.snowflake_destination_config.oauth_config.encoded_oauth_public_key One of string, null | base64 encoded certificate content if use_certificate = true Default: None |
fivetran_log_config.snowflake_destination_config.oauth_config.use_certificate boolean | Do you want to use certificate and private key to authenticate using oauth Default: False |
history_sync_lookback_period integer | The number of days to look back when extracting connectors' sync history. Default: 7 |
include_column_lineage boolean | Populates table->table column lineage. Default: True |
platform_instance One of string, null | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details. Default: None |
env string | The environment that all assets produced by this connector belong to Default: PROD |
connector_patterns AllowDenyPattern | Filtering regex patterns for connector names. |
connector_patterns.ignoreCase One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
connector_patterns.allow array | List of regex patterns to include in ingestion Default: ['.*'] |
connector_patterns.allow.string string | |
connector_patterns.deny array | List of regex patterns to exclude from ingestion. Default: [] |
connector_patterns.deny.string string | |
destination_patterns AllowDenyPattern | Regex patterns for destination ids to filter in ingestion. Fivetran destination IDs are usually two-word identifiers, e.g. canyon_tolerable, and are not the same as the destination database name. They're visible in the Fivetran UI under Destinations -> Overview -> Destination Group ID. |
destination_patterns.ignoreCase One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
destination_patterns.allow array | List of regex patterns to include in ingestion Default: ['.*'] |
destination_patterns.allow.string string | |
destination_patterns.deny array | List of regex patterns to exclude from ingestion. Default: [] |
destination_patterns.deny.string string | |
destination_to_platform_instance map(str,PlatformDetail) | A mapping of destination id to its platform/instance/env details. |
destination_to_platform_instance.`key`.platform One of string, null | Override the platform type detection. Default: None |
destination_to_platform_instance.`key`.database One of string, null | The database that all assets produced by this connector belong to. For destinations, this defaults to the fivetran log config's database. Default: None |
destination_to_platform_instance.`key`.include_schema_in_urn boolean | Include schema in the dataset URN. In some cases, the schema is not relevant to the dataset URN and Fivetran sets it to the source and destination table names in the connector. Default: True |
destination_to_platform_instance.`key`.platform_instance One of string, null | The instance of the platform that all assets produced by this recipe belong to Default: None |
destination_to_platform_instance.`key`.env string | The environment that all assets produced by DataHub platform ingestion source belong to Default: PROD |
sources_to_platform_instance map(str,PlatformDetail) | A mapping from connector id to its platform/instance/env/database details. |
sources_to_platform_instance.`key`.platform One of string, null | Override the platform type detection. Default: None |
sources_to_platform_instance.`key`.database One of string, null | The database that all assets produced by this connector belong to. For destinations, this defaults to the fivetran log config's database. Default: None |
sources_to_platform_instance.`key`.include_schema_in_urn boolean | Include schema in the dataset URN. In some cases, the schema is not relevant to the dataset URN and Fivetran sets it to the source and destination table names in the connector. Default: True |
sources_to_platform_instance.`key`.platform_instance One of string, null | The instance of the platform that all assets produced by this recipe belong to Default: None |
sources_to_platform_instance.`key`.env string | The environment that all assets produced by DataHub platform ingestion source belong to Default: PROD |
stateful_ingestion One of StatefulStaleMetadataRemovalConfig, null | Fivetran Stateful Ingestion Config. Default: None |
stateful_ingestion.enabled boolean | Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False Default: False |
stateful_ingestion.fail_safe_threshold number | Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'. Default: 75.0 |
stateful_ingestion.remove_stale_metadata boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True |
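For example, a sketch of a recipe with stateful ingestion enabled so that connectors deleted from Fivetran are soft-deleted in DataHub; note that a stable pipeline_name is required for state to be tracked across runs:

```yaml
pipeline_name: fivetran_prod_ingestion # any stable, unique name
source:
  type: fivetran
  config:
    fivetran_log_config:
      destination_platform: snowflake
      # ... destination config as in the starter recipe ...
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true
```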
The JSONSchema for this configuration is inlined below.
```json
{
"$defs": {
"AllowDenyPattern": {
"additionalProperties": false,
"description": "A class to store allow deny regexes",
"properties": {
"allow": {
"default": [
".*"
],
"description": "List of regex patterns to include in ingestion",
"items": {
"type": "string"
},
"title": "Allow",
"type": "array"
},
"deny": {
"default": [],
"description": "List of regex patterns to exclude from ingestion.",
"items": {
"type": "string"
},
"title": "Deny",
"type": "array"
},
"ignoreCase": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Whether to ignore case sensitivity during pattern matching.",
"title": "Ignorecase"
}
},
"title": "AllowDenyPattern",
"type": "object"
},
"BigQueryDestinationConfig": {
"additionalProperties": false,
"properties": {
"credential": {
"anyOf": [
{
"$ref": "#/$defs/GCPCredential"
},
{
"type": "null"
}
],
"default": null,
"description": "BigQuery credential informations"
},
"extra_client_options": {
"additionalProperties": true,
"default": {},
"description": "Additional options to pass to google.cloud.logging_v2.client.Client.",
"title": "Extra Client Options",
"type": "object"
},
"project_on_behalf": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "[Advanced] The BigQuery project in which queries are executed. Will be passed when creating a job. If not passed, falls back to the project associated with the service account.",
"title": "Project On Behalf"
},
"dataset": {
"description": "The fivetran connector log dataset.",
"title": "Dataset",
"type": "string"
}
},
"required": [
"dataset"
],
"title": "BigQueryDestinationConfig",
"type": "object"
},
"DatabricksDestinationConfig": {
"additionalProperties": false,
"properties": {
"scheme": {
"default": "databricks",
"title": "Scheme",
"type": "string"
},
"token": {
"description": "Databricks personal access token",
"title": "Token",
"type": "string"
},
"workspace_url": {
"description": "Databricks workspace url. e.g. https://my-workspace.cloud.databricks.com",
"title": "Workspace Url",
"type": "string"
},
"warehouse_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "SQL Warehouse id, for running queries. Must be explicitly provided to enable SQL-based features. Required for the following features that need SQL access: 1) Tag extraction (include_tags=True) - queries system.information_schema.tags 2) Hive Metastore catalog (include_hive_metastore=True) - queries legacy hive_metastore catalog 3) System table lineage (lineage_data_source=SYSTEM_TABLES) - queries system.access.table_lineage/column_lineage 4) Data profiling (profiling.enabled=True) - runs SELECT/ANALYZE queries on tables. When warehouse_id is missing, these features will be automatically disabled (with warnings) to allow ingestion to continue.",
"title": "Warehouse Id"
},
"extra_client_options": {
"additionalProperties": true,
"default": {},
"description": "Additional options to pass to Databricks SQLAlchemy client.",
"title": "Extra Client Options",
"type": "object"
},
"catalog": {
"description": "The fivetran connector log catalog.",
"title": "Catalog",
"type": "string"
},
"log_schema": {
"description": "The fivetran connector log schema.",
"title": "Log Schema",
"type": "string"
}
},
"required": [
"token",
"workspace_url",
"catalog",
"log_schema"
],
"title": "DatabricksDestinationConfig",
"type": "object"
},
"FivetranLogConfig": {
"additionalProperties": false,
"properties": {
"destination_platform": {
"default": "snowflake",
"description": "The destination platform where fivetran connector log tables are dumped.",
"enum": [
"snowflake",
"bigquery",
"databricks"
],
"title": "Destination Platform",
"type": "string"
},
"snowflake_destination_config": {
"anyOf": [
{
"$ref": "#/$defs/SnowflakeDestinationConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "If destination platform is 'snowflake', provide snowflake configuration."
},
"bigquery_destination_config": {
"anyOf": [
{
"$ref": "#/$defs/BigQueryDestinationConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "If destination platform is 'bigquery', provide bigquery configuration."
},
"databricks_destination_config": {
"anyOf": [
{
"$ref": "#/$defs/DatabricksDestinationConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "If destination platform is 'databricks', provide databricks configuration."
}
},
"title": "FivetranLogConfig",
"type": "object"
},
"GCPCredential": {
"additionalProperties": false,
"properties": {
"project_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Project id to set the credentials",
"title": "Project Id"
},
"private_key_id": {
"description": "Private key id",
"title": "Private Key Id",
"type": "string"
},
"private_key": {
"description": "Private key in a form of '-----BEGIN PRIVATE KEY-----\\nprivate-key\\n-----END PRIVATE KEY-----\\n'",
"title": "Private Key",
"type": "string"
},
"client_email": {
"description": "Client email",
"title": "Client Email",
"type": "string"
},
"client_id": {
"description": "Client Id",
"title": "Client Id",
"type": "string"
},
"auth_uri": {
"default": "https://accounts.google.com/o/oauth2/auth",
"description": "Authentication uri",
"title": "Auth Uri",
"type": "string"
},
"token_uri": {
"default": "https://oauth2.googleapis.com/token",
"description": "Token uri",
"title": "Token Uri",
"type": "string"
},
"auth_provider_x509_cert_url": {
"default": "https://www.googleapis.com/oauth2/v1/certs",
"description": "Auth provider x509 certificate url",
"title": "Auth Provider X509 Cert Url",
"type": "string"
},
"type": {
"default": "service_account",
"description": "Authentication type",
"title": "Type",
"type": "string"
},
"client_x509_cert_url": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "If not set it will be default to https://www.googleapis.com/robot/v1/metadata/x509/client_email",
"title": "Client X509 Cert Url"
}
},
"required": [
"private_key_id",
"private_key",
"client_email",
"client_id"
],
"title": "GCPCredential",
"type": "object"
},
"OAuthConfiguration": {
"additionalProperties": false,
"properties": {
"provider": {
"$ref": "#/$defs/OAuthIdentityProvider",
"description": "Identity provider for oauth.Supported providers are microsoft and okta."
},
"authority_url": {
"description": "Authority url of your identity provider",
"title": "Authority Url",
"type": "string"
},
"client_id": {
"description": "client id of your registered application",
"title": "Client Id",
"type": "string"
},
"scopes": {
"description": "scopes required to connect to snowflake",
"items": {
"type": "string"
},
"title": "Scopes",
"type": "array"
},
"use_certificate": {
"default": false,
"description": "Do you want to use certificate and private key to authenticate using oauth",
"title": "Use Certificate",
"type": "boolean"
},
"client_secret": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "client secret of the application if use_certificate = false",
"title": "Client Secret"
},
"encoded_oauth_public_key": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "base64 encoded certificate content if use_certificate = true",
"title": "Encoded Oauth Public Key"
},
"encoded_oauth_private_key": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "base64 encoded private key content if use_certificate = true",
"title": "Encoded Oauth Private Key"
}
},
"required": [
"provider",
"authority_url",
"client_id",
"scopes"
],
"title": "OAuthConfiguration",
"type": "object"
},
"OAuthIdentityProvider": {
"enum": [
"microsoft",
"okta"
],
"title": "OAuthIdentityProvider",
"type": "string"
},
"PlatformDetail": {
"additionalProperties": false,
"properties": {
"platform": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Override the platform type detection.",
"title": "Platform"
},
"platform_instance": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The instance of the platform that all assets produced by this recipe belong to",
"title": "Platform Instance"
},
"env": {
"default": "PROD",
"description": "The environment that all assets produced by DataHub platform ingestion source belong to",
"title": "Env",
"type": "string"
},
"database": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The database that all assets produced by this connector belong to. For destinations, this defaults to the fivetran log config's database.",
"title": "Database"
},
"include_schema_in_urn": {
"default": true,
"description": "Include schema in the dataset URN. In some cases, the schema is not relevant to the dataset URN and Fivetran sets it to the source and destination table names in the connector.",
"title": "Include Schema In Urn",
"type": "boolean"
}
},
"title": "PlatformDetail",
"type": "object"
},
"SnowflakeDestinationConfig": {
"additionalProperties": false,
"properties": {
"options": {
"additionalProperties": true,
"description": "Any options specified here will be passed to [SQLAlchemy.create_engine](https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine) as kwargs.",
"title": "Options",
"type": "object"
},
"username": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Snowflake username.",
"title": "Username"
},
"password": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "Snowflake password.",
"title": "Password"
},
"private_key": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Private key in a form of '-----BEGIN PRIVATE KEY-----\\nprivate-key\\n-----END PRIVATE KEY-----\\n' if using key pair authentication. Encrypted version of private key will be in a form of '-----BEGIN ENCRYPTED PRIVATE KEY-----\\nencrypted-private-key\\n-----END ENCRYPTED PRIVATE KEY-----\\n' See: https://docs.snowflake.com/en/user-guide/key-pair-auth.html",
"title": "Private Key"
},
"private_key_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The path to the private key if using key pair authentication. Ignored if `private_key` is set. See: https://docs.snowflake.com/en/user-guide/key-pair-auth.html",
"title": "Private Key Path"
},
"private_key_password": {
"anyOf": [
{
"format": "password",
"type": "string",
"writeOnly": true
},
{
"type": "null"
}
],
"default": null,
"description": "Password for your private key. Required if using key pair authentication with encrypted private key.",
"title": "Private Key Password"
},
"oauth_config": {
"anyOf": [
{
"$ref": "#/$defs/OAuthConfiguration"
},
{
"type": "null"
}
],
"default": null,
"description": "oauth configuration - https://docs.snowflake.com/en/user-guide/python-connector-example.html#connecting-with-oauth"
},
"authentication_type": {
"default": "DEFAULT_AUTHENTICATOR",
"description": "The type of authenticator to use when connecting to Snowflake. Supports \"DEFAULT_AUTHENTICATOR\", \"OAUTH_AUTHENTICATOR\", \"EXTERNAL_BROWSER_AUTHENTICATOR\" and \"KEY_PAIR_AUTHENTICATOR\".",
"title": "Authentication Type",
"type": "string"
},
"account_id": {
"description": "Snowflake account identifier. e.g. xy12345, xy12345.us-east-2.aws, xy12345.us-central1.gcp, xy12345.central-us.azure, xy12345.us-west-2.privatelink. Refer [Account Identifiers](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html#format-2-legacy-account-locator-in-a-region) for more details.",
"title": "Account Id",
"type": "string"
},
"warehouse": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Snowflake warehouse.",
"title": "Warehouse"
},
"role": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Snowflake role.",
"title": "Role"
},
"connect_args": {
"anyOf": [
{
"additionalProperties": true,
"type": "object"
},
{
"type": "null"
}
],
"default": null,
"description": "Connect args to pass to Snowflake SqlAlchemy driver",
"title": "Connect Args"
},
"token": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "OAuth token from external identity provider. Not recommended for most use cases because it will not be able to refresh once expired.",
"title": "Token"
},
"snowflake_domain": {
"default": "snowflakecomputing.com",
"description": "Snowflake domain. Use 'snowflakecomputing.com' for most regions or 'snowflakecomputing.cn' for China (cn-northwest-1) region.",
"title": "Snowflake Domain",
"type": "string"
},
"database": {
"description": "The fivetran connector log database.",
"title": "Database",
"type": "string"
},
"log_schema": {
"description": "The fivetran connector log schema.",
"title": "Log Schema",
"type": "string"
}
},
"required": [
"account_id",
"database",
"log_schema"
],
"title": "SnowflakeDestinationConfig",
"type": "object"
},
"StatefulStaleMetadataRemovalConfig": {
"additionalProperties": false,
"description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
"properties": {
"enabled": {
"default": false,
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"title": "Enabled",
"type": "boolean"
},
"remove_stale_metadata": {
"default": true,
"description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
"title": "Remove Stale Metadata",
"type": "boolean"
},
"fail_safe_threshold": {
"default": 75.0,
"description": "Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'.",
"maximum": 100.0,
"minimum": 0.0,
"title": "Fail Safe Threshold",
"type": "number"
}
},
"title": "StatefulStaleMetadataRemovalConfig",
"type": "object"
}
},
"additionalProperties": false,
"properties": {
"env": {
"default": "PROD",
"description": "The environment that all assets produced by this connector belong to",
"title": "Env",
"type": "string"
},
"platform_instance": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.",
"title": "Platform Instance"
},
"stateful_ingestion": {
"anyOf": [
{
"$ref": "#/$defs/StatefulStaleMetadataRemovalConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Fivetran Stateful Ingestion Config."
},
"fivetran_log_config": {
"$ref": "#/$defs/FivetranLogConfig",
"description": "Fivetran log connector destination server configurations."
},
"connector_patterns": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Filtering regex patterns for connector names."
},
"destination_patterns": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns for destination ids to filter in ingestion. Fivetran destination IDs are usually two word identifiers e.g. canyon_tolerable, and are not the same as the destination database name. They're visible in the Fivetran UI under Destinations -> Overview -> Destination Group ID."
},
"include_column_lineage": {
"default": true,
"description": "Populates table->table column lineage.",
"title": "Include Column Lineage",
"type": "boolean"
},
"sources_to_platform_instance": {
"additionalProperties": {
"$ref": "#/$defs/PlatformDetail"
},
"default": {},
"description": "A mapping from connector id to its platform/instance/env/database details.",
"title": "Sources To Platform Instance",
"type": "object"
},
"destination_to_platform_instance": {
"additionalProperties": {
"$ref": "#/$defs/PlatformDetail"
},
"default": {},
"description": "A mapping of destination id to its platform/instance/env details.",
"title": "Destination To Platform Instance",
"type": "object"
},
"history_sync_lookback_period": {
"default": 7,
"description": "The number of days to look back when extracting connectors' sync history.",
"title": "History Sync Lookback Period",
"type": "integer"
}
},
"required": [
"fivetran_log_config"
],
"title": "FivetranSourceConfig",
"type": "object"
}
```
Code Coordinates
- Class Name: `datahub.ingestion.source.fivetran.fivetran.FivetranSource`
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Fivetran, feel free to ping us on our Slack.