
Fivetran

Incubating

Important Capabilities

  • Column-level Lineage: Enabled by default; can be disabled via the include_column_lineage configuration.
  • Detect Deleted Entities: Enabled by default via stateful ingestion.
  • Platform Instance: Enabled by default.
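
For example, column-level lineage can be disabled in the recipe via the include_column_lineage flag; a minimal sketch:

source:
  type: fivetran
  config:
    include_column_lineage: false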

This plugin extracts Fivetran users, connectors, destinations, and sync history.

Integration Details

This source extracts the following:

  • Connectors in Fivetran as Data Pipelines and Data Jobs, representing lineage between source and destination.
  • Connector sources as DataJob input Datasets.
  • Connector destinations as DataJob output Datasets.
  • Connector runs as DataProcessInstances (i.e., DataJob runs).

Configuration Notes

  1. Fivetran supports the Fivetran Platform Connector, which dumps log events and metadata about connectors, destinations, users, and roles into your destination.
  2. You need to set up and start the initial sync of the Fivetran Platform Connector before using this source; refer to the Fivetran documentation for setup instructions.
  3. Once the initial sync of your Fivetran Platform Connector is complete, provide the connector's destination platform and its configuration in the recipe.
  4. Enable automatic schema updates (the default) on the Fivetran Platform Connector configured for DataHub; this ensures the latest schema changes are applied and avoids inconsistent data syncs.
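
In recipe terms, note 3 means choosing the destination platform and filling in its matching config block under fivetran_log_config; a minimal sketch (the full destination-specific blocks appear in the starter recipe below):

fivetran_log_config:
  destination_platform: snowflake # one of "snowflake", "bigquery", "databricks"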

Concept mapping

Fivetran concepts map to DataHub concepts as follows:

  • Connector → DataJob
  • Source → Dataset
  • Destination → Dataset
  • Connector Run → DataProcessInstance

A connector's source and destination are mapped to Datasets and attached as the DataJob's input and output.

Current limitations

This source works only with the following destinations:

  • Snowflake destination
  • BigQuery destination
  • Databricks destination

Snowflake Destination Configuration Guide

  1. If your Fivetran Platform Connector destination is Snowflake, you need to provide a user whose role has the correct privileges to fetch metadata.
  2. A Snowflake system admin can follow this guide to create a fivetran_datahub role, assign it the required privileges, and assign it to a user by executing the following Snowflake commands from a user with the ACCOUNTADMIN role or MANAGE GRANTS privilege.
create or replace role fivetran_datahub;

// Grant access to a warehouse to run queries to view metadata
grant operate, usage on warehouse "<your-warehouse>" to role fivetran_datahub;

// Grant access to view database and schema in which your log and metadata tables exist
grant usage on DATABASE "<fivetran-log-database>" to role fivetran_datahub;
grant usage on SCHEMA "<fivetran-log-database>"."<fivetran-log-schema>" to role fivetran_datahub;

// Grant access to execute select query on schema in which your log and metadata tables exist
grant select on all tables in SCHEMA "<fivetran-log-database>"."<fivetran-log-schema>" to role fivetran_datahub;

// Grant the fivetran_datahub role to the Snowflake user.
grant role fivetran_datahub to user snowflake_user;
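
With the role in place, the recipe's snowflake_destination_config can reference it. A minimal sketch, where all values other than the fivetran_datahub role are placeholders:

fivetran_log_config:
  destination_platform: snowflake
  snowflake_destination_config:
    account_id: "<your-account-id>"
    warehouse: "<your-warehouse>"
    database: "<fivetran-log-database>"
    log_schema: "<fivetran-log-schema>"
    username: "${SNOWFLAKE_USER}"
    password: "${SNOWFLAKE_PASS}"
    role: fivetran_datahub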

BigQuery Destination Configuration Guide

  1. If your Fivetran Platform Connector destination is BigQuery, you need to set up a service account as per the BigQuery docs and grant it the BigQuery Data Viewer and BigQuery Job User IAM roles.
  2. Create and download a service account JSON keyfile, and provide the BigQuery connection credentials in the bigquery_destination_config (see the sketch below).
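
A minimal sketch of how the keyfile's fields map onto bigquery_destination_config; all values are placeholders taken from the downloaded service account JSON keyfile:

fivetran_log_config:
  destination_platform: bigquery
  bigquery_destination_config:
    dataset: "<fivetran-log-dataset>"
    credential:
      project_id: "<project-id>"
      private_key_id: "<private-key-id>"
      private_key: "<private-key>"
      client_email: "<service-account-email>"
      client_id: "<client-id>"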

Databricks Destination Configuration Guide

  1. Get your Databricks instance's workspace URL.
  2. Create a Databricks Service Principal
    1. You can skip this step and use your own account to get things running quickly, but we strongly recommend creating a dedicated service principal for production use.
  3. Generate a Databricks Personal Access Token following these guides:
    1. Service Principals
    2. Personal Access Tokens
  4. Provision your service account. To ingest your workspace's metadata and lineage, your service principal must have all of the following (a SQL sketch follows this list):
    1. One of: metastore admin role, ownership of, or USE CATALOG privilege on any catalogs you want to ingest
    2. One of: metastore admin role, ownership of, or USE SCHEMA privilege on any schemas you want to ingest
    3. Ownership of or SELECT privilege on any tables and views you want to ingest
    4. Ownership documentation
    5. Privileges documentation
  5. Check the starter recipe below and replace workspace_url and token with your information from the previous steps.
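
As a rough sketch of step 4, the privileges can be granted with Unity Catalog SQL along these lines; the catalog, schema, and service principal application ID below are placeholders, not values from this guide:

-- Grant the service principal read access to the catalog and schema to ingest.
GRANT USE CATALOG ON CATALOG my_catalog TO `<service-principal-application-id>`;
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `<service-principal-application-id>`;
-- SELECT on a schema covers its current and future tables and views.
GRANT SELECT ON SCHEMA my_catalog.my_schema TO `<service-principal-application-id>`;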

Advanced Configurations

Working with Platform Instances

If multiple instances of a source or destination system are referenced in your Fivetran setup, you need to configure platform instances for these systems in the Fivetran recipe to generate correct lineage edges. Refer to the Working with Platform Instances document for more details.

When configuring the platform instance for a source system, provide the connector id as the key; for a destination system, provide the destination id as the key. When creating the connection details in the Fivetran UI, make a note of the destination's Group ID, as it is the key needed in the destination_to_platform_instance configuration.

In this case the configuration would be something like:

destination_to_platform_instance:
  greyish_positive: # the destination Group ID from the Fivetran UI (a BigQuery destination in this example)
    database: <bigquery-project-id>
    env: PROD

Example - Multiple Postgres source connectors, each reading from a different Postgres instance

# Map of connector source to platform instance
sources_to_platform_instance:
  postgres_connector_id1:
    platform_instance: cloud_postgres_instance
    env: PROD

  postgres_connector_id2:
    platform_instance: local_postgres_instance
    env: DEV

Example - Multiple Snowflake destinations, each writing to a different Snowflake instance

# Map of destination to platform instance
destination_to_platform_instance:
  snowflake_destination_id1:
    platform_instance: prod_snowflake_instance
    env: PROD

  snowflake_destination_id2:
    platform_instance: dev_snowflake_instance
    env: PROD

CLI based Ingestion
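
The plugin must be installed before running a recipe; DataHub source plugins follow the standard install pattern:

pip install 'acryl-datahub[fivetran]'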

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: fivetran
  config:
    # Fivetran log connector destination server configurations
    fivetran_log_config:
      destination_platform: snowflake
      # Optional - If destination platform is 'snowflake', provide snowflake configuration.
      snowflake_destination_config:
        # Coordinates
        account_id: "abc48144"
        warehouse: "COMPUTE_WH"
        database: "MY_SNOWFLAKE_DB"
        log_schema: "FIVETRAN_LOG"

        # Credentials
        username: "${SNOWFLAKE_USER}"
        password: "${SNOWFLAKE_PASS}"
        role: "snowflake_role"
      # Optional - If destination platform is 'bigquery', provide bigquery configuration.
      bigquery_destination_config:
        # Credentials
        credential:
          private_key_id: "project_key_id"
          project_id: "project_id"
          client_email: "client_email"
          client_id: "client_id"
          private_key: "private_key"
        dataset: "fivetran_log_dataset"
      # Optional - If destination platform is 'databricks', provide databricks configuration.
      databricks_destination_config:
        # Credentials
        token: "token"
        workspace_url: "workspace_url"
        warehouse_id: "warehouse_id"

        # Coordinates
        catalog: "fivetran_catalog"
        log_schema: "fivetran_log"

    # Optional - filter for certain connector names instead of ingesting everything.
    # connector_patterns:
    #   allow:
    #     - connector_name

    # Optional -- A mapping of each connector's sources to its database.
    # sources_to_database:
    #   connector_id: source_db

    # Optional -- Only required to configure a platform instance for sources.
    # A mapping of Fivetran connector id to data platform instance
    # sources_to_platform_instance:
    #   connector_id:
    #     platform_instance: cloud_instance
    #     env: DEV

    # Optional -- Only required to configure a platform instance for destinations.
    # A mapping of Fivetran destination id to data platform instance
    # destination_to_platform_instance:
    #   destination_id:
    #     platform_instance: cloud_instance
    #     env: DEV

sink:
  # sink configs
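
With the recipe saved to a file (the filename below is just an example), run it with the DataHub CLI:

datahub ingest -c fivetran_recipe.yaml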

Config Details

Note that a . is used to denote nested fields in the YAML recipe.
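
For example, fivetran_log_config.snowflake_destination_config.account_id in the table below corresponds to this nesting in the recipe:

fivetran_log_config:
  snowflake_destination_config:
    account_id: "abc48144"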

Each entry below lists the field path, its type, a description, and its default value where applicable.
fivetran_log_config 
FivetranLogConfig
fivetran_log_config.destination_platform
Enum
One of: "snowflake", "bigquery", "databricks"
Default: snowflake
fivetran_log_config.bigquery_destination_config
One of BigQueryDestinationConfig, null
If destination platform is 'bigquery', provide bigquery configuration.
Default: None
fivetran_log_config.bigquery_destination_config.dataset 
string
The fivetran connector log dataset.
fivetran_log_config.bigquery_destination_config.extra_client_options
object
Additional options to pass to google.cloud.logging_v2.client.Client.
Default: {}
fivetran_log_config.bigquery_destination_config.project_on_behalf
One of string, null
[Advanced] The BigQuery project in which queries are executed. Will be passed when creating a job. If not passed, falls back to the project associated with the service account.
Default: None
fivetran_log_config.bigquery_destination_config.credential
One of GCPCredential, null
BigQuery credential information
Default: None
fivetran_log_config.bigquery_destination_config.credential.client_email 
string
Client email
fivetran_log_config.bigquery_destination_config.credential.client_id 
string
Client Id
fivetran_log_config.bigquery_destination_config.credential.private_key 
string
Private key in a form of '-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n'
fivetran_log_config.bigquery_destination_config.credential.private_key_id 
string
Private key id
fivetran_log_config.bigquery_destination_config.credential.auth_provider_x509_cert_url
string
Auth provider x509 certificate url
fivetran_log_config.bigquery_destination_config.credential.auth_uri
string
Authentication uri
fivetran_log_config.bigquery_destination_config.credential.client_x509_cert_url
One of string, null
If not set, it defaults to https://www.googleapis.com/robot/v1/metadata/x509/client_email
Default: None
fivetran_log_config.bigquery_destination_config.credential.project_id
One of string, null
Project id to set the credentials
Default: None
fivetran_log_config.bigquery_destination_config.credential.token_uri
string
Token uri
fivetran_log_config.bigquery_destination_config.credential.type
string
Authentication type
Default: service_account
fivetran_log_config.databricks_destination_config
One of DatabricksDestinationConfig, null
If destination platform is 'databricks', provide databricks configuration.
Default: None
fivetran_log_config.databricks_destination_config.catalog 
string
The fivetran connector log catalog.
fivetran_log_config.databricks_destination_config.log_schema 
string
The fivetran connector log schema.
fivetran_log_config.databricks_destination_config.token 
string
Databricks personal access token
fivetran_log_config.databricks_destination_config.workspace_url 
string
Databricks workspace url. e.g. https://my-workspace.cloud.databricks.com
fivetran_log_config.databricks_destination_config.extra_client_options
object
Additional options to pass to Databricks SQLAlchemy client.
Default: {}
fivetran_log_config.databricks_destination_config.scheme
string
Default: databricks
fivetran_log_config.databricks_destination_config.warehouse_id
One of string, null
SQL Warehouse id, for running queries. Must be explicitly provided to enable SQL-based features. Required for the following features that need SQL access: 1) Tag extraction (include_tags=True) - queries system.information_schema.tags 2) Hive Metastore catalog (include_hive_metastore=True) - queries legacy hive_metastore catalog 3) System table lineage (lineage_data_source=SYSTEM_TABLES) - queries system.access.table_lineage/column_lineage 4) Data profiling (profiling.enabled=True) - runs SELECT/ANALYZE queries on tables. When warehouse_id is missing, these features will be automatically disabled (with warnings) to allow ingestion to continue.
Default: None
fivetran_log_config.snowflake_destination_config
One of SnowflakeDestinationConfig, null
If destination platform is 'snowflake', provide snowflake configuration.
Default: None
fivetran_log_config.snowflake_destination_config.account_id 
string
Snowflake account identifier. e.g. xy12345, xy12345.us-east-2.aws, xy12345.us-central1.gcp, xy12345.central-us.azure, xy12345.us-west-2.privatelink. Refer to Account Identifiers for more details.
fivetran_log_config.snowflake_destination_config.database 
string
The fivetran connector log database.
fivetran_log_config.snowflake_destination_config.log_schema 
string
The fivetran connector log schema.
fivetran_log_config.snowflake_destination_config.authentication_type
string
The type of authenticator to use when connecting to Snowflake. Supports "DEFAULT_AUTHENTICATOR", "OAUTH_AUTHENTICATOR", "EXTERNAL_BROWSER_AUTHENTICATOR" and "KEY_PAIR_AUTHENTICATOR".
Default: DEFAULT_AUTHENTICATOR
fivetran_log_config.snowflake_destination_config.connect_args
One of object, null
Connect args to pass to Snowflake SqlAlchemy driver
Default: None
fivetran_log_config.snowflake_destination_config.options
object
Any options specified here will be passed to SQLAlchemy.create_engine as kwargs.
fivetran_log_config.snowflake_destination_config.password
One of string(password), null
Snowflake password.
Default: None
fivetran_log_config.snowflake_destination_config.private_key
One of string, null
Private key in a form of '-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n' if using key pair authentication. Encrypted version of private key will be in a form of '-----BEGIN ENCRYPTED PRIVATE KEY-----\nencrypted-private-key\n-----END ENCRYPTED PRIVATE KEY-----\n' See: https://docs.snowflake.com/en/user-guide/key-pair-auth.html
Default: None
fivetran_log_config.snowflake_destination_config.private_key_password
One of string(password), null
Password for your private key. Required if using key pair authentication with encrypted private key.
Default: None
fivetran_log_config.snowflake_destination_config.private_key_path
One of string, null
The path to the private key if using key pair authentication. Ignored if private_key is set. See: https://docs.snowflake.com/en/user-guide/key-pair-auth.html
Default: None
fivetran_log_config.snowflake_destination_config.role
One of string, null
Snowflake role.
Default: None
fivetran_log_config.snowflake_destination_config.snowflake_domain
string
Snowflake domain. Use 'snowflakecomputing.com' for most regions or 'snowflakecomputing.cn' for China (cn-northwest-1) region.
Default: snowflakecomputing.com
fivetran_log_config.snowflake_destination_config.token
One of string, null
OAuth token from external identity provider. Not recommended for most use cases because it will not be able to refresh once expired.
Default: None
fivetran_log_config.snowflake_destination_config.username
One of string, null
Snowflake username.
Default: None
fivetran_log_config.snowflake_destination_config.warehouse
One of string, null
Snowflake warehouse.
Default: None
fivetran_log_config.snowflake_destination_config.oauth_config
One of OAuthConfiguration, null
oauth configuration - https://docs.snowflake.com/en/user-guide/python-connector-example.html#connecting-with-oauth
Default: None
fivetran_log_config.snowflake_destination_config.oauth_config.authority_url 
string
Authority url of your identity provider
fivetran_log_config.snowflake_destination_config.oauth_config.client_id 
string
client id of your registered application
fivetran_log_config.snowflake_destination_config.oauth_config.provider 
Enum
One of: "microsoft", "okta"
fivetran_log_config.snowflake_destination_config.oauth_config.scopes 
array
scopes required to connect to snowflake
fivetran_log_config.snowflake_destination_config.oauth_config.scopes.string
string
fivetran_log_config.snowflake_destination_config.oauth_config.client_secret
One of string(password), null
client secret of the application if use_certificate = false
Default: None
fivetran_log_config.snowflake_destination_config.oauth_config.encoded_oauth_private_key
One of string, null
base64 encoded private key content if use_certificate = true
Default: None
fivetran_log_config.snowflake_destination_config.oauth_config.encoded_oauth_public_key
One of string, null
base64 encoded certificate content if use_certificate = true
Default: None
fivetran_log_config.snowflake_destination_config.oauth_config.use_certificate
boolean
Whether to use a certificate and private key to authenticate via OAuth
Default: False
history_sync_lookback_period
integer
The number of days to look back when extracting connectors' sync history.
Default: 7
include_column_lineage
boolean
Populates table->table column lineage.
Default: True
platform_instance
One of string, null
The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.
Default: None
env
string
The environment that all assets produced by this connector belong to
Default: PROD
connector_patterns
AllowDenyPattern
A class to store allow deny regexes
connector_patterns.ignoreCase
One of boolean, null
Whether to ignore case during pattern matching.
Default: True
connector_patterns.allow
array
List of regex patterns to include in ingestion
Default: ['.*']
connector_patterns.allow.string
string
connector_patterns.deny
array
List of regex patterns to exclude from ingestion.
Default: []
connector_patterns.deny.string
string
destination_patterns
AllowDenyPattern
A class to store allow deny regexes
destination_patterns.ignoreCase
One of boolean, null
Whether to ignore case during pattern matching.
Default: True
destination_patterns.allow
array
List of regex patterns to include in ingestion
Default: ['.*']
destination_patterns.allow.string
string
destination_patterns.deny
array
List of regex patterns to exclude from ingestion.
Default: []
destination_patterns.deny.string
string
destination_to_platform_instance
map(str,PlatformDetail)
destination_to_platform_instance.key.platform
One of string, null
Override the platform type detection.
Default: None
destination_to_platform_instance.key.database
One of string, null
The database that all assets produced by this connector belong to. For destinations, this defaults to the fivetran log config's database.
Default: None
destination_to_platform_instance.key.include_schema_in_urn
boolean
Include schema in the dataset URN. In some cases, the schema is not relevant to the dataset URN and Fivetran sets it to the source and destination table names in the connector.
Default: True
destination_to_platform_instance.key.platform_instance
One of string, null
The instance of the platform that all assets produced by this recipe belong to
Default: None
destination_to_platform_instance.key.env
string
The environment that all assets produced by DataHub platform ingestion source belong to
Default: PROD
sources_to_platform_instance
map(str,PlatformDetail)
sources_to_platform_instance.key.platform
One of string, null
Override the platform type detection.
Default: None
sources_to_platform_instance.key.database
One of string, null
The database that all assets produced by this connector belong to. For destinations, this defaults to the fivetran log config's database.
Default: None
sources_to_platform_instance.key.include_schema_in_urn
boolean
Include schema in the dataset URN. In some cases, the schema is not relevant to the dataset URN and Fivetran sets it to the source and destination table names in the connector.
Default: True
sources_to_platform_instance.key.platform_instance
One of string, null
The instance of the platform that all assets produced by this recipe belong to
Default: None
sources_to_platform_instance.key.env
string
The environment that all assets produced by DataHub platform ingestion source belong to
Default: PROD
stateful_ingestion
One of StatefulStaleMetadataRemovalConfig, null
Fivetran Stateful Ingestion Config.
Default: None
stateful_ingestion.enabled
boolean
Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False
Default: False
stateful_ingestion.fail_safe_threshold
number
Prevents a large number of soft deletes, and the state from committing, on accidental changes to the source configuration, when the relative change in entities compared to the previous state exceeds the fail_safe_threshold.
Default: 75.0
stateful_ingestion.remove_stale_metadata
boolean
Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.
Default: True
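
As a sketch, stateful ingestion is typically enabled by setting a top-level pipeline_name in the recipe (the name below is hypothetical) and turning the flag on:

pipeline_name: fivetran_prod_pipeline
source:
  type: fivetran
  config:
    # ... fivetran_log_config and other settings ...
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true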

Code Coordinates

  • Class Name: datahub.ingestion.source.fivetran.fivetran.FivetranSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Fivetran, feel free to ping us on our Slack.