Version: Next

Salesforce

Incubating

Important Capabilities

Capability | Status | Notes
Data Profiling | | Only table-level profiling is supported, via the profiling.enabled config field. Supported for types: Table.
Detect Deleted Entities | | Enabled by default via stateful ingestion.
Domains | | Supported via the domain config field.
Extract Tags | | Enabled by default.
Platform Instance | | Can be equivalent to a Salesforce organization.
Schema Metadata | | Enabled by default.
Table-Level Lineage | | Extracts table-level lineage for Salesforce objects. Supported for types: Custom Object, Object.

Prerequisites

In order to ingest metadata from Salesforce, you will need one of:

  • Salesforce username, password, security token
  • Salesforce username, consumer key and private key for JSON web token access
  • Salesforce instance URL and access token/session ID (suitable for one-shot ingestion only, as the access token typically expires after 2 hours of inactivity)
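
As a rough illustration of the three options above, the sketch below checks which fields a recipe config is still missing for a chosen auth mode. The field names and auth values are taken from the config reference on this page; pairing each auth value with a credential set is an assumption based on the names, and missing_fields is a hypothetical helper, not part of the connector.

```python
from typing import List, Mapping, Optional

# Assumed pairing of each auth value with the credential set it needs.
REQUIRED_FIELDS = {
    "USERNAME_PASSWORD": ("username", "password", "security_token"),
    "JSON_WEB_TOKEN": ("username", "consumer_key", "private_key"),
    "DIRECT_ACCESS_TOKEN": ("instance_url", "access_token"),
}


def missing_fields(auth: str, config: Mapping[str, Optional[str]]) -> List[str]:
    """Return the required fields that are absent or empty for this auth mode."""
    return [name for name in REQUIRED_FIELDS[auth] if not config.get(name)]


if __name__ == "__main__":
    recipe_config = {"username": "user@company", "consumer_key": "my-consumer-key"}
    # Lists whatever is still needed before JSON web token access could work.
    print(missing_fields("JSON_WEB_TOKEN", recipe_config))
```

A check like this can run before ingestion starts, so an incomplete credential set fails early with a clear message.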

The account used to access Salesforce requires the following permissions for this integration to work:

  • View Setup and Configuration
  • View All Data

Integration Details

This plugin extracts Salesforce Standard and Custom Objects and their details (fields, record count, etc.) from a Salesforce instance. The Python library simple-salesforce is used to authenticate and to call the Salesforce REST API, which returns these details from the Salesforce instance.

REST API Resources used in this integration

Concept Mapping

This ingestion source maps the following Source System Concepts to DataHub Concepts:

Source Concept | DataHub Concept | Notes
Salesforce | Data Platform |
Standard Object | Dataset | subtype "Standard Object"
Custom Object | Dataset | subtype "Custom Object"

Caveats

  • This connector has only been tested with Salesforce Developer Edition.
  • This connector currently supports only table-level profiling (row and column counts). Row counts are approximate, as returned by the Salesforce RecordCount REST API.
  • This integration does not support ingesting Salesforce External Objects.
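
To make the row-count caveat concrete, here is a minimal sketch of building a record-count request URL and reading the counts back. It assumes the resource lives at limits/recordCount under the versioned REST path and returns a list of name/count entries under an sObjects key; both are assumptions about the Salesforce REST API, not something this page confirms, and the connector itself may call it differently. The function names are hypothetical, and no network call is made.

```python
from typing import Any, Dict, Iterable
from urllib.parse import urlencode


def record_count_url(instance_url: str, api_version: str, sobjects: Iterable[str]) -> str:
    """Build the (assumed) record-count URL for a comma-separated list of objects."""
    base = instance_url.rstrip("/")
    # Keep commas unescaped so the object list stays readable in the URL.
    query = urlencode({"sObjects": ",".join(sobjects)}, safe=",")
    return f"{base}/services/data/v{api_version}/limits/recordCount?{query}"


def parse_record_counts(payload: Dict[str, Any]) -> Dict[str, int]:
    """Map object name to its approximate row count from the (assumed) response shape."""
    return {entry["name"]: entry["count"] for entry in payload.get("sObjects", [])}


if __name__ == "__main__":
    print(record_count_url("https://mydomain.my.salesforce.com/", "59.0", ["Account", "Lead"]))
```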

CLI based Ingestion

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

pipeline_name: my_salesforce_pipeline
source:
  type: "salesforce"
  config:
    instance_url: "https://mydomain.my.salesforce.com/"
    username: user@company
    password: password_for_user
    security_token: security_token_for_user
    platform_instance: mydomain-dev-ed
    domain:
      sales:
        allow:
          - "Opportunity$"
          - "Lead$"

    object_pattern:
      allow:
        - "Account$"
        - "Opportunity$"
        - "Lead$"

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
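
The recipe's allow patterns end in $ so that a pattern such as "Account$" does not also pick up longer object names. The sketch below approximates that behaviour under two assumptions that this page does not state: patterns are matched from the start of the object name (as Python's re.match does), and a deny match takes precedence over an allow match. The defaults mirror the documented ones (allow ['.*'], deny [], ignoreCase True); is_allowed is a hypothetical stand-in, not the connector's own class.

```python
import re
from typing import Sequence


def is_allowed(
    name: str,
    allow: Sequence[str] = (".*",),
    deny: Sequence[str] = (),
    ignore_case: bool = True,
) -> bool:
    """Return True if name matches an allow pattern and no deny pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    # Deny is checked first, so a denied name is rejected even if it is also allowed.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


if __name__ == "__main__":
    patterns = ["Account$", "Opportunity$", "Lead$"]
    for candidate in ("Account", "AccountHistory", "lead", "Contact"):
        print(candidate, is_allowed(candidate, allow=patterns))
```

Without the trailing $, "Account" would also match names that merely start with Account, such as AccountHistory.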

Config Details

Note that a . is used to denote nested fields in the YAML recipe.
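
As an illustration of the dotted notation, the sketch below expands dotted field names into the nested mapping they stand for in a recipe. expand_dotted is a hypothetical helper written for this example only.

```python
from typing import Any, Dict


def expand_dotted(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"a.b.c": value} style keys into nested dictionaries."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create intermediate levels on demand and descend into them.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


if __name__ == "__main__":
    print(expand_dotted({
        "profiling.enabled": True,
        "profiling.operation_config.profile_day_of_week": 0,
    }))
```

In a recipe, profiling.enabled therefore appears as an enabled key nested under profiling, inside the source config.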

Each entry below gives the field name, followed by its type, description, and default where present.
access_token
One of string, null
Access token for the instance URL
Default: None
api_version
One of string, null
If specified, overrides default version used by the Salesforce package. Example value: '59.0'
Default: None
auth
Enum
One of: "USERNAME_PASSWORD", "DIRECT_ACCESS_TOKEN", "JSON_WEB_TOKEN"
consumer_key
One of string, null
Consumer key for Salesforce JSON web token access
Default: None
ingest_tags
boolean
Ingest tags from the source. This overrides tags entered in the UI.
Default: False
instance_url
One of string, null
Salesforce instance URL, e.g. https://MyDomainName.my.salesforce.com
Default: None
is_sandbox
boolean
Connect to a sandbox instance of your Salesforce organization
Default: False
password
One of string, null
Password for Salesforce user
Default: None
platform
string
Default: salesforce
platform_instance
One of string, null
The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.
Default: None
private_key
One of string, null
Private key as a string for Salesforce JSON web token access
Default: None
security_token
One of string, null
Security token for Salesforce username
Default: None
use_referenced_entities_as_upstreams
boolean
(Experimental) If enabled, referenced entities will be treated as upstream entities.
Default: False
username
One of string, null
Salesforce username
Default: None
env
string
The environment that all assets produced by this connector belong to
Default: PROD
domain
map(str,AllowDenyPattern)
A class to store allow deny regexes
domain.key.allow
array
List of regex patterns to include in ingestion
Default: ['.*']
domain.key.allow.string
string
domain.key.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
domain.key.deny
array
List of regex patterns to exclude from ingestion.
Default: []
domain.key.deny.string
string
object_pattern
AllowDenyPattern
A class to store allow deny regexes
object_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
profile_pattern
AllowDenyPattern
A class to store allow deny regexes
profile_pattern.ignoreCase
One of boolean, null
Whether to ignore case sensitivity during pattern matching.
Default: True
profiling
SalesforceProfilingConfig
profiling.enabled
boolean
Whether profiling should be done. Supports only table-level profiling at this stage
Default: False
profiling.operation_config
OperationConfig
profiling.operation_config.lower_freq_profile_enabled
boolean
Whether to profile at a lower frequency. This does not schedule anything; it only adds checks that determine when profiling should be skipped.
Default: False
profiling.operation_config.profile_date_of_month
One of integer, null
Number from 1 to 31 (inclusive) for the day of the month. If not specified, defaults to None and this field has no effect.
Default: None
profiling.operation_config.profile_day_of_week
One of integer, null
Number from 0 to 6 (inclusive) for the day of the week, where 0 is Monday and 6 is Sunday. If not specified, defaults to None and this field has no effect.
Default: None
stateful_ingestion
One of StatefulIngestionConfig, null
Stateful Ingestion Config
Default: None
stateful_ingestion.enabled
boolean
Whether to enable stateful ingestion. Defaults to True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified; otherwise False.
Default: False
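
The profiling.operation_config fields above describe checks rather than a scheduler. The sketch below shows one way those checks could combine, assuming that every filter that is set must match the current date (this page does not spell out how the two filters interact). should_profile is a hypothetical helper, not the connector's code, and it covers only the operation_config checks, not profiling.enabled.

```python
from datetime import date
from typing import Optional


def should_profile(
    today: date,
    lower_freq_profile_enabled: bool = False,
    profile_day_of_week: Optional[int] = None,
    profile_date_of_month: Optional[int] = None,
) -> bool:
    """Return False when a configured day filter rules out profiling today."""
    if not lower_freq_profile_enabled:
        # With the switch off, the day filters are ignored.
        return True
    # date.weekday() uses 0 for Monday and 6 for Sunday, matching the field description.
    if profile_day_of_week is not None and today.weekday() != profile_day_of_week:
        return False
    if profile_date_of_month is not None and today.day != profile_date_of_month:
        return False
    return True


if __name__ == "__main__":
    print(should_profile(date.today(), lower_freq_profile_enabled=True, profile_day_of_week=0))
```

Under this reading, a pipeline can still run daily while profiling happens only on the chosen day.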

Code Coordinates

  • Class Name: datahub.ingestion.source.salesforce.SalesforceSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Salesforce, feel free to ping us on our Slack.