Salesforce
Important Capabilities
Capability | Status | Notes |
---|---|---|
Data Profiling | ✅ | Only table-level profiling is supported, via the profiling.enabled config field. Supported for types - Table. |
Detect Deleted Entities | ✅ | Enabled by default via stateful ingestion. |
Domains | ✅ | Supported via the domain config field. |
Extract Tags | ✅ | Enabled by default. |
Platform Instance | ✅ | Can be equivalent to a Salesforce organization. |
Schema Metadata | ✅ | Enabled by default. |
Table-Level Lineage | ✅ | Extract table-level lineage for Salesforce objects. Supported for types - Custom Object, Object. |
Prerequisites
In order to ingest metadata from Salesforce, you will need one of:
- Salesforce username, password, security token
- Salesforce username, consumer key and private key for JSON web token access
- Salesforce instance URL and access token/session ID (suitable for one-shot ingestion only, as the access token typically expires after 2 hours of inactivity)
The account used to access Salesforce requires the following permissions for this integration to work:
- View Setup and Configuration
- View All Data
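The three credential sets above appear to correspond to the three values of the auth config field (USERNAME_PASSWORD, JSON_WEB_TOKEN, DIRECT_ACCESS_TOKEN). As a rough illustration, the following Python sketch checks that a recipe supplies the fields each mode needs. The field-to-mode mapping is our reading of the lists above, and missing_auth_fields is a hypothetical helper, not part of the DataHub API; the connector performs its own validation.

```python
from typing import Dict, List, Optional

# Hypothetical mapping of each `auth` value to the recipe fields it needs,
# inferred from the credential sets listed above (not taken from DataHub's code).
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "USERNAME_PASSWORD": ["username", "password", "security_token"],
    "JSON_WEB_TOKEN": ["username", "consumer_key", "private_key"],
    "DIRECT_ACCESS_TOKEN": ["instance_url", "access_token"],
}


def missing_auth_fields(config: Dict[str, Optional[str]]) -> List[str]:
    """Return the credential fields that are missing for the chosen auth mode."""
    # The schema default for `auth` is USERNAME_PASSWORD.
    auth = config.get("auth") or "USERNAME_PASSWORD"
    if auth not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown auth type: {auth}")
    # A field counts as missing if it is absent, None, or an empty string.
    return [field for field in REQUIRED_FIELDS[auth] if not config.get(field)]


if __name__ == "__main__":
    recipe_config = {
        "auth": "JSON_WEB_TOKEN",
        "username": "user@company",
        "consumer_key": "my_consumer_key",
    }
    print(missing_auth_fields(recipe_config))  # ['private_key']
```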
Integration Details
This plugin extracts Salesforce Standard and Custom Objects and their details (fields, record count, etc.) from a Salesforce instance. The Python library simple-salesforce is used to authenticate and to call the Salesforce REST API to retrieve details from the Salesforce instance.
REST API Resources used in this integration
- Versions
- Tooling API Query on objects EntityDefinition, EntityParticle, CustomObject, CustomField
- Record Count
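For orientation, the REST resources listed above live under the org's /services/data/ path. The sketch below only builds request URLs with the standard library; it does not authenticate or send anything (the connector delegates that to simple-salesforce, whose exact requests may differ). The API version "59.0" and the example SOQL are illustrative assumptions, not the queries the connector actually issues.

```python
from typing import List
from urllib.parse import urlencode


def versions_url(instance_url: str) -> str:
    # "Versions" resource: lists the REST API versions the org supports.
    return f"{instance_url.rstrip('/')}/services/data/"


def tooling_query_url(instance_url: str, api_version: str, soql: str) -> str:
    # Tooling API query resource; the SOQL text is URL-encoded into the q parameter.
    base = f"{instance_url.rstrip('/')}/services/data/v{api_version}/tooling/query/"
    return f"{base}?{urlencode({'q': soql})}"


def record_count_url(instance_url: str, api_version: str, objects: List[str]) -> str:
    # "Record Count" resource: approximate record counts for the listed objects.
    base = f"{instance_url.rstrip('/')}/services/data/v{api_version}/limits/recordCount"
    return f"{base}?{urlencode({'sObjects': ','.join(objects)})}"


if __name__ == "__main__":
    org = "https://mydomain.my.salesforce.com/"
    print(versions_url(org))
    print(tooling_query_url(org, "59.0", "SELECT QualifiedApiName, Label FROM EntityDefinition"))
    print(record_count_url(org, "59.0", ["Account", "Lead"]))
```

Because these counts come from the Record Count resource, they are the same approximate figures mentioned in the caveats below.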
Concept Mapping
This ingestion source maps the following Source System Concepts to DataHub Concepts:
Source Concept | DataHub Concept | Notes |
---|---|---|
Salesforce | Data Platform | |
Standard Object | Dataset | subtype "Standard Object" |
Custom Object | Dataset | subtype "Custom Object" |
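The mapping above can be pictured in terms of DataHub dataset URNs. This sketch assumes the dataset name is the object's API name, prefixed by platform_instance when one is set (the usual DataHub convention), and that custom objects can be told apart by Salesforce's __c API-name suffix. Both helpers are hypothetical; the connector may derive names and subtypes differently.

```python
from typing import Optional


def salesforce_dataset_urn(
    object_name: str, platform_instance: Optional[str] = None, env: str = "PROD"
) -> str:
    # Assumption: name = "<platform_instance>.<object API name>" when an instance is set.
    name = f"{platform_instance}.{object_name}" if platform_instance else object_name
    return f"urn:li:dataset:(urn:li:dataPlatform:salesforce,{name},{env})"


def subtype_for(object_name: str) -> str:
    # Assumption: custom objects are recognisable by the "__c" API-name suffix.
    return "Custom Object" if object_name.endswith("__c") else "Standard Object"


if __name__ == "__main__":
    for obj in ["Account", "Invoice__c"]:
        print(salesforce_dataset_urn(obj, "mydomain-dev-ed"), subtype_for(obj))
```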
Caveats
- This connector has only been tested with Salesforce Developer Edition.
- This connector only supports table-level profiling (row and column counts) as of now. Row counts are approximate, as returned by the Salesforce RecordCount REST API.
- This integration does not support ingesting Salesforce External Objects.
CLI based Ingestion
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
```yaml
pipeline_name: my_salesforce_pipeline
source:
  type: "salesforce"
  config:
    instance_url: "https://mydomain.my.salesforce.com/"
    username: user@company
    password: password_for_user
    security_token: security_token_for_user
    platform_instance: mydomain-dev-ed
    domain:
      sales:
        allow:
          - "Opportunity$"
          - "Lead$"
    object_pattern:
      allow:
        - "Account$"
        - "Opportunity$"
        - "Lead$"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```
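To see which objects the starter recipe would pick up, the sketch below re-implements allow/deny regex filtering with Python's re module: a name is rejected if any deny pattern matches, otherwise accepted if any allow pattern matches, with re.match anchoring at the start of the name and case ignored by default. This is our understanding of how DataHub's AllowDenyPattern behaves, not its actual code, and the "first matching domain key wins" rule in domain_for is an assumption.

```python
import re
from typing import Dict, List, Optional


def allowed(name: str, allow: List[str], deny: List[str], ignore_case: bool = True) -> bool:
    """Deny patterns win; otherwise the name must match at least one allow pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.match anchors at the start of the name; a trailing "$" anchors the end.
    if any(re.match(p, name, flags) for p in deny):
        return False
    return any(re.match(p, name, flags) for p in allow)


def domain_for(name: str, domains: Dict[str, Dict[str, List[str]]]) -> Optional[str]:
    """Return the first domain key whose patterns allow the object, if any."""
    for key, pattern in domains.items():
        if allowed(name, pattern.get("allow", [".*"]), pattern.get("deny", [])):
            return key
    return None


# Values taken from the starter recipe above.
OBJECT_ALLOW = ["Account$", "Opportunity$", "Lead$"]
DOMAINS = {"sales": {"allow": ["Opportunity$", "Lead$"]}}

if __name__ == "__main__":
    for obj in ["Account", "Opportunity", "OpportunityLineItem", "Lead", "Contact"]:
        if allowed(obj, OBJECT_ALLOW, []):
            print(obj, "->", domain_for(obj, DOMAINS))
    # Account -> None
    # Opportunity -> sales
    # Lead -> sales
```

With these patterns, OpportunityLineItem is excluded because "Opportunity$" requires the name to end right after "Opportunity"; dropping the $ would let such related objects through.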
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
Field | Type | Description | Default |
---|---|---|---|
access_token | One of string, null | Access token for the instance URL. | None |
api_version | One of string, null | If specified, overrides the default API version used by the Salesforce package. Example value: '59.0' | None |
auth | Enum | One of: "USERNAME_PASSWORD", "DIRECT_ACCESS_TOKEN", "JSON_WEB_TOKEN" | "USERNAME_PASSWORD" |
consumer_key | One of string, null | Consumer key for Salesforce JSON web token access. | None |
ingest_tags | boolean | Ingest tags from the source. This will override tags entered from the UI. | False |
instance_url | One of string, null | Salesforce instance URL, e.g. https://MyDomainName.my.salesforce.com | None |
is_sandbox | boolean | Connect to a Sandbox instance of your Salesforce. | False |
password | One of string, null | Password for the Salesforce user. | None |
platform | string | | salesforce |
platform_instance | One of string, null | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details. | None |
private_key | One of string, null | Private key as a string for Salesforce JSON web token access. | None |
security_token | One of string, null | Security token for the Salesforce username. | None |
use_referenced_entities_as_upstreams | boolean | (Experimental) If enabled, referenced entities will be treated as upstream entities. | False |
username | One of string, null | Salesforce username. | None |
env | string | The environment that all assets produced by this connector belong to. | PROD |
domain | map(str, AllowDenyPattern) | Maps a domain key (any string, such as "sales") to regex patterns for the objects in that domain. Multiple domain keys can be specified. | {} |
domain.{key}.allow | array of string | List of regex patterns to include in ingestion. | ['.*'] |
domain.{key}.ignoreCase | One of boolean, null | Whether to ignore case sensitivity during pattern matching. | True |
domain.{key}.deny | array of string | List of regex patterns to exclude from ingestion. | [] |
object_pattern | AllowDenyPattern | Regex patterns for Salesforce objects to filter in ingestion. | allow: ['.*'], deny: [], ignoreCase: True |
object_pattern.ignoreCase | One of boolean, null | Whether to ignore case sensitivity during pattern matching. | True |
profile_pattern | AllowDenyPattern | Regex patterns for objects to profile, applied to objects already allowed by object_pattern. | allow: ['.*'], deny: [], ignoreCase: True |
profile_pattern.ignoreCase | One of boolean, null | Whether to ignore case sensitivity during pattern matching. | True |
profiling | SalesforceProfilingConfig | | |
profiling.enabled | boolean | Whether profiling should be done. Only table-level profiling is supported at this stage. | False |
profiling.operation_config | OperationConfig | Experimental feature. Specifies operation configs. | |
profiling.operation_config.lower_freq_profile_enabled | boolean | Whether to profile at a lower frequency. This does not schedule anything; it only adds checks that decide when not to run profiling. | False |
profiling.operation_config.profile_date_of_month | One of integer, null | Day of the month, between 1 and 31 (inclusive). If not specified, this field has no effect. | None |
profiling.operation_config.profile_day_of_week | One of integer, null | Day of the week, between 0 and 6 (inclusive), where 0 is Monday and 6 is Sunday. If not specified, this field has no effect. | None |
stateful_ingestion | One of StatefulIngestionConfig, null | Stateful Ingestion Config. | None |
stateful_ingestion.enabled | boolean | Whether or not to enable stateful ingestion. Defaults to True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False. | False |
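The operation_config fields only add conditions under which profiling is skipped; the pipeline itself still has to be scheduled externally. Below is a sketch of those checks, assuming the day-of-week and date-of-month conditions apply only when lower_freq_profile_enabled is true and that both must hold when both are set. should_profile_today is a hypothetical name, not the connector's function.

```python
import datetime
from typing import Optional


def should_profile_today(
    today: datetime.date,
    lower_freq_profile_enabled: bool = False,
    profile_day_of_week: Optional[int] = None,
    profile_date_of_month: Optional[int] = None,
) -> bool:
    """Sketch of the skip-checks described for profiling.operation_config."""
    # Assumption: without lower-frequency profiling, no extra checks apply.
    if not lower_freq_profile_enabled:
        return True
    # date.weekday() uses the table's convention: 0 = Monday, 6 = Sunday.
    if profile_day_of_week is not None and profile_day_of_week != today.weekday():
        return False
    # date.day is the day of the month, 1 to 31.
    if profile_date_of_month is not None and profile_date_of_month != today.day:
        return False
    return True


if __name__ == "__main__":
    # 2024-01-01 was a Monday, so a Monday-only setting allows profiling.
    print(should_profile_today(datetime.date(2024, 1, 1), True, profile_day_of_week=0))
```

Passing today explicitly (rather than calling date.today() inside) keeps the check deterministic and easy to test.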
The JSONSchema for this configuration is inlined below.
{
"$defs": {
"AllowDenyPattern": {
"additionalProperties": false,
"description": "A class to store allow deny regexes",
"properties": {
"allow": {
"default": [
".*"
],
"description": "List of regex patterns to include in ingestion",
"items": {
"type": "string"
},
"title": "Allow",
"type": "array"
},
"deny": {
"default": [],
"description": "List of regex patterns to exclude from ingestion.",
"items": {
"type": "string"
},
"title": "Deny",
"type": "array"
},
"ignoreCase": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Whether to ignore case sensitivity during pattern matching.",
"title": "Ignorecase"
}
},
"title": "AllowDenyPattern",
"type": "object"
},
"OperationConfig": {
"additionalProperties": false,
"properties": {
"lower_freq_profile_enabled": {
"default": false,
"description": "Whether to do profiling at lower freq or not. This does not do any scheduling just adds additional checks to when not to run profiling.",
"title": "Lower Freq Profile Enabled",
"type": "boolean"
},
"profile_day_of_week": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number between 0 to 6 for day of week (both inclusive). 0 is Monday and 6 is Sunday. If not specified, defaults to Nothing and this field does not take affect.",
"title": "Profile Day Of Week"
},
"profile_date_of_month": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number between 1 to 31 for date of month (both inclusive). If not specified, defaults to Nothing and this field does not take affect.",
"title": "Profile Date Of Month"
}
},
"title": "OperationConfig",
"type": "object"
},
"SalesforceAuthType": {
"enum": [
"USERNAME_PASSWORD",
"DIRECT_ACCESS_TOKEN",
"JSON_WEB_TOKEN"
],
"title": "SalesforceAuthType",
"type": "string"
},
"SalesforceProfilingConfig": {
"additionalProperties": false,
"properties": {
"enabled": {
"default": false,
"description": "Whether profiling should be done. Supports only table-level profiling at this stage",
"title": "Enabled",
"type": "boolean"
},
"operation_config": {
"$ref": "#/$defs/OperationConfig",
"description": "Experimental feature. To specify operation configs."
}
},
"title": "SalesforceProfilingConfig",
"type": "object"
},
"StatefulIngestionConfig": {
"additionalProperties": false,
"description": "Basic Stateful Ingestion Specific Configuration for any source.",
"properties": {
"enabled": {
"default": false,
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"title": "Enabled",
"type": "boolean"
}
},
"title": "StatefulIngestionConfig",
"type": "object"
}
},
"additionalProperties": false,
"properties": {
"env": {
"default": "PROD",
"description": "The environment that all assets produced by this connector belong to",
"title": "Env",
"type": "string"
},
"platform_instance": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.",
"title": "Platform Instance"
},
"stateful_ingestion": {
"anyOf": [
{
"$ref": "#/$defs/StatefulIngestionConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Stateful Ingestion Config"
},
"platform": {
"default": "salesforce",
"title": "Platform",
"type": "string"
},
"auth": {
"$ref": "#/$defs/SalesforceAuthType",
"default": "USERNAME_PASSWORD"
},
"username": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Salesforce username",
"title": "Username"
},
"password": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Password for Salesforce user",
"title": "Password"
},
"consumer_key": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Consumer key for Salesforce JSON web token access",
"title": "Consumer Key"
},
"private_key": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Private key as a string for Salesforce JSON web token access",
"title": "Private Key"
},
"security_token": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Security token for Salesforce username",
"title": "Security Token"
},
"instance_url": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Salesforce instance url. e.g. https://MyDomainName.my.salesforce.com",
"title": "Instance Url"
},
"is_sandbox": {
"default": false,
"description": "Connect to Sandbox instance of your Salesforce",
"title": "Is Sandbox",
"type": "boolean"
},
"access_token": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Access token for instance url",
"title": "Access Token"
},
"ingest_tags": {
"default": false,
"description": "Ingest Tags from source. This will override Tags entered from UI",
"title": "Ingest Tags",
"type": "boolean"
},
"object_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns for Salesforce objects to filter in ingestion."
},
"domain": {
"additionalProperties": {
"$ref": "#/$defs/AllowDenyPattern"
},
"default": {},
"description": "Regex patterns for tables/schemas to describe domain_key domain key (domain_key can be any string like \"sales\".) There can be multiple domain keys specified.",
"title": "Domain",
"type": "object"
},
"api_version": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "If specified, overrides default version used by the Salesforce package. Example value: '59.0'",
"title": "Api Version"
},
"profiling": {
"$ref": "#/$defs/SalesforceProfilingConfig",
"default": {
"enabled": false,
"operation_config": {
"lower_freq_profile_enabled": false,
"profile_date_of_month": null,
"profile_day_of_week": null
}
}
},
"profile_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex patterns for profiles to filter in ingestion, allowed by the `object_pattern`."
},
"use_referenced_entities_as_upstreams": {
"default": false,
"description": "(Experimental) If enabled, referenced entities will be treated as upstream entities.",
"title": "Use Referenced Entities As Upstreams",
"type": "boolean"
}
},
"title": "SalesforceConfig",
"type": "object"
}
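Because the schema sets additionalProperties to false at the top level, an unrecognised key (such as a misspelled object_patterns) makes the config invalid instead of being silently ignored. A stdlib-only sketch of that top-level check, plus the auth enum, is shown below. The key set is copied from the schema above, and config_problems is a hypothetical helper; a full JSON Schema validator would also check nested objects and value types.

```python
from typing import Any, Dict, List

# Top-level property names and auth values copied from the JSON Schema above.
SALESFORCE_CONFIG_KEYS = {
    "env", "platform_instance", "stateful_ingestion", "platform", "auth",
    "username", "password", "consumer_key", "private_key", "security_token",
    "instance_url", "is_sandbox", "access_token", "ingest_tags", "object_pattern",
    "domain", "api_version", "profiling", "profile_pattern",
    "use_referenced_entities_as_upstreams",
}
AUTH_TYPES = {"USERNAME_PASSWORD", "DIRECT_ACCESS_TOKEN", "JSON_WEB_TOKEN"}


def config_problems(config: Dict[str, Any]) -> List[str]:
    """List top-level keys the schema would reject, plus an invalid auth value."""
    problems = [f"unknown field: {key}" for key in sorted(config) if key not in SALESFORCE_CONFIG_KEYS]
    # The schema default for auth is USERNAME_PASSWORD.
    auth = config.get("auth", "USERNAME_PASSWORD")
    if auth not in AUTH_TYPES:
        problems.append(f"invalid auth: {auth}")
    return problems


if __name__ == "__main__":
    print(config_problems({"username": "user@company", "object_patterns": {"allow": ["Account$"]}}))
```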
Code Coordinates
- Class Name: datahub.ingestion.source.salesforce.SalesforceSource
Questions
If you've got any questions on configuring ingestion for Salesforce, feel free to ping us on our Slack.