Hex
This connector ingests Hex assets into DataHub.
Concept Mapping
Hex Concept | DataHub Concept | Notes |
---|---|---|
"hex" | Data Platform | |
Workspace | Container | |
Project | Dashboard | Subtype Project |
Component | Dashboard | Subtype Component |
Collection | Tag | |
Other Hex concepts are not mapped to DataHub entities yet.
Limitations
Currently, the Hex API has some limitations that affect the completeness of the extracted metadata:
- Projects and Components Relationship: The API does not support fetching the many-to-many relationship between Projects and their Components.
- Metadata Access: There is no direct method to retrieve metadata for Collections, Status, or Categories. This information is only available indirectly through references within Projects and Components.
Please keep these limitations in mind when working with the Hex connector.
For the Dataset - Hex Project lineage, the connector relies on the Hex query metadata feature. Therefore, in order to extract lineage information, the required setup must include:
- A separate warehouse ingestor (e.g. BigQuery, Snowflake, Redshift, ...) with use_queries_v2 enabled in order to fetch Queries. This ingests the queries into DataHub as Query entities, and the ones triggered by Hex include the corresponding Hex query metadata.
- A DataHub server with version >= SaaS 0.3.10 or > OSS 1.0.0, so that the Query entities are properly indexed by source (Hex in this case) and can then be fetched and processed by the Hex ingestor to emit the Dataset - Project lineage.
Please note:
- Lineage is only captured for scheduled executions of the Project.
- In cases where queries are handled by hextoolkit, Hex query metadata is not injected, which prevents capturing lineage.
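The warehouse side of the lineage setup can be sketched as a separate recipe. This is only an illustration, assuming Snowflake as the warehouse: the connection values and environment variable names are placeholders, the sink address is an assumed local DataHub instance, and the exact connection options should be checked against that warehouse connector's own documentation. The only setting taken from the requirements above is use_queries_v2.

```yaml
# Warehouse recipe (Snowflake chosen as an example; all connection values are placeholders).
source:
  type: snowflake
  config:
    account_id: "<account_id>"
    username: "${SNOWFLAKE_USER}"       # hypothetical environment variable
    password: "${SNOWFLAKE_PASSWORD}"   # hypothetical environment variable
    warehouse: "<warehouse>"
    role: "<role>"
    # Required for Hex lineage: ingest queries as Query entities,
    # including the ones triggered by Hex.
    use_queries_v2: true
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"     # assumed DataHub address
```

The Hex recipe is then run after this one, so that the Query entities already exist in DataHub when the Hex ingestor looks them up.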
Important Capabilities
Capability | Status | Notes |
---|---|---|
Asset Containers | ✅ | Enabled by default. |
Dataset Usage | ✅ | Supported by default. Supported for types - Project. |
Descriptions | ✅ | Supported by default. |
Detect Deleted Entities | ✅ | Enabled by default via stateful ingestion. |
Extract Ownership | ✅ | Supported by default. |
Platform Instance | ✅ | Enabled by default. |
Prerequisites
Workspace name
Workspace name is required to fetch the data from Hex. You can find the workspace name in the URL of your Hex home page.
https://app.hex.tech/<workspace_name>
E.g. in https://app.hex.tech/acryl-partnership, acryl-partnership is the workspace name.
Authentication
To authenticate with Hex, you need to provide a Hex API Bearer token. You can obtain a token by following the instructions in the Hex documentation.
Either a PAT (Personal Access Token) or a Workspace Token can be used as the API Bearer token:
- (Recommended) With a Workspace Token, a read-only token is enough for ingestion.
- With a PAT, ingestion runs with the user's permissions.
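As a small sketch of how the token might be supplied without writing it into the recipe file, the fragment below reads it from an environment variable using the recipe's ${...} expansion. The variable name HEX_TOKEN is an assumption made for this example, and the workspace name is the one from the URL example above.

```yaml
source:
  type: hex
  config:
    workspace_name: acryl-partnership
    # HEX_TOKEN is a hypothetical environment variable holding a
    # read-only Workspace Token (recommended) or a PAT.
    token: "${HEX_TOKEN}"
```

The variable has to be set in the environment where the ingestion runs, before the recipe is executed.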
CLI based Ingestion
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: hex
  config:
    workspace_name: # Hex workspace name. You can find this name in your Hex home page URL: https://app.hex.tech/<workspace_name>
    token: # Your PAT or Workspace token
sink:
  # sink configs
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
Field | Description |
---|---|
token ✅ string(password) | Hex API token; either PAT or Workspace token - https://learn.hex.tech/docs/api/api-overview#authentication |
workspace_name ✅ string | Hex workspace name. You can find this name in your Hex home page URL: https://app.hex.tech/<workspace_name> |
base_url string | Hex API base URL. For most Hex users, this will be https://app.hex.tech/api/v1. Single-tenant app users should replace this with the URL they use to access Hex. Default: https://app.hex.tech/api/v1 |
categories_as_tags boolean | Emit Hex Categories as tags. Default: True |
collections_as_tags boolean | Emit Hex Collections as tags. Default: True |
datahub_page_size integer | Number of items to fetch per DataHub API call. Default: 100 |
include_components boolean | Include Hex Components in the ingestion. Default: True |
include_lineage boolean | Include Hex lineage, which is fetched from DataHub. See the "Limitations" section in the docs for more details about the limitations of this feature. Default: True |
lineage_end_time One of string(date-time), null | Latest date of lineage to consider. Default: Current time in UTC. You can specify absolute time like '2023-01-01' or relative time like '-1 day' or '-1d'. Default: None |
lineage_start_time One of string(date-time), null | Earliest date of lineage to consider. Default: 1 day before lineage end time. You can specify absolute time like '2023-01-01' or relative time like '-7 days' or '-7d'. Default: None |
page_size integer | Number of items to fetch per Hex API call. Default: 100 |
patch_metadata boolean | Emit metadata as patch events. Default: False |
platform_instance One of string, null | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details. Default: None |
set_ownership_from_email boolean | Set ownership identity from owner/creator email. Default: True |
status_as_tag boolean | Emit Hex Status as tags. Default: True |
env string | The environment that all assets produced by this connector belong to. Default: PROD |
component_title_pattern AllowDenyPattern | Regex pattern for component titles to filter in ingestion. |
component_title_pattern.ignoreCase One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
project_title_pattern AllowDenyPattern | Regex pattern for project titles to filter in ingestion. |
project_title_pattern.ignoreCase One of boolean, null | Whether to ignore case sensitivity during pattern matching. Default: True |
stateful_ingestion One of StatefulStaleMetadataRemovalConfig, null | Configuration for stateful ingestion and stale metadata removal. Default: None |
stateful_ingestion.enabled boolean | Whether or not to enable stateful ingestion. Defaults to True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified; otherwise False. Default: False |
stateful_ingestion.fail_safe_threshold number | Prevents a large number of soft deletes, and prevents the state from being committed, when accidental changes to the source configuration cause the relative change percent in entities compared to the previous state to exceed 'fail_safe_threshold'. Default: 75.0 |
stateful_ingestion.remove_stale_metadata boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True |
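To illustrate how the dotted field names in the table expand into nested YAML, here is a sketch of a fuller recipe. The option names come from the table; the pipeline name, the regex patterns, the HEX_TOKEN variable, and the sink address are assumptions chosen only for this example.

```yaml
pipeline_name: hex_ingestion            # hypothetical name; stateful ingestion relies on a pipeline_name
source:
  type: hex
  config:
    workspace_name: acryl-partnership
    token: "${HEX_TOKEN}"               # hypothetical environment variable
    # project_title_pattern.allow / .deny / .ignoreCase become nested keys
    project_title_pattern:
      allow:
        - '^Finance.*'                  # example pattern, not a recommended default
      deny:
        - '.*-draft$'                   # example pattern
      ignoreCase: true
    include_lineage: true
    lineage_start_time: "-7 days"       # relative time, as described for lineage_start_time
    # stateful_ingestion.enabled / .remove_stale_metadata
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"     # assumed DataHub address
```

Single quotes are used around the regexes so that YAML does not interpret backslashes or other special characters in the patterns.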
The JSONSchema for this configuration is inlined below.
{
"$defs": {
"AllowDenyPattern": {
"additionalProperties": false,
"description": "A class to store allow deny regexes",
"properties": {
"allow": {
"default": [
".*"
],
"description": "List of regex patterns to include in ingestion",
"items": {
"type": "string"
},
"title": "Allow",
"type": "array"
},
"deny": {
"default": [],
"description": "List of regex patterns to exclude from ingestion.",
"items": {
"type": "string"
},
"title": "Deny",
"type": "array"
},
"ignoreCase": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Whether to ignore case sensitivity during pattern matching.",
"title": "Ignorecase"
}
},
"title": "AllowDenyPattern",
"type": "object"
},
"StatefulStaleMetadataRemovalConfig": {
"additionalProperties": false,
"description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
"properties": {
"enabled": {
"default": false,
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"title": "Enabled",
"type": "boolean"
},
"remove_stale_metadata": {
"default": true,
"description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
"title": "Remove Stale Metadata",
"type": "boolean"
},
"fail_safe_threshold": {
"default": 75.0,
"description": "Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'.",
"maximum": 100.0,
"minimum": 0.0,
"title": "Fail Safe Threshold",
"type": "number"
}
},
"title": "StatefulStaleMetadataRemovalConfig",
"type": "object"
}
},
"additionalProperties": false,
"properties": {
"env": {
"default": "PROD",
"description": "The environment that all assets produced by this connector belong to",
"title": "Env",
"type": "string"
},
"platform_instance": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.",
"title": "Platform Instance"
},
"stateful_ingestion": {
"anyOf": [
{
"$ref": "#/$defs/StatefulStaleMetadataRemovalConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Configuration for stateful ingestion and stale metadata removal."
},
"workspace_name": {
"description": "Hex workspace name. You can find this name in your Hex home page URL: https://app.hex.tech/<workspace_name>",
"title": "Workspace Name",
"type": "string"
},
"token": {
"description": "Hex API token; either PAT or Workflow token - https://learn.hex.tech/docs/api/api-overview#authentication",
"format": "password",
"title": "Token",
"type": "string",
"writeOnly": true
},
"base_url": {
"default": "https://app.hex.tech/api/v1",
"description": "Hex API base URL. For most Hex users, this will be https://app.hex.tech/api/v1. Single-tenant app users should replace this with the URL they use to access Hex.",
"title": "Base Url",
"type": "string"
},
"include_components": {
"default": true,
"description": "Include Hex Components in the ingestion",
"title": "Include Components",
"type": "boolean"
},
"page_size": {
"default": 100,
"description": "Number of items to fetch per Hex API call.",
"title": "Page Size",
"type": "integer"
},
"patch_metadata": {
"default": false,
"description": "Emit metadata as patch events",
"title": "Patch Metadata",
"type": "boolean"
},
"collections_as_tags": {
"default": true,
"description": "Emit Hex Collections as tags",
"title": "Collections As Tags",
"type": "boolean"
},
"status_as_tag": {
"default": true,
"description": "Emit Hex Status as tags",
"title": "Status As Tag",
"type": "boolean"
},
"categories_as_tags": {
"default": true,
"description": "Emit Hex Category as tags",
"title": "Categories As Tags",
"type": "boolean"
},
"project_title_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex pattern for project titles to filter in ingestion."
},
"component_title_pattern": {
"$ref": "#/$defs/AllowDenyPattern",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"description": "Regex pattern for component titles to filter in ingestion."
},
"set_ownership_from_email": {
"default": true,
"description": "Set ownership identity from owner/creator email",
"title": "Set Ownership From Email",
"type": "boolean"
},
"include_lineage": {
"default": true,
"description": "Include Hex lineage, being fetched from DataHub. See \"Limitations\" section in the docs for more details about the limitations of this feature.",
"title": "Include Lineage",
"type": "boolean"
},
"lineage_start_time": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Earliest date of lineage to consider. Default: 1 day before lineage end time. You can specify absolute time like '2023-01-01' or relative time like '-7 days' or '-7d'.",
"title": "Lineage Start Time"
},
"lineage_end_time": {
"anyOf": [
{
"format": "date-time",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Latest date of lineage to consider. Default: Current time in UTC. You can specify absolute time like '2023-01-01' or relative time like '-1 day' or '-1d'.",
"title": "Lineage End Time"
},
"datahub_page_size": {
"default": 100,
"description": "Number of items to fetch per DataHub API call.",
"title": "Datahub Page Size",
"type": "integer"
}
},
"required": [
"workspace_name",
"token"
],
"title": "HexSourceConfig",
"type": "object"
}
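As a further sketch based on the schema properties above, the fragment below shows a recipe for a single-tenant Hex deployment that skips Components and tag emission. The host hex.example.com is hypothetical, and keeping the /api/v1 suffix on a single-tenant URL is an assumption carried over from the default value rather than something stated here.

```yaml
source:
  type: hex
  config:
    workspace_name: "<workspace_name>"
    token: "${HEX_TOKEN}"                       # hypothetical environment variable
    base_url: "https://hex.example.com/api/v1"  # hypothetical single-tenant URL
    include_components: false                   # ingest Projects only
    collections_as_tags: false
    status_as_tag: false
    categories_as_tags: false
    page_size: 50                               # items per Hex API call (default is 100)
sink:
  # sink configs
```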
Code Coordinates
- Class Name: datahub.ingestion.source.hex.hex.HexSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Hex, feel free to ping us on our Slack.