Version: Next

Environment Variables

This page summarizes the most important environment variables that control how DataHub behaves.


DataHub Java Components

This section covers GMS, System Update, and the MAE/MCE Consumers.

Authentication & Authorization

Authentication Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| METADATA_SERVICE_AUTH_ENABLED | true | Enable if you want all requests to the Metadata Service to be authenticated | GMS, MAE Consumer, MCE Consumer, PE Consumer, Frontend |
| DATAHUB_SYSTEM_CLIENT_SECRET |  | System client secret used by AuthServiceController | GMS, MAE Consumer, MCE Consumer, PE Consumer, Actions, Frontend |
| METADATA_SERVICE_AUTHENTICATOR_EXCEPTIONS_ENABLED | false | Normally failures are only warnings, enable this to throw them | GMS |
| DATAHUB_TOKEN_SERVICE_SIGNING_KEY |  | Key used to validate incoming tokens and sign new tokens | GMS |
| DATAHUB_TOKEN_SERVICE_SALT |  | Salt used for token validation and signing | GMS |
| DATAHUB_TOKEN_SERVICE_SIGNING_ALGORITHM | HS256 | Signing algorithm for DataHub tokens | GMS |
| SESSION_TOKEN_DURATION_MS | 86400000 | The max duration of a UI session in milliseconds (defaults to 1 day) | GMS |
| GUEST_AUTHENTICATION_USER | guest | Guest user for unauthenticated access | GMS |
| GUEST_AUTHENTICATION_ENABLED | false | Enable guest authentication | GMS |
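
As a rough illustration of how a millisecond-valued setting such as SESSION_TOKEN_DURATION_MS maps to wall-clock time, the Python sketch below reads the variable and falls back to the documented default. The helper name session_duration and its env parameter are hypothetical and exist only for this example; DataHub itself reads the value inside its Java services.

```python
import os
from datetime import timedelta

# Documented default for SESSION_TOKEN_DURATION_MS (1 day in milliseconds).
DEFAULT_SESSION_TOKEN_DURATION_MS = "86400000"


def session_duration(env=None):
    """Return the configured UI session duration as a timedelta.

    `env` is a hypothetical injection point so the function can be exercised
    without touching the real process environment.
    """
    env = os.environ if env is None else env
    raw = env.get("SESSION_TOKEN_DURATION_MS", DEFAULT_SESSION_TOKEN_DURATION_MS)
    return timedelta(milliseconds=int(raw))


if __name__ == "__main__":
    # Unset variable: falls back to the documented default.
    print(session_duration({}))
    # Explicit override of one hour.
    print(session_duration({"SESSION_TOKEN_DURATION_MS": "3600000"}))
```

Expressing the value as a timedelta makes it easier to sanity-check an override before deploying it.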

Authorization Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| AUTH_POLICIES_ENABLED | true | Enable the default DataHub policies-based authorizer | GMS |
| POLICY_CACHE_REFRESH_INTERVAL_SECONDS | 120 | Cache refresh interval for policies in seconds | GMS |
| POLICY_CACHE_FETCH_SIZE | 1000 | Cache policy fetch size | GMS |
| REST_API_AUTHORIZATION_ENABLED | true | Enable authorization of reads, writes, and deletes on REST APIs | GMS |
| VIEW_AUTHORIZATION_ENABLED | false | Controls whether entity pages can limit access based on policies | GMS |
| VIEW_AUTHORIZATION_RECOMMENDATIONS_PEER_GROUP_ENABLED | true | Enable peer group recommendations for view authorization | GMS |

Ingestion Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| UI_INGESTION_ENABLED | true | Enable UI-based ingestion | GMS, MAE Consumer |
| INGESTION_BATCH_REFRESH_COUNT | 100 | Number of entities to refresh in a single batch when refreshing entities after ingestion | GMS |
| INGESTION_SOURCE_REFRESH_INTERVAL_SECONDS | 43200 | Interval at which the ingestion source scheduler will check for new or updated ingestion sources | GMS |

Telemetry & Analytics

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| INGESTION_REPORTING_ENABLED | false | Enable ingestion reporting | GMS |
| ENABLE_THIRD_PARTY_LOGGING | false | Whether mixpanel tracking is enabled | GMS |

DataHub Core Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_SERVER_TYPE | prod | DataHub server type | GMS |
| DATAHUB_GMS_ASYNC_REQUEST_TIMEOUT_MS | 55000 | Async request timeout for GMS | GMS |
| DATAHUB_GMS_HOST | localhost | GMS host | Frontend |
| DATAHUB_GMS_PORT | 8080 | GMS port | Frontend |
| DATAHUB_GMS_USE_SSL | false | Use SSL for GMS connections | Frontend |
| DATAHUB_GMS_URI | null | URI instead of separate host/port/ssl parameters (takes priority) | Frontend |
| DATAHUB_GMS_SSL_PROTOCOL | null | SSL protocol for GMS | Frontend |
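
The table states that DATAHUB_GMS_URI takes priority over the separate host, port, and SSL settings. The Python sketch below shows one way that precedence could be resolved. The function resolve_gms_url is hypothetical, and the way the scheme is derived from DATAHUB_GMS_USE_SSL is an assumption made for illustration; it is not a description of the frontend's actual implementation.

```python
import os


def resolve_gms_url(env=None):
    """Resolve the GMS base URL from the documented variables.

    Assumption: a non-empty DATAHUB_GMS_URI wins outright; otherwise the URL
    is assembled from host, port, and the SSL flag using documented defaults.
    """
    env = os.environ if env is None else env

    # The single URI setting takes priority when it is present.
    uri = env.get("DATAHUB_GMS_URI")
    if uri:
        return uri

    host = env.get("DATAHUB_GMS_HOST", "localhost")
    port = env.get("DATAHUB_GMS_PORT", "8080")
    use_ssl = env.get("DATAHUB_GMS_USE_SSL", "false").strip().lower() == "true"
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    print(resolve_gms_url({}))
    print(resolve_gms_url({"DATAHUB_GMS_HOST": "gms", "DATAHUB_GMS_USE_SSL": "true"}))
    print(resolve_gms_url({"DATAHUB_GMS_URI": "https://gms.example.com", "DATAHUB_GMS_HOST": "ignored"}))
```

The hostnames in the usage lines are placeholders.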

Plugin Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| PLUGIN_SECURITY_MODE | RESTRICTED | Plugin security mode (RESTRICTED or LENIENT) | GMS |
| ENTITY_REGISTRY_PLUGIN_PATH | /etc/datahub/plugins/models | Path for entity registry plugins | GMS |
| ENTITY_REGISTRY_PLUGIN_LOAD_DELAY_SECONDS | 60 | Rate at which plugin runnable executes | GMS |
| IGNORE_FAILURE_WHEN_LOADING_ENTITY_REGISTRY_PLUGIN | true | Whether to ignore failure when loading entity registry | GMS |
| RETENTION_PLUGIN_PATH | /etc/datahub/plugins/retention | Path for retention plugins | GMS |
| AUTH_PLUGIN_PATH | /etc/datahub/plugins/auth | Path for auth plugins | GMS |

Metrics Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_METRICS_HOOK_LATENCY_PERCENTILES | 0.5,0.95,0.99,0.999 | Hook latency percentiles | GMS, MAE Consumer |
| DATAHUB_METRICS_HOOK_LATENCY_SERVICE_LEVEL_OBJECTIVES | 300,1800,3000,10800,21600,43200 | Hook latency SLOs in seconds | GMS, MAE Consumer |
| DATAHUB_METRICS_HOOK_LATENCY_MAX_EXPECTED_VALUE | 86000 | Maximum expected hook latency value in seconds | GMS, MAE Consumer |
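
The percentile and SLO settings are comma-separated lists. As a minimal sketch of how such values can be parsed and read in human-friendly units, the Python below converts the documented defaults. The function names are hypothetical, and the assumption that whitespace around items is tolerated is made for this example only.

```python
from datetime import timedelta


def parse_percentiles(value):
    """Parse a comma-separated percentile list such as '0.5,0.95,0.99,0.999'."""
    return [float(item) for item in value.split(",") if item.strip()]


def parse_slos(value):
    """Parse a comma-separated list of SLO thresholds given in seconds."""
    return [timedelta(seconds=int(item)) for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    # Documented defaults for the hook latency settings.
    print(parse_percentiles("0.5,0.95,0.99,0.999"))
    for slo in parse_slos("300,1800,3000,10800,21600,43200"):
        print(slo)
```

Read this way, the default SLO thresholds run from 5 minutes up to 12 hours, and the default maximum expected value of 86000 seconds is just under one day.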

Entity Service Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ENTITY_SERVICE_IMPL | ebean | Entity service implementation | GMS, MCE Consumer |
| ENTITY_SERVICE_ENABLE_RETENTION | true | Enable entity retention | GMS, MCE Consumer |
| ENTITY_SERVICE_APPLY_RETENTION_BOOTSTRAP | false | Apply retention on bootstrap | GMS, MCE Consumer |

Graph Service Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| GRAPH_SERVICE_IMPL | elasticsearch | Graph service implementation | GMS, MAE Consumer |
| GRAPH_SERVICE_LIMIT_RESULTS_MAX | 10000 | Maximum allowed result count for queries | GMS |
| GRAPH_SERVICE_LIMIT_RESULTS_API_DEFAULT | 5000 | Default API result limit | GMS |
| GRAPH_SERVICE_LIMIT_RESULTS_STRICT | false | Throw exception if strict is true, otherwise override with default and warn | GMS |
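
The three LIMIT_RESULTS settings (MAX, API_DEFAULT, STRICT) recur for several services on this page. The Python sketch below shows one plausible reading of how they interact, based only on the descriptions in the table: a request above the maximum either raises (strict) or is replaced by the API default with a warning. The function apply_result_limit and the handling of a missing count are assumptions for illustration, not DataHub's actual code.

```python
import warnings


def apply_result_limit(requested, max_results=10000, api_default=5000, strict=False):
    """Return the result count to use for a query.

    Assumed behavior, inferred from the table descriptions:
    - no requested count: use the API default
    - requested count within the maximum: use it unchanged
    - requested count above the maximum: raise when strict, otherwise
      warn and fall back to the API default
    """
    if requested is None:
        return api_default
    if requested <= max_results:
        return requested
    if strict:
        raise ValueError(f"Requested {requested} results, maximum allowed is {max_results}")
    warnings.warn(f"Requested {requested} results exceeds {max_results}; using {api_default}")
    return api_default


if __name__ == "__main__":
    print(apply_result_limit(100))
    print(apply_result_limit(None))
```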

Search Service Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SEARCH_SERVICE_BATCH_SIZE | 100 | Search service batch size | GMS |
| SEARCH_SERVICE_ENABLE_CACHE | false | Enable search service cache | GMS |
| SEARCH_SERVICE_ENABLE_CACHE_EVICTION | false | Enable search service cache eviction | GMS |
| SEARCH_SERVICE_CACHE_IMPLEMENTATION | caffeine | Search service cache implementation | GMS |
| SEARCH_SERVICE_HAZELCAST_SERVICE_NAME | hazelcast-service | Hazelcast service name for search cache | GMS |
| SEARCH_SERVICE_FILTER_CONTAINER_EXPANSION_ENABLED | true | Enable container expansion in search filters | GMS |
| SEARCH_SERVICE_FILTER_CONTAINER_EXPANSION_PAGE_SIZE | 100 | Page size for container expansion | GMS |
| SEARCH_SERVICE_FILTER_CONTAINER_EXPANSION_LIMIT | 100 | Limit for container expansion | GMS |
| SEARCH_SERVICE_FILTER_DOMAIN_EXPANSION_ENABLED | true | Enable domain expansion in search filters | GMS |
| SEARCH_SERVICE_FILTER_DOMAIN_EXPANSION_PAGE_SIZE | 100 | Page size for domain expansion | GMS |
| SEARCH_SERVICE_FILTER_DOMAIN_EXPANSION_LIMIT | 100 | Limit for domain expansion | GMS |
| SEARCH_SERVICE_LIMIT_RESULTS_MAX | 10000 | Maximum allowed result count for queries | GMS |
| SEARCH_SERVICE_LIMIT_RESULTS_API_DEFAULT | 5000 | Default API result limit | GMS |
| SEARCH_SERVICE_LIMIT_RESULTS_STRICT | false | Throw exception if strict is true, otherwise override with default and warn | GMS |

Timeseries Aspect Service

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| TIMESERIES_ASPECT_SERVICE_QUERY_CONCURRENCY | 10 | Parallel threads for timeseries queries | GMS |
| TIMESERIES_ASPECT_SERVICE_QUERY_QUEUE_SIZE | 500 | Queue size for timeseries queries | GMS |
| TIMESERIES_ASPECT_SERVICE_QUERY_THREAD_KEEP_ALIVE | 60 | Thread keep alive time for timeseries queries | GMS |
| TIMESERIES_ASPECT_SERVICE_LIMIT_RESULTS_MAX | 10000 | Maximum allowed result count for queries | GMS |
| TIMESERIES_ASPECT_SERVICE_LIMIT_RESULTS_API_DEFAULT | 5000 | Default API result limit | GMS |
| TIMESERIES_ASPECT_SERVICE_LIMIT_RESULTS_STRICT | false | Throw exception if strict is true, otherwise override with default and warn | GMS |

System Metadata Service

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SYSTEM_METADATA_SERVICE_LIMIT_RESULTS_MAX | 10000 | Maximum allowed result count for queries | GMS |
| SYSTEM_METADATA_SERVICE_LIMIT_RESULTS_API_DEFAULT | 5000 | Default API result limit | GMS |
| SYSTEM_METADATA_SERVICE_LIMIT_RESULTS_STRICT | false | Throw exception if strict is true, otherwise override with default and warn | GMS |

Platform Analytics

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_ANALYTICS_ENABLED | true | Enable platform analytics | GMS, MAE Consumer, Frontend |
| DATAHUB_ANALYTICS_TRACING_ENABLED | true | Enable backend usage tracing | GMS |
| ANALYTICS_DATAHUB_USAGE_EVENT_TYPES | CreateAccessTokenEvent,CreatePolicyEvent,UpdatePolicyEvent,CreateIngestionSourceEvent,UpdateIngestionSourceEvent,RevokeAccessTokenEvent,CreateUserEvent,UpdateUserEvent,DeletePolicyEvent | Comma separated list of usage event types to listen to | GMS |
| ANALYTICS_GENERIC_ASPECT_TYPES | (empty) | Filter list for generic aspect events | GMS |
| ANALYTICS_USER_FILTERS | (empty) | Filter out specific users' events from being published | GMS |

Visual Configuration

Queries Tab

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| REACT_APP_QUERIES_TAB_RESULT_SIZE | 5 | Queries tab result size (experimental) | Frontend |

Theme Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| REACT_APP_CUSTOM_THEME_ID | (empty) | Custom theme ID for rendering specific theme file | Frontend |

Assets Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| REACT_APP_LOGO_URL | /assets/platforms/datahublogo.png | Logo URL for the application | Frontend |
| REACT_APP_FAVICON_URL | /assets/icons/favicon.ico | Favicon URL for the application | Frontend |
| REACT_APP_TITLE | (empty) | Application title | Frontend |

UI Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| REACT_APP_HIDE_GLOSSARY | false | Hide glossary in the UI | Frontend |
| REACT_APP_SHOW_FULL_TITLE_IN_LINEAGE | false | Show full title in lineage | Frontend |
| DOMAIN_DEFAULT_TAB | (empty) | Default tab for domains (set to DOCUMENTATION_TAB to show documentation tab first) | Frontend |
| APPLICATION_SHOW_SIDEBAR_SECTION_WHEN_EMPTY | false | Show sidebar section when empty (deprecated) | Frontend |
| SEARCH_RESULT_NAME_HIGHLIGHT_ENABLED | true | Enable visual highlighting on search result names/descriptions | Frontend |

Storage Layer Configuration

EBean Configuration (MySQL/PostgreSQL)

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| EBEAN_DATASOURCE_USERNAME | datahub | Database username | GMS, MCE Consumer, System Update |
| EBEAN_DATASOURCE_PASSWORD | datahub | Database password | GMS, MCE Consumer, System Update |
| EBEAN_DATASOURCE_URL | jdbc:mysql://localhost:3306/datahub | JDBC URL | GMS, MCE Consumer, System Update |
| EBEAN_DATASOURCE_DRIVER | com.mysql.jdbc.Driver | JDBC Driver | GMS, MCE Consumer, System Update |
| EBEAN_MIN_CONNECTIONS | 2 | Minimum database connections | GMS, MCE Consumer, System Update |
| EBEAN_MAX_CONNECTIONS | 50 | Maximum database connections | GMS, MCE Consumer, System Update |
| EBEAN_MAX_INACTIVE_TIME_IN_SECS | 120 | Maximum inactive time in seconds | GMS, MCE Consumer, System Update |
| EBEAN_MAX_AGE_MINUTES | 120 | Maximum age in minutes | GMS, MCE Consumer, System Update |
| EBEAN_LEAK_TIME_MINUTES | 15 | Leak time in minutes | GMS, MCE Consumer, System Update |
| EBEAN_WAIT_TIMEOUT_MILLIS | 1000 | Wait timeout in milliseconds | GMS, MCE Consumer, System Update |
| EBEAN_AUTOCREATE | false | Auto-create DDL | GMS, MCE Consumer, System Update |
| EBEAN_POSTGRES_USE_AWS_IAM_AUTH | false | Use AWS IAM authentication for PostgreSQL | GMS, MCE Consumer, System Update |
| EBEAN_BATCH_GET_METHOD | IN | Batch get method (IN or UNION) | GMS, MCE Consumer, System Update |

Cassandra Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| CASSANDRA_DATASOURCE_USERNAME | cassandra | Cassandra username | GMS, MCE Consumer, System Update |
| CASSANDRA_DATASOURCE_PASSWORD | cassandra | Cassandra password | GMS, MCE Consumer, System Update |
| CASSANDRA_HOSTS | cassandra | Cassandra hosts | GMS, MCE Consumer, System Update |
| CASSANDRA_PORT | 9042 | Cassandra port | GMS, MCE Consumer, System Update |
| CASSANDRA_DATACENTER | datacenter1 | Cassandra datacenter | GMS, MCE Consumer, System Update |
| CASSANDRA_KEYSPACE | datahub | Cassandra keyspace | GMS, MCE Consumer, System Update |
| CASSANDRA_USE_SSL | false | Use SSL for Cassandra | GMS, MCE Consumer, System Update |

Elasticsearch Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_HOST | localhost | Elasticsearch host | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_PORT | 9200 | Elasticsearch port | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_THREAD_COUNT | 2 | Elasticsearch thread count | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_CONNECTION_REQUEST_TIMEOUT | 5000 | Connection request timeout | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_USERNAME | null | Elasticsearch username | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_PASSWORD | null | Elasticsearch password | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_PATH_PREFIX | null | Elasticsearch path prefix | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_USE_SSL | false | Use SSL for Elasticsearch | GMS, MAE Consumer, MCE Consumer, System Update |
| OPENSEARCH_USE_AWS_IAM_AUTH | false | Use AWS IAM authentication for OpenSearch | GMS, MAE Consumer, MCE Consumer, System Update |
| AWS_REGION | null | AWS region | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_IMPLEMENTATION | elasticsearch | Implementation (elasticsearch or opensearch) | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTIC_ID_HASH_ALGO | MD5 | ID hash algorithm | GMS, MAE Consumer, MCE Consumer, System Update |

SSL Context Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_SSL_PROTOCOL | null | SSL protocol | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_SECURE_RANDOM_IMPL | null | SSL secure random implementation | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_TRUSTSTORE_FILE | null | SSL truststore file | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_TRUSTSTORE_TYPE | null | SSL truststore type | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_TRUSTSTORE_PASSWORD | null | SSL truststore password | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_KEYSTORE_FILE | null | SSL keystore file | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_KEYSTORE_TYPE | null | SSL keystore type | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_KEYSTORE_PASSWORD | null | SSL keystore password | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_KEY_PASSWORD | null | SSL key password | GMS, MAE Consumer, MCE Consumer, System Update |

Bulk Operations Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ES_BULK_DELETE_BATCH_SIZE | 5000 | Bulk delete batch size | GMS, MAE Consumer |
| ES_BULK_DELETE_SLICES | auto | Bulk delete slices | GMS, MAE Consumer |
| ES_BULK_DELETE_POLL_INTERVAL | 30 | Bulk delete poll interval | GMS, MAE Consumer |
| ES_BULK_DELETE_POLL_UNIT | SECONDS | Bulk delete poll unit | GMS, MAE Consumer |
| ES_BULK_DELETE_TIMEOUT | 30 | Bulk delete timeout | GMS, MAE Consumer |
| ES_BULK_DELETE_TIMEOUT_UNIT | MINUTES | Bulk delete timeout unit | GMS, MAE Consumer |
| ES_BULK_DELETE_NUM_RETRIES | 3 | Bulk delete number of retries | GMS, MAE Consumer |
| ES_BULK_ASYNC | true | Enable async bulk operations | GMS, MAE Consumer |
| ES_BULK_REQUESTS_LIMIT | 1000 | Bulk requests limit | GMS, MAE Consumer |
| ES_BULK_FLUSH_PERIOD | 1 | Bulk flush period | GMS, MAE Consumer |
| ES_BULK_NUM_RETRIES | 3 | Bulk number of retries | GMS, MAE Consumer |
| ES_BULK_RETRY_INTERVAL | 1 | Bulk retry interval | GMS, MAE Consumer |
| ES_BULK_REFRESH_POLICY | NONE | Bulk refresh policy | GMS, MAE Consumer |
| ES_BULK_ENABLE_BATCH_DELETE | false | Enable batch delete | GMS, MAE Consumer |
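
The bulk delete poll interval and timeout are each split across a numeric value and a unit. The Python sketch below combines the documented defaults into durations and estimates how many polls fit inside the timeout. It assumes the unit names follow Java's TimeUnit constants (SECONDS, MINUTES, and so on); that mapping, and the helper names, are assumptions made for this example.

```python
from datetime import timedelta

# Assumed mapping from Java TimeUnit-style names to timedelta keyword arguments.
_UNIT_TO_KWARG = {
    "MILLISECONDS": "milliseconds",
    "SECONDS": "seconds",
    "MINUTES": "minutes",
    "HOURS": "hours",
    "DAYS": "days",
}


def to_timedelta(amount, unit):
    """Combine a numeric setting and its unit setting into a timedelta."""
    return timedelta(**{_UNIT_TO_KWARG[unit.upper()]: int(amount)})


def max_polls(env):
    """Estimate how many poll intervals fit inside the bulk delete timeout."""
    poll = to_timedelta(
        env.get("ES_BULK_DELETE_POLL_INTERVAL", "30"),
        env.get("ES_BULK_DELETE_POLL_UNIT", "SECONDS"),
    )
    timeout = to_timedelta(
        env.get("ES_BULK_DELETE_TIMEOUT", "30"),
        env.get("ES_BULK_DELETE_TIMEOUT_UNIT", "MINUTES"),
    )
    # Floor division of two timedeltas yields an integer count.
    return timeout // poll


if __name__ == "__main__":
    print(max_polls({}))
```

When changing one of the two values, it is worth checking the matching unit variable as well, since the pair is interpreted together.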

Index Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| INDEX_PREFIX | (empty) | Index prefix | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_INDEX_DOC_IDS_SCHEMA_FIELD_HASH_ID_ENABLED | false | Enable hash ID for schema field doc IDs | GMS, MAE Consumer, MCE Consumer, System Update |

Build Indices Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_BUILD_INDICES_ALLOW_DOC_COUNT_MISMATCH | false | Allow document count mismatch when clone indices is enabled | System Update |
| ELASTICSEARCH_BUILD_INDICES_CLONE_INDICES | true | Clone indices | System Update |
| ELASTICSEARCH_BUILD_INDICES_RETENTION_UNIT | DAYS | Retention unit for indices | System Update |
| ELASTICSEARCH_BUILD_INDICES_RETENTION_VALUE | 60 | Retention value for indices | System Update |
| ELASTICSEARCH_BUILD_INDICES_REINDEX_OPTIMIZATION_ENABLED | true | Enable reindex optimization | System Update |
| ELASTICSEARCH_NUM_SHARDS_PER_INDEX | 1 | Number of shards per index | System Update |
| ELASTICSEARCH_NUM_REPLICAS_PER_INDEX | 1 | Number of replicas per index | System Update |
| ELASTICSEARCH_INDEX_BUILDER_NUM_RETRIES | 3 | Index builder number of retries | System Update |
| ELASTICSEARCH_INDEX_BUILDER_REFRESH_INTERVAL_SECONDS | 3 | Index builder refresh interval | System Update |
| SEARCH_DOCUMENT_MAX_ARRAY_LENGTH | 1000 | Maximum array length in search documents | System Update |
| SEARCH_DOCUMENT_MAX_OBJECT_KEYS | 1000 | Maximum object keys in search documents | System Update |
| SEARCH_DOCUMENT_MAX_VALUE_LENGTH | 4096 | Maximum value length in search documents | System Update |
| ELASTICSEARCH_MAIN_TOKENIZER | null | Main tokenizer | System Update |
| ELASTICSEARCH_INDEX_BUILDER_MAPPINGS_REINDEX | false | Enable mappings reindex | System Update |
| ELASTICSEARCH_INDEX_BUILDER_SETTINGS_REINDEX | false | Enable settings reindex | System Update |
| ELASTICSEARCH_INDEX_BUILDER_MAX_REINDEX_HOURS | 0 | Maximum reindex hours (0 = no timeout) | System Update |
| ELASTICSEARCH_INDEX_BUILDER_SETTINGS_OVERRIDES | null | Index builder settings overrides | System Update |
| ELASTICSEARCH_MIN_SEARCH_FILTER_LENGTH | 3 | Minimum search filter length | System Update |
| ELASTICSEARCH_INDEX_BUILDER_ENTITY_SETTINGS_OVERRIDES | null | Entity settings overrides | System Update |

Search Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_QUERY_MAX_TERM_BUCKET_SIZE | 60 | Maximum term bucket size | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_EXCLUSIVE | false | Only return exact matches when using quotes | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_WITH_PREFIX | true | Include prefix match in exact match results | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_FACTOR | 16.0 | Multiply by this number on true exact match | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_PREFIX_FACTOR | 1.1 | Multiply by this number when prefix match | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_CASE_FACTOR | 0.0 | Stacked boost multiplier when case mismatch | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_ENABLE_STRUCTURED | true | Enable exact match on structured search | GMS |
| ELASTICSEARCH_QUERY_TWO_GRAM_FACTOR | 1.2 | Boost multiplier when match on 2-gram tokens | GMS |
| ELASTICSEARCH_QUERY_THREE_GRAM_FACTOR | 1.5 | Boost multiplier when match on 3-gram tokens | GMS |
| ELASTICSEARCH_QUERY_FOUR_GRAM_FACTOR | 1.8 | Boost multiplier when match on 4-gram tokens | GMS |
| ELASTICSEARCH_QUERY_PARTIAL_URN_FACTOR | 0.5 | Multiplier on Urn token match | GMS |
| ELASTICSEARCH_QUERY_PARTIAL_FACTOR | 0.4 | Multiplier on possible non-Urn token match | GMS |
| ELASTICSEARCH_QUERY_CUSTOM_CONFIG_ENABLED | true | Enable search query and ranking customization | GMS |
| ELASTICSEARCH_QUERY_CUSTOM_CONFIG_FILE | search_config.yaml | Location of search customization configuration | GMS |
| ELASTICSEARCH_QUERY_SEARCH_FIELD_CONFIG_DEFAULT | legacy | Default field configuration for search | GMS |
| ELASTICSEARCH_QUERY_AUTOCOMPLETE_FIELD_CONFIG_DEFAULT | legacy | Default field configuration for autocomplete | GMS |

Graph Search Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_SEARCH_GRAPH_TIMEOUT_SECONDS | 50 | Graph DAO timeout seconds | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_BATCH_SIZE | 1000 | Graph DAO batch size | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_MULTI_PATH_SEARCH | false | Allow path retraversal for all paths | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_BOOST_VIA_NODES | true | Boost graph edges with via nodes | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_STATUS_ENABLED | false | Enable soft delete tracking of URNs on edges | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_LINEAGE_MAX_HOPS | 20 | Maximum hops to traverse lineage graph | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_IMPACT_MAX_HOPS | 1000 | Maximum hops to traverse for impact analysis (impact.maxHops) | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_IMPACT_MAX_RELATIONS | 40000 | Maximum number of relationships for impact analysis (impact.maxRelations) | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_IMPACT_SLICES | 2 | Number of slices for parallel search operations (impact.slices) | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_IMPACT_KEEP_ALIVE | 5m | Point-in-Time keepAlive duration for impact analysis queries (impact.keepAlive) | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_IMPACT_MAX_THREADS | 32 | Maximum parallel lineage graph queries | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_QUERY_OPTIMIZATION | true | Reduce query nesting if possible | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_POINT_IN_TIME_CREATION_ENABLED | true | Enable creation of point in time snapshots for graph queries | GMS |

Neo4j Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| NEO4J_USERNAME | neo4j | Neo4j username | GMS, MAE Consumer, System Update |
| NEO4J_PASSWORD | datahub | Neo4j password | GMS, MAE Consumer, System Update |
| NEO4J_URI | bolt://localhost | Neo4j URI | GMS, MAE Consumer, System Update |
| NEO4J_DATABASE | graph.db | Neo4j database | GMS, MAE Consumer, System Update |
| NEO4J_MAX_CONNECTION_POOL_SIZE | 100 | Maximum connection pool size | GMS, MAE Consumer, System Update |
| NEO4J_MAX_CONNECTION_ACQUISITION_TIMEOUT_IN_SECONDS | 60 | Maximum connection acquisition timeout | GMS, MAE Consumer, System Update |
| NEO4j_MAX_CONNECTION_LIFETIME_IN_SECONDS | 3600 | Maximum connection lifetime | GMS, MAE Consumer, System Update |
| NEO4J_MAX_TRANSACTION_RETRY_TIME_IN_SECONDS | 30 | Maximum transaction retry time | GMS, MAE Consumer, System Update |
| NEO4J_CONNECTION_LIVENESS_CHECK_TIMEOUT_IN_SECONDS | -1 | Connection liveness check timeout | GMS, MAE Consumer, System Update |

Kafka Configuration

Topic Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_USAGE_EVENT_NAME | DataHubUsageEvent_v1 | DataHub usage event topic name | GMS, MAE Consumer, MCE Consumer, Actions, Frontend |

Bootstrap Servers

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| KAFKA_BOOTSTRAP_SERVER | http://localhost:9092 | Kafka bootstrap servers | GMS, MAE Consumer, MCE Consumer, PE Consumer, Actions, Frontend |

Producer Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| KAFKA_PRODUCER_RETRY_COUNT | 3 | Producer retry count | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_DELIVERY_TIMEOUT | 30000 | Producer delivery timeout | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_REQUEST_TIMEOUT | 3000 | Producer request timeout | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_BACKOFF_TIMEOUT | 500 | Producer backoff timeout | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_COMPRESSION_TYPE | snappy | Producer compression algorithm | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_MAX_REQUEST_SIZE | 5242880 | Maximum bytes sent by producer | GMS, MCE Consumer, System Update |

Consumer Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| KAFKA_LISTENER_CONCURRENCY | 1 | Number of Kafka consumer threads | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MAX_PARTITION_FETCH_BYTES | 5242880 | Maximum data per partition | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_STOP_ON_DESERIALIZATION_ERROR | true | Stop on deserialization error | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_HEALTH_CHECK_ENABLED | true | Enable health check for consumers | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MCP_AUTO_OFFSET_RESET | earliest | MCP consumer auto offset reset | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MCL_AUTO_OFFSET_RESET | earliest | MCL consumer auto offset reset | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MCL_FINE_GRAINED_LOGGING_ENABLED | false | Enable fine-grained logging for MCL | GMS, MAE Consumer |
| KAFKA_CONSUMER_MCL_ASPECTS_TO_DROP | (empty) | Aspects to drop for MCL | GMS, MAE Consumer |
| KAFKA_CONSUMER_PE_AUTO_OFFSET_RESET | latest | PE consumer auto offset reset | GMS, PE Consumer |
| KAFKA_CONSUMER_PERCENTILES | 0.5,0.95,0.99,0.999 | Consumer percentiles | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_SERVICE_LEVEL_OBJECTIVES | 300,1800,3000,10800,21600,43200 | Consumer SLOs in seconds | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MAX_EXPECTED_VALUE | 86000 | Maximum expected consumer value in seconds | GMS, MAE Consumer, MCE Consumer, PE Consumer |

Consumer Pool Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| KAFKA_CONSUMER_POOL_INITIAL_SIZE | 1 | Consumer pool initial size | GMS |
| KAFKA_CONSUMER_POOL_MAX_SIZE | 5 | Consumer pool maximum size | GMS |

Schema Registry Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SCHEMA_REGISTRY_TYPE | KAFKA | Schema registry type (INTERNAL, KAFKA, or AWS_GLUE) | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_SCHEMAREGISTRY_URL | http://localhost:8081 | Schema registry URL | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| SCHEMA_REGISTRY_URL | http://localhost:8081 | Schema registry URL (Actions) | Actions |
| AWS_GLUE_SCHEMA_REGISTRY_REGION | us-east-1 | AWS Glue schema registry region | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| AWS_GLUE_SCHEMA_REGISTRY_NAME | null | AWS Glue schema registry name | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_PROPERTIES_SECURITY_PROTOCOL | PLAINTEXT | Kafka security protocol | GMS, MAE Consumer, MCE Consumer, PE Consumer, Actions |

Spring Configuration

Kafka Security

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| spring.kafka.security.protocol | PLAINTEXT | Kafka security protocol | GMS, MAE Consumer, MCE Consumer, PE Consumer |

Management & Monitoring

JMX Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| spring.jmx.enabled | true | Enable JMX | GMS, MAE Consumer, MCE Consumer, PE Consumer |

Endpoints Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| management.endpoints.web.exposure.include | prometheus,info,healthcheck,metrics | Exposed web endpoints | GMS |
| management.endpoints.jmx.enabled | true | Enable JMX endpoints | GMS |

Metrics Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| management.metrics.cache.enabled | false | Enable cache metrics | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| management.metrics.export.jmx.enabled | true | Enable JMX metrics export | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| management.metrics.export.prometheus.enabled | true | Enable Prometheus metrics export | GMS, MAE Consumer, MCE Consumer, PE Consumer |

Server Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| server.server-header | false | Server header | GMS |

Feature Flags

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SHOW_SIMPLIFIED_HOMEPAGE_BY_DEFAULT | false | Show simplified homepage with just datasets, charts and dashboards | GMS |
| LINEAGE_SEARCH_CACHE_ENABLED | true | Enable in-memory cache for searchAcrossLineage query | GMS |
| GRAPH_SERVICE_DIFF_MODE_ENABLED | true | Enable diff mode for graph writes | GMS |
| POINT_IN_TIME_CREATION_ENABLED | false | Enable creation of point in time snapshots for scroll API | GMS |
| ALWAYS_EMIT_CHANGE_LOG | false | Always emit MCL even when no changes detected | GMS |
| SEARCH_SERVICE_DIFF_MODE_ENABLED | true | Enable diff mode for search document writes | GMS |
| READ_ONLY_MODE_ENABLED | false | Enable read only mode for instance | GMS |
| SHOW_ACCESS_MANAGEMENT | false | Show AccessManagement tab in UI | GMS |
| SHOW_SEARCH_FILTERS_V2 | true | Show search filters V2 experience | GMS |
| SHOW_BROWSE_V2 | true | Show browse v2 sidebar experience | GMS |
| PLATFORM_BROWSE_V2 | true | Enable platform browse experience | GMS |
| LINEAGE_GRAPH_V2 | true | Enable new lineage visualization | GMS |
| PRE_PROCESS_HOOKS_UI_ENABLED | true | Circumvent Kafka for UI changes | GMS |
| PRE_PROCESS_HOOKS_REPROCESS_ENABLED | false | Reprocess UI sourced events asynchronously | GMS |
| SHOW_ACRYL_INFO | false | Show CTAs around moving to DataHub Cloud | GMS |
| ER_MODEL_RELATIONSHIP_FEATURE_ENABLED | false | Enable Join Tables Feature | GMS |
| NESTED_DOMAINS_ENABLED | true | Enable nested Domains feature | GMS |
| SCHEMA_FIELD_ENTITY_FETCH_ENABLED | true | Enable fetching schema field entities | GMS |
| BUSINESS_ATTRIBUTE_ENTITY_ENABLED | false | Enable business attribute entity | GMS |
| DATA_CONTRACTS_ENABLED | true | Enable Data Contracts feature | GMS |
| ALTERNATE_MCP_VALIDATION | false | Enable alternate MCP validation flow | GMS |
| THEME_V2_ENABLED | true | Allow theme v2 to be turned on | GMS |
| THEME_V2_DEFAULT | true | Set default theme for users | GMS |
| THEME_V2_TOGGLEABLE | true | Allow theme v2 to be toggled (Acryl only) | GMS |
| SCHEMA_FIELD_CLL_ENABLED | false | Enable schema field-level lineage links | GMS |
| SCHEMA_FIELD_LINEAGE_IGNORE_STATUS | true | Ignore schema field status in lineage | GMS |
| SHOW_SEPARATE_SIBLINGS | false | Separate siblings with no combined view | GMS |
| EDITABLE_DATASET_NAME_ENABLED | false | Enable editing dataset name in UI | GMS |
| SHOW_MANAGE_STRUCTURED_PROPERTIES | true | Show manage structured properties button | GMS |
| HIDE_DBT_SOURCE_IN_LINEAGE | false | Hide dbt sources in lineage | GMS |
| SHOW_NAV_BAR_REDESIGN | true | Show newly designed nav bar | GMS |
| SHOW_AUTO_COMPLETE_RESULTS | true | Show auto complete results in search bar | GMS |
| ENTITY_VERSIONING_ENABLED | false | Enable entity versioning APIs | GMS |
| SHOW_HAS_SIBLINGS_FILTER | false | Show "has siblings" filter in search | GMS |
| SHOW_SEARCH_BAR_AUTOCOMPLETE_REDESIGN | false | Show redesigned search bar autocomplete | GMS |
| SHOW_MANAGE_TAGS | true | Allow users to manage tags in UI | GMS |
| SHOW_INTRODUCE_PAGE | true | Show introduce page in V2 UI | GMS |
| SHOW_INGESTION_PAGE_REDESIGN | false | Show re-designed Ingestion page | GMS |
| SHOW_LINEAGE_EXPAND_MORE | true | Show expand more button in lineage graph | GMS |
| SHOW_HOME_PAGE_REDESIGN | false | Show re-designed home page | GMS |
| LINEAGE_GRAPH_V3 | false | Enable redesign of lineage v2 graph | GMS |
| SHOW_PRODUCT_UPDATES | true | Show in-product update popover | GMS |
| LOGICAL_MODELS_ENABLED | false | Enable logical models feature | GMS |
| SHOW_HOMEPAGE_USER_ROLE | false | Display homepage user role underneath name | GMS |
| VIEWS_ENABLED | true | Enable views feature | GMS |

System Updates

Bootstrap Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_POLICIES_FILE | classpath:boot/policies.json | Bootstrap policies file | GMS |
| BOOTSTRAP_SERVLETS_WAITTIMEOUT | 60 | Total waiting time for servlets to initialize | GMS |

System Update Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_INITIAL_BACK_OFF_MILLIS | 5000 | Initial back off for system updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_MAX_BACK_OFFS | 50 | Maximum back offs for system updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_BACK_OFF_FACTOR | 2 | Multiplicative factor for back off | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_WAIT_FOR_SYSTEM_UPDATE | true | Wait for system update to complete | System Update |
| SYSTEM_UPDATE_BOOTSTRAP_MCP_CONFIG | bootstrap_mcps.yaml | Bootstrap MCP configuration | System Update |
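
The initial back off, multiplicative factor, and maximum number of back offs suggest an exponential back-off schedule. The Python sketch below computes the first few delays from the documented defaults. It assumes each delay is the previous one multiplied by the factor, with no upper cap on an individual delay; the real implementation may differ, and the function name back_off_schedule and its attempts parameter are hypothetical.

```python
def back_off_schedule(initial_ms=5000, factor=2, max_back_offs=50, attempts=5):
    """Return the first `attempts` back-off delays in milliseconds.

    Assumption: delay i equals initial_ms * factor ** i, and the schedule
    never contains more than max_back_offs entries.
    """
    count = min(attempts, max_back_offs)
    return [initial_ms * factor ** i for i in range(count)]


if __name__ == "__main__":
    for attempt, delay_ms in enumerate(back_off_schedule(), start=1):
        print(f"attempt {attempt}: wait {delay_ms} ms")
```

With the defaults, the first five delays under this assumption are 5000, 10000, 20000, 40000, and 80000 ms, so the waiting time grows quickly as retries accumulate.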

Data Job Node CLL Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_ENABLED | false | Enable data job node CLL | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_BATCH_SIZE | 1000 | Data job node CLL batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_DELAY_MS | 30000 | Data job node CLL delay in milliseconds | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_LIMIT | 0 | Data job node CLL limit | System Update |

Domain Description Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_DOMAIN_DESCRIPTION_ENABLED | true | Enable domain description updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DOMAIN_DESCRIPTION_BATCH_SIZE | 1000 | Domain description batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DOMAIN_DESCRIPTION_DELAY_MS | 30000 | Domain description delay in milliseconds | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DOMAIN_DESCRIPTION_CLL_LIMIT | 0 | Domain description CLL limit | System Update |

Dashboard Info Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_DASHBOARD_INFO_ENABLED | true | Enable dashboard info updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DASHBOARD_INFO_BATCH_SIZE | 1000 | Dashboard info batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DASHBOARD_INFO_DELAY_MS | 30000 | Dashboard info delay in milliseconds | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DASHBOARD_INFO_CLL_LIMIT | 0 | Dashboard info CLL limit | System Update |

Browse Paths V2 Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_BROWSE_PATHS_V2_ENABLED | true | Enable browse paths V2 updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_BROWSE_PATHS_V2_BATCH_SIZE | 5000 | Browse paths V2 batch size | System Update |
| REPROCESS_DEFAULT_BROWSE_PATHS_V2 | false | Reprocess default browse paths V2 | System Update |

Ingestion Indices Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_INGESTION_INDICES_ENABLED | true | Enable ingestion indices updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_INGESTION_INDICES_BATCH_SIZE | 5000 | Ingestion indices batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_INGESTION_INDICES_DELAY_MS | 1000 | Ingestion indices delay in milliseconds | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_INGESTION_INDICES_CLL_LIMIT | 0 | Ingestion indices CLL limit | System Update |

Policy Fields Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_POLICY_FIELDS_ENABLED | true | Enable policy fields updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_POLICY_FIELDS_BATCH_SIZE | 5000 | Policy fields batch size | System Update |
| REPROCESS_DEFAULT_POLICY_FIELDS | false | Reprocess default policy fields | System Update |

Ownership Types Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_OWNERSHIP_TYPES_ENABLED | true | Enable ownership types updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_OWNERSHIP_TYPES_BATCH_SIZE | 1000 | Ownership types batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_OWNERSHIP_TYPES_REPROCESS | false | Reprocess ownership types | System Update |

Schema Fields Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SYSTEM_UPDATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA_ENABLED | false | Enable schema fields from schema metadata | System Update |
| SYSTEM_UPDATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA_BATCH_SIZE | 500 | Schema fields from schema metadata batch size | System Update |
| SYSTEM_UPDATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA_DELAY_MS | 1000 | Schema fields from schema metadata delay | System Update |
| SYSTEM_UPDATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA_LIMIT | 0 | Schema fields from schema metadata limit | System Update |
| SYSTEM_UPDATE_SCHEMA_FIELDS_DOC_IDS_ENABLED | false | Enable schema fields doc IDs | System Update |
| SYSTEM_UPDATE_SCHEMA_FIELDS_DOC_IDS_BATCH_SIZE | 500 | Schema fields doc IDs batch size | System Update |
| SYSTEM_UPDATE_SCHEMA_FIELDS_DOC_IDS_DELAY_MS | 5000 | Schema fields doc IDs delay | System Update |
| SYSTEM_UPDATE_SCHEMA_FIELDS_DOC_IDS_LIMIT | 0 | Schema fields doc IDs limit | System Update |

Process Instance Configuration

Environment Variable | Default | Description | Components
SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_ENABLED | true | Enable process instance has run events | System Update
SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_BATCH_SIZE | 100 | Process instance has run events batch size | System Update
SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_DELAY_MS | 1000 | Process instance has run events delay in milliseconds | System Update
SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_TOTAL_DAYS | 90 | Process instance has run events total days | System Update
SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_WINDOW_DAYS | 1 | Process instance has run events window days | System Update
SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_REPROCESS | false | Reprocess process instance has run events | System Update

Edge Status Configuration

Environment Variable | Default | Description | Components
BOOTSTRAP_SYSTEM_UPDATE_EDGE_STATUS_ENABLED | false | Enable edge status updates | System Update
BOOTSTRAP_SYSTEM_UPDATE_EDGE_STATUS_BATCH_SIZE | 1000 | Edge status batch size | System Update
BOOTSTRAP_SYSTEM_UPDATE_EDGE_STATUS_DELAY_MS | 5000 | Edge status delay in milliseconds | System Update
BOOTSTRAP_SYSTEM_UPDATE_EDGE_STATUS_LIMIT | 0 | Edge status limit | System Update

Property Definitions Configuration

Environment Variable | Default | Description | Components
BOOTSTRAP_SYSTEM_UPDATE_PROPERTY_DEFINITIONS_ENABLED | true | Enable property definitions updates | System Update
BOOTSTRAP_SYSTEM_UPDATE_PROPERTY_DEFINITIONS_BATCH_SIZE | 500 | Property definitions batch size | System Update
BOOTSTRAP_SYSTEM_UPDATE_PROPERTY_DEFINITIONS_DELAY_MS | 1000 | Property definitions delay in milliseconds | System Update
BOOTSTRAP_SYSTEM_UPDATE_PROPERTY_DEFINITIONS_CLL_LIMIT | 0 | Property definitions CLL limit | System Update

Remove Query Edges Configuration

Environment Variable | Default | Description | Components
BOOTSTRAP_SYSTEM_UPDATE_REMOVE_QUERY_EDGES_ENABLED | true | Enable remove query edges | System Update
BOOTSTRAP_SYSTEM_UPDATE_REMOVE_QUERY_EDGES_RETRIES | 20 | Remove query edges retries | System Update

Additional Environment Variables

The following environment variables are used in the codebase but may not be explicitly defined in the application.yaml file:

Ingestion and Processing

Environment Variable | Default | Description | Components
ASYNC_INGEST_DEFAULT | false | Asynchronously process ingestProposals by writing to Kafka | GMS
STRICT_URN_VALIDATION_ENABLED | false | Enable stricter URN validation logic | GMS
DATAHUB_DATASET_URN_TO_LOWER | null | Convert dataset URN names to lowercase | GMS
BUSINESS_ATTRIBUTE_ENTITY_ENABLED | false | Enable business attribute entity feature | GMS

REST and Servlet Configuration

Environment Variable | Default | Description | Components
RESTLI_SERVLET_THREADS | null | Number of threads for REST servlet | GMS, MCE Consumer
RESTLI_TIMEOUT_SECONDS | 60 | REST timeout in seconds | GMS, MCE Consumer

System and Version Information

Environment Variable | Default | Description | Components
DATAHUB_GMS_PROTOCOL | http | GMS protocol (http/https) | GMS

Upgrade and Migration

Environment Variable | Default | Description | Components
SKIP_REINDEX_EDGE_STATUS | false | Skip reindexing edge status | System Update
SKIP_REINDEX_DATA_JOB_INPUT_OUTPUT | false | Skip reindexing data job input/output | System Update
SKIP_GENERATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA | false | Skip generating schema fields from schema metadata | System Update
SKIP_MIGRATE_SCHEMA_FIELDS_DOC_ID | false | Skip migrating schema fields doc IDs | System Update
BACKFILL_BROWSE_PATHS_V2 | false | Enable backfilling browse paths V2 | System Update
READER_POOL_SIZE | null | Reader pool size for restore operations | System Update
WRITER_POOL_SIZE | null | Writer pool size for restore operations | System Update

OpenTelemetry Configuration

Environment Variable | Default | Description | Components
OTEL_METRICS_EXPORTER | none | OpenTelemetry metrics exporter | GMS, MAE Consumer, MCE Consumer, PE Consumer
OTEL_TRACES_EXPORTER | none | OpenTelemetry traces exporter | GMS, MAE Consumer, MCE Consumer, PE Consumer
OTEL_LOGS_EXPORTER | none | OpenTelemetry logs exporter | GMS, MAE Consumer, MCE Consumer, PE Consumer
OTEL_PROPAGATORS | null | OpenTelemetry propagators | GMS, MAE Consumer, MCE Consumer, PE Consumer

Secret Service Configuration

Environment Variable | Default | Description | Components
SECRET_SERVICE_ENCRYPTION_KEY | ENCRYPTION_KEY | Secret service encryption key | GMS
SECRET_SERVICE_V1_ALGORITHM_ENABLED | true | Enable v1 algorithm for secret service | GMS

Health Check Configuration

Environment Variable | Default | Description | Components
HEALTH_CHECK_CACHE_DURATION_SECONDS | 5 | Health check cache duration | GMS

Metadata Tests Configuration

Environment Variable | Default | Description | Components
METADATA_TESTS_ENABLED | false | Enable metadata tests | GMS

Hooks Configuration

Environment Variable | Default | Description | Components
ENABLE_SIBLING_HOOK | true | Enable automatic sibling associations | GMS, MAE Consumer
SIBLINGS_HOOK_CONSUMER_GROUP_SUFFIX | (empty) | Siblings hook consumer group suffix | GMS, MAE Consumer
ENABLE_UPDATE_INDICES_HOOK | true | Enable update indices hook | GMS, MAE Consumer
UPDATE_INDICES_CONSUMER_GROUP_SUFFIX | (empty) | Update indices consumer group suffix | GMS, MAE Consumer
ENABLE_INGESTION_SCHEDULER_HOOK | true | Enable ingestion scheduling | GMS, MAE Consumer
INGESTION_SCHEDULER_HOOK_CONSUMER_GROUP_SUFFIX | (empty) | Ingestion scheduler hook consumer group suffix | GMS, MAE Consumer
ENABLE_INCIDENTS_HOOK | true | Enable incidents hook | GMS, MAE Consumer
MAX_INCIDENT_HISTORY | 100 | Maximum incident history | GMS, MAE Consumer
INCIDENTS_HOOK_CONSUMER_GROUP_SUFFIX | (empty) | Incidents hook consumer group suffix | GMS, MAE Consumer
ENABLE_STRUCTURED_PROPERTIES_HOOK | true | Enable structured properties mappings | GMS, MAE Consumer
ENABLE_STRUCTURED_PROPERTIES_WRITE | true | Enable writing structured property values | GMS, MAE Consumer
ENABLE_STRUCTURED_PROPERTIES_SYSTEM_UPDATE | false | Enable structured property mappings in system update | GMS, MAE Consumer
ENABLE_ENTITY_CHANGE_EVENTS_HOOK | true | Enable entity change events hook | GMS, MAE Consumer
ECE_CONSUMER_GROUP_SUFFIX | (empty) | Entity change events consumer group suffix | GMS, MAE Consumer
ECE_ENTITY_EXCLUSIONS | schemaField | Entities to exclude from ECE hook | GMS, MAE Consumer
FORMS_HOOK_ENABLED | true | Enable forms hook | GMS, MAE Consumer
FORMS_HOOK_CONSUMER_GROUP_SUFFIX | (empty) | Forms hook consumer group suffix | GMS, MAE Consumer

Search and API Configuration

Environment Variable | Default | Description | Components
SEARCH_BAR_API_VARIANT | AUTOCOMPLETE_FOR_MULTIPLE | Search bar API variant | Frontend
FIRST_IN_PERSONAL_SIDEBAR | YOUR_ASSETS | First item in personal sidebar | Frontend

Client Configuration

Environment Variable | Default | Description | Components
ENTITY_CLIENT_RETRY_INTERVAL | 2 | Entity client retry interval | GMS
ENTITY_CLIENT_NUM_RETRIES | 3 | Entity client number of retries | GMS
ENTITY_CLIENT_JAVA_GET_BATCH_SIZE | 375 | Entity client Java get batch size | GMS
ENTITY_CLIENT_JAVA_INGEST_BATCH_SIZE | 375 | Entity client Java ingest batch size | GMS
ENTITY_CLIENT_RESTLI_GET_BATCH_SIZE | 100 | Entity client RESTli get batch size | GMS, MAE Consumer, PE Consumer
ENTITY_CLIENT_RESTLI_GET_BATCH_CONCURRENCY | 2 | Entity client RESTli get batch concurrency | GMS, MAE Consumer, PE Consumer
ENTITY_CLIENT_RESTLI_GET_BATCH_QUEUE_SIZE | 500 | Entity client RESTli get batch queue size | GMS, MAE Consumer, PE Consumer
ENTITY_CLIENT_RESTLI_GET_BATCH_THREAD_KEEP_ALIVE | 60 | Entity client RESTli get batch thread keep alive | GMS, MAE Consumer, PE Consumer
ENTITY_CLIENT_RESTLI_INGEST_BATCH_SIZE | 50 | Entity client RESTli ingest batch size | GMS, MAE Consumer, PE Consumer
ENTITY_CLIENT_RESTLI_INGEST_BATCH_CONCURRENCY | 2 | Entity client RESTli ingest batch concurrency | GMS, MAE Consumer, PE Consumer
ENTITY_CLIENT_RESTLI_INGEST_BATCH_QUEUE_SIZE | 500 | Entity client RESTli ingest batch queue size | GMS, MAE Consumer, PE Consumer
ENTITY_CLIENT_RESTLI_INGEST_BATCH_THREAD_KEEP_ALIVE | 60 | Entity client RESTli ingest batch thread keep alive | GMS, MAE Consumer, PE Consumer
USAGE_CLIENT_RETRY_INTERVAL | 2 | Usage client retry interval | GMS, MAE Consumer, PE Consumer
USAGE_CLIENT_NUM_RETRIES | 0 | Usage client number of retries | GMS, MAE Consumer, PE Consumer
USAGE_CLIENT_TIMEOUT_MS | 3000 | Usage client timeout in milliseconds | GMS, MAE Consumer, PE Consumer

Cache Configuration

Environment Variable | Default | Description | Components
CACHE_TTL_SECONDS | 600 | Default cache time to live | GMS
CACHE_MAX_SIZE | 10000 | Maximum number of items to cache | GMS
CACHE_ENTITY_COUNTS_TTL_SECONDS | 600 | Homepage entity count time to live | GMS
CACHE_SEARCH_LINEAGE_TTL_SECONDS | 86400 | Search lineage cache time to live | GMS
CACHE_SEARCH_LINEAGE_LIGHTNING_THRESHOLD | 300 | Lineage graphs exceeding this limit will use local cache | GMS
CACHE_CLIENT_USAGE_CLIENT_ENABLED | true | Enable usage client cache | GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_USAGE_CLIENT_STATS_ENABLED | true | Enable usage client cache stats | GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_USAGE_CLIENT_STATS_INTERVAL_SECONDS | 120 | Usage client cache stats interval | GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_USAGE_CLIENT_TTL_SECONDS | 86400 | Usage client cache TTL | GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_USAGE_CLIENT_MAX_BYTES | 52428800 | Usage client cache max bytes (50MB) | GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_ENTITY_CLIENT_ENABLED | true | Enable entity client cache | GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_ENTITY_CLIENT_STATS_ENABLED | true | Enable entity client cache stats | GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_ENTITY_CLIENT_STATS_INTERVAL_SECONDS | 120 | Entity client cache stats interval | GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_ENTITY_CLIENT_TTL_SECONDS | 0 | Entity client cache TTL (0 = no cache) | GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_ENTITY_CLIENT_MAX_BYTES | 104857600 | Entity client cache max bytes (100MB) | GMS, MAE Consumer, PE Consumer

GraphQL Configuration

Environment Variable | Default | Description | Components
GRAPHQL_CONCURRENCY_SEPARATE_THREAD_POOL | false | Enable separate thread pool for GraphQL | GMS
GRAPHQL_CONCURRENCY_STACK_SIZE | 256000 | GraphQL thread pool stack size | GMS
GRAPHQL_CONCURRENCY_CORE_POOL_SIZE | -1 | GraphQL core pool size (default 5 * cores) | GMS
GRAPHQL_CONCURRENCY_MAX_POOL_SIZE | -1 | GraphQL max pool size (default 100 * cores) | GMS
GRAPHQL_CONCURRENCY_KEEP_ALIVE | 60 | GraphQL thread keep alive time | GMS
GRAPHQL_QUERY_COMPLEXITY_LIMIT | 2000 | GraphQL query complexity limit | GMS
GRAPHQL_QUERY_DEPTH_LIMIT | 50 | GraphQL query depth limit | GMS
GRAPHQL_QUERY_INTROSPECTION_ENABLED | true | Enable GraphQL introspection | GMS
GRAPHQL_METRICS_ENABLED | true | Enable GraphQL metrics collection | GMS
GRAPHQL_PERCENTILES | 0.5,0.75,0.95,0.98,0.99,0.999 | GraphQL percentiles | GMS
GRAPHQL_METRICS_FIELD_LEVEL_ENABLED | false | Enable field-level GraphQL metrics | GMS
GRAPHQL_METRICS_FIELD_LEVEL_OPERATIONS | getSearchResultsForMultiple,searchAcrossLineageStructure | GraphQL field-level operations | GMS
GRAPHQL_METRICS_FIELD_LEVEL_PATH_ENABLED | false | Include field path in GraphQL metrics | GMS
GRAPHQL_METRICS_FIELD_LEVEL_PATHS | `` | GraphQL field-level paths | GMS
GRAPHQL_METRICS_TRIVIAL_DATA_FETCHERS_ENABLED | false | Include trivial data fetchers in GraphQL metrics | GMS
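
The -1 defaults for the two pool-size variables indicate that the size is derived from the number of CPU cores (5 * cores for the core pool, 100 * cores for the max pool, per the descriptions above). The following is a minimal Python sketch of that resolution. The helper name `resolve_pool_size` is hypothetical, and treating any positive value as an explicit override is an assumption; the actual GMS logic may differ.

```python
import os
from typing import Mapping


def resolve_pool_size(
    env_name: str,
    cores_multiplier: int,
    cores: int,
    env: Mapping[str, str] = os.environ,
) -> int:
    """Return the configured pool size, or cores * multiplier when unset or -1."""
    configured = int(env.get(env_name, "-1"))
    if configured > 0:
        # Assumption: a positive value is taken as-is.
        return configured
    return cores * cores_multiplier


cores = os.cpu_count() or 1
print(resolve_pool_size("GRAPHQL_CONCURRENCY_CORE_POOL_SIZE", 5, cores))
print(resolve_pool_size("GRAPHQL_CONCURRENCY_MAX_POOL_SIZE", 100, cores))
```

Under these assumptions, a 4-core host with both variables left at -1 would end up with a core pool of 20 threads and a max pool of 400.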

Chrome Extension Configuration

Environment Variable | Default | Description | Components
CHROME_EXTENSION_ENABLED | true | Enable Chrome extension | Frontend
CHROME_EXTENSION_LINEAGE_ENABLED | true | Enable Chrome extension lineage | Frontend

Business Attribute Configuration

Environment Variable | Default | Description | Components
BUSINESS_ATTRIBUTE_RELATED_ENTITIES_COUNT | 20000 | Business attribute related entities count | GMS
BUSINESS_ATTRIBUTE_RELATED_ENTITIES_BATCH_SIZE | 1000 | Business attribute related entities batch size | GMS
BUSINESS_ATTRIBUTE_PROPAGATION_CONCURRENCY_THREAD_COUNT | -1 | Business attribute propagation thread count (default 2 * cores) | GMS
BUSINESS_ATTRIBUTE_PROPAGATION_CONCURRENCY_KEEP_ALIVE | 60 | Business attribute propagation keep alive time | GMS

Metadata Change Proposal Configuration

Environment Variable | Default | Description | Components
MCP_CONSUMER_BATCH_ENABLED | false | Enable MCP consumer batch processing | GMS, MCE Consumer
MCP_CONSUMER_BATCH_SIZE | 15744000 | MCP consumer batch size | GMS, MCE Consumer
MCP_VALIDATION_IGNORE_UNKNOWN | true | Ignore unknown fields in MCP validation | GMS, MCE Consumer
MCP_VALIDATION_PRIVILEGE_CONSTRAINTS | true | Enable privilege constraints in MCP validation | GMS, MCE Consumer
MCP_VALIDATION_EXTENSIONS_ENABLED | false | Enable extensions in MCP validation | GMS, MCE Consumer
MCP_SIDE_EFFECTS_SCHEMA_FIELD_ENABLED | false | Enable schema field side effects | GMS, MCE Consumer
MCP_SIDE_EFFECTS_DATA_PRODUCT_UNSET_ENABLED | true | Enable data product unset side effects | GMS, MCE Consumer
MCP_THROTTLE_UPDATE_INTERVAL_MS | 60000 | MCP throttle update interval in milliseconds | GMS, MCE Consumer
MCP_MCE_CONSUMER_THROTTLE_ENABLED | false | Enable MCE consumer throttling | GMS, MCE Consumer
MCP_API_REQUESTS_THROTTLE_ENABLED | false | Enable API requests throttling | GMS, MCE Consumer
MCP_VERSIONED_THROTTLE_ENABLED | false | Enable versioned MCL topic throttling | GMS, MCE Consumer
MCP_VERSIONED_THRESHOLD | 4000 | Versioned throttle threshold | GMS, MCE Consumer
MCP_VERSIONED_MAX_ATTEMPTS | 1000 | Versioned max attempts | GMS, MCE Consumer
MCP_VERSIONED_INITIAL_INTERVAL_MS | 100 | Versioned initial interval in milliseconds | GMS, MCE Consumer
MCP_VERSIONED_MULTIPLIER | 10 | Versioned multiplier | GMS, MCE Consumer
MCP_VERSIONED_MAX_INTERVAL_MS | 30000 | Versioned max interval in milliseconds | GMS, MCE Consumer
MCP_TIMESERIES_THROTTLE_ENABLED | false | Enable timeseries MCL topic throttling | GMS, MCE Consumer
MCP_TIMESERIES_THRESHOLD | 4000 | Timeseries throttle threshold | GMS, MCE Consumer
MCP_TIMESERIES_MAX_ATTEMPTS | 1000 | Timeseries max attempts | GMS, MCE Consumer
MCP_TIMESERIES_INITIAL_INTERVAL_MS | 100 | Timeseries initial interval in milliseconds | GMS, MCE Consumer
MCP_TIMESERIES_MULTIPLIER | 10 | Timeseries multiplier | GMS, MCE Consumer
MCP_TIMESERIES_MAX_INTERVAL_MS | 30000 | Timeseries max interval in milliseconds | GMS, MCE Consumer
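
The initial interval, multiplier, and max interval settings above read like the parameters of an exponential backoff. As an illustration only, the Python sketch below assumes the wait starts at the initial interval, is multiplied by the multiplier after each attempt, and is capped at the max interval. The function name `backoff_intervals_ms` is hypothetical, and the real throttle implementation may compute its waits differently.

```python
from typing import List


def backoff_intervals_ms(
    initial_interval_ms: int = 100,  # MCP_VERSIONED_INITIAL_INTERVAL_MS default
    multiplier: int = 10,            # MCP_VERSIONED_MULTIPLIER default
    max_interval_ms: int = 30000,    # MCP_VERSIONED_MAX_INTERVAL_MS default
    attempts: int = 5,
) -> List[int]:
    """Return the assumed wait (in ms) before each of the first `attempts` retries."""
    intervals: List[int] = []
    wait = initial_interval_ms
    for _ in range(attempts):
        intervals.append(min(wait, max_interval_ms))
        # Grow the wait geometrically, never exceeding the cap.
        wait = min(wait * multiplier, max_interval_ms)
    return intervals


print(backoff_intervals_ms())  # [100, 1000, 10000, 30000, 30000]
```

With the defaults, the cap is reached on the fourth attempt, so lowering the multiplier is the main lever for a gentler ramp under this model.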

Events API Configuration

Environment Variable | Default | Description | Components
EVENTS_API_ENABLED | true | Enable events API | GMS

Iceberg Catalog Configuration

Environment Variable | Default | Description | Components
ENABLE_PUBLIC_READ | false | Enable public read for Iceberg catalog | GMS
PUBLICLY_READABLE_TAG | PUBLICLY_READABLE | Publicly readable tag for Iceberg catalog | GMS

Change Data Capture (CDC) Configuration

DataHub supports CDC mode for MetadataChangeLog generation, which guarantees ordered MCL events matching the order of database writes. CDC mode is optional and disabled by default.

CDC Processing (Common)

Environment Variable | Default | Description | Components
CDC_MCL_PROCESSING_ENABLED | false | Enable CDC mode for MCL generation | GMS, MCE Consumer, System Update
CDC_CONFIGURE_SOURCE | false | Auto-configure Debezium connector (recommended false for production) | System Update
CDC_DB_TYPE | mysql | Database type for CDC (mysql or postgres) | System Update
DATAHUB_CDC_CONNECTOR_NAME | datahub-cdc-connector | Name of the Debezium connector | System Update
CDC_KAFKA_CONNECT_URL | http://kafka-connect:8083 | Kafka Connect REST API URL | System Update
CDC_KAFKA_CONNECT_REQUEST_TIMEOUT | 10000 | Request timeout for Kafka Connect API calls in milliseconds | System Update
CDC_USER | datahub_cdc | Database username for CDC connector | System Update
CDC_PASSWORD | datahub_cdc | Database password for CDC connector | System Update
CDC_TOPIC_NAME | datahub.metadata_aspect_v2 | Kafka topic name for CDC events | GMS, MCE Consumer, System Update
CDC_URN_KEY_SPEC | datahub.metadata_aspect_v2:urn | Partitioning key specification (table:column format) | System Update
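
When CDC_CONFIGURE_SOURCE is left at false and the Debezium connector is managed outside DataHub, it can be useful to check the connector's state through the Kafka Connect REST API. The Python sketch below builds the standard Kafka Connect status endpoint (`GET /connectors/{name}/status`) from the variables above. The helper names `connector_status_url` and `fetch_connector_status` are hypothetical and not part of DataHub, and the defaults are simply copied from the table.

```python
import json
import os
import urllib.request
from typing import Mapping


def connector_status_url(env: Mapping[str, str] = os.environ) -> str:
    """Build the Kafka Connect status URL from the CDC variables above."""
    base = env.get("CDC_KAFKA_CONNECT_URL", "http://kafka-connect:8083").rstrip("/")
    name = env.get("DATAHUB_CDC_CONNECTOR_NAME", "datahub-cdc-connector")
    return f"{base}/connectors/{name}/status"


def fetch_connector_status(env: Mapping[str, str] = os.environ) -> dict:
    """Query Kafka Connect; only works against a reachable Kafka Connect instance."""
    # The timeout variable is documented in milliseconds; urlopen expects seconds.
    timeout_s = int(env.get("CDC_KAFKA_CONNECT_REQUEST_TIMEOUT", "10000")) / 1000
    with urllib.request.urlopen(connector_status_url(env), timeout=timeout_s) as resp:
        return json.load(resp)


print(connector_status_url({}))
```

Calling `fetch_connector_status()` from a host that can reach Kafka Connect returns the connector and task states as a dictionary.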

CDC MySQL Configuration

Environment Variable | Default | Description | Components
DEBEZIUM_CONNECTOR_CLASS | io.debezium.connector.mysql.MySqlConnector | Debezium connector class for MySQL | System Update
DEBEZIUM_PLUGIN_NAME | decoderbufs | Logical decoding plugin for MySQL | System Update
CDC_SERVER_ID | 184001 | Unique server ID for MySQL CDC connector | System Update

CDC PostgreSQL Configuration

Environment Variable | Default | Description | Components
DEBEZIUM_CONNECTOR_CLASS | io.debezium.connector.postgresql.PostgresConnector | Debezium connector class for PostgreSQL | System Update
DEBEZIUM_PLUGIN_NAME | pgoutput | PostgreSQL logical decoding plugin | System Update
CDC_INCLUDE_TABLE | public.metadata_aspect_v2 | Tables to include in CDC capture | System Update
CDC_INCLUDE_SCHEMA | public | Schemas to include in CDC capture | System Update

Component Configuration

Environment Variable | Default | Description | Components
MCP_CONSUMER_ENABLED | true | When running in standalone mode, disable on GMS and enable on the separate MCE Consumer. | GMS, MCE Consumer
MCL_CONSUMER_ENABLED | true | When running in standalone mode, disable on GMS and enable on the separate MAE Consumer. | GMS, MAE Consumer
PE_CONSUMER_ENABLED | true | When running in standalone mode, disable on GMS and enable on the separate MAE Consumer. | GMS, PE Consumer

DataHub Frontend

Play Framework Configuration

Secret Key Configuration

Environment Variable | Default | Description | Components
DATAHUB_SECRET | null | Secret key used to secure cryptographic functions | Frontend

HTTP Parser Configuration

Environment Variable | Default | Description | Components
DATAHUB_PLAY_MEM_BUFFER_SIZE | 10MB | Maximum memory buffer size for HTTP parser | Frontend

Server Configuration

Environment Variable | Default | Description | Components
DATAHUB_AKKA_MAX_HEADER_COUNT | 64 | Maximum number of headers allowed | Frontend
DATAHUB_AKKA_MAX_HEADER_VALUE_LENGTH | 32k | Maximum header value length | Frontend

Session Configuration

Environment Variable | Default | Description | Components
AUTH_COOKIE_SAME_SITE | LAX | SameSite attribute for authentication cookies | Frontend
AUTH_COOKIE_SECURE | false | Whether authentication cookies should be secure | Frontend

Authentication Configuration

OIDC Configuration

Required OIDC Configuration

Environment Variable | Default | Description | Components
AUTH_OIDC_ENABLED | false | Enable OIDC authentication | Frontend
AUTH_OIDC_CLIENT_ID | null | Unique client ID issued by the identity provider | Frontend
AUTH_OIDC_CLIENT_SECRET | null | Unique client secret issued by the identity provider | Frontend
AUTH_OIDC_DISCOVERY_URI | null | The IdP OIDC discovery URL | Frontend
AUTH_OIDC_BASE_URL | null | The base URL associated with your DataHub deployment | Frontend

Optional OIDC Configuration

Environment Variable | Default | Description | Components
AUTH_OIDC_USER_NAME_CLAIM | preferred_username | The attribute/claim used to derive the DataHub username | Frontend
AUTH_OIDC_USER_NAME_CLAIM_REGEX | (.*) | The regex used to parse the DataHub username from the user name claim | Frontend
AUTH_OIDC_SCOPE | oidc email profile | String representing the requested scope from the IdP | Frontend
AUTH_OIDC_CLIENT_AUTHENTICATION_METHOD | client_secret_basic | Authentication method to pass credentials to token endpoint | Frontend
AUTH_OIDC_JIT_PROVISIONING_ENABLED | true | Whether DataHub users should be provisioned on login if they don't exist | Frontend
AUTH_OIDC_PRE_PROVISIONING_REQUIRED | false | Whether the user should already exist in DataHub on login | Frontend
AUTH_OIDC_EXTRACT_GROUPS_ENABLED | true | Whether groups should be extracted from a claim in the OIDC profile | Frontend
AUTH_OIDC_GROUPS_CLAIM | groups | The OIDC claim to extract groups information from | Frontend
AUTH_OIDC_RESPONSE_TYPE | null | OIDC response type | Frontend
AUTH_OIDC_RESPONSE_MODE | null | OIDC response mode | Frontend
AUTH_OIDC_USE_NONCE | null | Whether to use nonce in OIDC flow | Frontend
AUTH_OIDC_CUSTOM_PARAM_RESOURCE | null | Custom resource parameter for OIDC | Frontend
AUTH_OIDC_READ_TIMEOUT | null | OIDC read timeout | Frontend
AUTH_OIDC_CONNECT_TIMEOUT | null | OIDC connect timeout | Frontend
AUTH_OIDC_EXTRACT_JWT_ACCESS_TOKEN_CLAIMS | false | Whether to extract claims from JWT access token | Frontend
AUTH_OIDC_PREFERRED_JWS_ALGORITHM | null | Which JWS algorithm to use | Frontend
AUTH_OIDC_ACR_VALUES | null | OIDC ACR values | Frontend
AUTH_OIDC_GRANT_TYPE | null | OIDC grant type | Frontend
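
To illustrate how AUTH_OIDC_USER_NAME_CLAIM and AUTH_OIDC_USER_NAME_CLAIM_REGEX could work together, the Python sketch below applies a regex to a claim value and keeps the first capture group. This is an approximation: the frontend is a Java application, so its regex dialect and matching rules may differ, and using the first capture group is an assumption. The function name `extract_username` and the sample claim value are hypothetical.

```python
import re
from typing import Optional


def extract_username(claim_value: str, pattern: str = r"(.*)") -> Optional[str]:
    """Return the first capture group of `pattern` in the claim, or None if no match.

    Assumes the pattern contains at least one capture group, as the default does.
    """
    match = re.search(pattern, claim_value)
    if match is None:
        return None
    return match.group(1)


# The default pattern keeps the whole claim value.
print(extract_username("jdoe@example.com"))
# A custom pattern can keep only the part before "@".
print(extract_username("jdoe@example.com", r"([^@]+)"))
```

A pattern like the second one is a common choice when the claim is an email address but usernames should not include the domain.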

Authentication Methods Configuration

Environment Variable | Default | Description | Components
AUTH_JAAS_ENABLED | true | Enable JAAS authentication | Frontend
AUTH_NATIVE_ENABLED | true | Enable native authentication | Frontend
GUEST_AUTHENTICATION_ENABLED | false | Enable guest authentication | Frontend
GUEST_AUTHENTICATION_USER | guest | The name of the guest user ID | Frontend
GUEST_AUTHENTICATION_PATH | null | The path to bypass login page and get logged in as guest | Frontend
ENFORCE_VALID_EMAIL | true | Enforce the usage of a valid email for user sign up | Frontend

Authentication Logging

Environment Variable | Default | Description | Components
AUTH_VERBOSE_LOGGING | false | Enable verbose authentication logging | Frontend

Session Configuration

Environment Variable | Default | Description | Components
AUTH_SESSION_TTL_HOURS | 24 | Login session expiration time in hours | Frontend
MAX_SESSION_TOKEN_AGE | 24h | Maximum age of session token | Frontend

Metadata Service Configuration

Connection Configuration

Environment Variable | Default | Description | Components
DATAHUB_GMS_HOST | localhost | Metadata service host | Frontend
DATAHUB_GMS_PORT | 8080 | Metadata service port | Frontend
DATAHUB_GMS_USE_SSL | false | Whether to use SSL for metadata service connection | Frontend
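
As a rough illustration of how these three connection variables combine, the Python sketch below derives a metadata service base URL from them, using the defaults listed above. The helper name `gms_base_url` is hypothetical, and the frontend's actual URL construction may differ (for example, in how it parses the boolean).

```python
import os
from typing import Mapping


def gms_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Derive the metadata service base URL from the connection variables."""
    host = env.get("DATAHUB_GMS_HOST", "localhost")
    port = env.get("DATAHUB_GMS_PORT", "8080")
    # Assumption: only the literal string "true" (case-insensitive) enables SSL.
    use_ssl = env.get("DATAHUB_GMS_USE_SSL", "false").strip().lower() == "true"
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}"


print(gms_base_url({}))  # http://localhost:8080
```

Passing an explicit mapping, as in the last line, makes it easy to try out a combination without changing the process environment.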

Authentication Configuration

Environment Variable | Default | Description | Components
METADATA_SERVICE_AUTH_ENABLED | false | Enable metadata service authentication | Frontend
DATAHUB_SYSTEM_CLIENT_SECRET | JohnSnowKnowsNothing | System client secret for metadata service | Frontend

Entity Client Configuration

Environment Variable | Default | Description | Components
ENTITY_CLIENT_RETRY_INTERVAL | 2 | Entity client retry interval | Frontend
ENTITY_CLIENT_NUM_RETRIES | 3 | Entity client number of retries | Frontend
ENTITY_CLIENT_RESTLI_GET_BATCH_SIZE | 50 | Entity client RESTli get batch size | Frontend
ENTITY_CLIENT_RESTLI_GET_BATCH_CONCURRENCY | 2 | Entity client RESTli get batch concurrency | Frontend

Notes

  • Environment variable names generally follow the YAML property paths, converted to uppercase with underscores
  • Default values are shown in the tables above
  • For Kafka configuration, refer to the official Spring Kafka documentation for additional properties
  • Feature flags control experimental or optional functionality
  • System update configurations control various background maintenance tasks
  • Cache configurations help optimize performance for different use cases
  • GraphQL configurations control query complexity and performance monitoring
  • OpenTelemetry variables control observability and tracing behavior
  • Play Framework properties are converted to environment variables by:
    • Converting dots (.) to underscores (_)
    • Converting to uppercase
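
The two conversion steps for Play Framework properties can be sketched in a few lines of Python. The function name `play_property_to_env_var` is hypothetical and the property path in the example is illustrative only. Many variables in the tables above (for example the `DATAHUB_`-prefixed ones) do not follow this rule literally, so treat the tables as the authoritative list of names.

```python
def play_property_to_env_var(property_path: str) -> str:
    """Apply the two documented steps: dots to underscores, then uppercase."""
    return property_path.replace(".", "_").upper()


# Illustrative property path, not taken from the tables above.
print(play_property_to_env_var("auth.oidc.enabled"))  # AUTH_OIDC_ENABLED
```

Note that this rule alone does not split camelCase segments, so a segment such as `maxHeaderCount` would become `MAXHEADERCOUNT` rather than `MAX_HEADER_COUNT`.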