Elasticsearch & OpenSearch Multi-Client Shim

This guide explains how to use DataHub's multi-client search engine shim to support different versions of Elasticsearch and OpenSearch through a unified interface.

Overview

DataHub's search client shim provides seamless support for:

  • Elasticsearch 7.17
  • Elasticsearch 8.17+
  • OpenSearch 2.x with full REST high-level client support

This enables smooth migrations between different search engine versions while maintaining backward compatibility with existing DataHub deployments.

Architecture

Core Components

The shim consists of several key components:

  1. SearchClientShim - Main abstraction interface
  2. SearchClientShimFactory - Factory for creating appropriate client implementations
  3. Implementation Classes - Concrete implementations for each search engine:
    • Es7CompatibilitySearchClientShim - ES 7.17
    • Es8SearchClientShim - ES 8.17+
    • OpenSearch2SearchClientShim - OpenSearch 2.x
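
The relationship between these components can be sketched as a small factory. This is an illustrative Python model, not DataHub's Java source: the class names and engine-type values come from this guide, while `engine_description` and `create_shim` are hypothetical names introduced only for the example.

```python
from abc import ABC, abstractmethod
from enum import Enum


class SearchEngineType(Enum):
    # Values mirror the ELASTICSEARCH_SHIM_ENGINE_TYPE options in this guide.
    ELASTICSEARCH_7 = "ELASTICSEARCH_7"
    ELASTICSEARCH_8 = "ELASTICSEARCH_8"
    OPENSEARCH_2 = "OPENSEARCH_2"


class SearchClientShim(ABC):
    """Common interface that calling code depends on."""

    @abstractmethod
    def engine_description(self) -> str:
        """Hypothetical method, used here only to tell implementations apart."""


class Es7CompatibilitySearchClientShim(SearchClientShim):
    def engine_description(self) -> str:
        return "ES 7.17"


class Es8SearchClientShim(SearchClientShim):
    def engine_description(self) -> str:
        return "ES 8.17+"


class OpenSearch2SearchClientShim(SearchClientShim):
    def engine_description(self) -> str:
        return "OpenSearch 2.x"


# One implementation class per supported engine type.
_IMPLEMENTATIONS = {
    SearchEngineType.ELASTICSEARCH_7: Es7CompatibilitySearchClientShim,
    SearchEngineType.ELASTICSEARCH_8: Es8SearchClientShim,
    SearchEngineType.OPENSEARCH_2: OpenSearch2SearchClientShim,
}


def create_shim(engine_type: SearchEngineType) -> SearchClientShim:
    """Hypothetical factory function: pick the implementation for an engine type."""
    return _IMPLEMENTATIONS[engine_type]()
```

Callers hold only a `SearchClientShim` reference, so switching engines changes which class the factory instantiates and nothing else.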

Supported Configurations

| Source Engine | Target Engine | Shim Implementation | Status |
|---|---|---|---|
| DataHub → ES 7.17 | ES 7.17 | Es7CompatibilitySearchClientShim | ✅ Complete |
| DataHub → ES 8.17+ | ES 8.17+ | Es8SearchClientShim | ✅ Complete |
| DataHub → OpenSearch 2.x | OpenSearch 2.x | OpenSearch2SearchClientShim | ✅ Complete |

Configuration

Environment Variables

Configure the shim using these environment variables:

# Enable the search client shim (required)
ELASTICSEARCH_SHIM_ENABLED=true

# Specify engine type (or use AUTO_DETECT)
ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT
# Options: AUTO_DETECT, ELASTICSEARCH_7, ELASTICSEARCH_8, OPENSEARCH_2

# Enable auto-detection (recommended)
ELASTICSEARCH_SHIM_AUTO_DETECT=true
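
Before deploying, it can help to validate these variables with a small script. The sketch below is a standalone check, not part of DataHub: the variable names and allowed engine types come from this guide, but the defaults used when a variable is unset are assumptions, and `ShimSettings` and `read_shim_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Allowed values, as listed in the options comment above.
ENGINE_TYPES = ("AUTO_DETECT", "ELASTICSEARCH_7", "ELASTICSEARCH_8", "OPENSEARCH_2")


@dataclass(frozen=True)
class ShimSettings:
    enabled: bool
    engine_type: str
    auto_detect: bool


def _as_bool(value: str) -> bool:
    # Treat only the literal "true" (any case) as enabled.
    return value.strip().lower() == "true"


def read_shim_settings(env: Mapping[str, str] = os.environ) -> ShimSettings:
    """Read the shim variables, rejecting unknown engine types.

    The fallback values for unset variables are assumptions for this sketch.
    """
    engine_type = env.get("ELASTICSEARCH_SHIM_ENGINE_TYPE", "AUTO_DETECT")
    if engine_type not in ENGINE_TYPES:
        raise ValueError(
            f"ELASTICSEARCH_SHIM_ENGINE_TYPE must be one of {ENGINE_TYPES}, got {engine_type!r}"
        )
    return ShimSettings(
        enabled=_as_bool(env.get("ELASTICSEARCH_SHIM_ENABLED", "false")),
        engine_type=engine_type,
        auto_detect=_as_bool(env.get("ELASTICSEARCH_SHIM_AUTO_DETECT", "false")),
    )
```

Calling `read_shim_settings()` with no arguments reads the current process environment; passing a dictionary makes the check easy to run against values from a compose file or manifest.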

application.yaml Configuration

Alternatively, configure via application.yaml:

elasticsearch:
  host: localhost
  port: 9200
  username: ${ELASTICSEARCH_USERNAME:#{null}}
  password: ${ELASTICSEARCH_PASSWORD:#{null}}
  useSSL: false
  # Standard Elasticsearch configuration...

  # Multi-client shim configuration
  shim:
    enabled: true # Enable shim
    engineType: AUTO_DETECT # or specific type
    autoDetectEngine: true # Auto-detect cluster type

Migration Scenarios

Scenario 1: Elasticsearch 7.17 → Elasticsearch 8.x

This is the most common migration path.

Step 1: Enable the shim

ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=ELASTICSEARCH_8

Step 2: Verify connection

# Check logs for successful connection

Scenario 2: Elasticsearch 7.17 → OpenSearch 2.x

Direct migration from Elasticsearch to OpenSearch 2.x.

Configuration:

ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=OPENSEARCH_2
ELASTICSEARCH_SHIM_AUTO_DETECT=true

Alternatively, instead of naming a specific engine, let DataHub automatically detect your search engine type:

ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT
ELASTICSEARCH_SHIM_AUTO_DETECT=true

The shim will:

  1. Connect to your search cluster
  2. Identify the engine type and version
  3. Select the appropriate client implementation
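
The identification step can be illustrated with the cluster's root endpoint (`GET /`), which on both Elasticsearch and OpenSearch returns a `version` object; OpenSearch additionally reports `"distribution": "opensearch"`. This guide does not describe the shim's actual detection logic, so the function below is only a sketch of one plausible approach, and `classify_engine` is a hypothetical name.

```python
def classify_engine(root_info: dict) -> str:
    """Map a root-endpoint response to one of the shim engine types.

    Sketch only: the real shim may inspect different fields.
    """
    version = root_info.get("version", {})
    major = str(version.get("number", "")).split(".")[0]

    if version.get("distribution") == "opensearch":
        if major == "2":
            return "OPENSEARCH_2"
    elif major == "7":
        return "ELASTICSEARCH_7"
    elif major == "8":
        return "ELASTICSEARCH_8"

    raise ValueError(f"Unsupported search engine version info: {version!r}")


# Abbreviated example responses; real responses carry more fields.
print(classify_engine({"version": {"number": "8.17.0"}}))
print(classify_engine({"version": {"distribution": "opensearch", "number": "2.11.0"}}))
```

Raising on anything unrecognized mirrors the advice in the troubleshooting section: when detection fails, set the engine type explicitly.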

Deployment Guide

Docker Compose

Update your docker-compose.yml:

services:
  datahub-gms:
    environment:
      - ELASTICSEARCH_SHIM_ENABLED=true
      - ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT
      # ... other ES config

Kubernetes

Update your deployment manifests:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: datahub-gms
spec:
  template:
    spec:
      containers:
        - name: datahub-gms
          env:
            - name: ELASTICSEARCH_SHIM_ENABLED
              value: "true"
            - name: ELASTICSEARCH_SHIM_ENGINE_TYPE
              value: "AUTO_DETECT"
            # ... other configuration

Helm

Update your values.yaml:

global:
  elasticsearch:
    shim:
      enabled: true
      engineType: "AUTO_DETECT"
      autoDetectEngine: true

Validation and Testing

Verify Shim Configuration

  1. Check logs for shim initialization:

     docker logs datahub-gms | grep -i "shim\|search"

     Look for messages like:

     INFO Creating SearchClientShim for engine type: ELASTICSEARCH_7
     INFO Auto-detected search engine type: ELASTICSEARCH_7

  2. Test search functionality in DataHub UI:
    • Search for datasets
    • Browse data assets
    • Check that lineage is working
  3. Monitor performance during transition:
    • Watch for connection errors
    • Check response times
    • Monitor resource usage
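
The log check can also be scripted. The sketch below extracts the engine type from log text, assuming the messages have exactly the wording shown in the examples above; real log lines may carry timestamps or logger names, which the pattern tolerates, but different wording would not match. `find_engine_types` is a hypothetical helper name.

```python
import re
from typing import List

# Matches the two example messages from this guide and captures the engine type.
SHIM_LOG_PATTERN = re.compile(
    r"(?:Creating SearchClientShim for engine type|Auto-detected search engine type):\s*(\w+)"
)


def find_engine_types(log_text: str) -> List[str]:
    """Return every engine type mentioned in shim initialization messages."""
    return SHIM_LOG_PATTERN.findall(log_text)


sample_logs = """\
INFO Creating SearchClientShim for engine type: ELASTICSEARCH_7
INFO Auto-detected search engine type: ELASTICSEARCH_7
"""
print(find_engine_types(sample_logs))
```

An empty result after startup suggests the shim did not initialize, or that the log format differs from the examples.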

Common Validation Steps

# 1. Check DataHub health endpoint
curl http://localhost:8080/health

# 2. Verify search index access
curl -u user:pass "http://elasticsearch:9200/_cat/indices?v"

# 3. Test search functionality
curl -X POST "http://localhost:8080/api/graphql" \
-H "Content-Type: application/json" \
-d '{"query": "{ search(input: {type: DATASET, query: \"*\"}) { total }}"}'
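
The third check can be scripted so that the returned total is available to a test or monitoring job. This sketch uses only the Python standard library and assumes, like the curl example, that DataHub is reachable at `localhost:8080` without an authentication token. `build_search_request` and `dataset_total` are hypothetical helper names.

```python
import json
import urllib.request

# Same endpoint as the curl example; adjust for your deployment.
GRAPHQL_URL = "http://localhost:8080/api/graphql"


def build_search_request(query_text: str = "*") -> urllib.request.Request:
    """Build the POST request that the curl example sends."""
    # json.dumps quotes and escapes the search text as a string literal.
    graphql = "{ search(input: {type: DATASET, query: %s}) { total }}" % json.dumps(query_text)
    body = json.dumps({"query": graphql}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def dataset_total(timeout: float = 10.0) -> int:
    """Send the search and return the total, assuming a standard GraphQL response."""
    with urllib.request.urlopen(build_search_request(), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["data"]["search"]["total"]
```

`dataset_total()` raises if the service is unreachable or the response contains no `data` field, which makes it usable as a pass/fail smoke test.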

Troubleshooting

Common Issues

1. Connection Failures

ERROR: Unable to connect to search cluster

Solutions:

  • Verify ELASTICSEARCH_HOST and ELASTICSEARCH_PORT
  • Check network connectivity between DataHub and search cluster
  • Ensure credentials are correct
  • Verify SSL/TLS configuration. Elasticsearch 8 containers enable SSL by default, so a deployment that previously connected without SSL may fail to connect after the upgrade.

2. Auto-Detection Failures

ERROR: Unable to detect search engine type

Solutions:

  • Manually specify engine type: ELASTICSEARCH_SHIM_ENGINE_TYPE=ELASTICSEARCH_8
  • Check cluster health: curl http://elasticsearch:9200/_cluster/health
  • Verify authentication credentials

3. API Compatibility Issues

ERROR: Incompatible API version

Solutions:

  • Check Elasticsearch version compatibility
  • Review deprecation warnings in ES logs

4. Dependency Issues

ERROR: ClassNotFoundException for ES client

Solutions:

  • Ensure correct client dependencies are included in classpath
  • Check build.gradle for required dependencies
  • Rebuild DataHub with appropriate client libraries

Debug Mode

Enable debug logging to troubleshoot issues:

# Add to environment
DATAHUB_LOG_LEVEL=DEBUG
ELASTICSEARCH_SHIM_DEBUG=true

Performance Monitoring

Monitor key metrics during migration:

# Connection pool metrics
curl "http://localhost:8080/actuator/metrics/elasticsearch.connections"

# Search operation metrics
curl "http://localhost:8080/actuator/metrics/elasticsearch.search"

# Error rates
curl "http://localhost:8080/actuator/metrics/elasticsearch.errors"
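
Spring Boot Actuator metrics endpoints return JSON with a `measurements` list of `statistic`/`value` pairs. Assuming the endpoints above follow that standard format, a helper like the following can pull out a single number for alerting or comparison before and after the migration. Which statistics each metric actually exposes is not stated in this guide, and the sample values below are made up.

```python
from typing import Optional


def measurement(metric: dict, statistic: str) -> Optional[float]:
    """Return the value of one statistic from an Actuator metrics response."""
    for entry in metric.get("measurements", []):
        if entry.get("statistic") == statistic:
            return entry.get("value")
    return None


# Made-up sample in the Actuator response shape, for illustration only.
sample = {
    "name": "elasticsearch.search",
    "measurements": [
        {"statistic": "COUNT", "value": 42.0},
        {"statistic": "TOTAL_TIME", "value": 1.5},
    ],
}
print(measurement(sample, "COUNT"))
```

Recording these values before the switch gives a baseline to compare against afterwards.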

Best Practices

Pre-Migration

  1. Backup your data before changing search engine configuration
  2. Test in staging with representative data volumes
  3. Monitor resource usage patterns in current deployment
  4. Document current configuration for rollback scenarios

During Migration

  1. Enable auto-detection initially for smooth transition
  2. Monitor logs closely for connection and performance issues
  3. Test all search functionality after configuration changes

Post-Migration

  1. Update documentation with new configuration
  2. Monitor performance metrics for several days
  3. Plan for future upgrades (ES 8.x native support)
  4. Train team members on new configuration options

Future Enhancements

Planned Features

  1. OpenSearch 3.x support when available
  2. Enhanced AWS IAM authentication for all client types
  3. Advanced feature detection and capability querying

Contributing

To extend the shim for additional search engines:

  1. Implement SearchClientShim interface
  2. Add engine type to SearchEngineType enum
  3. Update factory logic in SearchClientShimFactory
  4. Add configuration options to application.yaml
  5. Write tests and documentation

Support Matrix

| DataHub Version | ES 7.17 | ES 8.x | OpenSearch 2.x |
|---|---|---|---|
| 0.3.15+ | ✅ Full | ✅ 8.17+ | ✅ Full |
| Future | ✅ Full | ✅ Full | ✅ Full |
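
The 0.3.15+ row can be expressed as a version check, for example to guard an upgrade script. The sketch treats only the versions named in that row as supported; the engine labels `"elasticsearch"` and `"opensearch"` and the function name `is_supported` are assumptions made for this example.

```python
def is_supported(engine: str, version: str) -> bool:
    """Check a 'major.minor[.patch]' version against the 0.3.15+ support row."""
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])

    if engine == "elasticsearch":
        # ES 7.17, or ES 8.17 and later 8.x releases.
        return (major, minor) == (7, 17) or (major == 8 and minor >= 17)
    if engine == "opensearch":
        # Any OpenSearch 2.x release.
        return major == 2
    return False
```

For instance, `is_supported("elasticsearch", "8.15.0")` is false under this reading of the matrix, because the ES 8.x column lists 8.17+.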

FAQ

Q: Can I use the shim with existing deployments?

A: Yes, the shim is backward compatible. It is a thin abstraction layer over the existing code.

Q: Can I use multiple search engines simultaneously?

A: No, DataHub connects to one search cluster at a time. Use the shim to switch between different engine types.

For additional support, please refer to the DataHub community forums or file an issue in the GitHub repository.