Elasticsearch & OpenSearch Multi-Client Shim
This guide explains how to use DataHub's multi-client search engine shim to support different versions of Elasticsearch and OpenSearch through a unified interface.
Overview
DataHub's search client shim provides seamless support for:
- Elasticsearch 7.17
- Elasticsearch 8.17+
- OpenSearch 2.x with full REST high-level client support
This enables smooth migrations between different search engine versions while maintaining backward compatibility with existing DataHub deployments.
Architecture
Core Components
The shim consists of several key components:
- `SearchClientShim` - Main abstraction interface
- `SearchClientShimFactory` - Factory for creating the appropriate client implementation
- Implementation classes - Concrete implementations for each search engine:
  - `Es7CompatibilitySearchClientShim` - ES 7.17
  - `Es8SearchClientShim` - ES 8.17+
  - `OpenSearch2SearchClientShim` - OpenSearch 2.x
Supported Configurations
Source Engine | Target Engine | Shim Implementation | Status |
---|---|---|---|
DataHub → ES 7.17 | ES 7.17 | Es7CompatibilitySearchClientShim | ✅ Complete |
DataHub → ES 8.17+ | ES 8.17+ | Es8SearchClientShim | ✅ Complete |
DataHub → OpenSearch 2.x | OpenSearch 2.x | OpenSearch2SearchClientShim | ✅ Complete |
Configuration
Environment Variables
Configure the shim using these environment variables:
```shell
# Enable the search client shim (required)
ELASTICSEARCH_SHIM_ENABLED=true

# Specify the engine type (or use AUTO_DETECT)
# Options: AUTO_DETECT, ELASTICSEARCH_7, ELASTICSEARCH_8, OPENSEARCH_2
ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT

# Enable auto-detection (recommended)
ELASTICSEARCH_SHIM_AUTO_DETECT=true
```
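Before rolling a configuration out, it can help to sanity-check these variables. The sketch below is a hypothetical pre-flight helper (`check_shim_env` and `shim_impl_for` are illustrative names, not part of DataHub) that rejects values outside the documented options and reports which implementation class the support table above associates with the chosen engine type. It does not read `ELASTICSEARCH_SHIM_AUTO_DETECT` and makes no assumption about what DataHub does when a variable is unset.

```shell
# Hypothetical helper (not part of DataHub): map a documented engine type
# to the implementation class listed in the support table.
shim_impl_for() {
  case "$1" in
    ELASTICSEARCH_7) echo "Es7CompatibilitySearchClientShim" ;;
    ELASTICSEARCH_8) echo "Es8SearchClientShim" ;;
    OPENSEARCH_2)    echo "OpenSearch2SearchClientShim" ;;
    AUTO_DETECT)     echo "(chosen at startup after detecting the cluster)" ;;
    *)               return 1 ;;
  esac
}

# Hypothetical pre-flight check: fail if the shim is not enabled or the
# engine type is missing or not one of the documented options.
check_shim_env() {
  if [ "${ELASTICSEARCH_SHIM_ENABLED:-}" != "true" ]; then
    echo "ELASTICSEARCH_SHIM_ENABLED must be 'true'" >&2
    return 1
  fi
  engine="${ELASTICSEARCH_SHIM_ENGINE_TYPE:-}"
  if ! impl=$(shim_impl_for "$engine"); then
    echo "Unknown ELASTICSEARCH_SHIM_ENGINE_TYPE: '$engine'" >&2
    return 1
  fi
  echo "engine=$engine implementation=$impl"
}
```

For example, source the file and run `check_shim_env` in a shell that has the same environment you pass to `datahub-gms`; a non-zero exit status means the configuration should be fixed before deploying.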
application.yaml Configuration
Alternatively, configure via application.yaml:
```yaml
elasticsearch:
  host: localhost
  port: 9200
  username: ${ELASTICSEARCH_USERNAME:#{null}}
  password: ${ELASTICSEARCH_PASSWORD:#{null}}
  useSSL: false
  # Standard Elasticsearch configuration...

  # Multi-client shim configuration
  shim:
    enabled: true            # Enable shim
    engineType: AUTO_DETECT  # or specific type
    autoDetectEngine: true   # Auto-detect cluster type
```
Migration Scenarios
Scenario 1: Elasticsearch 7.17 → Elasticsearch 8.x
This is the most common migration path.
Step 1: Enable the shim
```shell
ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=ELASTICSEARCH_8
```
Step 2: Verify connection
```shell
# Check the GMS logs for a successful shim connection
docker logs datahub-gms | grep -i "shim\|search"
```
Scenario 2: Elasticsearch 7.17 → OpenSearch 2.x
Direct migration from Elasticsearch to OpenSearch 2.x.
Configuration:
```shell
ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=OPENSEARCH_2
ELASTICSEARCH_SHIM_AUTO_DETECT=true
```
Scenario 3: Auto-Detection (Recommended)
Let DataHub automatically detect your search engine type:
```shell
ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT
ELASTICSEARCH_SHIM_AUTO_DETECT=true
```
The shim will:
- Connect to your search cluster
- Identify the engine type and version
- Select the appropriate client implementation
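The detection step can be approximated by hand, which is useful when auto-detection fails or you want to confirm its result before pinning an engine type. This is a minimal sketch, not DataHub's implementation: it assumes the cluster's root endpoint (`GET /`) is reachable and relies on the fact that OpenSearch reports `"distribution": "opensearch"` inside its `version` object while Elasticsearch does not. The function names are hypothetical, and the `sed`-based field extraction is only meant for these small, flat responses.

```shell
# Hypothetical sketch of auto-detection (not DataHub's actual code).

# Print the first string value stored under the given JSON key, e.g. "number".
json_string_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Read the root-endpoint JSON on stdin and print one of the engine types
# accepted by ELASTICSEARCH_SHIM_ENGINE_TYPE; fail for unsupported versions.
detect_engine_type() {
  body=$(cat)
  # OpenSearch reports "distribution": "opensearch"; Elasticsearch omits it.
  distribution=$(printf '%s\n' "$body" | json_string_field distribution)
  number=$(printf '%s\n' "$body" | json_string_field number)
  major=${number%%.*}
  rest=${number#*.}
  minor=${rest%%.*}

  # Version bounds mirror the support table: ES 7.17, ES 8.17+, OpenSearch 2.x.
  case "$distribution" in
    opensearch)
      if [ "$major" = "2" ]; then echo "OPENSEARCH_2"; return 0; fi ;;
    "")
      if [ "$major" = "7" ] && [ "$minor" = "17" ]; then
        echo "ELASTICSEARCH_7"; return 0
      fi
      if [ "$major" = "8" ] && [ "$minor" -ge 17 ] 2>/dev/null; then
        echo "ELASTICSEARCH_8"; return 0
      fi ;;
  esac
  echo "unsupported cluster: ${distribution:-elasticsearch} ${number:-unknown}" >&2
  return 1
}
```

For example, `curl -s -u user:pass http://elasticsearch:9200/ | detect_engine_type` prints a value such as `ELASTICSEARCH_8`, which you could set as `ELASTICSEARCH_SHIM_ENGINE_TYPE` if you prefer to pin the type rather than auto-detect it.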
Deployment Guide
Docker Compose
Update your docker-compose.yml:

```yaml
services:
  datahub-gms:
    environment:
      - ELASTICSEARCH_SHIM_ENABLED=true
      - ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT
      # ... other ES config
```
Kubernetes
Update your deployment manifests:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: datahub-gms
spec:
  template:
    spec:
      containers:
        - name: datahub-gms
          env:
            - name: ELASTICSEARCH_SHIM_ENABLED
              value: "true"
            - name: ELASTICSEARCH_SHIM_ENGINE_TYPE
              value: "AUTO_DETECT"
            # ... other configuration
```
Helm
Update your values.yaml:

```yaml
global:
  elasticsearch:
    shim:
      enabled: true
      engineType: "AUTO_DETECT"
      autoDetectEngine: true
```
Validation and Testing
Verify Shim Configuration
- Check logs for shim initialization:

  ```shell
  docker logs datahub-gms | grep -i "shim\|search"
  ```

  Look for messages like:

  ```
  INFO Creating SearchClientShim for engine type: ELASTICSEARCH_7
  INFO Auto-detected search engine type: ELASTICSEARCH_7
  ```

- Test search functionality in the DataHub UI:
  - Search for datasets
  - Browse data assets
  - Check that lineage is working
- Monitor performance during the transition:
  - Watch for connection errors
  - Check response times
  - Monitor resource usage
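The log check in the first step can also be scripted, for example in a deployment pipeline. The hypothetical `shim_engine_from_logs` helper below assumes the log wording shown above (`... engine type: <TYPE>`) and prints the engine type from the most recent matching line; if your DataHub version phrases these messages differently, adjust the pattern.

```shell
# Hypothetical helper: read GMS log text on stdin and print the engine type
# from the last "... engine type: <TYPE>" line; fail if none is found.
shim_engine_from_logs() {
  engine=$(sed -n 's/.*engine type: \([A-Z0-9_]*\).*/\1/p' | tail -n 1)
  [ -n "$engine" ] || return 1
  echo "$engine"
}
```

For example, `docker logs datahub-gms 2>&1 | shim_engine_from_logs`; the `2>&1` also captures anything the container wrote to stderr.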
Common Validation Steps
```shell
# 1. Check the DataHub GMS health endpoint
curl http://localhost:8080/health

# 2. Verify search index access
curl -u user:pass "http://elasticsearch:9200/_cat/indices?v"

# 3. Test search through GraphQL
#    (if metadata service authentication is enabled, also pass
#     -H "Authorization: Bearer <token>")
curl -X POST "http://localhost:8080/api/graphql" \
  -H "Content-Type: application/json" \
  -d '{"query": "{ search(input: {type: DATASET, query: \"*\"}) { total } }"}'
```
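Check 3 can be turned into a pass/fail gate for automation. This sketch assumes the response has the usual GraphQL shape (`{"data":{"search":{"total":N}}}`); `search_total_ok` is a hypothetical name, and its optional argument is the minimum acceptable number of results (default 1, since an empty result right after a migration usually means the indices were not found). A response that carries only `errors`, for example because an auth token is missing, also fails.

```shell
# Hypothetical helper: read the GraphQL response on stdin, print the search
# total, and succeed only if it is at least the given minimum (default 1).
search_total_ok() {
  min="${1:-1}"
  body=$(cat)
  total=$(printf '%s\n' "$body" |
    sed -n 's/.*"total"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
  if [ -z "$total" ]; then
    echo "no search total in response: $body" >&2
    return 1
  fi
  echo "search total: $total"
  [ "$total" -ge "$min" ]
}
```

For example, pipe the output of the third `curl` command (with `-s` added) into `search_total_ok`, or `search_total_ok 0` on a fresh deployment with no metadata yet.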
Troubleshooting
Common Issues
1. Connection Failures
```
ERROR: Unable to connect to search cluster
```

Solutions:
- Verify `ELASTICSEARCH_HOST` and `ELASTICSEARCH_PORT`
- Check network connectivity between DataHub and the search cluster
- Ensure the credentials are correct
- Verify the SSL/TLS configuration: Elasticsearch 8 containers use SSL by default, so if your previous deployment did not use SSL, update `useSSL` accordingly
2. Auto-Detection Failures
```
ERROR: Unable to detect search engine type
```

Solutions:
- Manually specify the engine type:

  ```shell
  ELASTICSEARCH_SHIM_ENGINE_TYPE=ELASTICSEARCH_8
  ```

- Check cluster health:

  ```shell
  curl http://elasticsearch:9200/_cluster/health
  ```

- Verify authentication credentials
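The cluster health check in the list above can be turned into a quick pass/fail test. The `status` field returned by `_cluster/health` is `green`, `yellow` (some replicas unassigned, typical for single-node clusters), or `red` (at least one primary shard unassigned). The hypothetical `cluster_health_ok` helper below accepts green or yellow; a response without a readable status, such as an authentication error, is reported separately because it points at the URL or credentials rather than the cluster.

```shell
# Hypothetical helper: read _cluster/health JSON on stdin and succeed only
# if the cluster status is green or yellow.
cluster_health_ok() {
  status=$(sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1)
  case "$status" in
    green|yellow)
      echo "cluster status: $status" ;;
    red)
      echo "cluster status: red (some primary shards are unassigned)" >&2
      return 1 ;;
    *)
      echo "could not read cluster status (check the URL and credentials)" >&2
      return 1 ;;
  esac
}
```

For example, `curl -s -u user:pass http://elasticsearch:9200/_cluster/health | cluster_health_ok`.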
3. API Compatibility Issues
```
ERROR: Incompatible API version
```

Solutions:
- Check that your cluster version is covered by the support matrix (ES 7.17, ES 8.17+, or OpenSearch 2.x)
- Review deprecation warnings in the Elasticsearch logs
4. Dependency Issues
```
ERROR: ClassNotFoundException for ES client
```

Solutions:
- Ensure the correct client dependencies are included in the classpath
- Check `build.gradle` for the required dependencies
- Rebuild DataHub with the appropriate client libraries
Debug Mode
Enable debug logging to troubleshoot issues:
```shell
# Add to environment
DATAHUB_LOG_LEVEL=DEBUG
ELASTICSEARCH_SHIM_DEBUG=true
```
Performance Monitoring
Monitor key metrics during migration:
```shell
# Connection pool metrics
curl "http://localhost:8080/actuator/metrics/elasticsearch.connections"

# Search operation metrics
curl "http://localhost:8080/actuator/metrics/elasticsearch.search"

# Error rates
curl "http://localhost:8080/actuator/metrics/elasticsearch.errors"
```
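Spring Boot Actuator's metrics endpoint returns a JSON document with a `measurements` array of `{"statistic": ..., "value": ...}` pairs. The hypothetical `metric_stat` helper below pulls one statistic (default `COUNT`) out of such a response so it can be logged or compared across runs during the migration. It assumes that response shape, with `statistic` serialized before `value`, and makes no assumption about which metric names your GMS build actually exposes.

```shell
# Hypothetical helper: read an Actuator metrics response on stdin and print
# the value of the requested statistic; fail if it is not present.
metric_stat() {
  stat="${1:-COUNT}"
  # Strip whitespace so pretty-printed and compact JSON look the same.
  value=$({ tr -d ' \t\r\n'; echo; } |
    sed -n "s/.*\"statistic\":\"$stat\",\"value\":\([-0-9.Ee+]*\).*/\1/p")
  [ -n "$value" ] || return 1
  echo "$value"
}
```

For example, `curl -s "http://localhost:8080/actuator/metrics/elasticsearch.search" | metric_stat COUNT`.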
Best Practices
Pre-Migration
- Backup your data before changing search engine configuration
- Test in staging with representative data volumes
- Monitor resource usage patterns in current deployment
- Document current configuration for rollback scenarios
During Migration
- Enable auto-detection initially for a smooth transition
- Monitor logs closely for connection and performance issues
- Test all search functionality after configuration changes
Post-Migration
- Update documentation with new configuration
- Monitor performance metrics for several days
- Plan for future upgrades (ES 8.x native support)
- Train team members on new configuration options
Future Enhancements
Planned Features
- OpenSearch 3.x support when available
- Enhanced AWS IAM authentication for all client types
- Advanced feature detection and capability querying
Contributing
To extend the shim for additional search engines:
- Implement the `SearchClientShim` interface
- Add the engine type to the `SearchEngineType` enum
- Update the factory logic in `SearchClientShimFactory`
- Add configuration options to application.yaml
- Write tests and documentation
Support Matrix
DataHub Version | ES 7.17 | ES 8.x | OpenSearch 2.x |
---|---|---|---|
0.3.15+ | ✅ Full | ✅ 8.17+ | ✅ Full |
Future | ✅ Full | ✅ Full | ✅ Full |
FAQ
Q: Can I use the shim with existing deployments?
A: Yes, the shim is backward compatible. It is a thin abstraction layer over the existing code.
Q: Can I use multiple search engines simultaneously?
A: No, DataHub connects to one search cluster at a time. Use the shim to switch between different engine types.
For additional support, please refer to the DataHub community forums or file an issue in the GitHub repository.