How to sync MongoDB data to ElasticSearch using MongoConnector

Gorav Singal

September 06, 2019

TL;DR

Use MongoConnector to continuously replicate MongoDB collections to Elasticsearch indexes, enabling full-text search and Grafana visualizations on your MongoDB data.

Introduction

This post is about syncing your MongoDB data to ElasticSearch. This is useful when you want to search your data quickly, expose a search API, or run Grafana to visualize your data.

Mongo-Connector

MongoConnector is an open source tool that syncs MongoDB data to ElasticSearch. You can run it periodically or continuously, and it syncs all the changes made to your MongoDB data, so ElasticSearch holds a replica of that data.

You can configure which MongoDB collections to sync and what their indexes are named in ElasticSearch.

Requirements

  1. MongoDB Replica Set: You need a MongoDB replica set. A standalone instance will not work, because mongo-connector reads changes from the oplog, which a standalone instance does not keep.

  2. ElasticSearch Cluster

  3. mongo-connector utility
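To confirm the first requirement before going further, you can ask the server whether it belongs to a replica set. The snippet below is a minimal sketch, not part of mongo-connector: the helper names `is_replica_set_member` and `check_server` are made up for illustration, and `check_server` assumes `pymongo` is installed and that the server accepts the `isMaster` command (newer servers also accept `hello`).

```python
def is_replica_set_member(hello_reply: dict) -> bool:
    """Return True if an isMaster/hello reply comes from a replica set member."""
    # Replica set members report the name of their set under "setName";
    # a standalone mongod leaves this field out.
    return bool(hello_reply.get("setName"))


def check_server(uri: str) -> bool:
    """Ask a live server (hypothetical helper; needs `pip install pymongo`)."""
    # Imported here so the pure helper above has no third-party dependency.
    from pymongo import MongoClient

    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    try:
        return is_replica_set_member(client.admin.command("isMaster"))
    finally:
        client.close()
```

For example, `check_server("mongodb://<mongoset1>:27017")` should return `False` for a standalone instance, in which case mongo-connector has no oplog to follow.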

How to create MongoDB replica set with Docker

See: Run MongoDB replica set with Docker

How to create ElasticSearch cluster with Docker

See: Run Elastic Search Cluster with Docker

How to get mongo-connector

You need Python installed. Then install mongo-connector via pip:

pip install 'mongo-connector[elastic5]' 'elastic2-doc-manager[elastic5]'

Alternatively, you can build a Docker image for it. See the Dockerfile below:

FROM python:3-alpine
RUN apk add --no-cache curl sed && pip install 'mongo-connector[elastic5]' 'elastic2-doc-manager[elastic5]'
ENTRYPOINT ["mongo-connector"]

To build the Docker image:

docker build -t my_mongoconnector .

Run MongoConnector

MongoConnector Config

Prepare a config file (named mongoconnector.json):

{
    "oplogFile": "<your desired path>/oplog.timestamp",
    "noDump": false,
    "batchSize": 50,
    "verbosity": 2,
    "continueOnError": true,
    "logging": {
        "type": "stream"
    },
    "namespaces": {
        "mydb.coll1": {
            "rename": "mydb_coll1._doc"
        },
        "mydb.trainings": {
            "rename": "mydb_trainings._doc"
        }
    },
    "docManagers": [
        {
            "docManager": "elastic2_doc_manager",
            "targetURL": "<elastic search hostname>:9200",
            "bulkSize": 10,
            "uniqueKey": "_id",
            "args": {
                "clientOptions": {"timeout": 5000}
            }
        }
    ]
}

In the above config file:

  • oplogFile - The file where mongo-connector writes the timestamp of the last oplog entry it synced. If it stops, it can resume syncing from where it left off.
  • namespaces - The collections you want to sync, and the names they will have in ElasticSearch.
  • docManagers - Configuration for your ElasticSearch cluster.
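If you sync many collections, writing the namespaces section by hand gets repetitive. The sketch below generates it with the same naming pattern used in the config above (`<db>.<collection>` renamed to `<db>_<collection>._doc`). The function name `build_namespaces` is made up for illustration, and the pattern is just this post's convention, not something mongo-connector requires.

```python
import json
from typing import Dict, List


def build_namespaces(db: str, collections: List[str]) -> Dict[str, Dict[str, str]]:
    """Map each MongoDB namespace to its renamed target, as in the config above."""
    return {
        f"{db}.{coll}": {"rename": f"{db}_{coll}._doc"}
        for coll in collections
    }


namespaces = build_namespaces("mydb", ["coll1", "trainings"])
# Print a fragment that can be pasted into mongoconnector.json.
print(json.dumps({"namespaces": namespaces}, indent=4))
```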

Run

mongo-connector -m "mongodb://<mongoset1>:27017,<mongoset2>:27018,<mongoset3>:27019/<your db>?replicaSet=your-replicaset-name" -c ./mongoconnector.json
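The connection string in the command above is easy to get wrong by hand. As a rough sketch, it can be assembled from its parts like this. The host names, database name, and replica set name below are placeholders, and `replica_set_uri` is a hypothetical helper, not something mongo-connector provides.

```python
from typing import List
from urllib.parse import quote


def replica_set_uri(hosts: List[str], database: str, replica_set: str) -> str:
    """Build a MongoDB URI that lists every replica set member."""
    host_list = ",".join(hosts)
    # Percent-encode the database and replica set names in case they
    # contain characters that are special in a URI.
    return (
        f"mongodb://{host_list}/{quote(database, safe='')}"
        f"?replicaSet={quote(replica_set, safe='')}"
    )


uri = replica_set_uri(
    ["mongo1:27017", "mongo2:27018", "mongo3:27019"],  # placeholder hosts
    "mydb",
    "rs0",
)
# Same flags as the command above: -m for the MongoDB URI, -c for the config file.
command = ["mongo-connector", "-m", uri, "-c", "./mongoconnector.json"]
print(uri)
```

The `command` list can be handed to `subprocess.run(command)` if you want to start mongo-connector from a Python script.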

If everything is fine, mongo-connector will start syncing your MongoDB data to the ElasticSearch cluster you specified.
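One way to check that documents are arriving is to ask ElasticSearch how many documents an index holds through its `_count` endpoint. The sketch below uses only the standard library. It assumes the cluster is reachable over plain HTTP without authentication, and that a rename such as `mydb_coll1._doc` ends up in an index named `mydb_coll1`. The helper names are made up for illustration.

```python
import json
from urllib.request import urlopen


def count_url(es_host: str, index: str) -> str:
    """URL of the _count endpoint for one index, e.g. host 'localhost:9200'."""
    return f"http://{es_host}/{index}/_count"


def count_documents(es_host: str, index: str, timeout: float = 10.0) -> int:
    """Return the number of documents ElasticSearch reports for the index."""
    with urlopen(count_url(es_host, index), timeout=timeout) as response:
        # The _count response is a JSON object with a "count" field.
        return json.load(response)["count"]
```

Calling `count_documents("<elastic search hostname>:9200", "mydb_coll1")` during the initial sync and again afterwards should show the number growing toward the size of the MongoDB collection.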

Sample output

2019-09-06 08:17:05,189 [ALWAYS] mongo_connector.connector:50 - Starting mongo-connector version: 3.1.1
2019-09-06 08:17:05,189 [ALWAYS] mongo_connector.connector:50 - Python version: 3.6.8 (default, Apr 25 2019, 21:02:35) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
2019-09-06 08:17:05,190 [ALWAYS] mongo_connector.connector:50 - Platform: Linux-3.10.0-957.21.3.el7.x86_64-x86_64-with-centos-7.6.1810-Core
2019-09-06 08:17:05,191 [ALWAYS] mongo_connector.connector:50 - pymongo version: 3.9.0
2019-09-06 08:17:05,204 [ALWAYS] mongo_connector.connector:50 - Source MongoDB version: 4.2.0
2019-09-06 08:17:05,204 [ALWAYS] mongo_connector.connector:50 - Target DocManager: mongo_connector.doc_managers.elastic2_doc_manager version: 1.0.0
2019-09-06 08:17:05,225 [INFO] mongo_connector.oplog_manager:137 - OplogThread: Initializing oplog thread
2019-09-06 08:17:05,227 [INFO] mongo_connector.connector:402 - MongoConnector: Starting connection thread MongoClient(host=['mongoset1:27018', 'mongoset1:27017', 'mongoset1:27019'], document_class=dict, tz_aware=False, connect=True, replicaset='your-replica-set')
2019-09-06 08:17:05,241 [INFO] elasticsearch:83 - GET http://<es-hostname>:9200/_mget?realtime=true [status:200 request:0.007s]
2019-09-06 08:17:05,356 [INFO] elasticsearch:83 - POST http://<es-hostname>:9200/_bulk [status:200 request:0.110s]
2019-09-06 08:17:05,477 [INFO] elasticsearch:83 - POST http://<es-hostname>:9200/_refresh [status:200 request:0.121s]
2019-09-06 08:17:05,484 [INFO] elasticsearch:83 - GET http://<es-hostname>:9200/_mget?realtime=true [status:200 request:0.006s]
2019-09-06 08:17:05,616 [INFO] elasticsearch:83 - POST http://<es-hostname>:9200/_bulk [status:200 request:0.129s]
2019-09-06 08:17:05,744 [INFO] elasticsearch:83 - POST http://<es-hostname>:9200/_refresh [status:200 request:0.128s]
.
.
.
.
.
2019-09-06 08:18:35,294 [INFO] mongo_connector.oplog_manager:78 - OplogThread for replica set 'your replica set' is up to date with the oplog.
2019-09-06 08:19:05,324 [INFO] mongo_connector.oplog_manager:78 - OplogThread for replica set 'your replica set' is up to date with the oplog.

It will also update the timestamp in the oplog file.
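To see how far the sync has progressed, you can decode that file. The sketch below rests on an assumption about the file format that this post does not confirm: that it holds JSON of the form `[replica_set_name, packed_timestamp]` (or a list of such pairs for a sharded cluster), with the timestamp packed as `(seconds << 32) + increment`. Check your own oplog.timestamp file before relying on it. `decode_checkpoint` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone


def decode_checkpoint(text: str) -> dict:
    """Decode checkpoint JSON into {replica_set: (utc_datetime, increment)}."""
    data = json.loads(text)
    # Assumed layout: a single [name, value] pair, or a list of such pairs.
    entries = [data] if data and not isinstance(data[0], list) else data
    decoded = {}
    for name, packed in entries:
        seconds = packed >> 32            # high 32 bits: seconds since the epoch
        increment = packed & 0xFFFFFFFF   # low 32 bits: counter within that second
        decoded[name] = (datetime.fromtimestamp(seconds, tz=timezone.utc), increment)
    return decoded


# Made-up sample content, not taken from a real run.
sample = json.dumps(["rs0", (1567757825 << 32) + 3])
checkpoint = decode_checkpoint(sample)
print(checkpoint)
```

To read a real file, pass its contents instead of `sample`, for example `decode_checkpoint(open("<your desired path>/oplog.timestamp").read())`.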
