Migrate MySQL custom query data to Mongo Database

Subin Babukuttan
Jan 15, 2021

PREREQUISITES

  • MongoDB installed on a local machine
  • Python installed on a local machine
  • MySQL installed, along with phpMyAdmin or Adminer, on a local machine
  • Basic knowledge of the command line
  • Knowledge of SQL and stored procedures
  • Basic knowledge of Python
  • Knowledge of MongoDB

This blog is about migrating data from MySQL to MongoDB. There are various tools for migrating data from MySQL to MongoDB: some are paid, and the open-source ones are often not customizable to our needs.

Instead, we will learn how to migrate our own custom data using plain SELECT queries.

This isn't a fully automated process, but it can still help you migrate data, or even an entire database, to MongoDB with ease and at a much faster speed.

There are three basic steps:

  1. A MySQL stored procedure iterates over a table that contains the SQL queries for our custom data and stores the results as CSV files on the local machine.
  2. A Python script transforms these CSV files into JSON data files.
  3. A shell script imports all the JSON files into the MongoDB database.

Step 1:

So you might be wondering about the procedure, the queries, and the CSV files.

Let me explain in detail.

Let's first talk about the query. It's a simple SELECT statement that fetches data from the table that needs to be migrated to MongoDB as a collection. But wait, isn't a collection an array of JSON documents?

Indeed, a collection is an array of JSON documents. MongoDB is all about JSON.

So how can the result of a SELECT statement, which comes back in columnar format, be dumped into MongoDB?

MySQL has a function, JSON_OBJECT(), that converts the data passed to it into a JSON string. The function accepts a comma-separated list of key names and their corresponding values; a value can be a string, a number, a boolean, or even a column name of the table.

So we have a tool to convert MySQL table data into rows containing JSON strings.

For example:

SELECT JSON_OBJECT(
    '_id', customerNumber,
    'customerName', customerName,
    'contactLastName', contactLastName,
    'contactFirstName', contactFirstName,
    'phone', phone,
    'address', JSON_OBJECT(
        'address1', addressLine1,
        'address2', addressLine2,
        'city', city,
        'state', state,
        'postalCode', postalCode,
        'country', country
    ),
    'salesRepEmployeeNumber', salesRepEmployeeNumber,
    'creditLimit', creditLimit
) `rows`
FROM customers;

In the above query, we fetch data from the customers table and use JSON_OBJECT() to convert every row of the result into a row containing a JSON string.

JSON_OBJECT() takes key-value pairs; in this case, _id is a key and the customerNumber column is its value.

Okay, so our JSON string is ready using the JSON_OBJECT() function, but what about nested objects?

How can we do that?

Simple: we use the same function and pass a nested JSON_OBJECT() call as the key's value.

To clarify, look at the same query above and check the address key; it's a nested object.

'address', JSON_OBJECT('address1', addressLine1, 'address2', addressLine2, 'city', city, 'state', state, 'postalCode', postalCode, 'country', country)

Here the key is address and the value is built by the nested JSON_OBJECT() call.

Snapshot of how the data looks:
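Each row of the result is a single JSON string; pretty-printed, one row looks roughly like this (the values here are purely illustrative):

{
    "_id": 101,
    "customerName": "Acme Ltd",
    "contactLastName": "Smith",
    "contactFirstName": "Jane",
    "phone": "555-0101",
    "address": {
        "address1": "12 Example Street",
        "address2": null,
        "city": "Springfield",
        "state": "IL",
        "postalCode": "62701",
        "country": "USA"
    },
    "salesRepEmployeeNumber": 1370,
    "creditLimit": 21000.00
}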

Just remember that MongoDB needs an _id field; if one is not passed, it will create its own unique _id.

So we have the query. Now we want to write a procedure that can pick up such queries stored in a table.

Let's create such a table

CREATE TABLE `migrate_tables` (
    `id` int(10) NOT NULL AUTO_INCREMENT,
    `table_to_migrate` varchar(500) NOT NULL,
    `migration_table` varchar(500) NOT NULL,
    `query` text NOT NULL,
    `is_migrated_table_created` enum('0','1','2','3') NOT NULL DEFAULT '0',
    `is_migrated_file_created` enum('0','1','2','3') NOT NULL DEFAULT '0',
    PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;

Let's look at the table structure:

  • table_to_migrate holds the name of the table that needs to be migrated, for example customers.
  • migration_table holds the name of the table where we want to store the JSON string data, for example customers_migrate.
  • query holds the query that fetches our custom data as rows of JSON strings.
  • is_migrated_table_created is a flag that tracks the migration table stage: 0 means the migrated table has not yet been created, 1 means it has been created.
  • is_migrated_file_created is a flag that tracks which file has been created: 0 means no file yet, 1 means the CSV file has been created, 2 means the JSON file has been created.

The procedure fetches the rows from migrate_tables where the is_migrated_table_created column is set to 0, executes the query stored in the query column, creates the table named in migration_table and dumps the data into it, and then sets the flag is_migrated_table_created to 1.

The procedure then dumps the data from the newly created table into a CSV file with the same name as the table and updates the flag is_migrated_file_created to 1.

Query to dump the data into a CSV file:

SELECT JSON_OBJECT(
    '_id', customerNumber,
    'customerName', customerName,
    'contactLastName', contactLastName,
    'contactFirstName', contactFirstName,
    'phone', phone,
    'address', JSON_OBJECT(
        'address1', addressLine1,
        'address2', addressLine2,
        'city', city,
        'state', state,
        'postalCode', postalCode,
        'country', country
    ),
    'salesRepEmployeeNumber', salesRepEmployeeNumber,
    'creditLimit', creditLimit
) `rows`
FROM customers
INTO OUTFILE '/mysql-files/customers_migrate.csv'
FIELDS TERMINATED BY '$'
LINES TERMINATED BY '\n';

For the above query to work, a few things need to be set up on our CentOS machine:

  • Create a folder named mysql-files under the root directory and change its ownership to the mysql user using the chown command.
  • Edit the MySQL configuration file (on CentOS it is at /etc/my.cnf) and add the line secure_file_priv = /mysql-files, which restricts OUTFILE writes and LOAD DATA reads to this folder, in our case mysql-files.
  • Set SELinux to permissive mode by running this command on the CentOS machine: sudo setenforce 0
  • Restart the MySQL and Apache services.

Once the above steps are done, we can create CSV files in the mentioned directory.
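If you would rather drive this step from a script than write the stored procedure, the same flow can be sketched in Python. This is only a rough equivalent: it assumes the mysql-connector-python package, placeholder connection details, and that each stored query has no trailing semicolon, and it writes the CSV directly instead of creating the intermediate migration table first.

import mysql.connector  # assumption: the mysql-connector-python package is installed

conn = mysql.connector.connect(
    host="localhost", user="root", password="secret", database="classicmodels"
)
cursor = conn.cursor(dictionary=True)

# Fetch the queries that have not been processed yet
cursor.execute(
    "SELECT id, migration_table, query FROM migrate_tables "
    "WHERE is_migrated_table_created = '0'"
)

for row in cursor.fetchall():
    outfile = "/mysql-files/{}.csv".format(row["migration_table"])
    # Append the OUTFILE clause to the stored SELECT ... JSON_OBJECT(...) query.
    # Note: the statement fails if the target file already exists.
    outfile_clause = (
        " INTO OUTFILE '{}' FIELDS TERMINATED BY '$' LINES TERMINATED BY '\\n'"
    ).format(outfile)
    cursor.execute(row["query"] + outfile_clause)

    # Mark the row as processed: migration table stage done, CSV file created
    cursor.execute(
        "UPDATE migrate_tables "
        "SET is_migrated_table_created = '1', is_migrated_file_created = '1' "
        "WHERE id = %s",
        (row["id"],),
    )
    conn.commit()

cursor.close()
conn.close()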

Snapshot of CSV files in the mentioned directory

Step 2:

We now have our data stored in CSV files, and that data needs to be imported into MongoDB. To do that, we first have to convert these files into JSON files.

Pretty straightforward, just change the file extension to .json and that's it? No, that won't work.

We have rows of JSON strings, and they need to be converted into a single array of JSON data.

So we use Python, which is fast at I/O operations like reading and writing files and is easy to use thanks to its simple syntax.

The main concept is pretty much the lines below:

row = json.loads(rows[0])
jsonArray.append(row)

# Open a json writer, and use json.dumps()
with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
    jsonString = json.dumps(jsonArray, indent=4)
    jsonf.write(jsonString)

In the Python script, we connect to the MySQL database, fetch all the records from the migrate_tables table we created earlier, and filter the records where the is_migrated_table_created and is_migrated_file_created columns are set to 1.

Those records are iterated over; using the migration_table name we open the corresponding CSV file, read its contents, and store them in a variable named rows.

So rows[0] gets the data from the first column of each CSV row, and that data is in JSON string format.

json.loads converts the JSON string into a JSON object, which is then appended to an array named jsonArray.

Once the file has been read and all the data has been converted to JSON objects and appended to the array, we have to write the array of JSON objects to a JSON file.

The following code does that:

# Open a json writer, and use json.dumps()
with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
    jsonString = json.dumps(jsonArray, indent=4)
    jsonf.write(jsonString)

We open the file in write mode with the encoding set to utf-8. We cannot directly dump our array of JSON objects to the file, because mongoimport needs JSON string data, so we have to convert the array to a string. To do that we import Python's built-in json package and use its dumps function, which accepts a JSON-serializable object along with optional arguments; the indent option simply formats the content so it is readable when written to the file, much like the pretty() method in the mongo shell.
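Putting the reading and writing pieces together, here is a minimal sketch of the conversion for one file. The function name and paths are illustrative, the '$' delimiter matches the FIELDS TERMINATED BY '$' clause from step 1, and the encoding argument is covered next.

import csv
import json

def csv_to_json(csvFilePath, jsonFilePath, encoding='utf-8'):
    jsonArray = []
    # Each line of the OUTFILE CSV holds one JSON string in its first field
    with open(csvFilePath, 'r', encoding=encoding) as csvf:
        reader = csv.reader(csvf, delimiter='$')
        for rows in reader:
            if not rows:
                continue
            row = json.loads(rows[0])
            jsonArray.append(row)

    # Write the whole array as one JSON string so mongoimport --jsonArray can read it
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonf.write(json.dumps(jsonArray, indent=4))

# e.g. csv_to_json('/mysql-files/customers_migrate.csv', 'json/customers_migrate.json')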

One small part needs to be highlighted: the encoding used while reading the CSV file with the open() method, where we can pass an encoding argument matching the file we are reading. To find the encoding of a file we can use the chardet package (installable with pip), which has a detect function that returns the detected encoding of the file's contents.

import chardet

def detect_encoding(file):
    # Read the raw bytes and let chardet guess the encoding
    rawdata = open(file, 'rb').read()
    result = chardet.detect(rawdata)
    print(result)
    return result
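The detected encoding can then be passed straight to open() when reading the CSV file, for example (using the detect_encoding helper above and the OUTFILE path from step 1):

encoding = detect_encoding('/mysql-files/customers_migrate.csv')['encoding']
with open('/mysql-files/customers_migrate.csv', 'r', encoding=encoding) as csvf:
    rows = csvf.readlines()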

The above process is repeated for every table that has been migrated, that is, every row whose is_migrated_table_created and is_migrated_file_created flags are set to 1.

That's it, we are almost done migrating our MySQL data to MongoDB; the last step is to import all of these JSON files into MongoDB.

You can find all the code, and even the sample database I used for this project, at the GitHub link below:

https://github.com/mark38igor/mysql_mongo_migration.git

If you face any issues, please mention them in the comments and I will reply as early as possible.

Just one more step, and then we are done.

Step 3:

We are going to write some shell scripting; it's not that hard, don't worry.

Any shell script must begin with this line so that it is recognized as a shell script when run:

#!/bin/bash

And the file must be given execute permission using this command:

chmod +x filename

So why do we want a shell script?

We could import our data with the mongoimport command from the shell, but we would have to run the command manually for each file to be imported. So if we have 8 tables in MySQL there will be 8 corresponding JSON files, 8 runs of the command, and we would also have to change the collection name every time we execute the import.

We are simply automating the import process with our shell script: it picks up each file name in the directory, imports it into MongoDB, and derives the collection name from the file name.

So the script looks like this:

#!/bin/bash

# path where the json files are stored
target="/home/sbabukuttan/Documents/mysql to mongo migration/json/$1"

let count=0

# iterate over the directory mentioned above
for f in "$target"/*
do
    # get the filename with its extension
    echo $(basename "$f")
    let count=count+1

    # split the filename on the delimiter character _ and use the first part
    # as the collection name in mongo
    collection_name=$(echo $(basename "$f") | cut -d'_' -f 1)
    echo $collection_name

    # execute the mongoimport command
    mongoimport "$f" -d classicModels -c "$collection_name" --jsonArray --drop
done

echo ""
echo "Count: $count"

Run the above script and we are done importing and migrating our MySQL custom query data into MongoDB.
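If you would rather stay in Python for this step too, the same loop can be driven with the subprocess module. This is just a sketch mirroring the shell script above; the directory path and database name follow the earlier examples, and mongoimport must be available on the PATH.

import subprocess
from pathlib import Path

# directory containing the JSON files produced in step 2
json_dir = Path("/home/sbabukuttan/Documents/mysql to mongo migration/json")

for json_file in sorted(json_dir.glob("*.json")):
    # collection name = part of the file name before the first underscore
    collection = json_file.name.split("_")[0]
    subprocess.run(
        ["mongoimport", str(json_file),
         "-d", "classicModels", "-c", collection,
         "--jsonArray", "--drop"],
        check=True,
    )
    print("imported", json_file.name, "into collection", collection)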

Here is a snapshot of the data migrated to MongoDB and a snapshot of our MySQL data.

Thanks for staying with me this far; I would love to hear your feedback and anything that can be improved.

Please note: in the Python script, change the path to the location where you want to store the JSON files, and in the shell script, change the path to where the JSON files are located.

Below is the link where you can find the full code as well as the sample database I used:

https://github.com/mark38igor/mysql_mongo_migration.git

This is my first blog post, so my apologies if you did not find it professional. Kindly suggest any improvements in the comments section.
