Sqoop export optimization

Sqoop export command getting the data from HDFS to RDBMS. This export consists of below three tasks.

1. Sqoop export command generates Java class
2. Map Reduce Job submitted to input split
3. Each Mapper – transfer the data to RDBMS


Image Courtesy: http://www.cnblogs.com/byrhuangqiang/p/3922594.html

Sqoop export in Batch option
Sqoop exports each row at a time comparatively it is slow. We can optimized the speed of insertion by utilizing Sqoop JDBC interface batch (insert multiple rows together) insertion option.

Sqoop export \
–connect jdbc:mysql://hostname-or-ip/<db-name> \
–username <username> \
–password <password> \
–export-dir <path-in-rdbms> –batch

Sqoop export in Staging Table Option
This option is nothing but atomic export concept. Atomic export concept is known as it will do the complete job if anything happen in between and not able to complete the job then it will do nothing (completely exit).

In this stating table export option the export operation is happening in staging table. When the export of all the data is successful, then it will write the data in the table. If any parallel job by sqoop fails, then the data won’t transfer any data to the destination table.

sqoop export \
–connect jdbc:mysql://<host name-or-ip>/<db-name> \
–username <user-name>  \
–password <password> \
— export-dir <directory-path-in-DBMS-side>  –staging-table  <staging_tablename>

Leave a Reply