Sqoop import and export

JDBC Drivers
Sqoop import and export operations (importing data from an RDBMS into HDFS, or exporting data from HDFS back to an RDBMS) are done with the help of JDBC drivers. The drivers are not bundled with Sqoop because of licensing issues; however, most vendors offer free drivers on their sites. Here are some URLs for JDBC driver downloads.

MS SQL Server – Hadoop JDBC connector
http://www.microsoft.com/en-us/download/details.aspx?id=27584

MySQL – Hadoop JDBC connector
http://dev.mysql.com/downloads/connector/j/

Oracle – Hadoop JDBC connector
http://www.oracle.com/technetwork/database/database-technologies/bdc/big-data-connectors/downloads/index.html

The JDBC driver file (typically a JAR file) needs to be placed in the lib folder under your Sqoop installation.
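For example, assuming Sqoop is installed under /usr/lib/sqoop and the MySQL Connector/J JAR has been downloaded to the current directory (both the path and the version number are illustrative):

cp mysql-connector-java-5.1.49.jar /usr/lib/sqoop/lib/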

[Image: Sqoop import and export data flow]

Image courtesy: http://www.slideshare.net/aaamase/sqoop2-refactoring-for-generic-data-transfer-hadoop-strata

Sqoop import – From RDBMS to HDFS

From MySQL to HDFS
If the source table has a primary key:

sqoop import --connect jdbc:mysql://<hostname-or-ip>/<database-name> --username <username> --password <password> --table <tablename> \
    --target-dir <target directory in HDFS>


If the source table does not have a primary key:

sqoop import --connect jdbc:mysql://<hostname-or-ip>/<database-name> --username <username> --password <password> --table <tablename> \
    --target-dir <target directory in HDFS> -m 1

-m 1 tells Sqoop to run the job with a single mapper, so the output lands in one directory with a single part-m-00000 file. Without a primary key, Sqoop has no default column on which to split the work across mappers; an alternative is shown below.
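If the table has another column with reasonably evenly distributed values, you can keep the import parallel by naming that column with --split-by instead of falling back to a single mapper (the column name below is a placeholder):

sqoop import --connect jdbc:mysql://<hostname-or-ip>/<database-name> --username <username> --password <password> --table <tablename> \
    --target-dir <target directory in HDFS> --split-by <split-column>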

From DB2 to HDFS

sqoop import --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://<hostname-or-ip>:50000/<database-name> --username <db2user> --password <db2password> --table <db2-table> --split-by <primary-key-column> --target-dir /hdfs-target-dir

From MS SQL Server to HDFS

sqoop import --connect 'jdbc:sqlserver://<hostname-or-ip>;username=<dbusername>;password=<dbpasswd>;database=<DB>' --table <table> --target-dir /path/to/hdfs/dir --split-by <KEY> -m 1

From Oracle to HDFS

sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=)(port=1521))(connect_data=(service_name=)))" \
    --username USER --table SCHEMA.TABLE_NAME –target-dir /hdfs-target-dir --num-mappers 1

 

Sqoop export – From HDFS to RDBMS
For an export, Sqoop first connects to the RDBMS to gather the table's metadata, which it uses to generate a Java class for parsing the records. It then submits a map-only MapReduce job whose input splits are derived from the HDFS source files (for an import, the split ranges come from a boundary query instead), and each mapper transfers its portion of the data to the RDBMS.
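The class-generation step can also be run on its own with Sqoop's codegen tool, which is useful for inspecting the Java class Sqoop would use; a minimal sketch with placeholder connection details:

sqoop codegen --connect jdbc:mysql://<hostname-or-ip>/<database-name> --username <username> --password <password> --table <tablename>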

HDFS to MySQL
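A minimal sketch of a typical MySQL export, assuming placeholder connection details and a target table that already exists in the database:

sqoop export --connect jdbc:mysql://<hostname-or-ip>/<database-name> --username <username> --password <password> --table <tablename> --export-dir /path/to/hdfs/source-dir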

HDFS to DB2

sqoop export --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://db2.my.com:50000/myDB --username <username> --password <password> --table <table-name> --export-dir /path/to/hdfs/source-dir

HDFS to Oracle

sqoop export --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=<hostname>)(port=1521))(connect_data=(service_name=<your-service-name>)))" \
    --username <username> --password <password> --table <schema>.<table-name> --export-dir /path/to/hdfs/source-dir
