soletechie: Hive(EMR) and DynamoDB Integration

Follow below steps to move data from Hive to Dynamo DB.
Source link: https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/EMRforDynamoDB.html

Below is just a quick summary from the above documentation link.

1. Create a hive table. (Ignore this step if hive table is already created)

CREATE TABLE hive_features
(feature_id BIGINT,
feature_name STRING ,
feature_class STRING ,
state_alpha STRING,
prim_lat_dec DOUBLE ,
prim_long_dec DOUBLE ,
elev_in_ft BIGINT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n';

2. Now create another external hive table which just references to the hive table (but provides the mapping between Hive and DDB tables)

CREATE EXTERNAL TABLE ddb_features
(feature_id BIGINT,
feature_name STRING,
feature_class STRING,
state_alpha STRING,
prim_lat_dec DOUBLE,
prim_long_dec DOUBLE,
elev_in_ft BIGINT)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES(
"dynamodb.table.name" = "Features",
"dynamodb.column.mapping"="feature_id:Id,feature_name:Name,feature_class:Class,state_alpha:State,prim_lat_dec:Latitude,prim_long_dec:Longitude,elev_in_ft:Elevation"
);

3. Now insert data into Dynamo DB table from Hive table (for which Hive internally uses Map-Reduce Job to do this copy)

INSERT OVERWRITE TABLE ddb_features
SELECT
feature_id,
feature_name,
feature_class,
state_alpha,
prim_lat_dec,
prim_long_dec,
elev_in_ft
FROM hive_features;

4. we can use hive QL to query DDB table

SELECT DISTINCT feature_class
FROM ddb_features
ORDER BY feature_class;

SELECT feature_name, state_alpha
FROM ddb_features
WHERE feature_class = 'Lake'
AND feature_name LIKE 'M%'
ORDER BY feature_name;

SELECT state_alpha, feature_class, COUNT(*)
FROM ddb_features
WHERE elev_in_ft > 5280
GROUP by state_alpha, feature_class
HAVING COUNT(*) >= 3
ORDER BY state_alpha, feature_class;

Source link:

https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/EMRforDynamoDB.html

soletechie

Tuesday, August 13, 2019

Hive(EMR) and DynamoDB Integration

No comments:

Post a Comment

Pages

Search This Blog