Tuesday, August 11, 2020

Copy data from S3 to Aurora Postgres

Scenario 1: To copy data from S3 to Aurora Postgres (greater than v9 or latest)

How?: We can use aws_s3.table_import_from_s3 function (to migrate the data from S3 to Aurora Postgres). 

Steps:

A sample file with columns - id, prefix, mstr_id is copied to S3. 

Create schema on Aurora Postgres (with the required columns). 

drop table core.mstr;

CREATE TABLE core.mstr (
	id varchar(300) NULL,
	prefix varchar(300) NULL,
	mstr_id float8 NULL
);

Copy command to transfer the data from S3 to Aurora Postgres

SELECT aws_s3.table_import_from_s3(
   'core.MSTR', 
   'id,prefix,mstr_id', 
   '(format csv, header true)',
   '<bucket-name>',
   'MSTR_DATA/part_file_00.csv',
   'us-east-2',
   '<secret key>', '<access key>'
);


Note: If IAM roles are given, we need not specify access keys. 

SELECT aws_s3.table_import_from_s3(
   'core.MSTR', 
   'id,prefix,mstr_id', 
   '(format csv, header true)',
   '<bucket-name>',
   'MSTR_DATA/part_file_00.csv',
   'us-east-2'
);


Note: aws_s3.table_import_from_s3 function is not supported for versions earlier than Aurora Postgres Version 9.


Scenario 2: To copy data from S3 to Aurora Postgres (older versions than 9)





How? : Use SQOOP Job to export the data from S3 to Aurora Postgres. 

Steps

Upload data files to s3

Create schema on Aurora Postgres (with the required columns). 

drop table core.mstr;

CREATE TABLE core.mstr (
	id varchar(300) NULL,
	prefix varchar(300) NULL,
	mstr_id float8 NULL
);

Spin-up EMR Cluster
  • Must have access to S3 and Aurora Postgres
  • Check for connectivity: 
    • Whether EMR cluster and RDS reside in same VPC
      • dig <Aurora hostname>
      • nc -vz <hostname> (must get a message: connectivity looks good)
  • Make sure the security groups are properly assigned to the EMR Cluster. 
    • The security group must ALLOW traffic from CORE node of EMR Cluster
    • Must be added to the RDS Security Group as a source

Sqoop Job

echo "Exporting MSTR table from S3 to Postgres"
echo "Sqoop Job in Progress"

sqoop export 
--connect <jdbc URL>
 --input-null-string NULL 
 --input-null-non-string NULL 
 --table mstr 
 --export-dir s3://<bucket_name>/MSTR/ 
 --username user_etl 
 --P
 -- --schema test
echo "Sqoop Job is executed successfully"
echo "Data will now be available on Aurora Postgres"


No comments:

Post a Comment