The COPY command that I have used for a while:
copy <redshift table name>
from '<s3 file location>'
iam_role '<iam role arn>'
NULL AS 'NULL'
EMPTYASNULL
delimiter ','
IGNOREHEADER 1;
Problem 1: I faced an issue with delimiters on a CSV file. The delimiter for a CSV file is the comma, but the problem with keeping the comma as the DELIMITER is that the load fails with the error below if the data itself also contains commas.
Error Code: 1213, Error Description: Missing data for not-null field
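For illustration, a hypothetical file like the one below trips up a plain comma DELIMITER: the quoted address value contains a comma, so Redshift splits the row into an extra column and the last not-null field ends up with no data.
id,name,address
1,John,"12 Main St, Springfield"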
AWS Documentation link that helped in solving this issue:
https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html#load-from-csv
Solution 1.1: Since the issue is with the comma as a delimiter, I specified the file format as CSV, which handles quoted fields, and ignored the header. This fixed the issue.
copy <Redshift Table Name>
from 's3://<location of the file>'
iam_role '<iam role arn>'
csv
IGNOREHEADER 1;
Solution 1.2: Using ESCAPE and REMOVEQUOTES should work as well. Note that neither option can be combined with the CSV format option, which is why this variant specifies the comma delimiter directly.
copy <redshift table name>
from '<s3 file location>'
iam_role '<iam role arn>'
delimiter ','
escape
REMOVEQUOTES
IGNOREHEADER 1;
Problem 2: Redshift doesn't support materialized views, and views cannot be loaded with the COPY command. I tried copying views directly from S3 to Redshift, which resulted in the error below:
Error: psycopg2.errors.InternalError_: Invalid table
DETAIL:
-----------------------------------------------
error: Invalid table
code: 8001
context: ************ is not a table. Views cannot be loaded with the COPY command.
query: 0
location: compression_analyzer.cpp:432
process: padbmaster [pid=16256]
-----------------------------------------------
Solution 2.1: Materialize the view as a table and then COPY it, if you don't care whether it ends up as a view or a table on Redshift.
Solution 2.2: First COPY all the underlying tables, and then run CREATE VIEW on Redshift, as sketched below.
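A minimal sketch of Solution 2.2, assuming a hypothetical view sales_summary built on a single underlying table sales (the bucket, columns, and names are all illustrative):
-- load the underlying table first
copy sales
from 's3://my-bucket/exports/sales.csv'
iam_role '<iam role arn>'
csv
IGNOREHEADER 1;
-- then recreate the view on Redshift
create view sales_summary as
select region, sum(amount) as total_amount
from sales
group by region;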
FYI: Problem with JSON format
The format of the file you are trying to copy also plays an important role in the issues you hit with the COPY command. I'm using CSV files for most of the test runs because I ran into a 4 MB limit on the size of a single JSON object; Redshift's JSON parsing has more problems to deal with. FYI, here is the error info I got related to JSON:
psycopg2.errors.InternalError_: The total size of the JSON object exceeds the max limit of 4194304 bytes
DETAIL:
-----------------------------------------------
error: The total size of the JSON object exceeds the max limit of 4194304 bytes
code: 8001
-----------------------------------------------
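For reference, a minimal sketch of the JSON variant that hits this limit once any single object in the file grows past 4 MB; the 'auto' argument maps JSON keys to column names, and the table and file locations are placeholders:
copy <redshift table name>
from '<s3 file location>'
iam_role '<iam role arn>'
format as json 'auto';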
Problem 3: "Delimiter not found" while running the COPY command.
Try FILLRECORD, which loads records whose trailing columns are missing by filling them with zero-length strings or NULLs:
copy <redshift table name>
from '<s3 file location>'
iam_role '<iam role arn>'
csv
IGNOREHEADER 1
FILLRECORD;
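For illustration, with FILLRECORD a hypothetical file like the one below still loads even though the second record is missing its trailing column; that column is filled with NULL or a zero-length string, depending on the column definition.
id,name,city
1,John,Seattle
2,Jane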
Problem 4: When there are issues with timestamps and the file is comma separated but not loaded with the CSV format option.
Solution:
copy <redshift table name>
from '<s3 file location>'
iam_role '<iam role arn>'
delimiter ','
escape
NULL as 'null'
REMOVEQUOTES
TIMEFORMAT 'auto';
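If 'auto' still can't parse the values, an explicit format string is another option. A hedged sketch, assuming timestamps shaped like 2021-01-31 23:59:59 (placeholders as above):
copy <redshift table name>
from '<s3 file location>'
iam_role '<iam role arn>'
delimiter ','
escape
NULL as 'null'
REMOVEQUOTES
TIMEFORMAT 'YYYY-MM-DD HH:MI:SS';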