Thursday, April 21, 2011

Postgres csv import duplicate key error?

Hi, I am importing a CSV file into Postgres.

copy product from '/tmp/a.csv' DELIMITERS ',' CSV;
ERROR:  duplicate key value violates unique constraint "product_pkey"
CONTEXT:  COPY product, line 13: "1,abcd,100 pack"

What is the best way to avoid this error? Would I have to write a Python script to handle it?

From Stack Overflow:
  • Well, the best way would be to filter the data not to contain duplicates. It's usually pretty easy, and doesn't require a lot of programming.

    For example:

    Assuming the first column of your data is the primary key and the file is not very large (let's say less than 60% of your RAM), you could:

    awk -F, '(!X[$1]) {X[$1]=1; print $0}' /tmp/a.csv > /tmp/b.csv
    

    and load /tmp/b.csv instead.

    If the file is larger, sorting it first keeps memory use down, since duplicates become adjacent and only the previous key needs to be remembered:

    sort /tmp/a.csv | awk -F, 'BEGIN {P="\n"} ($1 != P) {print $0; P=$1}' > /tmp/b.csv
    
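Since the question asks about a Python script: here is a minimal sketch of the same first-column dedupe in Python, keeping the first row seen for each key (the function name and column index are assumptions, not from the original answer):

```python
import csv

def dedupe_first_column(src, dst):
    """Copy src to dst, keeping only the first row seen for each
    value in column 1 (the assumed primary key column)."""
    seen = set()
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        for row in reader:
            if row and row[0] not in seen:
                seen.add(row[0])
                writer.writerow(row)
```

Like the first awk one-liner, this holds every key in memory, so it suits files whose key column fits comfortably in RAM; the resulting file can then be loaded with the same `copy product from ... CSV;` command.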
