Search This Blog

Friday, March 19, 2010

How to remove duplicate rows from a teradata table

This method uses a temporary table to remove the duplicate rows.
if a table contains duplicate rows means
It should be
1. MultiSet tabel.
2. should not contain UPI.

let us assume we need to remove duplicate rows from tab1.
the table is created as
create multiset table tab1(col1 integer,col2 integer);
populate the table with duplicate rows.

STEP1:
create a temporary SET/MULTISET table () with the following query.
create set table temp_tab1 as tab1 with no data;
this will create a temporary set table which does not allow duplicates.

STEP2:
populate the temporary temporary set table by removing the duplicate rows from the original table. this can be done using the following query.

insert into temp_tab1 select * from tab1 group by col1,col2 having count(*) > 1;
(or)
insert into temp_tab1 select distinct * from tab1 ;

STEP3:
delete all the duplicate rows from the original table.(this query rows all the duplicate rows including the base row).
 delete from tab1
where (col1,col2)
in (select col1,col2 from temp_tab1);

STEP4:
insert the rows from the temp table to the base table using the following query.
insert into tab1
sel * from temp_tab1;

there are two more methods to delete duplicate rows without using the temporary
table using the Teradata Fastload Utility and Teradata Warehouse Builder.

Method1 :
Use Fast Export to write the rows from the source table into a file.
Then Delete all rows from the source table.
then use FastLoad to load the target table.The file to load the table is generated from the FastExport.

Method 2:
Use Teradata WareHouseBuilder (TWB) with 2 steps.
in the first step use data connector as consumer and Export operator as producer.
in the second step use Dataconnector as producer and load operator as consumer.

these are based on the fact that Fastload cannot load duplicate rows.


No comments:

Post a Comment