Finding Duplicate records and Deleting Duplicate records in TERADATA

Requirement:

Find duplicate records in a Teradata table and remove them while retaining the original record.

Suppose I work in an office and my boss has asked me to record the details of every person who enters the office. I use the table structure below.

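-- MULTISET allows fully identical rows. A SET table (the default in Teradata session mode) rejects them.
CREATE MULTISET TABLE DUP_EXAMPLE
(
PERSON_NAME VARCHAR(50),
PERSON_AGE INTEGER,
ADDRS VARCHAR(150),
PURPOSE VARCHAR(250),
ENTERED_DATE DATE
);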

If a person enters more than once, I have to insert that person's details each time.

The first time, I inserted the records below.

INSERT INTO DUP_EXAMPLE VALUES ('Krishna reddy', 25, 'BANGALORE', 'GENERAL', DATE '2014-01-01');

INSERT INTO DUP_EXAMPLE VALUES ('Anirudh Allika', 25, 'HYDERABAD', 'GENERAL', DATE '2014-01-01');

INSERT INTO DUP_EXAMPLE VALUES ('Ashok Vunnam', 25, 'CHENNAI', 'INTERVIEW', DATE '2014-01-01');

On the same day, the person named Ashok came to the office again, so I entered his details into the table once more.

INSERT INTO DUP_EXAMPLE VALUES ('Ashok Vunnam', 25, 'CHENNAI', 'INTERVIEW', DATE '2014-01-01');

Now the table contains the data below.

SELECT * FROM DUP_EXAMPLE

PERSON_NAME	PERSON_AGE	ADDRS	PURPOSE	ENTERED_DATE
Krishna reddy	25	BANGALORE	GENERAL	01-JAN-2014
Anirudh Allika	25	HYDERABAD	GENERAL	01-JAN-2014
Ashok Vunnam	25	CHENNAI	INTERVIEW	01-JAN-2014
Ashok Vunnam	25	CHENNAI	INTERVIEW	01-JAN-2014

The requirement is to get the details of every person who entered more than once in a day, so I need a query that returns only the duplicated records.

We can write this query in two ways.

1) First Option:

SELECT
PERSON_NAME,
PERSON_AGE,
ADDRS,
PURPOSE,
ENTERED_DATE,
COUNT(*) AS RECORD_COUNT
FROM DUP_EXAMPLE
GROUP BY 1,2,3,4,5
HAVING COUNT(*) > 1;
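The first option groups on every column, so it only reports rows that are identical in all fields. If the real question is "who entered more than once on the same day", a narrower grouping may be closer to the requirement. The sketch below assumes that PERSON_NAME together with ENTERED_DATE is enough to identify a person on a given day, which is an assumption and not something the table definition guarantees.

```sql
-- Assumption: PERSON_NAME + ENTERED_DATE identifies one person on one day.
-- VISIT_COUNT is an illustrative alias, not part of the table.
SELECT
    PERSON_NAME,
    ENTERED_DATE,
    COUNT(*) AS VISIT_COUNT
FROM DUP_EXAMPLE
GROUP BY PERSON_NAME, ENTERED_DATE
HAVING COUNT(*) > 1;
```

With the four sample rows above, this returns a single row for Ashok Vunnam with a count of 2.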

2) Second Option:

SELECT
PERSON_NAME,
PERSON_AGE,
ADDRS,
PURPOSE,
ENTERED_DATE,
ROW_NUMBER() OVER(PARTITION BY PERSON_NAME,PERSON_AGE,ADDRS,PURPOSE,ENTERED_DATE ORDER BY PERSON_NAME,PERSON_AGE,ADDRS,PURPOSE,ENTERED_DATE) AS RECORD_NUMBER
FROM DUP_EXAMPLE
-- Teradata does not accept ordered analytical functions in WHERE; QUALIFY filters on their result instead.
QUALIFY RECORD_NUMBER > 1;
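In Teradata, a filter on the result of a window function belongs in a QUALIFY clause. A ROW_NUMBER() filter of "greater than 1" returns only the extra copies. If you want to see every copy of a duplicated record, including the first one, a windowed COUNT(*) is one possible variation. The alias COPIES is introduced here only for illustration.

```sql
-- Returns all copies of any record that appears more than once.
-- COPIES is an illustrative alias for the size of each group of identical rows.
SELECT
    PERSON_NAME,
    PERSON_AGE,
    ADDRS,
    PURPOSE,
    ENTERED_DATE,
    COUNT(*) OVER (
        PARTITION BY PERSON_NAME, PERSON_AGE, ADDRS, PURPOSE, ENTERED_DATE
    ) AS COPIES
FROM DUP_EXAMPLE
QUALIFY COPIES > 1;
```

With the sample data, this returns both Ashok Vunnam rows, each with COPIES equal to 2.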

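Deleting the duplicates needs a different approach. Teradata does not allow ROW_NUMBER() in a WHERE clause, and because the duplicate rows are identical in every column, a DELETE has no predicate that can single out only the extra copy. One way to remove the duplicates while retaining one copy of each record is to save the distinct rows, empty the table, and load the distinct rows back. DUP_EXAMPLE_DEDUP is a working-table name chosen for this example.

CREATE MULTISET VOLATILE TABLE DUP_EXAMPLE_DEDUP AS (SELECT DISTINCT * FROM DUP_EXAMPLE) WITH DATA ON COMMIT PRESERVE ROWS;

DELETE FROM DUP_EXAMPLE;

INSERT INTO DUP_EXAMPLE SELECT * FROM DUP_EXAMPLE_DEDUP;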
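Since fully identical rows cannot be told apart by any column value, one further way to keep a single copy per group is to build a clean copy of the table with QUALIFY and then check it. This is a minimal sketch, not a definitive procedure. DUP_EXAMPLE_CLEAN is a hypothetical table name, and the choice of ENTERED_DATE in the ORDER BY is arbitrary because the rows in each partition are identical.

```sql
-- DUP_EXAMPLE_CLEAN is a hypothetical name used only for this sketch.
-- Keep exactly one row from each group of identical records.
CREATE MULTISET TABLE DUP_EXAMPLE_CLEAN AS
(
    SELECT *
    FROM DUP_EXAMPLE
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY PERSON_NAME, PERSON_AGE, ADDRS, PURPOSE, ENTERED_DATE
        ORDER BY ENTERED_DATE
    ) = 1
) WITH DATA;

-- Verification: this should return no rows if the copy is free of duplicates.
SELECT
    PERSON_NAME,
    PERSON_AGE,
    ADDRS,
    PURPOSE,
    ENTERED_DATE,
    COUNT(*) AS COPIES
FROM DUP_EXAMPLE_CLEAN
GROUP BY 1,2,3,4,5
HAVING COUNT(*) > 1;
```

Once the verification query comes back empty, the clean copy could replace the original, for example with RENAME TABLE. Test this on a copy first, since a rebuilt table does not automatically carry over everything attached to the original (such as access rights or collected statistics).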

Note: This is a very common interview question: how do you find duplicate records, and how do you delete them while retaining the original record?

Comments

Unknown, July 12, 2017 at 2:29 PM
Ordered analytical functions are not allowed in the WHERE clause anymore in Teradata.

IT Pandora (ETL,DB,UNIX,AWS,NodeJS,Github etc.)
