Oracle/Find duplicates
Revision as of 09:42, 7 December 2009
Find and remove duplicate rows from a table
Detecting and removing duplicate rows is a common Oracle maintenance task. While many Oracle DBAs place primary key and referential integrity (RI) constraints on their tables, many shops do not use RI because they need the flexibility, which leaves their tables open to duplicate rows.
A common way to detect duplicate rows is to compare the table against itself with a correlated subquery, as shown below. The query returns every row except the one with the lowest rowid in each group of duplicates.
SELECT BOOK_UNIQUE_ID
     , PAGE_SEQ_NBR
     , IMAGE_KEY
FROM page_image A
WHERE A.rowid >
  (SELECT MIN(B.rowid)
   FROM page_image B
   -- key1..key3 stand for the columns that define a duplicate
   WHERE B.key1 = A.key1
   AND B.key2 = A.key2
   AND B.key3 = A.key3);
Please note that the WHERE clause of the subquery must list every column that makes a row a duplicate. Once you have detected the duplicate rows, you may modify the SQL statement to remove the duplicates as shown below:
DELETE FROM table_name A
WHERE A.rowid > ANY
  (SELECT B.rowid
   FROM table_name B
   WHERE A.col1 = B.col1
   AND A.col2 = B.col2);
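Before committing a delete like the one above, it may help to check which key combinations occur more than once. The following is a minimal sketch, reusing the placeholder names table_name, col1 and col2 from the statement above; the alias copies is introduced here for illustration only.

```sql
-- Sketch: list each (col1, col2) combination that occurs more than once.
-- Run it before the DELETE to preview the duplicates, and again afterwards
-- (before COMMIT) to see which groups, if any, remain.
SELECT col1
     , col2
     , COUNT(*) AS copies
FROM table_name
GROUP BY col1, col2
HAVING COUNT(*) > 1;
```

Note that GROUP BY places NULL values in the same group, whereas the equality predicates in the delete do not match NULLs, so groups with NULL key values may still appear in this check after the delete has run.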
You can also detect and delete duplicate rows using Oracle analytic functions:
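DELETE FROM customer
WHERE rowid IN
  (SELECT rid FROM
    -- rowid is aliased so the outer query can select it from the inline view
    (SELECT rowid AS rid
          , ROW_NUMBER()
            -- ordering by rowid keeps the lowest rowid in each group
            OVER (PARTITION BY custnbr ORDER BY rowid) AS dup
     FROM customer)
   WHERE dup > 1);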
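The analytic approach can also be used for detection only. The sketch below assumes the same customer table and custnbr column as the delete above; the aliases rid and dup are illustrative. It lists the rows that the delete would remove without changing any data.

```sql
-- Sketch: show the surplus rows per custnbr instead of deleting them.
-- Rows numbered 2 and higher within a custnbr group are the duplicates.
SELECT custnbr
     , rid
     , dup
FROM (SELECT custnbr
           , rowid AS rid
           , ROW_NUMBER()
             OVER (PARTITION BY custnbr ORDER BY rowid) AS dup
      FROM customer)
WHERE dup > 1
ORDER BY custnbr, dup;
```

Reviewing this output first is a cheap way to confirm that the PARTITION BY list really names every column that defines a duplicate.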
As we see, there are several ways to detect and delete duplicate rows from Oracle tables.
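These statements do not match NULL values: in SQL, NULL = NULL is never true, so rows whose key columns contain NULL are not treated as duplicates of each other. Instead of the following:
DELETE FROM table_name A
WHERE A.rowid > ANY
  (SELECT B.rowid
   FROM table_name B
   WHERE A.col1 = B.col1
   AND A.col2 = B.col2);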
I needed to do the following to get rid of all the dupes:
DELETE FROM table_name A
WHERE A.rowid > ANY
  (SELECT B.rowid
   FROM table_name B
   WHERE (A.col1 = B.col1 OR (A.col1 IS NULL AND B.col1 IS NULL))
   AND (A.col2 = B.col2 OR (A.col2 IS NULL AND B.col2 IS NULL)));
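A more compact way to write the same NULL-matching comparison is Oracle's DECODE function, which treats two NULLs as equivalent. This is an alternative sketch rather than the statement used above; it assumes the same placeholder names table_name, col1 and col2, and it has not been compared for performance against the OR form.

```sql
-- Sketch: DECODE(x, y, 1, 0) returns 1 when x equals y or when both are NULL.
DELETE FROM table_name A
WHERE A.rowid > ANY
  (SELECT B.rowid
   FROM table_name B
   WHERE DECODE(A.col1, B.col1, 1, 0) = 1
   AND DECODE(A.col2, B.col2, 1, 0) = 1);
```

Wrapping the columns in a function may keep the optimizer from using an ordinary index on col1 or col2, so on large tables the explicit OR ... IS NULL form may be the safer choice.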