Difference between revisions of "Oracle/Find duplicates"
m |
|||
| Line 6: | Line 6: | ||
<source lang="oracle11"> | <source lang="oracle11"> | ||
SELECT | SELECT * | ||
FROM my_table A | |||
FROM | |||
WHERE rowid > | WHERE rowid > | ||
(SELECT min(rowid) | (SELECT min(rowid) | ||
FROM | FROM my_table B | ||
WHERE B.key1 = A.key1 | WHERE B.key1 = A.key1 | ||
AND B.key2 = A.key2 | AND B.key2 = A.key2 | ||
Revision as of 13:56, 25 September 2012
Find and remove duplicate rows from a table
One of the most important features of Oracle is the ability to detect and remove duplicate rows from a table. While many Oracle DBA place primary key referential integrity constraints on a table, many shops do not use RI because they need the flexibility.
The most effective way to detect duplicate rows is to join the table against itself as shown below.
SELECT *
FROM my_table A
WHERE rowid >
(SELECT min(rowid)
FROM my_table B
WHERE B.key1 = A.key1
AND B.key2 = A.key2
AND B.key3 = A.key3);
Please note that you must specify all of the columns that make the row a duplicate in the SQL where clause. Once you have detected the duplicate rows, you may modify the SQL statement to remove the duplicates as shown below:
DELETE FROM table_name A
WHERE A.rowid > ANY
(SELECT B.rowid
FROM table_name B
WHERE A.col1 = B.col1
AND A.col2 = B.col2);
You can also detect and delete duplicate rows using Oracle analytic functions:
DELETE FROM customer
WHERE rowid IN
(SELECT rowid FROM
(SELECT rowid
, row_number()
OVER (PARTITION BY custnbr ORDER BY custnbr) dup
FROM customer)
WHERE dup > 1);
As we see, there are several ways to detect and delete duplicate rows from Oracle tables.
Match null values. Instead of the following:
DELETE FROM table_name A
WHERE A.rowid > ANY
(SELECT B.rowid
FROM table_name B
WHERE A.col1 = B.col1
AND A.col2 = B.col2);
I needed to do the following to get rid of all the dupes:
DELETE FROM table_name A
WHERE A.rowid > ANY
(SELECT B.rowid
FROM table_name B
WHERE (A.col1 = B.col1 OR (A.col1 is null AND B.col1 is null))
AND (A.col2 = B.col2 OR (A.col2 is null AND B.col2 is null)));