Difference between revisions of "Oracle/Find duplicates"

From YavInWiki
Jump to navigation Jump to search
m
Line 6: Line 6:


<source lang="oracle11">
<source lang="oracle11">
SELECT BOOK_UNIQUE_ID
SELECT *
    , PAGE_SEQ_NBR
   FROM my_table A
    , IMAGE_KEY
   FROM page_image A
  WHERE rowid >
  WHERE rowid >
   (SELECT min(rowid)
   (SELECT min(rowid)
       FROM page_image B
       FROM my_table B
     WHERE B.key1 = A.key1
     WHERE B.key1 = A.key1
       AND B.key2 = A.key2
       AND B.key2 = A.key2

Revision as of 13:56, 25 September 2012

Find and remove duplicate rows from a table

One of the most important features of Oracle is the ability to detect and remove duplicate rows from a table. While many Oracle DBA place primary key referential integrity constraints on a table, many shops do not use RI because they need the flexibility.

The most effective way to detect duplicate rows is to join the table against itself as shown below.

SELECT *
  FROM my_table A
 WHERE rowid >
   (SELECT min(rowid)
      FROM my_table B
     WHERE B.key1 = A.key1
       AND B.key2 = A.key2
       AND B.key3 = A.key3);

Please note that you must specify all of the columns that make the row a duplicate in the SQL where clause. Once you have detected the duplicate rows, you may modify the SQL statement to remove the duplicates as shown below:

DELETE FROM table_name A
 WHERE A.rowid > ANY
   (SELECT B.rowid
      FROM table_name B
     WHERE A.col1 = B.col1
       AND A.col2 = B.col2);

You can also detect and delete duplicate rows using Oracle analytic functions:

DELETE FROM customer
 WHERE rowid IN
   (SELECT rowid FROM 
     (SELECT rowid
           , row_number()
        OVER (PARTITION BY custnbr ORDER BY custnbr) dup
        FROM customer)
     WHERE dup > 1);

As we see, there are several ways to detect and delete duplicate rows from Oracle tables.

Match null values. Instead of the following:

DELETE FROM table_name A
 WHERE A.rowid > ANY
   (SELECT B.rowid
      FROM table_name B
     WHERE A.col1 = B.col1
       AND A.col2 = B.col2);

I needed to do the following to get rid of all the dupes:

DELETE FROM table_name A
 WHERE A.rowid > ANY
  (SELECT B.rowid
     FROM table_name B
    WHERE (A.col1 = B.col1 OR (A.col1 is null AND B.col1 is null))
      AND (A.col2 = B.col2 OR (A.col2 is null AND B.col2 is null)));