Difference between revisions of "Oracle/Find duplicates"

From YavInWiki
Jump to navigation Jump to search
m
 
(3 intermediate revisions by the same user not shown)
Line 5: Line 5:
The most effective way to detect duplicate rows is to join the table against itself as shown below.
The most effective way to detect duplicate rows is to join the table against itself as shown below.


<source lang="oracle8">
<syntaxhighlight lang="sql">
SELECT BOOK_UNIQUE_ID
SELECT *
    , PAGE_SEQ_NBR
   FROM my_table A
    , IMAGE_KEY
   FROM page_image A
  WHERE rowid >
  WHERE rowid >
   (SELECT min(rowid)
   (SELECT min(rowid)
       FROM page_image B
       FROM my_table B
     WHERE B.key1 = A.key1
     WHERE B.key1 = A.key1
       AND B.key2 = A.key2
       AND B.key2 = A.key2
       AND B.key3 = A.key3);
       AND B.key3 = A.key3);
</source>
</syntaxhighlight>


Please note that you must specify all of the columns that make the row a duplicate in the SQL where clause. Once you have detected the duplicate rows, you may modify the SQL statement to remove the duplicates as shown below:
Please note that you must specify all of the columns that make the row a duplicate in the SQL where clause. Once you have detected the duplicate rows, you may modify the SQL statement to remove the duplicates as shown below:


<source lang="oracle8">
<syntaxhighlight lang="sql">
DELETE FROM table_name A
DELETE FROM table_name A
  WHERE A.rowid > ANY
  WHERE A.rowid > ANY
Line 27: Line 25:
     WHERE A.col1 = B.col1
     WHERE A.col1 = B.col1
       AND A.col2 = B.col2);
       AND A.col2 = B.col2);
</source>
</syntaxhighlight>


You can also detect and delete duplicate rows using Oracle analytic functions:
You can also detect and delete duplicate rows using Oracle analytic functions:


<source lang="oracle8">
<syntaxhighlight lang="sql">
DELETE FROM customer
DELETE FROM customer
  WHERE rowid IN
  WHERE rowid IN
Line 40: Line 38:
         FROM customer)
         FROM customer)
     WHERE dup > 1);
     WHERE dup > 1);
</source>
</syntaxhighlight>


As we see, there are several ways to detect and delete duplicate rows from Oracle tables.
As we see, there are several ways to detect and delete duplicate rows from Oracle tables.
Line 46: Line 44:
Match null values. Instead of the following:
Match null values. Instead of the following:


<source lang="oracle8">
<syntaxhighlight lang="sql">
DELETE FROM table_name A
DELETE FROM table_name A
  WHERE A.rowid > ANY
  WHERE A.rowid > ANY
Line 53: Line 51:
     WHERE A.col1 = B.col1
     WHERE A.col1 = B.col1
       AND A.col2 = B.col2);
       AND A.col2 = B.col2);
</source>
</syntaxhighlight>


I needed to do the following to get rid of all the dupes:
I needed to do the following to get rid of all the dupes:


<source lang="oracle8">
<syntaxhighlight lang="sql">
DELETE FROM table_name A
DELETE FROM table_name A
  WHERE A.rowid > ANY
  WHERE A.rowid > ANY
Line 64: Line 62:
     WHERE (A.col1 = B.col1 OR (A.col1 is null AND B.col1 is null))
     WHERE (A.col1 = B.col1 OR (A.col1 is null AND B.col1 is null))
       AND (A.col2 = B.col2 OR (A.col2 is null AND B.col2 is null)));
       AND (A.col2 = B.col2 OR (A.col2 is null AND B.col2 is null)));
</source>
</syntaxhighlight>
 
[[Category:Oracle]]

Latest revision as of 11:11, 15 November 2022

Find and remove duplicate rows from a table

One of the most important features of Oracle is the ability to detect and remove duplicate rows from a table. While many Oracle DBA place primary key referential integrity constraints on a table, many shops do not use RI because they need the flexibility.

The most effective way to detect duplicate rows is to join the table against itself as shown below.

SELECT *
  FROM my_table A
 WHERE rowid >
   (SELECT min(rowid)
      FROM my_table B
     WHERE B.key1 = A.key1
       AND B.key2 = A.key2
       AND B.key3 = A.key3);

Please note that you must specify all of the columns that make the row a duplicate in the SQL where clause. Once you have detected the duplicate rows, you may modify the SQL statement to remove the duplicates as shown below:

DELETE FROM table_name A
 WHERE A.rowid > ANY
   (SELECT B.rowid
      FROM table_name B
     WHERE A.col1 = B.col1
       AND A.col2 = B.col2);

You can also detect and delete duplicate rows using Oracle analytic functions:

DELETE FROM customer
 WHERE rowid IN
   (SELECT rowid FROM 
     (SELECT rowid
           , row_number()
        OVER (PARTITION BY custnbr ORDER BY custnbr) dup
        FROM customer)
     WHERE dup > 1);

As we see, there are several ways to detect and delete duplicate rows from Oracle tables.

Match null values. Instead of the following:

DELETE FROM table_name A
 WHERE A.rowid > ANY
   (SELECT B.rowid
      FROM table_name B
     WHERE A.col1 = B.col1
       AND A.col2 = B.col2);

I needed to do the following to get rid of all the dupes:

DELETE FROM table_name A
 WHERE A.rowid > ANY
  (SELECT B.rowid
     FROM table_name B
    WHERE (A.col1 = B.col1 OR (A.col1 is null AND B.col1 is null))
      AND (A.col2 = B.col2 OR (A.col2 is null AND B.col2 is null)));