Oracle/Find duplicates

From YavInWiki
Revision as of 08:32, 30 October 2009

Find and remove duplicate rows from a table

Detecting and removing duplicate rows from a table is an important task in Oracle. Many Oracle DBAs place primary key and referential integrity (RI) constraints on their tables, but many shops do not use these constraints because they need the flexibility, and tables without them can accumulate duplicate rows.
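Where that flexibility is not needed, the database can stop duplicates from coming back once a table has been cleaned. The sketch below assumes a hypothetical supplier table in which supplier_id alone should identify a row; the table, column and index names are illustrative only. A unique index on the column(s) that define a duplicate rejects any INSERT or UPDATE that would repeat a value, and creating the index fails while duplicates are still present. In Oracle, a UNIQUE or PRIMARY KEY constraint added with ALTER TABLE is the declarative alternative.

```sql
-- Hypothetical table: supplier_id is assumed to be the column that
-- should uniquely identify a row.
CREATE TABLE supplier (
  supplier_id   NUMBER(10),
  supplier_name VARCHAR2(40)
);

-- Already de-duplicated sample data.
INSERT INTO supplier VALUES (1, 'Acme');
INSERT INTO supplier VALUES (2, 'Globex');

-- From now on, any INSERT or UPDATE that repeats a supplier_id is rejected.
-- This statement itself fails if duplicate supplier_id values still exist.
CREATE UNIQUE INDEX supplier_id_uk ON supplier (supplier_id);
```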

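An effective way to detect duplicate rows is to compare the table against itself with a correlated subquery, as shown below. The query returns every copy of a duplicated row except the one with the lowest ROWID.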

SELECT BOOK_UNIQUE_ID
     , PAGE_SEQ_NBR
     , IMAGE_KEY
  FROM page_image A
 WHERE rowid >
   (SELECT min(rowid) FROM page_image B
     WHERE B.key1 = A.key1
       AND B.key2 = A.key2
       AND B.key3 = A.key3);
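To list each duplicated key combination once, together with how many copies it has, a GROUP BY ... HAVING COUNT(*) > 1 query is a common alternative to the self-join above (it is a different technique, not the query shown on this page). The sketch assumes key1, key2 and key3 are the columns that define a duplicate and uses made-up sample data.

```sql
-- Hypothetical sample table; key1..key3 stand in for whatever columns
-- make two page_image rows "the same".
CREATE TABLE page_image (
  key1 NUMBER,
  key2 NUMBER,
  key3 VARCHAR2(20)
);

INSERT INTO page_image VALUES (1, 1, 'a');
INSERT INTO page_image VALUES (1, 1, 'a');  -- duplicate of the row above
INSERT INTO page_image VALUES (1, 1, 'a');  -- another duplicate
INSERT INTO page_image VALUES (1, 2, 'b');  -- unique row

-- One output row per duplicated key combination, with its number of copies.
SELECT key1, key2, key3, COUNT(*) AS copies
  FROM page_image
 GROUP BY key1, key2, key3
HAVING COUNT(*) > 1;
```

With this sample data the GROUP BY query returns a single row (1, 1, 'a', 3), while the self-join query above would return the two surplus copies of that row.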

Note that the WHERE clause must compare every column that makes a row a duplicate. Once you have detected the duplicate rows, you can turn the query into a DELETE that removes every copy except the one with the lowest ROWID:

DELETE FROM table_name A
 WHERE A.rowid > ANY
   (SELECT B.rowid
      FROM table_name B
     WHERE A.col1 = B.col1
       AND A.col2 = B.col2);
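The > ANY condition can be hard to read. The sketch below is an equivalent rewrite using EXISTS, offered for illustration: a row is deleted when another row with the same col1 and col2 has a smaller ROWID, so the lowest-ROWID copy of each group survives. The dup_demo table and its rows are hypothetical stand-ins for table_name.

```sql
-- Hypothetical demo table standing in for table_name(col1, col2).
CREATE TABLE dup_demo (
  col1 NUMBER,
  col2 VARCHAR2(10)
);

INSERT INTO dup_demo VALUES (1, 'x');
INSERT INTO dup_demo VALUES (1, 'x');  -- duplicate
INSERT INTO dup_demo VALUES (2, 'y');
INSERT INTO dup_demo VALUES (2, 'y');  -- duplicate
INSERT INTO dup_demo VALUES (3, 'z');

-- Same effect as "A.rowid > ANY (...)": delete a row if some other row with
-- identical key columns has a smaller rowid.
DELETE FROM dup_demo
 WHERE EXISTS
   (SELECT 1
      FROM dup_demo b
     WHERE b.col1 = dup_demo.col1
       AND b.col2 = dup_demo.col2
       AND b.rowid < dup_demo.rowid);
```

After the DELETE, dup_demo holds three rows, one per (col1, col2) pair. Like the original, this version still skips rows whose key columns are NULL.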

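You can also detect and delete duplicate rows using Oracle analytic functions. The statement below numbers the rows within each custnbr with ROW_NUMBER() and deletes every row numbered above 1; because every row in a partition has the same custnbr, the ORDER BY does not determine which copy is kept: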

DELETE FROM customer
WHERE rowid IN
  (SELECT rowid FROM 
    (SELECT rowid
          , row_number()
       OVER (PARTITION BY custnbr ORDER BY custnbr) dup
       FROM customer)
    WHERE dup > 1);
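Because ORDER BY custnbr cannot distinguish between copies of the same custnbr, the statement above keeps an arbitrary copy. A variant that orders by ROWID instead keeps the lowest-ROWID copy, matching the self-join examples, and a SELECT over the same inline view previews which rows will go before anything is deleted. The cust_name column and the sample rows are assumptions made for this sketch.

```sql
-- Hypothetical customer table; custnbr is the column that defines a duplicate.
CREATE TABLE customer (
  custnbr   NUMBER(10),
  cust_name VARCHAR2(40)
);

INSERT INTO customer VALUES (100, 'Acme');
INSERT INTO customer VALUES (100, 'Acme');    -- duplicate custnbr
INSERT INTO customer VALUES (200, 'Globex');

-- Preview: rows numbered 2, 3, ... within each custnbr are the ones the
-- DELETE below removes. ORDER BY rowid makes the lowest-rowid copy number 1.
SELECT rid, custnbr, dup
  FROM (SELECT rowid AS rid,
               custnbr,
               ROW_NUMBER() OVER (PARTITION BY custnbr ORDER BY rowid) AS dup
          FROM customer)
 WHERE dup > 1;

-- Delete exactly the rows shown by the preview.
DELETE FROM customer
 WHERE rowid IN
   (SELECT rid
      FROM (SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY custnbr ORDER BY rowid) AS dup
              FROM customer)
     WHERE dup > 1);
```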

As these examples show, there are several ways to detect and delete duplicate rows from Oracle tables.

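Matching NULL values. In SQL, NULL = NULL evaluates to unknown rather than true, so a comparison such as A.col1 = B.col1 never matches two rows that both have NULL in col1, and such duplicates survive the DELETE. Instead of the following: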

DELETE FROM table_name A
 WHERE A.rowid > ANY
   (SELECT B.rowid
      FROM table_name B
     WHERE A.col1 = B.col1
       AND A.col2 = B.col2);

I needed to use the following, which treats two NULLs as a match, to get rid of all the duplicates:

DELETE FROM table_name A
 WHERE A.rowid > ANY
  (SELECT B.rowid
     FROM table_name B
    WHERE (A.col1 = B.col1 OR (A.col1 is null AND B.col1 is null))
      AND (A.col2 = B.col2 OR (A.col2 is null AND B.col2 is null)));
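Another way to avoid writing the NULL checks by hand is to group on the key columns: GROUP BY places all NULLs in a column into the same group, so keeping MIN(rowid) per group also removes duplicates whose keys contain NULLs. This GROUP BY technique is an alternative to the statement above, not a rewrite of it, sketched here on a hypothetical null_demo table.

```sql
-- Hypothetical demo table with NULLs in the key columns.
CREATE TABLE null_demo (
  col1 NUMBER,
  col2 VARCHAR2(10)
);

INSERT INTO null_demo VALUES (1, NULL);
INSERT INTO null_demo VALUES (1, NULL);     -- duplicate that "=" alone misses
INSERT INTO null_demo VALUES (NULL, NULL);
INSERT INTO null_demo VALUES (NULL, NULL);  -- duplicate
INSERT INTO null_demo VALUES (2, 'y');

-- Keep the lowest-rowid row of every (col1, col2) group, NULL groups included.
-- ROWID is never NULL, so NOT IN cannot be defeated by a NULL in the subquery.
DELETE FROM null_demo
 WHERE rowid NOT IN
   (SELECT MIN(rowid)
      FROM null_demo
     GROUP BY col1, col2);
```

After the DELETE, three rows remain: (1, NULL), (NULL, NULL) and (2, 'y').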