Difference between revisions of "Oracle/Shuffle data"

From YavInWiki
Jump to navigation Jump to search
 
(5 intermediate revisions by the same user not shown)
Line 1: Line 1:
The following statement(s) creates a copy of an existing table (address), randomly shuffles each (selected) column within the temporary table and puts them back into the original table. Afterwards the data within the original table (selected columns only) are randomly shuffled. The individual random min/max values are mandatory since without them all columns (here name and street) would be 'shifted' together (the pair would stay together).
The following statement shuffles to content of columns (name and street):


<source lang="oracle11">
<source lang="oracle11">
CREATE TABLE address_tmp as SELECT * FROM address;
-- shuffle the columns name and street (separately)
MERGE INTO address t1
MERGE INTO address t1
  USING (SELECT address_id,
  USING (SELECT address_id,
Line 9: Line 9:
               nvl(lead(street) over (order by dbms_random.value(2,2)),
               nvl(lead(street) over (order by dbms_random.value(2,2)),
                   first_value(street) over (order by dbms_random.value(2,2))) street
                   first_value(street) over (order by dbms_random.value(2,2))) street
           FROM address_tmp) t2
           FROM address) t2
     ON (t1.address_id = t2.address_id)
     ON (t1.address_id = t2.address_id)
WHEN matched THEN
WHEN matched THEN
UPDATE
UPDATE
   SET t1.name = t2.name, t1.street = t2.street;
   SET t1.name = t2.name, t1.street = t2.street;
--drop table address_tmp;
</source>
</source>


If no temp table is used in the above example, data will (most likely) be lost since the select statement will work a changing table so that some information will be multiplied and some will be lost.
The individual random values are used so that both columns are shuffled separately (else they would stay together as a pair).
 
'''Note: NULL values will be lost (will be set to the same random value).'''
 
Initial idea came from [http://asktom.oracle.com/pls/apex/f?p=100:11:0::NO::P11_QUESTION_ID:6382864029082 Ask Tom].
 
[[Category:Oracle]]

Latest revision as of 14:58, 9 March 2018

The following statement shuffles to content of columns (name and street):

-- shuffle the columns name and street (separately)
MERGE INTO address t1
 USING (SELECT address_id,
               nvl(lead(name) over (order by dbms_random.value(1,1)),
                   first_value(name) over (order by dbms_random.value(1,1))) name,
               nvl(lead(street) over (order by dbms_random.value(2,2)),
                   first_value(street) over (order by dbms_random.value(2,2))) street
          FROM address) t2
    ON (t1.address_id = t2.address_id)
WHEN matched THEN
UPDATE
   SET t1.name = t2.name, t1.street = t2.street;

The individual random values are used so that both columns are shuffled separately (else they would stay together as a pair).

Note: NULL values will be lost (will be set to the same random value).

Initial idea came from Ask Tom.