Wednesday, 12 February 2014

How to permanently remove duplicate values from a table in Oracle DB



How can I remove duplicate records from a large table containing about 5 million
records in a single run and in less time?
I tried the following query, but it takes 10 hours.

delete from test1 where rowid not in (select min(rowid) from test1 group by rc_no);

Even after increasing the rollback segment tablespace to 7 GB
we are not getting the desired results. We generally run into this kind of
problem when using a NOT IN clause or a cursor.

thanks 

and we said...
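I'd generate the set of rowids to delete using analytics and then delete them.
row_number() numbers the rows within each group that shares the same key value, so
every row with rn <> 1 is a duplicate.  Like this: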


ops$tkyte@ORA9IR2> create table t as select * from cust;

Table created.

Elapsed: 00:00:03.64
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select count(*), count(distinct cust_seg_nbr) from t;

  COUNT(*) COUNT(DISTINCTCUST_SEG_NBR)
---------- ---------------------------
   1871652                      756667

Elapsed: 00:00:05.30
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> delete from t
  2    where rowid in ( select rid
  3                       from ( select rowid rid,
  4                                     row_number() over
  5                                       (partition by cust_seg_nbr order by rowid) rn
  6                                from t
  7                            )
  8                     where rn <> 1 )
  9  /

1114985 rows deleted.

Elapsed: 00:01:46.06
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>  select count(*), count(distinct cust_seg_nbr) from t;

  COUNT(*) COUNT(DISTINCTCUST_SEG_NBR)
---------- ---------------------------
    756667                      756667

Elapsed: 00:00:02.48
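
A delete like the one above can be sized up before it is run. The queries below are a minimal sketch, assuming the same table t and key column cust_seg_nbr as in the transcript, and assuming the key is never NULL (count(distinct ...) ignores NULLs, while row_number() puts them all in one partition):

```sql
-- Keys that occur more than once, with the number of surplus rows for each.
select cust_seg_nbr, count(*) - 1 as extra_rows
  from t
 group by cust_seg_nbr
having count(*) > 1
 order by extra_rows desc;

-- Total number of rows the delete would remove.
select count(*) - count(distinct cust_seg_nbr) as rows_to_delete
  from t;
```

For the counts shown in the transcript, the second query works out to 1871652 - 756667 = 1114985, which matches the "rows deleted" figure.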



As for the RBS (rollback segment) -- it'll get as big as it needs to be in order to
process the delete -- every index will make it "larger" and make the delete take longer
as well (index maintenance is expensive)

if you are deleting "a lot of the rows" you might be better off disabling indexes, doing
the delete and rebuilding them.
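
A rough sketch of that disable / delete / rebuild sequence, assuming a single non-unique index with the hypothetical name t_idx on table t. A unique index, or one backing a primary key or unique constraint, generally cannot simply be marked unusable without the DML failing, so those would need the constraint disabled or the index dropped instead:

```sql
-- t_idx is a hypothetical non-unique index on t; substitute your own.
alter index t_idx unusable;

-- Let DML skip unusable indexes. This is the default in 10g and later;
-- in 9i it has to be set explicitly for the session.
alter session set skip_unusable_indexes = true;

delete from t
 where rowid in ( select rid
                    from ( select rowid rid,
                                  row_number() over
                                    (partition by cust_seg_nbr order by rowid) rn
                             from t
                         )
                   where rn <> 1 );

commit;

-- Rebuild once, after the bulk delete, instead of maintaining the index row by row.
alter index t_idx rebuild;
```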


OR, creating a new table that just keeps the "right records" and dropping the old table:


ops$tkyte@ORA9IR2> create table t as select * from cust;

Table created.

Elapsed: 00:00:02.41
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select count(*), count(distinct cust_seg_nbr) from t;

  COUNT(*) COUNT(DISTINCTCUST_SEG_NBR)
---------- ---------------------------
   1871652                      756667

Elapsed: 00:00:04.60
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table t2
  2  as
  3  select cust_seg_nbr
  4    from ( select t.*, row_number() over (partition by cust_seg_nbr order by rowid) rn
  5             from t
  6             )
  7   where rn = 1
  8  /

Table created.

Elapsed: 00:00:10.93
ops$tkyte@ORA9IR2> drop table t;

Table dropped.

Elapsed: 00:00:00.56
ops$tkyte@ORA9IR2> rename t2 to t;

Table renamed.

Elapsed: 00:00:00.01
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select count(*), count(distinct cust_seg_nbr) from t;

  COUNT(*) COUNT(DISTINCTCUST_SEG_NBR)
---------- ---------------------------
    756667                      756667

Elapsed: 00:00:01.18
ops$tkyte@ORA9IR2>
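
Note that the create table in the transcript above selects only cust_seg_nbr, so t2 ends up with that single column. To carry whole rows across, the column list has to be spelled out (a select * from the inline view would also bring along rn). A sketch, where col_a and col_b are hypothetical stand-ins for the table's remaining columns:

```sql
create table t2
as
select cust_seg_nbr, col_a, col_b      -- col_a, col_b: hypothetical, list your real columns
  from ( select t.*, row_number() over (partition by cust_seg_nbr order by rowid) rn
           from t
       )
 where rn = 1;

drop table t;

rename t2 to t;
```

Indexes, constraints (other than NOT NULL), grants and triggers on the old table do not follow the rename, so they would have to be recreated on the new table.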