Tuesday, December 09, 2008

To Exist or Not to Exist

My how time seems to get away. I have wanted to add to my blog for some time.

The other day I was reviewing the load on the primary RDBMS (Sun E25K: 16 Quad Core Boards; 256gbs Ram; 45tbs useable Disk (RAID5); 15tbs 10R2 Database) and saw a CTAS that had been churning for over 4.5 hours. Here is what the CTAS looked like:

create table newtab

as select *

from tab1 c1

where exists (select 'x' from tab2 c2 where c1.field1 = c2.field2);

Table tab1 has 38,835,141 records

Table tab2 has 6,412,588 records

I killed off that create statement and rewrote it as follows:

create table sam_newtab

nologging

parallel(degree 6)

as

select *

from owner.tab1

where field1 in (select field1 from owner.tab2);

This new CTAS took 3 minutes and 5 seconds to complete; much better if I say so myself.

The reason this ran so much faster was because an “IN” subquery is only executed once, while an “EXISTS” subquery is executed once per row of the parent query. If there was an index involved on the subquery table, then and “EXISTS” would be able to take advantage of it. However, there were no indexes on these two tables and therefore negated any advantages of using an “EXISTS”.

No comments: