
Teradata 12 and Subqueries - response (8) by SPOLISETTI


dnoeth,
We are on TD 13.10. Can you please go through Query 1 and Query 2 and answer my questions below? Thanks in advance!!
Query 1 (scalar subquery):
------------------------------------------
sel a1, b1, (select sum(c2) from t2 where a2 = a1) as totalamount
from t1;
 
Query 2 (rewritten to remove the scalar subquery):
-------------------------------------------------------------------------
select a1, b1, sum(c2) from t1, t2 where a1 = a2 group by 1, 2;
 
The query plans for these two queries are totally different. Though both queries should yield the same results in every scenario, I do not understand why Query 1 joins t1 and t2 twice. Can the optimizer not rewrite Query 1 as Query 2 and execute that?
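For reference, this is the kind of rewrite I would expect instead. This is only my sketch, using the column names from the queries above; pre-aggregating t2 and joining it back with a left join mirrors the left outer join that shows up in step 8 of the plan below:

select t1.a1,
       t1.b1,
       dt.totalamount   -- NULL when a1 has no match in t2, same as the scalar subquery
from t1
left outer join (
    select a2, sum(c2) as totalamount
    from t2
    group by a2
) dt
on dt.a2 = t1.a1;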
 
Query 1 Plan:
Explain sel a1, b1, (
select sum(c2)
from t2
where a2 = a1) as totalamount
from t1;

  1) First, we lock a distinct PERF."pseudo table" for read on a
     RowHash to prevent global deadlock for PERF.t2.
  2) Next, we lock a distinct PERF."pseudo table" for read on a RowHash
     to prevent global deadlock for PERF.t1.
  3) We lock PERF.t2 for read, and we lock PERF.t1 for read.
  4) We do an all-AMPs JOIN step (Global sum) from PERF.t2 by way of a
     RowHash match scan, which is joined to PERF.t1 by way of a RowHash
     match scan.  PERF.t2 and PERF.t1 are joined using a merge join,
     with a join condition of ("PERF.t2.a2 = PERF.t1.a1").  The result
     goes into Spool 4 (all_amps), which is built locally on the AMPs
     with Field1 ("-5438").  The size of Spool 4 is estimated with low
     confidence to be 192 rows (4,608 bytes).  Spool Asgnlist: "-5438",
     "Spool_4.Field_2" = "{ Copy }{RightTable}.ROWID",
     "Spool_4.c2" = "{ Copy }{LeftTable}.c2".
     The estimated time for this step is 0.04 seconds.
  5) We do an all-AMPs SUM step to aggregate from Spool 4 (Last Use) by
     way of an all-rows scan, and the grouping identifier in field 2.
     Aggregate Intermediate Results are computed globally, then placed
     in Spool 5.  The size of Spool 5 is estimated with low confidence
     to be 144 rows (3,888 bytes).  The estimated time for this step is
     0.04 seconds.
  6) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by
          way of an all-rows scan into Spool 1 (all_amps), which is
          built locally on the AMPs with hash fields ("Spool_5.Field_2")
          and Field1 ("Spool_5.Field_2").  The size of Spool 1 is
          estimated with low confidence to be 144 rows (4,896 bytes).
          Spool Asgnlist:
          "Field_1" = "Spool_5.Field_2",
          "Field_2" = "Field_3",
          "Field_3" = "Field_2".
          The estimated time for this step is 0.03 seconds.
       2) We do an all-AMPs RETRIEVE step from PERF.t1 by way of an
          all-rows scan with no residual conditions into Spool 7
          (all_amps), which is redistributed by hash code to all AMPs
          with hash fields ("PERF.t1.ROWID") and Field1 (
          "PERF.t1.ROWID").  Then we do a SORT to order Spool 7 by row
          hash.  The size of Spool 7 is estimated with low confidence
          to be 144 rows (3,744 bytes).  Spool Asgnlist:
          "Field_1" = "PERF.t1.ROWID",
          "a1" = "a1",
          "b1" = "b1".
          The estimated time for this step is 0.01 seconds.
  7) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
     an all-rows scan into Spool 8 (all_amps), which is redistributed
     by hash code to all AMPs with hash fields ("Spool_1.Field_3") and
     Field1 ("Spool_1.Field_1").  Then we do a SORT to order Spool 8 by
     row hash.  The size of Spool 8 is estimated with low confidence to
     be 144 rows (4,896 bytes).  Spool Asgnlist:
     "Field_1" = "Spool_1.Field_1",
     "Field_2" = "Field_2",
     "Field_3" = "Spool_1.Field_3".
     The estimated time for this step is 0.01 seconds.
  8) We do an all-AMPs JOIN step (No Sum) from Spool 7 (Last Use) by
     way of a RowHash match scan, which is joined to Spool 8 (Last Use)
     by way of a RowHash match scan.  Spool 7 and Spool 8 are
     left outer joined using a merge join, with a join condition of (
     "Spool_7.Field_1 = Spool_8.Field_3").  The result goes into Spool
     2 (group_amps), which is built locally on the AMPs with Field1 (
     "UniqueId").  The size of Spool 2 is estimated with low confidence
     to be 144 rows (4,752 bytes).  Spool Asgnlist: "UniqueId",
     "{LeftTable}.a1 ,{LeftTable}.b1 ,{RightTable}.Field_2,".
     The estimated time for this step is 0.04 seconds.
  9) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 2 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.18 seconds.
 
====
The real issue we have on our system is that in step 4 the optimizer assigns a constant value to Field1 (the "-5438" in the plan above). In step 5, when it does the SUM step grouping on Field1, the estimates drop from 5M rows to 5 rows. Because of this, in subsequent steps the optimizer goes for product joins :(
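To rule out missing statistics as a contributor to the bad estimate, this is a sketch of a check I can run on the same query (DIAGNOSTIC HELPSTATS is a standard Teradata diagnostic; with it on, EXPLAIN appends a list of statistics the optimizer would like collected):

-- Ask EXPLAIN to report recommended statistics for this session
diagnostic helpstats on for session;

explain sel a1, b1, (select sum(c2) from t2 where a2 = a1) as totalamount
from t1;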
I opened an incident with Teradata, but I would like to know your thoughts on this. You show up on Google when I search for scalar queries in Teradata, so I am assuming you may be one of the developers of the scalar query feature.
====
 
 

