Table: RETAIL_SHIPMENT / Records - 7 Billion / Partitions - 120 Weekly Partitions
Table: CURR_YAGO_CAL - Has Weekend Dates and their Corresponding Year Ago Dates. Sample 2 records
Cal_Dt Yago_Dt
12/26/2009 12/27/2008
12/26/2009 12/26/2009
1/2/2010 1/3/2009
1/2/2010 1/2/2010
I have a query which looking at some metrics for a date and its Year Ago date. The Problem I am facing is when I pass the filter on the CAL_DT and Join the Shipment Table in CAL_DT it accesses only one partition of the RETAIL_SHIPMENT which is what we need.
Problem arises when we filter on the CAL_DT and join on the YAGO_DT with the RETAIL_SHIPMENT then the optimizer is doing a complete table scan which is taking hell lot of time.
Below are the two versions of the query and their corresponding Explain Plans. PLease see explain step no 4 and the last 2 statements of the query. Can anyone please share a way to resolve this or may be propose a different design to just read 2 partitions of RETAIL_SHIPMENT. Please ignore the low/high confidences for now as we have collected all the relevant stats and also the explain is from DEV hence low no of records. Any help would be appreciated.
SEL count(1) FROM
MIM_TBL.CURR_YAGO_CAL TMP
INNER JOIN
(
SELECT RTL_ID RetailID,
CATG_ID CATGID,
Shipment_Dt ,
MAX(Catg_Store_Selling_13Wk_Cnt) AS CatgStoreCount
FROM Mim_tbl.retail_shipment_t
GROUP BY 1,2,3
) CATG_STORE_AGG
ON
CATG_STORE_AGG.SHipment_DT = TMP.CAL_DT
WHERE TMP.CAL_DT = '2013-05-11'
1) First, we lock a distinct MIM_TBL."pseudo table" for read on a
RowHash to prevent global deadlock for MIM_TBL.TMP.
2) Next, we lock a distinct Mim_tbl."pseudo table" for read on a
RowHash to prevent global deadlock for Mim_tbl.retail_shipment_t.
3) We lock MIM_TBL.TMP for read, and we lock
Mim_tbl.retail_shipment_t for read.
4) We do an all-AMPs SUM step to aggregate from a single partition of
Mim_tbl.retail_shipment_t with a condition of (
"Mim_tbl.retail_shipment_t.Shipment_Dt = DATE '2013-05-11'") with
a residual condition of ("Mim_tbl.retail_shipment_t.Shipment_Dt =
DATE '2013-05-11'") , grouping by field1 (
Mim_tbl.retail_shipment_t.Rtl_Id
,Mim_tbl.retail_shipment_t.Catg_Id
,Mim_tbl.retail_shipment_t.Shipment_Dt). Aggregate Intermediate
Results are computed locally, then placed in Spool 3. The size of
Spool 3 is estimated with low confidence to be 91,369 rows (
3,746,129 bytes). The estimated time for this step is 0.11
seconds.
5) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by
way of an all-rows scan into Spool 1 (used to materialize
view, derived table or table function CATG_STORE_AGG)
(all_amps) (compressed columns allowed), which is built
locally on the AMPs. The size of Spool 1 is estimated with
low confidence to be 91,369 rows (2,284,225 bytes). The
estimated time for this step is 0.02 seconds.
2) We do an all-AMPs RETRIEVE step from MIM_TBL.TMP by way of a
traversal of index # 4 without accessing the base table with
a residual condition of ("MIM_TBL.TMP.Cal_Dt = DATE
'2013-05-11'") into Spool 8 (all_amps) (compressed columns
allowed), which is duplicated on all AMPs. The size of Spool
8 is estimated with high confidence to be 12 rows (204 bytes).
The estimated time for this step is 0.01 seconds.
6) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
all-rows scan, which is joined to Spool 1 (Last Use) by way of an
all-rows scan with a condition of ("CATG_STORE_AGG.SHIPMENT_DT =
DATE '2013-05-11'"). Spool 8 and Spool 1 are joined using a
product join, with a join condition of ("SHIPMENT_DT = Cal_Dt").
The result goes into Spool 7 (all_amps) (compressed columns
allowed), which is built locally on the AMPs. The size of Spool 7
is estimated with low confidence to be 91,369 rows (1,370,535
bytes). The estimated time for this step is 0.03 seconds.
7) We do an all-AMPs SUM step to aggregate from Spool 7 (Last Use) by
way of a cylinder index scan. Aggregate Intermediate Results are
computed globally, then placed in Spool 9. The size of Spool 9 is
estimated with high confidence to be 1 row (23 bytes). The
estimated time for this step is 0.02 seconds.
8) We do an all-AMPs RETRIEVE step from Spool 9 (Last Use) by way of
an all-rows scan into Spool 5 (group_amps), which is built locally
on the AMPs. The size of Spool 5 is estimated with high
confidence to be 1 row (25 bytes). The estimated time for this
step is 0.00 seconds.
9) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 5 are sent back to the user as the result of
statement 1. The total estimated time is 0.18 seconds.
SEL count(1) FROM
MIM_TBL.CURR_YAGO_CAL TMP
INNER JOIN
(
SELECT RTL_ID RetailID,
CATG_ID CATGID,
Shipment_Dt ,
MAX(Catg_Store_Selling_13Wk_Cnt) AS CatgStoreCount
FROM Mim_tbl.retail_shipment_t
GROUP BY 1,2,3
) CATG_STORE_AGG
ON
CATG_STORE_AGG.SHipment_DT = TMP.YAGO_DT
WHERE TMP.CAL_DT = '2013-05-11'
1) First, we lock a distinct MIM_TBL."pseudo table" for read on a
RowHash to prevent global deadlock for MIM_TBL.TMP.
2) Next, we lock a distinct Mim_tbl."pseudo table" for read on a
RowHash to prevent global deadlock for Mim_tbl.retail_shipment_t.
3) We lock MIM_TBL.TMP for read, and we lock
Mim_tbl.retail_shipment_t for read.
4) We do an all-AMPs SUM step to aggregate from
Mim_tbl.retail_shipment_t by way of an all-rows scan with no
residual conditions , grouping by field1 (
Mim_tbl.retail_shipment_t.Rtl_Id
,Mim_tbl.retail_shipment_t.Catg_Id
,Mim_tbl.retail_shipment_t.Shipment_Dt). Aggregate Intermediate
Results are computed locally, then placed in Spool 3. The input
table will not be cached in memory, but it is eligible for
synchronized scanning. The size of Spool 3 is estimated with high
confidence to be 7,381,353 rows (302,635,473 bytes). The
estimated time for this step is 5 minutes and 16 seconds.
5) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by
way of an all-rows scan into Spool 1 (used to materialize
view, derived table or table function CATG_STORE_AGG)
(all_amps) (compressed columns allowed), which is built
locally on the AMPs. The size of Spool 1 is estimated with
high confidence to be 7,381,353 rows (184,533,825 bytes).
The estimated time for this step is 1.31 seconds.
2) We do an all-AMPs RETRIEVE step from MIM_TBL.TMP by way of an
all-rows scan with a condition of ("MIM_TBL.TMP.Cal_Dt = DATE
'2013-05-11'") into Spool 8 (all_amps) (compressed columns
allowed), which is duplicated on all AMPs. The size of Spool
8 is estimated with high confidence to be 12 rows (204 bytes).
The estimated time for this step is 0.01 seconds.
6) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
all-rows scan, which is joined to Spool 1 (Last Use) by way of an
all-rows scan. Spool 8 and Spool 1 are joined using a single
partition hash_ join, with a join condition of ("SHIPMENT_DT =
Yago_Dt"). The result goes into Spool 7 (all_amps) (compressed
columns allowed), which is built locally on the AMPs. The size of
Spool 7 is estimated with low confidence to be 102,518 rows (
1,537,770 bytes). The estimated time for this step is 0.39
seconds.
7) We do an all-AMPs SUM step to aggregate from Spool 7 (Last Use) by
way of a cylinder index scan. Aggregate Intermediate Results are
computed globally, then placed in Spool 9. The size of Spool 9 is
estimated with high confidence to be 1 row (23 bytes). The
estimated time for this step is 0.02 seconds.
8) We do an all-AMPs RETRIEVE step from Spool 9 (Last Use) by way of
an all-rows scan into Spool 5 (group_amps), which is built locally
on the AMPs. The size of Spool 5 is estimated with high
confidence to be 1 row (25 bytes). The estimated time for this
step is 0.00 seconds.
9) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 5 are sent back to the user as the result of
statement 1. The total estimated time is 5 minutes and 17 seconds.