I have a table with 700m rows, evenly distributed on a 1-column integer NUPI, plus a 3-column USI (which includes the PI column). Queries run fine, but when we push updates or deletes to this table in ETL jobs, we have problems. We use a stage table with a structure identical to the 700m-row target table and run a joined UPDATE statement each night to update the corresponding records in the target. There are an average of 1-3m rows to update in the stage table each night. The single UPDATE statement runs 3+ hours even though Explain estimates 10-12 minutes. We have looked at partitioning, but the incoming 1-3m rows span such a wide variety of values that partition elimination doesn't help.
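For context, the partitioned variant we evaluated looked roughly like the sketch below. The partition column, RANGE_N bounds, and EACH width here are illustrative only (not our actual DDL), and it assumes a Teradata release that accepts BIGINT test values and CAST in the partitioning expression:

-- Illustrative sketch only: hypothetical partitioned copy of the target table.
-- The RANGE_N bounds and EACH width below are made up for the example.
CREATE MULTISET TABLE deck_prod_tbls.inventory_Event_usage_ppi ,NO FALLBACK
     (
      Event_Gnbr DECIMAL(15,0) NOT NULL,
      Inv_Gnbr DECIMAL(15,0) NOT NULL,
      Usage_Parm_Code VARCHAR(8) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL
      -- ...(other cols)...
     )
PRIMARY INDEX XPI_Inventory_Event_Usage_Ppi ( Inv_Gnbr )
PARTITION BY RANGE_N ( CAST(Inv_Gnbr AS BIGINT)
                       BETWEEN 1 AND 999999999999999 EACH 1000000000000 ,
                       NO RANGE )
UNIQUE INDEX ( Event_Gnbr ,Inv_Gnbr ,Usage_Parm_Code );

Because a nightly batch of 1-3m stage rows has Inv_Gnbr values scattered across essentially the whole range, the joined UPDATE still touches every partition, so we never get any elimination benefit from a layout like this.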
From a conceptual point of view, what would be a good approach for pushing millions of updates/deletes to a 700m row table like this?
The incoming stage and target tables have the same structure:
CREATE MULTISET TABLE deck_prod_tbls.inventory_Event_usage ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      Event_Gnbr DECIMAL(15,0) NOT NULL,
      Inv_Gnbr DECIMAL(15,0) NOT NULL,
      Usage_Parm_Code VARCHAR(8) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      ...(other cols)...
     )
PRIMARY INDEX XPI_Inventory_Event_Usage ( Inv_Gnbr )
UNIQUE INDEX ( Event_Gnbr ,Inv_Gnbr ,Usage_Parm_Code )
INDEX ( Source_Data_Samp_Ord ,Source_Data_Samp_Part_Ord ,Source_Event_Db_Id ,
        Source_Event_Id ,Source_Name );
The stage table with the 1-3m changed rows is STAGEDB.INV_EVENT_USAGE_CH; the 700m-row target table is TABLESDB.INVENTORY_EVENT_USAGE. We have the same issue with deletes (stage table STAGEDB.INV_EVENT_USAGE_DEL).
update t
from TABLESDB.INVENTORY_EVENT_USAGE t, STAGEDB.INV_EVENT_USAGE_CH c
set Incremental_Usage_Qty = c.Incremental_Usage_Qty,
    Rec_Status_Code = c.Rec_Status_Code,
    Source_Create_Dt = c.Source_Create_Dt,
    Source_Data_Samp_Ord = c.source_data_samp_ord,
    Source_Data_Samp_Part_Ord = c.source_data_samp_part_ord,
    Source_Data_Type_DB_ID = c.source_data_type_db_id,
    Source_Data_Type_ID = c.source_data_type_id,
    Source_Event_DB_ID = c.source_event_db_id,
    Source_Event_Id = c.source_event_id,
    Source_Event_Inv_Id = c.source_event_inv_id,
    Source_Inv_DB_ID = c.source_inv_db_id,
    Source_Inv_Id = c.source_inv_id,
    Source_Rev_DB_ID = c.source_rev_db_id,
    Source_Rev_Dt = c.source_rev_dt,
    Usage_Since_New_Qty = c.usage_since_new_qty,
    Usage_Since_Overhaul_Qty = c.usage_since_overhaul_qty,
    Source_Name = c.source_name,
    Last_Alter_Ts = c.last_alter_ts
where t.event_gnbr = c.event_gnbr
  and t.inv_gnbr = c.inv_gnbr
  and t.usage_parm_code = c.usage_parm_code;

--
-- Delete
--
delete TABLESDB.INVENTORY_EVENT_USAGE
where TABLESDB.INVENTORY_EVENT_USAGE.event_gnbr = STAGEDB.INV_EVENT_USAGE_DEL.event_gnbr
  and TABLESDB.INVENTORY_EVENT_USAGE.inv_gnbr = STAGEDB.INV_EVENT_USAGE_DEL.inv_gnbr
  and TABLESDB.INVENTORY_EVENT_USAGE.usage_parm_code = STAGEDB.INV_EVENT_USAGE_DEL.usage_parm_code;
Each of the two statements above runs 3+ hours when about 2m rows are being updated... insanely slow.
Thanks in advance,
Rich