We have one business requirment where they want to convert rows to columns. The data we are talking about is 2-3 TB of data. Data looks something in this format.
Table Structure
Date_1 Unit_Number Data_ID Data_Value
2013-01-02 00:00:00 100013 123 671
2013-01-02 00:00:00 100014 131 771
2013-01-02 00:00:00 100015 281 812
2013-01-02 00:00:00 100016 712 979
2013-01-02 00:00:00 100017 715 719
.
.
Pivoted table
Date_1 RY XY HJ KD IK GH HH KK TK RT ...
2013-01-02 00:00:00 671 771 812 979 719 979 719 980 799 79
2013-01-02 00:10:00 671 771 812 979 719 979 719 980 799 79
and so on
We are pivoting this data using query and creating view using query something like below
select a.date_1 date_1 a.unit_number system_number,
max(CASE WHEN a.data_id= 123 then a.DATA_Value END) RY,
max(CASE WHEN a.data_id= 281 then a.DATA_Value END) XY,
max(CASE WHEN a.data_id=712 then a.DATA_Value END) HJ,
max(CASE WHEN a.data_id=715 then a.DATA_Value END) KD,
max(CASE WHEN a.data_id=666 then a.DATA_Value END) IK,
max(CASE WHEN a.data_id=231 then a.DATA_Value END) GH,
max(CASE WHEN a.data_id=881 then a.DATA_Value END) HH,
max(CASE WHEN a.data_id=734 then a.DATA_Value END) KK,
max(CASE WHEN a.data_id=734 then a.DATA_Value END) TK,
max(CASE WHEN a.data_id=724 then a.DATA_Value END) TK,
.
.
.
from FROM table_name group by 1,2
There are about 40 such rows that we are trying to convert into columns. We created indexes, Primary, Secondary and referencial ones to tune this overall query. Also added paritions, We find this conversion works fine for smaller query, however for larger set of data we get high CPU and IO and also often runs into spool error. The data set we are runing on this query are about 3-4 TB. There is another table where we have about 100 rows which we need to convert into one row and again size goes to 4-5 TB
Questions I have
1) Is there better way to convert columns to Rows?
2) What options can be use to reduce CPU, IO and Spool error?
3) Does Hybrid Columanar in TD 14 help in any way to improve performance specially for larger data sets?
4) Any other appproach or design suggested here that will help us?
This design is pushed by our business due to flexibility this design offers and we are runing into performance and we are trying to make this work without impacting CPU, Spool and IO
Any suggestions would help us here, thanks for reading this
Forums: