UnifiedSQL
Unified SQL is a method I developed to transform the historized DBQL SQL Statements into a pattern and assigning a surrogate unique key named unifiedsqlid to that pattern. This key with the DBQL sql_id is uploaded to a reference table. Via the reference table iit is possible to compare and measure resource usage of this sql regardless of repetitions with different values which for ex. might be useful in finding deterioration of callcenter query performance. The reference is updated daily, so sql patterns can be tracked over time.
In the system I administer the DBQL daily data kept for 3 months with all details and around a year with loss of detail in history.
TheDBQL SQL gets normalized, byte wise identical SQLS are stored once with a sql_id and linked with the queryid.
Here is where the unifiedsql approach comes into play.
The unified SQL patterns are created by unloading the DBQL Sql history data to a flat file via fastexport. The single SQL’S are processed by a perl script, which replaces all occurences of constants in where ,order by, group by,in clauses with a % sign. This end up in an unique pattern for identical sqls which differ only in the constants replaced. By the way the SQL is hashed against reference data which is build up by the actual database and tablenames. Recognized tablenames are stored with the unifiedsqlid and uploaded to a reference table too. The procedure runs every night and new sql from dbql is matched to the patterns caught so far. In case of match the reference is written otherwise the new pattern is stored and linked.
Since pattern matching and replacement with perls regular expressions might be a hard job, not every item is yet recognized and incorrect replacements have been found and corrected. To come up with details I currently probe the detection of database objects in analyzing the objectaccess data and linking the hereby found tables to the unifiedsqlid for comparision. Multline SQLS , SQL exceeding 31000 bytes in lenght and therefore creating 2 entries, are treared as single ones. This might be improved in the future.
The unified executions are linked to the department and user “executing” that pattern, providing number of pattern execution, sum ampcputime, min and max logdate, number of detections. date of insert and update of this numbers.
For example the linkage between SQL and table makes it quiet easy to perform analytics for improving collect stats which is often performed without knowledge of what sql (pattern) might benefit most from proper stats.
Some figures.
Queries per month avg…………..: 86.402.221
Number of sql_ids (DBQL)……….: 103.682.663 (2011-07-17 – 2013-04-11)
Number of unified patterns…….: 1.966.010 ( 2011-12-08 . 2014-04-10)
Number of unified links to sql_id:.. 66.057.776
The unified structure in conjunction with the DBQL builds a more and more consistent analytical platform to measure the ressource usage on sql and helps identifying hot spots of table access and most important repetition with different assignments and value lists.
Guido Schasiepen
Technical Specialist Data Warehouse
Vodafone D2 Gmbh, Germany
Forums: