Does a query sampling a certain number of rows from a table, such as the following, really scan all rows? The information from Explain below seems to indicate that all rows are scanned.
Or, does teradata implement the scan in a more efficient manner skipping rows? I appreciate any information on this. The table is not large enough for me to understand the difference by running different queries which force a scan.
explain
select * from test_table_2_81 sample 10;
*** Help information returned. 13 rows.
*** Total elapsed time was 1 second.
Explanation
---------------------------------------------------------------------------
1) First, we lock a distinct TEST_SCHEMA."pseudo table" for
read on a RowHash to prevent global deadlock for
TEST_SCHEMA.test_table_2_81.
2) Next, we lock TEST_SCHEMA.test_table_2_81 for read.
3) We do an all-AMPs SAMPLING step from
TEST_SCHEMA.test_table_2_81 by way of an all-rows scan
with no residual conditions into Spool 1 (group_amps), which is
built locally on the AMPs. Samples are specified as a number of
rows.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.