**SQL Apprentice Question**

I've tried to construct an algorithm myself in vain for quite some time

now for the following problem. Could anyone help me out?

Given a table R with a primary key consisting of columns { K1, K2, ...,

Kp }, p >= 1, and two arbitrary rows a and b, I want to query all rows

"between" a and b. I mean, if I list all possible rows of R ordered by

the primary key, I want all rows between a and b (inclusive). And those

rows may be really arbitrary, the K1 value of a might be less than the

K2 value of b, while the K2 value of a might be greater than the K2

value of b, if you see what I mean. I want to express that query

without using any fancy SQL stuff.

The algorithm I'd like to find would give me the smallest set of

expression, such that the expression is on the form (using SQL syntax):

(K1 = v_1) and (K2 = v_2) and ... and (Kn = v_n) and (Km between v_m_a

and v_m_b)

where 0 <= n < p, m = n+1, and v_xxx are column values.

**Celko Answers**

Here are the rules for row comparisons in Standard SQL taken from SQL

FOR SMARTIES 3-rd edition.

09.02. Row Comparisons in SQL

Standard SQL generalized the theta operators so they would work on row

expressions and not just on scalars. This is not a popular feature

yet, but it is very handy for situations where a key is made from more

than one column, and so forth. This makes SQL more orthogonal and it

has an intuitive feel to it. Take three row constants:

A = (10, 20, 30, 40);

B = (10, NULL, 30, 40);

C = (10, NULL, 30, 100);

It seems reasonable to define a row comparison as valid only when the

data types of each corresponding column in the rows are

union-compatible. If not, the operation is an error and should report

a warning. It also seems reasonable to define the results of the

comparison to the AND-ed results of each corresponding column using the

same operator. That is, (A = B) becomes:

((10, 20, 30, 40) = (10, NULL, 30, 40));

becomes:

((10 = 10) AND (20 = NULL) AND (30 = 30) AND (40 = 40))

becomes:

(TRUE AND UNKNOWN AND TRUE AND TRUE);

becomes:

(UNKNOWN);

This seems to be reasonable and conforms to the idea that a NULL is a

missing value that we expect to resolve at a future date, so we cannot

draw a conclusion about this comparison just yet. Now consider the

comparison (A = C), which becomes:

((10, 20, 30, 40) = (10, NULL, 30, 100));

becomes:

((10 = 10) AND (20 = NULL) AND (30 = 30) AND (40 = 100));

becomes:

(TRUE AND UNKNOWN AND TRUE AND FALSE);

becomes:

(FALSE);

There is no way to pick a value for column 2 of row C such that the

UNKNOWN result will change to TRUE because the fourth column is always

FALSE. This leaves you with a situation that is not very intuitive.

The first case can resolve to TRUE or FALSE, but the second case can

only go to FALSE.

Standard SQL decided that the theta operators would work as shown in

the table below. The expression RX

RX compared to a row RY; likewise, RXi means the i-th column in the row

RX. The results are still TRUE, FALSE, or UNKNOWN, if there is no

error in type matching. The rules favor solid tests for TRUE or FALSE,

using UNKNOWN as a last resort.

The idea of these rules is that as you read the rows from left to

right, the values in one row are always greater than or less than)

those in the other row after some column. This is how it would work if

you were alphabetizing words.

The rules are

1. RX = RY is TRUE if and only if RXi = RYi for all i.

2. RX <> RY is TRUE if and only if RXi <> RYi for some i.

3. RX < RY is TRUE if and only if RXi = RYi for all i < n and

RXn < RYn for some n.

4. RX > RY is TRUE if and only if RXi = RYi for all i < n and

RXn > RYn for some n.

5. RX <= RY is TRUE if and only if Rx = Ry or Rx < Ry.

6. RX >= RY is TRUE if and only if Rx = Ry or Rx > Ry.

7. RX = RY is FALSE if and only if RX <> RY is TRUE.

8. RX <> RY is FALSE if and only if RX = RY is TRUE.

9. RX < RY is FALSE if and only if RX >= RY is TRUE.

10. RX > RY is FALSE if and only if RX <= RY is TRUE.

11. RX <= RY is FALSE if and only if RX > RY is TRUE.

12. RX >= RY is FALSE if and only if RX < RY is TRUE.

13. RX

neither TRUE nor FALSE.

The negations are defined so that the NOT operator will still have its

usual properties. Notice that a NULL in a row will give an UNKNOWN

result in a comparison. Consider this expression:

(a, b, c) < (x, y, z)

which becomes

((a < x)

OR ((a = x) AND (b < y))

OR ((a = x) AND (b = y) AND (c < z)))

The standard allows a single-row expression of any sort, including a

single-row subquery, on either side of a comparison. Likewise, the

BETWEEN predicate can use row expressions in any position in Standard

SQL.

## No comments:

Post a Comment