Joe Celko The SQL Apprentice: Which values do NOT appear within a tolerance value Options

SQL Apprentice Question
I have a table of servers that receive updates on a regular basis

CREATE TABLE [dbo].[tbl_ServerUpdate](
[ServerUpdateID] [int] NOT NULL,
[Server] [nvarchar](256) NOT NULL,
[UpdateStamp] [datetime] NOT NULL,
)

I then have tolerance values in another table which I can get like this
(they are in terms of minutes):

DECLARE @Interval INT

SELECT @Interval = IntervalValue
FROM tbl_Interval (NOLOCK)
WHERE IntervalID = 1

I want to know which servers have NOT received an update within the
tolerance value. For example, give me the set of Server that have not
received an update in the last 5 (@Interval) minutes.

Celko Answers
What is an update_id? What would it mean in a logical data model?
Surely, you did NOT just physically number the rows in a table!

This table also had no key, so I made an assumption that you want to
use the (originally very long AND possibly in Chinese!) server names.
But isn't the update interval logically an attribute of each server?
Shouldn't it be in the Servers table? I cannot see an interval
floating around as an entity in itself.

CREATE TABLE Servers
(server_name VARCHAR(25) NOT NULL PRIMARY KEY,
update_stamp DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
update_interval INTEGER NOT NULL
CHECK(update_interval > 0),
etc.
);

If you need this a lot, put it in a VIEW.

CREATE VIEW ExpiredServers (server_name)
AS
SELECT server_name
FROM Servers
WHERE update_stamp
< DATEADD(MI, -update_interval, CURRENT_TIMESTAMP);

SQL Apprentice Question
In the real world, everything is not a beautifully static picture of
relational data. I didn't include the key because I don't think its relevant
to the solution. But since you brought it up: is it better to have a natural
key on a varchar or an identity key as an integer that doesn't really have
any intrinsic business meaning

? In theory, it should be the varchar, but in practice its more performant
to index an int rather than a varchar(25). Is that not true?

The interval is not an attribute of the server, but rather an attribute of
the update type. In my case, there are seven different minute intervals. So,
in the end, I need to know - for each update type - which server did not
receive it within the last [interval] (in terms of minutes). The interval is
configurable by the operations group - sometimes they may want to be alerted
when a server didn't receive an update in the last 3 minutes, sometimes in
the last 10 minutes. They can change this real-time in the database.

I am not a TSQL guru - thats why I use this newsgroup from time to time. I
also have every one of your books on my shelf. However, the corporate world
is NOT academic and there is a tradeoff between having an understandable
data model that is easy to work with and having a data model that can serve
as a model in CS405 (or whatever). Sometimes we have to denormalize or
flatten data to be able to run reports in an acceptable time; sometimes we
have to add bit columns that could easily be derived from another to improve
query performance.

The balance you give is valuable, so I hope you continue to berate us when
our designs violate, but I'll bet - despite all the companies you have
worked with - you can't name one that had a data model that gave you a
boner.

Celko Answers
>> is it better to have a natural key on a VARCHAR(n) or an IDENTITY key [SIC] as an integer that doesn't really have any intrinsic business meaning? <<

How important is data integrity? Why not use a pointer chain DB
instead of mimicking it in SQL? At least they have features to do
garbage collection, restore chains, etc. Eventually, denormalization
and short-cuts will come back and bite you.

>> In theory, it should be the VARCHAR, but in practice its more performant to index an INTEGER rather than a VARCHAR(25). Is that not true? <<

It depends on the product. You automatically assumed indexing; look
at hashing in Teradata. Longer strings for keys lead to easy perfect
hashing, which is always one probe as opposed to multiple probes with
an index when the database gets large. If you are using hidden
pointer chains like SQL Anywhere to implement PK-FK, there is no
difference. SQL Server does have a lot of "code museum" problems and
this is one of them; one type of simple B+ Tree index is used for all
data types and distributions. However, until you get to large DBs, it
works fine and it will get a shot in the arm from 64-bit hardware,
too.

>> The interval is not an attribute of the server, but rather an attribute of the update type. In my case, there are seven different minute intervals. So, in the end, I need to know - for each update type - which server did not receive it within the last [interval] (in terms of minutes). The interval is configurable by the operations group - sometimes they may want to be alerted when a server didn't receive an update in the last 3 minutes, sometimes in the last 10 minutes. They can change this real-time in the database. <<

A type is an attribute by definition, so what entity does the
update_type belong to, if not a server? I would think from this
description, you would have seven columns for the logically different
updates, and seven columns for their corresponding intervals. Or is
this a repeating group where not all update types apply to all servers
and the update types can be changed?

>> I also have every one of your books on my shelf. <<

Neat! Number 7 is out in 2008 February.

>> .. I'll bet - despite all the companies you have worked with - you can't name one that had a data model that gave you a boner. <<

LOL! Remember what I do for a living -- this is like asking a doctor
why all his patients are sick!

The best one I can remember just off hand was for a credit management
company. They had just designed it and wanted a two day review from
me. The only real problem I found was that a bunch of the columns
were stubbed in with a magical CHAR(1) NOT NULL data type; they were
still working on the encodings and waiting for advise from legal.

Another one was a software company with a portal product that manages
corporate software access. We replaced ~60 tables with a nested set
model that lead to ~6 tables for the core processes. In fairness, the
original model had grown over time from a denormalized model on a
small platform to a mainframe tool. People kept adding tables to it
as a work-around and it had become a jungle.

Original source

Joe Celko The SQL Apprentice

Thursday, December 27, 2007

Which values do NOT appear within a tolerance value Options

No comments:

Blog Archive

Labels