SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Thursday, August 31, 2006

Indexing question

SQL Apprentice Question
I have a database table that stores the history of a data readings
taken from hardware devices. The hardware device is queried by software
once a minute (or more) and the value stored in the database for
trending an analysis.

The table structure is:
CREATE TABLE DeviceData (
[DataID] [bigint] IDENTITY (1, 1) NOT NULL ,
[DeviceID] [int] NOT NULL ,
[DataValue] [decimal](18, 4) NOT NULL ,
[DataTimestamp] [datetime] NOT NULL
)


After running for a few weeks, the number of rows in the table is
getting large as expected. I started to notice the queries against this
table are taken longer to run. There is currently a primary key
(DataID) and an index on DeviceID. I later realized that since most of
the queries search for a particular date range, an index on
DataTimestamp would definitely help.


Once I added the index, query times for newly added data greatly
improved. The older data, however, still takes longer then I would
like. My question is, once I added the index, is only new data that
gets added to the table indexed? Is there any way to optimize the
queries for the older data.


Being a software developer, not a DBA, any other adivce is greatly
appeciated.


Celko Answers
What have seen, but apparently do not realize is that your IDENTITY
column is redundant and not a key at all. It is an attribute of the
hardware and has nothing to do with the data model. The data_timestamp
is the natural key and should be so delcared:

CREATE TABLE DeviceData
(reading_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
NOT NULL PRIMARY KEY,
device_id INTEGER NOT NULL
REFERENCES Devices(device_id),
device_reading DECIMAL (18,4) NOT NULL );


Names like :"data_value" are vague; isn't what you really have the
device readings in that column?



>> Once I added the index, query times for newly added data greatly improved. The older data, however, still takes longer then I would like. My question is, once I added the index, is only new data that gets added to the table indexed? Is there any way to optimize the queries for the older data.<<


If all the data are in the same table, they all are indexed. First
check the data to be sure that you really have unique timestamps. Drop
the redundant, exposed locator IDENTITY column. Drop your indexes.
Add a primary key constraint.

I think that you have the wrong model of the world. In SQL, there are
DRI constraints which you **must have** if you want to have data
integrity. In SQL Server the uniqueness constraints are enforced by
indexes. Other products do it in other ways (hashing, bit vectors,
etc.) These are called primary indexes. But implement a LOGICAL
concept -- a relational key.


To improve performance, you can add optional indexes (secondary
indexes) to a table. This is vendor dependent and not part of the SQL
Standard at all.


You are still thinking about a file system which does not have the
concept of relational keys and whose indexes are all of the same kind.
In a sequentail file, you locate records (which are not rows) by a
physical position number (which newbies fake with IDENTITY).


It takes at least a year of full time SQL programming, a lot of reading
and a good teacher to get the hang of it. Could be worse; could be APL
or LISP :)

No comments: