SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Wednesday, August 30, 2006

Filtering a query by date threshold

SQL Apprentice Question
The locations of vehicles are received and stored in a database as a lat/lon
value, and accompanied by the datetime timestamp of when the position was
taken. Because of the technology, sometimes vehicles will submit their
position multiple times within a minute, sometimes they are unable to report
(because of visibility) for a few minutes.

I need to write a query that only contains records of vehicle location that
are more than a minute older than the previous record. To clarify, heres
the DDL for my example:


CREATE TABLE [Locations] (
[location_id] [numeric](18, 0) IDENTITY (1, 1) NOT NULL ,
[vehicle_id] [int] NOT NULL ,
[timestamp] [datetime] NOT NULL ,
[latitude] [numeric](12, 9) NOT NULL ,
[longitude] [numeric](12, 9) NOT NULL ,
CONSTRAINT [PK_Locations] PRIMARY KEY CLUSTERED
(
[location_id]
) ON [PRIMARY]
) ON [PRIMARY]
GO


insert into locations(vehicle_id, timestamp, latitude, longitude)
values(1, '1/1/2006 11:21:00', 1.1, 2.3)


insert into locations(vehicle_id, timestamp, latitude, longitude)
values(1, '1/1/2006 11:21:15', 1.1, 2.3)


insert into locations(vehicle_id, timestamp, latitude, longitude)
values(1, '1/1/2006 11:21:19', 1.1, 2.3)


insert into locations(vehicle_id, timestamp, latitude, longitude)
values(1, '1/1/2006 11:24:00', 1.1, 2.3)


insert into locations(vehicle_id, timestamp, latitude, longitude)
values(1, '1/1/2006 11:24:49', 1.1, 2.3)


insert into locations(vehicle_id, timestamp, latitude, longitude)
values(1, '1/1/2006 11:27:11', 1.1, 2.3)
go


My intended query for this sample data would only return the records 1, 4
and 6, because the others would be within a minute of the previous record
used. Is this possible within a query or will I need to make use of a
stored procedure for such filtering?



Celko Answers
Time is best modeled as durations, not chronons. IDENTITY cannot ever
be a relational key by definition. TIMESTAMP is a reserved word in SQL
as well as too vague. This is one of the few times I would use FLOAT
over NUMERIC(s,p) because the trig libraries are all in floating point.
I hope the vehicle id is really the VIN, so you can verify and
validate it that will be CHAR(17) with a fancy constraint.


>> I need to write a query that only contains records [sic] of vehicle location that are more than a minute older than the previous record [sic]. <<


Rows are not records and when you use the wrong mental model, you are
going to have problems. The column pairs (arrive_time, depart_time) and
(latitude, longitude) are atomic, but not scalar -- that, they make
sense only as pairs. Some products would let you create or use such
built-in data types; we have to fake it in SQL server.

You are mimicing a log in a procedural system, not the fact you want to
capture. Try this schema, with a more accurate table name:


CREATE TABLE LocationHistory
(vehicle_id INTEGER NOT NULL, -- the VIN, I hope
arrive_time DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
depart_time DATETIME,
CHECK (arrive_time <= DATEADD (mm, -5, depart_time)),
vehicle_latitude FLOAT NOT NULL,
vehicle_longitude FLOAT NOT NULL,
PRIMARY KEY (arrive_time, depart_time, latitude, longitude)
);


Now you need a procedure that will close out the prior vehicle location
(i.e. the row with the (depart_time IS NULL; use a VIEW to display
these rows as "last known location") and create a new row for the now
current vehicle location. A simply UPDATE and INSERT in one
transaction -- no fancy self-joins at all.


A constraint simply prevents you from storing data that you did not
want to have anyway. Think in non-procedural terms, not in
step-by-step "capture data, filter data" procedures.


Do not whine about the looooong primary key; without it, you would have
no data integrity at all. If it does not have to be right, the answer
is always 42 :)

No comments: