SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Friday, October 06, 2006

Cursor Performance

SQL Apprentice Question
Cursor performance for a real world application.

I have a table that holds records for a stops along a bus route. Each
records has a number of people getting on, number of people getting
off, a spot check, and a current load column.
The spot check column is the to verify that the sensors on the bus are
working correctly; the most reliable number to be use in the is
SPOT_CHECK.


Table Structure as follows:


ID (identity column)
ROUTE (Description of the bus route)
ONS (# of people getting on)
OFFS (# of people getting off)
SPOT_CHECK (visual spot count of people on the bus)
LOAD (Calculated load on the bus)


ID ROUTE ONS OFFS SPOT_CHECK LOAD
1 AAAA 5 0 null
2 AAAA 0 0 null
3 AAAA 2 1 null
4 AAAA 0 2 5
5 AAAA 6 3 8
6 AAAA 0 5 null
7 AAAA 1 2 null
8 AAAA 0 1 null
9 AAAA 0 1 null
10 AAAA 0 0 null


Now, I want to calculate the load at each stop along the way.


Load = Previous stops load + current stop ONS - Current stop's OFFS if
SPOT_CHECK is null, otherwise LOAD = SPOT_CHECK


So the results of the above table will be as follows:


ID ROUTE ONS OFFS SPOT_CHECK LOAD
1 AAAA 5 0 null 5
2 AAAA 0 0 null 5
3 AAAA 2 1 null 6
4 AAAA 0 2 5 5
5 AAAA 6 3 10 10
6 AAAA 0 5 null 5
7 AAAA 1 2 4 4
8 AAAA 0 1 null 3
9 AAAA 0 1 2 2
10 AAAA 0 2 null 0


Current programming, we using a cursor and seeing much much slower
performance than in our Oracle app. Does anyone see a way of doing
this load calculation without using a cursor and improving performance?


I can't do simple sums of ons and offs to calculate load, because of
the SPOT CHECK column.


Celko Answers
>> I have a table that holds records [sic] for a stops along a bus route. <<


Please post DDL and do not confuse records and rows.


>> records [sic] has a number of people getting on, number of people getting off, a spot check, and a current load column. <<


The current load is a computed column and we do not store such
computations in a good schema design.


>> The spot check column is the to verify that the sensors on the bus are working correctly; the most reliable number to be use in the is spot_check<<


Why did you use a vague, magical, non-relational, universal "id" column
instead of a relational key? The stops are a known attribute of a
route, but you left them out of the data model in spite of talking
about them.

CREATE TABLE BusTraffic
(route_nbr INTEGER NOT NULL ,
stop_nbr INTEGER NOT NULL,
bus_nbr INTEGER NOT NULL
REFERENCES Buses(bus_nbr),
PRIMARY KEY (route_nbr, stop_nbr),
boarding_cnt INTEGER DEFAULT 0 NOT NULL,
disembark_cnt INTEGER DEFAULT 0 NOT NULL,
spotcheck_cnt INTEGER -- null means not done
);


other stuff not shown might be ..


CREATE TABLE Buses
(bus_nbr INTEGER NOT NULL PRIMARY KEY,
bus_capacity INTEGER NOT NULL,
etc.);



>> Now, I want to calculate the load at each stop along the way... I can't do simple sums of ons and offs to calculate load, because of the SPOT CHECK column. <<


Probably something like this

SELECT T2.route_nbr, T2.stop_nbr,
SUM (CASE WHEN spotcheck_cnt IS NULL
THEN (boarding_cnt - disembark_cnt)
ELSE spotcheck_cnt END) AS bus_load,
(CASE WHEN spotcheck_cnt IS NULL
THEN 'spot checked'
ELSE 'computed' END) AS verification
FROM BusTraffic AS T1, BusTraffic AS T2
WHERE T1.route_nbr = T2.route_nbr
AND T1.stop_nbr <= T2.stop_nbr;


If you can use the OLAP SUM() OVER() functions in SQL, this will be
easier and faster than a self-join.

No comments: