SQL Apprentice Question
Cursor performance for a real world application.
I have a table that holds records for a stops along a bus route. Each
records has a number of people getting on, number of people getting
off, a spot check, and a current load column.
The spot check column is the to verify that the sensors on the bus are
working correctly; the most reliable number to be use in the is
SPOT_CHECK.
Table Structure as follows:
ID (identity column)
ROUTE (Description of the bus route)
ONS (# of people getting on)
OFFS (# of people getting off)
SPOT_CHECK (visual spot count of people on the bus)
LOAD (Calculated load on the bus)
ID ROUTE ONS OFFS SPOT_CHECK LOAD
1 AAAA 5 0 null
2 AAAA 0 0 null
3 AAAA 2 1 null
4 AAAA 0 2 5
5 AAAA 6 3 8
6 AAAA 0 5 null
7 AAAA 1 2 null
8 AAAA 0 1 null
9 AAAA 0 1 null
10 AAAA 0 0 null
Now, I want to calculate the load at each stop along the way.
Load = Previous stops load + current stop ONS - Current stop's OFFS if
SPOT_CHECK is null, otherwise LOAD = SPOT_CHECK
So the results of the above table will be as follows:
ID ROUTE ONS OFFS SPOT_CHECK LOAD
1 AAAA 5 0 null 5
2 AAAA 0 0 null 5
3 AAAA 2 1 null 6
4 AAAA 0 2 5 5
5 AAAA 6 3 10 10
6 AAAA 0 5 null 5
7 AAAA 1 2 4 4
8 AAAA 0 1 null 3
9 AAAA 0 1 2 2
10 AAAA 0 2 null 0
Current programming, we using a cursor and seeing much much slower
performance than in our Oracle app. Does anyone see a way of doing
this load calculation without using a cursor and improving performance?
I can't do simple sums of ons and offs to calculate load, because of
the SPOT CHECK column.
Celko Answers
>> I have a table that holds records [sic] for a stops along a bus route. <<
Please post DDL and do not confuse records and rows.
>> records [sic] has a number of people getting on, number of people getting off, a spot check, and a current load column. <<
The current load is a computed column and we do not store such
computations in a good schema design.
>> The spot check column is the to verify that the sensors on the bus are working correctly; the most reliable number to be use in the is spot_check<<
Why did you use a vague, magical, non-relational, universal "id" column
instead of a relational key? The stops are a known attribute of a
route, but you left them out of the data model in spite of talking
about them.
CREATE TABLE BusTraffic
(route_nbr INTEGER NOT NULL ,
stop_nbr INTEGER NOT NULL,
bus_nbr INTEGER NOT NULL
REFERENCES Buses(bus_nbr),
PRIMARY KEY (route_nbr, stop_nbr),
boarding_cnt INTEGER DEFAULT 0 NOT NULL,
disembark_cnt INTEGER DEFAULT 0 NOT NULL,
spotcheck_cnt INTEGER -- null means not done
);
other stuff not shown might be ..
CREATE TABLE Buses
(bus_nbr INTEGER NOT NULL PRIMARY KEY,
bus_capacity INTEGER NOT NULL,
etc.);
>> Now, I want to calculate the load at each stop along the way... I can't do simple sums of ons and offs to calculate load, because of the SPOT CHECK column. <<
Probably something like this
SELECT T2.route_nbr, T2.stop_nbr,
SUM (CASE WHEN spotcheck_cnt IS NULL
THEN (boarding_cnt - disembark_cnt)
ELSE spotcheck_cnt END) AS bus_load,
(CASE WHEN spotcheck_cnt IS NULL
THEN 'spot checked'
ELSE 'computed' END) AS verification
FROM BusTraffic AS T1, BusTraffic AS T2
WHERE T1.route_nbr = T2.route_nbr
AND T1.stop_nbr <= T2.stop_nbr;
If you can use the OLAP SUM() OVER() functions in SQL, this will be
easier and faster than a self-join.
Friday, October 06, 2006
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment