SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Monday, July 31, 2006

What data type should I use, Varchar or Decimal/Numeric

SQL Apprentice Question
I have a table which has the following fields:
ID (Int, IDENTITY)
SerialNumber (?)
OrderID (Integer)


The SerialNumber is a 15 digit number, and each Order may contain
10,000+ rows. The table will be populated through ADO.Net, the data
being appended from Excel spreadsheets that the user will upload.


Will I need need to do is produce reports highlighting which Serial
Numbers are duplicates that have appeared in previous orders, and also
allow the user to remove just those serial numbers from the current
OrderID. I think(!) I know how to do this bit in SQL, it's become a bit
rusty...


What I am not sure about is what the data type should be for Serial
Number, in terms of both size and speed. The number will always be 15
digits long, so should I use use. I've put down what the sizes of field
would be, correct me if I am wrong, I've never been good at figuring
these out!


Bigint = 8bytes
Char(15) = 15bytes
Varchar(15) = 30bytes
Numeric = ?
Decimal = ?


I am guessing that Bigint is the way to go, since it uses up the least
amount of space, but will it be the quickest for my comparison needs.
Bear in mind that I am expecting a few million records in this table.


But maybe I am hoping for too much and querying against this amount of
data will be extremely slow?


Your thoughts are greatly appreaciated.



Celko Answers
>> I have a table which has the following fields [sic]:


ID (Int, IDENTITY)
SerialNumber (?)
OrderID (Integer) <<

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.



>From just what you posted, it looks like you need to drop that silly


IDENTITY pseudo-column and that (order_id, serial_nbr) is the natural
key.

That is one of the constraints you can have on a column, which does not
apply to a field.



>> The SerialNumber is a 15 digit number, and each Order may contain 10,000+ rows. <<


Describe this serial number; check digits or validation rules? how do
you verify it? First design the encoding, then pick the data type. As
a practical observation, long identification numbers tend to be built
in sub-fields by a set of verifiable rules, so CHAR(15) might be a good
choice. For example, the 17 character VIN on automobiles. But without
any more specs, we are just guessing.

The one thing we are sure about is that you would never just rattle off
a simple sequence and accept that kind of data error rate.



>> Will I need need to do is produce reports highlighting which SerialNumbers are duplicates that have appeared in previous orders, and also allow the user to remove just those serial numbers from the current OrderID. <<


If your DDL was declared properly, then you would never have redundant
duplicates in the schema.

No comments: