SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Friday, October 06, 2006

scramble ssn with sql server

SQL Apprentice Question
I'm trying to scramble the ssn#s within our database. I would need a 9
digit number to be converted into another 9 digit number in our dev
database.


Example #1:
ssn: 123456789 converts to 987654321


Also there is a catch, there is a possibility that there could be
duplicate ssn within a table due to bad data. I was the 2 records with
the same actual ssn# to be converted into the same scrambled ssn# using
sql server (so that the scrambled ssn#s match) for this issue.


Is there a way to do this?



Celko Answers
>> create table SSN_MASK


(EW_SSN integer identity primary key not null,
OLD_SSN varchar(9)); -- not sure how you defined it <<

SSN is always CHAR(9). Simply numbering it with a proprietary feature
is not that good; you have destroyed the data type. Here is one we
used in procedural code with arrays.


CREATE TABLE SSN_masks
(shift_id INTEGER NOT NULL PRIMARY KEY -- cols 8 & 9
CHECK (shift_id BETWEEN 00 AND 99),
col1 INTEGER NOT NULL,
col2 INTEGER NOT NULL,
col3 INTEGER NOT NULL,
col4 INTEGER NOT NULL,
col5 INTEGER NOT NULL,
col6 INTEGER NOT NULL,
col7 INTEGER NOT NULL);


You take the last two digits of the SSN (fastest changing values) and
look up a vector that tells you how to shift the remaining seven
digits.


shift(x) = ABS((x + col_n) % 11-10)


Since you have 100 different masks, the data gets scrambled pretty good
and it destroys the area digits which would otherwise repeat and give
information about the population geographic distribution. This is also
reversible because we preserve the shift identifier in the output;
shift all of them and it is a bitch to unscramble without the shift
array. Then you change the array when you get the next sample.

No comments: