|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
looking for equivalent to alphanumeric identity columni'm looking for a solution that would generate the equivalent of a case-inspecific, alphanumeric identity column. which is to say a column that iterates through guaranteed unique values, but using case-inspecific alphanumeric characters instead of just an integer value. or, another way to think of it would be a base 36 number, perhaps. i have written a method that will turn turn a 32 bit integer value into the appropriate 6-character string, shaving off some of the higher order bits which are excess. so i was thinking i could just keep a table that stores the "last used integer" and when a new alphanumeric identity is needed, get the last used integer value, increment it, convert it to a string for use as the alphanumeric, then update the table with the new integer value. would that be transactionally sound, though? for example, could someone come in after another person read the value, but before the the table was updated with the incremented value, and end up with duplicate alphanumerics? any other solutions also welcome. NOTE: one possibly complicating factor is that while most of the alphanumeric strings can just be the alphanumeric version of the integer, one of the strings actually has to be a bit-scrambled version of the integer. i've already written the routine to do the bit-scrambling as well, but that's something to consider when making suggestions. if there's a better way to a random or semi-random looking sequence of strings out of what is originally an incrementing integer value, feel free to suggest that as well! thanks for any help, jason >> if there's a better way to a random or semi-random looking sequence of strings out of what is originally an incrementing integer value, feel free to suggest that as well! << Here is an implementation of the additive congruential method ofgenerating values in pseudo-random order and is due to Roy Hann of Rational Commerce Limited, a CA-Ingres consulting firm. It is based on a shift-register and an XOR-gate, and it has its origins in cryptography. While there are other ways to do this, this code is nice because: 1) The algorithm can be written in C or another low level language for speed. But math is fairly simple even in base ten. 2) The algorithm tends to generate successive values that are (usually) "far apart", which is handy for improving the performance of tree indexes. You will tend to put data on separate physical data pages in storage. 3) The algorithm does not cycle until it has generated every possible value, so we don't have to worry about duplicates. Just count how many calls have been made to the generator. 4) The algorithm produces uniformly distributed values, which is a nice mathematical property to have. It also does not include zero. Generalizing the algorithm to arbitrary binary word sizes, and therefore longer number sequences, is not as easy as you might think. Finding the "tap" positions where bits are extracted for feedback varies according to the word-size in an extremely non-obvious way. Choosing incorrect tap positions results in an incomplete and usually very short cycle, which is unusable. If you want the details and tap positions for words of one to 100 bits, see E. J. Watson, "Primitive Polynomials (Mod 2)", Mathematics of Computation, v.16, 1962, p.368-369. Here is code for a 31-bit integer, which you can use: UPDATE Generator31 SET keyval = keyval/2 + MOD(MOD(keyval, 2) + MOD(keyval/8, 2), 2)*2^30; Or if you prefer, the algorithm in C: int Generator31 () {static int n = 1; n = n >> 1 | ((n^n >> 3) & 1) << 30; return n; } wow, this looks perfect. much better than the NOT and bit array
shuffling approach i have been tinkering with so far. thanks so much for all the information! a question about the transactional timing of the SQL version of this algorithm: what is the most reliable way to get the value of keyval in such a way as to ensure that the procedure calling the update statement is the only procedure that receives the corresponding keyval value. sort of like a classic semaphore, read-and-set? will putting the update and select statements into a transaction accomplish this singularity of selected keyval values per update? thanks again, jason hmm, having trouble finding the E. J. Watson article you mention at my
local college. the only reference i've found to it is in a mathsci journal database online, which costs a few thousand per year to subscribe to :) i assume this article would give me the key "tap" points with which to modify the algorithm to apply to bit patterns of different size (such as 32 instead of 31)? 1) Do updates and the audit stuff with a SERIALIZABLE isolation level
for the procedure. And look at tabel lockign options. 2) Look up "Roy Hann" and "Ingres" for an article on this. You do not want to do 32 bits -- it is messy and causes an overflaow on a 32 bit machine. We are talking about over 2 billion numbers already; are you McDonald's? thanks for the feedback.
re: 2) hehe, well, i was hoping for a 6 character alphanumeric string. that's inherently 36^6 possibilities, which is just a tiny bit more than what can fit in a 31 bit pattern. though i suppose i wouldn't have to drop a whole character from the string. i could simply stick to whatever 6 string values are available in the 31 bit pattern. ideally, i could find a way to incorporate the restriction i'm imposing on the possible string values, which is no more than 2 consecutive characters (to avoid accidental curse words and such). that significantly reduces the number of combinations to well within the 31 bit range, but i'm not sure how i could collapse that reduced range into the bit pattern. it seems like i could only iterate through the bit patterns with this algorithm, test if it is a "valid" bit pattern with regard to the character restriction, and if so, use it, if not, iterate again. to to somehow iterate through the bit pattern, and have every number correspond to a known valid string, would involve coming up with some kind of lookup-table where the valid values have been collapsed, and are available by some kind of iterable index? that seems like a lot of work, and all it would do is: (1) let me reduce the size of the bit pattern from 31 to 30 bits, and (2) allow me to get ALL of the possible values in the range, instead of those within the 31 bit pattern range. i doubt it's worth it. |
|||||||||||||||||||||||