|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Finding Correct Primary KeyI could not reach a conclusion on this. Maybe someone helps me to understand better. I have a consolidated text extract from other systems and want to upload the data to an SQL table. Below is the fields in extract. OrderNo LineNumber SellingCompany PurcCtry FinalizeDt ItemNo CustomerID SalesRep Quantity Sales_USD At first analysis, I thought OrderNo and LineNumber are the PK of the table and designed the table. When I receive the data I realized that since the data is coming from different sources. Some of them has duplicate Orderno LineNumber combination. Now my data has two different of type of data one has unique key of Orderno LineNumber combination. On the other hand second part of data needs OrderNo LineNumber ,FinalizeDt,ItemNo and CustomerID for a unique key. Question is : Should I keep these data in two different table, or should I keep them in a single table by adding more columns on PK ? What is the affect of defining larger PK even it is not needed for some of the data? My table has data has 1.5 million record and increasing 75000 each month. 70 % of data has Unique Orderno Line Number Data is used by sql server reporting services, I have also some plans to use analysis server in the future. Thanks in advance. erdal, You can use up to 16 columns for a composite primary key, however, make sure
you only use as many as you actually need, based on your data. If you expect frequent and large increases in table size, make the primary key nonclustered, and find a more appropriate candidate for the clustered index. An ideal candidate for this kind of situation is a column with non-nullable unique values, that increase for each new inserted row (e.g. an identity column). Using several tables is an option, yet it may complicate querying for the reporting purposes. ML > Question is : Should I keep these data in two different table, or should I It seems to me that the data from the 2 sources represents different > keep them in a single table by adding more columns on PK ? entities so you should store these in different tables. > On the other hand second part of data needs OrderNo You need to do some analysis to determine the appropriate PK. What does the > LineNumber ,FinalizeDt,ItemNo and CustomerID for a unique key. data from this data source actually represent (e.g. order details history or different order entry systems)? Simply adding columns until you come up with a unique combination isn't the right approach. Data may be unique based on your current data sample but not necessarily in the future. Perhaps the data is bad and you are only masking an underlying problem. Maybe some data scrubbing is needed before import. These are all issues that need to be addressed in order to make intelligent schema design decisions. -- Show quoteHope this helps. Dan Guzman SQL Server MVP "Erdal Akbulut" <erdali***@yahoo.com> wrote in message news:ukME7Fv1FHA.2348@TK2MSFTNGP15.phx.gbl... > Hello, > > I could not reach a conclusion on this. Maybe someone helps me to > understand > better. > > I have a consolidated text extract from other systems and want to upload > the data to an SQL table. > > Below is the fields in extract. > > OrderNo > LineNumber > SellingCompany > PurcCtry > FinalizeDt > ItemNo > CustomerID > SalesRep > Quantity > Sales_USD > > > At first analysis, I thought OrderNo and LineNumber are the PK of the > table > and designed the table. > > When I receive the data I realized that since the data is coming from > different sources. Some of them has duplicate Orderno LineNumber > combination. > > Now my data has two different of type of data one has unique key of > Orderno > LineNumber combination. On the other hand second part of data needs > OrderNo > LineNumber ,FinalizeDt,ItemNo and CustomerID for a unique key. > > Question is : Should I keep these data in two different table, or should I > keep them in a single table by adding more columns on PK ? > > What is the affect of defining larger PK even it is not needed for some of > the data? > > My table has data has 1.5 million record and increasing 75000 each month. > > 70 % of data has Unique Orderno Line Number > > Data is used by sql server reporting services, I have also some plans to > use > analysis server in the future. > > Thanks in advance. > > > > erdal, > > On Sat, 22 Oct 2005 13:15:06 +0300, Erdal Akbulut wrote:
Show quote >Hello, Hi Erdal,> >I could not reach a conclusion on this. Maybe someone helps me to understand >better. > >I have a consolidated text extract from other systems and want to upload >the data to an SQL table. > >Below is the fields in extract. > >OrderNo >LineNumber >SellingCompany >PurcCtry >FinalizeDt >ItemNo >CustomerID >SalesRep >Quantity >Sales_USD > > >At first analysis, I thought OrderNo and LineNumber are the PK of the table >and designed the table. > >When I receive the data I realized that since the data is coming from >different sources. Some of them has duplicate Orderno LineNumber >combination. (snip) >Question is : Should I keep these data in two different table, or should I >keep them in a single table by adding more columns on PK ? It seems to me that you're approaching this from the wrong way. You should start with how the data is supposed to be related in the business. Not knowing your business and making some wild assumptions about the meaning of some columns names, my first guess is that (OrderNo, LineNumber) might be a good candidate key for this table - but that's noway sure! If your business assigns one and only one salesrep toe ach customer, than SalesRep should be in the Customers table and can be removed from this table. Similar, if each customer makes all it's purchases in one country, then PurcCtry should be moved to the Customers table as well. There might be more similar dependencies between the columns that only someone with knowledge of the business model of your company can find. WRT the duplicated order numbers: if these sources each have their own system for assigning order numbers (as might be the case after the merger of two companies, or in a company with independent branches) and duplicates are to be expected, then you should add a column to indicate the source (not the data source, but the source of the order - e.g. the name or ID of the branch where the order comes from), and add that column to the primary key. OTOH, if the order numbers really should be unique in the complete collection of data, then you'll have to hunt down the reason why duplicates were entered in the first place, fix the problem, clean up the erronous data before proceeding with this consolidation project. (previously snipped text:) >Now my data has two different of type of data one has unique key of Orderno I'm not sure if I understand you right. Maybe you're just trying to find>LineNumber combination. On the other hand second part of data needs OrderNo >LineNumber ,FinalizeDt,ItemNo and CustomerID for a unique key. a set of columns that would technically (but not logically) qualify as primary key - in that case see above: either add a column or fix the real problem. But if you're saying that the business model logically implies that the data from one source has (OrderNo, LineNumber, FinalizeDt, ItemNo, CustomerID) as the "real" primary key, then a) You have a very unusual business model (but of course, mine is not to judge), and b) You can't combine the data from the two sources into one table, since they have a different logical structure - it would be like trying to combine ledger entries and personnel records into one table. >What is the affect of defining larger PK even it is not needed for some of You'll get corrupted data in your table. Would you consider defining the>the data? primary key on your Personnel table as (EmployeeID, EmpName)? I'd hope not - since it's not the question IF you'll end up with duplicated data, but WHEN you'll get it. And the answer is "sooner than you think". Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address) > When I receive the data I realized that since the data is coming from It sounds as if what you need is something like SourceID, OrderNo, > different sources. Some of them has duplicate Orderno LineNumber > combination. > LineNumber. If SourceID is not included in the extract files, then populate it during the transform/load step. .. Thanks for the answers,
Let me try to clarify the issue. we have 7 companies running on the same application, and 2 more using different systems. All applications run on a mainframe environment and the data is also consolidated in mainframe. For the first 7 companies. OrderNo and Orderline are the candidates. No problem at this part. For the other 2 companies, we do not have the data in orderline level. The data is summed by item, customer etc by month, and they have dummy ordenos as 999999 and line no as 001 Currently I have two tables. one for the first 7 companies having Orderno and Linenumber as PK. and another table for the other 2 companies having PK with multiple columns. I was thinking to merge the data in one table and define a larger PK having PK columns from first and second table. Seems it is not recommended in this case. regards, Show quote "Erdal Akbulut" <erdali***@yahoo.com> wrote in message news:ukME7Fv1FHA.2348@TK2MSFTNGP15.phx.gbl... > Hello, > > I could not reach a conclusion on this. Maybe someone helps me to understand > better. > > I have a consolidated text extract from other systems and want to upload > the data to an SQL table. > > Below is the fields in extract. > > OrderNo > LineNumber > SellingCompany > PurcCtry > FinalizeDt > ItemNo > CustomerID > SalesRep > Quantity > Sales_USD > > > At first analysis, I thought OrderNo and LineNumber are the PK of the table > and designed the table. > > When I receive the data I realized that since the data is coming from > different sources. Some of them has duplicate Orderno LineNumber > combination. > > Now my data has two different of type of data one has unique key of Orderno > LineNumber combination. On the other hand second part of data needs OrderNo > LineNumber ,FinalizeDt,ItemNo and CustomerID for a unique key. > > Question is : Should I keep these data in two different table, or should I > keep them in a single table by adding more columns on PK ? > > What is the affect of defining larger PK even it is not needed for some of > the data? > > My table has data has 1.5 million record and increasing 75000 each month. > > 70 % of data has Unique Orderno Line Number > > Data is used by sql server reporting services, I have also some plans to use > analysis server in the future. > > Thanks in advance. > > > > erdal, > > On Mon, 24 Oct 2005 18:30:15 +0300, Erdal Akbulut wrote:
Show quote >Thanks for the answers, Hi Erdal,> >Let me try to clarify the issue. > >we have 7 companies running on the same application, and 2 more using >different systems. All applications run on a mainframe environment and the >data is also consolidated in mainframe. > >For the first 7 companies. OrderNo and Orderline are the candidates. No >problem at this part. > >For the other 2 companies, we do not have the data in orderline level. The >data is summed by item, customer etc by month, and they have dummy ordenos >as 999999 and line no as 001 > > >Currently I have two tables. one for the first 7 companies having Orderno >and Linenumber as PK. and another table for the other 2 companies having PK >with multiple columns. > >I was thinking to merge the data in one table and define a larger PK having >PK columns from first and second table. > >Seems it is not recommended in this case. Seems to me that you have two options to chose from: A) Use two tables. One keyed on (OrderNo, LineNumber) to store the data from the first 7 companies, and the second one with a smaller amount of columns, and keyed on (Item, Customer, etc, Month) for the last 2 companies. Don't introducce fake order and line numbers in the last table, they'll only confuse people. B) Use a staging table to accept the data from the first 7 companies. Aggregate this data to the same level as the data you receive from the other 2 companies. Then combine the data from the latter 2 companies with the aggregated data from the first 7 companies into one table. Which of these solutions suits your problem best is for you to decide, since I don't know what you're planning to do with the data, once it's in the database. Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address) |
|||||||||||||||||||||||