|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Database design questionwould seem that (given my approach) being too normalized can also lead to delays when querying. I have a table that contains assignment data (an assignment in this case is like a case file). Through it's life time the assignment goes through a bunch of different states (Open, closed, pending Acceptance, etc). Occasionally users have to search for Assignments and filter the results based on how long an Assignment has been in a particular state - some states have date thresholds within the state category that further divide the state (i.e. Pending Acknowledgment - the thresholds would be: after 1 day: No Acknowledgment, 3 days: No Acknowledgment (3), 7 Days: No Acknowledgement - Stale, 20 days: No Acknowledgement - Return). CREATE TABLE [Assignment] ( [ID] [int] IDENTITY (1, 1) NOT NULL , [PrimaryStateID] [int] NOT NULL , [SecondaryStateID] [int] NOT NULL ) CREATE TABLE [HISTState] ( [ID] [int] IDENTITY (1, 1) NOT NULL , [TableRowID] [int] NOT NULL , [Date] [datetime] NOT NULL, [StateID] [int] NOT NULL , ) CREATE TABLE [AssignmentStates] ( [ID] [int] NOT NULL , [AssignmentState] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ) Assignment: Table that holds information about an assignment (approx 5000 records) HISTState: Assignment state history table (approx 40000 records) AssignmentStates: Table containing all the various possible assignment states. (approx 50 records) *** Please note I've removed some of the fields to reduce the size of this post. I need to get a list of Assignments but I have to filter by the length of time it has been with a given Assignment state so I use the following query: SELECT ASS.ID FROM Assignment AS ASS LEFT JOIN HISTState AS HSTATE ON (HSTATE.ID IN (SELECT TOP 1 [ID] FROM HISTState WHERE TableRowID=ASS.ID ORDER BY [Date] DESC)) WHERE HSTATE.StateID = AssignmentStatesGET_IDByName_UDF('Pending Acknowledgement') AND GETDATE() >= DATEADD(dw,@Threshold,HSTATE.[Date]) **** Please note that the query has other components however the bottleneck is this JOIN. Anyway, this query returns about 2500 records but the JOIN has to go through about 191 million records because of the manner in which I'm constucting the query - this is ovbiously slowing things down! So, my choices are: rework the query to something more efficient but I don't necessarily know how to go about that, or 'de-normalize' the database and store some of the state/date data on the Assignment table so I can quickly query it. I don't like the second idea because it is not clean and takes me away from a normalized database. Do I add a date field to Assignment table for each State (Primary, Secondary) to indicate when it was last put into that state, or should I keep those values locked in the HISTState table? I hope I have offered enough information to work with in this post. If not I will be happy to elaborate if necessary. Thanks, Frank Frank
SELECT ASS.ID FROM Assignment AS ASS LEFT JOIN HISTState AS HSTATE ON (HSTATE.ID IN (SELECT TOP 1 [ID] FROM HISTState WHERE TableRowID=ASS.ID ORDER BY [Date] DESC)) WHERE HSTATE.StateID = AssignmentStatesGET_IDByName_UDF('Pending Acknowledgement') AND GETDATE() >= DATEADD(dw,@Threshold,HSTATE.[Date]) The above query will failed due to incorrect syntax. ORDER BY clause must be a last statement in the query > I hope I have offered enough information to work with in this post. If Well, some more info about the indexes defime on the table as well as > not I will be happy to elaborate if necessary. primary--foreign keys and PLEASE post some sample data to be tested Show quoteHide quote "Frank" <mrpubni***@hotmail.com> wrote in message news:1157947373.078626.179850@p79g2000cwp.googlegroups.com... > I'm trying to design my database to be as normalized as possible but it > would seem that (given my approach) being too normalized can also lead > to delays when querying. > > I have a table that contains assignment data (an assignment in this > case is like a case file). Through it's life time the assignment goes > through a bunch of different states (Open, closed, pending Acceptance, > etc). Occasionally users have to search for Assignments and filter the > results based on how long an Assignment has been in a particular state > - some states have date thresholds within the state category that > further divide the state (i.e. Pending Acknowledgment - the thresholds > would be: after 1 day: No Acknowledgment, 3 days: No Acknowledgment > (3), 7 Days: No Acknowledgement - Stale, 20 days: No Acknowledgement - > Return). > > CREATE TABLE [Assignment] ( > [ID] [int] IDENTITY (1, 1) NOT NULL , > [PrimaryStateID] [int] NOT NULL , > [SecondaryStateID] [int] NOT NULL > ) > > CREATE TABLE [HISTState] ( > [ID] [int] IDENTITY (1, 1) NOT NULL , > [TableRowID] [int] NOT NULL , > [Date] [datetime] NOT NULL, > [StateID] [int] NOT NULL , > ) > > CREATE TABLE [AssignmentStates] ( > [ID] [int] NOT NULL , > [AssignmentState] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS > NOT NULL > ) > > Assignment: Table that holds information about an assignment (approx > 5000 records) > HISTState: Assignment state history table (approx 40000 records) > AssignmentStates: Table containing all the various possible assignment > states. (approx 50 records) > > *** Please note I've removed some of the fields to reduce the size of > this post. > > I need to get a list of Assignments but I have to filter by the length > of time it has been with a given Assignment state so I use the > following query: > > SELECT > ASS.ID > > FROM > Assignment AS ASS > LEFT JOIN HISTState AS HSTATE ON (HSTATE.ID IN (SELECT TOP 1 > [ID] FROM HISTState WHERE TableRowID=ASS.ID ORDER BY [Date] DESC)) > > WHERE > HSTATE.StateID = AssignmentStatesGET_IDByName_UDF('Pending > Acknowledgement') > AND GETDATE() >= DATEADD(dw,@Threshold,HSTATE.[Date]) > > **** Please note that the query has other components however the > bottleneck is this JOIN. > > Anyway, this query returns about 2500 records but the JOIN has to go > through about 191 million records because of the manner in which I'm > constucting the query - this is ovbiously slowing things down! > > So, my choices are: rework the query to something more efficient but I > don't necessarily know how to go about that, or 'de-normalize' the > database and store some of the state/date data on the Assignment table > so I can quickly query it. I don't like the second idea because it is > not clean and takes me away from a normalized database. > > Do I add a date field to Assignment table for each State (Primary, > Secondary) to indicate when it was last put into that state, or should > I keep those values locked in the HISTState table? > > I hope I have offered enough information to work with in this post. If > not I will be happy to elaborate if necessary. > > Thanks, > Frank > Uri,
Thanks for the comment but please verify the brackets in the JOIN. The ORDER BY clause is for the subquery in the JOIN, not the primary query. Thanks, Frank. Uri Dimant wrote: Show quoteHide quote > Frank > > SELECT > ASS.ID > > FROM > Assignment AS ASS > LEFT JOIN HISTState AS HSTATE ON (HSTATE.ID IN (SELECT TOP 1 > [ID] FROM HISTState WHERE TableRowID=ASS.ID ORDER BY [Date] DESC)) > > WHERE > HSTATE.StateID = AssignmentStatesGET_IDByName_UDF('Pending > Acknowledgement') > AND GETDATE() >= DATEADD(dw,@Threshold,HSTATE.[Date]) > > The above query will failed due to incorrect syntax. ORDER BY clause must > be a last statement in the query > > > I hope I have offered enough information to work with in this post. If > > not I will be happy to elaborate if necessary. > Well, some more info about the indexes defime on the table as well as > primary--foreign keys and PLEASE post some sample data to be tested > > > > "Frank" <mrpubni***@hotmail.com> wrote in message > news:1157947373.078626.179850@p79g2000cwp.googlegroups.com... > > I'm trying to design my database to be as normalized as possible but it > > would seem that (given my approach) being too normalized can also lead > > to delays when querying. > > > > I have a table that contains assignment data (an assignment in this > > case is like a case file). Through it's life time the assignment goes > > through a bunch of different states (Open, closed, pending Acceptance, > > etc). Occasionally users have to search for Assignments and filter the > > results based on how long an Assignment has been in a particular state > > - some states have date thresholds within the state category that > > further divide the state (i.e. Pending Acknowledgment - the thresholds > > would be: after 1 day: No Acknowledgment, 3 days: No Acknowledgment > > (3), 7 Days: No Acknowledgement - Stale, 20 days: No Acknowledgement - > > Return). > > > > CREATE TABLE [Assignment] ( > > [ID] [int] IDENTITY (1, 1) NOT NULL , > > [PrimaryStateID] [int] NOT NULL , > > [SecondaryStateID] [int] NOT NULL > > ) > > > > CREATE TABLE [HISTState] ( > > [ID] [int] IDENTITY (1, 1) NOT NULL , > > [TableRowID] [int] NOT NULL , > > [Date] [datetime] NOT NULL, > > [StateID] [int] NOT NULL , > > ) > > > > CREATE TABLE [AssignmentStates] ( > > [ID] [int] NOT NULL , > > [AssignmentState] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS > > NOT NULL > > ) > > > > Assignment: Table that holds information about an assignment (approx > > 5000 records) > > HISTState: Assignment state history table (approx 40000 records) > > AssignmentStates: Table containing all the various possible assignment > > states. (approx 50 records) > > > > *** Please note I've removed some of the fields to reduce the size of > > this post. > > > > I need to get a list of Assignments but I have to filter by the length > > of time it has been with a given Assignment state so I use the > > following query: > > > > SELECT > > ASS.ID > > > > FROM > > Assignment AS ASS > > LEFT JOIN HISTState AS HSTATE ON (HSTATE.ID IN (SELECT TOP 1 > > [ID] FROM HISTState WHERE TableRowID=ASS.ID ORDER BY [Date] DESC)) > > > > WHERE > > HSTATE.StateID = AssignmentStatesGET_IDByName_UDF('Pending > > Acknowledgement') > > AND GETDATE() >= DATEADD(dw,@Threshold,HSTATE.[Date]) > > > > **** Please note that the query has other components however the > > bottleneck is this JOIN. > > > > Anyway, this query returns about 2500 records but the JOIN has to go > > through about 191 million records because of the manner in which I'm > > constucting the query - this is ovbiously slowing things down! > > > > So, my choices are: rework the query to something more efficient but I > > don't necessarily know how to go about that, or 'de-normalize' the > > database and store some of the state/date data on the Assignment table > > so I can quickly query it. I don't like the second idea because it is > > not clean and takes me away from a normalized database. > > > > Do I add a date field to Assignment table for each State (Primary, > > Secondary) to indicate when it was last put into that state, or should > > I keep those values locked in the HISTState table? > > > > I hope I have offered enough information to work with in this post. If > > not I will be happy to elaborate if necessary. > > > > Thanks, > > Frank > > Frank,
Try to get rid of a UDF in your WHERE clause - that's typically a performance killer. Also consider using covering indexes or an indexed view. >> I'm trying to design my database to be as normalized as possible .. << Get rid of the IDENTITY columns and add relational keys to the tables.You are not even in 1NF yet. >> Through it's life time the assignment goes through a bunch of different states (Open, closed, pending Acceptance, etc). << Read the article Transition Constraints at:http://www.dbazine.com/ofinterest/oi-articles/celko35/view?searchterm=Celko%20transition You will need add a temporal interval column the state table that you can use with time stamps in the assignments table. You also need to read ISO-11179 or a book on data modeling; there is no such thing as a "<something>_state_id" -- the attribute is either a state or an identifer and cannot be both. Also, why did you use a singular name fo a table less it has only one row? Data element names like "table_row_id" have no place whatsoever in an RDBMS -- they are physical locators and not logic identifiers. Stop writing with UDFs. They are proprietary and usually not needed if you have a good schema design. SQL wants to work with data and declarations, not procedural code. Here is a skeleton: CREATE TABLE AssignmentHistory (assignment_id CHAR(10) NOT NULL, --industry standard or carefully designed assignment_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL PRIMARY KEY (assignment_id, assignment_date), prior_state INTEGER NOT NULL, current_state INTEGER NOT NULL, FOREIGN KEY(prior_state, current_state) REFERENCES Transitions(prior_state, current_state), etc.); --This is a a state machine table from freshman comp sci! CREATE TABLE Transitions (transition_name CHAR(10) NOT NULL PRIMARY KEY, prior_state INTEGER NOT NULL, current_state INTEGER NOT NULL, PRIMARY KEY KEY(prior_state, current_state), transition_duration INTEGER NOT NULL CHECK (transition_duration > 0), etc.); >> I need to get a list of Assignments but I have to filter by the length of time it has been with a given Assignment state << Quick and untested skeleton:SELECT A.assignment_id FROM AssignmentHistory AS A, Transitons AS T WHERE A.currrent_state =5 -- pending acknowledgment AND CURRENT_TIMESTAMP >= DATEADD(DD, T.transition_duration, A.assignment_date) AND A.currrent_state = T.current_state;
Other interesting topics
|
|||||||||||||||||||||||