Home All Groups Group Topic Archive Search About

Database design question

Author
11 Sep 2006 4:02 AM
Frank
I'm trying to design my database to be as normalized as possible but it
would seem that (given my approach) being too normalized can also lead
to delays when querying.

I have a table that contains assignment data (an assignment in this
case is like a case file).  Through it's life time the assignment goes
through a bunch of different states (Open, closed, pending Acceptance,
etc).  Occasionally users have to search for Assignments and filter the
results based on how long an Assignment has been in a particular state
- some states have date thresholds within the state category that
further divide the state (i.e. Pending Acknowledgment - the thresholds
would be: after 1 day: No Acknowledgment, 3 days: No Acknowledgment
(3), 7 Days: No Acknowledgement - Stale, 20 days: No Acknowledgement -
Return).

CREATE TABLE [Assignment] (
    [ID] [int] IDENTITY (1, 1) NOT NULL ,
    [PrimaryStateID] [int] NOT NULL ,
    [SecondaryStateID] [int] NOT NULL
)

CREATE TABLE [HISTState] (
    [ID] [int] IDENTITY (1, 1) NOT NULL ,
    [TableRowID] [int] NOT NULL ,
    [Date] [datetime] NOT NULL,
    [StateID] [int] NOT NULL ,
)

CREATE TABLE [AssignmentStates] (
    [ID] [int] NOT NULL ,
    [AssignmentState] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS
NOT NULL
)

Assignment: Table that holds information about an assignment (approx
5000 records)
HISTState: Assignment state history table (approx 40000 records)
AssignmentStates: Table containing all the various possible assignment
states. (approx 50 records)

*** Please note I've removed some of the fields to reduce the size of
this post.

I need to get a list of Assignments but I have to filter by the length
of time it has been with a given Assignment state so I use the
following query:

SELECT
     ASS.ID

FROM
     Assignment AS ASS
          LEFT JOIN HISTState AS HSTATE ON (HSTATE.ID IN (SELECT TOP 1
[ID] FROM HISTState WHERE TableRowID=ASS.ID ORDER BY [Date] DESC))

WHERE
     HSTATE.StateID = AssignmentStatesGET_IDByName_UDF('Pending
Acknowledgement')
     AND GETDATE() >= DATEADD(dw,@Threshold,HSTATE.[Date])

**** Please note that the query has other components however the
bottleneck is this JOIN.

Anyway, this query returns about 2500 records but the JOIN has to go
through about 191 million records because of the manner in which I'm
constucting the query - this is ovbiously slowing things down!

So, my choices are: rework the query to something more efficient but I
don't necessarily know how to go about that, or 'de-normalize' the
database and store some of the state/date data on the Assignment table
so I can quickly query it.  I don't like the second idea because it is
not clean and takes me away from a normalized database.

Do I add a date field to Assignment table for each State (Primary,
Secondary) to indicate when it was last put into that state, or should
I keep those values locked in the HISTState table?

I hope I have offered enough information to work with in this post.  If
not I will be happy to elaborate if necessary.

Thanks,
Frank

Author
11 Sep 2006 6:39 AM
Uri Dimant
Frank

SELECT
     ASS.ID

FROM
     Assignment AS ASS
          LEFT JOIN HISTState AS HSTATE ON (HSTATE.ID IN (SELECT TOP 1
[ID] FROM HISTState WHERE TableRowID=ASS.ID ORDER BY [Date] DESC))

WHERE
     HSTATE.StateID = AssignmentStatesGET_IDByName_UDF('Pending
Acknowledgement')
     AND GETDATE() >= DATEADD(dw,@Threshold,HSTATE.[Date])

The above query will failed due to incorrect syntax. ORDER BY clause  must
be  a last statement in the query

> I hope I have offered enough information to work with in this post.  If
> not I will be happy to elaborate if necessary.
Well,  some more info about the indexes defime on the table as well as
primary--foreign  keys and PLEASE post some sample data to be tested



Show quote
"Frank" <mrpubni***@hotmail.com> wrote in message
news:1157947373.078626.179850@p79g2000cwp.googlegroups.com...
> I'm trying to design my database to be as normalized as possible but it
> would seem that (given my approach) being too normalized can also lead
> to delays when querying.
>
> I have a table that contains assignment data (an assignment in this
> case is like a case file).  Through it's life time the assignment goes
> through a bunch of different states (Open, closed, pending Acceptance,
> etc).  Occasionally users have to search for Assignments and filter the
> results based on how long an Assignment has been in a particular state
> - some states have date thresholds within the state category that
> further divide the state (i.e. Pending Acknowledgment - the thresholds
> would be: after 1 day: No Acknowledgment, 3 days: No Acknowledgment
> (3), 7 Days: No Acknowledgement - Stale, 20 days: No Acknowledgement -
> Return).
>
> CREATE TABLE [Assignment] (
> [ID] [int] IDENTITY (1, 1) NOT NULL ,
> [PrimaryStateID] [int] NOT NULL ,
> [SecondaryStateID] [int] NOT NULL
> )
>
> CREATE TABLE [HISTState] (
> [ID] [int] IDENTITY (1, 1) NOT NULL ,
> [TableRowID] [int] NOT NULL ,
> [Date] [datetime] NOT NULL,
> [StateID] [int] NOT NULL ,
> )
>
> CREATE TABLE [AssignmentStates] (
> [ID] [int] NOT NULL ,
> [AssignmentState] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS
> NOT NULL
> )
>
> Assignment: Table that holds information about an assignment (approx
> 5000 records)
> HISTState: Assignment state history table (approx 40000 records)
> AssignmentStates: Table containing all the various possible assignment
> states. (approx 50 records)
>
> *** Please note I've removed some of the fields to reduce the size of
> this post.
>
> I need to get a list of Assignments but I have to filter by the length
> of time it has been with a given Assignment state so I use the
> following query:
>
> SELECT
>     ASS.ID
>
> FROM
>     Assignment AS ASS
>          LEFT JOIN HISTState AS HSTATE ON (HSTATE.ID IN (SELECT TOP 1
> [ID] FROM HISTState WHERE TableRowID=ASS.ID ORDER BY [Date] DESC))
>
> WHERE
>     HSTATE.StateID = AssignmentStatesGET_IDByName_UDF('Pending
> Acknowledgement')
>     AND GETDATE() >= DATEADD(dw,@Threshold,HSTATE.[Date])
>
> **** Please note that the query has other components however the
> bottleneck is this JOIN.
>
> Anyway, this query returns about 2500 records but the JOIN has to go
> through about 191 million records because of the manner in which I'm
> constucting the query - this is ovbiously slowing things down!
>
> So, my choices are: rework the query to something more efficient but I
> don't necessarily know how to go about that, or 'de-normalize' the
> database and store some of the state/date data on the Assignment table
> so I can quickly query it.  I don't like the second idea because it is
> not clean and takes me away from a normalized database.
>
> Do I add a date field to Assignment table for each State (Primary,
> Secondary) to indicate when it was last put into that state, or should
> I keep those values locked in the HISTState table?
>
> I hope I have offered enough information to work with in this post.  If
> not I will be happy to elaborate if necessary.
>
> Thanks,
> Frank
>
Author
11 Sep 2006 7:19 AM
Frank
Uri,

Thanks for the comment but please verify the brackets in the JOIN.  The
ORDER BY clause is for the subquery in the JOIN, not the primary query.

Thanks,
Frank.


Uri Dimant wrote:
Show quote
> Frank
>
> SELECT
>      ASS.ID
>
> FROM
>      Assignment AS ASS
>           LEFT JOIN HISTState AS HSTATE ON (HSTATE.ID IN (SELECT TOP 1
> [ID] FROM HISTState WHERE TableRowID=ASS.ID ORDER BY [Date] DESC))
>
> WHERE
>      HSTATE.StateID = AssignmentStatesGET_IDByName_UDF('Pending
> Acknowledgement')
>      AND GETDATE() >= DATEADD(dw,@Threshold,HSTATE.[Date])
>
> The above query will failed due to incorrect syntax. ORDER BY clause  must
> be  a last statement in the query
>
> > I hope I have offered enough information to work with in this post.  If
> > not I will be happy to elaborate if necessary.
> Well,  some more info about the indexes defime on the table as well as
> primary--foreign  keys and PLEASE post some sample data to be tested
>
>
>
> "Frank" <mrpubni***@hotmail.com> wrote in message
> news:1157947373.078626.179850@p79g2000cwp.googlegroups.com...
> > I'm trying to design my database to be as normalized as possible but it
> > would seem that (given my approach) being too normalized can also lead
> > to delays when querying.
> >
> > I have a table that contains assignment data (an assignment in this
> > case is like a case file).  Through it's life time the assignment goes
> > through a bunch of different states (Open, closed, pending Acceptance,
> > etc).  Occasionally users have to search for Assignments and filter the
> > results based on how long an Assignment has been in a particular state
> > - some states have date thresholds within the state category that
> > further divide the state (i.e. Pending Acknowledgment - the thresholds
> > would be: after 1 day: No Acknowledgment, 3 days: No Acknowledgment
> > (3), 7 Days: No Acknowledgement - Stale, 20 days: No Acknowledgement -
> > Return).
> >
> > CREATE TABLE [Assignment] (
> > [ID] [int] IDENTITY (1, 1) NOT NULL ,
> > [PrimaryStateID] [int] NOT NULL ,
> > [SecondaryStateID] [int] NOT NULL
> > )
> >
> > CREATE TABLE [HISTState] (
> > [ID] [int] IDENTITY (1, 1) NOT NULL ,
> > [TableRowID] [int] NOT NULL ,
> > [Date] [datetime] NOT NULL,
> > [StateID] [int] NOT NULL ,
> > )
> >
> > CREATE TABLE [AssignmentStates] (
> > [ID] [int] NOT NULL ,
> > [AssignmentState] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS
> > NOT NULL
> > )
> >
> > Assignment: Table that holds information about an assignment (approx
> > 5000 records)
> > HISTState: Assignment state history table (approx 40000 records)
> > AssignmentStates: Table containing all the various possible assignment
> > states. (approx 50 records)
> >
> > *** Please note I've removed some of the fields to reduce the size of
> > this post.
> >
> > I need to get a list of Assignments but I have to filter by the length
> > of time it has been with a given Assignment state so I use the
> > following query:
> >
> > SELECT
> >     ASS.ID
> >
> > FROM
> >     Assignment AS ASS
> >          LEFT JOIN HISTState AS HSTATE ON (HSTATE.ID IN (SELECT TOP 1
> > [ID] FROM HISTState WHERE TableRowID=ASS.ID ORDER BY [Date] DESC))
> >
> > WHERE
> >     HSTATE.StateID = AssignmentStatesGET_IDByName_UDF('Pending
> > Acknowledgement')
> >     AND GETDATE() >= DATEADD(dw,@Threshold,HSTATE.[Date])
> >
> > **** Please note that the query has other components however the
> > bottleneck is this JOIN.
> >
> > Anyway, this query returns about 2500 records but the JOIN has to go
> > through about 191 million records because of the manner in which I'm
> > constucting the query - this is ovbiously slowing things down!
> >
> > So, my choices are: rework the query to something more efficient but I
> > don't necessarily know how to go about that, or 'de-normalize' the
> > database and store some of the state/date data on the Assignment table
> > so I can quickly query it.  I don't like the second idea because it is
> > not clean and takes me away from a normalized database.
> >
> > Do I add a date field to Assignment table for each State (Primary,
> > Secondary) to indicate when it was last put into that state, or should
> > I keep those values locked in the HISTState table?
> >
> > I hope I have offered enough information to work with in this post.  If
> > not I will be happy to elaborate if necessary.
> >
> > Thanks,
> > Frank
> >
Author
11 Sep 2006 1:32 PM
Alexander Kuznetsov
Frank,

Try to get rid of a UDF in your WHERE clause - that's typically a
performance killer.
Also consider using covering indexes or an indexed view.
Author
11 Sep 2006 2:19 PM
--CELKO--
>> I'm trying to design my database to be as normalized as possible .. <<

Get rid of the IDENTITY columns and add relational keys to the tables.
You are not even in 1NF yet.

>> Through it's life time the assignment goes through a bunch of different states (Open, closed, pending Acceptance, etc). <<

Read the article Transition Constraints at:
http://www.dbazine.com/ofinterest/oi-articles/celko35/view?searchterm=Celko%20transition

You will need add a temporal interval column the state table that you
can use with time stamps in the assignments table. You also need to
read ISO-11179 or a book on data modeling; there is no such thing as a
"<something>_state_id" -- the attribute is either a state or an
identifer and cannot be both. Also, why did you use a singular name fo
a table less it has only one row?  Data element names like
"table_row_id" have no place whatsoever in an RDBMS -- they are
physical locators and not logic identifiers.

Stop writing with UDFs.  They are proprietary and usually not needed if
you have a good schema design.  SQL wants to work with data and
declarations, not procedural code.  Here is a skeleton:

CREATE TABLE AssignmentHistory
(assignment_id CHAR(10) NOT NULL, --industry standard or carefully
designed
assignment_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL
PRIMARY KEY (assignment_id, assignment_date),
prior_state INTEGER NOT NULL,
current_state INTEGER NOT NULL,
   FOREIGN KEY(prior_state, current_state)
  REFERENCES Transitions(prior_state, current_state),
etc.);

--This is a a state machine table from freshman comp sci!
CREATE TABLE Transitions
(transition_name CHAR(10) NOT NULL PRIMARY KEY,
prior_state INTEGER NOT NULL,
current_state INTEGER NOT NULL,
   PRIMARY KEY KEY(prior_state, current_state),
transition_duration INTEGER NOT NULL
  CHECK (transition_duration > 0),
etc.);

>> I need to get a list of Assignments but I have to filter by the length of time it has been with a given Assignment state <<

Quick and untested skeleton:

SELECT A.assignment_id
  FROM AssignmentHistory AS A, Transitons AS T
WHERE A.currrent_state =5  -- pending acknowledgment
   AND CURRENT_TIMESTAMP >= DATEADD(DD, T.transition_duration,
A.assignment_date)
   AND A.currrent_state = T.current_state;

AddThis Social Bookmark Button