Home All Groups Group Topic Archive Search About

Tough SQL problem, need expert advice!!!

Author
6 Jun 2006 9:25 PM
John
All,
   I think I am having a very tough problem, I need some expert advice
here.
   Please bear with me since it will takes me a while to explain the
situation.

   (Using SQL2005) I need to design the generic search utility on the
database server (run as Web Service), client supply the search criteria
(where clause)

I need return only one field:id back to user,

the data source the user need to search is coming from different
relational database tables , there is a main table and a few sub
tables, the main table has a primary key (id) and all the sub-table
have the main tables primary key as foreign key., but there are not
relationship among the sub tables, and client needs to able to search
the all the fields in all the tables.

Also, we have different database, each has its own main  table and sub
tables, and the structure of each table is different. for each set, I
have those information stored in my own meta table. For each set I
could create view, dynamically generate sql statement on the fly from
the metadata table I have, but the code that generate the dynamic SQL
has to be the same

I have two ways to solve this problem but neither is satisfactory

1)    Create a view or query that joins all the tables, because the
sub-tables only related to each other with the main table's primary
key, if I do a join on the main table and all sub- tables, I would
create a Cartesian products.

Select distinct id from cartesian_product_view where
maintable.field1=field1 and subtable1.field2=fied2 and
subtable3.field3=field3

Here on the main table with 40,000 rows, the cartesian_product_view has
100 million rows!!!!!


2)    Substitute the where clause with subquery . for example, if the
where clause is following:

where subtable1.name ='john' or subtable2.product_code ='xyz'
or subtable1.name ='tom

I will replace it with the following : (please trust me I have the code
to do the Substitution but it will be too hard to explain here)

Select distinct maintable.id where
exists (select * from subttable1 where name = 'john' and
maintable.id=subtable1.id)
or
exists (select * from subttable2 where product_code = 'xyz' and
maintable.id=subtable2.id)
or
exists (select * from subttable1 where name = 'tom' and
maintable.id=subtable1.id)
order by ...

the problem with that is the query is getting slower & slower when user
specify more search criteria, it spend more time parsing the dynamic
query than querying the database, when you specify enough criteria
(which translated into subqueries) SQL Server just gave up and throw
the following exception.


The query processor ran out of internal resources and could not produce
a query plan. This is a rare event and only expected for extremely
complex queries or queries that reference a very large number of tables
or partitions. Please simplify the query. If you believe you have
received this message in error, contact Customer Support Services for
more information.


So I am totally stuck.


Please advice.
Any suggestions and advice is greatly appreciated.
Thanks in advance.
John

Author
6 Jun 2006 10:04 PM
Erland Sommarskog
John (john***@yahoo.com) writes:
>    (Using SQL2005) I need to design the generic search utility on the
> database server (run as Web Service), client supply the search criteria
> (where clause)

The WHERE clause is not coming as a parameter, I hope?

Show quoteHide quote
> 2)     Substitute the where clause with subquery . for example, if the
> where clause is following:
>
> where subtable1.name ='john' or subtable2.product_code ='xyz'
> or subtable1.name ='tom
>
> I will replace it with the following : (please trust me I have the code
> to do the Substitution but it will be too hard to explain here)
>
> Select distinct maintable.id where
> exists (select * from subttable1 where name = 'john' and
> maintable.id=subtable1.id)
> or
> exists (select * from subttable2 where product_code = 'xyz' and
> maintable.id=subtable2.id)
> or
> exists (select * from subttable1 where name = 'tom' and
> maintable.id=subtable1.id)
> order by ...

So, the rule is that if any of the criterias match, the row should
be returned. Often dynamic searches are the other way round: all
should match.

One thing you could try is:

  Select maintable.id
  where  exists (select * from subttable1 where name = 'john' and
         maintable.id=subtable1.id)
  union
  Select maintable.id
  exists (select * from subttable2 where product_code = 'xyz' and
  maintable.id=subtable2.id)
  union
  Select maintable.id
  exists (select * from subttable1 where name = 'tom' and
  maintable.id=subtable1.id)

If the query processor chokes, you could insert ids into temp tables
as you handle the tables, one by one. (Or two by two or whatever.)

> the problem with that is the query is getting slower & slower when user
> specify more search criteria,

If more criterias means more data to search and return that may be
inevitable.


--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Are all your drivers up to date? click for free checkup

Author
7 Jun 2006 2:26 PM
John
Unfortunately, the client supply the where clause, which could be
anything, such as
    (Lastname='johnson' or firstname='smith') and (city='Los
Angeles') or productid in (234,23434,33) ....
So it is very hard for me to break the search criteria into different
part and do a union considering the different logical operator
combination

Erland Sommarskog wrote:
Show quoteHide quote
> John (john***@yahoo.com) writes:
> >    (Using SQL2005) I need to design the generic search utility on the
> > database server (run as Web Service), client supply the search criteria
> > (where clause)
>
> The WHERE clause is not coming as a parameter, I hope?
>
> > 2)     Substitute the where clause with subquery . for example, if the
> > where clause is following:
> >
> > where subtable1.name ='john' or subtable2.product_code ='xyz'
> > or subtable1.name ='tom
> >
> > I will replace it with the following : (please trust me I have the code
> > to do the Substitution but it will be too hard to explain here)
> >
> > Select distinct maintable.id where
> > exists (select * from subttable1 where name = 'john' and
> > maintable.id=subtable1.id)
> > or
> > exists (select * from subttable2 where product_code = 'xyz' and
> > maintable.id=subtable2.id)
> > or
> > exists (select * from subttable1 where name = 'tom' and
> > maintable.id=subtable1.id)
> > order by ...
>
> So, the rule is that if any of the criterias match, the row should
> be returned. Often dynamic searches are the other way round: all
> should match.
>
> One thing you could try is:
>
>   Select maintable.id
>   where  exists (select * from subttable1 where name = 'john' and
>          maintable.id=subtable1.id)
>   union
>   Select maintable.id
>   exists (select * from subttable2 where product_code = 'xyz' and
>   maintable.id=subtable2.id)
>   union
>   Select maintable.id
>   exists (select * from subttable1 where name = 'tom' and
>   maintable.id=subtable1.id)
>
> If the query processor chokes, you could insert ids into temp tables
> as you handle the tables, one by one. (Or two by two or whatever.)
>
> > the problem with that is the query is getting slower & slower when user
> > specify more search criteria,
>
> If more criterias means more data to search and return that may be
> inevitable.
>
>
> --
> Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se
>
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
7 Jun 2006 9:24 PM
Erland Sommarskog
John (john***@yahoo.com) writes:
> Unfortunately, the client supply the where clause, which could be
> anything, such as
>      (Lastname='johnson' or firstname='smith') and (city='Los
> Angeles') or productid in (234,23434,33) ....
> So it is very hard for me to break the search criteria into different
> part and do a union considering the different logical operator
> combination

And who got that bright idea?

Sending WHERE clauses as parameters is a really bad idea. Either you
compose the entire SQL statement in the client. Or you have all the
SQL in the stored procedure. What you have now is the worst of both
worlds.

It will be very difficult to work well. And obvoiusly there is a risk
with SQL injection. And the procedure cache will be a waste.

Back to the drawing board, if you want to hear my opinion.



--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
8 Jun 2006 2:36 PM
John
Well, when I say I allow client to specify the where clause, that is a
just simple way to put it, actually I publish a list of client is
allowed to do search, which is different name than the database field
name, I have a table to map the field name the client is using to real
database field name ( or XML Query)
when I get the where clause I do the parsing and convert the client
supplied name into real database name (or XML Query expression), since
the parse is doing all the checks, I am not worry about the SQL
injection

You may ask why I am going through so much trouble, the reason it I am
writing a framework that needs to be extendable, which means I design
the basic infrastructure, other group could design application sitting
on top of the framework. For example, they could design their own
database, tables, as long as they following the general design, they
could reuse everything I am doing.

Erland Sommarskog wrote:
Show quoteHide quote
> John (john***@yahoo.com) writes:
> > Unfortunately, the client supply the where clause, which could be
> > anything, such as
> >      (Lastname='johnson' or firstname='smith') and (city='Los
> > Angeles') or productid in (234,23434,33) ....
> > So it is very hard for me to break the search criteria into different
> > part and do a union considering the different logical operator
> > combination
>
> And who got that bright idea?
>
> Sending WHERE clauses as parameters is a really bad idea. Either you
> compose the entire SQL statement in the client. Or you have all the
> SQL in the stored procedure. What you have now is the worst of both
> worlds.
>
> It will be very difficult to work well. And obvoiusly there is a risk
> with SQL injection. And the procedure cache will be a waste.
>
> Back to the drawing board, if you want to hear my opinion.
>
>
>
> --
> Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se
>
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
8 Jun 2006 9:34 PM
Erland Sommarskog
John (john***@yahoo.com) writes:
Show quoteHide quote
> Well, when I say I allow client to specify the where clause, that is a
> just simple way to put it, actually I publish a list of client is
> allowed to do search, which is different name than the database field
> name, I have a table to map the field name the client is using to real
> database field name ( or XML Query)
> when I get the where clause I do the parsing and convert the client
> supplied name into real database name (or XML Query expression), since
> the parse is doing all the checks, I am not worry about the SQL
> injection
>
> You may ask why I am going through so much trouble, the reason it I am
> writing a framework that needs to be extendable, which means I design
> the basic infrastructure, other group could design application sitting
> on top of the framework. For example, they could design their own
> database, tables, as long as they following the general design, they
> could reuse everything I am doing.

Now, I understand lesser and lesser. If you are parsing the input, you
should be able to break up in the input so you could try the union thing.

Looks like a very good idea to do this in a CLR procedure, by the way.

--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
8 Jun 2006 11:45 PM
John
Unfortunately I did not write the parser, it is a generic component,
passing me the following information
1)    Field
2)    Operation
3)    Value
I could replace all of the value above
but the parser  does  not passing me the logic operator, even if it
does, it is going to be very hard to for me to do a union because I
could get very complicated logical operation and I could end up query
the same table multiple time for different fields. Such as
    (Lastname='johnson' or firstname='smith') and (city='Los Angeles')
or (productid in (234,23434,33) and  (Lastname='Smith' or
firstname='tom'))

also for "and" operation, I will need to do a intersection instead
of union. I do not believe SQLServer provide this function on
transaction sql.

Erland Sommarskog wrote:
Show quoteHide quote
> John (john***@yahoo.com) writes:
> > Well, when I say I allow client to specify the where clause, that is a
> > just simple way to put it, actually I publish a list of client is
> > allowed to do search, which is different name than the database field
> > name, I have a table to map the field name the client is using to real
> > database field name ( or XML Query)
> > when I get the where clause I do the parsing and convert the client
> > supplied name into real database name (or XML Query expression), since
> > the parse is doing all the checks, I am not worry about the SQL
> > injection
> >
> > You may ask why I am going through so much trouble, the reason it I am
> > writing a framework that needs to be extendable, which means I design
> > the basic infrastructure, other group could design application sitting
> > on top of the framework. For example, they could design their own
> > database, tables, as long as they following the general design, they
> > could reuse everything I am doing.
>
> Now, I understand lesser and lesser. If you are parsing the input, you
> should be able to break up in the input so you could try the union thing.
>
> Looks like a very good idea to do this in a CLR procedure, by the way.
>
> --
> Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se
>
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
9 Jun 2006 6:18 AM
jsfromynr
Hi There,

I think you can try something like this .

Select * from tableA A
Where Exists
(
Select 1 from TableB B where B.coulmn1=A.coulmn1
Union All
Select 1 from TableB B where B.coulmn2=A.coulmn2
Union All
Select 1 from TableB B where B.coulmn3=A.coulmn3
and so on........
)

Using OR in where condition is causing it to FULL SCAN the table.
also create index on columns used for comparing.


With Warm regards
Jatinder Singh

Show quoteHide quote
> Unfortunately I did not write the parser, it is a generic component,
> passing me the following information
> 1)    Field
> 2)    Operation
> 3)    Value
> I could replace all of the value above
> but the parser  does  not passing me the logic operator, even if it
> does, it is going to be very hard to for me to do a union because I
> could get very complicated logical operation and I could end up query
> the same table multiple time for different fields. Such as
>     (Lastname='johnson' or firstname='smith') and (city='Los Angeles')
> or (productid in (234,23434,33) and  (Lastname='Smith' or
> firstname='tom'))
>
> also for "and" operation, I will need to do a intersection instead
> of union. I do not believe SQLServer provide this function on
> transaction sql.
>
> Erland Sommarskog wrote:
> > John (john***@yahoo.com) writes:
> > > Well, when I say I allow client to specify the where clause, that is a
> > > just simple way to put it, actually I publish a list of client is
> > > allowed to do search, which is different name than the database field
> > > name, I have a table to map the field name the client is using to real
> > > database field name ( or XML Query)
> > > when I get the where clause I do the parsing and convert the client
> > > supplied name into real database name (or XML Query expression), since
> > > the parse is doing all the checks, I am not worry about the SQL
> > > injection
> > >
> > > You may ask why I am going through so much trouble, the reason it I am
> > > writing a framework that needs to be extendable, which means I design
> > > the basic infrastructure, other group could design application sitting
> > > on top of the framework. For example, they could design their own
> > > database, tables, as long as they following the general design, they
> > > could reuse everything I am doing.
> >
> > Now, I understand lesser and lesser. If you are parsing the input, you
> > should be able to break up in the input so you could try the union thing.
> >
> > Looks like a very good idea to do this in a CLR procedure, by the way.
> >
> > --
> > Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se
> >
> > Books Online for SQL Server 2005 at
> > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> > Books Online for SQL Server 2000 at
> > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
9 Jun 2006 7:14 AM
Erland Sommarskog
John (john***@yahoo.com) writes:
> Unfortunately I did not write the parser, it is a generic component,
> passing me the following information
> 1)     Field
> 2)     Operation
> 3)     Value
> I could replace all of the value above
> but the parser  does  not passing me the logic operator, even if it
> does, it is going to be very hard to for me to do a union because I
> could get very complicated logical operation and I could end up query
> the same table multiple time for different fields. Such as
>     (Lastname='johnson' or firstname='smith') and (city='Los Angeles')
> or (productid in (234,23434,33) and  (Lastname='Smith' or
> firstname='tom'))

Sounds like it's time to reconsider the solution from bottom up. Or accept
that you will never get a solution that performs well.

Not that I know that the UNION thing is going to run with lightning speed,
but the less degree of freedom you have, the poorer are you odds.

In any case, I feel that I cannot give much further advice, because there
is too much I don't know.

> also for "and" operation, I will need to do a intersection instead
> of union. I do not believe SQLServer provide this function on
> transaction sql.

INTERSECT is indeed a new operator, akin to UNION that was added in SQL
2005.

Although AND conditions are probably better implemented with regular AND
operators. They are far less devilish than the OR.



--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
10 Jun 2006 1:45 PM
--CELKO--
I think what you want is the ability to load tables with criteria and
not have to use dynamic SQL:

skill = Java AND (skill = Perl OR skill = PHP)

becomes the disjunctive canonical form:

(Java AND Perl) OR (Java AND PHP)

which we load into this table:

CREATE TABLE Query
(and_grp INTEGER NOT NULL,
skill CHAR(4) NOT NULL,
PRIMARY KEY (and_grp, skill));

INSERT INTO Query VALUES (1, 'Java');
INSERT INTO Query VALUES (1, 'Perl');
INSERT INTO Query VALUES (2, 'Java');
INSERT INTO Query VALUES (2, 'PHP');

Assume we have a table of job candidates:

CREATE TABLE Candidates
(candidate_name CHAR(15) NOT NULL,
skill CHAR(4) NOT NULL,
PRIMARY KEY (candidate_name, skill));

INSERT INTO Candidates VALUES ('John', 'Java'); --winner
INSERT INTO Candidates VALUES ('John', 'Perl');
INSERT INTO Candidates VALUES ('Mary', 'Java'); --winner
INSERT INTO Candidates VALUES ('Mary', 'PHP');
INSERT INTO Candidates VALUES ('Larry', 'Perl'); --winner
INSERT INTO Candidates VALUES ('Larry', 'PHP');
INSERT INTO Candidates VALUES ('Moe', 'Perl'); --winner
INSERT INTO Candidates VALUES ('Moe', 'PHP');
INSERT INTO Candidates VALUES ('Moe', 'Java');
INSERT INTO Candidates VALUES ('Celko', 'Java'); -- loser
INSERT INTO Candidates VALUES ('Celko', 'Algol');
INSERT INTO Candidates VALUES ('Smith', 'APL');  -- loser
INSERT INTO Candidates VALUES ('Smith', 'Algol');

The query is simple now:

SELECT DISTINCT C1.candidate_name
  FROM Candidates AS C1, Query AS Q1
WHERE C1.skill = Q1.skill
GROUP BY Q1.and_grp, C1.candidate_name
HAVING COUNT(C1.skill)
       = (SELECT COUNT(*)
            FROM Query AS Q2
           WHERE Q1.and_grp = Q2.and_grp);

You can retain the COUNT() information to rank candidates.  For example
Moe meets both qualifications, while other candidates meet only one of
the two. You can Google "canonical disjunctive form" for more details.
This is a form of relational division.
Author
7 Jun 2006 6:12 AM
Omnibuzz
If you have concurrency, then there might be a problem in reading so many
tables.
One suggestion I would give is to have a seperate stored procedure for each
of the table that is scanned and have an output param to return the ID.
And have one master SP that will call each of the SPs in a particular order.
Start with the SP where you will most likely find the result and go in a
sequence till you get an ID from one of the SP.

This way, not all tables will be read at the same time and you will have
more control over the flow.

Hope this helps.
Author
7 Jun 2006 2:32 PM
John
two problem here
1) since the client supply the where clause, which could be anything,
such as
    (Lastname='johnson' or firstname='smith') and (city='Los
Angeles') or productid in (234,23434,33) ....
that means the master SP has to dynamically call the different sp
depends on the client input.
2) also, if the slave sp returns more than one id, then the output
param will not work.


Omnibuzz wrote:
Show quoteHide quote
> If you have concurrency, then there might be a problem in reading so many
> tables.
> One suggestion I would give is to have a seperate stored procedure for each
> of the table that is scanned and have an output param to return the ID.
> And have one master SP that will call each of the SPs in a particular order.
> Start with the SP where you will most likely find the result and go in a
> sequence till you get an ID from one of the SP.
>
> This way, not all tables will be read at the same time and you will have
> more control over the flow.
>
> Hope this helps.

Bookmark and Share