Home All Groups Group Topic Archive Search About

More on querying remote server

Author
6 Apr 2006 3:55 PM
Ian Boyd
i'll just keep posting the problems, as my own archive.

SELECT TransactionNumber
FROM servertest.CMSArchiveTraining.dbo.Transactions_90
WHERE transactionGUID = '4956E240-8B6E-437E-B9B0-5E83FC25E3F0'

Issues the remote query:
SOURCE:(servertest),
QUERY:(
    SELECT Tbl1001."TransactionGUID"
        Col1003,Tbl1001."TransactionNumber" Col1004
    FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
)

And tries to bring back 8.5M rows - and takes minutes to return, when really
there is only one row.

Question #1: Why doesn't it include the WHERE clause in the remote query?

If i change the query to:

SELECT TOP 1 TransactionNumber
FROM servertest.CMSArchiveTraining.dbo.Transactions_90
WHERE transactionGUID = '4956E240-8B6E-437E-B9B0-5E83FC25E3F0'

It still issues the remote query:
SOURCE:(servertest),
QUERY:(
    SELECT
        Tbl1001."TransactionGUID" Col1003,
        Tbl1001."TransactionNumber" Col1004
    FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
)

But this time it only brings over 10 rows (somehow), performs a top 1
filter, and returns instantly.

Question #2: How is it bringing over 10 rows only? (hint: cursor)

If i change the query to:
SELECT TransactionNumber
FROM servertest.CMSArchiveTraining.dbo.Transactions_90
WHERE TransactionNumber = 9679

It issues the remote query:
SOURCE:(servertest),
QUERY:(
    SELECT
        Tbl1001."TransactionNumber" Col1004
    FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
    WHERE Tbl1001."TransactionNumber"=(9679)
)

which returns only one row, and returns instantly. Obviously SQL Server can
*sometimes* do the optimization, othertimes it won't.

Question #3: Why does it include the WHERE clause in the remote query.

If i change the query to:
SELECT TransactionNumber
FROM servertest.CMSArchiveTraining.dbo.Transactions_90
WHERE TransactionDate = '2002-06-10 08:19:10.513'

It issues the remote query:
SOURCE:(servertest),
QUERY:(
    SELECT
        Tbl1001."TransactionNumber" Col1004
    FROM "cmsarchivetraining"."dbo"."Transactions_90" Tbl1001
    WHERE Tbl1001."TransactionDate"='2002-06-10T08:19:10.513'
)

Question #4: Why does it include the WHERE clause in the remote query?

Now i create a view:
CREATE VIEW CMSArchiveTranasctions AS
SELECT *
FROM SERVERTEST.CMSArchiveTraining.dbo.Transactions_90

and issue the query:

SELECT TransactionNumber
FROM CMSArchiveTransactions
WHERE TransactionNumber = 9679

It wants to issues the remote query:
SOURCE:(SERVERTEST),
QUERY:(
    SELECT
        Tbl1001."TransactionNumber" Col1005
    FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
)

And tries to bring back 8.5M rows - and takes minutes to return, when really
there is only one row.

Question #5: Why does it not include the WHERE clause in the remote query?

If i change the query to:
SELECT TOP 1 TransactionNumber
FROM CMSArchiveTransactions
WHERE TransactionNumber = 9679

it issues the remote query:
SOURCE:(SERVERTEST),
QUERY:(
    SELECT TOP 1 Col1004
    FROM (
        SELECT
            Tbl1001."TransactionNumber" Col1004
            FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
            WHERE Tbl1001."TransactionNumber"=(9679)
    ) Qry1018
)

Question #6: Why is it now including the TOP inside the remote query, when
my earlier issue of a TOP query didn't include the TOP limiter?

Question #7: Why is it now including the WHERE clause in the remote query,
when not including a TOP 1 it won't include the where clause limiter?

Question #8: Why does it include the WHERE clause when i have any TOP
limiter (even TOP 999999999, or TOP 100 PERCENT), but does not include the
where clause when i don't?

So i try changing my query to:
SELECT TransactionNumber
FROM (
    SELECT *
    FROM SERVERTEST.CMSArchiveTraining.dbo.Transactions_90) t
WHERE TransactionNumber = 9679

And it issues the remote query:
SOURCE:(SERVERTEST),
QUERY:(
    SELECT
        Tbl1001."TransactionNumber" Col1004
    FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
    WHERE Tbl1001."TransactionNumber"=(9679)
)

So querying through a view it will not include the WHERE clause, but if i
include the view as a derived table, it can include the WHERE clause.

Question #9: Why does it include the WHERE clause when i query through a
derived table, and not through a view?

So i try changing the query to
SELECT TransactionNumber
FROM (
    SELECT *
    FROM CMSArchiveTransactions) t
WHERE TransactionNumber = 9679

So here we are, pay attention to this one. If i attempt to query the view
directly, it doesn't get optimized. Now i am going to query my view THROUGH
a derived table. What do you think it will do?

SOURCE:(SERVERTEST),
QUERY:(
    SELECT Tbl1001."TransactionNumber" Col1004
    FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
    WHERE Tbl1001."TransactionNumber"=(9679)
)

It in fact now DOES include the where clause, but if i try to query my view
directly, it can't optimize it. This is a stunning development.

It means the rules are random, which makes it difficult to optimize
distributed queries.


Rules for including where clause:
    Filter on uniqueidentifer:
        Query includes TOP 1: Yes*
        Query includes TOP 2: No
        Query includes TOP n: No
        Query includes TOP n PERCENT: No
        Query doesn't include TOP: No
    Filter on integer:
        Query linked SQL Server: Yes
        Query view that queries linked SQL Server:
            Query view directly:
                Query includes TOP 1: Yes
                Query includes TOP 2: Yes
                Query includes TOP 100: Yes
                Query includes TOP 999999999: Yes
                Query includes TOP 100 PERCENT: Yes
                Query doesn't include TOP: No
            Query view through derived table:
                Query includes TOP 1: Yes
                Query includes: TOP 2: Yes
                Query includes: TOP n: Yes
                Query includes: TOP 100 PERCENT: Yes
                Query includes TOP: Yes

Author
6 Apr 2006 4:33 PM
ML
You haven't posted any DDL. Are there any indexes on the remote table? Which
columns are indexed?


ML

---
http://milambda.blogspot.com/
Author
6 Apr 2006 4:42 PM
Ian Boyd
"ML" <M*@discussions.microsoft.com> wrote in message
news:E89D60ED-EAEF-46F9-9F31-1D56842B277E@microsoft.com...
> You haven't posted any DDL. Are there any indexes on the remote table?
> Which
> columns are indexed?

Before answering that, let me ask you this: why does it matter?  It doesn't
seem to matter *sometimes*. And even if there aren't an indexes on the
remote tables, everyone can agree that sending any filtering criteria to a
remote server is faster than trying to filter rows after they've crossed a
link.

Even a table scan on a remote server is better than bringing the rows over a
network to be filtered.

Additionally, my post is not dealing with query performance. When SQL Server
does what it should do, the queries run fine. If SQL Server doesn't do what
it's supposed to do, the queries do not run fine. Indexes are not the
performance limitation here.

That having been said: yes.
Author
6 Apr 2006 5:04 PM
ML
If you think you know your data better than the optimizer does (as might be
the case - especially with outdated statistics), you could use the REMOTE
join hint to tell the optimizer to process the join on the remote server.

Basically, the optimizer can only rely on statistics. If they are missing on
the remote site, he prefers "staying local" although filtering data remotely
might have yielded better performance. Why put a stress on a remote server if
poor performance is expected anyway?

But when statistics are up to date on both sites, then the optimizer still
has full control over the execution of queries (unless specified otherwise
through the use of hints) which basically means that he can still make the
wrong assumptions - and I'm as much in the dark here as you are.


ML

---
http://milambda.blogspot.com/
Author
6 Apr 2006 5:20 PM
Ian Boyd
> Basically, the optimizer can only rely on statistics. If they are missing
> on
> the remote site, he prefers "staying local" although filtering data
> remotely
> might have yielded better performance. Why put a stress on a remote server
> if
> poor performance is expected anyway?

i guess this is where common sense gets to take a back seat.

> But when statistics are up to date on both sites, then the optimizer still
> has full control over the execution of queries (unless specified otherwise
> through the use of hints) which basically means that he can still make the
> wrong assumptions - and I'm as much in the dark here as you are.

Is there anyone who could explain the bewildering optimizer choices?
Author
6 Apr 2006 5:26 PM
Jim Underwood
I would hazard a guess that the local SQL server has no knowledge of the
statistics on the remote server and can only determine which query to send
across to the remote server based on the existence of keys (possibly indexes
or unique indexes).

The question would be what indexes exist on the remote table.  I believe if
you look at these closely your results will turn out to be much more
consistent than you think.


Show quote
"ML" <M*@discussions.microsoft.com> wrote in message
news:064A71B2-9B67-4F97-8635-6854A4B9551C@microsoft.com...
> If you think you know your data better than the optimizer does (as might
be
> the case - especially with outdated statistics), you could use the REMOTE
> join hint to tell the optimizer to process the join on the remote server.
>
> Basically, the optimizer can only rely on statistics. If they are missing
on
> the remote site, he prefers "staying local" although filtering data
remotely
> might have yielded better performance. Why put a stress on a remote server
if
> poor performance is expected anyway?
>
> But when statistics are up to date on both sites, then the optimizer still
> has full control over the execution of queries (unless specified otherwise
> through the use of hints) which basically means that he can still make the
> wrong assumptions - and I'm as much in the dark here as you are.
>
>
> ML
>
> ---
> http://milambda.blogspot.com/
Author
6 Apr 2006 5:45 PM
Ian Boyd
"Jim Underwood" <james.underwoodATfallonclinic.com> wrote in message
news:ug9G18ZWGHA.3972@TK2MSFTNGP02.phx.gbl...
>I would hazard a guess that the local SQL server has no knowledge of the
> statistics on the remote server and can only determine which query to send
> across to the remote server based on the existence of keys (possibly
> indexes
> or unique indexes).
>
> The question would be what indexes exist on the remote table.  I believe
> if
> you look at these closely your results will turn out to be much more
> consistent than you think.


Performs badly:
SELECT * FROM ViewOfTransactions
WHERE TransactionNumber = 1234

Performs goodly:
SELECT * FROM (SELECT * FROM ViewOfTransactions)
WHERE TransactionNumber = 1234

Same statistics on the remote server.
Same indexes on the remote server.
Same query.
Returning same results.

Indexes and statistics are not the problem.
Author
6 Apr 2006 6:02 PM
Jim Underwood
I don't think SQL Server will necessarily handle a view identically to an
inline view as far as optimization goes, even for local tables.  In this
case I would agree that it should (and may very well be a bug), but I
believe using views instead of inline views does affect the optimizer and
change the priority of it's processes.

The question of keys and indexes is in reference to why it sends the filter
to the remote database for TransactionNumber and TransactionDate, but not
transactionGUID.  If the indexes on all three of these fields are identical,
then you are correct, but I would be surprised to find that they are
identical.  On the other hand, data types may matter as well, although I
can't think of a reason why they would.

Show quote
"Ian Boyd" <ian.msnews***@avatopia.com> wrote in message
news:OnxleGaWGHA.4924@TK2MSFTNGP05.phx.gbl...
>
> "Jim Underwood" <james.underwoodATfallonclinic.com> wrote in message
> news:ug9G18ZWGHA.3972@TK2MSFTNGP02.phx.gbl...
> >I would hazard a guess that the local SQL server has no knowledge of the
> > statistics on the remote server and can only determine which query to
send
> > across to the remote server based on the existence of keys (possibly
> > indexes
> > or unique indexes).
> >
> > The question would be what indexes exist on the remote table.  I believe
> > if
> > you look at these closely your results will turn out to be much more
> > consistent than you think.
>
>
> Performs badly:
> SELECT * FROM ViewOfTransactions
> WHERE TransactionNumber = 1234
>
> Performs goodly:
> SELECT * FROM (SELECT * FROM ViewOfTransactions)
> WHERE TransactionNumber = 1234
>
> Same statistics on the remote server.
> Same indexes on the remote server.
> Same query.
> Returning same results.
>
> Indexes and statistics are not the problem.
>
>
Author
6 Apr 2006 6:29 PM
Ian Boyd
> The question of keys and indexes is in reference to why it sends the
> filter
> to the remote database for TransactionNumber and TransactionDate, but not
> transactionGUID.

i will confess, that i knew the answer to filtering on uniqueidentifier
columns before posting.

Question #1 was a plant, to see who knows what they're talking about.


You can see my previous posts on remote uniqueidentifier filtering:
http://tinyurl.com/rrwtp
http://tinyurl.com/owgr9
http://tinyurl.com/ppybe
http://tinyurl.com/pwb26
Author
6 Apr 2006 7:16 PM
Jim Underwood
You should remember that I am speaking not from what I know, but from what I
can deduce based on the information provided (hence my referencing to making
guesses on what was happening).  I have found that the optimizers are too
complex to ever fully understand, and that the best we can do is find
general strategies that can sometimes make a difference in the execution
plan.

In this case, looking at the indexes is a necessary first step to
determining what the difference could be.  Excluding that as a difference,
looking at the datatypes is a logical next step.

If you are looking for answers as to exactly what the code does under the
covers, then I apologize for cluttering up the thread with theory and
guesses.  However, if you are looking for idea that might help explain what
is happening, then posting all of the information available will contribute
to the discussion (including indexes and applicable links).

Show quote
"Ian Boyd" <ian.msnews***@avatopia.com> wrote in message
news:e22t4eaWGHA.3800@TK2MSFTNGP03.phx.gbl...
> > The question of keys and indexes is in reference to why it sends the
> > filter
> > to the remote database for TransactionNumber and TransactionDate, but
not
> > transactionGUID.
>
> i will confess, that i knew the answer to filtering on uniqueidentifier
> columns before posting.
>
> Question #1 was a plant, to see who knows what they're talking about.
>
>
> You can see my previous posts on remote uniqueidentifier filtering:
> http://tinyurl.com/rrwtp
> http://tinyurl.com/owgr9
> http://tinyurl.com/ppybe
> http://tinyurl.com/pwb26
>
>
Author
6 Apr 2006 8:54 PM
Ian Boyd
> posting all of the information available will contribute
> to the discussion (including indexes and applicable links).

1. Setting up databases on multiple servers, and linked servers, with
required volumes is something nobody here would attempt - only PSS could be
convinced to try it. Even then, making it fail in the way would be iffy.
(Just because it fails the exact same time for me, every time, every
database i have tried it on - doesn't make it repeatable).

2. The nature of the symptoms is beyond everyone except those familiar with
SQL Sever itself - not just people used to using it. None of those people
patrols these groups.

3. The only reliable way for said person to test it would be to connect to
our databases servers and run the queries for themselves - not very likely

4. People like to answer the question that wasn't being asked.
My question: "Should these queries have different execution plans?"

What they answer: "What kind of indexes and statistics can i use to ensure
that these queries don't have different execution plans?"
Author
6 Apr 2006 5:43 PM
Ian Boyd
> If you think you know your data better than the optimizer does

It seems to have no problem figuring out better data access strategies if i
rewrite my queries to something identical.

> you could use the REMOTE
> join hint to tell the optimizer to process the join on the remote server.

i'm doing a select, not a join. Plus, this isn't available in 2000.

> Basically, the optimizer can only rely on statistics.
Again, why does it then work sometimes, and not others - for the same
query - same statistics - complete or not.

> But when statistics are up to date on both sites, then the optimizer still
> has full control over the execution of queries (unless specified otherwise
> through the use of hints) which basically means that he can still make the
> wrong assumptions - and I'm as much in the dark here as you are.

i do appreciate the effort.

Is anyone out there that knows the secret way to sucessfully query remote
SQL Servers? This whole distributed federated sales pitch fails miserably
whenever anyone uses it.

And, as i've been asking for 3 or 4 years, is it a failing in the 2000
optimizer? Has it been fixed in 2005? If it has been fixed in 2005 then you
have to admit it was broken in 2000.
Author
6 Apr 2006 5:44 PM
Jim Underwood
I dont know if it is improved in 2005, but I have always considered linked
servers (DB Links in Oracle) to be miserable for performance.  They have
their uses, but performance does not apear to be one of them.

Show quote
"Ian Boyd" <ian.msnews***@avatopia.com> wrote in message
news:OUeVDFaWGHA.1900@TK2MSFTNGP04.phx.gbl...
>
> And, as i've been asking for 3 or 4 years, is it a failing in the 2000
> optimizer? Has it been fixed in 2005? If it has been fixed in 2005 then
you
> have to admit it was broken in 2000.
>
>
Author
6 Apr 2006 7:45 PM
ML
FYI: the REMOTE join hint is available in SQL 2000 as well.

Described under "FROM" in Books Online.

And, yes it's a join hint which can only be declared when joining local and
remote objects.


ML

---
http://milambda.blogspot.com/
Author
6 Apr 2006 8:56 PM
Ian Boyd
> Described under "FROM" in Books Online.

Yeah, i was silly. i tried finding it on the web verses in the BOL.

> And, yes it's a join hint which can only be declared when joining local
> and
> remote objects.

Excellent, i can definetly make use of it when trying to join 30 local rows
to 8million remote rows - and SQL Server insists on bringing 8M over here
first.
Author
6 Apr 2006 9:15 PM
ML
It's more like a finger than an arm, isn't it...? :)

The improvement for the view in SQL 2005 is refreshing news, though. If you
don't have a blog up already, this whole experiment is definitely something
to put up. Thanks!


ML

---
http://milambda.blogspot.com/
Author
6 Apr 2006 5:25 PM
Ian Boyd
Can anyone
    ever imagine
    any situation
    under any circumstances
    in any manner
    in any capacity
    in any way
    with any product
    running on any kind of data link
    fast or slow
    on any database setup
    existing in any universe
    moving at any velocity
    at any time since the big bang itself
where it would be better to filter rows after fetching them?
Author
6 Apr 2006 5:39 PM
Jim Underwood
Absolutely.  The same way there are cases when a full table scan is better
than using an index.

If you have 1,000 rows in a table and you add a filter on and indexed field
that returns 999 records, then SQL server will do a full table scan instead
of using the index.

If you have 1,000 rows in a remote table and you add a filter that returns
999 records, let the database that is actually going to use the data spend
the cycles filtering it.  Let the remote server just do a quick IO and send
the data over the network with minimal CPU usage.

Now, if the local server has statistics on the tables in the remote
database, then it could make much better decisions regarding when to send
the filter and when to not.  Because it doesn't know the remote database, it
has no idea how many rows will be returned (unless querying on a key field
maybe?) and makes a (rather useless) judgement call.

Show quote
"Ian Boyd" <ian.msnews***@avatopia.com> wrote in message
news:%23fsEP7ZWGHA.2064@TK2MSFTNGP03.phx.gbl...
> Can anyone
>     ever imagine
>     any situation
>     under any circumstances
>     in any manner
>     in any capacity
>     in any way
>     with any product
>     running on any kind of data link
>     fast or slow
>     on any database setup
>     existing in any universe
>     moving at any velocity
>     at any time since the big bang itself
> where it would be better to filter rows after fetching them?
>
>
Author
6 Apr 2006 6:34 PM
Ian Boyd
> If you have 1,000 rows in a remote table and you add a filter that returns
> 999 records, let the database that is actually going to use the data spend
> the cycles filtering it.  Let the remote server just do a quick IO and
> send
> the data over the network with minimal CPU usage.

i disagree. If the table contains only 1 row, perform the filter.

Even if the table contains a million rows, and your statistics tell you that
every row matches the filter, do the filter.

You're going to have to serialize each row anyway - so don't send the
non-matching ones.
Author
6 Apr 2006 6:40 PM
Jim Underwood
What you keep forgetting is that there are no statistics available until
after the query gets sent to the remote server.  The local SQL server has to
make a guess as to whether or not passing the parameter will be more
efficent than processing the entire set.  When it chooses (incorrectly in
this case) to not send the parameter, the statistics are never taken into
consideration.

Show quote
"Ian Boyd" <ian.msnews***@avatopia.com> wrote in message
news:OvYnbhaWGHA.4148@TK2MSFTNGP03.phx.gbl...
> > If you have 1,000 rows in a remote table and you add a filter that
returns
> > 999 records, let the database that is actually going to use the data
spend
> > the cycles filtering it.  Let the remote server just do a quick IO and
> > send
> > the data over the network with minimal CPU usage.
>
> i disagree. If the table contains only 1 row, perform the filter.
>
> Even if the table contains a million rows, and your statistics tell you
that
> every row matches the filter, do the filter.
>
> You're going to have to serialize each row anyway - so don't send the
> non-matching ones.
>
>
Author
6 Apr 2006 7:05 PM
Ian Boyd
> What you keep forgetting is that there are no statistics available until
> after the query gets sent to the remote server.

i see your point: always send over as much filtering criteria as possible.

> The local SQL server has to
> make a guess as to whether or not passing the parameter will be more
> efficent than processing the entire set.  When it chooses (incorrectly in
> this case) to not send the parameter, the statistics are never taken into
> consideration.

It is always the incorrect choice to not send the parameters.
Author
6 Apr 2006 8:44 PM
Jim Underwood
"Ian Boyd" <ian.msnews***@avatopia.com> wrote in message
news:%23ImnTzaWGHA.196@TK2MSFTNGP04.phx.gbl...
> It is always the incorrect choice to not send the parameters.
>
On this point we differ.  Although I would agree that it would be
appropriate almost every time, and as a result sending over the criteria by
default would be the logical thing to do, there would be rare cases when it
might be more efficent to do it locally.  Of course you don't write the code
to cater to the rare occurances by sacrificing the performance on the vast
majorty.

I agree that is should be sending the parameters, assuming there are not
compatability issues with them.  However, soemtimes SQL server may not be
smart enough to properly break down a query and determine which parameters
can be passed to the server and which must be applied locally.
Author
6 Apr 2006 5:26 PM
Ian Boyd
Here is some SQL to create a table named "Transactions", and will contain
some fields including "TransactionDate", "TransactionNumber",
"TransactionGUID"

USE pubs
go
DROP TABLE Transactions
go
CREATE Table Transactions (
[TransactionID] [int] NOT NULL IDENTITY,
TransactionGUID uniqueidentifier NOT NULL default newid(),
[title] [varchar] (80) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[type] [char] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[pub_id] [char] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[price] [money] NULL ,
[advance] [money] NULL ,
[royalty] [int] NULL ,
[ytd_sales] [int] NULL ,
[notes] [varchar] (200) NULL ,
[TransactionDate] [datetime] NOT NULL DEFAULT (getdate())
)

go

INSERT INTO Transactions (title, type, pub_id, price, advance, royalty,
ytd_sales, notes, TransactionDate)
SELECT
[title],
[type],
[pub_id],
[price],
[advance],
[royalty],
[ytd_sales],
[notes],
CAST(CAST([pubdate] as real) + RAND(3234)*2000 - 1000 AS datetime)
FROM Titles
CROSS JOIN (
  SELECT (a.Number * 256) + b.Number AS Number
  FROM master..spt_values a,
   master..spt_values b
  WHERE  a.Type = 'p'
   AND b.Type = 'p') numbers


You'll have to create the linked server yourself, you'll need two servers.
You'll also have to substitute your own values for TransactionGUID,
TransactionDate, TransactionNumber
Author
6 Apr 2006 5:29 PM
Jim Underwood
I am guessing to a fair extent here, but some possible (and I think likely)
explanations for this behavior...
The main thing is, I think, that SQL Server has no statistics available when
accessing a remote database and makes a best guess based on the existence of
keys.

Incidently, when working with Oracle 8 I found similar issues with their
"database links".

>Question #1: Why doesn't it include the WHERE clause in the remote query?
No Index on transactionGUID?

>Question #2: How is it bringing over 10 rows only? (hint: cursor)
10 rows is the size of the chunks retrieved.  The top 1 is in the first ten
rows, so SQL Server doesnt bother retrieving any further data from the
remote database.

>Question #3: Why does it include the WHERE clause in the remote query.
Index or PK on TransactionNumber?

>Question #4: Why does it include the WHERE clause in the remote query?
Index or PK on TransactionDate?

>Question #5: Why does it not include the WHERE clause in the remote query?
Entire view is retrieved.  The optimizer will sometimes do this with local
views as well, depending on how they are used.

>Question #6: Why is it now including the TOP inside the remote query, when
>my earlier issue of a TOP query didn't include the TOP limiter?
>Question #7: Why is it now including the WHERE clause in the remote query,
>when not including a TOP 1 it won't include the where clause limiter?
>Question #8: Why does it include the WHERE clause when i have any TOP
>limiter (even TOP 999999999, or TOP 100 PERCENT), but does not include the
>where clause when i don't?
Top says return only X rows from the view.  Because it needs to order the
results, a different execution plan is returned.  This will happen with
local views also, depending on statistics and how they are used.

>Question #9: Why does it include the WHERE clause when i query through a
>derived table, and not through a view?
The local SQL server is rewriting your select to directly access the table.
This particular SQL may look different to a human, but logically, and to SQL
Server, it is identical to your 3rd example.  This is perfectly consistent.



Show quote
"Ian Boyd" <ian.msnews***@avatopia.com> wrote in message
news:O$7N6IZWGHA.4424@TK2MSFTNGP05.phx.gbl...
> i'll just keep posting the problems, as my own archive.
>
> SELECT TransactionNumber
> FROM servertest.CMSArchiveTraining.dbo.Transactions_90
> WHERE transactionGUID = '4956E240-8B6E-437E-B9B0-5E83FC25E3F0'
>
> Issues the remote query:
> SOURCE:(servertest),
> QUERY:(
>     SELECT Tbl1001."TransactionGUID"
>         Col1003,Tbl1001."TransactionNumber" Col1004
>     FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
> )
>
> And tries to bring back 8.5M rows - and takes minutes to return, when
really
> there is only one row.
>
> Question #1: Why doesn't it include the WHERE clause in the remote query?
>
> If i change the query to:
>
> SELECT TOP 1 TransactionNumber
> FROM servertest.CMSArchiveTraining.dbo.Transactions_90
> WHERE transactionGUID = '4956E240-8B6E-437E-B9B0-5E83FC25E3F0'
>
> It still issues the remote query:
> SOURCE:(servertest),
> QUERY:(
>     SELECT
>         Tbl1001."TransactionGUID" Col1003,
>         Tbl1001."TransactionNumber" Col1004
>     FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
> )
>
> But this time it only brings over 10 rows (somehow), performs a top 1
> filter, and returns instantly.
>
> Question #2: How is it bringing over 10 rows only? (hint: cursor)
>
> If i change the query to:
> SELECT TransactionNumber
> FROM servertest.CMSArchiveTraining.dbo.Transactions_90
> WHERE TransactionNumber = 9679
>
> It issues the remote query:
> SOURCE:(servertest),
> QUERY:(
>     SELECT
>         Tbl1001."TransactionNumber" Col1004
>     FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
>     WHERE Tbl1001."TransactionNumber"=(9679)
> )
>
> which returns only one row, and returns instantly. Obviously SQL Server
can
> *sometimes* do the optimization, othertimes it won't.
>
> Question #3: Why does it include the WHERE clause in the remote query.
>
> If i change the query to:
> SELECT TransactionNumber
> FROM servertest.CMSArchiveTraining.dbo.Transactions_90
> WHERE TransactionDate = '2002-06-10 08:19:10.513'
>
> It issues the remote query:
> SOURCE:(servertest),
> QUERY:(
>     SELECT
>         Tbl1001."TransactionNumber" Col1004
>     FROM "cmsarchivetraining"."dbo"."Transactions_90" Tbl1001
>     WHERE Tbl1001."TransactionDate"='2002-06-10T08:19:10.513'
> )
>
> Question #4: Why does it include the WHERE clause in the remote query?
>
> Now i create a view:
> CREATE VIEW CMSArchiveTranasctions AS
> SELECT *
> FROM SERVERTEST.CMSArchiveTraining.dbo.Transactions_90
>
> and issue the query:
>
> SELECT TransactionNumber
> FROM CMSArchiveTransactions
> WHERE TransactionNumber = 9679
>
> It wants to issues the remote query:
> SOURCE:(SERVERTEST),
> QUERY:(
>     SELECT
>         Tbl1001."TransactionNumber" Col1005
>     FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
> )
>
> And tries to bring back 8.5M rows - and takes minutes to return, when
really
> there is only one row.
>
> Question #5: Why does it not include the WHERE clause in the remote query?
>
> If i change the query to:
> SELECT TOP 1 TransactionNumber
> FROM CMSArchiveTransactions
> WHERE TransactionNumber = 9679
>
> it issues the remote query:
> SOURCE:(SERVERTEST),
> QUERY:(
>     SELECT TOP 1 Col1004
>     FROM (
>         SELECT
>             Tbl1001."TransactionNumber" Col1004
>             FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
>             WHERE Tbl1001."TransactionNumber"=(9679)
>     ) Qry1018
> )
>
> Question #6: Why is it now including the TOP inside the remote query, when
> my earlier issue of a TOP query didn't include the TOP limiter?
>
> Question #7: Why is it now including the WHERE clause in the remote query,
> when not including a TOP 1 it won't include the where clause limiter?
>
> Question #8: Why does it include the WHERE clause when i have any TOP
> limiter (even TOP 999999999, or TOP 100 PERCENT), but does not include the
> where clause when i don't?
>
> So i try changing my query to:
> SELECT TransactionNumber
> FROM (
>     SELECT *
>     FROM SERVERTEST.CMSArchiveTraining.dbo.Transactions_90) t
> WHERE TransactionNumber = 9679
>
> And it issues the remote query:
> SOURCE:(SERVERTEST),
> QUERY:(
>     SELECT
>         Tbl1001."TransactionNumber" Col1004
>     FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
>     WHERE Tbl1001."TransactionNumber"=(9679)
> )
>
> So querying through a view it will not include the WHERE clause, but if i
> include the view as a derived table, it can include the WHERE clause.
>
> Question #9: Why does it include the WHERE clause when i query through a
> derived table, and not through a view?
>
> So i try changing the query to
> SELECT TransactionNumber
> FROM (
>     SELECT *
>     FROM CMSArchiveTransactions) t
> WHERE TransactionNumber = 9679
>
> So here we are, pay attention to this one. If i attempt to query the view
> directly, it doesn't get optimized. Now i am going to query my view
THROUGH
> a derived table. What do you think it will do?
>
> SOURCE:(SERVERTEST),
> QUERY:(
>     SELECT Tbl1001."TransactionNumber" Col1004
>     FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
>     WHERE Tbl1001."TransactionNumber"=(9679)
> )
>
> It in fact now DOES include the where clause, but if i try to query my
view
> directly, it can't optimize it. This is a stunning development.
>
> It means the rules are random, which makes it difficult to optimize
> distributed queries.
>
>
> Rules for including where clause:
>     Filter on uniqueidentifer:
>         Query includes TOP 1: Yes*
>         Query includes TOP 2: No
>         Query includes TOP n: No
>         Query includes TOP n PERCENT: No
>         Query doesn't include TOP: No
>     Filter on integer:
>         Query linked SQL Server: Yes
>         Query view that queries linked SQL Server:
>             Query view directly:
>                 Query includes TOP 1: Yes
>                 Query includes TOP 2: Yes
>                 Query includes TOP 100: Yes
>                 Query includes TOP 999999999: Yes
>                 Query includes TOP 100 PERCENT: Yes
>                 Query doesn't include TOP: No
>             Query view through derived table:
>                 Query includes TOP 1: Yes
>                 Query includes: TOP 2: Yes
>                 Query includes: TOP n: Yes
>                 Query includes: TOP 100 PERCENT: Yes
>                 Query includes TOP: Yes
>
>
Author
6 Apr 2006 6:23 PM
Ian Boyd
None of this is index or statistics related. Rewriting the queries into a
different form causes SQL Server to just "figure it out." It is an optimizer
failing.

>>Question #1: Why doesn't it include the WHERE clause in the remote query?
> No Index on transactionGUID?

At least for this one i have documentation on how it's the optimizer's
fault:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/optimsql/odp_tun_1a_6oxf.asp

aka http://tinyurl.com/ekfah
Read from the 2nd paragraph under "Remote Query Execution"

So that explains Question#1. Can anyone explain questions #2 through #9?

>>Question #2: How is it bringing over 10 rows only? (hint: cursor)
> 10 rows is the size of the chunks retrieved.  The top 1 is in the first
> ten
> rows, so SQL Server doesnt bother retrieving any further data from the
> remote database.

Question #10: Why does it not do that for a TOP 2?

> Index or PK on ...
Indexed, fullscanned.

Question #12: Why does it sometimes work well, and not others, when i don't
change indexes or statistics.

>>Question #5: Why does it not include the WHERE clause in the remote query?
> Entire view is retrieved.

>>Question #6: Why is it now including the TOP inside the remote query, when
>>my earlier issue of a TOP query didn't include the TOP limiter?
>>Question #7: Why is it now including the WHERE clause in the remote query,
>>when not including a TOP 1 it won't include the where clause limiter?
>>Question #8: Why does it include the WHERE clause when i have any TOP
>>limiter (even TOP 999999999, or TOP 100 PERCENT), but does not include the
>>where clause when i don't?
> Top says return only X rows from the view.  Because it needs to order the
> results, a different execution plan is returned.  This will happen with
> local views also, depending on statistics and how they are used.

The "entire view is returned", and yet it can also sometimes work without
returning the entire view.

e.g. Bad
    SELECT * FROM View
    WHERE ...
Good:
    SELECT * FROM (SELECT * FROM View) dt
    WHERE ...

> The local SQL server is rewriting your select to directly access the
> table.
> This particular SQL may look different to a human, but logically, and to
> SQL
> Server, it is identical to your 3rd example.

They are all identical, yet SQL Server performs then differently.

Query#1
SELECT *
FROM SERVERTEST.dbo.CMSArchiveTraining.Transactions_90
WHERE TransactionNumber = 1234

is identical to

Query#2
SELECT *
FROM (
    SELECT *
    FROM SERVERTEST.dbo.CMSArchiveTraining.Transactions_90) AS DerivedTable
WHERE TransactionNumber = 1234

is identical to

CREATE VIEW Transactions AS
SELECT *
FROM SERVERTEST.dbo.CMSArchiveTraining.Transactions_90

Query#3
SELECT *
FROM Transactions
WHERE TransactionNumber = 1234

is identical to

CREATE VIEW Transactions AS
SELECT *
FROM SERVERTEST.dbo.CMSArchiveTraining.Transactions_90

Query#4
SELECT *
FROM (
    SELECT * FROM Tranasctions) AS DerivedTable
WHERE TransactionNumber = 1234

And yet they all perform differently. Some perform exactly as you would
expect, and others are unusable. And this wide variation all happens without
changing anything on the remote server. Indexes are there, statistics are
there (or equivalently indexes are not there, statistics are not there).
It's not like i change anything on the remote server between each query
attempt.

Query 1, 2, 3 and 4 above are all the same. Why do they perform differently?
Can i get pointed to a section in the BOL that explains how to why they
could possibly perform differently. Then, i need the documentation that
explains the right one to use. Generally, any construct involved a
distributed view is bad. You have to rewrite your queries to not us a view,
but you have to queries the remote servers yourself and join the results.

Hence the half-dozen posts i've made on the subject in the last 4 years.
Author
6 Apr 2006 8:05 PM
Conor Cunningham [MS]
Hello Ian.

I'm sorry that you've been having trouble with the distributed query
feature.

I'll attempt to answer some of your questions, but I'll preface this by
saying that the reasons are not easy to explain.

First, SQL 2005 now properly remotes filters over the uniqueidentifier type.
The reasons for this are partially because we did not have statistical
support and partially due to the fact that people weren't generally doing
comparisons over these types when SQL 2000 was shipped.  We've added
statistical support for GUIDs in SQL 2005, and this allows us to make better
choices for remoting and we enabled this functionality.

In regards to some of your other questions, there are sometimes functional
reasons why we can not remote things. (ie we would not return correct
results in all cases if we did remote this element of the query).  This can
cause the query to always be executed locally.

In other cases, we didn't have the resources to get every element remoted
(and tested) before shipping.  I know quite well that this causes customer
pain, and there isn't a better answer on this.

Without debugging, it's hard to make official answers on your other
problems.  I would suspect that the TOP case would retrieve a network packet
full of rows and then perform the TOP if it is not remoted.  This might be
~10 rows in your case.  The interaction of several features is likely the
cause of this, and 10 rows vs. 1 row isn't a huge performance issue if they
always fit into 1 logical network packet.

We wrote a paper that described some of the technicals about how the
distributed query feature is implemented in SQL Server.

Here's a link to it.  I'll warn everyone that it's a research conference
paper, and it doesn't go into the practical details at the level that you
want for specific questions like "why doesn't it remote X?", but it should
give an overview about how the system works and this might help you answer
some of those specific questions you have the next time you see an issue and
want to get more insight into why..

http://citeseer.ist.psu.edu/733756.html

I'll try to post up a link to this on the Optimizer's team blog later today:
http://blogs.msdn.com/QueryOptTeam

I hope that this helps answer some of your questions.

Thanks,

Conor

Show quote
"Ian Boyd" <ian.msnews***@avatopia.com> wrote in message
news:ODrpebaWGHA.4972@TK2MSFTNGP02.phx.gbl...
> None of this is index or statistics related. Rewriting the queries into a
> different form causes SQL Server to just "figure it out." It is an
> optimizer failing.
>
>>>Question #1: Why doesn't it include the WHERE clause in the remote query?
>> No Index on transactionGUID?
>
> At least for this one i have documentation on how it's the optimizer's
> fault:
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/optimsql/odp_tun_1a_6oxf.asp
>
> aka http://tinyurl.com/ekfah
> Read from the 2nd paragraph under "Remote Query Execution"
>
> So that explains Question#1. Can anyone explain questions #2 through #9?
>
>>>Question #2: How is it bringing over 10 rows only? (hint: cursor)
>> 10 rows is the size of the chunks retrieved.  The top 1 is in the first
>> ten
>> rows, so SQL Server doesnt bother retrieving any further data from the
>> remote database.
>
> Question #10: Why does it not do that for a TOP 2?
>
>> Index or PK on ...
> Indexed, fullscanned.
>
> Question #12: Why does it sometimes work well, and not others, when i
> don't change indexes or statistics.
>
>>>Question #5: Why does it not include the WHERE clause in the remote
>>>query?
>> Entire view is retrieved.
>
>>>Question #6: Why is it now including the TOP inside the remote query,
>>>when
>>>my earlier issue of a TOP query didn't include the TOP limiter?
>>>Question #7: Why is it now including the WHERE clause in the remote
>>>query,
>>>when not including a TOP 1 it won't include the where clause limiter?
>>>Question #8: Why does it include the WHERE clause when i have any TOP
>>>limiter (even TOP 999999999, or TOP 100 PERCENT), but does not include
>>>the
>>>where clause when i don't?
>> Top says return only X rows from the view.  Because it needs to order the
>> results, a different execution plan is returned.  This will happen with
>> local views also, depending on statistics and how they are used.
>
> The "entire view is returned", and yet it can also sometimes work without
> returning the entire view.
>
> e.g. Bad
>    SELECT * FROM View
>    WHERE ...
> Good:
>    SELECT * FROM (SELECT * FROM View) dt
>    WHERE ...
>
>> The local SQL server is rewriting your select to directly access the
>> table.
>> This particular SQL may look different to a human, but logically, and to
>> SQL
>> Server, it is identical to your 3rd example.
>
> They are all identical, yet SQL Server performs then differently.
>
> Query#1
> SELECT *
> FROM SERVERTEST.dbo.CMSArchiveTraining.Transactions_90
> WHERE TransactionNumber = 1234
>
> is identical to
>
> Query#2
> SELECT *
> FROM (
>    SELECT *
>    FROM SERVERTEST.dbo.CMSArchiveTraining.Transactions_90) AS DerivedTable
> WHERE TransactionNumber = 1234
>
> is identical to
>
> CREATE VIEW Transactions AS
> SELECT *
> FROM SERVERTEST.dbo.CMSArchiveTraining.Transactions_90
>
> Query#3
> SELECT *
> FROM Transactions
> WHERE TransactionNumber = 1234
>
> is identical to
>
> CREATE VIEW Transactions AS
> SELECT *
> FROM SERVERTEST.dbo.CMSArchiveTraining.Transactions_90
>
> Query#4
> SELECT *
> FROM (
>    SELECT * FROM Tranasctions) AS DerivedTable
> WHERE TransactionNumber = 1234
>
> And yet they all perform differently. Some perform exactly as you would
> expect, and others are unusable. And this wide variation all happens
> without changing anything on the remote server. Indexes are there,
> statistics are there (or equivalently indexes are not there, statistics
> are not there). It's not like i change anything on the remote server
> between each query attempt.
>
> Query 1, 2, 3 and 4 above are all the same. Why do they perform
> differently? Can i get pointed to a section in the BOL that explains how
> to why they could possibly perform differently. Then, i need the
> documentation that explains the right one to use. Generally, any construct
> involved a distributed view is bad. You have to rewrite your queries to
> not us a view, but you have to queries the remote servers yourself and
> join the results.
>
> Hence the half-dozen posts i've made on the subject in the last 4 years.
>
Author
6 Apr 2006 9:08 PM
Ian Boyd
> First, SQL 2005 now properly remotes filters over the uniqueidentifier
> type.

i tested this, and posted my results as a respose to my OP. It doesn't
remotely filter on a uniqueidentifier. i should preface that by saying that
SQL 2005 does not remote it - i don't know if it is important that the
remote server is not SQL 2005 (i wouldn't think it would be important)

> The reasons for this are partially because we did not have statistical
> support and partially due to the fact that people weren't generally doing
> comparisons over these types when SQL 2000 was shipped.  We've added
> statistical support for GUIDs in SQL 2005, and this allows us to make
> better choices for remoting and we enabled this functionality.

Is that why SQL 2005 isn't remotely filtering on uniqueidentifiers, the
remote data source does not have statistics available on a uniqueidentifier
column? Does SQL Server just not collect them, even if you think you have
them? Are statistics important in order to remote a filter?

> In regards to some of your other questions, there are sometimes functional
> reasons why we can not remote things. (ie we would not return correct
> results in all cases if we did remote this element of the query).  This
> can cause the query to always be executed locally.

Would it be possible that any of my examples fit into this category? i
cannot imagine how remoting more of the query can affect the results -
especially since when it does perform remoting - the results are fine.

> In other cases, we didn't have the resources to get every element remoted
> (and tested) before shipping.  I know quite well that this causes customer
> pain, and there isn't a better answer on this.

Remoting is partially done in 2000. i've been fighting with it for a few
years, and it's not done in 2005? For shame.

Show quote
> Without debugging, it's hard to make official answers on your other
> problems.  I would suspect that the TOP case would retrieve a network
> packet full of rows and then perform the TOP if it is not remoted.  This
> might be ~10 rows in your case.  The interaction of several features is
> likely the cause of this, and 10 rows vs. 1 row isn't a huge performance
> issue if they always fit into 1 logical network packet.
>
> We wrote a paper that described some of the technicals about how the
> distributed query feature is implemented in SQL Server.
>
> Here's a link to it.  I'll warn everyone that it's a research conference
> paper, and it doesn't go into the practical details at the level that you
> want for specific questions like "why doesn't it remote X?", but it should
> give an overview about how the system works and this might help you answer
> some of those specific questions you have the next time you see an issue
> and want to get more insight into why..
>
> http://citeseer.ist.psu.edu/733756.html

Any free mirrors?

> I'll try to post up a link to this on the Optimizer's team blog later
> today:
> http://blogs.msdn.com/QueryOptTeam


> I hope that this helps answer some of your questions.

Still left trying to infer things...and decipher SQL's thinking by trial and
error.
Author
6 Apr 2006 7:04 PM
Ian Boyd
OP was for 2000 talking to 2000
Now testing 2005 talking to 2000.

Test#1
> SELECT TransactionNumber
> FROM servertest.CMSArchiveTraining.dbo.Transactions_90
> WHERE transactionGUID = '4956E240-8B6E-437E-B9B0-5E83FC25E3F0'
>
> QUERY:(
>    SELECT Tbl1001."TransactionGUID"
>        Col1003,Tbl1001."TransactionNumber" Col1004
>    FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
> )

2005: Same, fails.


Test#2
> SELECT TOP 1 TransactionNumber
> FROM servertest.CMSArchiveTraining.dbo.Transactions_90
> WHERE transactionGUID = '4956E240-8B6E-437E-B9B0-5E83FC25E3F0'
>
> QUERY:(
>    SELECT
>        Tbl1001."TransactionGUID" Col1003,
>        Tbl1001."TransactionNumber" Col1004
>    FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
> )
> But this time it only brings over 10 rows (somehow), performs a top 1
> filter, and returns instantly.

2005: Same, acceptable fail.


Test#3
> SELECT TransactionNumber
> FROM servertest.CMSArchiveTraining.dbo.Transactions_90
> WHERE TransactionNumber = 9679
>
> QUERY:(
>    SELECT
>        Tbl1001."TransactionNumber" Col1004
>    FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
>    WHERE Tbl1001."TransactionNumber"=(9679)
> )

2005: Same, pass.


Test#4
> SELECT TransactionNumber
> FROM servertest.CMSArchiveTraining.dbo.Transactions_90
> WHERE TransactionDate = '2002-06-10 08:19:10.513'
>
> QUERY:(
>    SELECT
>        Tbl1001."TransactionNumber" Col1004
>    FROM "cmsarchivetraining"."dbo"."Transactions_90" Tbl1001
>    WHERE Tbl1001."TransactionDate"='2002-06-10T08:19:10.513'
> )
Same: Same, pass.


Test#5
> CREATE VIEW CMSArchiveTranasctions AS
> SELECT *
> FROM SERVERTEST.CMSArchiveTraining.dbo.Transactions_90
>
> SELECT TransactionNumber
> FROM CMSArchiveTransactions
> WHERE TransactionNumber = 9679
>
> QUERY:(
>    SELECT
>        Tbl1001."TransactionNumber" Col1005
>    FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
> )

2005: FIXED!

QUERY:(
    SELECT Tbl1001."TransactionNumber" Col1005
    FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
    WHERE Tbl1001."TransactionNumber"=?
)

It parameterized the query, interesting.


Test#6
> SELECT TOP 1 TransactionNumber
> FROM CMSArchiveTransactions
> WHERE TransactionNumber = 9679
>
> QUERY:(
>    SELECT TOP 1 Col1004
>    FROM (
>        SELECT
>            Tbl1001."TransactionNumber" Col1004
>            FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
>            WHERE Tbl1001."TransactionNumber"=(9679)
>    ) Qry1018
> )

2005: Same, pass.


Test#7
> SELECT TransactionNumber
> FROM (
>    SELECT *
>    FROM SERVERTEST.CMSArchiveTraining.dbo.Transactions_90) t
> WHERE TransactionNumber = 9679
>
> QUERY:(
>    SELECT
>        Tbl1001."TransactionNumber" Col1004
>    FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
>    WHERE Tbl1001."TransactionNumber"=(9679)
> )

2005: Same, pass.


Test#8
> SELECT TransactionNumber
> FROM (
>    SELECT *
>    FROM CMSArchiveTransactions) t
> WHERE TransactionNumber = 9679
>
> QUERY:(
>    SELECT Tbl1001."TransactionNumber" Col1004
>    FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
>    WHERE Tbl1001."TransactionNumber"=(9679)
> )

2005: Same, pass.



Fixed: querying a remote server through a view.
Still broken: Filtering by uniqueidentifier (and presumably also bit)
Still hack: Using TOP 1 to select a single row when filtering by
uniqueidentifier.


But the view thing alone is nice. i hope it keeps working.
Author
6 Apr 2006 7:57 PM
ML
Show quote
> Test#5
> > CREATE VIEW CMSArchiveTranasctions AS
> > SELECT *
> > FROM SERVERTEST.CMSArchiveTraining.dbo.Transactions_90
> >
> > SELECT TransactionNumber
> > FROM CMSArchiveTransactions
> > WHERE TransactionNumber = 9679
> >
> > QUERY:(
> >    SELECT
> >        Tbl1001."TransactionNumber" Col1005
> >    FROM "CMSArchiveTraining"."dbo"."Transactions_90" Tbl1001
> > )
>
> 2005: FIXED!

Nice. Have you tried the view with the guid maybe?


ML

---
http://milambda.blogspot.com/
Author
6 Apr 2006 10:34 PM
Ian Boyd
> Nice. Have you tried the view with the guid maybe?

That is an excellent idea.

Unfortunaly, when i try to query the linked server from 2005 all i get is
    "An error occurred while executing batch. Error message is: Internal
connection fatal error."

Yet i can query it fine via a linked server from 2000.

So, if 2005 decides to start working again i will have an answer for you.
Otherwise, we will never know.
Author
6 Apr 2006 11:02 PM
Ian Boyd
At ML's suggestion, i've testing querying by uniqueidentifier from a 2005
machine to 2000. (i.e. 2005 to something remote)

NOTES
1. A "pass" is if the WHERE clause is passed to the remote machine.
2. A "fail" is if the 8 million rows are transferred across the network in
order to find the one matching
3. The view is defined as:
        CREATE VIEW CMSArchiveTransactions AS
        SELECT * FROM servertest.CMSArchiveTraining.dbo.Transactions_90
4. Collation compatable option on the linked server does not affect the
results.
5. It is claimed that 2005 supports remoting of uniqueidentifiers (where
2000 documented that it did not support it).
        http://blogs.msdn.com/QueryOptTeam/


Test#1
SELECT TransactionNumber
FROM servertest.CMSArchiveTraining.dbo.Transactions_90
WHERE TransactionGUID = '3A753496-CEB5-4DF6-8855-833BFDB22094'

Result: FAIL


Test#2
SELECT TOP 1 TransactionNumber
FROM servertest.CMSArchiveTraining.dbo.Transactions_90
WHERE TransactionGUID = '3A753496-CEB5-4DF6-8855-833BFDB22094'

Result: FAIL* (didn't remote filter, but stopped after bringing 9 rows over)

Test#3
SELECT TransactionNumber
FROM (SELECT * FROM servertest.CMSArchiveTraining.dbo.Transactions_90) dt
WHERE TransactionGUID = '3A753496-CEB5-4DF6-8855-833BFDB22094'

Result: FAIL


Test#4
SELECT TransactionNumber
FROM CMSArchiveTransactions
WHERE TransactionGUID = '3A753496-CEB5-4DF6-8855-833BFDB22094'

Result: FAIL


Test#5
SELECT TransactionNumber
FROM (SELECT * FROM CMSArchiveTransactions) dt
WHERE TransactionGUID = '3A753496-CEB5-4DF6-8855-833BFDB22094'

Result: FAIL


Test#6
SELECT TOP 1 TransactionNumber
FROM (SELECT * FROM CMSArchiveTransactions) dt
WHERE TransactionGUID = '3A753496-CEB5-4DF6-8855-833BFDB22094'

Result: FAIL* (didn't remote filter, but stopped after bringing 9 rows over)


SUMMARY
SQL 2005 will not remote uniqueidentifiers. They don't remote queries from
SQL 2005 to SQL 2000 that contain filters comparing columns to constant
uniqueidentifier values as well.
Author
7 Apr 2006 2:09 PM
Ian Boyd
The game is: What should the remote query be?

Get ready. Go!

NOTE: These can be solved by inspection. No knowledge of DDL, indexes or
statistics is required.



Question#1: Querying a linked server
SELECT *
FROM servertest.CMSArchiveTraining.dbo.Transactions_90
WHERE TransactionGUID = '9B2D0E34-F91C-45E8-A904-1BDA0E72802B'

Answer#1:
QUERY:(
    SELECT *
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    WHERE TransactionGUID = '9B2D0E34-F91C-45E8-A904-1BDA0E72802B'
)



Question#2: Querying a linked server through a derived table
SELECT *
FROM (SELECT * FROM servertest.CMSArchiveTraining.dbo.Transactions_90)
CMSArchiveTransactions
WHERE TransactionGUID = '9B2D0E34-F91C-45E8-A904-1BDA0E72802B'

Answer#2:
QUERY:(
    SELECT *
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    WHERE TransactionGUID = '9B2D0E34-F91C-45E8-A904-1BDA0E72802B'
)



Question#3: Querying a linked server through a view
CREATE VIEW CMSArchiveTransactions AS
SELECT *
FROM servertest.CMSArchiveTraining.dbo.Transactions_90

SELECT *
FROM CMSArchiveTransactions
WHERE TransactionGUID = '9B2D0E34-F91C-45E8-A904-1BDA0E72802B'

Answer#3:
QUERY:(
    SELECT *
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    WHERE TransactionGUID = '9B2D0E34-F91C-45E8-A904-1BDA0E72802B'
)



Question#4: Joining to a linked server table
SELECT CMSArchiveTransactions.*
FROM CMSArchiveTransactions t
    INNER JOIN Customers c
    ON t.CustomerID = c.CustomerID

Answer#4:
QUERY:(
    SELECT *
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
)



Question#5: Joining to a linked server with join criteria
SELECT CMSArchiveTransactions.*
FROM CMSArchiveTransactions t
    INNER JOIN Customers c
    ON t.CustomerID = c.CustomerID
WHERE c.CustomerID = 4463

Answer#5:
QUERY:(
    SELECT *
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    WHERE CustomerID = 4463
)



Question#6: Joining to a linked server with join criteria
SELECT CMSArchiveTransactions.*
FROM CMSArchiveTransactions t
    INNER JOIN Customers c
    ON t.CustomerID = c.CustomerID
WHERE t.TransactionAmount > 1000

Answer#6:
QUERY:(
    SELECT *
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    WHERE TransactionAmount > 1000
)



Question#7: Joining to a linked server with join criteria
SELECT CMSArchiveTransactions.*
FROM CMSArchiveTransactions t
    INNER JOIN Customers c
    ON t.CustomerID = c.CustomerID
WHERE t.TransactionAmount > 1000
AND c.CustomerID = 4463

Answer#7:
QUERY:(
    SELECT *
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    WHERE TransactionAmount > 1000
    AND CustomerID = 4463
)



Question#8 Joining to a linked server with join criteria
SELECT c.CustomerName, t.TransactionAmount
FROM CMSArchiveTransactions t
    INNER JOIN Customers c
    ON t.CustomerID = c.CustomerID
WHERE c.CustomerID = 4463

Answer#8:
QUERY:(
    SELECT CustomerID, TransactionAmount
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    WHERE CustomerID = 4463
)



Question#9: Joining to a linked server with join criteria
SELECT c.CustomerName, t.TransactionAmount
FROM CMSArchiveTransactions t
    INNER JOIN Customers c
    ON t.CustomerID = c.CustomerID
WHERE c.CustomerID >= 4000
AND c.CustomerID <= 5000

Answer#9:
QUERY:(
    SELECT CustomerID, TransactionAmount
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    WHERE CustomerID >= 4000
    AND CustomerID <= 5000
)



Question#10: Joining to a linked server with aggregates
ELECT c.CustomerID, c.CustomerName,
LastCustomerTransactions.LastTransactionDate
FROM Customers c
    LEFT JOIN (
        SELECT CustomerID, MAX(TransactionDate) AS LastTransactionDate
        FROM CMSArchiveTransactions
        GROUP BY CustomerID) AS LastCustomerTransactions
    ON c.CustomerID = LastCustomerTransactions.CustomerID

Answer#10:
QUERY:(
    SELECT CustomerID, MAX(TransactionDate) Col1042
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
)


Question#11 Joining to a linked server with aggregates and join criteria
SELECT c.CustomerID, c.CustomerName,
LastCustomerTransactions.LastTransactionDate
FROM Customers c
    LEFT JOIN (
        SELECT CustomerID, MAX(TransactionDate) AS LastTransactionDate
        FROM CMSArchiveTransactions
        GROUP BY CustomerID) AS LastCustomerTransactions
    ON c.CustomerID = LastCustomerTransactions.CustomerID
WHERE c.CustomerID = 4463

Answer#11:
QUERY:(
    SELECT CustomerID, MAX(TransactionDate) Col1042
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    WHERE CustomerID = 4463
)



Question#12: Joining to a linked server with aggregates
SELECT c.CustomerID, c.CustomerName,
LastCustomerTransactions.LastTransactionDate
FROM Customers c
    LEFT JOIN (
        SELECT CustomerID, MAX(TransactionDate) AS LastTransactionDate
        FROM CMSArchiveTransactions
        GROUP BY CustomerID) AS LastCustomerTransactions
    ON c.CustomerID = LastCustomerTransactions.CustomerID
WHERE c.CustomerName = 'Ian Boyd'

Answer#12:
QUERY:(
    SELECT CustomerID, MAX(TransactionDate) Col1042
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
)



Question#13: Joining to a linked server with aggregates
SELECT c.CustomerName, SUM(CMSArchiveTransactions.TransactionAmount)
FROM CMSArchiveTransactions t
    INNER JOIN Customers c
    ON t.CustomerID = c.CustomerID
GROUP BY c.CustomerName

Answer#13:
QUERY:(
    SELECT CustomerID, SUM(TransactionAmount) Col1042
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    GROUP BY CustomerID
)



Question#14: Joining to a linked server with aggregates
SELECT c.CustomerName, SUM(CMSArchiveTransactions.TransactionAmount)
FROM CMSArchiveTransactions t
    INNER JOIN Customers c
    ON t.CustomerID = c.CustomerID
WHERE CustomerName = 'Ian Boyd'
GROUP BY c.CustomerName

Answer#14:
QUERY:(
    SELECT CustomerID, SUM(TransactionAmount) Col1042
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    GROUP BY CustomerID
)



Question#15: Joining to a linked server with aggregates
SELECT c.CustomerName, SUM(CMSArchiveTransactions.TransactionAmount)
FROM CMSArchiveTransactions t
    INNER JOIN Customers c
    ON t.CustomerID = c.CustomerID
WHERE c.CustomerName LIKE '%ian%'
GROUP BY c.CustomerName

Answer#15:
QUERY:(
    SELECT CustomerID, SUM(TransactionAmount) Col1042
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    GROUP BY CustomerID
)



Question#16: Joining to a linked server with aggregates
SELECT c.CustomerName, SUM(CMSArchiveTransactions.TransactionAmount)
FROM CMSArchiveTransactions t
    INNER JOIN Customers c
    ON t.CustomerID = c.CustomerID
WHERE CustomerName LIKE '%ian%'
AND c.CustomerID >= 4000
AND c.CustomerID <= 5000
GROUP BY c.CustomerName

Answer#16:
QUERY:(
    SELECT CustomerID, SUM(TransactionAmount) Col1042
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    WHERE CustomerID >= 4000
    AND CustomerID <= 5000
    GROUP BY CustomerID
)



Question#17: Right join to linked table
SELECT *
FROM Customers c
    RIGHT OUTER JOIN CMSArchiveTransactions t
    ON c.CustomerID = t.CustomerID

Answer#17:
QUERY: (
    SELECT *
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
)

Question#18: Right join to linked table with left table filtering
SELECT *
FROM Customers c
    RIGHT OUTER JOIN CMSArchiveTransactions t
    ON c.CustomerID = t.CustomerID
WHERE c.CustomerID = 4463

Answer#18:
QUERY: (
    SELECT *
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
)

Question#19: Right join to linked table with right table filtering
SELECT *
FROM Customers c
    RIGHT OUTER JOIN CMSArchiveTransactions t
    ON c.CustomerID = t.CustomerID
WHERE t.TransactionAmount >= 1000

Answer#19:
QUERY: (
    SELECT *
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    WHERE TransactionAmount >= 1000
)


Question#20: Right join to linked table with both table filtering
SELECT *
FROM Customers c
    RIGHT OUTER JOIN CMSArchiveTransactions t
    ON c.CustomerID = t.CustomerID
WHERE c.CustomerID = 4463
AND t.TransactionAmount >= 1000

Answer#20:
QUERY: (
    SELECT *
    FROM "CMSArchiveTraining"."dbo"."Transactions_90"
    WHERE TransactionAmount >= 1000
)
Author
7 Apr 2006 2:48 PM
Ian Boyd
Extra credit questions.


Question#21:
CREATE VIEW AllTransactions AS
    SELECT t.*, 'local' AS SourceTable
    FROM LocalTransactions t

    UNION ALL

    SELECT t.*, 'remote' AS SourceTable
    FROM servertest.CMSArchiveTraining.dbo.Transactions_90

SELECT *
FROM AllTransactions
WHERE Source = 'local'

Answer#21
no remote query


Question#22
SELECT *
FROM AllTransactions
WHERE Source <> 'remote'

Answer#22
no remote query


Question#23
SELECT MAX(TranasctionDate)
FROM AllTransactions

Answer#23
QUERY(
    SELECT MAX(TransactionDate) Col1402
    FROM CMSArchiveTraining.dbo.Transactions_90)

Bonus points:
    AGGREGATE
        CONCAT
            LOCALQUERY(
                SELECT MAX(TransactionDate) Col1402
                FROM LocalTransactions)
            REMOTEQUERY(
                SELECT MAX(TransactionDate) Col1402
                FROM CMSArchiveTraining.dbo.Transactions_90)


Question#24
SELECT c.CustomerID, MAX(TransactionDate)
FROM Customers c
    INNER JOIN AllTransactions t
GROUP BY c.CustomerID

Answer#24
QUERY(
    SELECT CustomerID, MAX(TranasctionDate)
    FROM CMSArchiveTraining.dbo.Transactions_90
    GROUP BY CustomerID)

Bonus points:

INNER JOIN
    Customers
    AGGREGATE
        CONCAT
            REMOTEQUERY(
                SELECT CustomerID, MAX(TranasctionDate)
                FROM CMSArchiveTraining.dbo.Transactions_90
                GROUP BY CustomerID)
            LOCALQUERY(
                SELECT CustomerID, MAX(TransactionDate)
                FROM LocalTransactions
            GROUP BY CustomerID)

AddThis Social Bookmark Button