By Susantha Bathige,
Understanding INNER join in detail.
The objective of this article is dig into detail of how INNER joins produce the result set. The INNER join is the common join type used in many cases. Even though it is used very often I have seen most people are not certain how it produces the result set when changing the join condition.
Before go into the article contents, I need to create the two tables below to illustrate the examples used here.
Table 1
CREATE TABLE #InnerJoinTest1
(
InnerJoinTest1_PK INT IDENTITY(1,1)
)
GO
Table 2
CREATE TABLE #InnerJoinTest2
(
InnerJoinTest2_PK int
)
GO
Since the tables above are created in tempdb, the database currently used in your query editor will not matter. Then I use the queries below to populate the created tables with some sample data.
To keep the explanation easy I used only 5 records in each table. However once you understand the theory behind it you can map it to larger tables as well. The concept is similar despite of the number records in the tables.
Populate table 1
WHILE SCOPE_IDENTITY()<5 OR SCOPE_IDENTITY() IS NULL
INSERT INTO #InnerJoinTest1
DEFAULT VALUES
Populate table 2
INSERT INTO #InnerJoinTest2
SELECT * FROM #InnerJoinTest1
Once you have executed all the queries above successfully, everything is set for our discussion.
The Scenario
I will present you all the queries which involve an INNER JOIN in advance so that you can manipulate and decide how many records each query will return. Finally you can compare your answers with the result set that actual query execution produces by the SQL Server.
Query 1
SELECT a.*,b.* FROM #InnerJoinTest1 a
INNER JOIN #InnerJoinTest2 b
ON a.InnerJoinTest1_PK=b.InnerJoinTest2_PK
Query 2
SELECT a.*,b.* FROM #InnerJoinTest1 a
INNER JOIN #InnerJoinTest2 b
ON a.InnerJoinTest1_PK>b.InnerJoinTest2_PK
Query 3
SELECT a.*,b.* FROM #InnerJoinTest1 a
INNER JOIN #InnerJoinTest2 b
ON a.InnerJoinTest1_PK>=b.InnerJoinTest2_PK
Query 4
SELECT b.*,a.* FROM #InnerJoinTest1 a
INNER JOIN #InnerJoinTest2 b
ON a.InnerJoinTest1_PK<b.InnerJoinTest2_PK
ORDER BY a.InnerJoinTest1_PK
Query 5
SELECT b.*,a.* FROM #InnerJoinTest1 a
INNER JOIN #InnerJoinTest2 b
ON a.InnerJoinTest1_PK<=b.InnerJoinTest2_PK
ORDER BY a.InnerJoinTest1_PK
Query 6
SELECT a.*,b.* FROM #InnerJoinTest1 a
INNER JOIN #InnerJoinTest2 b
ON a.InnerJoinTest1_PK<>b.InnerJoinTest2_PK
Take a piece of paper and manually determine the result set of each query will return.
Cartesian Product
First of all, I would like to mention that I will consider table 1 as set 1 and table 2 as set 2. (according to the set theory).
Let's produce the Cartesian product of the two sets. The Cartesian product of the two sets is given below.
You can produce the Cartesian product when you join every row of one table to every row of another table. You can also get one by joining every row of a table to every row of itself.
It is very important to understand how the result set is produced for the Cartesian product from the queries above. The Cartesian product will give you the maximum possible number of combinations that can be created from the given two sets. The number of records in the Cartesian product can be computed by using the following formula.
Number of records = No.of records of table 1 X No.of record of table2
In our case it is 25. (5x5)
Rationalization
Let's look at the first query. This kind of INNER JOIN is the most popular and often you may be able to determine the result set it produces. This INNER JOIN with the equal condition will give you the matching records of both sets. If you look at the above Cartesian product, how many matching records you will be able to find?
I have highlighted the matching instances in the above Cartesian product. So that the first query will give you the highlighted records as the output.
Then consider the second query.
In this case you need to find all the possible record combinations where the first table value is greater than the second table values.
the image below shows the Cartesian product and I have highlighted the record combination for query 2.
Execute query 2 and compare the result set with the highlighted records shown above. The actual query execution result should match with the above highlighted result set.
Applying the same theory for rest of the queries, you should be able to figure out the result set of each one.
Let's take query 3. Again I'm using the Cartesian product to determine the result set. This query will produce the records where InnerJoinTest1_PK is greater than or equal to the InnerJoinTest2_PK.
It is quite easier now to determine the result set if you consider the Cartesian product of two tables.
Let's look at the next two queries. (Query 4 &5) The Cartesian product below is shown the result set of both queries 4 and 5.
The green highlighted records are the output of query 4 where as query 5 will give you the combination of green highlighted and yellow highlighted records. Because the difference of query 4 and 5 is the '=' operator. So that the query 5 will output equal values of both columns in addition to the output of the query 4. The query 4 will produce the records where InnerJoinTest1_PK is less than the InnerJoinTest2_PK of the table 2.
The order of results from the actual execution of this query may be different with the Cartesian product. But it is not really a matter for our discussion. You use simply an ORDER BY clause to order the result set as you desire.
Finally consider the last query. (Query 6) This query will output the records where both columns are not equal. Again look at out most important Cartesian product shown below.
The green highlighted records show where both columns are not equal. Theoretically this result set is similar to Cartesian product - intersection. The intersection is the INNER JOIN result with equal operator. (Output of the query 1)
source http://www.sqlservercentral.com/articles/T-SQL/63322/
Kamis, Juli 10, 2008
Understanding INNER join in detail
Selasa, Juli 08, 2008
PIVOT working incorrectly?
By Ryan Clare
In some situations there may be a reason to total data alongside a column you want to pivot. In this example we will get a grand total of saverPoints that each salesperson earns with their sales as well as separating the number of sales by month.
create table #tmpPivotTest
(salesPersonsName varchar(25),
numberOfSales int,
monthAbv varchar(3),
saverPoints int)
insert into #tmpPivotTest values ('Yagnesh', 1, 'Jan', 1)
insert into #tmpPivotTest values ('Yagnesh', 2, 'Feb', 2)
insert into #tmpPivotTest values ('Yagnesh', 2, 'Feb', 2)
insert into #tmpPivotTest values ('Yagnesh', 2, 'Jan', 2)
insert into #tmpPivotTest values ('Jason', 3, 'Feb', 2)
insert into #tmpPivotTest values ('Jason', 1, 'Feb', 1)
--Shows how many sales each sales person gets per month
select salesPersonsName, [Jan], [Feb] from
(select salesPersonsName, numberOfSales, monthAbv from #tmpPivotTest) as SourceTable
PIVOT (sum(numberOfSales) for monthAbv in ([Jan],[Feb]) )as PivotTable
salesPersonsName Jan Feb
------------------------- ----------- -----------
Jason NULL 4
Yagnesh 3 4
This basic pivot works flawlessly. When we try to add another column to the select we don't get everything.
--Try to get the saver points as well
select salesPersonsName, [Jan], [Feb], saverPoints from
(select * from #tmpPivotTest) as SourceTable
PIVOT (sum(numberOfSales) for monthAbv in ([Jan], [Feb]) )as PivotTable
salesPersonsName Jan Feb saverPoints
------------------------- ----------- ----------- -----------
Jason NULL 1 1
Yagnesh 1 NULL 1
Jason NULL 3 2
Yagnesh 2 4 2
We are missing some of the saver points in here and it splits the sales person with each distinct value of saverPoints. For some reason pivot isn't working correctly! It's actually losing data! Lets try to sum the values together again...
select salesPersonsName, sum([Jan]) as Jan, sum([Feb]) as Feb, sum(saverPoints) as saverPoints from
(select * from #tmpPivotTest) as SourceTable
PIVOT (sum(numberOfSales) for monthAbv in ([Jan], [Feb]) )as PivotTable
group by salesPersonsName
select salesPersonsName, sum(saverPoints) as saverPoints from #tmpPivotTest group by salesPersonsName
salesPersonsName Jan Feb saverPoints
------------------------- ----------- ----------- -----------
Jason NULL 4 3
Yagnesh 3 4 3
salesPersonsName saverPoints
------------------------- -----------
Jason 3
Yagnesh 7
The saver points are still missing. The saver points are ignored since they are the same on both rows. If we change the rows ever so slightly we can get the correct answer.
select salesPersonsName, sum([Jan]) as Jan, sum([Feb]) as Feb, sum(saverPoints) as saverPoints from
(select *, newid() as superDecombobulatorV2
from #tmpPivotTest) as SourceTable
PIVOT (sum(numberOfSales) for monthAbv in ([Jan], [Feb]) )as PivotTable
group by salesPersonsName
select salesPersonsName, sum(saverPoints) as saverPoints from #tmpPivotTest group by salesPersonsName
salesPersonsName Jan Feb saverPoints
------------------------- ----------- ----------- -----------
Jason NULL 4 3
Yagnesh 3 4 7
salesPersonsName saverPoints
------------------------- -----------
Jason 3
Yagnesh 7
After being frustrated with pivot not working "correctly" and finally looking at books online with a cool head. We find that it is working according to the documentation in books online. From books online:
The following is annotated syntax for PIVOT.
SELECT <non-pivoted column> ,
[first pivoted column] AS <column name> ,
[second pivoted column] AS <column name> ,
...
[last pivoted column] AS <column name>
FROM
( <SELECT query that produces the data> )
AS <alias for the source query>
PIVOT
(
<aggregation function>( <column being aggregated> )
FOR
[<column that contains the values that will become column headers>]
IN ( [first pivoted column] , [second pivoted column] ,
... [last pivoted column] )
) AS <alias for the pivot table>
<optional ORDER BY clause>
Notice it only has one non-pivoted column and each value of the pivoted column listed. If I'd of had been level headed instead of trying to force pivot to work I would have just done two queries. One to get the pivoted data into a temp table, and the other to get the summed data and update the temp table. So let the lesson be learned. You can sometimes force your way through "incorrect" things in SQL Server, but it may just be easier to calm down and read the documentation.
PS: Don't forget to clean up
drop table #tmpPivotTest
source : http://www.sqlservercentral.com/articles/pivot/62808/