Apr
18
2008
One of the most important things I am constantly learning is that it is critical to think outside of the box. This ability is unfortunately limited to your own scope of experience and knowledge. That's why working alone is dangerous! I have been there more times than I dare count in my experience... It is always cool to be able to throw in an assist to others and prove what the books have taught you in a real life scenario. Yesterday, I had that great opportunity with a SQL statement that a coworker was struggling with. This guy a slouch by any means of the definition, considering the fact that a Ph.D follows his title will prove that!
To the point, the SQL WHERE IN statement. I have read and been told many times that performance exponentially worsens when the quantity of values in the "IN" statement grow. This statement that was brought to me for assistance was querying data by linking 3 tables together;
Pretty standard setup as far as databases are concerned, mind you this is a shortened version for brevity, the SampleResult table has 15 columns for example. Well, the query was just as "Plain James" as well;
The Sample table consists of over 250,000 records, and the SampleResults table weights in at 1.4 million records. So... How long would it take to run this query? Try 40 seconds... Changing the formatting around to use INNER JOINS made it look clean (IMO) but of course, rendered the exact results, 40 seconds, since it is the same. Creating a view didn't help either. Now, keep in mind, these tables all have nice and appropriate indexes on them with even rebuilding the indexes and statistics, 40 seconds.
Why you ask? Well, the pesky SQL WHERE IN clause, that's why! If you remove some of the values in the IN collection, things speed up, but not by much...
So, basically, we just created a simple table variable, populated it with the same values from our original IN statement.
Then, used an INNER JOIN statement on the @t table object joining on the WellId and bingo, a simple spin on the original query.
A lot more code to set up, but what do you want to guess was the performance boost???
Try 97.5% performance increase (if my math is right... difference/ original). We went from 40 seconds to less than 1 second... As a matter of a fact I could run it twice and still land under a second... If that is not proof of concept and putting what you sit and read on blogs and books into reality, then I don't know what is... As I stated, there is a little bit more code involved, and in the real implementation the values, or WellId's are always dynamic, a little love in an stored procedure and all runs great.
Lesson here... Think alternatively. Additionally, if you get stuck, consult someone else and pick their brain to see if another perception can assist.