Buckle up: time to learn more about SQL Server,
or whatever I'm obsessed with this week.

How to Pass a List of Values Into a Stored Procedure

Say we have a stored procedure that queries the Stack Overflow Users table to find people in a given location. Here’s what the table looks like:

And here’s what my starting stored procedure looks like:

CREATE OR ALTER PROC dbo.usp_SearchUsersByLocation @SearchLocation NVARCHAR(40) AS
  SELECT *
    FROM dbo.Users
    WHERE Location = @SearchLocation
    ORDER BY DisplayName;
GO
EXEC usp_SearchUsersByLocation 'San Diego, CA, USA';

It works:

And the actual execution plan isn’t bad, although it manages to overestimate the number of people who live in San Diego (est 1264 rows for the top right index seek), so it ends up granting too much memory for the query, and gets a yellow bang warning on the SELECT because of that:

No big deal though – it’s fast. But now my users say they wanna find soulmates in MULTIPLE cities, not just one. They want to be able to pass in a pipe-delimited list of locations and search through a few places:

EXEC usp_SearchUsersByLocation 'San Diego, CA, USA|San Francisco, CA|Seattle, WA|Los Angeles, CA';

Method #1, no good:
Joining directly to STRING_SPLIT.

SQL Server 2016 added a very nifty and handy STRING_SPLIT function that lets us parse a string with our delimiter of choice:

CREATE OR ALTER PROC dbo.usp_SearchUsersByLocation @SearchLocation NVARCHAR(MAX) AS
  SELECT *
    FROM dbo.Users
    WHERE Location IN (SELECT value FROM STRING_SPLIT(@SearchLocation,'|'))
    ORDER BY DisplayName;
GO
EXEC usp_SearchUsersByLocation 'San Diego, CA, USA|San Francisco, CA|Seattle, WA|Los Angeles, CA';

The good news is that it compiles and produces accurate results. (Hey, some days, I’ll take any good news that I can get.) The bad news is that the execution plan doesn’t look great:

  1. SQL Server starts by estimating that the STRING_SPLIT will produce 50 values. That’s a hard-coded number that has nothing to do with the actual contents of our @SearchLocation string.
  2. SQL Server estimates that it’s going to find 388 people in our locations – and that also has nothing to do with the contents of our string. Eagle-eyed readers will note that this 388 estimate is lower than the original estimate for San Diego alone!
  3. SQL Server does thousands of key lookups, and this plan gets even worse fast when you use bigger locations. It quickly reads more pages than there are in the table itself.
  4. Because of the original low 388 row estimate, SQL Server didn’t budget enough memory for the sort, which ends up spilling to disk.

The root problem here: STRING_SPLIT doesn’t produce accurate estimates for either the number of rows it returns or their contents.
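You can see that fixed guess for yourself with a quick standalone sketch – no stored procedure required. View the estimated plan for this batch (Ctrl+L in SSMS), and the Table Valued Function operator shows an estimate of 50 rows, no matter how many values the string actually holds:

```sql
-- Minimal demo: STRING_SPLIT's estimate is a hard-coded 50 rows,
-- even though this string only contains 2 delimited values.
DECLARE @SearchLocation NVARCHAR(MAX) = N'San Diego, CA, USA|Seattle, WA';
SELECT value FROM STRING_SPLIT(@SearchLocation, N'|');
```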

Method #2, better:
dump STRING_SPLIT into a temp table first.

This requires a little bit more work at the start of our proc:

CREATE OR ALTER PROC dbo.usp_SearchUsersByLocation @SearchLocation NVARCHAR(MAX) AS
BEGIN
  CREATE TABLE #Locations (Location NVARCHAR(40));
  INSERT INTO #Locations (Location)
    SELECT value FROM STRING_SPLIT(@SearchLocation,'|');

  SELECT *
    FROM dbo.Users
    WHERE Location IN (SELECT Location FROM #Locations)
    ORDER BY DisplayName;
END
GO
EXEC usp_SearchUsersByLocation 'San Diego, CA, USA|San Francisco, CA|Seattle, WA|Los Angeles, CA';

By dumping the string’s contents into a temp table, SQL Server can then generate statistics on that temp table, and use those statistics to help it better estimate the number of rows it’ll find in the various cities. Here’s the actual plan:

Now, the bad 50-row estimate for STRING_SPLIT has a small blast radius: the only query impacted is the insert into the temp table, which isn’t a big deal. Then when it’s time to estimate rows for the index seek, they’re much more accurate – within about 5X – and now the Sort operator has enough memory budgeted to avoid spilling to disk.

Method #3, terrible:
dump STRING_SPLIT into a table variable.

I know somebody’s gonna ask, so I have to do it:

CREATE OR ALTER PROC dbo.usp_SearchUsersByLocation @SearchLocation NVARCHAR(MAX) AS
BEGIN
  DECLARE @Locations TABLE (Location NVARCHAR(40));
  INSERT INTO @Locations (Location)
    SELECT value FROM STRING_SPLIT(@SearchLocation,'|');

  SELECT *
    FROM dbo.Users
    WHERE Location IN (SELECT Location FROM @Locations)
    ORDER BY DisplayName;
END
GO
EXEC usp_SearchUsersByLocation 'San Diego, CA, USA|San Francisco, CA|Seattle, WA|Los Angeles, CA';

In SQL Server 2017 & prior, the query plan on this manages to be even worse than calling STRING_SPLIT directly:

SQL Server now only estimates that 1 row is coming out of the table variable, so now its estimate on the number of users it’ll find is down to just 55, and the sort spills even more pages to disk.

Thankfully, as you may have heard, SQL Server 2019 fixes this problem. I’ll switch to 2019 compatibility level, and here’s the actual plan:
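Switching is a one-liner – sketched here against a hypothetical StackOverflow database name. Compatibility level 150 corresponds to SQL Server 2019, which is what turns on table variable deferred compilation:

```sql
-- Compat level 150 = SQL Server 2019, enabling table variable
-- deferred compilation. The database name is just an example.
ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 150;
```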

Okay, well, as it turns out, no bueno. We just get a 4 row estimate instead of 1 for the table variable, and we still only get 110 estimated users in those cities. Table variables: still a hot mess in SQL Server 2019.

The winner: pass in a string, but when your proc starts, split the contents into a temp table first, and use that through the rest of the query.

This week's sponsor: Don't be alone this Valentine's Day. Cozy up with this free database performance monitoring tool.


 

Agree? Disagree? Leave a comment.

 
Copyright © 2020 Brent Ozar Unlimited®, All rights reserved.