Buckle up: time to learn more about SQL Server,
or whatever I'm obsessed with this week.

Updated Stack Overflow Public Data Set for June 2019

Taryn and the kind folks at Stack Overflow have updated their public XML data dump for June, so I’ve imported that into an updated sample database for your blogging and presenting satisfaction.

You can download the 40GB torrent (magnet) and it expands to a ~350GB SQL Server 2008 database. Because it’s so large, we only distribute it with BitTorrent – if you’re new to that, here are more detailed instructions.

Fun facts about this month’s release:

  • The Votes table is up to 172,502,324 rows, but only takes 6.2GB of space (since it’s fairly narrow).
  • The PostHistory table, on the other hand, only has 118,390,637 rows, but consumes 196GB (185GB of which is off-row text data).
  • The Users table finally broke 8 digits: it’s got 10,528,666 rows, and is still a nice tidy 1.3GB (it’s wide, but most people don’t populate much in the text fields like Location, WebsiteUrl, and AboutMe).
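
If you want to see just how narrow or wide those tables are, here's a rough back-of-the-envelope sketch in Python that divides each table's size by its row count. The row counts and sizes are the ones quoted above; treating a GB as 1024³ bytes is my assumption, and the helper name avg_bytes_per_row is made up for illustration. The sizes are rounded, so take the results as ballpark figures only.

```python
# Row counts and sizes as quoted in the list above.
# Assumption: "GB" means 1024**3 bytes; the quoted sizes are rounded.
BYTES_PER_GB = 1024 ** 3

TABLES = {
    "Votes": (172_502_324, 6.2),
    "PostHistory": (118_390_637, 196.0),
    "Users": (10_528_666, 1.3),
}


def avg_bytes_per_row(rows, size_gb):
    """Return the approximate average number of bytes stored per row."""
    return size_gb * BYTES_PER_GB / rows


averages = {
    name: avg_bytes_per_row(rows, size_gb)
    for name, (rows, size_gb) in TABLES.items()
}

for name, avg in averages.items():
    print(f"{name}: ~{avg:,.0f} bytes per row")
```

The numbers land in very different places: Votes works out to a few dozen bytes per row, Users to a bit over a hundred, and PostHistory to well over a thousand, which lines up with most of its space being off-row text.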

I’m torn about whether I’ll distribute the next one in SQL Server 2008 format, or start using SQL Server 2012. The VM I use to build the database has 2008, so it’s not like it costs me extra work to continue using 2008. Plus, you can still attach this in SQL Server 2019 – gotta love how robust SQL Server’s file handling is. Is there a reason I should change to distributing the next one in 2012 format instead?

This week's sponsor: Free forever SQL Server monitoring with Spotlight Cloud.

June 8: SQLSat South Florida
June 11-13, Orlando: SQL Intersection
June 17-19, online: Mastering Index Tuning
August 1, online, new class: Fundamentals of Query Tuning
Copyright © 2019 Brent Ozar Unlimited®, All rights reserved.