SQL Server 2016 introduced various JSON capabilities and built in functions—an interesting one is the ability to use T-SQL to query JSON data.
Consider a use case where you have a data structure that you are persisting to the database as JSON by just serializing your models to a string and storing that in an NVARCHAR column.
If for some reason you end up needing to query this data, whether for debugging purposes, or for a feature, you can now do so via T-SQL. Previously, your only programmatic option would be to pull the data into memory, deserialize it, and then query the data. This is potentially expensive, especially if you are dealing with large amounts of data.
Let's say you have a structure as follows:
ArticleDraft maps to a table in the database. ArticleDraftContent has a `content` property with a complex structure. This complex structure has a top-level tags property, and a bunch of other stuff, maybe some complex hierarchy. Rather than representing the whole thing as tables with relationships, we choose to persist it by just serializing it to JSON.
(Some care needs to be taken to handle the versioning of ArticleDraftContent, but that is out of scope for this post.)
The SQL for the ArticleDraft table is:
We can seed it with some sample data as follows:
Now we can query that data using T-SQL.
A normal select returns what you would expect:
The basic function you need to know is OPENJSON(). This allows you to take a json string, and ‘open’ it, mapping it to an object with the schema defined in the WITH clause. You can then query that schema using T-SQL.
A simple example is to just get tags for each draft. First, we CROSS APPLY OPENJSON(draft.Content), which for each row in ArticleDraft will open the Content column. From that we pull out a name, and the complex content (of ArticleDraftContent, see the interface above). Then from that content we pull out the tags.
If we want to find which drafts are tagged with 'arcurve', we just CROSS APPLY again, and pull out each individual tag, and do a normal WHERE:
This returns duplicates, but you can handle that by adding a `GROUP BY`:
Performance considerations must be weighed with the functionality. Some aspects to consider:
- You are trading off JSON de-serialization in the middle tier (your code) against having the database do it for you. Be mindful of the time/space trade-offs and impact on your complete solution.
- NVARCHAR(max) stores data out of the row, which results in additional I/O. Consider using NVARCHAR columns if the JSON string will be predictable in length.
- Some use cases are more suited than others. Storing diagnostic, audit, or metadata information as JSON may be appropriate when it augments core relational data, but if you wind up storing all of your data in JSON format, then you probably shouldn’t be using a relational database.