Monday, June 25, 2018

Denormalize SQL Server data

Using FOR XML PATH and STRING_AGG() to denormalize SQL Server data


Problem
Converting a set of normalized data (a one-to-many relationship), and flatten that data into a single, comma-separated string of phone numbers for each user in SQL Server?
Solution

FOR XML PATH and STRING_AGG()

SQL Server has two great methods for grouped concatenation: STRING_AGG()( SQL Server 2017)   and FOR XML PATH. 
Tables and sample data:
CREATE TABLE dbo.Users
(
  UserID int CONSTRAINT PK_Users PRIMARY KEY,
  Name   sysname CONSTRAINT UQ_UserName UNIQUE
);
GO

CREATE TABLE dbo.UserPhones
(
  UserID      int CONSTRAINT FK_UserPhones_Users 
              FOREIGN KEY REFERENCES dbo.Users(UserID),
  PhoneType   varchar(4) NOT NULL,
  PhoneNumber varchar(32) NOT NULL
);
GO

INSERT dbo.Users(UserID, Name) VALUES
(1,N'John Doe'),(2,N'Jane Doe'),(3,N'Anon E. Mouse');
INSERT dbo.UserPhones(UserID, PhoneType, PhoneNumber)
VALUES(1,'Home','123-456-7890'),(1,'Cell','456-789-1234'),
      (2,'Work','345-678-1291'),(2,'Cell','110-335-6677');
GO   
--------------------------------------------------------------------------------------------------
Output
Name              PhoneNumbers
----------------  ------------------------------------
Anon E. Mouse    
Jane Doe          Cell 110-335-6677, Work 345-678-1291
John Doe          Cell 456-789-1234, Home 123-456-7890 

Using FOR XML PATH in  versions earlier than SQL Server 2017 to get comma separated string


SELECT 
  u.Name,
  PhoneNumbers = 
  (
    SELECT ', ' + p.PhoneType + ' ' + p.PhoneNumber
      FROM dbo.UserPhones AS p
      WHERE p.UserID = u.UserID
      FOR XML PATH('')
  )
FROM dbo.Users AS u
ORDER BY u.Name;

Output:
phone numbers
The rows are ordered correctly, but the phone numbers are not listed alphabetically.   NULL value is returned for the first row, whereas the desired result lists that as an empty string.  We’ve concatenated  the results with a leading comma.
STUFF()  is used to replace the first two characters in the concatenated string with an empty string:
SELECT 
  u.Name, 
  PhoneNumbers = STUFF
  (
    (
      SELECT ', ' + p.PhoneType + ' ' + p.PhoneNumber
        FROM dbo.UserPhones AS p 
        WHERE p.UserID = u.UserID
        FOR XML PATH('')
    ), 1, 2, N''
  )
FROM dbo.Users AS u
ORDER BY u.Name;

Output
phone numbers
COALESCE()  to replace that NULL with an empty string:
SELECT 
  u.Name, 
  PhoneNumbers = COALESCE(STUFF
  (
    (
      SELECT ', ' + p.PhoneType + ' ' + p.PhoneNumber
        FROM dbo.UserPhones AS p 
        WHERE p.UserID = u.UserID
        ORDER BY p.PhoneType
        FOR XML PATH('')
    ), 1, 2, N''
  ), N'')
FROM dbo.Users AS u
ORDER BY u.Name;
Output
name
STRING_AGG() WITHIN GROUP()  in SQL Server 2017 or  Azure SQL Database accomplishes the task of   FOR XML PATH,STUFF, COALESCE() 
SELECT 
  u.Name, 
  PhoneNumbers = STRING_AGG(CONCAT(p.PhoneType, ' ', p.PhoneNumber), ', ')
                 WITHIN GROUP (ORDER BY p.PhoneType)
FROM dbo.Users AS u
LEFT OUTER JOIN dbo.UserPhones AS p
ON u.UserID = p.UserID
GROUP BY u.Name
ORDER BY u.Name; 
Output:
phone numbers
Performance wise FOR XML PATH query is much more expensive  than STRING_AGG() WITHIN GROUP() due to both using more CPU and  requiring more I/O:

No comments:

Post a Comment