I will continue the common topic in SQL, string concatenation, and show how it can be handled in SQL Server. The application developer will obviously be tempted to see things in a loop and try to divide the logic row by row. After all these years with so many nice features and enhancements to allow a more suitable approach in the database, many developers are still tempted to go down at row level and open loops, transfer data in variables, take things slowly, slowly. String concatenation is still a burden for SQL, in the sense that programmers are still tempted to loop through tables and to gather strings one each other according to the required logic. These kinds of solutions are still, unfortunately, the default solutions to be used by most of the developers.
I explained how it works in Oracle and illustrate the two alternates approaches, the first one where the programmer is opening a cursor (or two) and transfer data in a row by row approach, finally getting the desired results. This was the atomic approach. I also explained the opposite approach, by which the programmer see things holistically and try to handle the string concatenation directly in a set based manner. In Oracle, there is a function named LISTAGG that allows this manipulation of string concatenation directly, without the need to follow the traditional approach of cursor, loop, transfer, loop again etc.
I want to explain the same exercise as before but in SQL Server. Despite the fact that Transact SQL is a declarative language while PL SQL is a procedural language, the style of development can be very similar. Looking at the same exercise with countries and continents we can easily see how the programmer can solve the problem trying to think non SQL and going down to the row or trying to think SQL (and holistically) and using the set based functionalities that the system offer.
Let’s take the same example that will illustrate the problem of string concatenation. For that, we imagine a simple database called geo with two tables: continents and countries. Apart from the geo tables, imagine a result table called List_Of_Countries.
Here is the the design below:
IF OBJECT_ID(N'Geo_Continents', N'U') IS NULL BEGIN CREATE TABLE Geo_Continents ( Id INT NOT NULL, Continent NVARCHAR(50) NOT NULL ); ALTER TABLE Geo_Continents ADD CONSTRAINT Pk_Geo_Continents_Id PRIMARY KEY (Id); END IF OBJECT_ID(N'Geo_Countries', N'U') IS NULL BEGIN CREATE TABLE Geo_Countries ( Id INT NOT NULL, Code NVARCHAR(2) NOT NULL, Name NVARCHAR(50) NOT NULL, Continent_Id INT NOT NULL ); ALTER TABLE Geo_Countries ADD CONSTRAINT Pk_Geo_Countries_Id PRIMARY KEY (Id); ALTER TABLE Geo_Countries ADD CONSTRAINT Uq_Geo_Countries_Id UNIQUE (Code); ALTER TABLE Geo_Countries ADD CONSTRAINT Fk_Countries_Continents FOREIGN KEY (Continent_Id) REFERENCES Geo_Continents (Id); END IF OBJECT_ID(N'List_Of_Countries', N'U') IS NULL BEGIN CREATE TABLE List_Of_Countries (Continent NVARCHAR(50), List_Of_Countries NVARCHAR(1000)); END
The tables contain the list of continents as well as some of the countries. Add the continents and countries here (see below).
INSERT INTO Geo_Continents (Id, Continent) VALUES (1, 'Europe'); INSERT INTO Geo_Continents (Id, Continent) VALUES (2, 'North America'); INSERT INTO Geo_Continents (Id, Continent) VALUES (3, 'South America'); INSERT INTO Geo_Continents (Id, Continent) VALUES (4, 'Central America'); INSERT INTO Geo_Continents (Id, Continent) VALUES (5, 'Africa'); INSERT INTO Geo_Continents (Id, Continent) VALUES (6, 'Asia'); INSERT INTO Geo_Continents (Id, Continent) VALUES (7, 'Australia'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (1, 'AR', 'Argentina', '3'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (2, 'AU', 'Australia', '7'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (3, 'BE', 'Belgium', '1'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (4, 'BR', 'Brazil', '3'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (5, 'CA', 'Canada', '2'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (6, 'CH', 'Switzerland', '1'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (7, 'CN', 'China', '6'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (8, 'DE', 'Germany', '1'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (9, 'DK', 'Denmark', '1'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (10, 'EG', 'Egypt', '5'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (11, 'FR', 'France', '1'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (12, 'IN', 'India', '6'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (13, 'IT', 'Italy', '1'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (14, 'JP', 'Japan', '6'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (15, 'RO', 'Romania', '1'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (16, 'ML', 'Malaysia', '3'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (17, 'MX', 'Mexico', '4'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (18, 'NG', 'Nigeria', '5'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (19, 'NL', 'Netherlands', '1'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (20, 'SG', 'Singapore', '6'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (21, 'UK', 'United Kingdom', '1'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (22, 'US', 'United States of America', '2'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (23, 'ZM', 'Zambia', '5'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (24, 'LU', 'Luxembourg', '1'); INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (25, 'KR', 'South Korea', '6');
Now we have our data! Let’s take a look at our data, running a simple query. See here the data
SELECT co.Continent, cn.Code AS Country_Code, cn.Name AS Country_Name, cn.Continent_id FROM Geo_Continents co INNER JOIN Geo_Countries cn ON co.Id = cn.Continent_Id ORDER BY co.Continent,cn.Code;
See the geo dataset below:

The goal of the exercise is to concatenate the countries codes for a continent and move every continent with the list of countries separated by a comma into the result table List_Of_Countries.
So, for example, for Africa, we should have EG, NG, ZM. For Asia we should have CN, IN, JP, KR, SG etc.
Let’s try the classic manner of work, mainly specific to application developers. Look at Oracle version and check the SQL Server version. See the similarities between them. Please also see the style of development, and see the fact that the developer simply ignores the fact that he/she is in a relational database and tries to go down at the row, because of the lack of tools, or better said, due to the lack of knowledge of the tools that an engine like Oracle or SQL Server can offer. The programmer follows the classical approach.
- The classic (row, atomic, procedural approach)
The same tendency as in Oracle can be found in SQL Server , the tendency to think procedurally and atomically. The programmer is picking every continent in a cursor and transfer the continent, eventually the continent identifier. Being inside the cursor with continents, the programmer opens the second cursor, and chooses every country code for the selected continent. Afterwards, the developer concatenates one by one the country codes in a result string. Finally, the list of concatenated country codes are inserted along with the continent in the result table.
So, to conclude before viewing the traditional, atomic (row based) solution.
- Loop through continents, firstly, and take the continent into a variable.
- Loop second time through the countries, per selected continents, and concatenate the country codes. We also add commas and exclude the last one!
- Inserting the continent and the list of country codes into the result table (for every continent).
Here is the exercise:
DECLARE @v_Continent_Id INT, @v_Country_Code NVARCHAR(2), @v_Continent NVARCHAR(50), @v_Count INT, @v_Sum_Of_Countries NVARCHAR(1000); DECLARE C_Continent CURSOR FOR SELECT DISTINCT cn.Continent_Id, co.Continent FROM Geo_Continents co INNER JOIN Geo_Countries cn ON co.Id = cn.Continent_Id ORDER BY 1; BEGIN TRUNCATE TABLE List_Of_Countries; OPEN C_Continent; FETCH NEXT FROM C_Continent INTO @v_Continent_Id, @v_Continent; WHILE @@FETCH_STATUS = 0 BEGIN SELECT @v_Sum_Of_Countries = ''; SELECT @v_Count = COUNT(*) FROM Geo_Countries WHERE Continent_Id = @v_Continent_Id; IF @v_Count = 1 SELECT @v_Sum_Of_Countries = Code FROM Geo_Countries WHERE Continent_Id = @v_Continent_Id; ELSE BEGIN DECLARE C_Countries CURSOR FOR SELECT Code AS Country_Code FROM Geo_Countries WHERE Continent_Id = @v_Continent_Id ORDER BY 1; OPEN C_Countries; FETCH NEXT FROM C_Countries INTO @v_Country_Code; WHILE @@FETCH_STATUS = 0 BEGIN SELECT @v_Sum_Of_Countries = @v_Sum_Of_Countries + @v_Country_Code + ', '; FETCH NEXT FROM C_Countries INTO @v_Country_Code; END; CLOSE C_Countries; DEALLOCATE C_Countries; SELECT @v_Sum_Of_Countries = SUBSTRING(@v_Sum_Of_Countries, 1, LEN(@v_Sum_Of_Countries) - 1); END; INSERT INTO List_Of_Countries (Continent, List_Of_Countries) VALUES (@v_Continent, @v_Sum_Of_Countries); FETCH NEXT FROM C_Continent INTO @v_Continent_Id, @v_Continent; END; CLOSE C_Continent; DEALLOCATE C_Continent; SELECT * FROM List_Of_Countries ORDER BY Continent; END; GO
om the differences in the syntax and particularities of PL SQL against Transact SQL, it is easy to see the style of development. It is the same. The programmer can be inside a relational database and totally ignore that! This is what is happening now, for this exercise.
The flow is quite straight forward: take the continents, move the content in variables, one by one. Lucky us we have just a limited number of continents, so the loop is small! Afterwards, per every continent, take the countries, again one by one, and start the process of concatenation. Dedicate a result string for the result (per continent), populate that result with the countries and add a comma. The same principle of work, very intuitive and classic, but very expensive in the same time and large in terms of lines of code. However, if is to consider a serious issue, performance is clearly the first one and readability it follows, but at a great distance. I will come back to this later.
Let’s see the result after the first execution, using the atomic and procedural approach.

We obtained the desired result but with what cost? Of course, having such few data like in our example will not impact performance but imagine an average quantity of data and looping twice through it!
2. The set based approach for SQL Server, two possible solutions
As discussed earlier when showing the Oracle way of concatenate strings, by using the dedicated function LISTAGG for this purpose (of string concatenation), in SQL Server things are more or less similar. The same tendencies, for allowing more and more functionalities that allow the set based way, SQL Server is not an exception.
There is an equivalent of LISTAGG in Oracle, the function STRING_AGG. Even etymologically, the meaning is obvious. But, before showing this facility, I want to explain that this function is quite new. IT started with SQL Server 2017 (https://docs.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-2017).
For string concatenation, the function STRING_AGG is perfect. The definition is the same as LISTAGG function in Oracle. It is used to aggregate strings (eventually from data in columns in a database table). Same as LISTAGG, this function allows direct strings concatenation without the need for any loop or cursor or other procedural facility. Let’s see the second solution, that I call the holistic solution offered by this new SQL Server function.
DELETE List_Of_Countries; INSERT INTO List_Of_Countries (Continent, List_Of_Countries) SELECT co.Continent, STRING_AGG(cn.Code, ', ') WITHIN GROUP (ORDER BY cn.Code ASC) AS List_Of_Countries FROM Geo_Continents co INNER JOIN Geo_Countries cn ON co.Id = cn.Continent_Id GROUP BY co.Continent; SELECT * FROM List_Of_Countries ORDER BY Continent;
First of all, after executing the script above, you can easily see that the result table is the same for both approaches, so the string_agg function is able to do in one simple SQL statement what the complex logic above was doing in a quite complex logic (opening two cursors and associated loops, transferring variables, concatenating etc.).
Of course, there are a lot of examples over the Internet with STRING_AGG. We can easily go and check for others examples. What it really matters is the will to write in a certain way in a relational database and to unknowledge the paradigm. A 7 years old boy knows how to google nowadays, so trying to find out one function or another is the easy part of the problem. Being aware of a certain style of writing in the relational database is something beyond a simple search. Trying to gain a better performance by writing holistically and trying to adapt the paradigm in a relational database is something that can be achieved by any developer, if he/she realizes the importance of this paradigm.
However, the STRING_AGG is a brand new function. What about systems written in 2014, or 2012 or even earlier? Is there any option for a set based approach? There is another way of concatenating strings, the use of FOR XML PATH (https://docs.microsoft.com/en-us/sql/relational-databases/xml/for-xml-sql-server?view=sql-server-2017)
The same exercise can be done using this syntax. See the logic below.
DELETE List_Of_Countries; WITH Countries (Continent, Country_Code) AS ( SELECT co.Continent, cn.code AS Country_Code FROM Geo_Continents co INNER JOIN Geo_Countries cn ON co.Id = cn.Continent_Id ), Continents (Continent, List_Of_Countries) AS ( SELECT cc.Continent, (SELECT cn.Country_Code + ', ' FROM Countries cn WHERE cn.Continent = cc.Continent ORDER BY cn.Country_Code FOR XML PATH('')) AS List_Of_Countries FROM Countries cc GROUP BY cc.Continent ) INSERT INTO List_Of_Countries (Continent, List_Of_Countries) SELECT Continent, SUBSTRING(List_Of_Countries, 1, LEN(List_Of_Countries) - 1) AS List_Of_Countries FROM Continents; SELECT * FROM List_Of_Countries ORDER BY Continent; DELETE List_Of_Countries; WITH Countries (Continent, Country_Code) AS ( SELECT co.Continent, cn.code AS Country_Code FROM Geo_Continents co INNER JOIN Geo_Countries cn ON co.Id = cn.Continent_Id ) INSERT INTO List_Of_Countries (Continent, List_Of_Countries) SELECT cc.Continent, STUFF((SELECT ', ' + cn.Country_Code FROM Countries cn WHERE cn.Continent = cc.Continent ORDER BY cn.Country_Code FOR XML PATH('')), 1, 1,'') AS List_Of_Countries FROM Countries cc GROUP BY cc.Continent; SELECT * FROM List_Of_Countries ORDER BY Continent;
Obviously, the use of STRING_AGG seems much simpler as it actually is. However, the use of FOR XML PATH is also holistic and allow the same results. This solution is available in older versions of SQL Server and there are many examples over the internet that illustrate the string concatenation. I showed two ways of filling the result table, one with two with blocks and one with one and with the use of stuff function just to add a bit of variety, but this is not necessarily very relevant to the topic.
3. Conclusions
Nowadays, the vendors are increasing the number of feature like this. However, there is not enough argumentation in the favor of that style of programming that is using these features and is using the strength of SQL. The features exists and their number is increasing. However, programmers needs to be aware of their existence and need to know how to use these.
See the simplicity of the code in the set based approach, how simple the SQL statement is compared with the complex procedural code in the first example! Increase the quantity of data and see the difference in performance, it is huge! As I mentioned earlier, if the simplicity of the code in the set based approach is, let’s say, an esthetic and subjective argument (I saw application developers that really prefer the first approach as being intelligible for themselves); we cannot say the same about the performance! There is a significant difference in performance between the two approaches.
To conclude, the same problem, of string concatenation, can be solved in many ways. There are now more and more facilities that allow set based and holistic approach, the SQL ways. And this is very good for performance, firstly. However, readability is also a relevant matter. Apart from being aware of these facilities, we also need to be aware to adapt our style of programming and use the correct manner when developing in a relational database.


Leave A Comment