Like in mathematics, SQL exercises can accept many solutions for the same problem. Let’s imagine we want to calculate the sum of first 4 numbers (1+2+3+4). We can add numbers consecutively like 1 plus 2 equals 3, then  3 plus 3 equals 6, and 6 plus 4 equals 10. Or we can apply the inductive formula (n * (n+1)/2) that means 4*5/2 equals 10. Of course, mathematics is not SQL but analogies may happen. After all, SQL itself is more or less a practical exemplification of relational algebra.

So, in the example above, we can consider a loop when summarizing the first n natural numbers and add number by number in that loop until we gained the final result. Loops are one of the common methods in any programming language. On the other hand, one disadvantage of loops is, as we all know, performance, as loops are quite expensive and is better to avoid them, whenever we find better ways to handle things.  In my experience, whenever I was able to eliminate some loops I was happier than before!

I would like to cover a common topic in SQL, string concatenation. Until recently, string concatenation was a burden for SQL, in the sense that programmers were forced to loop through tables and to link string one each other according to the required logic. These kinds of solutions are still very common and the world is still full of databases and logics where the problems are solved in this manner: cursors, loops, switch to variables, concatenate in intermediate strings etc.

Let’s take the example that will illustrate the problem of string concatenation. For that, we imagine a simple database called geo with two tables: continents and countries. Apart from the geo tables, imagine a result table called List_Of_Countries.

So, let’s see the design below:

CREATE TABLE Geo_Continents
(
    Id                      INT NOT NULL,
    Continent               VARCHAR(50) NOT NULL    
);

ALTER TABLE Geo_Continents ADD CONSTRAINT Pk_Geo_Continents_Id PRIMARY KEY (Id);

CREATE TABLE Geo_Countries
(
    Id                      INT NOT NULL, 
    Code                    VARCHAR(2) NOT NULL,
    Name                    VARCHAR(50) NOT NULL,
    Continent_Id            INT NOT NULL   
);

ALTER TABLE Geo_Countries ADD CONSTRAINT Pk_Geo_Countries_Id PRIMARY KEY (Id);

ALTER TABLE Geo_Countries ADD CONSTRAINT Uq_Geo_Countries_Id UNIQUE (Code);

ALTER TABLE Geo_Countries ADD CONSTRAINT Fk_Countries_Continents FOREIGN KEY (Continent_Id) REFERENCES Geo_Continents (Id);

DROP TABLE List_Of_Countries;

CREATE TABLE List_Of_Countries (Continent VARCHAR(50), List_Of_Countries VARCHAR(1000));

The tables contain the list of continents as well as some of the countries (for simplicity, just a reduced number of countries). Add the continents and countries here.

INSERT INTO Geo_Continents (Id, Continent) VALUES (1, 'Europe');
INSERT INTO Geo_Continents (Id, Continent) VALUES (2, 'North America');
INSERT INTO Geo_Continents (Id, Continent) VALUES (3, 'South America');
INSERT INTO Geo_Continents (Id, Continent) VALUES (4, 'Central America');
INSERT INTO Geo_Continents (Id, Continent) VALUES (5, 'Africa');
INSERT INTO Geo_Continents (Id, Continent) VALUES (6, 'Asia');
INSERT INTO Geo_Continents (Id, Continent) VALUES (7, 'Australia');

INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (1, 'AR', 'Argentina', '3');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (2, 'AU', 'Australia', '7');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (3, 'BE', 'Belgium', '1');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (4, 'BR', 'Brazil', '3');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (5, 'CA', 'Canada', '2');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (6, 'CH', 'Switzerland', '1');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (7, 'CN', 'China', '6');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (8, 'DE', 'Germany', '1');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (9, 'DK', 'Denmark', '1');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (10, 'EG', 'Egypt', '5');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (11, 'FR', 'France', '1');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (12, 'IN', 'India', '6');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (13, 'IT', 'Italy', '1');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (14, 'JP', 'Japan', '6');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (15, 'RO', 'Romania', '1');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (16, 'ML', 'Malaysia', '3');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (17, 'MX', 'Mexico', '4');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (18, 'NG', 'Nigeria', '5');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (19, 'NL', 'Netherlands', '1');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (20, 'SG', 'Singapore', '6');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (21, 'UK', 'United Kingdom', '1');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (22, 'US', 'United States of America', '2');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (23, 'ZM', 'Zambia', '5');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (24, 'LU', 'Luxembourg', '1');
INSERT INTO Geo_Countries (Id, Code, Name, Continent_Id) VALUES (25, 'KR', 'South Korea', '6');

Now we have our data! Let’s take a look at our data, running a simple query.

SELECT co.Continent, cn.Code AS Country_Code, cn.Name AS Country_Name, cn.Continent_id

FROM Geo_Continents co INNER JOIN Geo_Countries cn

   ON co.Id = cn.Continent_Id

ORDER BY co.Continent,cn.Code;  

See the geo dataset below:

The goal of the exercise is to concatenate the countries codes for a continent and move every continent with the list of countries separated by a comma into the result table List_Of_Countries.

So, for example, for Africa, we should have EG, NG, ZM. For Asia we should have CN, IN, JP, KR, SG etc.

Let’s try the classic manner of work, mainly specific to application developers.

1)   The procedural approach

The tendency, very often, is to think procedurally and atomically. Take every continent in a cursor and transfer the continent, eventually the continent identifier. Being inside the cursor with continents, open the second cursor, and pick every country code for the selected continent. Afterwards, concatenate one by one the country codes in a result string. Finally, the list of concatenated country codes are inserted along with the continent in the result table.

So, to conclude before viewing the traditional, atomic solution.

  1. We loop through continents, firstly, and take the continent into a variable.
  2. We loop second time through the countries, per selected continents, and concatenate the country codes.   We also add commas and exclude the last one!
  3. We insert the continent and the list of country codes into the result table (for every continent).

Let’s see the exercise:

DECLARE 
	v_Continent_Id				INT;
	v_Country_Code				VARCHAR2(2);
	v_Continent					VARCHAR2(50);

	v_Sum_Of_Countries VARCHAR(1000);

	CURSOR C_Continent IS 
	SELECT DISTINCT cn.Continent_Id, co.Continent
	FROM Geo_Continents co INNER JOIN Geo_Countries cn
	ON co.Id = cn.Continent_Id
	ORDER BY 1;

	CURSOR C_Countries (p_Continent_Id INT) IS
	SELECT Code AS Country_Code
	FROM Geo_Countries
	WHERE Continent_Id = p_Continent_Id
	ORDER BY 1;
BEGIN
	DELETE List_Of_Countries;
	OPEN C_Continent;

	LOOP
		FETCH C_Continent INTO v_Continent_Id, v_Continent;

		EXIT WHEN C_Continent%NOTFOUND;

		v_Sum_Of_Countries := '';

		OPEN C_Countries (v_Continent_Id);

		LOOP
			FETCH C_Countries INTO v_Country_Code;

			EXIT WHEN C_Countries%NOTFOUND;

			v_Sum_Of_Countries := v_Sum_Of_Countries || v_Country_Code || ', ';
		END LOOP;

		CLOSE C_Countries;

	v_Sum_Of_Countries := SUBSTR(v_Sum_Of_Countries, 1, LENGTH(v_Sum_Of_Countries) - 2);

	INSERT INTO List_Of_Countries (Continent, List_Of_Countries)
	VALUES (v_Continent, v_Sum_Of_Countries);

	COMMIT;

	END LOOP;

	CLOSE C_Continent;

	COMMIT;
END;
/


This solution is common to an application developer not familiarized with the SQL way and with the dataset manipulations. The fact that the programmer opened two loops does not seems to bother because it seems somehow intuitive to  a certain degree. Not thinking SQL is the main reason of this approach. I call this approach procedural and atomic because the programmer is moving atomically to every continent and, being inside this first loop, open the second one and take the countries per continent and concatenate. The logic is also highly procedural. The programmer is visualizing one continent and one country, he is going down to the lowest level of detail possible

We do not need to forget that database languages, like PL SQL or others, are a mixture of instructions. Looking at PL SQL, and others similar languages we can say these are all firstly SQL languages, so they contain the classic set of SQL statements. What comes after make the structured side of the language,   and are simply extensions to SQL. These are not meant to be used as a main facility but an alternate facility! In other words, do not use PL SQL if you can use SQL, so do not use cursors and records and others procedural objects if you can use simply SQL. In most of the cases, SQL is faster and simpler. And, if simplicity is something visible to programmers eyes and we can live with that we cannot say the same thing about performance. Performance is visible in the users eyes and this is an issue we can’t ignore , after all.

Let’s see the result after the first execution, using the atomic and procedural approach.

We obtained the desired result but with what cost? Of course, having such few data like in our example will not impact performance but imagine an average quantity of data and looping twice through it!

2)       The set based approach

Among others tendencies, there is a clear one for adding more and more  functionalities that will allow the programmer to work set based and holistic instead of working procedurally and atomically. The set of analytic functions are one of the best examples. Almost all the database systems include more and more features that allow the work in a set based and holistic manner and avoid the inefficient and underperforming style  of work described before.

For string concatenation, the function LISTAGG is excellent. Let’s see a definition: LISTAGG function in DBMS is used to aggregate strings from data in columns in a database table. It makes it very easy to concatenate strings. It is similar to concatenation but uses grouping. (see https://www.geeksforgeeks.org/sql-listagg/). And indeed, this function allows direct strings concatenation without the need for any loop or cursor or other procedural facility. Let’s see the second solution, that I call the holistic solution offered by this magic Oracle function.

DELETE List_Of_Countries;

INSERT INTO List_Of_Countries (Continent, List_Of_Countries)
SELECT Continent, List_Of_Countries
FROM 
(
	SELECT co.Continent, LISTAGG(cn.Code, ', ') WITHIN GROUP (ORDER BY cn.Code) OVER (PARTITION BY co.Continent) AS List_Of_Countries
	FROM Geo_Continents co INNER JOIN Geo_Countries cn
		ON co.Id = cn.Continent_Id

) c
GROUP BY Continent, List_Of_Countries
ORDER BY 1;

COMMIT;

First of all, after executing the script above, you can easily see that the result table is the same for both approaches, so the listagg function is able to do in one simple SQL statement  what the complex logic above was doing in an entire adventure (opening two cursors and associated loops, transferring variables, concatenating etc.).

3)      Conclusions

Nowadays, the vendors are increasing the number of feature like this. However, there is not enough argumentation in the favor of that style of programming that is using these features and is using the strength of SQL. The features exists and their number is increasing. However,  programmers needs to be aware of their existence and need to know how to use these.

See the simplicity of the code in the set based approach, how simple the SQL  statement is compared with the complex procedural code in the first example! Increase the quantity of data and see the difference in performance, it is huge! As I mentioned earlier, if the simplicity of the code in the set based approach is, let’s say, an esthetic and subjective argument (I saw application developers  that really prefer the first approach as being intelligible for themselves); we cannot say the same about the performance because here we are talking about customers and users!

And, I want to add one more reflection. Someone can say that the difference is made by one function: LISTAGG. This is a relatively new function (Oracle 11G release 2). So, before that, we need other ways. Being aware of these functions is, of course, important. But this is just one part of the equation. The most important part is to adapt to the set based approach while manipulating data with SQL and to try to find the best ways to avoid these atomic, non SQL approach that is simply not suitable for data manipulation. Even if it might seems simpler to understand for the classic programmer, not used with SQL way, the performance and the simplicity are clearly on the SQL side.