Learning SQL as Data Scientist — II

Mayur Jain
4 min readMay 31, 2020

--

In continuation with Learning SQL as Data Scientist, i am sharing the second set of queries, which will help you in improving your SQL skill and perform basic data analytics from database point of view.

Data Analytics

Let’s kick start it !

Aggregate— is a simple term meaning combining a list of elements and quantifying it with a value.

fake_apps Table

I am using a fake_apps table, which contains fake mobile applications data with columns as app_name, price, category and downloads.

Lets Jump into SQL keywords

Count — It calculates how many rows are in a table.

SELECT COUNT(*)
FROM fake_apps;

Query to check, how many free apps are in the table?

SELECT COUNT(*)
FROM fake_apps
WHERE price = 0.0;

SUM — It performs the mathematical addition.
Query to check, What is the total number of downloads for all of the apps combined?

SELECT SUM(downloads)
FROM fake_apps;

MAX/MIN — It calculates the min and max of an column.

Query to check, What is the least number of times an app has been downloaded?

SELECT MIN(downloads)
FROM fake_apps;

Query to check, What is max price of an app ?

SELECT MAX(price)
FROM fake_apps;

Average — It calculates the average of an column.
Query to calculate the average price for all the apps in the table.

SELECT AVG(price)
FROM fake_apps;

ROUND — It rounds off the decimals points to required number for decimal points.

Roundoff the average value got from above query, below the 2 number after decimal point.

SELECT ROUND(AVG(price),2)
FROM fake_apps;

Of all the keywords in SQL, GROUP BY is the most important keyword. It groups all the data based on categorical type of data, as in we can group the apps by category and check number of apps under that category and many such queries are possible using GROUP BY.

GROUP BY — The GROUP BY statement comes after any WHERE statements, but before ORDER BY or LIMIT. The below is from previous blog while representing the movies database.

SELECT year,
AVG(imdb_rating)
FROM movies
GROUP BY year
ORDER BY year;

We can use an aggregate function COUNT() and we arrange the price into groups. Then add a WHERE clause to count the total number of apps that have been downloaded more than 5,000 times, at each price.

SELECT price, COUNT(*)
FROM fake_apps
WHERE downloads > 5000
GROUP BY price;

Query to calculate the total number of downloads for each category.

SELECT category, SUM(downloads)
FROM fake_apps
GROUP BY category;

SQL lets us use thecolumn reference(s) in our GROUP BY that will make our lives easier.

1 is the first column selected
2 is the second column selected
3 is the third column selected

SELECT category,
price,
AVG(downloads)
FROM fake_apps
GROUP BY 1, 2;

Note: 1 represents category and 2 represents the price column.

HAVING — In addition to being able to group data using GROUP BY, SQL also allows you to filter which groups to include and which to exclude.

When Having takes over Group By ?
For instance, imagine that we want to see how many movies of different genres were produced each year, but we only care about years and genres with at least 10 movies. We can’t use WHERE here because we don’t want to filter the rows; we want to filter groups.

HAVING is very similar to WHERE. In fact, all types of WHERE clauses you learned about thus far can be used with HAVING.

HAVING statement always comes after GROUP BY, but before ORDER BY and LIMIT.

Below query returns the average downloads (rounded) and the number of apps — at each price point. However, certain price points don’t have very many apps, so their average downloads are less meaningful. So we add a Having clause to restrict the query to price points that have more than 1 apps.

SELECT price,
ROUND(AVG(downloads)),
COUNT(*)
FROM fake_apps
GROUP BY price
HAVING COUNT(*) > 1;

Let’s Summarize

Aggregate functions combine multiple rows together to form a single value of more meaningful information.

COUNT(): count the number of rows
SUM(): the sum of the values in a column
MAX()/MIN(): the largest/smallest value
AVG(): the average of the values in a column
ROUND(): round the values in the column

GROUP BY is a clause used with aggregate functions to combine data from one or more columns.
HAVING limit the results of a query based on an aggregate property.

Link : Learning SQL as Data Scientist — I

If you have learnt anything new here, then don’t forget to clap👏”.

Connect with me in LinkedIn

--

--

No responses yet