A quickstart guide to SQL Queries for beginners.
In this cheat sheet, you’ll find a list of the basic SQL queries, what they mean, and example code snippets. For these examples, I used Google’s BigQuery to practice. Depending on the SQL database you use the naming conventions of the columns and tables may be different, but the basic set up of the queries will be the same.
What it does:
*
if you want to select all the columns in the databasedatabaseName.columnName
→ for specifying which DB and column, useful when joining datasetsWhat it does:
FROM
clause. This is useful for making them easier to remember. For example, FROM Player_Attributes attributes
Example:
SELECT
*
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
Here we will get all of the columns from the table Project Tycho Reports of the USA Contagious Disease dataset.
As you can see there are 759,467 rows in the results. That’s a lot and this is where adding other clauses will help narrow down the results. Alternatively, we also have the option to export the data which then can be used with other analysis software or programming language (e.g., Python) to better understand it.
Instead of using the asterisk *
to select all the columns we could also select a single column name (or multiple column names separated by commas ,
):
SELECT
disease
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
What it does:
%
can be used for searching strings>
and less than <
symbols are used for looking for numerical values_
is used for searching for charactersin
keyword performs an exact match while the like
keyword performs a looser matchbetween
, and
, or
, is null
, is not null
Example:
SELECT
*
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
WHERE
disease="MUMPS"
Here we can select all of the columns from the table, but the WHERE
clauses specifies we only want the entries where the disease is MUMPS. This means other diseases are not included.
Here’s another example where we are looking for a numerical value. In this case, when the number of incidences of a disease per 100,000 people is greater than 1.
SELECT
*
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
WHERE
incidence_per_100000>1
We can even combine these statements together with the and
keyword and look for the disease “MUMPS” where the incidence per 100,000 people is greater than 1.
SELECT
*
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
WHERE
disease="MUMPS" and incidence_per_100000>1
What it does:
Example:
SELECT
disease,
AVG(incidence_per_100000) as avg_incidence
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
GROUP BY
disease
Here we have a list of the average incidence per 100,000 people grouped by disease.
There are two things to note in this example:
as
to give an alias to the avg_incidence
. This can be used when doing a calculation or with column name that might be particularly long. This helps us make the column names more human-friendly.ORDER BY
clause.What it does:
DESC
to the end for descending orderExample:
SELECT
disease,
AVG(incidence_per_100000) as avg_incidence
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
GROUP BY
disease
ORDER BY
avg_incidence
Here we have a list of the average incidence per 100,000 people grouped by disease and ordered based on the incidence.
By default, it is listed from lowest to highest (ascending order). But we can also add the keyword desc
to the ORDER BY
clause in order to list it from highest to lowest (descending). See second example below.
SELECT
disease,
AVG(incidence_per_100000) as avg_incidence
FROM
`bigquery-public-data.usa_contagious_disease.project_tycho_reports`
GROUP BY
disease
ORDER BY
avg_incidence desc
Those are the basic SQL queries you need to get started with querying a database. At the very least, you need to know how to use SELECT
and WHERE
so that you can pull data out. However, it’s helpful to know other clauses like GROUP BY
and ORDER BY
to narrow down or filter your data. This is particularly helpful with large datasets.
We also got a preview of how we can use functions, like AVG()
, to perform calculations and return the results in a column. This helps us get a quick glance at the shape of the data which might leads us to areas for deeper exploration.
This is just a quick start guide. Like all things technology, there’s a lot more you can do with SQL and I’m not an expert by any means. But I do enjoy learning and sharing what I learn as I go. If you’d like to learn more about SQL, checkout the ‘References & Resources’ section. I’ve included a list of resources that were helpful to me when learning SQL and writing this post. Happy coding!
If you want to take a deeper dive into SQL queries, I found the following resource helpful while writing this post.