SQL (Structured Query Language) is the standard database management language across the web. Though there are several different database packages, with MySQL among the most common, all of the relational ones use SQL as their foundation. Basically, if you want to access relational data from your application, regardless of package or scripting language, you are going to need to understand SQL.

This tutorial is meant to take a beginning SQL programmer from the simplest queries to something much more advanced. Since different packages handle SQL in slightly different ways, we’re going to be using MySQL for these examples.

First, the Data

For the purposes of this tutorial, we are going to pretend that we own an e-commerce business. Our MySQL database has tables tracking our products, customers, orders, and customer support calls. Here is a layout of our hypothetical database:
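As a rough stand-in for that layout, here is a minimal sketch of the three tables the queries in this tutorial touch. The table and column names come from those queries. The column types, the keys, and the use of Python's built-in sqlite3 module in place of a MySQL server are assumptions, made only so the sketch runs anywhere. The support-calls table is left out because the tutorial never names its columns.

```python
import sqlite3

# Hypothetical schema inferred from the queries in this tutorial.
# Column types and keys are assumptions, not the author's actual layout.
SCHEMA = """
CREATE TABLE tblCustomers (
    custID    INTEGER PRIMARY KEY,
    custFName TEXT,
    custLName TEXT,
    custCity  TEXT,
    custState TEXT
);
CREATE TABLE tblProducts (
    prodID INTEGER PRIMARY KEY
);
CREATE TABLE tblOrders (
    ordID      INTEGER PRIMARY KEY,
    ord_custID INTEGER REFERENCES tblCustomers(custID),
    ord_prodID INTEGER REFERENCES tblProducts(prodID)
);
"""

# An in-memory SQLite database stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were just created.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['tblCustomers', 'tblOrders', 'tblProducts']
```

Each order row points at one customer and one product, which is what makes the joins later in the tutorial possible.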

Let’s Start Simple

Before we get into anything more advanced, let’s look at some basic, but useful, SQL queries that we can perform on this database.

First, a query to retrieve all data on our customers, just about the simplest query possible:

SELECT * FROM tblCustomers

All of the basic parts of a SQL query are there. The command (SELECT) tells the database what kind of operation you are doing. The asterisk means that we are retrieving data from all the columns in the table, rather than a specific list. Lastly, FROM tblCustomers lets the program know which table to retrieve from. These are like the subject, verb, and object in a sentence. They give us all the basic information we need to do a simple query.

But I Don’t Want All the Data

You’re rarely going to want all the data from all your customers at once. That’s not very useful for getting at anything specific. So let’s narrow down which customers’ data gets retrieved.

SELECT * FROM tblCustomers
WHERE custCity = 'Portland' AND custState = 'OR'

Here, we’ve added to our query to restrict our results, which now will only include those customers living in Portland, Oregon.

But we are still getting all the data for those customers, which we probably don’t need. The next step is to restrict what we’re taking, analogous to the subject of our sentence:

SELECT custID, custFName, custLName FROM tblCustomers
WHERE custCity = 'Portland' AND custState = 'OR'

Now our results only include the customer’s first and last name and their unique ID. With a few tweaks, we’ve gone from a basic but not very useful query to a specific and meaningful one.
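To see the filter and the column list at work, here is a small sketch that runs the query against three made-up customers. The sample rows are invented for illustration, and Python's built-in sqlite3 module is used as a stand-in for MySQL so that nothing needs to be installed. The string literals use single quotes, which both databases accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tblCustomers (
    custID INTEGER PRIMARY KEY, custFName TEXT, custLName TEXT,
    custCity TEXT, custState TEXT)""")

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO tblCustomers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "Portland", "OR"),
        (2, "Bob", "Cruz", "Portland", "ME"),
        (3, "Cy", "Novak", "Salem", "OR"),
    ],
)

# Only the three named columns come back, and only for Portland, Oregon.
portland_or = conn.execute("""
    SELECT custID, custFName, custLName FROM tblCustomers
    WHERE custCity = 'Portland' AND custState = 'OR'
""").fetchall()
print(portland_or)  # [(1, 'Ann', 'Lee')]
```

The customer in Portland, Maine and the one in Salem, Oregon each satisfy only one half of the condition, so AND filters both of them out.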

Introduction to Joins

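The real power of relational databases is the ability to link data between tables. One table is just rows and columns, pretty two-dimensional. But bring in relationships between tables, and you can do much more with your data. In SQL, tables are linked using the JOIN clause. We’ll look at seven kinds of joins: Left, Inner, Full Outer, Right, Left with Null Right, Right with Null Left, and Cross. The two Null variants are ordinary left and right joins with a WHERE ... IS NULL filter added. Note also that MySQL has no FULL OUTER JOIN keyword, so that kind has to be emulated there, typically by combining a left and a right join with UNION. Cross joins, which pair every row of one table with every row of the other, are a topic for a separate post, but the others can be understood best with a series of Venn diagrams, with each circle representing a table: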

As you can see, a join is mostly used to add data to a query, but in the case of the Null joins, it can also be used to pare down the data in the query result. Let’s get into some examples.

Here’s a basic left join using our example database:
SELECT c.custLName, o.ordID
FROM tblCustomers c
LEFT JOIN tblOrders o ON c.custID = o.ord_custID

That query creates a result with all customer last names and the order numbers matching that customer. If a customer has no orders, their last name will still appear in a row of the result, but with a NULL result in the column that would usually hold the order number. If there are orders that don’t match any customer ID (which there really shouldn’t be, logically) they won’t appear at all.
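Here is a minimal sketch of that behavior, run against a handful of made-up rows in which one customer has no orders. The data is invented, Python's built-in sqlite3 module stands in for MySQL, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCustomers (custID INTEGER PRIMARY KEY, custLName TEXT);
CREATE TABLE tblOrders (ordID INTEGER PRIMARY KEY, ord_custID INTEGER);
-- Made-up rows: customer 2 (Cruz) has placed no orders.
INSERT INTO tblCustomers VALUES (1, 'Lee'), (2, 'Cruz'), (3, 'Novak');
INSERT INTO tblOrders VALUES (10, 1), (11, 1), (12, 3);
""")

left_rows = conn.execute("""
    SELECT c.custLName, o.ordID
    FROM tblCustomers c
    LEFT JOIN tblOrders o ON c.custID = o.ord_custID
    ORDER BY c.custID, o.ordID
""").fetchall()

for last_name, order_id in left_rows:
    # sqlite3 hands SQL NULL back to Python as None.
    print(last_name, order_id)
```

Lee appears twice (once per order), Novak once, and Cruz once with None where the order number would be.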

Also note that we use shorthand for our tables: c for tblCustomers and o for tblOrders. This is a pretty standard way of saving space and making the query less bulky. As queries get more complicated, having to write out your table name every time is going to get annoying and increase the chance of making a mistake.

Let’s try a different kind of join:
SELECT c.custLName, o.ordID
FROM tblCustomers c
INNER JOIN tblOrders o ON c.custID = o.ord_custID

This is almost identical to our last query, except that those empty customer rows with no orders will disappear. An inner join returns only the overlap in the two tables.
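The difference is easy to check side by side. The sketch below runs the same query twice, once as a left join and once as an inner join, over made-up rows, then lists what the inner join dropped. The helper name last_names_with_orders and the sample data are hypothetical, and sqlite3 again stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCustomers (custID INTEGER PRIMARY KEY, custLName TEXT);
CREATE TABLE tblOrders (ordID INTEGER PRIMARY KEY, ord_custID INTEGER);
-- Made-up rows: customer 2 (Cruz) has placed no orders.
INSERT INTO tblCustomers VALUES (1, 'Lee'), (2, 'Cruz'), (3, 'Novak');
INSERT INTO tblOrders VALUES (10, 1), (11, 1), (12, 3);
""")

def last_names_with_orders(join_kind):
    """Run the tutorial's join with the given keyword ('LEFT' or 'INNER')."""
    sql = f"""
        SELECT c.custLName, o.ordID
        FROM tblCustomers c
        {join_kind} JOIN tblOrders o ON c.custID = o.ord_custID
        ORDER BY c.custID, o.ordID
    """
    return conn.execute(sql).fetchall()

left_rows = last_names_with_orders("LEFT")
inner_rows = last_names_with_orders("INNER")

# Rows present in the left join but missing from the inner join.
dropped = [row for row in left_rows if row not in inner_rows]
print(len(left_rows), len(inner_rows), dropped)  # 4 3 [('Cruz', None)]
```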

The Null joins are sometimes a source of confusion, so let’s do an example of one of those:
SELECT o.ordID
FROM tblCustomers c
RIGHT JOIN tblOrders o ON c.custID = o.ord_custID
WHERE c.custID IS NULL

Here we get a list of order IDs that don’t match any customer ID. Only these mismatches are returned. In our case, there shouldn’t really be any results that match this query, but it could be useful for checking the database for this sort of error.
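As a sketch of that kind of integrity check, the example below plants one made-up order that points at a customer who doesn't exist and then finds it. One assumption to flag: the query is written from the orders side as a LEFT JOIN, which returns the same rows as the RIGHT JOIN above, because older versions of SQLite (used here through Python's sqlite3 module as a stand-in for MySQL) don't support RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCustomers (custID INTEGER PRIMARY KEY, custLName TEXT);
CREATE TABLE tblOrders (ordID INTEGER PRIMARY KEY, ord_custID INTEGER);
INSERT INTO tblCustomers VALUES (1, 'Lee');
-- Made-up data: order 11 points at customer 99, which does not exist.
INSERT INTO tblOrders VALUES (10, 1), (11, 99);
""")

# Keep every order, then keep only those with no matching customer.
orphan_orders = conn.execute("""
    SELECT o.ordID
    FROM tblOrders o
    LEFT JOIN tblCustomers c ON c.custID = o.ord_custID
    WHERE c.custID IS NULL
""").fetchall()
print(orphan_orders)  # [(11,)]
```

An empty list from a check like this means every order has a valid customer.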

Adding More Joins

Joining two tables is great, but we have more than two tables in our database, don’t we? Joins can be used in a series to gather matching data from as many tables as you want.

SELECT p.prodID
FROM tblCustomers c
LEFT JOIN tblOrders o ON c.custID = o.ord_custID
LEFT JOIN tblProducts p ON o.ord_prodID = p.prodID
WHERE c.custID = 1
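To see what this chain of joins returns in practice, the sketch below runs the same query over a few made-up rows, once for a customer with two orders and once for a customer with none. The sample data and the product_ids helper are hypothetical, the ? placeholder and the ORDER BY are additions for the demo, and Python's sqlite3 module stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCustomers (custID INTEGER PRIMARY KEY);
CREATE TABLE tblProducts (prodID INTEGER PRIMARY KEY);
CREATE TABLE tblOrders (
    ordID INTEGER PRIMARY KEY, ord_custID INTEGER, ord_prodID INTEGER);
-- Made-up rows: customer 1 ordered products 100 and 101, customer 2 nothing.
INSERT INTO tblCustomers VALUES (1), (2);
INSERT INTO tblProducts VALUES (100), (101), (102);
INSERT INTO tblOrders VALUES (10, 1, 100), (11, 1, 101);
""")

QUERY = """
    SELECT p.prodID
    FROM tblCustomers c
    LEFT JOIN tblOrders o ON c.custID = o.ord_custID
    LEFT JOIN tblProducts p ON o.ord_prodID = p.prodID
    WHERE c.custID = ?
    ORDER BY p.prodID
"""

def product_ids(cust_id):
    """Return the product IDs ordered by one customer."""
    return conn.execute(QUERY, (cust_id,)).fetchall()

print(product_ids(1))  # [(100,), (101,)]
print(product_ids(2))  # [(None,)]
```

The second result is worth noticing: because the joins are left joins, a customer with no orders still yields one row, with NULL in place of a product ID.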

This query retrieves a list of all product IDs ordered by a given customer. The WHERE clause limits the result to customer 1, the first join brings in only that customer’s orders, and the second brings in only the products on those orders. Because both joins are left joins, a customer with no orders still produces one row, with NULL in place of the product ID; switch to inner joins if you would rather get no rows in that case. Let’s do an exercise. Try to write a query which returns all the product IDs that have never been ordered before, using two joins from the same three tables we used above.

Your query should look something like this:

SELECT p.prodID
FROM tblCustomers c
LEFT JOIN tblOrders o ON c.custID = o.ord_custID
RIGHT JOIN tblProducts p ON o.ord_prodID = p.prodID
WHERE o.ordID IS NULL

This returns the products that have never been included in any order by any customer. The first left join combines data from the customer and order tables. The right join then keeps every product, whether or not an order in that combined result matches it, and the WHERE clause throws away every product that did match an order, leaving only the ones that were never ordered.

Occam’s Razor: Complexity Isn’t the Goal

A note to keep in mind before we finish up this tutorial. As powerful as the ability to join tables can be, and as necessary as these skills are for a web developer, keep good old Occam’s Razor in mind. All else being equal, keep it simple. Any time you’re querying more than one table, make sure each one is actually necessary. With this in mind, we can go back to that last query and clean it up. Since every order has a customer, and since we aren’t restricting the results by the customer ID, the customer table isn’t necessary at all in that query. Here’s a cleaner version:

SELECT p.prodID
FROM tblOrders o
RIGHT JOIN tblProducts p ON o.ord_prodID = p.prodID
WHERE o.ordID IS NULL
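As a quick sanity check on the two-table version, the sketch below runs it over made-up rows and compares the answer with a plain set difference computed in Python. Two assumptions to flag: the sample data is invented, and the query is written from the products side as a LEFT JOIN, which returns the same rows as the RIGHT JOIN above, because older versions of SQLite (used here as a stand-in for MySQL) don't support RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblProducts (prodID INTEGER PRIMARY KEY);
CREATE TABLE tblOrders (ordID INTEGER PRIMARY KEY, ord_prodID INTEGER);
-- Made-up rows: product 101 never appears in an order.
INSERT INTO tblProducts VALUES (100), (101), (102);
INSERT INTO tblOrders VALUES (10, 100), (11, 100), (12, 102);
""")

# Keep every product, then keep only those with no matching order.
never_ordered = conn.execute("""
    SELECT p.prodID
    FROM tblProducts p
    LEFT JOIN tblOrders o ON o.ord_prodID = p.prodID
    WHERE o.ordID IS NULL
    ORDER BY p.prodID
""").fetchall()

# Cross-check: all product IDs minus the ones that show up in any order.
all_products = {row[0] for row in conn.execute("SELECT prodID FROM tblProducts")}
ordered = {row[0] for row in conn.execute("SELECT ord_prodID FROM tblOrders")}
print(never_ordered, sorted(all_products - ordered))  # [(101,)] [101]
```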

A saying often attributed to Bill Gates is that he liked to hire lazy programmers, because they would find the best ways to decrease their own workload, and thus write more efficient code. Don’t work harder on your queries than you have to, and don’t make your database work harder than necessary either. Happy programming.

09-08 09:33