This tutorial is intended for complete beginners, and should help you avoid the annoying pitfalls that most people encounter when trying to wrap their heads around the concepts of SQL and databases, as well as networking and deployment of a database on Windows and Linux.
SQL is a language for accessing data in a database. SQL stands for Structured Query Language.
There are many kinds of database. SQL works with only one kind: Relational Databases, based on a model called the Relational Database Management System (RDBMS). There are other kinds of databases, including Graph, which is based on the newer Associative model, and Object, which was supposed to replace RDBMS in the 1980s and 1990s, but it ultimately was not popular and was merged into the Relational Database model to form the Object-Relational Database Management System (ORDBMS). There are other database types, also.
Relational Database Structure
Databases contain tables, like Excel spreadsheets. Here’s an example of a 4 column table that stores a list of people who log into a computer:
When you add a new record to a table, you have to make sure your entry’s column matches the data type (and other parameters, like “non-negative”) that the table’s column specifies. If a column is intended to store an integer, it cannot store decimal numbers, text, or anything else. So a valid set of entries for the table are as follows:
Most SQL databases do not know what a “string” is. They use fixed-length strings called “varchars” (variable character arrays), or arbitrary-length strings called “Text”. When creating a varchar, you must specify the fixed maximum number of characters (letters, numbers, and symbols) they can store. In the example above, the
Name field (“field” means the same thing as “cell”) is wrong. It should be re-written as
varchar(55), where 55 is the maximum number of characters you want it to hold. A typical person’s name doesn’t need a lot of space, so we chose 55, arbitrarily. The reason we do this is so that the database can save space and speed up its ability to find data. This is important on high-traffic websites.
The general structure of a database
Physical Server (Your computer or remote computer)
└ Database Server (eg. MySQL Software)
└ Database Server (eg. PostgreSQL Software)
The Database is also called a “Schema” (a stupid renaming designed to breed confusion among newcomers).
The database software is often called “the server” thanks to software engineers’ obliviousness to the existence of hardware also called “servers”.
Database server software runs in the background. In Windows, this type of program is called a “service”. In Linux it is called a “daemon”. Actually, it’s more complicated.
A common combination of servers is web + email + database + SSH (secure shell).
Accessing a Remote database
If you want to access a database from a different computer, read this section. If you just want to use a database on your local machine, read the next section.
To access a databases, you must make a connection over the network, so you have to provide an IP address, or a URL, as well as the database’s port number.
Ports are virtual channels through which your computer can send or receive messages on the internet or local network. They are addressed by their number (port 80, 444, etc).
If the software doesn’t ask for the port by name, you need to append it to the URL, like this:
Creating a database
To create a database, you will need to install the server software, connect to the server, and then create the database.
Setting up a server on your computer
When two or more database programs are running at the same time on your local machine, you can access them both. To allow multiple servers to run on the same machine, a system was invented many years ago called ports. Ports are virtual channels through which your computer can send or receive messages on the internet or local network. They are addressed by their number (port 80, 444, etc). Each server runs on a different port. If they try to use the same port, you will have to tell one of them to use a different port. The database software installer will tell you which port it wants to use. You will also have to exclude the app (or port) from being blocked by your firewall. If Windows doesn’t ask you, configure it manually. On Linux, the firewall is configured with the
iptables command (or a configuration file, in NixOS).
During the MySQL or PostgreSQL install process, the installer makes you create your connection credentials. Save the root password.
Browsing your database connection
The MySQL installer will install their database explorer tool, MySQL Workbench. As a learner, the Workbench is not intuitive. After getting familiar with SQL basics, it is a nice tool for analyzing your database and experimenting with SQL commands (queries).
If you used MySQL’s installer, the connection will be already set up in the Workbench. Otherwise, open MySQL Workbench or another database browser and add a connection. For the Hostname field, provide the IP address, 127.0.0.1, or the alias, “localhost”, which points to 127.0.0.1.
After connecting, you can begin adding and modifying things. You do this by writing commands called queries and sending them to the database server. In the Workbench, you must first compose a query before sending it. Open a query editor by clicking on “File” > “New query tab”. Type into the blank text editor. To send them to the database, click the Execute button above the text (Lightning icon).
A database server, by default, has no databases. You must create them yourself using SQL.
There are several different SQL dialects. They all stem from the same idea, but each dialect may have some added unique capabilities and conveniences, or they may have a different way to write the same thing. These examples are what is common between all of them. When I deviate from the standard, I explain.
Create an empty database with no tables:
CREATE DATABASE database_name;
You have a database, but before you can add anything to your db, you have to select it, since you can have more than one database:
Next, add a table with column names:
CREATE TABLE user_accounts ( `Name` varchar(55), `Birth Date` date, `Password` varchar(55), `Total Logins` int );
I named mine “user_accounts”. You can name yours anything you want.
Notice I used the back-tick character to name the columns. This is required for names containing spaces. MySQL uses back-ticks, while Microsoft SQL uses
create statement above takes the general form:
CREATE TABLE table_name ( column_name data_type, column_name data_type, column_name data_type, ... );
); represents your facial expression after you discover how inconsistent the different dialects of SQL are.
Add some entries to your table:
INSERT INTO user_accounts VALUES ('Ryan Johnson', '1990-09-09', 'My$ecretPa$$word123', 5);
INSERT INTO table_name VALUES (value1, value2, value3);
You may ask yourself: “What if I want to work with multiple databases and don’t want to change the database with
USE database_name every time I send a new request? Why not just include the database in the INSERT statement?” This is traditionally not possible, but some dialects support this. This right here should tip you off that SQL was designed in an ad-hoc manner, and is in dire need of quality of life changes. Some dialects, eg. MySQL, do this, but code made with one dialect’s conveniences can’t be re-used with other database software. Beware the never-ending list of SQL dialect gotchas.
This lack of consistency persists across implementations, despite the SQL standard being updated regularly. The latest update was in 2016. Hopefully these differences are resolved in the near future.
To read values from the database, send a
SELECT query to the database. For example, to get all the names of every person in my
SELECT Name FROM user_accounts;
This should return the name that you put into the table earlier. If you added more than one entry, it will show you all the names from user_accounts.
In general, a
SELECT query specifies criteria for the database to look up, then the database returns all matching results. This is known as a declarative programming paradigm and is a very powerful paradigm with potential for use in AI.
SELECT column_name FROM my_table;
MySQL allows you to specify a database as well:
SELECT col_name FROM db_name.tbl_name;
source: MySQL docs
To get every column of data, use the * symbol:
SELECT * FROM my_table;
The above statement spits back the results:
If you want it to give you only certain qualifying entries, for example, every person whose name is “Bob” you can add a condition, using a WHERE clause:
SELECT * FROM my_table WHERE Name="Bob";
SELECT * FROM my_table WHERE conditions;
These conditions can be pretty complex, involving pattern-matching and logic. To get every entry where the phone number has the 901 area code:
SELECT * FROM my_table WHERE telephone_number LIKE '%901_%_%_%_%_%_%_%'
The “LIKE” word is an operator that tells the database to look for text patterns in its records. In this case, the
% before the 901 looks for any quantity of characters, and the
_% looks for exactly one character (but it doesn’t matter what character). Here are some example strings that qualify:
19011234567 (without the plus) or just
9011234567, but not
901-123-4567, because the hyphens count as characters.
When you are creating tables, you can specify rules for each column’s data, so that when you want to look things up, all of the entries are already in the correct form. If you allow the user to store the phone number in either the non-hyphenated or hyphenated format, then you will need to match multiple patterns in the
SELECT * FROM my_table WHERE telephone_number LIKE '%901_%_%_%_%_%_%_%' OR telephone_number LIKE '%901-_%_%_%-_%_%_%_%'
If you specify rules for the columns, then you will need to also include code in the front-end app (say a web page) to re-format user input to meet your requirements.
What is the best SQL Database?
There are some different use cases to consider. For almost any potential application, PostgreSQL (Postgres) is the best-performing one. Unlike PostreSQL, MySQL can connect to a distributed database spanning multiple remote machines. It does this by using a different database storage back-end.