Make your SQL safer
SQL can be a beautiful thing, but it can also be a dangerous thing. If you're using SQL to access a database for an app that's used by hundreds or thousands or even millions of users, you need to be careful - because you could accidentally damage or erase all that data. There are various techniques you can use to make your SQL safer, however.
Avoiding bad updates/deletes
Before you issue an
UPDATE, run a
SELECTwith the same
WHEREto make sure you're updating the right column and row.
For example, before running:
UPDATE users SET deleted = true WHERE id = 1;
You could run:
SELECT id, deleted FROM users WHERE id = 1;
Once you decide to run the update, you can use the LIMIT operator to make sure you don't accidentally update too many rows:
UPDATE users SET deleted = true WHERE id = 1 LIMIT 1;
Or if you were deleting:
DELETE users WHERE id = 1 LIMIT 1;
When we issue a SQL command that changes our database in some way, it starts what is called a "transaction." A transaction is a sequence of operations treated as a single logical piece of work (like a bank transaction), and in the world of databases, a transaction must comply to the ACID principles to make sure the operations are processed reliably.
Whenever we issue a command like
DELETE, we are automatically starting a transaction. However, if we want, we can also wrap up multiple commands inside a bigger transaction. It may be that we only want an
UPDATEto go through if another
UPDATEgoes through as well, so we want to put both of them in the same transaction.
In that case, we can wrap the commands in
BEGIN TRANSACTION; UPDATE people SET husband = "Winston" WHERE user_id = 1; UPDATE people SET wife = "Winnefer" WHERE user_id = 2; COMMIT;
If the database is unable to issue both those
UPDATEcommands for some reason, then it will rollback the transaction and leave the database how it was when it started.
We also use transactions when we want to make sure that all of our commands operate on the same view of the data - when we want to ensure that no other transactions operate on that same data while the sequence of commands is running. When you're looking at a sequence of commands you want to run, ask yourself what would happen if another user issued commands at the same time. Could your data end up in a weird state? In that case, you should run in a transaction.
For example, the following commands create a row denoting that a user earned a badge, and then update the user's recent activity to describe that:
INSERT INTO user_badges VALUES (1, "SQL Master", "4pm"); UPDATE user SET recent_activity = "Earned SQL Master badge" WHERE id = 1;
At the same time, another user or process might be awarding the user a second badge:
INSERT INTO user_badges VALUES (1, "Great Listener", "4:05pm"); UPDATE user SET recent_activity = "Earned Great Listener badge" WHERE id = 1;
These commands could now actually be issued in this order:
INSERT INTO user_badges VALUES (1, "SQL Master"); INSERT INTO user_badges VALUES (1, "Great Listener"); UPDATE user SET recent_activity = "Earned Great Listener badge" WHERE id = 1; UPDATE user SET recent_activity = "Earned SQL Master badge" WHERE id = 1;
Their recent activity would now be "Earned SQL Master badge" even though the most recently inserted badge was "Great listener". That's not the end of the world, but it's also probably not what we expected.
Instead, we could run those in a transaction, to guarantee that no other transactions happen in the middle:
BEGIN TRANSACTION; INSERT INTO user_badges VALUES (1, "SQL Master"); UPDATE user SET recent_activity = "Earned SQL Master badge" WHERE id = 1; COMMIT;
You should definitely follow all those tips, but sometimes mistakes still happen. Because of that, most companies make backups of their databases - on an hourly, daily, or weekly basis, depending on the size of the database and space available. When something bad happens, they can then import data from the old database for whichever tables were damaged or lost. The data may end up a little outdated, but outdated data is often better than no data at all.
A related approach is replication - always storing multiple copies of the databases in different places. If for some reason a particular copy of the database is unavailable (like lightning hit the building that it's in, which has actually happened to me!), then the query can be sent to another copy of the database which is hopefully still available. If the data is very important, then it should probably be replicated to ensure availability. For example, if a doctor is trying to pull up a list of a patient's allergies to determine how to treat them in an emergency situation, then they can't afford to wait for engineers to get the data out of a backup, they need it immediately.
However, it is a lot more effort to replicate databases and it often means slower performance since write operations have to be performed in all of them, so companies must decide whether the benefits of replication are worth the costs, and investigate the best way of setting it up for their environment.
Many database systems have users and privileges built into them, because they are stored on a server and accessed by multiple users. There is no concept of user/privilege in the SQL scripts on Khan Academy, because SQLite is typically used in a single-user scenario, and thus you can write to it as long as you have access to the drive it's stored on.
But if you are using a database system on a shared server one day, then you should make sure to set up users and privileges properly from the beginning. As a general rule, there should be only a few users that have full access to the database (like backend engineers), since it can be so dangerous.
For example, here's how we can give full access to a particular user:
GRANT FULL ON TABLE users TO super_admin;
And here's how we can give only SELECT access to a different user:
GRANT SELECT ON TABLE users TO analyzing_user;
In a big company, you often don't even want to give
SELECTaccess to most users, because there might be private data in a table, like a user's email address or name. Many companies have anonymized versions of their databases that they can query on without worrying about access to private information.
Want to join the conversation?
- What are the ACID principles?(138 votes)
- Atomicity, Consistency, Isolation, Durability
- May be it worth to add some example of how to make backup of database and how to restore data from backup. Just for understanding what tools are used. Thanks.(43 votes)
- In phpmyadmin you can just select the backup option in the GUI. It will keep a backup of all the information and you can also change the name of the backup tables if you don't want the default name assigned by phpmyadmin.(8 votes)
- In the XKCD comic there was an example of a SQL injection attack. If I add user input to my database, what are the best practices to prevent such an attack?
Thank you.(14 votes)
- This is a bit of an old thread, but it is a very good question. Some of the specifics of the approach you would take will vary depending on the programming language you're using to interact with the database.
You can also create your own regular expression filters like @spollei mentioned in basically any programming language out there, although it does take a little more effort on your part (but it's worth it!). If you use your own filters, it's a good idea to test them out first to make sure they are actually working. You can use tools like SQLMap (http://sqlmap.org/) or test it yourself using something like the exhaustive (and somewhat exhausting) guide from owasp.org (https://www.owasp.org/index.php/Testing_for_SQL_Injection_(OTG-INPVAL-005)).(6 votes)
- Can a backend engineer with full access lose their mind or get angry and delete everything in their employer's database? Or is there a safety measure used to prevent an angry employee from causing this much damage?(10 votes)
- Disaster may strike in many different forms. That is why data has redundant copies, including "backups" which made me rich.(13 votes)
- can someone like get rid of a column using alter table(11 votes)
- Yes, which is why you usually don't want users to have the rights to do such things.
But if the user has sufficient rights they can just do:
ALTER TABLE table_name DROP COLUMN column_name;(11 votes)
- Are these best practices for MS SQL or MySQL ?(3 votes)
- These apply to any situation where you're responsible for a bunch of data, of course the syntax could be a bit different for non-SQL databases, but defined privileges, regular backups and reliable replication & transactions are basic necessities when dealing with valuable, sensible, or private information of multiple people.
On a side note, while this course uses SQLite, its content applies to every SQL variant, as they all have to comply to the same ANSI standard.(14 votes)
- Well, at this point I think that I understand all the queries and interactions that we can have with the database, but there is a couple of concerns that still being on my head:
1. The database can store more than integers, text or stuff like that, also can store documents, videos, etc, so how is this made? For example, how I match a query made from my android app called "videoapp" with my videosdatabase.sql with the video "dogs birth.mp4" that is in certain part of my server to open it and show me in the screen. Obviously in general therms, I'm pretty sure that is not an easy work :) .
2. A real database I guess that just makes the link between the user and the real place where is what he want, or for real is just an .sql containing many tables with all the information?
Thanks a lot for your answer, I appreciate it a lot. :D(3 votes)
- A database doesn't usually contain actual documents, but it can, using the BLOB datatype. And there may be reasons to use it that way.
But more often than not, you would store the files on a file system the same way you store files on your PC. The database then just stores the path of that file.
A database is the place where the data is stored. It runs on a server. If you have an IP address, a username and a password, you can access that server and send it SQL queries. The server then sends results back to the client.
The "*.sql" files are files containing SQL queries. They make it easy to store a bunch of SQL queries, should you need them later.(8 votes)
- with the concept of "begin transaction", would it do the operations in order such as for example :
UPDATE people SET husband = "Winston" WHERE user_id = 1;
UPDATE people SET wife = "Winnefer" WHERE husband = "Winston";
Will user-id = 1 have husband "Winston" and wife "Winnefer" ?(3 votes)
- Yes, the updates still happen in order. In fact if the second one fails then the first one should get automatically rolled back.(5 votes)
- In the paragraph about the command GRANT, what would
GRANT SELECT ON TABLE users TO analyzing_user(3 votes)
- It would give the database user analyzing_user the rights to see data that is stored in the users table.(3 votes)
- Are transactions similar to locks and synchronized blocks in other languages? It's unclear from the article whether SQL using multithreading or it is just to protect against multiple commands from different places.(1 vote)
- Ummm... Different RDBMS have different CPU optimizations, but most of them do work with multiple threads.
Transactions are always there. But there are explicit, and there are implicit transactions. When you run a query, it's an implicit transaction. When you open a transaction - that's an explicit transaction (you control it).
Locking always occurs when you work with the database, to ensure data integrity. A SELECT query will usually lock the tables with a read only lock, that are being read at the table level until the query is done. INSERT and DELETE will lock the table for any other type of query. UPDATE will only place row-level locks on the affected rows.
If it's an explicit transaction (you opened it), the locks will persist until you commit or rollback.
Committing and rolling back are one of the main uses for explicit transactions, as they allow you to perform operations in bulk only if every operation is successful.
Speed is another important reason why explicit transactions should be used. Bulk modification queries work fastest if the transactions don't open/commit for every single query.(5 votes)