29.04.2022 – In the second week, we were introduced to the Structured Query Language (SQL). It is a descriptive language to retrieve data stored in the form of tables from a database in a customised way. The advantage is that the so called “SQL-query” will generate an on-the-fly algorithm to do the actual heavy lifting in the background before sending back the requested information to the user. Databases can store huge amounts of information but the Data Scientist might not need all the information in one go. For the administration of the database, in this case SQLite, we are using DBeaver. It is a handy tool to check out what’s inside the .sqlite file and generate queries before implementing them into Python code.
We worked on a database that holds information about movies and had to retrieve information such as “who were the 5 youngest directors at the time of their first movie” or “which movie titles contain the word ‘Love'”. Sounds easier than it is, because the machine doesn’t know that there is a difference between “Love”, “Love and some more words”, “Love, and some more words” or “some words before Love and some words after”, so you need to specify them. Using a placeholder for “L” (in SQL: _) will also give you “Alien: Covenant”, which is the last creature you want to encounter when talking about love… 😉
SQL seems “easy” at first sight, but can get quite complicated quickly once you start querying information from multiple tables, include conditions and join them together (left, right, inner, outer JOINs). Also, it is arbitrary to what extent you should use SQL or Python to get the information. That’s where the Python sqlite3 library comes into play. It helps you to implement SQL-queries into your code and fetching the information from the database. It then depends on you whether you have basically just write a print statement (or return in a function) because your query sent back the info in a neat way or whether you have to apply your coding skills in Python to rearrange the info to your liking. Some operations require a good deal of Python code compared to an “easy-looking” SQL-query. The point is that building these queries is an art in itself. However, we have just finished the “Basics” module. Tomorrow, we will dive into the “Advanced” aspects of it with a full day of Saturday SQL coding!
“Those who keep learning will keep rising in life.” – Charlie Munger