class: center, middle # Modern SQL is awesome! ## Loris Bognanni ### FundApps #### 20190522 .bottom-notes[
] --- class: center, middle ##Contrary to popular opinion, SQL databases are still the best choice for most data-storage and querying uses
--- # SQL has come a long way since 1993 ## So why use it like it's 1993? .centered[
] --- #Easy subtotals Given data like: |COUNTRY|CITYNAME|...|POPULATION| |---|---|---|---:| |Japan|Tokyo|...|13,515,271| |Pakistan|Karachi|...|14,910,352| |Japan|Yokohama|...|3,726,167| |Mexico|Mexico City|...|8,918,653| | ... | | | We want: |COUNTRY|CITYNAME|NUMBER_OF_PEOPLE| |---|---|---:| |Japan|Tokyo|13,515,271| |Japan|Yokohama|3,726,167| |Japan|NULL|17,241,438| |Pakistan|Karachi|14,910,352| |Pakistan|Lahore|11,126,000| |Pakistan|NULL|26,036,352| | ... | | | |NULL|NULL|309,219,535| --- # ~~Easy~~ subtotals ### The old way ```SQL SELECT COUNTRY, CITYNAME, SUM(POPULATION) AS NUMBER_OF_PEOPLE FROM @CITYSTATS GROUP BY COUNTRY, CITYNAME UNION ALL SELECT COUNTRY, NULL AS CITYNAME, SUM(POPULATION) AS NUMBER_OF_PEOPLE FROM @CITYSTATS GROUP BY COUNTRY UNION ALL SELECT NULL AS COUNTRY, NULL AS CITYNAME, SUM(POPULATION) AS NUMBER_OF_PEOPLE FROM @CITYSTATS ``` --- #Easy subtotals ### With `GROUPING SETS` and `ROLLUP` ```SQL SELECT COUNTRY, CITYNAME, SUM(POPULATION) AS NUMBER_OF_PEOPLE FROM @CITYSTATS GROUP BY GROUPING SETS (( COUNTRY, CITYNAME), (COUNTRY), ()) -- or SELECT COUNTRY, CITYNAME, SUM(POPULATION) AS NUMBER_OF_PEOPLE FROM @CITYSTATS GROUP BY ROLLUP (COUNTRY, CITYNAME) ``` `Sql Server >= 2008` `Postgres >= 9.5` .bottom-notes[
] --- #Common table expressions ### Before: hard to read queries ```SQL SELECT * FROM ( SELECT COL1, COL2, SUM(VAL1) AS SUM1, MAX2 FROM TABLE1 INNER JOIN ( SELECT COL3, COL4, MAX(VAL2) AS MAX2 FROM TABLE2 GROUP BY ... ) ON ... WHERE X=Y GROUP BY COL1, COL2 ) AS T LEFT JOIN TABLE3 ... ``` --- #Common table expressions ### After: much better ```SQL ;WITH Q1 AS ( SELECT COL3, COL4, MAX(VAL2) AS MAX2 FROM TABLE2 GROUP BY ... ), Q2 AS ( SELECT COL1, COL2, SUM(VAL1) AS SUM1, MAX2 FROM TABLE1 INNER JOIN Q1 ON ... WHERE X=Y GROUP BY COL1, COL2 ) SELECT * FROM Q2 LEFT JOIN TABLE3 ... ``` `Sql Server >= 2005` `Postgres >= 8.4` --- #Recursive CTEs ### A CTE can reference itself! This is useful when querying hierarchical structures ```SQL WITH CTE AS ( SELECT ID AS MENUID, ID, PARENTID, TEXT FROM MENU UNION ALL SELECT C.MENUID, M.ID, M.PARENTID, M.TEXT FROM MENU M INNER JOIN CTE C ON M.PARENTID = C.ID ) SELECT * FROM CTE WHERE MENUID=6 -- READ MENU#6 AND ALL OF ITS CHILDREN ``` `Sql Server >= 2005` `Postgres >= 8.4 (WITH RECURSIVE)` .bottom-notes[
] --- #Group Totals ##Example Given the data: |EXPENSEDAY | REASON | AMOUNT | |--|--|--:| |1 | Groceries | 10.00 | |1 | Restaurants | 20.00 | |1 | Gym | 10.00 | |2 | Groceries | 20.00 | |2 | Travel | 15.00 | |2 | Drinks | 8.00 | |3 | Restaurants | 40.00 | |3 | Travel | 30.00 | add a column with the total for the day --- #Group Totals ##Example |EXPENSEDAY | REASON | AMOUNT | DAY_TOTAL | |--|--|--:|--:| |1 | Groceries | 10.00 | 40.00| |1 | Restaurants | 20.00 | 40.00| |1 | Gym | 10.00 | 40.00| |2 | Groceries | 20.00 | 43.00| |2 | Travel | 15.00 | 43.00| |2 | Drinks | 8.00 | 43.00| |3 | Restaurants | 40.00 | 70.00| |3 | Travel | 30.00 | 70.00| --- #Group totals ##Before: ugly nested subqueries ```SQL SELECT E1.EXPENSEDAY, E1.REASON, E1.AMOUNT, DAY_TOTAL FROM @EXPENSES E1 INNER JOIN ( SELECT EXPENSEDAY, SUM(AMOUNT) AS DAY_TOTAL FROM @EXPENSES GROUP BY EXPENSEDAY ) E2 ON E1.EXPENSEDAY=E2.EXPENSEDAY ``` --- #Group totals ##Modern SQL: ```SQL SELECT EXPENSEDAY, REASON, AMOUNT, SUM(AMOUNT) OVER (PARTITION BY EXPENSEDAY) AS DAY_TOTAL FROM @EXPENSES ``` `Sql Server >= 2005` `Postgres >= 8.4`
.bottom-notes[
] --- #Running Totals |EXPENSEDAY|REASON|AMOUNT|RUNNING_TOTAL|DAY_RUNNING_TOTAL| |--|--|--:|--:|--:| |1|Groceries|10.00|10.00|10.00| |1|Restaurants|20.00|30.00|30.00| |1|Gym|10.00|40.00|40.00| |2|Groceries|20.00|60.00|20.00| |2|Travel|15.00|75.00|35.00| |2|Drinks|8.00|83.00|43.00| |3|Restaurants|40.00|123.00|40.00| |3|Travel|30.00|153.00|70.00| ##Before:
--- #Running Totals ##Modern SQL: ```SQL SELECT EXPENSEDAY, REASON, AMOUNT, SUM(AMOUNT) OVER(ORDER BY ID) AS RUNNING_TOTAL, SUM(AMOUNT) OVER(PARTITION BY EXPENSEDAY ORDER BY ID) AS DAY_RUNNING_TOTAL FROM @EXPENSES ``` `Sql Server >= 2005` `Postgres >= 8.4`
.bottom-notes[
] --- #The best functions in SQL ## `ROW_NUMBER()`, ~~`RANK()`~~, `DENSE_RANK()` ```SQL SELECT *, ROW_NUMBER() OVER(ORDER BY EXPENSEDAY) AS [ROW_NUMBER], RANK() OVER(ORDER BY EXPENSEDAY) AS [RANK], DENSE_RANK() OVER(ORDER BY EXPENSEDAY) AS [DENSE_RANK] FROM @EXPENSES ``` |EXPENSEDAY|ROW_NUMBER|RANK|DENSE_RANK| |--|--|--|--| |1|1|1|1| |1|2|1|1| |1|3|1|1| |2|4|4|2| |2|5|4|2| |2|6|4|2| |3|7|7|3| |3|8|7|3| .bottom-notes[
] --- ##The best functions in SQL ### Example uses: sample data |PERSON_ID|CONTACT_TYPE|CONTACT_VALUE|UPDATED_AT| |--|--|--|--:| |1|PHONE|555 123456|2019-05-19| |1|EMAIL|person1@example.com|2019-05-19| |1|PHONE|555 000000|2019-04-15| |1|PHONE|555 999999|2019-03-13| |1|EMAIL|an@old.email|2019-04-02| |2|PHONE|888 123456|2019-05-19| |2|EMAIL|person2@example.com|2019-05-19| |2|PHONE|888 999999|2019-03-13| |2|EMAIL|a2nd@old.email|2019-04-02| --- ## Getting the last Email & phone # for each PersonId ```SQL ;WITH NUMBERED_CONTACTS AS ( SELECT *, ROW_NUMBER() OVER( PARTITION BY PERSON_ID, CONTACT_TYPE ORDER BY UPDATED_AT DESC ) AS N FROM @CONTACT_REFS ) SELECT PERSON_ID, CONTACT_TYPE, CONTACT_VALUE FROM NUMBERED_CONTACTS WHERE N=1 ``` |PERSON_ID|CONTACT_TYPE|CONTACT_VALUE| |--|--|--| |1|EMAIL|person1@example.com| |1|PHONE|555 123456| |2|EMAIL|person2@example.com| |2|PHONE|888 123456| .bottom-notes[
] --- ##Last 3 emails/phone #s for with each person ### First, rank the contacts by type ```SQL SELECT PERSON_ID, CONTACT_TYPE, CONTACT_VALUE, DENSE_RANK() OVER ( PARTITION BY PERSON_ID, CONTACT_TYPE ORDER BY UPDATED_AT DESC ) AS RANKID FROM @CONTACT_REFS ``` |PERSON_ID | CONTACT_TYPE | CONTACT_VALUE | RANKID| |--|--|--|--:| |1 | EMAIL | person1@example.com | 1| |1 | EMAIL | an@old.email | 2| |1 | PHONE | 555 123456 | 1| |1 | PHONE | 555 000000 | 2| |1 | PHONE | 555 999999 | 3| |2 | EMAIL | person2@example.com | 1| |2 | EMAIL | a2nd@old.email | 2| |2 | PHONE | 888 123456 | 1| |2 | PHONE | 888 999999 | 2| --- ##Last 3 emails/phone #s for with each person ### Then, pivot the data ```SQL ;WITH RANKED_CONTACTS AS ( SELECT PERSON_ID, CONTACT_VALUE, DENSE_RANK() OVER ( PARTITION BY PERSON_ID, CONTACT_TYPE ORDER BY UPDATED_AT DESC ) AS RANKID FROM @CONTACT_REFS ) SELECT PERSON_ID, [1] AS PHONE1,[2] AS PHONE2, [3] AS PHONE3 FROM RANKED_CONTACTS PIVOT ( MIN(CONTACT_VALUE) FOR RANKID IN ([1], [2], [3]) ) AS PVT ``` |P..._ID |CONTACT_TYPE | CONTACT1 | CONTACT2 | CONTACT3| |--|--|--|--|--| |1 |EMAIL | person1@example.com | an@old.email | NULL| |2 |EMAIL | person2@example.com | a2nd@old.email | NULL| |1 |PHONE | 555 123456 | 555 000000 | 555 999999| |2 |PHONE | 888 123456 | 888 999999 | NULL| .bottom-notes[
] --- #But wait, there's more! - [Modern SQL](https://modern-sql.com/) - Goes in depth on this topic and i _might_ have based this presentation on [his](https://www.slideshare.net/MarkusWinand/modern-sql?ref=https://modern-sql.com/slides) - [Use The Index, Luke!](https://use-the-index-luke.com/) - mostly about indexing & performance - [Cast And Convert](https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql?view=sql-server-2017) - You'll need this if you do any kind of date manipulation in SQL Server - [JSON Functions in SQL Server](https://docs.microsoft.com/en-us/sql/t-sql/functions/json-functions-transact-sql?view=sql-server-2017) (`SQL Server >= 2016`) ---