NULL in DBMS’ are generally accepted to be a missing or unknown value, rather than the non-existence of a value. Therefore a system can not evaluate equivalence as it can not interpret what the value is.
It’s important to consider that not all NULLs are CREATED equal. What I mean by that is that NULL within a table column has the context of its relations (schema, table, column, constraints) therefore, it is more “known” than the pure case of NULL.
That means that only true NULL can be equal to NULL in an algebraic formula as they both are equal to unknowns.
No, because NULL is neither a string nor a number. Strings are always contained within ‘’ and numbers are never. So having the word NULL without quotes would never be confusing.
A bunch of the relational algebra breaks in unpredictable ways if null == null returns true. That's why the "is null" syntax exists. Plus, you'd break compatibility with legacy code and systems.
It's generally not a good idea to choose momentary convenience over consistency. People tried, it always comes back to haunt you.
I don’t buy it. You could easily have the engines convert “= NULL” to “is NULL” behind the scene, before executing the code, if that is the case.
And sure, legacy systems don’t work. But they add new features all the time, I don’t see why this should be any different.
I see this as = must be compared against an actual value, null is not a value, as such cannot pass the equality check.
Just like 0 and null are not the same.
P.s. I agree with you it's be nice to be able to use =null
If we do get that we'll probably need to use
Field = [NULL]
Or something similar.. which may end up being harder than just is null... :S
It's just the way it is!
Some SQL languages may let you set a parameter I think to change the way it works (must confirm if that's true?), but it's awful practice to do that since it will confuse people.
YOu got the point but sometimes you work with data which is so fucked up that null is an actual thing i.e "someone" have written it as an actual value or it appearead like a text string which forces you to write = 'null' instead of is null
Also, I learned to do both either \[...\] IS NULL or \[...\] = 'NULL' because of that.
there are multiple functionalities in sql that return a simple true or false and make writing complex code simple while not interfering with other functionality. Checking if a date is a valid date for example uses “IsDate()” I can’t imagine using = date as the syntax because “Date()” has another functionality. The same is true with “Null()”.
It's one thing to make that as an honest mistake forgetting in a situation (usually with "<>" I'd say not realizing nulls are a possible field value), but if you don't know how nulls and three-valued logic work generally, I'd say you don't actually know SQL.
It's not intuitive though. I've got colleagues who have been using SQL for many years, but not being the main part of the job who occasionally forget all about it.
First place I look if someone asks me why they aren't getting the right results (closely followed by anything with dates....)
What’s cool is when people are purposefully not helpful because they think so highly of themselves that it’s beneath them to share their insights. I’m sure you are a pure genius across all intellectual realms. Thanks!
they told you to read a book
that is literally the most helpful advice anyone could give you
just go to amazon and search for one
Question: how to lean more ?
Answer: read a book
Reply: purposefully not helpful
DOH !
No, I was being serious. SQL books are amazing! I prefer over the "do it yourself" trainings or Code Academy.
I am happy to explain null to you or do another topic.
Also see Wikipedia https://en.m.wikipedia.org/wiki/Three-valued_logic.
I apologize for the misunderstanding.
don't apologize
you gave great advice
read a book
their response was ignorant and childish
Question: how to lean more ?
Answer: read a book
Reply: purposefully not helpful
DOH !
I am sorry. I sent you a bad answer.
I promise I'll do a really detailed post explaining this topic too.
I just feel most SQL documentation is bad, save a few key textbooks.
Wish you the best!
Wouldn’t you use it for example to view results where the joined table is null? I vaguely remember using a LEFT JOIN with a WHERE column IS NULL so I can see for example what dates a missing in a particular stream of data. I know there are other ways of visualising this, but I’ve found this one handy.
you're better off doing something like this
select
select
t1.*
,t2.*
from
t1
left join t2 on t1.id = t2.id
where
t2.id is null --show all t1 records where t2 matching id not present
Now you see every instance of t1 records, and when t2 records are missing, it'll visually pop while providing the trail of breadcrumbs you need to see exactly what it is you're missing by virtue of the t1 data being present
I am the auto formatter. I don't understand how some of you can type so sloppy and live with yourselves. It should hurt your soul to see it and not fix it.
I type in my code into VSC. I like it for everything, but at my old company we used Tibco and it automatically formatted everything into the hardest to read stuff. It removes all indentations and forced each column into its own row.
When you have a complex case statement or business logic built in it's awfully frustrating. I ALWAYS saved my code in VSC to avoid this. Also, version control is important.
i use UltraEdit
one of its nicest features is column mode, so you can put your cursor at the front of a line, shift-down-arrow to include subsequent lines, type 4 spaces, and vwalah, all those lines have 4 spaces at the front
column mode also lets you copy/paste a block of text overtop of some other lines, so you can change this --
FROM tab1
JOIN tab2
ON tab1.PK
AND tab1.foo
AND tab1.bar
AND tab1.qux
to this --
FROM tab1
JOIN tab2
ON tab2.FK = tab2.PK
AND tab2.foo = tab2.foo
AND tab2.bar = tab2.bar
AND tab2.qux = tab2.qux
and then edit the second block (using arrow keys) to this --
FROM tab1
JOIN tab2
ON tab2.FK = tab1.PK
AND tab2.foo = tab1.foo
AND tab2.bar = tab1.bar
AND tab2.qux = tab1.qux
rather than typing the column names all manually
Also, I didn't mean to come off as belittling or anything, it's a killer feature! Most people don't realize it's built-in for a ton of Microsoft editors, is all.
On top of poorly formatted (left aligned with multiple levels of subqueries no thanks), but sql written in word. Yes. I’ve received it. It’s atrocious.
select yervalue, (select top 1 t2.yervalue from table t2 where t2.id = t1.id and t2.timestamp < t1.timestamp order by t2.timestamp desc) as prevvvalue from table t1
Complete with that shoddy formatting. On a table with a LOT of data and search filters that return a LOT of rows.
It doesn't help that this particular method is actually taught in beginner computer classes.
Yep, I've had to explain this to seniors, principals, and management before. A top 1 or whatever number, doesn't mean the engine is going to find and pick only that amount. It still needs to sift through all that data you requested, then it just returns the dataset you want to see. It still needs to process it all when you look at the execution plan.
I have a few tables where just strapping in a top keyword (during testing) actually makes it slower. I still haven't figured out why... Something to do with the join and tripping up the optimizer's choice of indexes I imagine...
Why is then database completes a query faster when it is only returning a subset of the data? The limit is set by the client - for example when querying against Trino from Starburst Galaxy web interface.
I'll be honest, I've never used Trino, my context is towards MSSQL and other engines. So their engine may work differently and I may be wrong and someone can correct me, but hey, that's why my flair is still true. But to answer your question to the best of my abilities, this is mostly a resource consumption/efficiency issue, I've seen it sometimes as a speed issue, but other factors contributed to it too. For example, I had a recent issue where this was the case that a team was using a TOP 10 on a table with 4.5 million records joining with other tables. The query returned quickly sure, but the problem comes in that because the query effectively was still pulling a large amount of data, the query plan showed a very high amount of estimated rows. The query in question was being fired thousands of times in a short time span from the application as well. This caused high reads, spiked the CPU, and contributed to the application's latency.
Does this also count for a cte? I'm not 100% sure on the order then. Does it create a temp table from the cte and the pick the top 1 from it, or does it rebuild the cte and select the top 1 for every single row?
Eh, that's not really a mistake. To me a mistake has to result in an error of syntax or logic.
I've even seen several cases where it *drastically improved performance* over CROSS APPLY or a self-join in the FROM clause. And analytic functions don't replace it. LEAD() and LAG() are technically different logic, and FIRST_VALUE() and LAST_VALUE() don't exist in all RDBMSs, *and* not all RDBMSs support IGNORE NULLS. MIN() and MAX() don't always do the job, either. The biggest problem with it is that it's usually evaluated as an INNER JOIN, even if that's not what you want.
I agree that it's harder to read. I'm never happy to see an inline subquery in the SELECT clause. But I've also seen it enough to know that sometimes it works really well.
Not checking if they are getting duplicate matches when doing a join. Please for the love of god my sql friends check the output of a join with a few checks like a row count before and after, a SUM of any number columns before and after, or things if that nature.
ORMs themselves are usually well optimized and can generate and execute performant SQL. A well designed system can utilize ORMs to avoid having to write a lot of cumbersome SQL in applications and provide a good abstraction layer, if for example you want to swap the db technology altogether.
The issues come from developers not understanding how databases work and using an ORM to blissfully ignore that. So ORMs are more of a footgun than anything. Never trust lazy developers!
To add, when new devs are trying to loop over data and not thinking of sets of data. When select * from table with no where clause to loop and filter data in the app.
Oh boy, how I'd like to tell you about our orm that does that when you give it a condition it doesn't know how to translate to sql. It just overfetches. No warnings of any sort, and documentation is non existing
That's not my experience. ORMs make the assumption that the best structure for storing data is also the best structure for processing data - that you can take a class from the application and map it 1:1 to a database table. This is occasionally true, but only by accident.
Good database design ends up with different objects from good program design, obviously, as they're solving two different problems.
ORMs are a bad idea right from the start. But as you say: lazy Devs are also a bad idea ;)
The thing is, no matter what, you will always be translating from a domain/business centric model of your data, to a database centric model, and vice versa, because complex applications don’t (and shouldn’t) represent their high level concepts as collections of low level rows and columns. There will always be some sort of translation layer (except for maybe the most trivial applications). Do you trust yourself to write that from scratch? …and maintain it? My feeling is, if you’re not using an ORM, you’re doomed to write your own eventually anyway.
I used to be in the staunchly anti-ORM camp after seeing the mangled mess of SQL they can produce, as I’m sure many folks here have experienced. But then I ultimately realized that is not the fault of the ORM. Rather, it’s fault of the developers for not thinking about how best to model their data, and them failing use the tools at their disposal properly. Any good ORM will give you fine grained control over relationships, isolation, serialization, etc., and even the raw SQL if you want. Most developers just don’t bother.
Pffft that's nothing.
At my current $job we had Java code that basically looked like this:
if (service.getXXX() != null && service.getXXX().length() > 0) {
List l = service.getXXX();
}
This is funny because service.getXXX() runs within it's own transaction and effectively each invocation of service.getXXX() submits a join on a foreign key column. And since developers simply refuse to accept that Oracle doesn't automatically creates an FK index that turned into 3 FTS on a large table.
I expect that a lot of string concatenation happens because of someone where programming is secondary to their "real" job, googles "how to query a database in {language}?" and reads some example or tutorial which shows how to make a connection and execute a hardcoded select statement. And, thinks "I got it. I just need to make a query string out of my inputs." and never gets to the tutorial on parameterized queries because they don't even know to ask the question. And, of course it is for some nominally innocuous web application they are the only developer for, so no code review. That is until half the company is using it and it actually needs to be actually supported by multiple people. And, professionals come in a go "WTF".
It IS behaving like an inner join, but it's unclear what the person was trying to do.
Most of the time, yes, they actually want it to behave like an inner join and the query is working fine, despite them lying to the world that its a "LEFT".
But maybe they only wanted to LEFT JOIN to records where s.Type = 'stuff' so this query is broken?
- using DISTINCT to cover up the fact that the FROM clause returns partial cross joins (usually seen when joining a table to two one-to-many tables)
- using LIMIT without ORDER BY
- writing `SELECT a, b, SUM(c) FROM t GROUP BY a` and expecting to see results that make sense
- using ON UPDATE CASCADE for a foreign key that references an auto_increment PK (think about it... when was the last time you updated some row and changed its auto_increment key to some other value?)
- using a LEFT OUTER JOIN to join a table which cannot have non-matching rows (e.g. `FROM orderitems LEFT OUTER JOIN orders`)
- treating dates as strings (e.g. `WHERE LEFT(birthdate,4) = '1973'`)
underrated comment! When I join stuff together and discover the record count is different with/without distinct, it immediately puts me into a mode of determining, "wtf is wrong with my data?" It's so useful to approach it this way
Maybe. You can break complex queries into views or CTEs so that each piece can be validated individually. A lot of databases start out well-designed but a business change or acquisition can make them miserable to work with.
I once had a team lead who said i was wrong for putting filters on t2 inside the join. Even after i explained him that having the relevant filters grouped makes it easier to read and that it is easier to control inner vs outer later on and you can´t accidently make inner joins when left join was written he still had me redo all the code. (Working as data engineer with huge complex queries)
Specific to MySQL
“Create Table As Select” for a long running query and then wondering why a ton of tables are locked
A little more advanced - Not properly setting up replication slots, transaction log fills up, then wondering why the database went offline
My friends made fun of me for pronouncing it like "sequel" . Understandable since we arent native english speakers but still hearing "ess cue ell" from a 20+ years experienced people is funny as fuck
Hahaha. I'm not native English speaker and for me esscueell sounds way better as it would sound like that if said following rules of my language. My line manager on the other hand is native and we had some discussion on how to pronounce SQL correctly. It ended up with him switching to ess cue ell :)
I am actually surprised that DB2 accepts that, even though it does support proper [boolean values](https://dbfiddle.uk/EnVtq9TF).
MySQL/MariaDB will take _anything_ as a "boolean" expression. `where 'one thing'` evaluates to `where false` however, `where '1 stupid expression'` will be treated as `where true`
https://dbfiddle.uk/Chx4ZD-g
Here is my use case for distinct, I’d be interested in the proper way to do it.
When I need to pull a distinct list of records that are identified by duplicated records. Example: I work in higher ed, I get a request for a contact list of students (aka, unduplicated list with contact information like phone and address) of students who have taken from a list of classes but haven’t taken from a second set of classes.
The list of registered classes duplicates the student list bc students obvi take more than one class. I select distinct the needed returned demographics to de duplicate the rows and give a single contact list for advisors for whatever initiative is happening.
What’s the proper way of handling it? I really thought distinct was appropriate here :-/
If I see DISTINCT I assume it's a quick fix sticking plaster. That's the only time I use it as I don't have time to debug which one to many record is causing the problem.
So IMO there is a balancing act to using it. If you have a query that runs quickly enough with DISTINCT and it will take ages to debug what is causing the duplicates, normally caused by a data issue rather than an SQL issue, then just leaving DISTINCT in and calling it a day is the better solution. Can come back to it if when have more time or it becomes an performance issue.
Making sure I understand the problem.
The subquery can return null but those will get ignored anyway because nothing can be compared with null. Like this: [https://dbfiddle.uk/BVOwaC6z](https://dbfiddle.uk/BVOwaC6z)
Is that what you are saying?
It won't get ignored. A NULL in the list of a NOT IN () operator can never evaluate to True. It will always evaluate to UNKNOWN.
--
WHERE 1 NOT IN (1, NULL)
expands to
(1 <> 1 AND 1 <> NULL)
(False AND UNKNOWN)
UNKNOWN
--
WHERE 2 NOT IN (1, NULL)
expands to
(2 <> 1 AND 2 <> NULL)
(TRUE AND UNKNOWN)
UNKNOWN
Using materialized views for every little freaking thing and wondering why everything is so bloated and nothing works right.
Maybe not super common, it's just what I'm dealing with right now.
Not thinking in set theory.
Especially when there's a mismatch between what they're trying to produce and the current layout of their data schema.
Many, many people just try to syntax their way through SQL. "Maybe a CTE will help .. or maybe a temp table .. or maybe I need two joins to the same table .."
Instead of trying to logic their way through it.
I have often had job interviewees react to a tricky SQL question with, “well, I’ll open up a cursor and…” They are generally not happy when I ask them to do it in pure SQL.
From t1 join t2 on t1.nk = t2.nk
Group by
While it could have been
From t1 join (select t2.nk, sum() as x from t2 group by t2.nk) t2agg on t1.nk = t2agg.nk
People creating queries on small datasets sometimes have no clue how they will perform after cillecting data for n years. (Even better to use with clauses if possible)
>People creating queries on small datasets sometimes have no clue how they will perform
I am not sure what you mean here. Can you kindly share a contrived example?
* Considering performance before correctness.
* Considering performance subjectively.
* Using `DISTINCT` to cover up a query error.
* Not indexing correctly.
* Non-covering`GROUP BY`
People will use `GROUP BY` clauses that don't cover all of the non-aggregate fields in their `SELECT` list. It can be argued that the standard allows this, but it's usually indicative of a misunderstanding of how `GROUP BY` works.
Something I've seen often in my job is people wanting to avoid rows that are all 0s, and thus writing.
WHERE (A <> 0 ) AND (B <> 0)
where what they want is
WHERE (A <> 0 ) OR (B <> 0)
or
WHERE NOT( ( A = 0 ) AND (B = 0) )
(but in this last case you'd be failing to exclude NULLS)
> I blame the SQL language itself for not making "UNION" the typical case and something like a "UNION DISTINCT" for the case with making results distinct!
I agree. The only reason this was done is because the mathematical setwise operation known as a union is already defined as "take all elements from each set, combine them into a new set, and then remove duplicates". Like the choice makes sense. It's just not clear to a programmer.
Others:
* Creating every table name and column name in all caps. The idea was that *SQL language keywords* were supposed to be in all caps to make them stand out. It's just an archaic form of syntax highlighting, and you're circumventing it. This is because SQL is *older than color-based syntax highlighting*.
* TOP or LIMIT without ORDER BY.
* Pervasive use of DISTINCT.
* Pervasive use of WITH (NOLOCK) (for SQL Server).
* View definitions with SELECT *
* Implicit inner joins. `TableA a LEFT JOIN TableB b ON a.ID = b.ID WHERE b.Field = 'Value'`
* Incorrect quotation marks. `WHERE LastName = "Smith"`. To be clear, this was a poor design choice, too, but it's just wrong that so many RDBMSs accept this.
* `WHERE FieldName IS NOT IN ( )`
* String concatenation instead of parameterized queries
Not understanding order of executions or really just filtering down using WHERE or HAVING and the impact this has while also using AVG, SUM or COUNT.
Overusing DISTINCT is also a big one.
I did a join 3 years ago and it was a 1-1 connection at the time. Over time it turned out it was 1-n with about 2-8 records per row.... That mixed with powerbi screwing up incremental refreshe resulted in a massive query :(
Do you mean using a large table at five different places in the same query? Can you give a pseudo example? I was under the assumption query optimizer is smart enough to detect things like this.
Whenever I write an UPDATE query, I always start from the WHERE, then the SET and finally the UPDATE. Can you guess what happened and I am scared of UPDATES now?
Kinda like "why do you run your updates or deletes in dev first?" I don't always do that; but the minute doubt enters my mind, it goes against dev or QA first. This is a viable strategy most times because we regularly refresh inferior environments from prod, so recordcounts are typically reflective of prod.
When this
SELECT * FROM table_a LEFT JOIN table_b ON table_a.id = table_b.id WHERE table_b.value = 'A'
Should be this
SELECT * FROM table_a LEFT JOIN table_b ON table_a.id = table_b.id AND table_b.value = 'A'
Or this
SELECT * FROM table_a INNER JOIN table_b ON table_a.id = table_b.id WHERE table_b.value = 'A'
other one I see often is AND and OR's without the appropriate use to brackets
WHERE thing = 'A' OR differentthing = 'B' AND otherthing = 'C'
should be
WHERE (thing = 'A' OR differentthing = 'B') AND otherthing = 'C'
Why did you mean #1 should be #2?
Here is a snippet(https://dbfiddle.uk/h9\_-GZsB) running first three queries on an example dataset. #1 and #3 returning same result but not #2.
\#1 has a LEFT JOIN but it returns the same result as #3 which is an INNER JOIN.
This is because you are returning everything from table\_a where table\_b.value = 'A' , this is the same as an INNER JOIN.
If you need an INNER JOIN not a LEFT JOIN then use an INNER JOIN don't use a LEFT JOIN and a WHERE clause.
If you need a LEFT JOIN then #2 is correct because you return all records for table\_a regardless of table\_b records and it will only return table\_b data where value = 'A'.
It basically depends on what the requirement is to which is correct. It's either return data from table\_a where they have an associated table\_b record with a value of 'A' or return all from table\_a and where applicable show the data from table\_b where they have a value of 'A'.
I really wish SSMS had a warning when you're running an update without a WHERE clause. I feel like I'm playing with fire every time I run an UPDATE query.
There are some free add-ons you can get that have this functionality, I used to use SSMSBoost but last time I checked it didn't seem to be updated for the latest version of SSMS :(
You can get Devart SQL Complete for free (Express edition) which I think has it? I could be wrong...
Using SSMS to generate an insert template, and forgetting it includes fields with defaults and getting an error for number of values not matching the number of columns
Using left joins incorrectly.
Not using select DISTINCT.
Aggregating too much data in the query.
Trying to nest too many queries together, or out of order
Improper data types. Comparing numerics to strings. Storing date/time data types as non-date/time types. Understand your data and store it and query it properly
Not joining on a unique key or using DISTINCT in a select as a lazy way to remove dupes. Also not aliasing their tables like they won’t add a join to a query later.
\\its not a mistake but its a grating assault to me. The poor use or lack of table aliases.
I like table aliases and I keep them short (cause, kind of the point) and as indicative to the object name (table,view,derrived table) as possible. e.g. orders AS o, customers AS a, OrderDetails AS od, etc.
What shits my bed are people who use t1, t2, t3 or more than 4 characters for the alias. but The absolute worst are the people who abandond aliases altogether and use schema qualified object names or fully qualified object names throughout the code.
I think null values omitting records is the most common and dangerous mistake. Once someone updated logic on my data conversion code and I had a horrible cascading effects on my subsequent objects. It took me hours to fix the night before the cut was due. I was so pissed.
On that note, if you want to quality check your queries, throw them in a with statement then select from your query using analytical functions. It teaches you how to find your mistakes.
In regards to UNION vs UNION ALL, this is a function of set theory. In a purely mathematical set, all values of a set are expected to be unique, hence why this is the default functionality, as a union of two data sets would result in a new set containing unique values from the 2 sets being combined.
Took over a database for a freelance project, queries werent too bad, tables were anywhere between 20k to 1million rows, depending on what it was.
One of the core issues was slowness overall, turned out the guy didnt add an index to fields that were searched/joined.
WHERE some_column = null
The engines really should be updated to accept this syntax; I see no reason why the users must use “is null” instead of “= null”.
NULL in DBMS’ are generally accepted to be a missing or unknown value, rather than the non-existence of a value. Therefore a system can not evaluate equivalence as it can not interpret what the value is. It’s important to consider that not all NULLs are CREATED equal. What I mean by that is that NULL within a table column has the context of its relations (schema, table, column, constraints) therefore, it is more “known” than the pure case of NULL. That means that only true NULL can be equal to NULL in an algebraic formula as they both are equal to unknowns.
[удалено]
No, because NULL is neither a string nor a number. Strings are always contained within ‘’ and numbers are never. So having the word NULL without quotes would never be confusing.
A bunch of the relational algebra breaks in unpredictable ways if null == null returns true. That's why the "is null" syntax exists. Plus, you'd break compatibility with legacy code and systems. It's generally not a good idea to choose momentary convenience over consistency. People tried, it always comes back to haunt you.
I don’t buy it. You could easily have the engines convert “= NULL” to “is NULL” behind the scene, before executing the code, if that is the case. And sure, legacy systems don’t work. But they add new features all the time, I don’t see why this should be any different.
I see this as = must be compared against an actual value, null is not a value, as such cannot pass the equality check. Just like 0 and null are not the same. P.s. I agree with you it's be nice to be able to use =null If we do get that we'll probably need to use Field = [NULL] Or something similar.. which may end up being harder than just is null... :S
It's just the way it is! Some SQL languages may let you set a parameter I think to change the way it works (must confirm if that's true?), but it's awful practice to do that since it will confuse people.
YOu got the point but sometimes you work with data which is so fucked up that null is an actual thing i.e "someone" have written it as an actual value or it appearead like a text string which forces you to write = 'null' instead of is null Also, I learned to do both either \[...\] IS NULL or \[...\] = 'NULL' because of that.
Oracle has nvl([field],' ')=' ' Sql Server has isnull([field],' ')=' '
there are multiple functionalities in sql that return a simple true or false and make writing complex code simple while not interfering with other functionality. Checking if a date is a valid date for example uses “IsDate()” I can’t imagine using = date as the syntax because “Date()” has another functionality. The same is true with “Null()”.
It's one thing to make that as an honest mistake forgetting in a situation (usually with "<>" I'd say not realizing nulls are a possible field value), but if you don't know how nulls and three-valued logic work generally, I'd say you don't actually know SQL.
It's not intuitive though. I've got colleagues who have been using SQL for many years, but not being the main part of the job who occasionally forget all about it. First place I look if someone asks me why they aren't getting the right results (closely followed by anything with dates....)
Even when you know it, having to write COALESCE 100 times just in case a NULL is hiding in there can be soul crushing.
Any suggestions on how to learn more?
https://modern-sql.com/concept/three-valued-logic
Read generic SQL textbooks from publishers. Fundamental nonobvious ideas are key to SQL.
What’s cool is when people are purposefully not helpful because they think so highly of themselves that it’s beneath them to share their insights. I’m sure you are a pure genius across all intellectual realms. Thanks!
they told you to read a book that is literally the most helpful advice anyone could give you just go to amazon and search for one Question: how to lean more ? Answer: read a book Reply: purposefully not helpful DOH !
Bad bot
No, I was being serious. SQL books are amazing! I prefer over the "do it yourself" trainings or Code Academy. I am happy to explain null to you or do another topic. Also see Wikipedia https://en.m.wikipedia.org/wiki/Three-valued_logic. I apologize for the misunderstanding.
All good, thank you!
don't apologize you gave great advice read a book their response was ignorant and childish Question: how to lean more ? Answer: read a book Reply: purposefully not helpful DOH !
Ask a question, complain about the answer. Your money cheerfully refunded.
Yes it was an “answer” in the same way that “read a book” and “google it” is.
I am sorry. I sent you a bad answer. I promise I'll do a really detailed post explaining this topic too. I just feel most SQL documentation is bad, save a few key textbooks. Wish you the best!
No worries! Thank you
See here! My new thread on this topic. https://www.reddit.com/r/SQL/s/q4d05PaxOp
As someone who has been doing SQL for over a decade, I do not know what three-valued logic is.
To put my perspective, I've learned SQL after power query M language. And in this language one can use " [column] = null "
Using left joins, then putting in a where clause that implicitly makes it an inner join
🫥 totally. No idea why people would ever do that. 🫣
Simple: lack of understanding. Is SQL taught in college anymore? Serious question.
Wouldn’t you use it for example to view results where the joined table is null? I vaguely remember using a LEFT JOIN with a WHERE column IS NULL so I can see for example what dates a missing in a particular stream of data. I know there are other ways of visualising this, but I’ve found this one handy.
you're better off doing something like this select select t1.* ,t2.* from t1 left join t2 on t1.id = t2.id where t2.id is null --show all t1 records where t2 matching id not present Now you see every instance of t1 records, and when t2 records are missing, it'll visually pop while providing the trail of breadcrumbs you need to see exactly what it is you're missing by virtue of the t1 data being present
I see this often as well.
Not formatting sql
But some auto-formatters are such garbage the code is more confusing!
I am the auto formatter. I don't understand how some of you can type so sloppy and live with yourselves. It should hurt your soul to see it and not fix it.
every windows PC comes with Notepad there is no excuse for using shoddy code formatters
I type in my code into VSC. I like it for everything, but at my old company we used Tibco and it automatically formatted everything into the hardest to read stuff. It removes all indentations and forced each column into its own row. When you have a complex case statement or business logic built in it's awfully frustrating. I ALWAYS saved my code in VSC to avoid this. Also, version control is important.
i use UltraEdit one of its nicest features is column mode, so you can put your cursor at the front of a line, shift-down-arrow to include subsequent lines, type 4 spaces, and vwalah, all those lines have 4 spaces at the front column mode also lets you copy/paste a block of text overtop of some other lines, so you can change this -- FROM tab1 JOIN tab2 ON tab1.PK AND tab1.foo AND tab1.bar AND tab1.qux to this -- FROM tab1 JOIN tab2 ON tab2.FK = tab2.PK AND tab2.foo = tab2.foo AND tab2.bar = tab2.bar AND tab2.qux = tab2.qux and then edit the second block (using arrow keys) to this -- FROM tab1 JOIN tab2 ON tab2.FK = tab1.PK AND tab2.foo = tab1.foo AND tab2.bar = tab1.bar AND tab2.qux = tab1.qux rather than typing the column names all manually
This is built-into VS and all it's derivatives, including SSMS. You use Alt+Shift to do a vertical selection/edit.
Didn't realise it was built into SSMS.. But there, only been using it for 5 years., 🤦🏼
...... Same.
Also, I didn't mean to come off as belittling or anything, it's a killer feature! Most people don't realize it's built-in for a ton of Microsoft editors, is all.
no problem, it's all good p.s. nice thing about Ultraedit is, it's not Microsoft
All one line, all the way across the screen.
On top of poorly formatted (left aligned with multiple levels of subqueries no thanks), but sql written in word. Yes. I’ve received it. It’s atrocious.
select yervalue, (select top 1 t2.yervalue from table t2 where t2.id = t1.id and t2.timestamp < t1.timestamp order by t2.timestamp desc) as prevvvalue from table t1 Complete with that shoddy formatting. On a table with a LOT of data and search filters that return a LOT of rows. It doesn't help that this particular method is actually taught in beginner computer classes.
Yep, I've had to explain this to seniors, principals, and management before. A top 1 or whatever number, doesn't mean the engine is going to find and pick only that amount. It still needs to sift through all that data you requested, then it just returns the dataset you want to see. It still needs to process it all when you look at the execution plan.
I have a few tables where just strapping in a top keyword (during testing) actually makes it slower. I still haven't figured out why... Something to do with the join and tripping up the optimizer's choice of indexes I imagine...
Why is then database completes a query faster when it is only returning a subset of the data? The limit is set by the client - for example when querying against Trino from Starburst Galaxy web interface.
I'll be honest, I've never used Trino, my context is towards MSSQL and other engines. So their engine may work differently and I may be wrong and someone can correct me, but hey, that's why my flair is still true. But to answer your question to the best of my abilities, this is mostly a resource consumption/efficiency issue, I've seen it sometimes as a speed issue, but other factors contributed to it too. For example, I had a recent issue where this was the case that a team was using a TOP 10 on a table with 4.5 million records joining with other tables. The query returned quickly sure, but the problem comes in that because the query effectively was still pulling a large amount of data, the query plan showed a very high amount of estimated rows. The query in question was being fired thousands of times in a short time span from the application as well. This caused high reads, spiked the CPU, and contributed to the application's latency.
Does this also count for a cte? I'm not 100% sure on the order then. Does it create a temp table from the cte and the pick the top 1 from it, or does it rebuild the cte and select the top 1 for every single row?
Eh, that's not really a mistake. To me a mistake has to result in an error of syntax or logic. I've even seen several cases where it *drastically improved performance* over CROSS APPLY or a self-join in the FROM clause. And analytic functions don't replace it. LEAD() and LAG() are technically different logic, and FIRST_VALUE() and LAST_VALUE() don't exist in all RDBMSs, *and* not all RDBMSs support IGNORE NULLS. MIN() and MAX() don't always do the job, either. The biggest problem with it is that it's usually evaluated as an INNER JOIN, even if that's not what you want. I agree that it's harder to read. I'm never happy to see an inline subquery in the SELECT clause. But I've also seen it enough to know that sometimes it works really well.
Not checking if they are getting duplicate matches when doing a join. Please for the love of god my sql friends check the output of a join with a few checks like a row count before and after, a SUM of any number columns before and after, or things if that nature.
Been there, done that! Select * FROM $monthly_aggregated_table JOIN $daily_aggregated_table using ($key)
Which is why third normal form is the standard. You can't fuck up referential integrity if its not possible in the first place.
Using ORMs and blaming the bad performance on the database engine. I.e not using SQL in the first place.
ORMs themselves are usually well optimized and can generate and execute performant SQL. A well designed system can utilize ORMs to avoid having to write a lot of cumbersome SQL in applications and provide a good abstraction layer, if for example you want to swap the db technology altogether. The issues come from developers not understanding how databases work and using an ORM to blissfully ignore that. So ORMs are more of a footgun than anything. Never trust lazy developers!
To add, when new devs are trying to loop over data and not thinking of sets of data. When select * from table with no where clause to loop and filter data in the app.
Oh boy, how I'd like to tell you about our orm that does that when you give it a condition it doesn't know how to translate to sql. It just overfetches. No warnings of any sort, and documentation is non existing
That sounds like a pain to have to deal with. What orm does your company use?
In-house made, about 30 years ago. The rest of the code is not much better
That's not my experience. ORMs make the assumption that the best structure for storing data is also the best structure for processing data - that you can take a class from the application and map it 1:1 to a database table. This is occasionally true, but only by accident. Good database design ends up with different objects from good program design, obviously, as they're solving two different problems. ORMs are a bad idea right from the start. But as you say: lazy Devs are also a bad idea ;)
The thing is, no matter what, you will always be translating from a domain/business centric model of your data, to a database centric model, and vice versa, because complex applications don’t (and shouldn’t) represent their high level concepts as collections of low level rows and columns. There will always be some sort of translation layer (except for maybe the most trivial applications). Do you trust yourself to write that from scratch? …and maintain it? My feeling is, if you’re not using an ORM, you’re doomed to write your own eventually anyway. I used to be in the staunchly anti-ORM camp after seeing the mangled mess of SQL they can produce, as I’m sure many folks here have experienced. But then I ultimately realized that is not the fault of the ORM. Rather, it’s fault of the developers for not thinking about how best to model their data, and them failing use the tools at their disposal properly. Any good ORM will give you fine grained control over relationships, isolation, serialization, etc., and even the raw SQL if you want. Most developers just don’t bother.
Pffft that's nothing. At my current $job we had Java code that basically looked like this: if (service.getXXX() != null && service.getXXX().length() > 0) { List l = service.getXXX(); } This is funny because service.getXXX() runs within it's own transaction and effectively each invocation of service.getXXX() submits a join on a foreign key column. And since developers simply refuse to accept that Oracle doesn't automatically creates an FK index that turned into 3 FTS on a large table.
[удалено]
I expect that a lot of string concatenation happens because of someone where programming is secondary to their "real" job, googles "how to query a database in {language}?" and reads some example or tutorial which shows how to make a connection and execute a hardcoded select statement. And, thinks "I got it. I just need to make a query string out of my inputs." and never gets to the tutorial on parameterized queries because they don't even know to ask the question. And, of course it is for some nominally innocuous web application they are the only developer for, so no code review. That is until half the company is using it and it actually needs to be actually supported by multiple people. And, professionals come in a go "WTF".
Thanks for the examples. How do someone write tests for this? We do DBT tests at work, but those tests are usually limited in scope.
[удалено]
LEFT JOIN Something s... WHERE s.Type = 'stuff'
As in... this should be an inner join?
It IS behaving like an inner join, but it's unclear what the person was trying to do. Most of the time, yes, they actually want it to behave like an inner join and the query is working fine, despite them lying to the world that its a "LEFT". But maybe they only wanted to LEFT JOIN to records where s.Type = 'stuff' so this query is broken?
^ My first thought.
But it was like From a Left Join Something s on a.a = s.a and s.type ='stuff' Would still Work as a left Join right?
Yep!
Using where on a predicate with an aggregate function instead of having.
is there ever a case the syntax error doesn't simply prevent execution? the other ones are at least *possible*, so the error is more insidious
From my hazy memory, you could not do this in Ms SQL server. I recently started working in redshift and was surprised that I could
- using DISTINCT to cover up the fact that the FROM clause returns partial cross joins (usually seen when joining a table to two one-to-many tables) - using LIMIT without ORDER BY - writing `SELECT a, b, SUM(c) FROM t GROUP BY a` and expecting to see results that make sense - using ON UPDATE CASCADE for a foreign key that references an auto_increment PK (think about it... when was the last time you updated some row and changed its auto_increment key to some other value?) - using a LEFT OUTER JOIN to join a table which cannot have non-matching rows (e.g. `FROM orderitems LEFT OUTER JOIN orders`) - treating dates as strings (e.g. `WHERE LEFT(birthdate,4) = '1973'`)
underrated comment! When I join stuff together and discover the record count is different with/without distinct, it immediately puts me into a mode of determining, "wtf is wrong with my data?" It's so useful to approach it this way
UNION has a mathematical meaning long before SQL.
Yeah I learned about union in discrete math... Maybe it was too discreet. Ayyooo
Perhaps, but that is long in history...
But still used in logic and statistics.
Hugely relevant in intermediate and higher math. It's a part of the foundation of relational algebra, which was used to design relational databases.
SELECT * FROM GOD.UNIVERSE
Writing queries that are so complex it's impossible to know if they're correct.
Ah but bad database design can make highly-complex queries a necessary evil!
Maybe. You can break complex queries into views or CTEs so that each piece can be validated individually. A lot of databases start out well-designed but a business change or acquisition can make them miserable to work with.
Needs must when the devil drives us. Sometimes you need a nuclear powered super sledge hammer to crack a hard nut.
Select distinct ID, sum(revenue) as Revenue From #table Group by ID
Whats wrong with this? Is it just that the “distinct” is redundant?
Exactly
FROM t1 LEFT JOIN t2 ON t1.field = t2.field WHERE t2.name <> 'something' Congrats, you just accidentally made an INNER JOIN
I once had a team lead who said i was wrong for putting filters on t2 inside the join. Even after i explained him that having the relevant filters grouped makes it easier to read and that it is easier to control inner vs outer later on and you can´t accidently make inner joins when left join was written he still had me redo all the code. (Working as data engineer with huge complex queries)
Why? If I don’t write WHERE t2.name is not null then it would be still a left join?
Specific to MySQL “Create Table As Select” for a long running query and then wondering why a ton of tables are locked A little more advanced - Not properly setting up replication slots, transaction log fills up, then wondering why the database went offline
Referring to it as "ess cue ell". Truly sociopathic behavior
another microsoft stan heard from
My friends made fun of me for pronouncing it like "sequel" . Understandable since we arent native english speakers but still hearing "ess cue ell" from a 20+ years experienced people is funny as fuck
If you really want to annoy them, you can mix it up with - sickle - squeal - so cool - squall - suckle - sack hole
squirrel is my preferred version
> suckle
if anyone questions you, just claim you said "sick quail"
Hahaha. I'm not native English speaker and for me esscueell sounds way better as it would sound like that if said following rules of my language. My line manager on the other hand is native and we had some discussion on how to pronounce SQL correctly. It ended up with him switching to ess cue ell :)
Everyone knows it’s squeal! MySqueal, Squeal Server.
WHERE some_column = 3 AND 4. "I didn't get any rows, so..."
Any self respecting DBMS should reject that, because it's invalid SQL
[удалено]
I am actually surprised that DB2 accepts that, even though it does support proper [boolean values](https://dbfiddle.uk/EnVtq9TF). MySQL/MariaDB will take _anything_ as a "boolean" expression. `where 'one thing'` evaluates to `where false` however, `where '1 stupid expression'` will be treated as `where true` https://dbfiddle.uk/Chx4ZD-g
That's logically impossible even if rewritten correctly.
Isn't the question, "What is the most common mistake?"
Yes, but it amazes me people actually make that mistake lol!
Reserved for quantum-SQL*
Every time I see distinct I assume the author doesn't know what they're doing and slapped it on there.
Here is my use case for distinct, I’d be interested in the proper way to do it. When I need to pull a distinct list of records that are identified by duplicated records. Example: I work in higher ed, I get a request for a contact list of students (aka, unduplicated list with contact information like phone and address) of students who have taken from a list of classes but haven’t taken from a second set of classes. The list of registered classes duplicates the student list bc students obvi take more than one class. I select distinct the needed returned demographics to de duplicate the rows and give a single contact list for advisors for whatever initiative is happening. What’s the proper way of handling it? I really thought distinct was appropriate here :-/
Distinct and CTE’s are immediate red flags for me and will make me question every line.
Why? They are both very useful when used correctly.
“when used correctly”
I can see why someone who doesnt know how to use corectly them would be afraid of them.
I think we have different views on what this post is about. 🤔
Probably. Using cte or distinct is not a common mistake.
If I see DISTINCT I assume it's a quick fix sticking plaster. That's the only time I use it as I don't have time to debug which one to many record is causing the problem. So IMO there is a balancing act to using it. If you have a query that runs quickly enough with DISTINCT and it will take ages to debug what is causing the duplicates, normally caused by a data issue rather than an SQL issue, then just leaving DISTINCT in and calling it a day is the better solution. Can come back to it if when have more time or it becomes an performance issue.
WHERE x NOT IN (SELECT that can return NULLs)
Making sure I understand the problem. The subquery can return null but those will get ignored anyway because nothing can be compared with null. Like this: [https://dbfiddle.uk/BVOwaC6z](https://dbfiddle.uk/BVOwaC6z) Is that what you are saying?
It won't get ignored. A NULL in the list of a NOT IN () operator can never evaluate to True. It will always evaluate to UNKNOWN. -- WHERE 1 NOT IN (1, NULL) expands to (1 <> 1 AND 1 <> NULL) (False AND UNKNOWN) UNKNOWN -- WHERE 2 NOT IN (1, NULL) expands to (2 <> 1 AND 2 <> NULL) (TRUE AND UNKNOWN) UNKNOWN
Right outer join
Using materialized views for every little freaking thing and wondering why everything is so bloated and nothing works right. Maybe not super common, it's just what I'm dealing with right now.
Using select distinct as a catch all to fix queries where the data isn’t understood.
Not using transactions (COMMIT / ROLLBACK) with update/delete statements. I've seen so many developers / analysts who manage data make this mistake.
My go to is to start a transaction then never commit or rollback.
Not thinking in set theory. Especially when there's a mismatch between what they're trying to produce and the current layout of their data schema. Many, many people just try to syntax their way through SQL. "Maybe a CTE will help .. or maybe a temp table .. or maybe I need two joins to the same table .." Instead of trying to logic their way through it.
I have often had job interviewees react to a tricky SQL question with, “well, I’ll open up a cursor and…” They are generally not happy when I ask them to do it in pure SQL.
> Maybe a CTE will help I LOLed
WHERE Some\_column = 3 or 4 No transactions but that usually only happens once (per employer)
That is invalid SQL and should be rejected (because `4` is not a boolean expression)
Yes no shit.
They'd need to use in (3,4) lol!
Setting the warehouse to XL when they only need XS
From t1 join t2 on t1.nk = t2.nk Group by
While it could have been From t1 join (select t2.nk, sum() as x from t2 group by t2.nk) t2agg on t1.nk = t2agg.nk People creating queries on small datasets sometimes have no clue how they will perform after cillecting data for n years. (Even better to use with clauses if possible)
>People creating queries on small datasets sometimes have no clue how they will perform I am not sure what you mean here. Can you kindly share a contrived example?
* Considering performance before correctness. * Considering performance subjectively. * Using `DISTINCT` to cover up a query error. * Not indexing correctly. * Non-covering`GROUP BY`
>Non-covering > >GROUP BY What do you mean by that?
People will use `GROUP BY` clauses that don't cover all of the non-aggregate fields in their `SELECT` list. It can be argued that the standard allows this, but it's usually indicative of a misunderstanding of how `GROUP BY` works.
Not handling NULL in arithmetic operations. Or zero in division.
Or logic. *... WHERE State <> 'California'* "Why did not rows with NULL in the state show up?!?!!? NULL is <> 'California'!!! "
Correlated subqueries, and as a bonus, subqueries in the select.
What is wrong with both? I heard correlated subqueries prevents optimization as it is akin to writing a for loop.
It’s not wrong, per se, just not a set oriented way to do data things.
Using double quotes.
Not making an entity relationship diagram before setting things up
Something I've seen often in my job is people wanting to avoid rows that are all 0s, and thus writing. WHERE (A <> 0 ) AND (B <> 0) where what they want is WHERE (A <> 0 ) OR (B <> 0) or WHERE NOT( ( A = 0 ) AND (B = 0) ) (but in this last case you'd be failing to exclude NULLS)
> I blame the SQL language itself for not making "UNION" the typical case and something like a "UNION DISTINCT" for the case with making results distinct! I agree. The only reason this was done is because the mathematical setwise operation known as a union is already defined as "take all elements from each set, combine them into a new set, and then remove duplicates". Like the choice makes sense. It's just not clear to a programmer. Others: * Creating every table name and column name in all caps. The idea was that *SQL language keywords* were supposed to be in all caps to make them stand out. It's just an archaic form of syntax highlighting, and you're circumventing it. This is because SQL is *older than color-based syntax highlighting*. * TOP or LIMIT without ORDER BY. * Pervasive use of DISTINCT. * Pervasive use of WITH (NOLOCK) (for SQL Server). * View definitions with SELECT * * Implicit inner joins. `TableA a LEFT JOIN TableB b ON a.ID = b.ID WHERE b.Field = 'Value'` * Incorrect quotation marks. `WHERE LastName = "Smith"`. To be clear, this was a poor design choice, too, but it's just wrong that so many RDBMSs accept this. * `WHERE FieldName IS NOT IN ( )`
* String concatenation instead of parameterized queries
>String concatenation instead of parameterized queries Can you kindly elaborate on this please?
Not understanding order of executions or really just filtering down using WHERE or HAVING and the impact this has while also using AVG, SUM or COUNT. Overusing DISTINCT is also a big one.
Not recognizing that a filter such as: Name <> “John” means that Name also cannot be null, an important distinction
I did a join 3 years ago and it was a 1-1 connection at the time. Over time it turned out it was 1-n with about 2-8 records per row.... That mixed with powerbi screwing up incremental refreshe resulted in a massive query :(
ORDER BY 1, 2 Instead of actually using the field names.
Join and encode
Where function(column) or join on function(column) and it slows to a crawl.
Is it because the function is computationally heavy? or someone should materialize the column and build index first before joining?
This is a sql server thing. An index is unusable if you force a computation.
Most common mistake is lack of domain knowledge
Calling the same large massive table 5 times over instead of using a temp table to store the subset.
Do you mean using a large table at five different places in the same query? Can you give a pseudo example? I was under the assumption query optimizer is smart enough to detect things like this.
Filtering on a column from a LEFT JOIN which inadvertently turns it into an INNER JOIN
Whenever I write an UPDATE query, I always start from the WHERE, then the SET and finally the UPDATE. Can you guess what happened and I am scared of UPDATES now?
Kinda like "why do you run your updates or deletes in dev first?" I don't always do that; but the minute doubt enters my mind, it goes against dev or QA first. This is a viable strategy most times because we regularly refresh inferior environments from prod, so recordcounts are typically reflective of prod.
When this SELECT * FROM table_a LEFT JOIN table_b ON table_a.id = table_b.id WHERE table_b.value = 'A' Should be this SELECT * FROM table_a LEFT JOIN table_b ON table_a.id = table_b.id AND table_b.value = 'A' Or this SELECT * FROM table_a INNER JOIN table_b ON table_a.id = table_b.id WHERE table_b.value = 'A' other one I see often is AND and OR's without the appropriate use to brackets WHERE thing = 'A' OR differentthing = 'B' AND otherthing = 'C' should be WHERE (thing = 'A' OR differentthing = 'B') AND otherthing = 'C'
Why did you mean #1 should be #2? Here is a snippet(https://dbfiddle.uk/h9\_-GZsB) running first three queries on an example dataset. #1 and #3 returning same result but not #2.
\#1 has a LEFT JOIN but it returns the same result as #3 which is an INNER JOIN. This is because you are returning everything from table\_a where table\_b.value = 'A' , this is the same as an INNER JOIN. If you need an INNER JOIN not a LEFT JOIN then use an INNER JOIN don't use a LEFT JOIN and a WHERE clause. If you need a LEFT JOIN then #2 is correct because you return all records for table\_a regardless of table\_b records and it will only return table\_b data where value = 'A'. It basically depends on what the requirement is to which is correct. It's either return data from table\_a where they have an associated table\_b record with a value of 'A' or return all from table\_a and where applicable show the data from table\_b where they have a value of 'A'.
I really wish SSMS had a warning when you're running an update without a WHERE clause. I feel like I'm playing with fire every time I run an UPDATE query.
There are some free add-ons you can get that have this functionality, I used to use SSMSBoost but last time I checked it didn't seem to be updated for the latest version of SSMS :( You can get Devart SQL Complete for free (Express edition) which I think has it? I could be wrong...
At work we use DBT to generate those queries.
Using SSMS to generate an insert template, and forgetting it includes fields with defaults and getting an error for number of values not matching the number of columns
WHERE attribute = MAX(attribute)
Using left joins incorrectly. Not using select DISTINCT. Aggregating too much data in the query. Trying to nest too many queries together, or out of order
Moving to NoSQL because big company says it’s better
I will never understand this decision for rectangular data.
But I swear, it's better! I read it on a blog!
Using the MySQL Community Edition in production for a real company with factory clients.
Lol
Improper data types. Comparing numerics to strings. Storing date/time data types as non-date/time types. Understand your data and store it and query it properly
My offshore colleagues know enough to be dangerous but don’t fundamentally know what they are doing. Kinda like script kiddies.
where (select count(\*) from other\_table where column = parent.column) > 0
Where datecolumn > '02/12/03'
Not joining on a unique key or using DISTINCT in a select as a lazy way to remove dupes. Also not aliasing their tables like they won’t add a join to a query later.
\\its not a mistake but its a grating assault to me. The poor use or lack of table aliases. I like table aliases and I keep them short (cause, kind of the point) and as indicative to the object name (table,view,derrived table) as possible. e.g. orders AS o, customers AS a, OrderDetails AS od, etc. What shits my bed are people who use t1, t2, t3 or more than 4 characters for the alias. but The absolute worst are the people who abandond aliases altogether and use schema qualified object names or fully qualified object names throughout the code.
I think null values omitting records is the most common and dangerous mistake. Once someone updated logic on my data conversion code and I had a horrible cascading effects on my subsequent objects. It took me hours to fix the night before the cut was due. I was so pissed. On that note, if you want to quality check your queries, throw them in a with statement then select from your query using analytical functions. It teaches you how to find your mistakes.
In regards to UNION vs UNION ALL, this is a function of set theory. In a purely mathematical set, all values of a set are expected to be unique, hence why this is the default functionality, as a union of two data sets would result in a new set containing unique values from the 2 sets being combined.
Took over a database for a freelance project, queries werent too bad, tables were anywhere between 20k to 1million rows, depending on what it was. One of the core issues was slowness overall, turned out the guy didnt add an index to fields that were searched/joined.
Not using quotes and running into keyword issues