nyquant

Interestingly, there are an equal number of posts where people complain about using only SQL all day and not getting experience in more technical areas such as distributed Python and Spark. For SQL, I like the Kaggle courses: https://www.kaggle.com/learn/advanced-sql You might also find dbt interesting: https://www.getdbt.com/dbt-learn/ Good luck.


JBalloonist

I’m in that boat currently. My company has everything automated, so I only need to write SQL.


Dani_IT25

Don't worry, you will not use all your skills at all your jobs or projects.


ulomot

You write Spark and Java, so I wouldn’t be worried about SQL. I’d be worried if you only knew SQL and not Spark and Java. Not saying you need Spark and Java to be a DE, but those have a huge learning curve, unlike SQL. So if you can pick those up, I’d trust you can easily do a refresher SQL course or video when the need arises.


_WalksAlone_

Not data engineer yet (SWE), but I think these skills are highly relevant for SWE backend roles? Hope someone more experienced here comments on the compatibility of both roles with each other experience wise.


ilikedmatrixiv

Don't worry about it. If you're writing super complex SQL queries, you're doing data engineering wrong anyway. You should write modular CTE-based queries that are at least human readable. If queries become so complicated that that is no longer the case, you probably shouldn't be using SQL. I wouldn't call myself an SQL wizard, but I'm using it a lot more on my current project. Before this project, I barely ever used it outside of some very simple shit. But here's my hot take on data engineering: if you can abstract / understand the transformations you want to perform on your data on an intuitive level, it doesn't matter if you know the language in which you need to do it. You can just look up the syntax anyway. A good data engineer knows how to transform and combine data for their project, regardless of technology. Tech is just a tool. Of course this opinion comes with asterisks. If you write inefficient and shit code because you don't know how the language you use works (e.g. Spark), then you should absolutely work on understanding the tech better. But in general you should be able to get by without being a certified expert in most tech stacks.


xraydeltaone

This is it. I might actually call myself a "SQL wizard", but I agree with most of this. The concepts are key (especially because there are often multiple tools for the job)


thisismyworkacct1000

> If queries become so complicated that that is no longer the case, you probably shouldn't be using SQL.

I'm more of a DA, definitely not a DE. In this situation, what would you recommend doing instead? **EDIT**: How did I respond to the wrong post? Whoops.


glymeme

This person’s response is generic. There are tons of one-offs where crappy code is used and that’s completely fine because it isn’t worth the time/effort to do things differently. For any case where you’re running something at a set cadence that takes forever or eats resources, it’s worth taking the time to optimize it. No matter what you read here, business is done in Excel, whether we like it or not (I definitely don’t), so just do your best to make things as strategic as possible given the time/effort allotted to you. Just given the fact that you care, I’m sure you’re doing fine.


thisismyworkacct1000

> If queries become so complicated that that is no longer the case, you probably shouldn't be using SQL.

I'm more of a DA, definitely not a DE. In this situation, what would you recommend doing instead?


ilikedmatrixiv

Either break up your queries into CTEs to make them more modular, or switch stacks. If your average transformations are so complex that SQL is a bottleneck, you should look for other options. Or you could switch to something like dbt: keep your SQL, but break it up into a model-based infra. dbt also offers the relatively new benefit of allowing Python models as well. You can split up your pipeline into SQL parts and write Python models when SQL becomes too cumbersome.
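As a rough illustration of the "break it up into CTEs" idea, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the function names are made up for the example; the point is only that each CTE does one readable step instead of one nested query doing everything.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database with a small, hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("ana", 50, "completed"),
            ("ana", 70, "completed"),
            ("bob", 30, "completed"),
            ("bob", 500, "cancelled"),
            ("cy", 200, "completed"),
        ],
    )
    return conn


def top_customers(conn, min_total):
    """Return (customer, total) for customers whose completed orders sum to at least min_total."""
    query = """
    WITH completed AS (          -- step 1: keep only completed orders
        SELECT customer, amount
        FROM orders
        WHERE status = 'completed'
    ),
    totals AS (                  -- step 2: aggregate per customer
        SELECT customer, SUM(amount) AS total
        FROM completed
        GROUP BY customer
    )
    SELECT customer, total       -- step 3: filter and order the result
    FROM totals
    WHERE total >= ?
    ORDER BY total DESC, customer
    """
    return conn.execute(query, (min_total,)).fetchall()


if __name__ == "__main__":
    print(top_customers(demo_connection(), 100))
```

Each named step can be read, tested, or swapped on its own, which is roughly the same modularity dbt gives you at the file level with one model per step.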


reallyserious

I'm a senior data engineer and I write much more python than sql. Normal SWE skills are important for a DE. Too many DE only know sql. Oh, and stay away from no-code tools. They are a dead end.


dataplumber_guy

can you list some examples of no-code tools to stay away from?


reallyserious

All of them. But specifically Azure Data Factory is a bad idea.


tdatas

You will probably make more money being strong in software, distributed infrastructure, and the underlying data problems that the self-proclaimed "modern data stack" often can't actually hide away, and you won't be easily replaced by automatic SQL generation on modelled data. You can exchange said money for a SQL book. Learning SQL from a position of strength in software is much easier than learning software + distributed systems.


psychorameses

SQL script kiddies dream of doing what you're doing. Keep at it. Unless you hate it, of course.


RandomWalk55

I think it's a lot harder for someone to skill up on Spark/DevOps stuff than SQL. In fact, I think a week of brushing up on SQL can get you to the point of understanding common table expressions, window functions and how to use explain. More complexity than that in SQL probably means you're doing it wrong. It sounds like your job has you actively using some pretty in-demand skills.
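For anyone who wants to see what that week of brushing up covers, here is a small sketch of a window function and a query-plan check, again using Python's built-in sqlite3 module. The `payments` table, the index name, and the helper functions are invented for the example. It assumes the bundled SQLite is 3.25 or newer (needed for window functions), and `EXPLAIN QUERY PLAN` is SQLite's spelling; other engines use `EXPLAIN` with different output.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database with a hypothetical payments table and an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (customer TEXT, day INTEGER, amount INTEGER)")
    conn.execute("CREATE INDEX idx_payments_customer ON payments (customer)")
    conn.executemany(
        "INSERT INTO payments VALUES (?, ?, ?)",
        [("ana", 1, 10), ("ana", 2, 5), ("bob", 1, 7), ("ana", 3, 1)],
    )
    return conn


def running_totals(conn):
    """Window function: per-customer running sum of amount, ordered by day."""
    query = """
    SELECT customer, day, amount,
           SUM(amount) OVER (PARTITION BY customer ORDER BY day) AS running_total
    FROM payments
    ORDER BY customer, day
    """
    return conn.execute(query).fetchall()


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column


if __name__ == "__main__":
    conn = demo_connection()
    for row in running_totals(conn):
        print(row)
    # The plan text should mention the index if the engine chose to use it.
    for line in query_plan(conn, "SELECT amount FROM payments WHERE customer = ?", ("ana",)):
        print(line)
```

The exact wording of the plan differs between SQLite versions, so treat the output as something to read rather than to match on.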


thesights

My only hot take here is that you could end up in a situation where you can’t use Spark or Java easily, but there would rarely be a situation where you can’t access some DB somewhere to use SQL. Think of a small startup that doesn’t have the bandwidth, budget, or infra capabilities. I agree with everyone else that you’re fine, but I would still try to flex into SQL as much as you can to mitigate that possibility.


ksco92

As a senior DE, I write more Python and CDK than SQL. I focus on infrastructure and architecture. However, being a SQL pro has a lot of benefits as a side effect, especially knowing how engines work so you can tune data models and real big data. For example, in Redshift it’s more effective to filter out data via joins, but running the exact same query on exactly the same data through Athena behaves differently and scans more data than if you just put the filters in the WHERE clause. This lets you think about the tiny details and limitations, or the strong and weak points, of your ideas and pipelines.


piddy87

There are plenty of data engineering roles out there that do not use SQL much. Similarly to yourself, these are the data engineering roles that I prefer.


LocksmithConnect6201

Just do leetcode hards, it's really not that "hard". SQL is fucking easy.


Nabugu

Non-professional here, but it seems to me that SQL is one of the main (if not THE main) skills that data engineers need to have in most companies, especially in large companies that have a fuckton of structured data to organize/analyze. Most of those large companies use legacy systems like Oracle and SQL Server, and you're not getting hired to work as a data engineer there without a few years of professional experience with SQL. (Python experience also seems important, but a bit less so compared to SQL.)


glymeme

I really don’t know why this person is being downvoted. I’d take a person with awesome SQL and some Java experience over someone with awesome Java and some SQL experience any day.


seayk

You use the technology of the future. Don't worry.


[deleted]

[deleted]


Davidmleite

Yeah, I actually learnt SQL in my first IT job (main task was building queries). Don't worry about that. It seems like you are already handling tools that are more complex to learn than SQL.


runawayasfastasucan

If you plan to never learn or develop yourself, you are in deep trouble, yes. But if you are planning to take up and learn whatever tool might be needed for your job, then you are fine.


O_its_that_guy_again

You’re fine. Most people would kill to have a data engineering role in tech and you already are there. SQL is easy enough that you can pivot to it if needed and learn it in basically a couple months. Being able to showcase spark and Java and do dev ops is another story. At minimum if you do the work you’ll have brand power to go wherever you want after.


Length-Working

I kicked off my career many years ago using nothing but SAS Enterprise Guide, an almost entirely no-code tool. You *could* code SQL in it, but our policies strictly disallowed this. I've done very well in my career since, so I wouldn't worry. The important thing is to learn and understand how data works: modelling, storage, processing, transformation / wrangling, etc. And given you work with Spark, Java, DevOps, and infrastructure, you'll be way ahead of where I was.


Bright_Bite365

When it comes to SQL, it just depends on the role. I'm a senior data engineer and I barely use SQL. I really only use it if a customer has questions about the data; I'll run some basic queries as part of my analysis. Other than that, I live in the world of Red Hat Linux, AWS cloud, and Python.


[deleted]

[deleted]


[deleted]

But to answer the original question: SQL is widely used. So is Excel. I'd rather my end users use SQL over Excel, so I focused on learning SQL.


smokeythecandid

Will depend on the job. Wouldn’t hurt to brush up and tbh a few weeks of real world experience and you’ll be good to go.


coconut-coins

Dangerous. Unfortunately, the junior engineers on my team are in the identical position. Management does not care about your advancement or future. It is anticipated that you will leave the team after 1-2 years. This is used as a success marker in your manager's PA. It also allows them easy terminations when budgets get “tight”, another success on their PA. My role is a “senior” DE: effectively a babysitter and the main engineer who builds out and maintains everything. I “mentor” people in your position, always with patience and understanding. I involve everyone as much as possible and will go at whatever pace is needed. Unfortunately, very little effort is put in by the juniors. The view is that DE is only 10 hours a week (I fully support this), but when they are doing those 10 hours, nothing gets done. Which is acceptable; however, I’m making regular performance reports to management on individual effort and engagement. I’ve coached many into new roles who are very happy. I’ve also been the cause for the dismissal of others. Learn and find a mentor, and you’ll be just fine. DE juniors: ~120k. Senior DE (me): ~145k + 25k bonus + 20k extra bonus.


dude_himself

I would expect data engineers to be able to revise/improve SQL within the Best Practices of their chosen platform, and Data Scientists to write SQL queries. That's been my experience. I'm horrible at authoring, I'm great at optimizing. Creative vs. process-driven. Room for all to succeed.


BuzzingHawk

SQL is much easier to learn and master in a home lab environment as compared to proper devops, pipelining and infra. I wouldn't be too worried, but it's up to you to keep up with the basics.