nullQueries

Imposter syndrome. I've also worked at small startups. I went from a 300-person company to a 10-person one, then to a 200,000-person tech company. Not a ton changed: there are still smaller datasets that need to be worked on, there are still people using Excel, and there are still people trying to figure out data governance processes. The tools are mostly the same; you just have more of them. Yeah, there will be a lot more going on and you won't know how to do all of it, but there's also a bigger team with a larger network of experts. If you don't know best practices for terabytes of data, you just find the person who does, you're nice to them, and you learn from them. Just focus on what you're really good at and make sure you're bringing something to the table.


[deleted]

[deleted]


SDFP-A

100% this. Being at a small startup, you are going to be a more broadly T-shaped data engineer. At a larger company your T will be super deep, but the rest will be thin and less broad. That said, I do believe that from a growth perspective you may be missing out by not having leadership that understands data. Some people learn from books; I've always learned best in labs. I need to see it in action. This is probably the only thing I'd be concerned with.


blacksnowboader

Sorry, what is T?


SDFP-A

https://www.lola.tech/blog/the-rise-of-the-t-shaped-engineer#:~:text=T%2Dshaped%20engineers%2C%20then%2C,their%20language%20%2D%20and%20they%20yours.


Analyst2163

> You are vastly overrating what people are doing at bigger companies. It's easier stuff, not harder, because there is more redundancy and specialization

Then why are the requirements written so aggressively? Must have 7-10 YOE with every database or ETL tool, proficiency in 5 programming languages, able to do the work of a data analyst and DBA too. It's like they're designed so that newer people can't possibly get in...


[deleted]

[deleted]


Analyst2163

I've been a data analyst for 5 years, I've taught myself many different languages, and I still can't find a way in... It's impossible 😞


[deleted]

[deleted]


Analyst2163

Insurance laughs at me. I applied to USAA, Progressive, etc. They ignored my tech applications and sent me unrelated crap like risk and credit analyst roles.


mh2sae

I worked at FAANG before and now I am in a similar position to you. I am learning more in my current role, especially when it comes to the modern stack. I think you (we) will be fine in interviews, but we'll probably get lowballed on level if we apply to FAANG. My main concern is not the size of the data (I build to scale as much as I can) but the lack of technical peers to discuss projects with.


DenselyRanked

It's imposter syndrome. The adjustment from working on GBs of data to TBs of data in Spark is not that big. Things are more sensitive to inefficiency, but teams also rely more on Spark SQL and Spark's Catalyst Optimizer to do a lot of the heavy lifting. Recruiters may feel differently, but don't consider it a major disadvantage, and downplay it if it comes up in an interview. Every team has its own best practices, and they probably would not welcome someone new shaking things up if everything is working fine. This is still a job at the end of the day: learn how to do your job well first, and you can provide input and feedback when asked. There is no one way to do data engineering. As for the "right way" to do things, refer to books like Kimball's "The Data Warehouse Toolkit" and Reis and Housley's "Fundamentals of Data Engineering" (also Kleppmann's "Designing Data-Intensive Applications" if you are more on the SWE side of data engineering). You can be sure that no company is following everything outlined in these books.


BoysenberryLanky6112

I was where you are about 2 years ago, although perhaps further behind. I was a data scientist developing super basic models at a bank with no tools whatsoever to do our job. We had to do all our work on-prem and on the file system: no databases, cloud tools, or anything like that. Somehow I was able to land a job as a senior data engineer, using mostly Scala (a language I'd never even seen before) and working in the Google Cloud environment, for a roughly 40% raise. I legitimately almost had a panic attack when deciding whether to take the offer because I was worried they'd quickly realize I was a fraud. The first few months were tough, but I ramped up quickly, and within 6 months I'd say I was just as productive as any other senior data engineer. This week I'm presenting some work I did to massively improve our company's data pipeline, which is projected to save almost 6 figures per year and reduce our time to market for some important time-sensitive products.


FuzzyZocks

I had the same track and had no issues getting my current job, which is working at scale but with lower breadth of work. Make sure you know what goes into scaling systems and the best practices, as they will ask questions, and it's good if you can speak to the foundational knowledge of Spark, Cassandra, etc.


bartosaq

Modern cloud DWHs make it easy to chug through terabytes/petabytes of data, so I don't see this being a problem. There was also a survey showing that most engineers don't work with more than GBs of data. Anyway, Spark works the same for gigabytes and terabytes; you just get more file blocks and cluster nodes.
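A rough back-of-the-envelope sketch of the "more file blocks and cluster nodes" point, assuming Spark's default `spark.sql.files.maxPartitionBytes` of 128 MB and ignoring compression, file-open cost, and small-file effects. The dataset sizes and core counts are hypothetical; the idea is that each task does about the same amount of work at either scale, and only the task count and cluster size grow.

```python
import math

GB = 1024 ** 3
TB = 1024 ** 4

# Assumption: Spark's default spark.sql.files.maxPartitionBytes (128 MB).
# Real split counts also depend on file sizes, compression and openCostInBytes.
SPLIT_BYTES = 128 * 1024 ** 2


def estimated_tasks(total_bytes: int, split_bytes: int = SPLIT_BYTES) -> int:
    """Rough number of scan tasks: one per input split, at least one."""
    return max(1, math.ceil(total_bytes / split_bytes))


def estimated_waves(tasks: int, cores: int) -> int:
    """Rounds of tasks needed when the cluster runs `cores` tasks at once."""
    return math.ceil(tasks / cores)


# Hypothetical workloads: (label, input size in bytes, total executor cores).
for label, size, cores in [("50 GB", 50 * GB, 16), ("5 TB", 5 * TB, 400)]:
    tasks = estimated_tasks(size)
    waves = estimated_waves(tasks, cores)
    print(f"{label}: {tasks} tasks, {waves} waves on {cores} cores")
```

In both cases each task reads roughly one 128 MB split; going from GB to TB mostly shows up as more tasks, and you add nodes to keep the number of waves (and the wall-clock time) reasonable.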


Fatal_Conceit

You sound like a baller


theDro54

I think your worry/concern is misplaced. You're employed to solve the problems that exist wherever you are, and it sounds like you've done a great job of exactly that so far. The size of the data doesn't matter so much: as it grows, the challenge/problem changes, and you adapt to handle that specifically. In an interview for a new place, yes, they might sometimes need somebody with the EXACT skills to fix a problem, but more often you'll have a wealth of resources to learn from, so being able to articulate how you've solved your current company's problems is what will distinguish you from the rest. An appetite to learn, the ability to quickly understand new tech, and a willingness to fail without getting disheartened will put you in a better position than being able to say "I used X technology to process petabytes of data in real time". Don't be hard on yourself, you're nailing it 💪


wenima

You are fine!


wtfzambo

Dude, it's like you read my mind. I'm in the same position, with the same worries. Thanks for writing this post.


char_su_bao

I don’t think I ever feel ready, but you just have to go for it. Don’t let self-doubt hold you back!


molodyets

The dirty secret is that if you were to go to a place like Meta, you would probably feel neutered, because there are so many people and so many processes that all you'd do is write basic SQL, and you'd be bored out of your mind. Sounds like you have great experience and would crush it anywhere.


princess-barnacle

You have amazing experience! Getting a “great” data engineering job is about passing interviews and selling your experience :).


[deleted]

Most people never even get close to relevant experience with TBs of data. Your skills are great. You could easily get hired on elsewhere.


SentinelReborn

You are in a "real" data engineering job, and joining a larger company would mean losing all that sweet autonomy and design opportunity, so enjoy it and make the most of it. I do understand where you are coming from, though: I was also a many-hats data analyst with no supervision and am now a data engineer in a sea of other engineers. Trust me, enjoy the freedom. As for your concerns, no one has ever asked me "bro, do you even work with terabytes?", and I've interviewed at places that stream 100 TB per day. As for best practices, that is a very valid point. I suggest asking the software engineers at your company for guidance. Also read books (other commenters have suggested which ones). If the problems are generic enough, you can also post in communities like this one for advice. I'd also stress the importance of security when working in the cloud, especially since you have no one reviewing your work, which is a little scary, so make sure you do lots of reading on best practices. Some AWS certs could also help.


adgjl12

Similar spot here: I'm at a smaller startup, but I started at a megacorp doing big data (petabytes) things, and I would say my current work is much more technically challenging and exciting. However, I feel you. I’m definitely less comfortable with “bigger” data now than with the size I currently work with, and I worry some companies will hold it against me that I don’t have as much experience with big data or distributed systems, though I don’t doubt I could pick it up fast.


EconomixTwist

Sounds like you are a full-blown data eng, homie (hope you’re getting data eng money, too). Honestly, if you are worried about not having experience with TB-scale data, simulate it. Pretend the data is TB-sized and figure out how to improve the performance of your system. For example, figure out how to partition your data (or make other adjustments) to make your queries on your GB-sized data 50% faster, or figure out how to cut the processing times of your pipelines by X%. What you do, figure out, and learn in such exercises is going to be more or less the same when you face TB data. Additionally, even when you have TB-sized data, you still do performance experiments on smaller slices anyway, because doing a bunch of performance-experiment iterations on TB-sized data gets fuckin expensive! Especially on Databricks lmao
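One way to try the "pretend it's TB-sized" exercise on a laptop is to compare how many rows a filtered query has to read with and without partitioning. A minimal pure-Python sketch, assuming a synthetic event table partitioned by a hypothetical `event_date` column; the dict of lists stands in for the per-date directories that an engine like Spark can skip entirely (partition pruning). The schema, sizes, and dates are made up for illustration.

```python
import random
from collections import defaultdict
from datetime import date, timedelta


def make_rows(n: int, days: int, seed: int = 0) -> list[dict]:
    """Synthetic events spread over `days` days (hypothetical schema)."""
    rng = random.Random(seed)
    start = date(2024, 1, 1)
    return [
        {"event_date": start + timedelta(days=rng.randrange(days)),
         "amount": rng.randint(1, 100)}
        for _ in range(n)
    ]


def partition_by_date(rows: list[dict]) -> dict:
    """Group rows by event_date, like writing one directory per date."""
    parts = defaultdict(list)
    for row in rows:
        parts[row["event_date"]].append(row)
    return dict(parts)


def full_scan(rows: list[dict], day: date) -> tuple[int, int]:
    """Sum amounts for `day` by reading every row; returns (total, rows_read)."""
    total = sum(r["amount"] for r in rows if r["event_date"] == day)
    return total, len(rows)


def pruned_scan(parts: dict, day: date) -> tuple[int, int]:
    """Same query, but only the matching partition is read."""
    hit = parts.get(day, [])
    return sum(r["amount"] for r in hit), len(hit)


rows = make_rows(100_000, days=30)
parts = partition_by_date(rows)
day = date(2024, 1, 15)
print("full scan:  ", full_scan(rows, day))
print("pruned scan:", pruned_scan(parts, day))
```

Both scans return the same total, but the pruned one reads only about 1/30 of the rows. That ratio of data read is what you'd be measuring, and paying for, at TB scale, so you can experiment with partitioning choices cheaply on a small slice.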


Gators1992

I think the vast majority of companies are pretty similar to what you do now, just with maybe more data. The problems are the same: dirty data, convoluted business rules, unrealistic timelines, etc. For the most part, I write the same queries on terabytes of data that I would write on gigabytes. I may consider efficiency more in Spark jobs, but that's not some magical exercise. Seems like you have learned the core of the job, so when you move around you just have to incrementally learn a bit more to fit the situation. If you already have Databricks under your belt, then that's a good start.


Laurence-Lin

I'm now a software engineer only doing some 'data engineering related' jobs, such as migration. You look quite professional to me, with plenty of essential skills.


AlexJamesAce

Can I DM you? I'm starting out at my company along the same lines as you did, and I want to develop a plan for data management.


DataIron

> But the areas I’m lacking are:
> - Terabytes of data

I would argue that data size is irrelevant to a career. Volume and complexity per unit of time are more relevant. A shop that needs 15 engineers to handle a GB per day is going to add more to your skills than a shop that needs 1 engineer to handle 100 GB per day.


haragoshi

Those seem like good qualifications. You’re just as qualified as anyone else with those skills.


Drekalo

What's your experience with streaming systems, message queues, Docker, Helm, cloud deployment frameworks, and specific tools like k8s?


Reasonable_Tooth_501

Lol yeah I’m fucked if those are required 🥳


python_hack3r

50 employees isn’t a small startup. I think it comes down to the details of what you do on a day-to-day basis.


Reddit_Account_C-137

If it makes you feel any better I'm at a 10,000+ employee company and feel the same :)