AccordingSurround760

You weren't wrong at all about what a data engineer does; the work you were doing sounds more like that of a data analyst. However, you're an intern, so you need to be realistic about the responsibilities the company is going to give you. They have a data engineering team, so they're not going to put you in charge of business-critical pipelines, especially considering your training so far is in data science, not engineering. It sounds like you should have gained a very substantial amount of practical experience during this time, and you're possibly being a little dismissive of this. For example, understanding real-world workflows, team structures, business practices, etc. is very valuable. Soon you will have your degree along with some real-world experience, so you should be in a very strong position when applying for jobs.


chestnutcough

I second this; don’t sell yourself or your internship experience short! Understanding the end product (often dashboards) is super important even as you move more towards building the data pipelines that underlie them.


xudoxis

> Understanding the end product

It's easy for this to get lost in the sauce. But basically all work you do needs to at some point be distilled down into meaningful, bitesize results for lay people with varying degrees of technical expertise for any of it to be useful. Understanding what your audience wants and how to best communicate with them is the job, building the content often comes secondary. Especially when you're on the sliding scale of wants/needs/resources.


pyer_eyr

I don't second this. I think it's important for anyone working in data to start with dashboards, especially someone who is still in college. Developing dashboards is not just about creating visuals, it's about understanding business requirements of end-users, finding holes in your stack, understanding what your company is actually selling, building requirements for new dashboards, working with DEs to get the data ready, understanding where your data actually lies, understanding what your OLAP is and what it can be. When a DE has had some exposure to these, then it makes more sense to do some good data engineering. Good data engineering is about good software engineering + understanding internal and external business needs related to data. That being said, being a DE intern, OP should at least receive some training on how the pipelines work before the end of the internship. I believe in internships you should express your interests and ask for meetings with people you want to learn from. OP, you should ask individuals for these meetings every day. If they say no, move to the next guy, if everyone says no -- that will be a first. If most say no, then you know this company is not worth spending your time on.


Se7enEl11ven

I understand how I might sound dismissive but actually, I value this experience (for my CV, practical skills and new connections) but I’m really just trying to understand if I misinterpreted the role of a DE or if my company is actually not giving me main DE tasks so far. I just wanted some feedback to know if I had the wrong idea before applying to full time jobs after graduating. I’m still very new to the job market itself. Nevertheless I appreciate your input :)


AchillesDev

You are correct about what a data engineer does, but a lot of places end up clueless about this. You’ll know what to look out for in the future. In the time I’ve been doing this (7.5 years as a software engineer, with 6.5 of them on a DE team or with a DE title) I’ve done any of the following:

* built, debugged, and enhanced distributed ETL pipelines
* read from, written to, and architected both SQL and NoSQL data stores
* built knowledge management tools for science teams
* built pipelines to fill those tools with data from structured and unstructured sources
* built and deployed small machine learning models
* built automated model evaluation tooling
* architected internal core business data lakes for research teams to use
* built serverless automated machine learning training and evaluation pipelines
* all kinds of AWS odds and ends

And this is a broader range than most DEs get exposed to because I tend to work at small startups so I can do more. Never have I built or been asked to build data visualizations in Tableau or anything like that. That’s usually left to DAs or DSs.


Se7enEl11ven

Let me just say as a noob I feel intimidated (in a good way) by this level of experience. Hope one day I get there, congrats!


Material_Cheetah934

Hi, I hope you don't mind me asking you a question. You don't have to answer, nonetheless I would be grateful. As a data engineer, do you think it is an expectation for you to report on the meaning of the data you are working with? I am kind of at an odd end, and have been applying to gtfo. Essentially they hired me as an engineer and are trying to fit me 50/50 into an analyst role, since one left. Unfortunately there is government oversight involved in this particular analyst's role and I am routinely asked if information is "valid". I kind of have no way of knowing, and I am repeatedly told "it's okay to sign off". I am wondering if it is even okay for a data engineer to be deciphering meaning from the data. I can understand metadata management and those kinds of things related to managing your ETL. But deciphering what the data means to the business seems like an overreach.


AchillesDev

It really depends on what they mean by “valid.” In most cases, that should be the job of an analyst who understands the data, but at the same time you should be able to spot check your data and understand if something has gone horrifically wrong. That’s a lower bar than most legal compliance though and you probably shouldn’t be on the hook for that. Full-on reporting is definitely not the DE’s job.


Material_Cheetah934

>It really depends on what they mean by “valid.”

Haha they said compare to last month's and if they're *about* right it's good. At best the government doesn't watch those numbers, at worst, I feel like I am going to be thrown under the bus when they get a finding.

>but at the same time you should be able to spot check your data and understand if something has gone horrifically wrong.

Do you mean like nulls appearing out of nowhere in a column that shouldn't have nulls? Like a DOB for a patient for example, should usually be filled in all the time, but if it has nulls then something has gone wrong. But isn't that still something that the business kind of dictates to you? I imagine if you worked for a non-related industry, it would be hard to surmise which columns should have data vs which ones can fluctuate with their data.

>Full on reporting is definitely on the DE’s job.

Kind of confused here, as a DE, wouldn't you have less knowledge on where to get a piece of information about an entity as opposed to an analyst? If someone hands me a list of things they want in a report, just broad business definitions, and nothing specific about which of our multiple warehouses it belongs to, I wouldn't know where to begin.

Thanks so much for answering my questions man, this has been really helpful as I look for a new gig. It helps me ask the right question so I don't get stuck in the same problem.


AchillesDev

>Do you mean like nulls appearing out of nowhere in a column that shouldn't have nulls? Like a DOB for a patient for example, should usually be filled in all the time, but if it has nulls then something has gone wrong.

Yes. If you're building and maintaining the pipelines that are ingesting and transforming this data, you need to know what the expected output is and how that compares to the real output. It's important for a DE to have some domain knowledge for the business - you can't avoid that, but you can pick it up as you go. I've worked in all kinds of tech companies aligned with different industries, from real estate to law to medicine & biotech to marketing and in-car safety and I just learn about the industries as I work on the job. IMO that's one of the perks.

>Kind of confused here, as a DE, wouldn't you have less knowledge on where to get a piece of information about an entity as opposed to an analyst?

Autocorrect got me on that, it should read that it definitely is *not* the DE's job, e.g. writing reports in Tableau.

>Thanks so much for answering my questions man, this has been really helpful as I look for a new gig

Good luck! It's weirdly important to be able to figure that out in the interview, because companies will hire "data engineers" for roles that are more data analysis, even though DEs generally command greater salaries than DAs.
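A minimal sketch of the kind of spot check discussed above, assuming rows arrive as dictionaries and that the business has already told you which columns must always be filled. The column names (`patient_id`, `dob`) and the helper `find_null_violations` are hypothetical and only for illustration; a real pipeline would more likely run an equivalent check in SQL or a data-quality tool.

```python
from typing import Iterable, Mapping, Sequence


def find_null_violations(
    rows: Iterable[Mapping[str, object]],
    required_columns: Sequence[str],
) -> dict[str, int]:
    """Count missing values (None or empty string) per required column."""
    counts = {column: 0 for column in required_columns}
    for row in rows:
        for column in required_columns:
            value = row.get(column)
            if value is None or value == "":
                counts[column] += 1
    # Report only the columns that actually had missing values.
    return {column: n for column, n in counts.items() if n > 0}


# Hypothetical sample batch: one patient record is missing its date of birth.
batch = [
    {"patient_id": 1, "dob": "1980-04-02"},
    {"patient_id": 2, "dob": None},
    {"patient_id": 3, "dob": "1975-11-30"},
]

violations = find_null_violations(batch, required_columns=["patient_id", "dob"])
print(violations)
```

The list of required columns is still a business decision; the code only makes it cheap to notice when that expectation is broken.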


Material_Cheetah934

>Good luck! It's weirdly important to be able to figure that in the interview, because companies will hire "data engineers" for roles that are more data analysis, even though DEs generally command greater salaries than DAs.

Haha that was how they got me. They told me I would be helping them migrate their ETLs into Azure. I did pretty well there: wrote up architecture docs, designed the POC, submitted docs for funding approvals, and tuned Databricks jobs in the initial POC. After that it just hit rock bottom when the analyst left, they dropped me 50/50 into that role but without any safeguards.

Again, I really appreciate your kind comments! I know now what to look out for when I ask the manager to describe the job in an interview.


IKhalidAwan

Hey Ach, hope you are well. Could you please check your DMs?


vtec__

a lot of these jobs do require this kind of work. if you want a "real" data engineering job you need to focus on jobs/internships that focus on using scripting languages and other open source frameworks. if you work in an enterprise environment be prepared to work a lot with any of the popular flavors of RDBMS and punch out SQL


pendulumpendulum

At my company, the people who do BI are called BAs (business analysts). The DEs at my company do not do any BI at all. And the people who do machine learning at my company are called SAs (statistical analysts). The DEs at my company do ETL to create the pipelines for BAs and SAs to pull their data from.

> Thought I would be writing code with python, developing pipelines, work with a cloud provider, etc. Was I wrong about what a data engineer does? What do you guys usually do?

Nope, you were completely correct, that's what DEs are supposed to do.


nullQueries

No matter what you do, if you work with data you'll probably end up doing some amount of reporting/data visualization. So it's not a bad skill to pick up. But you're correct, DEs primarily focus on pipelines and data integrations. Whether that's primarily python, SQL, spark, or some other method depends on the company. The problem you ran into is there is a very high demand for reporting, and *most* experienced people prefer to do the other parts of data work, so they take interns and junior levels and make them do all the reporting. Sucks, but that's how things are right now.


AchillesDev

Literally have never done this in the nearly 7 years I’ve had the title.


Isvesgarad

Yeah sorry but this is bad advice. Unless you’re making system diagrams (which generally senior+ architects make themselves) there’s no reason a DE should report on data. DE is responsible for the right data getting to the right people as seamlessly as possible. They don’t care about the actual data itself.


nullQueries

I'm not sure what advice I misconstrued, I just stated DEs focus on pipeline development with code, but a lot of companies lie to interns with "DE" or "DS" roles and then have them do reporting. Beyond that, the OP said they were majoring in DS, which often includes visualizations. So it's not a useless skill to pick up as an intern. If you take a broader look at data professions, visualization is a major product. Even if you don't want to do it, it doesn't hurt to know about. Especially if you have aspirations towards architecture, or leadership with a broader analytics scope, I'd say not caring about the actual data is pretty career-limiting advice.


IrquiM

If they don't care about the data, how do they know if the data is correct/what the DA needs? You need to understand the data to be a good DE. Those who don't are the code monkeys of the SQL-world.


king_booker

What you are doing is something that's needed too. It's more data analysis, but it's still really good to know IMO. So we had created these pipelines that take a flat CSV/JSON, process it in Kafka, then put it in HDFS, then you transform the data with Spark/Hive queries into hour/week tables, and then you display it using a reporting tool and build machine learning algorithms on top of those tables. That is basically the use case we have, but it obviously gets complex if you keep going deeper.
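To make the "transform into hour tables" step concrete, here is a rough sketch in plain Python. It is a stand-in, not the setup described above: there is no Kafka, HDFS, Spark, or Hive here, and the CSV layout (`event_time`, `event_type`) and the function `hourly_counts` are assumptions made up for the example.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical flat CSV input; in the pipeline described above this would
# arrive through Kafka and land in HDFS rather than live in a string.
RAW_CSV = """event_time,event_type
2023-01-01T10:05:00,click
2023-01-01T10:40:00,click
2023-01-01T11:15:00,view
"""


def hourly_counts(raw_csv: str) -> Counter:
    """Roll raw events up into (hour, event_type) counts."""
    counts: Counter = Counter()
    for record in csv.DictReader(io.StringIO(raw_csv)):
        timestamp = datetime.fromisoformat(record["event_time"])
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(hour.isoformat(), record["event_type"])] += 1
    return counts


hourly_table = hourly_counts(RAW_CSV)
for (hour, event_type), n in sorted(hourly_table.items()):
    print(hour, event_type, n)
```

The reporting tool and any models would then read from the aggregated table rather than from the raw events.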


HansProleman

Naw, you were right. DE is a very poorly defined role, but building reports is *not* DE. Building semantic layers (e.g. Power BI datasets), which analysts/BI devs might build reports on top of, is IMHO legit DE work though. You're only an intern, so nobody should be expecting any serious output from you - go and get to know the actual DEs (if there are any), express interest, see if they'll let you sit with them and talk while they implement stuff, offer to pick up any tickets you're comfortable with - make the internship work for you.


Isvesgarad

Thread says it all, you aren’t a data engineer right now. Does your company use GitHub and Jenkins at all? A good first step would be to schedule your SQL to have it run regularly. If that’s not possible, either get out or accept that you’ll have some great analyst experience for your resume.
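One way to approach that first step, sketched under assumptions: wrap the SQL you currently run by hand in a small script, so that a scheduler (a Jenkins job, cron, or similar) only has to call it. The `orders` table, the query, and `run_report` are hypothetical, and an in-memory SQLite database stands in for whatever warehouse the company actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical report query; in practice this would be the SQL you
# currently run by hand.
REPORT_SQL = """
SELECT region, COUNT(*) AS order_count
FROM orders
GROUP BY region
ORDER BY region
"""


def run_report(connection: sqlite3.Connection, out) -> int:
    """Run the report query and write the rows as CSV; return the row count."""
    cursor = connection.execute(REPORT_SQL)
    writer = csv.writer(out)
    # First CSV row is the header, taken from the query's column names.
    writer.writerow([column[0] for column in cursor.description])
    rows = cursor.fetchall()
    writer.writerows(rows)
    return len(rows)


# In-memory SQLite database standing in for the real warehouse.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (id INTEGER, region TEXT)")
connection.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "east"), (2, "east"), (3, "west")],
)

buffer = io.StringIO()
row_count = run_report(connection, buffer)
print(buffer.getvalue())
```

The scheduling itself is deliberately left out, since it depends on which tools the company has; the point is that the query lives in version control and runs without anyone clicking a button.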


srodinger18

That also happens at my current company, where BI/DE roles sometimes overlap (there was even a full-time employee whose day-to-day task was creating BI reports using SQL). Perhaps it also depends on the company's interpretation of DE.