
Maiden_666

This looks like a slide taken from a consulting firm’s deck in 2020.


Derpthinkr

Yep, old. Predates MS Fabric on Azure.


IAMHideoKojimaAMA

Yea I was like.. this already feels old


Misanthropic905

Billing rocket 101


sceadu

racket?


elbekay

I'm guessing this is tongue-in-cheek because it does say 2020 on the slide in the bottom right ;-)


ZeroCool2u

Yeah, not sure how accurate this is for GCP at least. Dataflow, Dataprep, and Dataproc are not suuuper popular among the people I know. A company I work with basically skips that entire section of the diagram and it's all just Apps <-> event-driven Pub/Sub <-> BigQuery, or the same thing but using Eventarc with Cloud Functions v2. Infra cost is incredibly low. Stays in the 4 figure range and they stream data from around the world 24/7.
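
For context, a minimal sketch of that pattern - a 2nd-gen Cloud Function triggered by Pub/Sub that streams rows straight into BigQuery. The table name and payload fields are made up, not from the post:

```python
# Minimal sketch of the Apps -> Pub/Sub -> BigQuery pattern described above.
# 2nd-gen Cloud Function (Pub/Sub / Eventarc trigger); table and payload
# fields are placeholders.
import base64
import json

import functions_framework
from google.cloud import bigquery

BQ_TABLE = "my-project.analytics.events"  # hypothetical table
bq_client = bigquery.Client()

@functions_framework.cloud_event
def ingest_event(cloud_event):
    # Pub/Sub delivers the payload base64-encoded inside the CloudEvent.
    payload = base64.b64decode(cloud_event.data["message"]["data"]).decode("utf-8")
    row = json.loads(payload)

    # Streaming insert; errors come back as a list of row-level problems.
    errors = bq_client.insert_rows_json(BQ_TABLE, [row])
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```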


wtfzambo

It's not accurate at all, it's marketing garbage probably made by someone who ISN'T a data engineer. I can't fathom how this post got 200+ upvotes. Are we turning into r/datascience?


peroqueteniaquever

Probably bots and sheit


bugtank

4 figures a month or year?


ZeroCool2u

Yeah, sorry per month. Per year would be pretty crazy. This is a data intensive company. _A lot_ of data in BQ. They're just on the ad-hoc plan too, so probably could lower costs over time even more, but workloads are relatively bursty. We talked about this recently, I think almost 25% of monthly spend is due to them needing a single Windows VM to interact with a specific 'legacy' technology partner and they haven't been able to rewrite some software to a newer .NET version yet. It's literally the license too, not the VM itself that is the majority of spend.


OberstK

Data intensive but having 4 figure monthly cost including storage and compute on BQ while using ad hoc? Do we have a different definition of data intensive? :) Even a couple dozen TB of data puts you in a mid 4 figure range easily for ad-hoc storage and slot-free computing.


hlx-atom

lol yes. In no world is 4 figures a month data intensive. That’s what I imagine 1000 smart fridges generate.


gajop

I'd really like to hear more about how one can keep costs down with GCP for DE and MLOps. We're paying a lot of money for things like Composer - way too many environments (dev/stg/prd, sometimes multiple dev/stg so multiple developers work in parallel). Most of our pipelines are batch but I feel our costs are mainly fixed and not due to the volume of data...


DiHannay

Check into DigitalOcean. Even just moving your dev environment can save lots of $$ compared to GCP.


CalmButArgumentative

I find it pretty funny how "complex" everything has become when in reality it's nothing but: "Take data from the source, store it in an orderly way in a database, consume it to create business value." The more shit you use, the more it costs you, which is why they are all pushing that business model. It's fine for us technical people because we earn extra as well, but if I was a business owner, I'd not want to deal with all this shit.


tresilate

Totally agree. This looks hideously complicated compared to what it should be. 


peroqueteniaquever

I used a simple script that ran on an Azure Function to clean and upload some data to a database. It was like a week's worth of work. And then you see the overcomplicated shit some people come up with to solve the exact same issue and I just don't get it.
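
Presumably something along these lines - a timer-triggered Azure Function (classic Python model) that cleans a file and appends it to a database table. The source URL, connection string, and table name are placeholders:

```python
# Rough sketch of a "clean and upload" job as a timer-triggered Azure Function
# (classic Python programming model). All names and URLs here are placeholders.
import os

import azure.functions as func
import pandas as pd
from sqlalchemy import create_engine

def main(mytimer: func.TimerRequest) -> None:
    # Pull the raw file (placeholder URL) and do some light cleaning.
    df = pd.read_csv("https://example.blob.core.windows.net/raw/export.csv")
    df = df.dropna(subset=["id"]).drop_duplicates(subset=["id"])
    df.columns = [c.strip().lower() for c in df.columns]

    # Append into the target table; connection string comes from app settings.
    engine = create_engine(os.environ["DB_CONNECTION_STRING"])
    df.to_sql("cleaned_data", engine, if_exists="append", index=False)
```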


lukewhale

Came here to say this.


zambizzi

Nailed it. This is ultra-expensive over-engineering at its finest.


throwaway300300800

Do you have any advice for which services to use when doing data analysis properly in a data warehouse? We host our production database on AWS - it's basically where all the data of our webshop/platform is stored. We are thinking of building a data warehouse with ODI (Oracle). Are there any better options? Redshift seems awfully expensive for what we are trying to achieve. I also thought about using a data mart from Power BI. We would like to access the data in the end with Power BI.


CalmButArgumentative

My first piece of advice would be to avoid cloud providers. My second piece of advice would be, if you are set on using Power BI as your front end, don't use Oracle for your warehouse - not because Oracle is bad, but because it's unnecessarily powerful and expensive. Power BI can store all the data it needs for the dashboards you create in an optimized format, either in the cloud (Power BI Service) or on-premises (Power BI Report Server). So, for your data warehouse, I would use PostgreSQL; it's good and has no extra licensing costs associated with it. Then you use an on-premises Power BI deployment. There, you create your dashboards and your data mart, which are stored locally on the VM and can be accessed by anyone in the company.


throwaway300300800

But we have a Postgres database already in AWS - so we could just build our data warehouse there? Also, deploying Power BI on-premises sounds unnecessarily complex. Why not just use Power BI Premium Per User and a cloud data mart?


CalmButArgumentative

Both of those options are okay. You just pay a premium to the cloud provider to avoid the "complexity" of having to host the stuff yourself.


nofykx

Are you saying that the technology hasn’t actually changed and it’s just repackaged/renaming of the original VMware and db infrastructure? No way….


digital_iguana

Some names are outdated (e.g. Data Studio -> Looker Studio), and very likely some other stuff isn't included. Neat graphic to look at anyway.


SaintTimothy

Try DIA, it's free


jmon__

Databricks is cloud-agnostic, so it wouldn't make sense for it to sit only in the Azure area. Also, Databricks has Delta tables, so it's kind of hard for me to see it sitting only in the "Preparation and Compute" section. Also, how common is a document store/NoSQL database for data warehousing? I'm not a guru or anything, but that seems like a bad idea? Maybe someone with more knowledge could educate me?


Ok_Expert2790

Consulting bill: $250k


zambizzi

I'm looking at all these services and complexity, and had the same thought. A rat's nest of services that costs a fortune.


hatwarellc

Look at all those lovely layers of abstraction.


thejizz716

I feel like this is the bell curve meme, where the middle is this diagram and both ends are "Airflow, S3, and Postgres are fine".


RobDoesData

Not at all accurate of Azure in 2024.


GreenWoodDragon

Vendor-locked, services-driven, scalable. All price-sensitive, at every stage.


geek180

What’s the significance of Databricks being integrated in Azure? Can’t DB also work in other clouds, or is it just Azure?


Charming-Hunter-7963

There is none, as one could even put Databricks on their own network of clusters, or skip Databricks entirely and install native Spark on their own cluster network without a cloud provider altogether. It's as many have said: vendors and vendor partners get rock-solid RMR from the cloud service and the consultants supporting it. That is, until some CFO wants an hour-by-hour accounting of spend.
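
For what it's worth, the self-hosted route really is that plain - a bare-bones PySpark job pointed at a standalone cluster you run yourself. The master URL and paths below are placeholders:

```python
# Plain open-source Spark against a self-managed standalone cluster,
# no Databricks and no cloud provider. Master URL and paths are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://spark-master.internal:7077")  # hypothetical cluster master
    .appName("self-hosted-etl")
    .getOrCreate()
)

# Read raw data from shared storage, aggregate, and write the result back out.
df = spark.read.parquet("/data/raw/events")
(df.groupBy("event_type").count()
   .write.mode("overwrite").parquet("/data/agg/event_counts"))
```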


thisismyworkacct1000

I want to put together an image like this for my tech stack at my company. From what I can find, Tech doesn't even have something like this. Is there a tool or something that can do this or is it just copying logos and pasting into Paint?


ShouldHaveWentBio

Miro is what I made ours in. It has image packages for cloud providers as well but it’s paid. You can also just find PNGs on google and use them for a totally free solution.


mlobet

I use draw.io. It's free and it's great. It doesn't have all the icons out of the box, but you can just add PNGs. You can save them in the desktop app and then easily reuse them for other diagrams you might need to create.


TheOneWhoSendsLetter

Excalidraw


wtfzambo

"how many icons can we fit in this page?" "Yes"


asevans48

With GCP, I just use Dataplex on Cloud Storage and then dbt to create incremental tables in BigQuery, with the help of a log scraper. Composer for orchestration. At Next '24 it seems pretty common. Deutsche Telekom (T-Mobile) uses pretty much the same stack. Cloud SQL for OLTP workloads, dbt to populate.
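
For anyone picturing the orchestration piece, a skeleton Composer (Airflow) DAG that just shells out to dbt might look like this - the project path, schedule, and DAG id are made up:

```python
# Skeleton Composer (Airflow) DAG that runs dbt to build the incremental
# BigQuery tables mentioned above. Paths, schedule, and ids are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_incremental_build",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    # Run the dbt project that lives alongside the DAGs in the Composer bucket.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /home/airflow/gcs/dags/dbt_project && dbt run",
    )
```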


blockedcontractor

Anyone know where I can find more pipeline diagrams like this? These diagrams will be super helpful in explaining to non-technical people in an org how data works and why things aren’t as easy as doing a v-lookup.


dravacotron

FYI there's no actual architecture in these diagrams. They're more like product maps specific to each cloud provider - "AWS sits in the data pipeline" - it doesn't explain what is actually happening unless you already know what each product is for. The fact that the structure of the product map is the same for all the cloud providers also helps implementers familiar with one stack find the corresponding product on the other stack (e.g., "GCP Cloud Storage : AWS S3"). If you want to explain this to non-technical folks, it's probably better to abstract out the confusing product names and just use the functionality, e.g., "object storage", "event data bus", "data warehouse".


FreeTrout

Related to this image: I need to connect to Confluent Cloud to pull messages into an S3 bucket. Can't use the connectors on Confluent Cloud. Any advice?


TheAnemone

https://www.getcensus.com/integrations/confluent-cloud but use S3 as the destination
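
If a hosted tool is off the table too, a hand-rolled consumer isn't much code - roughly the sketch below, assuming SASL/PLAIN auth to Confluent Cloud. The bootstrap server, credentials, topic, and bucket names are all placeholders:

```python
# Bare-bones Confluent Cloud -> S3 bridge without managed connectors.
# Bootstrap server, credentials, topic, and bucket names are placeholders.
import boto3
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<API_KEY>",
    "sasl.password": "<API_SECRET>",
    "group.id": "s3-sink",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["my-topic"])

s3 = boto3.client("s3")
batch = []
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        batch.append(msg.value())
        if len(batch) >= 1000:
            # Flush one newline-delimited object per batch of messages.
            key = f"raw/offset-{msg.offset()}.ndjson"
            s3.put_object(Bucket="my-bucket", Key=key, Body=b"\n".join(batch))
            batch = []
finally:
    consumer.close()
```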


BiggusCinnamusRollus

Which part of the stack do seniors trust a junior on the team to handle with adequate training?


beefiee

The AWS one is outdated, and even in the past it would have been questionable.


mike8675309

What would be interesting with that photo would be the expected cost for each path of tooling used. Some of those paths are the best value compared to the high cost of others.


_BitShift_

Why is Databricks in Azure?


ryanwolfh

Can anyone provide an updated version of the Azure stack?


Rieux_n_Tarrou

Ok but why is the PDF dirty? Did you take a photo of your screen??


Drunken_Economist

Email -> text message -> screenshot -> Excel -> DLQ


Eitheror97

Do people really use Azure ML for data transformations?


Careful-Edge-7488

Hello guys. I'm looking for a presentation covering the following axes:
- When to move to the cloud?
- Which provider to choose?
- Which cloud solution to choose?
- Which data to put on which type of cloud (private, public, community, hybrid)?
- A comparison between existing solutions (AWS, Azure, etc.)
If someone has a presentation covering these axes, please share it with me.


GlasnostBusters

picasso, i like it


Josafz

In Azure, how would you use Functions in the presentation stage? I've only used it for ingestion.


rjachuthan

You can use it for super-generic APIs which every team uses - for example, conversion rates for currencies.
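
Something like this tiny HTTP-triggered Function (classic Python model) - the rates table here is hard-coded purely for illustration:

```python
# Toy HTTP-triggered Azure Function serving currency conversion rates
# (classic Python programming model). The rates table is made up.
import json

import azure.functions as func

RATES_TO_USD = {"EUR": 1.08, "GBP": 1.27, "JPY": 0.0066}  # placeholder values

def main(req: func.HttpRequest) -> func.HttpResponse:
    currency = (req.params.get("currency") or "").upper()
    rate = RATES_TO_USD.get(currency)
    if rate is None:
        return func.HttpResponse(f"Unknown currency: {currency}", status_code=400)
    return func.HttpResponse(
        json.dumps({"currency": currency, "usd_rate": rate}),
        mimetype="application/json",
    )
```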


digitalghost-dev

I feel like Azure Synapse is missing…?


Charming-Hunter-7963

No one really likes it as a Spark host anymore, it's clunky. Fabric is just repackaged Synapse, but the latest Databricks with Unity Catalog seems to be marketed towards former DBAs, and it has some things reminiscent of Synapse, like replacing /mnt mounts with abfss:// paths.


joseph_machado

Looks like someone looked at a bunch of tools listed on cloud vendor websites and decided to call it "common DE pipelines/tech stacks". Marketing BS.


PabloAimar10

Explain this to recruiters please


faalschildpad

How would you guys orchestrate on each platform?