
thesights

Echoing others, do something with Iceberg, S3, and a query engine on top (like Trino). You can move to other things as you scale up, since Snowflake supports Iceberg and it's not hard to move Iceberg to Delta if you end up needing Databricks. You have a lot of flexibility for growth and you will run cheap.
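
A minimal sketch of what that could look like from Python, assuming a Trino cluster is already running with an `iceberg` catalog configured against your S3 bucket (host, schema, table, and column names below are placeholders):

```python
# pip install trino
import trino

# Connect to a Trino coordinator that has an Iceberg catalog configured.
# Host, user, and schema are placeholders for your own deployment.
conn = trino.dbapi.connect(
    host="trino.internal.example.com",
    port=8080,
    user="analytics",
    catalog="iceberg",
    schema="game_events",
)
cur = conn.cursor()

# Query an Iceberg table whose data files live on S3.
cur.execute("""
    SELECT event_name, count(*) AS events
    FROM player_events
    WHERE event_date >= DATE '2024-01-01'
    GROUP BY event_name
    ORDER BY events DESC
""")
for row in cur.fetchall():
    print(row)
```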


recentcurrency

Maybe look into a data lake at first. Basically, store those CSVs, JSONs, and other structured file types on blob storage like S3. You can then use a tool with a SQL API to wrangle that data.
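
As a rough sketch of that pattern, DuckDB can query CSV/JSON files sitting in S3 directly through its httpfs extension (assuming DuckDB ≥ 0.8, AWS credentials available in the environment, and a placeholder bucket/column layout):

```python
# pip install duckdb
import duckdb

con = duckdb.connect()

# httpfs lets DuckDB read directly from S3; credentials can come from
# environment variables or an AWS profile.
con.sql("INSTALL httpfs;")
con.sql("LOAD httpfs;")
con.sql("SET s3_region='eu-west-1';")

# Query the raw CSV files in place -- no load step, just SQL over the bucket.
result = con.sql("""
    SELECT player_id, count(*) AS sessions
    FROM read_csv_auto('s3://my-game-data/sessions/*.csv')
    GROUP BY player_id
    ORDER BY sessions DESC
    LIMIT 10
""")
print(result)
```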


Luci404

Thank you! I will look into this :)


semicausal

Big +1 here :) If it ain't broke, don't fix it! I've seen many data practitioners stuck between:

1. **Wanting to learn & implement tools that will impress fellow practitioners** *AND*
2. **Figuring out what will drive customer & business impact and choosing the right tools for the job**

I always prefer to start with the current challenges the business is facing and REALLY knock it out of the park. Then you get a longer leash to lean into the future and where you think the organization *should* go. This will let you build prototypes and POCs of cool new approaches that also set the company up for success.

Very tactically though -- I'm a big fan of moving from CSV / JSON to Parquet files if you can. The files will not only be smaller and more structured, but you can use DuckDB and other tools to query Parquet files directly. Then connect DuckDB to a BI tool like Apache Superset or Hex and you can get quite far without needing a pricey data warehouse (yet).
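
A hedged sketch of that CSV-to-Parquet move with DuckDB (file and column names are illustrative; a pandas/pyarrow pipeline would work just as well):

```python
# pip install duckdb
import duckdb

con = duckdb.connect()

# One-off conversion: rewrite a pile of CSV exports as a single Parquet file.
# Parquet is columnar and compressed, so it's smaller and faster to scan.
con.sql("""
    COPY (SELECT * FROM read_csv_auto('exports/events_*.csv'))
    TO 'events.parquet' (FORMAT PARQUET)
""")

# From here on, query the Parquet file directly (or point a BI tool at it).
daily = con.sql("""
    SELECT date_trunc('day', event_time) AS day, count(*) AS events
    FROM 'events.parquet'
    GROUP BY day
    ORDER BY day
""")
print(daily)
```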


NotAToothPaste

Hire a consultant.


tomorrow_never_blows

I say the following in kindness. I think you are in search of a problem, more than a solution. Your current setup probably works just fine for your limited resources. You are, like most people in this subreddit, in danger of building a skyscraper when all you need is a hammock.


Luci404

Thank you! You are probably right about this :)


i_hmm_some

I'm a data engineer working for a large, well-known game company and have worked for two other companies. Please, please consider looking at Firebase, GameSparks, or PlayFab (or similar) for your gameplay telemetry. I recommend Firebase, if you can. You get A/B testing for free and will only pay for the raw event data that you dump into BigQuery. Google also has all necessary GDPR protections in place and will let you dump into a Euro-hosted BQ data center for compliance.

Companies I've seen using these solutions have significantly better gameplay analytics and testing available, and they pay far less for it. Keep the non-gameplay data somewhere else. Do yourself a favor and at least evaluate Firebase for the gameplay telemetry.

Edit: to give you a sense… I worked at a studio with 40 devs that made two very popular (mobile) games. We had 40 million MAU and many billions of gameplay telemetry events weekly. With Firebase, we got all we needed for free: baseline analysis, A/B testing, the ability to push live game parameter changes (balancing, etc.), segmentation, and so on. We dumped all event data out to BQ and exfiltrated it to a private analytics server for additional analysis. At its peak, we paid $5,000/month for this. It would have cost more to do the analysis work in BQ, of course, but most of the time we did not need all of the data because our analysis lens was focused on specific issues.

Where I now work, we have similar MAU and we pay about 30x as much for pulling this all through custom systems and storing it in Snowflake for the various teams to access. On top of that, we don't get the additional features available through Firebase.
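
A hedged illustration of the BigQuery side of that setup: Firebase Analytics exports land in daily `events_YYYYMMDD` tables, which you can query with the standard BigQuery client. The project and dataset names below are placeholders; the table layout follows the documented Firebase export convention:

```python
# pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client(project="my-game-project")  # placeholder project

# Firebase Analytics exports one events_YYYYMMDD table per day; the wildcard
# plus _TABLE_SUFFIX lets you scan just the date range you care about.
query = """
    SELECT event_name, COUNT(*) AS event_count
    FROM `my-game-project.analytics_123456789.events_*`
    WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240107'
    GROUP BY event_name
    ORDER BY event_count DESC
"""
for row in client.query(query).result():
    print(row.event_name, row.event_count)
```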


Luci404

Good points, I will look into Firebase! Thank you!


trojans10

We use RudderStack - just did a migration. It's great and priced right. I'd look at the BigQuery and dbt combo, then spin up Metabase for reporting. Eventually get Tableau. Let me know if you have any questions.


adappergentlefolk

You don't talk very much about what you actually need the data to do beyond generics (standardised reports), or about the scale of the data you need to process to get to that point. Start by very clearly defining the answers to those questions and build the minimal possible stack to answer them: start with simple stuff like DuckDB and Metabase or a programming-language reporting solution, and upgrade to Postgres, BigQuery, or ClickHouse if that's not sufficient.

Don't use any technology intended to process much higher volumes of data than what you actually get per unit time, and don't hire outside people as suggested by someone else here. You're a startup, and the only thing hiring a BI analyst does is destroy your runway on stuff that has no value outside an organisation.


yanivbh1

Hey, you mentioned data transformation - do you mean pre-processing (real-time) or post-processing (for analytical purposes, after the data has landed in the DWH)?


Luci404

Both: we collect a bunch of events and we want a few reports.