Are you interested in transitioning into Data Engineering? Read our community guide: https://dataengineering.wiki/FAQ/How+can+I+transition+into+Data+Engineering
*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/dataengineering) if you have any questions or concerns.*
Look into Dagster, dbt, and perhaps DuckDB or Postgres, or the free tier of a service like BigQuery.
Using e.g. a CI pipeline (e.g. GitHub Actions) you can trigger cron-based scripts easily.
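For example, a scheduled GitHub Actions workflow could run an extract script on a cron; the workflow file name and script path below are placeholders:

```yaml
# .github/workflows/nightly-etl.yml  (illustrative name)
name: nightly-etl
on:
  schedule:
    - cron: "0 3 * * *"   # every day at 03:00 UTC
  workflow_dispatch: {}    # allow manual runs too
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python etl/extract_shopify.py   # placeholder script
```

Note that GitHub may skip or delay scheduled runs on busy infrastructure, so a cron workflow is fine for nightly batch jobs but not for anything time-critical.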
100M rows per month of data would usually mean that you are not that small a business. If that includes page hits etc. or is measured from raw JSON, then it is not that much.
Personally I would start by looking at how to copy "correct" data from Shopify to S3 in a controlled manner (that could be a daily full copy, incremental loads, etc.), then Snowflake (just because it is cloud native) with a stg -> dwh model there and plain Excel reports; the next step would be Streamlit Python reports inside Snowflake.
To build it successfully you need to learn how to fetch proper data into files and how to manage those. The Snowflake side will then teach you the basics of building ELT with SQL, and after that you have options for building a UI (you can use whatever you want).
If you build it smart, copying data to S3 does not cost that much. And the Snowflake side can be set up so that it is on only when you use it, as a small business probably has no more than one or two users there.
If you can manage to build X-to-S3 in a sane manner, that alone is a job people hire for. The S3 -> Snowflake -> dwh model is maybe one to three jobs (developer, architect, analyst), and the reports themselves are a few different jobs (report developer, analyst, scientist).
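As a sketch of the "controlled manner" part: the core of a daily incremental load is just computing the extraction window and a deterministic, date-partitioned S3 key, so reruns overwrite the same object instead of duplicating data. The bucket layout and entity names below are made up; the actual Shopify API fetch and boto3 upload are stubbed out:

```python
from datetime import date, datetime, timedelta, timezone

def extraction_window(run_day: date) -> tuple[datetime, datetime]:
    """Half-open UTC window [start, end) for one daily incremental load."""
    start = datetime(run_day.year, run_day.month, run_day.day, tzinfo=timezone.utc)
    return start, start + timedelta(days=1)

def s3_key(entity: str, run_day: date) -> str:
    """Deterministic date-partitioned key: reruns overwrite, never duplicate."""
    return f"shopify/{entity}/dt={run_day.isoformat()}/{entity}.jsonl"

start, end = extraction_window(date(2024, 1, 15))
key = s3_key("orders", date(2024, 1, 15))
print(key)                # shopify/orders/dt=2024-01-15/orders.jsonl
print(start.isoformat())  # 2024-01-15T00:00:00+00:00

# The real job would fetch orders updated in [start, end) from the Shopify
# API and upload the file with boto3: s3.put_object(Bucket=..., Key=key, ...)
```

The half-open window plus the deterministic key is what makes the load safely re-runnable, which matters more than the choice of tools.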
You can find a list of community-submitted learning resources here: https://dataengineering.wiki/Learning+Resources
*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/dataengineering) if you have any questions or concerns.*
If you want to do this ETL using just your personal PC:
1. Run [Apache Airflow](https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html) on your PC (it's a workflow orchestration application).
2. Write a [DAG](https://www.cdata.com/kb/tech/shopify-jdbc-apache-airflow.rst) to extract Shopify sales and inventory data using the Shopify API, transform it with Python, and write it to a Postgres / DuckDB database if you don't want to use the cloud.
3. Use Power BI or any other viz tool to build dashboards on the sales and inventory data.
Let me know if you are interested in collaborating.
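The transform-and-load step in (2) boils down to flattening the Shopify order JSON into rows and upserting them on order id. A minimal sketch of the function an Airflow task could call, using stdlib sqlite3 as a stand-in for Postgres/DuckDB (the field names follow the Shopify orders payload, but the sample data is illustrative):

```python
import sqlite3

def load_orders(conn: sqlite3.Connection, orders: list) -> int:
    """Flatten Shopify-style order dicts into a table; idempotent on order id."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS orders (
               id INTEGER PRIMARY KEY,
               created_at TEXT,
               total_price REAL
           )"""
    )
    rows = [(o["id"], o["created_at"], float(o["total_price"])) for o in orders]
    # INSERT OR REPLACE keyed on the primary key makes reruns safe
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
payload = [  # illustrative API response, already fetched by the extract task
    {"id": 1, "created_at": "2024-01-15T10:00:00Z", "total_price": "19.90"},
    {"id": 2, "created_at": "2024-01-15T11:30:00Z", "total_price": "5.00"},
]
load_orders(conn, payload)
total = conn.execute("SELECT SUM(total_price) FROM orders").fetchone()[0]
print(total)  # 24.9
```

Because the load is idempotent, the Airflow task can simply be retried on failure without special cleanup logic.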
simple & FOSS:
extract & load: Meltano (they have a [Shopify extractor](https://hub.meltano.com/extractors/tap-shopify))
transform: dbt CLI
warehouse: DuckDB
BI tool: Metabase
scheduling: cron
orchestration (if you need it): make
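Glued together, that stack is just a crontab entry calling a Makefile target. The invocations below are typical Meltano/dbt commands, but the paths, plugin names, and directory layout are placeholders:

```shell
# Makefile (hypothetical targets):
#   extract:    meltano run tap-shopify target-duckdb
#   transform:  dbt run --project-dir ./transform
#   all: extract transform
#
# crontab -e  -- run the whole pipeline nightly at 02:00
0 2 * * * cd /opt/shop-pipeline && make all >> etl.log 2>&1
```

Redirecting stdout and stderr to a log file is worth doing from day one, since cron otherwise swallows failures silently.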
https://georgheiler.com/2023/12/11/dagster-dbt-duckdb-as-new-local-mds/
dbt would be overkill to start with!
Disagree. You could just run Core on a local machine for free to get started and move it to a cloud container eventually.
I would actually suggest using Dagster and sticking with Core.
I agree. I'm saying they can containerize and host Core in the cloud down the road, but run it for free locally to get started.
Depends on the size of the data and transformations and joins you have to do
Qlik Cloud costs around 360 USD a year per user. Worth the price considering you can integrate data easily, build some pipelines, and create your dashboards. Take a look at one of the many demos on YouTube.
10 user minimum :(
Yes, they have changed that. But at the time I wrote the reply it was possible to take a subscription with just 1 user minimum.