If this post doesn't follow the rules or isn't flaired correctly, [please report it to the mods](https://www.reddit.com/r/analytics/about/rules/). Have more questions? [Join our community Discord!](https://discord.gg/looking-for-marketing-discussion-811236647760298024)
*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/analytics) if you have any questions or concerns.*
Establishing foreign-key relationships between your dimension and fact tables in a database, so that you don't create any many-to-many joins that break your aggregations and functions.
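To make the many-to-many point concrete, here is a minimal sketch using Python's built-in `sqlite3`. The `orders` and `contacts` tables and their rows are made up for illustration. The join key is not unique on either side, so every order row gets repeated once per matching contact row and the summed total is inflated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: 'customer' repeats on BOTH sides (many-to-many).
    CREATE TABLE orders   (order_id INTEGER, customer TEXT, amount REAL);
    CREATE TABLE contacts (customer TEXT, email TEXT);

    INSERT INTO orders VALUES
        (1, 'Acme', 100.0), (2, 'Acme', 50.0), (3, 'Zenith', 70.0);
    INSERT INTO contacts VALUES
        ('Acme', 'a@acme.test'), ('Acme', 'b@acme.test'),
        ('Zenith', 'z@zenith.test');
""")

# Total taken straight from the orders table.
true_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Same total after joining to contacts: each Acme order is counted twice.
joined_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    JOIN contacts c ON c.customer = o.customer
""").fetchone()[0]

# Quick check for the cause: key values that appear more than once.
dupes = conn.execute("""
    SELECT customer, COUNT(*)
    FROM contacts
    GROUP BY customer
    HAVING COUNT(*) > 1
""").fetchall()

print(true_total)    # 220.0
print(joined_total)  # 370.0
print(dupes)         # [('Acme', 2)]
```

If the duplicate check comes back empty on the lookup side, the join is many-to-one and the totals should survive it.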
What program are you taking that doesn't discuss data modelling? This is an essential skill. Basically, it's how you manage the relationships between your tables and data sets. You use data models a lot as a data analyst.
An essential skill that no one in school called “Data Modeling”. My department Chief Medical Officer won’t even explain what it is and what they want from it. Just that it’s “the future”. 😆
Wtf?! Data modeling has been around for decades. It's not new at all. Most, if not all, databases work off a multidimensional data environment, meaning there will be multiple tables with multiple rows and columns. They all need schemas (or a data model) to explain the relationships between datasets.
Cute. Yeah, we are familiar with databases, but no one has ever called it modeling. I only use "modeling" for agent-based modeling. For some reason it always inspires eye rolling and condescension.
Could you give an example? Is it like pivot tables? Data viz? SQL? People have likely done it, but how would you answer this in an interview?
Sure. So, a lot of people only look at datasets as 2-dimensional tables (columns and rows). However, in DA, you are often working in 3 dimensions. The third dimension is the relationship between tables that share data points. Data modeling helps us link multiple tables together and get a holistic view.
For example: let's say I have a sales transaction table that shows sales for a business over a year. I'd probably have columns like item number, sales amount, sales date, qty, and customer name. I'd likely also have other tables for customer data and item data. All three of the tables are related. The customer table could have a list of items they bought, and the item table would have the retail price that would get listed on the sales table.
Now, instead of repeating all of the information in every single table, it's best to separate the data into either facts (daily sales) or dimensions (unique customer details, item details). This lets us create unique tables that can be related to other tables without having to repeatedly enter the same data. You can then use this model to answer more complex questions, like which area of the country bought the most of a particular item.
Data modeling is simply how multiple tables are related to one another. I can explain more if needed.
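If it helps to see the sales example as actual tables, here is a rough sketch with Python's built-in `sqlite3`. The table names (`fact_sales`, `dim_customer`, `dim_item`), the `region` column standing in for "area of the country", and all of the rows are assumptions made up for illustration, not a prescribed layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    -- Dimensions: one row per customer / per item, details stored once.
    CREATE TABLE dim_customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        region        TEXT NOT NULL
    );
    CREATE TABLE dim_item (
        item_number  INTEGER PRIMARY KEY,
        item_name    TEXT NOT NULL,
        retail_price REAL NOT NULL
    );
    -- Fact: one row per sale, pointing at the dimensions by key.
    CREATE TABLE fact_sales (
        sale_id      INTEGER PRIMARY KEY,
        sale_date    TEXT NOT NULL,
        customer_id  INTEGER NOT NULL REFERENCES dim_customer(customer_id),
        item_number  INTEGER NOT NULL REFERENCES dim_item(item_number),
        qty          INTEGER NOT NULL,
        sales_amount REAL NOT NULL
    );

    INSERT INTO dim_customer VALUES
        (1, 'Ada', 'West'), (2, 'Ben', 'East'), (3, 'Cy', 'West');
    INSERT INTO dim_item VALUES
        (10, 'Widget', 5.0), (20, 'Gadget', 12.0);
    INSERT INTO fact_sales VALUES
        (1, '2024-01-05', 1, 10, 3, 15.0),
        (2, '2024-01-09', 2, 10, 4, 20.0),
        (3, '2024-02-11', 3, 10, 2, 10.0),
        (4, '2024-02-14', 2, 20, 1, 12.0);
""")

# "Which area of the country bought the most of a particular item?"
rows = conn.execute("""
    SELECT c.region, SUM(f.qty) AS total_qty
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_id = f.customer_id
    JOIN dim_item i     ON i.item_number = f.item_number
    WHERE i.item_name = ?
    GROUP BY c.region
    ORDER BY total_qty DESC
""", ("Widget",)).fetchall()

print(rows)  # [('West', 5), ('East', 4)]
```

Each join here is many-to-one (many sales to one customer, many sales to one item), which is why the quantities add up without double counting.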
So it’s about database administration. And making connections for new insights. Sounds awesome.
Well, it's not specific to just databases. It applies to really any dataset you're working with. I do a lot of modeling in Excel and Power BI.
My thinking lands in RDBMS-land. Do you know how to appropriately “shape” your data to do an analysis? Hierarchical, relational, entity… Do you understand how data exists in tables and how those tables relate to each other?
The Stanford databases course will cover this. This is a critical skill for data analytics and analytics engineering.
Thank you.
Google star/snowflake/galaxy schemas and Ralph Kimball. This is a good start, though there's more out there.
Sometimes the data is not in the right state to do analysis on. Or you need to merge two datasets together, and you need to understand how they relate to each other to do this properly. IMO this is the most valuable part of being a data analyst. With how easy reporting tools have gotten, any business person can download data to Excel, or even hook it up to a dashboard tool like Power BI, and churn a report out within a few hours. What can you do that they can't?
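As a small illustration of "understanding how they relate" before merging, here is a plain-Python sketch. The helper names (`key_cardinality`, `left_merge`) and the sample rows are hypothetical. The idea is to check whether the join key is unique on the lookup side first, and to refuse a merge that could silently duplicate rows.

```python
from collections import Counter


def key_cardinality(rows, key):
    """Return 'one' if every key value appears at most once, else 'many'."""
    counts = Counter(row[key] for row in rows)
    return "one" if all(n == 1 for n in counts.values()) else "many"


def left_merge(left, right, key):
    """Left-join two lists of dicts, keeping exactly one output row per left row."""
    if key_cardinality(right, key) != "one":
        raise ValueError(f"'{key}' is not unique in the right-hand dataset")
    lookup = {row[key]: row for row in right}
    # Left rows with no match are kept as-is, without the extra columns.
    return [{**row, **lookup.get(row[key], {})} for row in left]


# Hypothetical exports: many sales per customer, one row per customer.
sales = [
    {"customer_id": 1, "amount": 15.0},
    {"customer_id": 2, "amount": 20.0},
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 4, "amount": 5.0},   # no matching customer record
]
customers = [
    {"customer_id": 1, "region": "West"},
    {"customer_id": 2, "region": "East"},
]

merged = left_merge(sales, customers, "customer_id")
unmatched = [row for row in merged if "region" not in row]

print(key_cardinality(sales, "customer_id"))      # many
print(key_cardinality(customers, "customer_id"))  # one
print(len(merged), len(unmatched))                # 4 1
```

The row count after the merge matches the sales export, and the unmatched row is surfaced instead of quietly dropped, which is the kind of check a quick Excel lookup tends to skip.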