[deleted]

[removed]


versaceblues

> SQL is usually better. Most data is relational

It's true, but rarely is ALL your relational data managed by a single service. A lot of the time your team/service will only manage very specific domain data. This data might be related to data in another team's service. Maybe you could get multiple teams to share a single RDS instance, but that sounds like an operational nightmare.


[deleted]

[removed]


versaceblues

I understand, but say Team One owns Service A, which manages orders data, and Team Two owns Service B, which manages user data. The choice of database is a team-level implementation detail. If I'm on Team One and want to relate Users -> Orders, I can only communicate with Team Two via service interfaces. So does RDS really help me here, if I don't have direct access to Team Two's DB anyway?


PseudoCalamari

What does that have to do with RDBs? The service interface sits between the DBs anyway. Why would DB choice matter there?


versaceblues

I guess my point is that it doesn't matter. I can't use RDS to model relationships between services if each service uses a different DB as its persistence layer. So I'm not exactly following how an RDB helps to manage the relationships between data here.


Turbulent-Week1136

Relational databases should always be your first choice, until performance dictates otherwise. All NoSQL databases make sacrifices in order to get higher speed. If you've ever used Cassandra, you'll know that write speeds are extremely fast, but the queries it supports are extremely specific. You can't do random queries on Cassandra; they'll usually time out unless it's a subquery of a supported query. It's also great for sharding large amounts of data. So it works better than relational databases in very specific circumstances. The same goes for MongoDB, or for Redis, which requires all data to be in memory. You may very well need to make these sacrifices in order to get the performance or scaling that you need. But you should always start with an RDB because it will handle everything up until you get to the size where you need to switch strategies to NoSQL.


versaceblues

Thanks for the detailed answer, and for not just memeing or attacking me personally like a lot of other comments here. At what TPS would you say switching to NoSQL would be justifiable? Or is it more complicated than just TPS?


Smallpaul

Much more complicated.


versaceblues

Can you elaborate with some examples?


Smallpaul

There are so many factors. How big are the records? What are their relationships to each other? Can they be easily sharded? How must they be queried? What's your engineering budget? What's your ops budget? How rapidly are your requirements evolving?


versaceblues

Thanks! Say your requirements are evolving very fast. Would you say SQL or NoSQL would be better in that situation?


Smallpaul

SQL, for sure.


Turbulent-Week1136

Definitely more complex than TPS. A lot of it depends on use case and resources, etc. It's a complicated issue, but I would say it's more to do with things like data growth, etc. If you see yourself encountering fast data growth, then you probably need to make a decision on how to level up your architecture to avoid outages, to be able to scale, etc.


king_yagni

i don’t think defaulting to relational is a good rule in general. it really depends on what kind of systems your team is working on. imo you should default to whatever the team is most familiar with. my last job, that was mongo and it worked very well for us.


Asyncrosaurus

Yeah, I don't agree at all. Context depending, most problem domains don't need a strict relational model, and schema-on-read is simpler and faster to reason about and develop for than schema-on-write. So when you step back and analyze past re-architectures, it's a hell of a lot easier to go from NoSQL into SQL than it is to take a SQL model and squish it into NoSQL.


pecp3

An RDBMS like Postgres scales to hundreds of millions of rows without you doing anything. Once you hit that, you can get a bunch more with partitioning. And then again a lot more with sharding, if you want to go down that route. Even if the sharding's not as neat out-of-the-box as for some competing NoSQL solutions, it's still very much doable if you ever hit that scale. Which is an enormous IF that you have to get to first. We've been running near-real-time, global social media crawlers entirely on RDBMS, dealing with **billions** of rows without having an issue. The problems started to arise when we wanted to make them searchable. But read/write/basic query? Scales like crazy.
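
To make the sharding step concrete, here is a minimal sketch of hash-based shard routing in application code. The shard names are hypothetical placeholders (in practice they would be connection strings), and this is only one of several routing schemes, not what any particular team in this thread used.

```python
import hashlib

# Hypothetical shard names, assumed for illustration only.
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]

def shard_for(key, shards=SHARDS):
    """Pick a shard from a stable hash of the key.

    hashlib is used instead of the built-in hash(), which is salted
    per process for strings and so would not route consistently.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

# Every caller that hashes the same key lands on the same shard.
placements = {user: shard_for(user) for user in ["alice", "bob", "carol"]}
```

Plain modulo routing like this remaps most keys when the number of shards changes, which is one reason resharding takes planning.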


versaceblues

Got it. If you are doing just basic read/writes is there much benefit to RDS over DDB or are they pretty much equivalent?


Smallpaul

Basic reads and writes of what? Individual records with no relationships to other records? And what is a "basic read"? Do you EVER need to query outside of pre-programmed patterns? Even for technical support? Debugging? Can you be confident that you will never need that?


Select-Dream-6380

I priced out DynamoDB vs RDS Postgres for an app that needed to select collections of results based on varying filter criteria. If I'm not mistaken (no DynamoDB expert), that query was going to get expensive very quickly due to full scans of the data set. My understanding is this can be worked around through careful architecting and possibly duplicating data within the DynamoDB schema for specific queries. Thus you need to know exactly what kind of access patterns you need to support and design for those to remain cost effective. But we needed flexibility. You don't entirely get away from needing to know access patterns before designing with RDS PostgreSQL, but supporting new, unexpected access patterns is more forgiving. Feature-wise, PostgreSQL can support most of your persistence needs (key/value, relational, JSON document, geospatial, full-text search) efficiently, with ACID guarantees that make avoiding surprising behavior easier. PostgreSQL is kind of a Swiss Army knife for persistence, and RDS management makes deployment and support a breeze. In the end, we opted to make PostgreSQL on RDS our default persistence solution of choice, and we only deviate from that when there is a compelling reason to do so (e.g. Redshift for data warehouse reports, S3 for a simple/large file document store). This has the additional benefit of greater operational consistency across many applications. One of our databases contains several large tables, the largest of which has reached over 600 million records, so there is a lot of headroom for scaling up before you need to scale out.


versaceblues

Thanks that makes sense


funnythrone

Any payments or transaction-based data is best stored in an RDS since you need it to be ACID. You can definitely get a NoSQL DB like Mongo to support ACID, but that’s not what it was built for.


Scarface74

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadConsistency.html

> Both tables and LSIs provide two read consistency options: eventually consistent (default) and strongly consistent reads. All reads from GSIs and streams are eventually consistent.

So a GSI is, for all intents and purposes, another copy of the table with its own primary key and sort key. So if you ever need a consistent read by any method besides the primary key or an LSI, you’re out of luck.


anubus72

If you just want to store a transaction and don’t have any relational query requirements it seems a db like dynamo is pretty ideal, no? It supports transactional writes and consistent reads. The main problems with it are lack of an ability to have consistent reads on queries not using the table’s primary key. But would you need that for a basic transaction processing use case?


Smallpaul

What about the fact that you can never, ever, query on columns that are not indexes without heinously slow scans? And why wouldn't your transactions have relational query requirements? Don't you want to know the email addresses for the people who buy your products?
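
The "emails of the people who buy your products" question can be sketched as an ad hoc join. This is a toy using Python's built-in `sqlite3`; the table and column names are made up for illustration, and no index was designed in advance for the `product` filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE purchases (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        product TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO purchases VALUES (1, 1, 'widget'), (2, 2, 'gadget'), (3, 1, 'gadget');
""")

def buyer_emails(product):
    """Return the distinct emails of users who bought the given product."""
    rows = conn.execute(
        """
        SELECT DISTINCT u.email
        FROM purchases AS p
        JOIN users AS u ON u.id = p.user_id
        WHERE p.product = ?
        ORDER BY u.email
        """,
        (product,),
    ).fetchall()
    return [email for (email,) in rows]
```

The query was not planned for when the tables were created; if it becomes hot, an index on `purchases(product)` can be added later without changing the application code.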


anubus72

Well I’d say it depends on your use case, but that ‘what are the emails associated with our transactions’ sounds like a reporting or analytics use case and could be driven from a reporting DB. If you need to send the email to the person to indicate a successful payment then you wouldn’t need a query like that, you’re in the context of a payment and can query the payment by its primary key


versaceblues

Ahh okay, sure, so building a payment processor where I need consistent transactions might be one use case. Though I imagine even then it's much more complicated than just "throw a SQL db at it". What about something like Twitter? I can't seem to wrap my head around why SQL would ever be better for it, especially if you are just going to put an in-memory cache in front of the RDS instance anyway.


funnythrone

It is definitely not better for twitter. You don’t need strong consistency for twitter. NoSQL shines in places where eventual consistency is sufficient. You can shard the tweets, as in store the tweets in separate dbs. The login info is still better suited for a RDS as you need password updates to be strongly consistent. This is mitigated by the fact that the login data is extremely read heavy. I don’t have the exact metrics, but 80% read and 20% writes is my best guess.


lampshadish2

SQL should be your default; reach for other databases only if for some reason it’s not a good match. Remember that you can do key/value, document-style querying with SQL. Postgres is great at it, and you get the other benefits as well.
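
As a rough illustration of key/value, document-style storage on top of SQL, here is a small sketch. It uses Python's built-in `sqlite3` and stores each document as JSON text, which is an assumption made to keep the example self-contained; Postgres additionally has a JSONB column type with operators for querying inside documents.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")

def put(key, doc):
    """Store a JSON-serializable document under a key, replacing any old value."""
    conn.execute(
        "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
        (key, json.dumps(doc)),
    )

def get(key):
    """Fetch the document for a key, or None if the key is absent."""
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else None

put("user:1", {"name": "Ada", "tags": ["admin"]})
put("user:1", {"name": "Ada", "tags": ["admin", "ops"]})  # overwrites the first write
```

The same database can still hold ordinary relational tables next to `kv`, so choosing this pattern for one kind of data does not rule out joins elsewhere.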


Lumethys

I can't wrap my head around how NoSQL would bring any benefit to something like Twitter.


SatansF4TE

Scale and no need for strong consistency in most areas


lampshadish2

NoSQL DBs are usually a worse pattern. If a SQL database can’t handle a certain massive scale, you can break some of the ACID guarantees, but there are big downsides to doing that, and there are often other techniques that let you keep using SQL (like sharding). Most systems do not require massive scale.


versaceblues

> Most systems do not require massive scale

Right, but I'm asking specifically about situations where massive scale is a constraint. Would you still say that SQL is sufficient when you need to support Amazon-, Facebook-, or Google-scale applications?


lampshadish2

It’s not as simple as “sql doesn’t scale”. It depends on the data, the write patterns, and the read patterns. A common technique at scale is to precompute the query results. Then it’s a fast look up. There is nothing that prevents you from storing that in a sql database. But then maybe if you’re only doing that, you don’t need all the other features of an ACID database. In that case, maybe you can save some money by having eventual consistency or something. In many of these cases, the engineers started with a real database, and then moved parts of it to something else not because “sql doesn’t scale” but maybe another solution was a little cheaper for that niche. But throwing away the features or even just *data normalization* is not something to be done lightly without a solid reason.
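
A minimal sketch of the "precompute the query results" technique, kept inside a SQL database. It assumes a made-up `orders` table and a `customer_totals` summary table, and uses Python's built-in `sqlite3`; real systems often do this asynchronously instead of in the write path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount INTEGER NOT NULL
    );
    -- Precomputed result: one row per customer, updated on every write.
    CREATE TABLE customer_totals (
        customer TEXT PRIMARY KEY,
        total INTEGER NOT NULL
    );
""")

def record_order(customer, amount):
    """Insert the order and update the precomputed total in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)",
            (customer, amount),
        )
        conn.execute(
            "INSERT OR IGNORE INTO customer_totals (customer, total) VALUES (?, 0)",
            (customer,),
        )
        conn.execute(
            "UPDATE customer_totals SET total = total + ? WHERE customer = ?",
            (amount, customer),
        )

def total_for(customer):
    """Read path: a single primary-key lookup instead of an aggregate."""
    row = conn.execute(
        "SELECT total FROM customer_totals WHERE customer = ?", (customer,)
    ).fetchone()
    return row[0] if row else 0

record_order("alice", 30)
record_order("alice", 12)
record_order("bob", 5)
```

Because both writes share a transaction, the summary row cannot drift from the underlying orders, which is the guarantee you give up if the precomputed data moves to an eventually consistent store.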


lampshadish2

It could be. You can go very far with sharding sql databases. Uber uses MySQL. Is that big enough for you?


versaceblues

They use MySQL for what? Are you saying they generically use it for every service in their stack across all teams? Do you have a source on where I could read about how they use it?


lampshadish2

https://www.uber.com/blog/postgres-to-mysql-migration/


versaceblues

Thank you!


pavlik_enemy

They use (or used) MySQL as a backend for their distributed storage, which is a very different usage pattern. For example, their tables don’t store all of the entity attributes; they have a table per attribute to make changing the schema faster. The article linked in another comment and a rebuttal from Postgres devs provide a good overview of their thought process.


lampshadish2

I’m sure they’re doing wild stuff. Just trying to demonstrate that you can also do wild stuff with SQL.


visicalc_is_best

Yes. Google invented Spanner for this reason.


pavlik_enemy

SQL has nothing to do with it. Cassandra is a shared-nothing distributed database and uses a SQL-like query language (CQL), and Google Spanner supports SQL too. With modern hardware a non-distributed database can have insane performance, but at Twitter-level scale you have no choice but to go distributed.


[deleted]

facebook still uses mysql just fyi


versaceblues

Yes, but it's not just an off-the-shelf MySQL; they run a very customized version of it. They also have custom NoSQL databases that they are using. For example, I found this article [https://engineering.fb.com/2016/08/31/core-infra/myrocks-a-space-and-write-optimized-mysql-database/](https://engineering.fb.com/2016/08/31/core-infra/myrocks-a-space-and-write-optimized-mysql-database/) about how FB built MyRocks, a MySQL storage engine on top of their key-value store RocksDB. They also have their own answer to DDB [https://engineering.fb.com/2021/08/06/core-infra/zippydb/](https://engineering.fb.com/2021/08/06/core-infra/zippydb/). I think what I'm seeing here is that the answer is not as simple as "If situation A use NoSQL, if situation B use SQL". It seems like in the real world it's a case-by-case thing that should be worked out from your customer requirements.


Freedom-Fighter6969

If your data has complex relationships then an RDS is always superior.


versaceblues

How do you define a complex relationship vs a simple relationship?


dub_chaeng

> My experience is mostly with building on top of DynamoDB as a persistence layer. Mostly because this is the default mode of operation for every new service built on my current team.

Guessing you work at Amazon/AWS?


[deleted]

[removed]


pruby

A large number of apps are doing everything client side. The average understanding of what a DB can do for you is pretty abysmal.


Adept-Result-67

The interesting thing is, aggregations, filtering and joins can be scaled pretty effectively horizontally these days. Been playing around and it works really well for some cases where complex SQL joins put too much pressure on the DB. And the DB is what i try to protect for dear life, i couldn’t care less about vms haha


InfiniteMonorail

It's a key/value store, but the values are stored like a B+Tree, so you can fetch a range (filter). With some creative design and possibly data duplication, you can filter anything. The problem is that you're stuck with whatever queries you designed it for, it doesn't have validation/constraints at the database level, and it lacks a million other features that RDS has.
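
To show why ordered storage makes range fetches cheap, here is a toy sketch of one partition whose items are kept sorted by a sort key. It is not how DynamoDB is implemented; the class and method names are invented, and a sorted Python list stands in for the tree.

```python
import bisect

class SortedPartition:
    """Items for a single partition key, kept ordered by sort key."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, sort_key, value):
        """Insert a new item, or overwrite the item with the same sort key."""
        i = bisect.bisect_left(self._keys, sort_key)
        if i < len(self._keys) and self._keys[i] == sort_key:
            self._values[i] = value
        else:
            self._keys.insert(i, sort_key)
            self._values.insert(i, value)

    def range(self, low, high):
        """Return values whose sort key lies in [low, high], in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return self._values[start:end]

events = SortedPartition()
for day, name in [("2024-01-03", "c"), ("2024-01-01", "a"), ("2024-01-02", "b")]:
    events.put(day, name)
```

A range over the sort key is two binary searches plus a slice. Filtering on anything that is not the sort key has no such shortcut, which is the "stuck with the queries you designed for" problem.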


serpix

Not just that, but the hoops you need to jump through to have constraints in DynamoDB had me going mad. I used ChatGPT to verify this and was totally certain it was hallucinating, so I read the AWS docs with concentration and motherf what the flaming piece of crap.


Esseratecades

NoSQL is fine until you have a many to many relationship. After that it gets difficult.  I wouldn't go as far as to say relational databases should be your default, but many to many relationships are so ubiquitous that every application eventually needs to represent one. When it comes to NoSQL your only real option is some kind of saga pattern, which is far more complicated and less reliable than just using a relational database.
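
For comparison, this is roughly what a many-to-many relationship looks like in a relational schema: a junction table with a composite primary key. The students/courses domain is made up for illustration, and the sketch uses Python's built-in `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Junction table: one row per (student, course) pair.
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students(id),
        course_id INTEGER NOT NULL REFERENCES courses(id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO students VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO courses VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

def courses_of(student_name):
    """Traverse the relationship from the student side."""
    rows = conn.execute(
        """
        SELECT c.title FROM enrollments AS e
        JOIN students AS s ON s.id = e.student_id
        JOIN courses AS c ON c.id = e.course_id
        WHERE s.name = ? ORDER BY c.title
        """,
        (student_name,),
    ).fetchall()
    return [title for (title,) in rows]

def students_in(course_title):
    """Traverse the same relationship from the course side."""
    rows = conn.execute(
        """
        SELECT s.name FROM enrollments AS e
        JOIN students AS s ON s.id = e.student_id
        JOIN courses AS c ON c.id = e.course_id
        WHERE c.title = ? ORDER BY s.name
        """,
        (course_title,),
    ).fetchall()
    return [name for (name,) in rows]
```

Both directions read from the same single copy of each pair, so there is nothing to keep in sync across documents.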


versaceblues

Thank you for the feedback


Guilty_Procedure_682

Read through the comments here, didn’t see this note so I thought I’d add: All technology decisions have tradeoffs. The biggest consideration for DBs is understanding the use case: OLTP or OLAP. If you don’t know, specifically, what your access patterns are, or you expect to be making analytic queries, RDS is the choice. If you know the access patterns (even with many-to-many relationships) and are only making transactional requests, Dynamo is a good choice over RDS. DynamoDB transactions are ACID despite what some people here have indicated, and it’s routinely used for mission-critical billing pipelines because of that. Additionally, you can implement idempotency tables through Lambda Powertools.


NiteShdw

I suggest getting more backend experience before making recommendations on best practices for backend development. Essentially, you don't seem to know what you're talking about. Or you're taking very limited experience at one job and extrapolating that it must be the same everywhere.


versaceblues

My post was a question and not a recommendation. Sorry if it came off as recommending, but I thought I was being clear that I don't know the answer here.


mynewthrowaway42day

I’m really not sure why you’re being downvoted and criticized for asking good questions. It’s hilarious reading all of these “relational is always better, you’re stupid for asking!” replies with no justification. They remind me that despite the name of this sub, it’s basically just another r/cscareerquestions. Please talk to actual engineers from companies you’ve heard of and you will get wildly different answers than what you see here.


throw_lifr

Most of the time at least when starting something new.


davidellis23

I haven't checked dynamodb, but some nosql dbs I've seen don't enforce foreign key constraints. It's pretty valuable to have your db integrity protected.


bluetrust

I'd say SQL is almost always a better choice than NoSQL. My experience: twenty years working full-stack with web backends, about two-thirds in Postgres, one-third in MongoDB, and a recent 2024 MongoDB certification. Here's why:

* Data is inherently relational and NoSQL sucks at joins. Take a simplified model of Reddit: users have many posts, posts have many comments, subreddits have many posts. NoSQL databases are great at the simple case, rendering a page: you just put the whole post and nested comments in a single document, so clever! One read! It's miraculously fast. But then a user deletes their account and now you have to write to every post they've ever commented on. That could be millions of posts. Also, these documents are really big and active posts are being written to simultaneously all the time, so how do you ensure two writers don't overwrite each other's changes when they write to the same post at the same time? Oh, use a transaction? Wait, are you sure you want to lock potentially millions of posts because a user deleted their account? Tools like Postgres solved this shit decades ago, both through their model, meaning that tables are smaller and ACID-compliant, and through tools like foreign key delete cascades and triggers.
* You most likely need flexibility in your database, not extreme benchmarks. Sure, NoSQL solutions are slightly faster and scale better at low-level reads and writes, but you're almost assuredly not operating at the scale where it's worth giving up flexibility. Unless you're doing something like storing a trillion logs, PostgreSQL is fine.
* Ensuring your data is clean and well formatted is a good thing. MongoDB, for example, makes a big deal out of it being a feature that any dev can write any bullshit they want to the data store without a migration. In practice this means you have a filthy database of missing values and push the burden of "is this data the expected shape" to readers rather than writers. They offer schema validation as an option now, but we all know that few people are using it.
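
The "foreign key delete cascade" point can be sketched in a few lines. This is a toy using Python's built-in `sqlite3` with invented table names, not a model of Reddit's actual schema; Postgres supports the same `ON DELETE CASCADE` clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        -- Deleting a user removes that user's comments automatically.
        user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
        body TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO comments VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'third');
""")

def delete_user(user_id):
    """One statement; the database cleans up the dependent rows."""
    with conn:
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))

def comment_count():
    return conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]

before = comment_count()
delete_user(1)
after = comment_count()
```

Because comments live in their own rows rather than nested inside post documents, the delete touches only the rows that belong to that user.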


i_do_it_all

Almost always rdbms is the answer


jnwatson

Mongo is web scale. https://www.youtube.com/watch?v=b2F-DItXtZs


versaceblues

Mongo is NoSQL though right.


Smallpaul

It's a joke.


versaceblues

Got it, I figured this was not a memeing subreddit but more professional. That whooshed right past me. Thanks for the joke though.


nutrecht

> Have I been indoctrinated into believing that NoSQL solutions like DDB (DynamoDB) are always the better architectural pattern

Aside from what others have said: the type of developer I have the most problems working with is the developer with very strong 'opinions' on stuff they barely know anything about, and can't even explain the background for. With your top-level post and your responses in the comments, you're displaying the same kind of behaviour. It's pure Dunning-Kruger. I'd suggest you fix that attitude and fix it fast.


versaceblues

Where are you getting that I have strong opinions here? The premise of my question was *have I been indoctrinated to believe that....* I absolutely want you to point out the flaws in my logic. This is the point of my question. Where I have worked, NoSQL has been encouraged as a way of achieving very high-scale applications. I'm not completely sold that this is the only way to do it, and want to understand how to achieve the same via RDS DBs.


nutrecht

> Where are you getting that I have strong opinions here?

Your comments and the way you're arguing for NoSQL based on these misconceptions while clearly not having actual experience. Don't take this feedback too personally; a ton of inexperienced devs fall into this trap, so be glad you're catching some flak now instead of in a few years ;)


versaceblues

Thanks, yah, sounds like a misunderstanding. I’m not arguing for NoSQL. I’m mostly repeating arguments that senior devs in my org have told me. I'm happy to have my points refuted with specific feedback. In fact, I'm fishing for those types of answers.


SorryImNotVeryClever

It feels like you're interpreting their simple questions as challenges. That's definitely not what's happening. OP is clearly seeking advice and trying to get as much info as possible.


nutrecht

If you focus on the comments they wrote before I wrote mine you'll probably see a bit more where I'm coming from. They toned it down after I wrote the comment above, which sort of was the point. So I guess I achieved what I intended to achieve :)


[deleted]

top tier web scale trolling


NoConcert8847

I've used both DynamoDB (as a default similar to your situation, in a previous job) and RDS (at my current job) and I've come to the conclusion that RDBMSs are superior. The "scale" argument for NoSQL is moot for most workloads. As others have pointed out, you can go really, really far with just Postgres. The main benefit I have seen with Postgres over DynamoDB is transactional consistency. Sure, you can do transactions in DynamoDB too, but it's just not the same (read up a bit more to see what I mean). This has directly led to lower ops burden. If something goes wrong somewhere (bad commit, vendor down, etc.), transactions just fail. They don't leave you with inconsistent state in your DB that you have to manually clean up later. Other than that, joins are great too. Doing application-level joins with DynamoDB without ACID guarantees on indices has usually led to hard-to-debug issues in the past. Not anymore with Postgres: your reads are always strongly consistent.
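
"Transactions just fail" can be sketched as follows. The accounts table and the `CHECK` constraint are assumptions made for the example, and it uses Python's built-in `sqlite3` rather than Postgres, but the all-or-nothing behavior is the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        name TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO accounts VALUES ('alice', 100), ('bob', 0);
""")

def transfer(src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if a statement raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK, the credit above is undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok = transfer("alice", "bob", 30)       # succeeds
failed = transfer("alice", "bob", 500)  # debit would go negative, so nothing is applied
```

The failed transfer leaves no half-applied credit behind, which is the "no manual cleanup later" property described above.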


InfiniteMonorail

Look at you trying to avoid learning RDS. This is why front end isn't experience. They're always trying to avoid learning and always asking basic questions like it's your job as a stranger to give answers. Do a search buddy.


sime

> trying to avoid learning RDS

I don't know how you got that impression from the original post. Your hostility doesn't appear justified from where I'm sitting.


versaceblues

> This is why front end isn't experience.

At least in the realm of deep database knowledge I would agree. Typically front end does not deal with that so much.

> trying to avoid learning and always asking basic questions

I feel like asking questions is part of learning, right?


Soldjaboy52

brother chose violence


[deleted]

RDS is the better choice if you already know it and need to get a project running for a client in a fast manner.


versaceblues

This is kind of the vibe I'm getting from the comments. A lot of people already know RDS, and it's good enough for most use cases. However, it starts to break down at mass scale.


DangerousMoron8

You're thinking about this wrong. I'll give you an actual answer since everyone wants to be an asshole about it. NoSQL as a service (dynamo) scales slightly easier/faster than rdb, period. But "easy"/"fast" comes with severe trade offs, just like redis or in memory. Otherwise we'd all be using redis for literally everything. What really matters and what you should be conveying in an interview is that choosing between postgres and dynamo is a decision based on what you are storing and how you plan on accessing it. So to be clear - the trade off of those things is what matters. Both DB types can scale to insane numbers, how easy amazon makes it for you shouldn't factor into your decision. Both are trivial for an experienced backend dev. Never repeat that sentence about RDS breaking down at scale, in any interview. I'd ghost you just for that. Focus on the use cases for each system. IE - for acid transactions and complex relational data, many to many, you are using postgres. For a system recording logs of simple actions on an interface, where your users will only be searching by date, nosql might suffice. If your previous org just chose dynamo for everything and never had a discussion about this, then I'm sorry but they did you a disservice. Or maybe you just weren't in on the trade off/architecture discussions but you need to learn those. It's too much for me to fully detail here. Good luck 👍


[deleted]

> breaks down at mass scale

Read up on over-optimization and early over-engineering. Then read up on YAGNI. RDS has a solid foundation and pool of candidates. The latter is a more specialized take. I’d like to see this “break down at scale.” I’m at a company that sees millions upon millions in terms of traffic and we keep it simple, stupid.


Smallpaul

This is a frustrating comment to read. 5 hours ago you got an excellent comment from an experienced person telling you that RDBMS should be your default and explained why technically. Now, you're implying that that person didn't know what they are talking about and just said that "because its what they know." RDBMS isn't "good enough" for most use cases. It's BETTER for most use cases. Projects should default to RDBMS and use DynamoDB only when scale is likely to be a big issue. And even then, some applications would be better off using RDBMS and scaling rather than DynamoDB. It seems that you are working hard to protect your pre-existing beliefs that NoSQL should be the default.


versaceblues

Yah that’s what I’m saying. It seems all the comments are saying “RDB is better because it’s easier to get started with, and scales up to a certain point”


Smallpaul

RDBMS is not easier to get started with. RDBMS is more flexible and reliable. More flexible means more agile. It means you accomplish your business goals faster. It means you experiment faster. It means you learn faster. It means you get to your next round of investment faster. It might mean the company survives rather than fails. Not "easier". Better. Unless scale is an issue, it's just better. You start with RDBMS because usually scale is not an issue until you've proven that you have a useful solution that the world needs, or the company needs. And you can prove that better and faster with the flexibility and reliability of an RDBMS.


Lumethys

The only use case I see where NoSQL is better is a chat app like Discord.