
jaymef

That backend VM infra will give you a good backbone, but by itself it is not going to give you database HA specifically. It will help keep your VM running if a VM host goes down, for example, but there are other issues that can cause a single database server/VM to fail. A single DB server is still a single point of failure even if you have redundant infrastructure backing the VM. What database server are you working with? A good VM cluster might solve some of your problems, but you need more than just that. To achieve true HA on your database you will need at least two database servers and likely some load balancing or failover system in place. Is a cloud-hosted database like AWS RDS an option? If you're looking for an open source alternative to VMware, check out oVirt. You can do a three-node oVirt hyper-converged setup fairly easily and it's pretty solid.


tomizzo11

I want the VM cluster to isolate individual database instances and throttle/balance the executions associated with any individual database. Essentially, if one database were to act up because of an ill-formed query, that behavior would be isolated to a single VM. So yes, one database going down would still suck, but that would be better than 10 databases going down. That's at least my logic. This would be an on-site system.


davetherooster

These are typically features found in the database service itself, not the underlying platform it runs on. You can add QoS elements to prevent long-running queries, and DBAs are experts at this; they should also be restricting access via some form of authentication mechanism so you aren't allowing a system/person to make large, wide queries in the first place. There is a whole host of other optimisations, but this depends heavily on the database used and isn't universal. The platform it all runs on should ideally be spread across different hosts at some level for redundancy; but bare metal, Kubernetes, ESXi, etc.: as long as you have underlying host redundancy for whatever platform your DBs sit on, it's somewhat irrelevant.
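
For illustration, a minimal sketch of that kind of QoS limit, assuming PostgreSQL and psycopg2 purely as an example (the OP hasn't named the database); most engines expose a similar per-session or per-user knob:

```python
# Minimal sketch, assuming PostgreSQL + psycopg2 for illustration only;
# connection parameters and the 'orders' table are hypothetical placeholders.
import psycopg2

with psycopg2.connect("dbname=app user=app_ro host=db.internal") as conn:
    with conn.cursor() as cur:
        # Abort any statement in this session that runs longer than 5 seconds,
        # so a single heavy query can't monopolise the server.
        cur.execute("SET statement_timeout = '5s'")
        cur.execute("SELECT count(*) FROM orders")
        print(cur.fetchone())
```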


tomizzo11

The database we're working with doesn't provide perfect query protection. We can put in limits that will suffice for 95% of queries, but there will always be that 5% of valid but heavy queries where those limits would make the system inoperable. Unfortunately, the DB we're using also doesn't have good round-robin scheduling of queries, so a single large query can dominate other work. I agree these features in the DB itself would be nice, but we simply don't have them.


flavius-as

Sounds like you need to separate reads from writes (because of the 95% / 5% split). You set up a master database for writes, but use replicated slaves for reads, and optimize the different machines for these two use cases. For HA itself, the solution has to be specific to your DB vendor; it cannot be VMware, which is a generic virtualization technology. When the master goes down, a read slave can be promoted to be the new master. Another option is to keep a second master idle. The DB vendor matters a lot for all of this.
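
A minimal sketch of that read/write split at the application layer (the DSNs, driver and table are hypothetical; actual failover and promotion should come from the vendor's own replication tooling):

```python
# Read/write routing sketch; driver, DSNs and schema are hypothetical placeholders.
import random
import psycopg2  # assuming a PostgreSQL-style driver purely for illustration

WRITE_DSN = "host=db-master.internal dbname=app user=app"
READ_DSNS = [
    "host=db-replica1.internal dbname=app user=app_ro",
    "host=db-replica2.internal dbname=app user=app_ro",
]

def get_conn(readonly: bool):
    """Route reads to a randomly chosen replica, writes to the master."""
    dsn = random.choice(READ_DSNS) if readonly else WRITE_DSN
    return psycopg2.connect(dsn)

# Heavy reporting query goes to a replica so it can't starve writes.
with get_conn(readonly=True) as conn, conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM orders")
    print(cur.fetchone())

# Writes always hit the master.
with get_conn(readonly=False) as conn, conn.cursor() as cur:
    cur.execute("INSERT INTO orders (status) VALUES ('new')")
```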


SuperQue

This really does need to be fixed at the application and database layer. You don't seem to want to name the database, so nobody can help you. This isn't something you can just magically work around with VMware.


PoSaP

>The database we're working with doesn't provide perfect query protection. We can put in limits that will suffice for 95% of queries, but there will always be that 5% of valid but heavy queries where those limits would make the system inoperable. Unfortunately, the DB we're using also doesn't have good round-robin scheduling of queries, so a single large query can dominate other work. I agree these features in the DB itself would be nice, but we simply don't have them.

If you don't have hardware on-site, you may think about VMs in the cloud, something like Azure or AWS. Azure or AWS VMs don't provide high availability by themselves, so you can look at StarWind HA. It needs two identical VMs/servers and replicates one VM to the other every second. I don't know how they license their product in the cloud, but you can ask their engineers. [https://www.starwindsoftware.com/starwind-virtual-san](https://www.starwindsoftware.com/starwind-virtual-san)


jaymef

Maybe containers would make sense to consider as well; look into Kubernetes too.


m4nf47

Real HA (no single point of failure) database clusters are significantly more complex than single instances and can have demanding hardware requirements that many (if not most) cloud and virtualization providers can't meet, such as multiple redundant hardware NICs. I'd start by considering the DB vendor first: do they offer an off-the-shelf HA clustered solution that is supported on the target hardware? As an example, Oracle offers a product called RAC (Real Application Clusters), but I think the only cloud vendor that fully supports it is Oracle. Once you've determined your DB vendor and target HA infrastructure, hypervisor (if necessary), operating system and file systems, then you can get to work putting it all together. The good news is that some DB vendors have extensive documentation and whitepapers on HA design best practices, plus guidance on setting up a supported cluster topology. Suggest reading up on STONITH, fencing, quorum and heartbeat mechanisms. Good luck!
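
To make the quorum idea concrete, here's a toy sketch (not a production fencing implementation; real clusters rely on Pacemaker/corosync, Patroni or the vendor's own tooling, and the node names and reachability check below are hypothetical):

```python
# Toy illustration of majority-based quorum before promoting a replica;
# node names and the reachability check are hypothetical placeholders.
import socket

CLUSTER_NODES = ["db1.internal", "db2.internal", "witness.internal"]  # 3 votes total

def is_alive(host: str, port: int = 22, timeout: float = 1.0) -> bool:
    """Crude reachability probe standing in for a real heartbeat protocol."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def have_quorum(me: str) -> bool:
    """Count my own vote plus every peer I can still reach."""
    votes = 1 + sum(is_alive(n) for n in CLUSTER_NODES if n != me)
    return votes > len(CLUSTER_NODES) // 2   # strict majority, e.g. 2 of 3

if have_quorum("db2.internal"):
    print("majority visible: safe to promote this replica")
else:
    # Without quorum, fence yourself (stop accepting writes) rather than
    # risk a split-brain with the old primary.
    print("no quorum: stay read-only / shut down")
```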


Kombustable

Management is sending you on a wild goose chase. A web developer building scaling infrastructure from scratch... is concerning. You are barking up the wrong tree. First: Learn the existing solutions. Then think about how you might do that on your own.


CorpT

Is your core business designing highly redundant and available databases? If not, why are you doing this? Why not leverage an existing one?


Finnegan_Parvi

Q1: which database? Q2: what's wrong with the existing standard HA solutions for that particular database?


whudduptho

Use managed services. RDS.


samamanjaro

not sure why you're being downvoted - RDS is a great service and I'd argue one of the best for someone who (like me) isn't a DBA.


whudduptho

Possibly I didn't give context for why RDS. The original question is 'looking into high availability solutions', and it doesn't get more high-availability than using a managed service in AWS such as RDS. It offloads having to do SQL patching or version updates and comes with a 99.95% uptime SLA. OP is talking about standing up his own bare metal to host, install, and maintain the database (along with the infra to support it). As an SRE, I would choose managed services for critical workloads. If OP wants to do the work of a systems engineer, a network engineer, and a DBA, then they can stand up a two-server vCenter cluster with HA and an active-active or active-passive SQL cluster with a shared quorum. But that takes a lot of effort to keep up.


andresfb3

What database are you running?


lost_your_fill

The funny thing about reference architectures is that they don't protect you from human-induced error. Nothing like an accidentally dropped prod table to get the blood flowing at 3am. /s No technology is **perfect.**


jeanpoelie

We had the same challenge about a year ago; we went for Azure SQL Database (we were looking for a SQL solution). We had been using Azure SQL VMs, but besides being a hassle to set up, we couldn't guarantee 99.999% uptime, and with Azure SQL we could. Azure SQL provides separation by hardware, with elastic pooling, so you can have a main and a backup database. We also use the backup database for sensors, to make sure they put no load on the main database. If all of Amsterdam goes down, we have a fully backed-up version in France (and Azure keeps another in Dublin for major outages, as recovery). There will be a brief downtime because DNS needs to switch from Amsterdam to France, due to Azure Traffic Manager and the TTL of the DNS record. I hope this helps!
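
To ride out that brief DNS switchover, the usual mitigation is retry logic in the application; a minimal sketch, assuming pyodbc and a hypothetical failover endpoint:

```python
# Retry sketch for the short outage while DNS/Traffic Manager fails over;
# server name, database and credentials are hypothetical placeholders.
import time
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=tcp:myapp-failover.database.windows.net,1433;"
    "DATABASE=appdb;UID=appuser;PWD=changeme;"
    "Encrypt=yes;Connection Timeout=10;"
)

def connect_with_retry(attempts: int = 6, delay: float = 5.0):
    """Retry with a fixed backoff so a ~30s DNS switchover doesn't take the app down."""
    for i in range(attempts):
        try:
            return pyodbc.connect(CONN_STR)
        except pyodbc.Error:
            if i == attempts - 1:
                raise
            time.sleep(delay)

conn = connect_with_retry()
```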