Multiple Servers - Hot Spare (DRBD/Load Balancer)

quadium

Okay, so this is our plan:

We have a current DA server but wanted to add redundancy. I know you get roasted by jlasman for not searching the forums, but all I keep reading is him telling people to search the forums, and then finally to search Google. haha

Nobody seems to have any solid answers.

My problem is that I'm not a fan of single points of failure. I'd prefer to run a hot spare so that our clients don't experience any interruption, at least none that is our fault.

So Rsync and DRBD and Clustered DNS all seem to answer some questions, but not others. DRBD seems like the closest to our needs, but we still run into the problem of multiple IPs. We aren't really fans of Round Robin DNS because you never know which server you are going to get, and if one goes down you don't want visitors to end up there.

At the bigger level vSphere seems to answer other questions, but we aren't at that level.

So here's our plan:

Set up DRBD on both servers to replicate the data between them in real-time. Again, redundancy is the goal, not capacity. We have extra NICs available just for this task. Each machine has a 4 disk RAID 5 array, and experiences light to moderate activity.

From what I understand, DRBD makes Rsync irrelevant, as it copies blocks of data to the other server as they are written, in essence making a network RAID1.
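To make the "network RAID1" idea concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the class names `BlockDevice` and `MirroredDevice` are hypothetical, the "network" is a plain method call, and it only imitates the general behaviour of a fully synchronous mirror (a write is reported complete once both copies hold it). It is not how DRBD itself is implemented or configured.

```python
class BlockDevice:
    """A toy block device: a fixed number of fixed-size blocks held in memory."""

    def __init__(self, num_blocks, block_size=4096):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def write(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = bytes(data)

    def read(self, index):
        return self.blocks[index]


class MirroredDevice:
    """Report a write as done only after the local and the peer copy hold it."""

    def __init__(self, local, peer):
        self.local = local
        self.peer = peer

    def write(self, index, data):
        self.local.write(index, data)  # local disk
        self.peer.write(index, data)   # stands in for the replication link
        return True                    # only now is the write acknowledged

    def read(self, index):
        return self.local.read(index)  # reads are served from the local copy


primary_disk = BlockDevice(num_blocks=8)
secondary_disk = BlockDevice(num_blocks=8)
mirror = MirroredDevice(primary_disk, secondary_disk)

payload = b"x" * 4096
mirror.write(3, payload)
print(secondary_disk.read(3) == payload)
```

Because the copy happens per block at write time, there is no separate sync pass to schedule, which is the sense in which a periodic Rsync job becomes unnecessary for the replicated volume.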

So this leaves us with one server with the IP that everything's assigned to and one server with a different IP.

We are planning on introducing a failover appliance (such as a Barracuda Load Balancer), to which we will assign the server's existing IP, then assign a new IP to the old server. So in theory, all requests will come into the Load Balancer, which has the old DA server IP, then the Load Balancer sends all traffic to the primary node, which is then replicated to the secondary node (hot spare). If the primary server fails, the LB should then send all traffic to the secondary node, causing a fairly seamless transition. In other words, most people shouldn't notice, aside from the short delay it takes for the LB to realize the server's not responding.
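The failover decision described above can be sketched in a few lines of Python. This is a minimal illustration, not the behaviour of any particular appliance: the function names `tcp_alive` and `choose_backend` and the backend addresses are hypothetical, and a real load balancer would also handle retries, check intervals, and fail-back.

```python
import socket


def tcp_alive(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_backend(primary, secondary, is_alive=tcp_alive):
    """Prefer the primary, fall back to the hot spare, None if both are down."""
    for backend in (primary, secondary):
        host, port = backend
        if is_alive(host, port):
            return backend
    return None


# Hypothetical backend addresses, for illustration only.
PRIMARY = ("192.0.2.11", 80)
SECONDARY = ("192.0.2.12", 80)

# Simulate a dead primary with a fake probe so nothing touches the network.
down = {PRIMARY}


def fake_probe(host, port):
    return (host, port) not in down


print(choose_backend(PRIMARY, SECONDARY, is_alive=fake_probe))
```

The "short delay" mentioned above corresponds roughly to the probe timeout plus however often the check runs.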

DA is where things get dicey, as I'm not sure how well DA will like being on two different servers where everything's replicated.

We've already paid for two licenses because we have two servers, so it's not like we're trying to scam the system or something.

So my question I suppose, is what's the best way to set up DA to work successfully in this environment? Is it possible? Jlasman, I've searched around the forums, and I didn't find someone who's trying to do exactly what I am...
 
Nobody seems to have any solid answers.
Because there really aren't any good answers in the price range of most DirectAdmin users.
My problem is that I'm not a fan of single points of failure. I'd prefer to run a hot spare so that our clients don't experience any interruption, at least none that is our fault.

So Rsync and DRBD and Clustered DNS all seem to answer some questions, but not others. DRBD seems like the closest to our needs, but we still run into the problem of multiple IPs. We aren't really fans of Round Robin DNS because you never know which server you are going to get, and if one goes down you don't want visitors to end up there.
My poor man's replication works in some respects, because each machine serves its own IP#. If it's down, it doesn't serve. Which means only people already visiting the site are likely to notice the downtime.

DRBD appears much faster than Rsync, but I haven't been able to tell if it would work over a WAN, so you've still got the network as a single point of failure. For several years we ran two networks, two routers, two switches, and two NICs for each machine, but we found that's not where the failure ever was.

To me DRBD looks better than clustering MySQL, which has always appeared to be the biggest problem.
At the bigger level vSphere seems to answer other questions, but we aren't at that level.
Isn't it then a single point of failure?
So here's our plan:

Set up DRBD on both servers to replicate the data between them in real-time. Again, redundancy is the goal, not capacity. We have extra NICs available just for this task. Each machine has a 4 disk RAID 5 array, and experiences light to moderate activity.

From what I understand, DRBD makes Rsync irrelevant, as it copies blocks of data to the other server as they are written, in essence making a network RAID1.
That's how I understand it as well.
So this leaves us with one server with the IP that everything's assigned to and one server with a different IP.

We are planning on introducing a Failover Appliance (Such as a Barracuda Load Balancer),
Can the Load Balancer work as a failover device? Even if it can, then that's a single point of failure.
to which we will assign the server's existing IP, then assign a new IP to the old server. So in theory, all requests will come into the Load Balancer, which has the old DA server IP, then the Load Balancer sends all traffic to the primary node, which is then replicated to the secondary node (hot spare). If the primary server fails, the LB should then send all traffic to the secondary node, causing a fairly seamless transition. In other words, most people shouldn't notice, aside from the short delay it takes for the LB to realize the server's not responding.
Let's be simple; only one IP#.

Network topology:
1.2.3.0 = failover device
1.2.3.1 = first machine
1.2.3.2 = second machine

(I know in the real world we wouldn't use .0 as a machine address; I did it this way to make the example easy to follow.)

So a packet comes in addressed to 1.2.3.0. Is it simply redirected to 1.2.3.1, but does the destination IP# inside the packet stay 1.2.3.0? If so, then this will work, though you'll still have to have one IP# for each server that's not changed, as the main IP# for DirectAdmin registration/licensing.

If the packet contents are changed, though, it won't work unless you set up each machine with sites on its own IP#, which means lots of stuff to leave out of DRBD (can you do that?).
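The two cases above (destination IP# left alone versus rewritten) can be shown with a toy packet model in Python. This is only an illustration: the `Packet` class, the function names, and the MAC addresses are hypothetical, and whether a given load balancer offers either mode is exactly the open question.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    dst_mac: str
    payload: str


def forward_by_mac(packet, backend_mac):
    """Hand the packet to a backend by hardware address; dst_ip is untouched."""
    return replace(packet, dst_mac=backend_mac)


def forward_by_dnat(packet, backend_ip, backend_mac):
    """Destination NAT: rewrite dst_ip to the backend's own address."""
    return replace(packet, dst_ip=backend_ip, dst_mac=backend_mac)


incoming = Packet(dst_ip="1.2.3.0", dst_mac="aa:aa:aa:aa:aa:00", payload="GET /")

kept = forward_by_mac(incoming, "aa:aa:aa:aa:aa:01")
rewritten = forward_by_dnat(incoming, "1.2.3.1", "aa:aa:aa:aa:aa:01")

print(kept.dst_ip)       # still the shared address
print(rewritten.dst_ip)  # now the first machine's own address
```

In the first case both machines can be configured identically for 1.2.3.0, so the replicated site configuration matches on either node. In the second case each machine sees traffic addressed to its own IP#, so per-machine configuration would differ.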

And don't forget if you've got multiple IP#s you'll need to run them all through the load balancer.

I'm not positive but I believe the outgoing packets may have to go out through the load-balancer as well.
DA is where things get dicey, as I'm not sure how well DA will like being on two different servers where everything's replicated.

We've already paid for two licenses because we have two servers, so it's not like we're trying to scam the system or something.
See my notes above. It's not a matter of scamming the system; it's a matter of how the license code works.
So my question I suppose, is what's the best way to set up DA to work successfully in this environment? Is it possible?
You're going to have to try it and let us know :).
Jlasman, I've searched around the forums, and I didn't find someone who's trying to do exactly what I am...
Probably no one is. Interesting stuff. Keep us posted (no pun intended) as you move forward.

Jeff
 
I don't see why it's needed at all. Keep backups in case something happens. Have spare hardware ready in case something happens. Don't use cheap hardware that is going to fail easily.

What about database replication? You didn't even mention that at all.
Why use DRBD and not iSCSI or NFS?
Why would you pay for anything Barracuda... you know you are just paying for open-source software with a nice GUI over it?

There are many guides on HowtoForge on how to set up clustering and load balancing. Clustering becomes a huge pain in the shared hosting environment.

What kind of IP would you put on your DirectAdmin boxes behind the cluster? 10.? 192?

Why doesn't someone set up a testbed for this to see how it would work in the real world?
 
I don't see why it's needed at all.
Lots of people need it. Generally they don't buy inexpensive (cheap?) hosting from shared hosting companies. Many of us here run shared hosting companies and don't need it, but some of us might.
Keep backups in case something happens. Have spare hardware ready in case something happens. Don't use cheap hardware that is going to fail easily.
None of that is going to destroy your 5 nines (or even 3 nines) rating :).
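For a sense of scale on the "nines", the yearly downtime budget can be worked out directly. This is a small arithmetic sketch; the function name `allowed_downtime_minutes` is hypothetical and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines):
    """Yearly downtime budget for an availability of `nines` nines (3 means 99.9%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


for n in (3, 5):
    print(n, round(allowed_downtime_minutes(n), 2))
```

Three nines leaves a little under nine hours a year, while five nines leaves a little over five minutes, which is less time than it takes to restore from a backup or swap in spare hardware.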
What about database replication? You didn't even mention that at all.
Because you don't need it with DRBD.
Why use DRBD and not iSCSI or NFS?
Because DRBD writes to both servers at the same time, block by block. Much faster, much lower level. As both the original poster and the DRBD site point out, it's like RAID over the LAN.
Why would you pay for anything Barracuda... you know you are just paying for open-source software with a nice GUI over it?
Because if your customers are really asking for almost no downtime, and are willing to pay for it, then they probably like seeing brand names.
There are many guides on HowtoForge on how to set up clustering and load balancing.
None of which discuss the DirectAdmin-related issues, which is the whole purpose of the original post.
Clustering becomes a huge pain in the shared hosting environment.
Just because you and I don't have clients that can lose a few thousand dollars an hour when a site is down, doesn't mean some of us don't.
What kind of IP would you put on your DirectAdmin boxes behind the cluster? 10.? 192?
I've already discussed that issue in my reply. Many of us who aren't network administrators don't realize this, but there's no reason why you have to NAT to a private IP#. You can just as easily NAT to a public IP#. And since in a local area network you can forward by NIC ID, you can have two machines with the same public IP# behind your box. I touched on that, and I asked if the Barracuda would do that. That way only one IP# on each machine would have to be excluded from DRBD (perhaps on a separate drive), and each DirectAdmin box would have its own IP#.
Why doesn't someone set up a testbed for this to see how it would work in the real world?
Great idea. Anyone with the desire and a VPS host with multiple VPS clients, and some networking experience, could test it at low cost.

Jeff
 
I've been thinking about this as well.

My idea is similar to that of TS, with the exception of using a separate failover device. I would let the two servers check each other using heartbeat - as soon as one of the servers seems to have failed, it will simply switch over the IP from the failed server to the standby server.

For licensing this wouldn't be an issue, just license it to a unique IP per server. The IP used for your webhosting (in heartbeat terms a 'virtual IP', I believe) will differ from the IPs you license DA to.
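The takeover rule described above can be sketched as a small state machine in Python. This is a hedged illustration only: the class `HeartbeatMonitor` and its `dead_after` parameter are hypothetical names, timestamps are passed in by hand, and a real heartbeat setup would actually bring the virtual IP up on the standby and guard against both nodes claiming it at once.

```python
class HeartbeatMonitor:
    """Decide when the standby should claim the shared (virtual) IP."""

    def __init__(self, started_at, dead_after=10.0):
        self.dead_after = dead_after  # seconds of silence before takeover
        self.last_seen = started_at   # treat startup as the first sighting
        self.holds_virtual_ip = False

    def beat(self, now):
        """Record a heartbeat received from the peer at time `now`."""
        self.last_seen = now

    def check(self, now):
        """Claim the virtual IP once the peer has been silent for too long."""
        if (now - self.last_seen) > self.dead_after:
            self.holds_virtual_ip = True  # a real setup would bring the IP up here
        return self.holds_virtual_ip


monitor = HeartbeatMonitor(started_at=0.0, dead_after=10.0)
monitor.beat(now=4.0)
print(monitor.check(now=9.0))   # False: peer was heard 5 seconds ago
print(monitor.check(now=20.0))  # True: peer has been silent for 16 seconds
```

Note that the monitor only ever moves the virtual IP; the unique per-server IPs used for licensing are never touched.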

There is a problem though: as soon as you have to fail over, you've got an exact replica of your failed server. But the state of the server is not the same: for instance, configuration files that might have changed have to be loaded. This would require reloading quite a few services, which could cause all kinds of unexpected problems.

Also you have to be sure that DA on your secondary device is not writing to the same files as your primary box is, as that might confuse DRBD (or even corrupt the file; I haven't tested that yet).
 
In all honesty, I don't think DA was designed for this type of redundancy.

This topic is sort of old, so may I ask if you got it working?
 
I know you get roasted by jlasman for not searching the forums

We tell people to search the forum ONLY when we know we have seen the answer here but don't remember exactly where. To give anyone an exact link would require that we search the forum ourselves, something which anybody can do themselves as well.

We don't tell people to search the forum just because we want to post something. We tell people to search the forum because we know the answer is here and they can probably find it themselves a lot faster by searching rather than sitting around waiting for somebody to spoon feed them the answer.
 