Failover cluster


Dan Morgan
I'm using a 2-node failover cluster on Windows Server 2012 to provide the file and SQL servers for Myriad.
During testing I can see that the cluster fails over to the second node. However, Myriad still crashes, which indicates that it has lost its connection to the SQL/file servers.
This is obviously not the desired outcome. Is anyone running Myriad from a failover cluster, and can you suggest what I might be missing?
Peter Jarrett
Hi Dan, are you running v4 or v5?

V5 is the first version that was designed to run from a cluster. In *most* scenarios we've tested, it successfully follows the SQL and files as the roles roll over to the new host server, but of course this is in lab conditions here! The currently playing item will stop, but Myriad should then carry on from the next one.

I have to say this was using Windows 2016 rather than 2012. 2012 R2 is broadly similar to 2016 in terms of failover support, but I have a vague recollection that 2016 added some extra parts that might be in play here as well.


------------
Peter Jarrett, Technical Director
Broadcast Radio Ltd.

Bill Bailey: No win, no fee, no basis in reality. Just a room above a minicab office in Acton and a steady stream of greedy simpletons whose delusion is only matched by their clumsiness


Dan Morgan
Hi Peter - we're running v4.

This probably illustrates my limited understanding of clustering - isn't the idea that the failover is seamless, or is there a slight delay in the switchover which causes Myriad to halt?

Peter Jarrett
Don't worry, clustering is simple in theory but VERY complicated in practice, as we found out when building v5! :)

With v4 and earlier, we maintained a persistent connection to the database on the SQL Server, which worked just great as long as the server stayed running - which of course is 99.999% of the time, ideally! This persistent connection was the usual way of connecting to databases for SQL 2005 (which was the main version at the time!) and was essential for users who still needed to run with the old Jet file-based databases.

But with clustering, when SQL-A fails, the cluster realises it needs to spin up SQL-B (which, yes, takes a short while). Once SQL-B is running, it has no record of the connections that SQL-A held before it died, so Myriad's persistent connection is dead and it can no longer get any data from the server.
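
To make the failure mode concrete, here is a minimal simulation of a client that opens one session at start-up and keeps it across a failover. Everything in it (`SqlNode`, `Cluster`, `ConnectionLost`, the session ids) is a hypothetical stand-in invented for illustration; it is not Myriad code and does not talk to a real SQL Server.

```python
class ConnectionLost(Exception):
    """Raised when a session is not known to the node now serving requests."""


class SqlNode:
    """Hypothetical stand-in for one SQL Server instance in the cluster."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()  # session ids this node has handed out
        self._next_id = 0

    def open_session(self):
        self._next_id += 1
        session_id = f"{self.name}-{self._next_id}"
        self.sessions.add(session_id)
        return session_id

    def query(self, session_id, sql):
        if session_id not in self.sessions:
            raise ConnectionLost(f"{self.name} has no session {session_id}")
        return f"{self.name} ran: {sql}"


class Cluster:
    """Two nodes behind one name; only the active node answers."""

    def __init__(self):
        self.nodes = [SqlNode("SQL-A"), SqlNode("SQL-B")]
        self.active = 0

    def fail_over(self):
        # The failed node forgets its sessions; the standby never had them.
        self.nodes[self.active].sessions.clear()
        self.active = 1 - self.active

    def open_session(self):
        return self.nodes[self.active].open_session()

    def query(self, session_id, sql):
        return self.nodes[self.active].query(session_id, sql)


def demo_persistent_client():
    cluster = Cluster()
    session = cluster.open_session()  # opened once and held, as described for v4
    before = cluster.query(session, "SELECT 1")
    cluster.fail_over()
    try:
        cluster.query(session, "SELECT 1")
        after = "still connected"
    except ConnectionLost:
        after = "connection lost"
    return before, after
```

Calling `demo_persistent_client()` shows the query working against SQL-A and then failing once SQL-B has taken over, because the session id held by the client means nothing to the new node.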

With v5 we changed to a per-request connection system, which is how all modern DB systems expect connections and is essential for things like clustering to work. This would be very slow if a new connection had to be created every time one was needed, so in the middle there is a very clever SQL connection cache (managed by Windows itself) which keeps track of connections and re-uses them between requests, so performance stays incredibly fast. This cache is "cluster aware", so if the server dies the cache knows to kill off its contents and start working with the new server.
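
The per-request pattern with a connection cache can be sketched roughly as follows. This is a toy model under stated assumptions, not the Windows-managed cache described above and not Myriad's code: `FakeServer`, `ConnectionPool`, `ServerGone` and `fetch` are all hypothetical names, and a simple "generation" counter stands in for which node is currently active.

```python
from contextlib import contextmanager


class ServerGone(Exception):
    """Raised when a connection points at a node that is no longer active."""


class FakeServer:
    """Hypothetical stand-in for the clustered SQL Server name."""

    def __init__(self):
        self.generation = 0  # bumped on every failover
        self.opened = 0      # how many physical connections were created

    def connect(self):
        self.opened += 1
        return {"generation": self.generation}

    def run(self, conn, sql):
        if conn["generation"] != self.generation:
            raise ServerGone("connection belongs to the failed node")
        return f"ok: {sql}"

    def fail_over(self):
        self.generation += 1


class ConnectionPool:
    """Toy cache: re-uses idle connections and flushes itself on failure."""

    def __init__(self, server):
        self.server = server
        self.idle = []

    @contextmanager
    def connection(self):
        conn = self.idle.pop() if self.idle else self.server.connect()
        try:
            yield conn
        except ServerGone:
            self.idle.clear()  # every cached connection is stale too
            raise
        else:
            self.idle.append(conn)  # healthy: keep it for the next request


def fetch(pool, sql):
    # Per-request pattern: borrow a connection, use it, give it back.
    with pool.connection() as conn:
        return pool.server.run(conn, sql)
```

In this sketch, repeated `fetch` calls share one physical connection until `fail_over()` is called. The first request after that fails and empties the cache, and the next request transparently opens a fresh connection to the new node.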

Myriad v5 also has a fairly extensive multi-retry system baked into the code, so that if a SQL Server is running slow (or, in the case of a cluster, taking a little while to spin up the new server) Myriad retries certain queries multiple times to try to get some data and keep running. This is why v5 is much more tolerant when run on flaky hardware or over slow links that often have packet timeouts, as we see on large WAN systems in some overseas countries.
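
A retry wrapper of the general kind described might look like the sketch below. The attempt count, the fixed delay and the choice of `ConnectionError` as the retryable error are assumptions made for illustration; the post does not say how Myriad's own retry logic is tuned, and `run_with_retry` is a hypothetical name.

```python
import time


def run_with_retry(operation, attempts=5, delay=0.5,
                   retry_on=(ConnectionError,), sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Only exceptions listed in retry_on are retried; anything else
    propagates immediately. The last retryable error is re-raised
    if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as error:
            last_error = error
            if attempt < attempts:
                sleep(delay)  # give the new node time to come up
    raise last_error
```

A query that fails while the standby node is still starting would simply be passed in as `operation`. With a delay of a second or so and a handful of attempts, a short failover gap is absorbed instead of surfacing as a crash.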

Hope that makes sense?




Dan Morgan
In reply to Peter Jarrett - Tuesday, September 25, 2018 11:39:33 AM
It does make perfect sense - thanks for a very succinct explanation. So, in fact, it isn't possible to make v4 fault tolerant? If so, at least I now understand why... and we also have clustering in place for when v5 comes along (which I will now be making a case for!).
Thanks again, Peter!