Atlassian high availability: cold failover?

Xabier Davila September 26, 2011

We are trying to set up a Disaster Recovery solution for our Atlassian applications (so far, Crowd, JIRA, Confluence and Crucible). The production environment consists of the following servers:

  • Crowd: running on Windows, linked with AD to provide user authentication
  • JIRA, Confluence and Crucible: each of these runs on its own Ubuntu 10.04 server
  • MS SQL Server 2008: Database server shared by all the above

Our approach is to have every server replicated to a cold server (in a different geographical location), use rsync to keep the various data folders up to date, and keep a secondary database server up to date through database replication.

The first issue is filtering which files rsync replicates, so that we do not overwrite settings specific to the cold server, such as the database configuration (which should point to the failover DB server).
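
One way to do that filtering is to give rsync an explicit list of server-specific files to skip. Below is a minimal Python sketch that builds such an rsync command. The file names (JIRA's dbconfig.xml, Confluence's confluence.cfg.xml, Crucible's config.xml), the paths and the cold-jira host are assumptions for illustration; check which files actually hold the database settings in your own installations.

```python
import shlex
from typing import Dict, List

# ASSUMPTION: server-specific config files to keep out of the sync, based on
# the usual Atlassian home-directory layout. Verify against your installs.
LOCAL_ONLY: Dict[str, List[str]] = {
    "jira": ["/dbconfig.xml"],
    "confluence": ["/confluence.cfg.xml"],
    "crucible": ["/config.xml"],
}


def build_rsync_command(app: str, src: str, dest: str) -> List[str]:
    """Build an rsync argv that mirrors src to dest while leaving the
    cold server's own configuration files untouched."""
    cmd = ["rsync", "-a", "--delete"]
    for pattern in LOCAL_ONLY.get(app, []):
        # A leading '/' anchors the pattern at the root of the transfer, so
        # only the top-level file is skipped. Excluded files on the receiver
        # are also protected from --delete unless --delete-excluded is used.
        cmd.append(f"--exclude={pattern}")
    # Trailing slash on the source: copy its contents, not the directory itself.
    cmd.append(src.rstrip("/") + "/")
    cmd.append(dest)
    return cmd


if __name__ == "__main__":
    # Hypothetical paths and host name, for illustration only.
    cmd = build_rsync_command(
        "jira",
        "/var/atlassian/application-data/jira",
        "cold-jira:/var/atlassian/application-data/jira/",
    )
    print(shlex.join(cmd))
```

Running it once with rsync's `--dry-run` flag added is a cheap way to confirm the excludes behave as expected before scheduling it.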

The second problem is filtering which database tables get replicated. The latest releases of the Atlassian apps store the User Directory configuration in the DB. This means that if we do not filter these settings, the failover JIRA server would point to the production Crowd instead of the failover one.
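
A rough sketch of that table filtering in Python: partition the table list into tables to replicate (for example, the articles of a SQL Server replication publication) and tables that must stay local to each database server. The pattern cwd_directory_attribute is an assumption about where recent JIRA and Confluence versions keep user-directory settings such as the Crowd URL; confirm it against your own schema.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence, Tuple

# ASSUMPTION: tables holding environment-specific settings that must stay
# local to each database server. Confirm against your own schema before use.
LOCAL_ONLY_PATTERNS: Tuple[str, ...] = ("cwd_directory_attribute",)


def split_tables(
    tables: Iterable[str],
    patterns: Sequence[str] = LOCAL_ONLY_PATTERNS,
) -> Tuple[List[str], List[str]]:
    """Partition table names into (replicate, keep_local)."""
    replicate: List[str] = []
    keep_local: List[str] = []
    for table in tables:
        # Compare lower-cased names so 'CWD_DIRECTORY_ATTRIBUTE' also matches.
        name = table.lower()
        if any(fnmatchcase(name, p.lower()) for p in patterns):
            keep_local.append(table)
        else:
            replicate.append(table)
    return replicate, keep_local


if __name__ == "__main__":
    # Hypothetical table list; in practice you would read it from the
    # database catalog (e.g. INFORMATION_SCHEMA.TABLES).
    tables = ["jiraissue", "cwd_user", "cwd_directory_attribute", "propertyentry"]
    replicate, keep_local = split_tables(tables)
    print("replicate:", replicate)
    print("keep local:", keep_local)
```

If replicated tables refer to rows in an excluded table by ID, the failover database needs matching rows of its own, so check the foreign keys before relying on this.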

I still haven't completed this setup, but I would like to hear any thoughts about it, as well as other possible solutions for providing resilience to our Atlassian environment. I'm especially concerned about the administrative burden this will add when upgrading the live environment. Also, any future release that changes the configuration files or the configuration settings stored in the DB would probably break our cold failover environment.

Answer accepted
JamieA
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
September 26, 2011

Sounds unnecessarily complex... your JDBC URL should contain the DNS alias for the database server, so that if the database is failed over, the same URL automatically points to the DR database system. Unless you are a very small company, I would have thought your DBAs would provide this for you.

I don't use Crowd, but the same thing applies to LDAP servers. You point to a name that gets round-robinned by DNS, and any servers that are down get dropped automatically. So I'd suggest you just set up DR for Crowd and use F5s or a similar load balancer to automatically direct the Crowd URL to the correct Crowd server.
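
Conceptually, what the load balancer does for the Crowd URL is probe the candidate servers in priority order and send traffic to the first one that answers. Here is a minimal Python sketch of that selection logic, assuming a plain TCP connect is an acceptable health check; it is an illustration, not a replacement for an F5 or similar device.

```python
import socket
from typing import Optional, Sequence, Tuple

# (host, port) pair, e.g. a Crowd or database server.
Endpoint = Tuple[str, int]


def is_reachable(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to endpoint can be opened.

    This only proves the port accepts connections, not that the
    application behind it is healthy.
    """
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def pick_endpoint(
    candidates: Sequence[Endpoint], timeout: float = 2.0
) -> Optional[Endpoint]:
    """Return the first reachable endpoint in priority order, or None."""
    for endpoint in candidates:
        if is_reachable(endpoint, timeout):
            return endpoint
    return None
```

For example, `pick_endpoint([("crowd-prod.example.local", 8095), ("crowd-dr.example.local", 8095)])` would return the DR server only when production is unreachable (both host names are hypothetical). A real load balancer runs such checks continuously and usually probes an HTTP URL rather than just the port.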

We use a clustered filesystem, so in the event of failover the filesystem is automatically mounted on the DR machine. If we had to change configuration files, or make sure they had not been synced, that would just increase the chance of a problem in an already panicked situation.

In short, at least for the DB thing, try to leverage whatever your DBAs recommend.

JamieA
Rising Star
September 26, 2011

Thanks for the tick; hopefully other people will chime in with more information. One final piece of advice: test it! And then test it again every six months or so.

Xabier Davila September 26, 2011

Have to say your solution is embarrassingly simple :)

I agree it'd be good to hear about other people's implementations.

I'm thinking of creating static entries in the failover servers' hosts files to point to the LDAP and DB servers. This way we can test it without touching the prod environment, and there'll be fewer steps to follow in case of failover. We are thinking of doing this manually, no F5s ;)
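
A small Python sketch of the hosts-file idea: render static entries that map the names the application configs already use to the failover servers' IP addresses. If every failover server resolves those names locally, the replicated config files and DB settings can stay identical to production. All host names and addresses below are hypothetical.

```python
from typing import Dict

# HYPOTHETICAL names and addresses: the DNS names referenced by the
# application configs, mapped to the IPs of the failover servers.
FAILOVER_OVERRIDES: Dict[str, str] = {
    "sqlprod.example.local": "10.20.0.15",  # failover MS SQL Server
    "ad.example.local": "10.20.0.16",       # failover AD/LDAP server
}


def render_hosts_entries(
    overrides: Dict[str, str], marker: str = "# cold-failover overrides"
) -> str:
    """Render hosts-file lines ('IP<TAB>hostname'), sorted by hostname."""
    lines = [marker]
    for hostname, ip in sorted(overrides.items()):
        lines.append(f"{ip}\t{hostname}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Append this output to /etc/hosts on the Ubuntu failover servers
    # (C:\Windows\System32\drivers\etc\hosts on the Windows Crowd server).
    print(render_hosts_entries(FAILOVER_OVERRIDES), end="")
```

The marker comment makes the override block easy to find and remove if the failover servers are ever promoted to production.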

Stefan Broda
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
June 5, 2012

On this topic: Atlassian has just released a dedicated best-practice guide for high availability. It covers a cold failover scenario and includes implementation details on reverse proxying, monitoring, replication and failover mechanisms:

https://confluence.atlassian.com/display/ATLAS/Failover+for+JIRA

Tony Licavoli July 30, 2014

How does one access this document? We're about to start a migration/combination, and this doc would really come in handy.

Scott Harman
Rising Star
July 30, 2014

No... I can't access it anymore either. Presumably, now that Data Center is available, this document has been retired?

It would be useful for the rest of us, as I need to test our cold standby environment, and it's been a few months since I last reviewed this doc!

Can someone at Atlassian free it up from its black hole?

CB
Atlassian Team
July 30, 2014

Hi, you can find the newest version of the document here: https://confluence.atlassian.com/display/ENTERPRISE/Failover+for+JIRA+Data+Center

Scott Harman
Rising Star
July 30, 2014

Hi Christine, I don't see any data other than a basic image.

Your previous doc had Heartbeat and DRBD information and a bit on database replication.

Cheers

Bryan Karsh
Rising Star
February 3, 2015

Sadly, the new link doesn't have much information at all. There are many of us who are either not using Jira Data Center yet, or choose not to for various reasons. For example, my company has data centers in different geographical regions, and Jira Data Center doesn't cluster across geographic locations yet. So for us, the cold failover approach makes more sense. But I can't seem to find cold failover documents for Jira *anywhere* on Atlassian's site; the few pages that still exist appear to be restricted. I see material for Confluence, Bamboo and Stash, but not Jira. If I were a conspiracy theorist, it would appear that we are being heavily encouraged to use Jira Data Center. ;)
