Given the recent outage today with IU XSEDE – we should consider putting some of our infrastructure into AWS – I’ll draft a budget soonish about how much this may cost. If we need money for anything, it’s infrastructure.ts We have all critical infrastructure on IU XSEDE (JIRA, Confluence, Bamboo (and all Bamboo agents), and Modulus). This is bad. If a fire ever occurs in that data center, we’re in big trouble. The other option is to spread out our services between AMS1 and AMS2 DigitalOcean data centers. I will also price that out and see how much it would cost. We currently have two VPSs (Virtual Private Servers) in each of those data centers.
Relying on free is nice, but we get now SLA – which means uptime is a bit hard to guarantee. As of a few minutes ago, I had go to disable the globalnavbar in Discourse because it was causing Discourse to load slowly.
We need to definitely consider paying for Amazon Glacier(Amazon S3 is also an option) to store database backups nightly. It’s cheap to do it – but getting it out is pricey, we shouldn’t ever need to touch the backups (hopefully). We currently do not have backups.
+1 to what @r0bby said. I can’t comment on the cost, but the slowness + outages have definitely limited my productivity.
Adding to the services Robby mentioned, our Maven repo is there as well (I assume, because it was down yesterday as well). This is actually one of the biggest blockers, because the SDK relies on it. (Which actually may be a bigger deal for those in limited bandwidth setting, but that’s another issue @raff).
@mogoodrich, you can switch to offline with -o to continue using SDK in limited bandwidth setting or no connectivity at all.
Personally, I do not see much point in maintaining OpenMRS maven repo, when we can publish to bintray and/or oss.sonatype.org for free as do many OSS projects. Actually, I’m planning a QA sprint, which will add releasing to bintray from travis-ci for as many Reference Application modules as we can fit in our timeline. The migration will be transparent as OpenMRS maven repo proxies maven central so we can keep OpenMRS repo running and continue to use both as long as it takes.
Cool, great. I know little about bintray or oss.sonatype.org, but if there’s a free cloud alternative to maintaining our own repo, it’s by all means worth pursuing.
I think relying on free is a bad idea. We’ve been doing that and it’s not working – when no outages occur, we’re great but as you saw on Thursday, the community came to a screeching halt because we host vital infrastrce ucture all in one place, this is bad. We also need a reasonable SLA. We also need to ensure that if one data center has a network outage, we don’t lose everything.
I need to dedicate some time on Monday and just pull the trigger on Amazaon Glacier, because we need it now and I can’t wait and sit on my hands. Amazon Glacier is surprisingly cheap to put data in, but really expensive to get it out. What I’m proposing is that we we keep the latest db backup on the servers in case we need to do a restore (to eliminate the need to pull it out of Glacier – which will not be cheap) .
I am also considering S3 – S3 has support to use Glacier in some cases. Glacier is expensive and some of our databases are HUGE (in Gigabytes!).
@r0bby, what’s are the rough sizes of our Confluence & JIRA backups (i.e., .tgz of SQL backups & .tgz of instance folder)? Are we talking 10 GB, 100 GB, or 1000 GB in Glacier?
I wouldn’t just backup confluence/jira – I’d back up Bamboo, ID Dashboard, and Crowd. too. It would probablyry cost us less than $1.00 to put it in there; to get it out will be another story. I’d actually like to use S3 over Glacier – it lets us retreieve it easier.
I’d still like to use Amazon S3, it’s cheaper to get data out of and S3 can actually use Glacier behind the scenes. I will look into the size of the backups.
Thanks @r0bby. I’m not surprised that the SQL dumps are relatively small. How about the instance folders – i.e., not the location where Confluence or JIRA runtimes are installed (e.g., /opt/confluence or /opt/jira), but the instance data (e.g., /var/confluence or /var/jira) where configuration settings and attachments are stored? The instance should be considerably larger than the SQL, even with bzipped. A backup would include both files: bzipped mysql dump + bzipped instance data.
We have to move JIRA/Confluence/Bamboo off of XSEDE – XSEDE machines can be used for less critical applications.
As I understand it, XSEDE offers no real availability guarantees. This is pretty bad wh can’'t handle that when we rely on them. The community came to a screeching halt.
Okay – learned that the hard way – we ran out of disk space. We need to move the files off the server immediately. The size of the uncompressed sql dumps are too much. We don’t have much free space on that box.
I agree we should try to avoid single points of failure for high availability; however, moving services from one VM to another doesn’t overcome the single point of failure. The recent interruption of IU services was caused by road construction cutting through a fiber line (not something that should be happening frequently).
XSEDE is being sunset anyway, so we’ll need to move. Jetstream has some potential:
IU continues to graciously donate resources for us (i.e., these are not free resources, they’re donated)
With Jetstream, we are given an allocation of resources (much like AMS or DO) and have full control to create our own networks, subnets, assignment of IP addresses, and spinning up VMs within that allocation.
Jetstream services are spread across campuses. For example, our current allocation for testing is hosted in Austin. It’s likely with Jetstream, we will at least be able to spread our services across Indiana University and Austin sites if not additional sites.
That said, our fundraising efforts are aiming to support infrastructure as well as a sustainable future for OpenMRS development. So, there’s a good chance we’ll have more options going forward.
LOL. Been there. That’s why I always run sudo df -h before creating new backups.
How big is the disk and how much space is available? Are there old copies or backups taking up space? Have you run sudo du -sh * to see where the space is being used?
This is good. OpenLDAP is a pain in the butt with docker and our current configuration method.
We definitely could use more servers. First things first – let’s try to spin up a docker ELK stack container – @darius’ PM tool will have to wait (sorry!)
Is there any way to get some kind of an idea how much money we have to play with right now?