I’m pleased to announce that we now have Datadog for our infrastructure! We qualified for their open source program, and our application was recently approved! I’ve been using Datadog for a few months at work, and I’m very happy to have a cloud service for our monitoring and alerting.
The credentials are shared as usual; if you are not part of the infra team and want a user, let me know and I can create one for you.
I installed the Datadog agent on (almost) all our machines, and I also tagged them by environment and provider:
I also created some basic alerts (CPU, disk, memory, swap…): https://app.datadoghq.com/monitors#triggered
Thanks to those small things, I discovered that the wiki service was eating all of the CPU. I increased the JVM’s memory, and… magic happened!!
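For anyone who wants to add similar alerts, here is a minimal sketch of what the basic ones (CPU, disk, memory, swap) could look like as monitor definitions. It is not the exact setup described above: the metric choices, thresholds, the `[basic]` name prefix, and the helper names `build_monitor` and `build_basic_monitors` are all assumptions for illustration. The code only builds the definitions and does not call the Datadog API.

```python
import json

# Hypothetical metric choices and thresholds, not the values used on our hosts.
# CPU is a percentage; the disk, memory and swap metrics are fractions (0 to 1).
BASIC_ALERTS = {
    "cpu": ("system.cpu.user", ">", 90),
    "disk": ("system.disk.in_use", ">", 0.9),
    "memory": ("system.mem.pct_usable", "<", 0.1),
    "swap": ("system.swap.pct_free", "<", 0.2),
}


def build_monitor(name, metric, comparator, threshold, window="last_5m"):
    """Build one metric-alert definition as a plain dict (no API call is made)."""
    # Average the metric per host over the window and compare it to the threshold.
    query = "avg({}):avg:{}{{*}} by {{host}} {} {}".format(
        window, metric, comparator, threshold
    )
    return {
        "name": "[basic] " + name + " alert on {{host.name}}",
        "type": "metric alert",
        "query": query,
        "message": "Basic " + name + " alert triggered on {{host.name}}.",
        "tags": ["managed-by:sketch"],  # hypothetical tag
    }


def build_basic_monitors():
    """Return one monitor definition per entry in BASIC_ALERTS."""
    return [
        build_monitor(name, metric, comparator, threshold)
        for name, (metric, comparator, threshold) in BASIC_ALERTS.items()
    ]


if __name__ == "__main__":
    # Print the definitions as JSON so they can be reviewed before being used.
    print(json.dumps(build_basic_monitors(), indent=2))
```

Keeping the definitions as plain data makes them easy to review and adjust before anything is sent to Datadog.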
Here’s where I ask for help. We are using the official Ansible role. I’d love some help setting up the following integrations on all our machines:
And even to create awesome dashboards!