Name: Monitoring in Trove: Of DBAdmins and Buses
Start: 2014-05-15T14:20:00-0400
End: 2014-05-15T15:00:00-0400



Trove today has very limited monitoring, provided only through agent heartbeats.
Going forward, this will be insufficient if we want to achieve goals such as active slave promotion, failover, and agent remediation.

Specifically, we should discuss improving monitoring on these fronts:

- Better agent monitoring and remediation:
What should we do about "lost agents"?

- Upgrade monitoring:
How do we ensure that all agents have been correctly upgraded to a "baseline" version?
How do we deal with agents that have not been upgraded?

- Connectivity monitoring:
The datastore agent might be up and running, but how do we monitor instances to ensure that a customer is actually able to connect to them?

- Replication monitoring:
How do we monitor the master and slave, and achieve quick, active failover from the master to the slave if the master goes down?
How do we then provision a new slave to replace the one that was promoted?

- Self-healing clusters:
How should we monitor cluster nodes, and what is our remediation strategy if a cluster node goes down?
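As a starting point for the first two fronts, lost and outdated agents can both be detected from data a heartbeat could carry. The following is a minimal Python sketch, assuming each heartbeat records a timestamp and the agent's version. `AgentRecord`, `AgentState`, `classify`, the 60-second expiry, and the "2.1.0" baseline are hypothetical names and values for illustration, not Trove's actual code or configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class AgentState(Enum):
    HEALTHY = "healthy"
    LOST = "lost"          # no heartbeat received within the expiry window
    OUTDATED = "outdated"  # still heartbeating, but below the baseline version


@dataclass
class AgentRecord:
    # Hypothetical per-agent record a control plane might keep from heartbeats.
    instance_id: str
    last_heartbeat: datetime
    version: str


def parse_version(version: str) -> tuple:
    # "1.2.10" -> (1, 2, 10), so versions compare numerically, not as strings.
    return tuple(int(part) for part in version.split("."))


def classify(agent: AgentRecord, now: datetime,
             expiry: timedelta, baseline: str) -> AgentState:
    # A stale heartbeat takes priority: an agent we cannot hear from is "lost"
    # regardless of the version it last reported.
    if now - agent.last_heartbeat > expiry:
        return AgentState.LOST
    if parse_version(agent.version) < parse_version(baseline):
        return AgentState.OUTDATED
    return AgentState.HEALTHY


if __name__ == "__main__":
    now = datetime(2014, 5, 15, 18, 20, tzinfo=timezone.utc)
    agents = [
        AgentRecord("db-1", now - timedelta(seconds=10), "2.1.0"),
        AgentRecord("db-2", now - timedelta(minutes=5), "2.1.0"),
        AgentRecord("db-3", now - timedelta(seconds=20), "2.0.9"),
    ]
    for agent in agents:
        state = classify(agent, now, timedelta(seconds=60), "2.1.0")
        print(agent.instance_id, state.value)
```

Run as a script, this prints `db-1 healthy`, `db-2 lost`, and `db-3 outdated`. Detection is the easy half; what to do with a lost or outdated agent (restart it, reprovision the instance, or fail over to a replica) is exactly the remediation question the session is meant to settle.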

Etherpad at: https://etherpad.openstack.org/p/TroveMonitoring

(Session proposed by Nikhil)

Thursday May 15, 2014 2:20pm - 3:00pm EDT
B305

Trove

Juno Design Summit
