Juno Design Summit has ended
Wednesday, May 14 • 11:00am - 11:40am
Heat Scaling, Robustness and Convergence

This session will include the following subject(s):

scaling & robustness for heat:

Let's make Heat more robust and scalable when dealing with real-world clouds.

TripleO has spent a year working with Heat and learning about common failure modes and glitches that make production use by non-experts hard at best and impossible at worst.

The issues we've encountered:
- scaling of single large stacks (e.g. I have a 10K node cluster, why is that constrained to run in a single Heat engine?)
- dealing with the real world: backend APIs can and do fail, in myriad ways. Manual intervention to fix these is pointless: given a desired cluster definition, it is Heat's job to keep pushing to converge on that state
- fast, graceful failover [e.g. look more like something like Galera to clients] of failed Heat engines: a failed Heat engine is a fact of life in production environments (e.g. due to deployments), and having that cause user-visible issues is a significant confidence issue
- a stack that is mid-update cannot have its template/parameters updated until the update completes
- Heat doesn't notice that resources have failed or stopped behaving correctly
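
To make the "keep pushing to converge" idea concrete, here is a minimal sketch of a convergence loop in Python. It is not Heat code and not the architecture proposed for the session: FlakyBackend, converge and every parameter name are hypothetical, chosen only to illustrate diffing observed state against a desired definition and retrying after backend failures.

```python
import random


class FlakyBackend:
    """Hypothetical stand-in for a cloud API whose calls sometimes fail."""

    def __init__(self, failure_rate, seed=0):
        self.resources = {}  # name -> spec, the observed state
        self._rng = random.Random(seed)
        self._failure_rate = failure_rate

    def _maybe_fail(self):
        if self._rng.random() < self._failure_rate:
            raise RuntimeError("backend API call failed")

    def create(self, name, spec):
        self._maybe_fail()
        self.resources[name] = spec

    def delete(self, name):
        self._maybe_fail()
        del self.resources[name]


def converge(desired, backend, max_passes=100):
    """Diff observed state against desired state and act on the difference.

    A failed call is not fatal: the resource is simply picked up again on
    the next pass. Returns True once observed state matches desired state.
    """
    for _ in range(max_passes):
        observed = dict(backend.resources)
        if observed == desired:
            return True
        # Create or replace anything missing or out of date.
        for name, spec in desired.items():
            if observed.get(name) != spec:
                try:
                    backend.create(name, spec)
                except RuntimeError:
                    pass  # retry on the next pass
        # Remove anything that is no longer part of the definition.
        for name in observed:
            if name not in desired:
                try:
                    backend.delete(name)
                except RuntimeError:
                    pass  # retry on the next pass
    return backend.resources == desired
```

Because the loop always starts from observed state, running it again after a resource disappears repairs the cluster without any special-case recovery path, which is the property the issues above are asking for.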

Clint and I will give a quick walkthrough of a possible underlying architecture to address scale and robustness from the ground up, and then the rest of the session can be a mix of poking holes in that approach and coming up with alternative designs.

(Session proposed by Robert Collins)


Wednesday May 14, 2014 11:00am - 11:40am EDT
B302
