14 Comments

After reading the blog, I realized that Cloud Native is primarily associated with irreversibility.


How so?


It appears that the concept of Cloud Native lacks a precise definition. According to the Cloud Native Computing Foundation (CNCF): "Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil."

The scalability of applications necessitates the irreversibility of both deployment and execution.

Resilience pertains to the recovery from temporary errors or resource unavailability, while the ability to make frequent, high-impact changes implies the reversibility of the deployment process.

Looking at the 12 factors, I have added my comments under the four heads of complexity:

Codebase - One codebase tracked in revision control, many deploys.

- States: Minimize the states of deployments, ensuring consistency across various deployment instances.

Dependencies - Explicitly declare and isolate dependencies.

- Interdependencies: Recognize and manage interdependencies.

Configuration - Store configuration in the environment.

- States: so the environment configuration is immutable

Backing Services - Treat backing services as attached resources.

- Interdependencies: enforce public standard interactions

- States: a resource either exists or does not exist

- Irreversibility: circuit breakers and retries are about recovering from a fault situation

Build, release, run - Strictly separate build and run stages.

- States: Similar to the division of labor at Ford

Processes - Execute the app as one or more stateless processes.

- States: Implement stateless processes to simplify application management and ensure consistency.

Port binding - Export services via port binding.

- Interdependencies: enforce public standard interactions

Concurrency - Scale out via the process model.

- States: requires a stateless, share-nothing design

Dev/prod parity - Keep development, staging, and production as similar as possible.

- Uncertainty: Mitigate uncertainty

Logs - Treat logs as event streams.

- Maybe not directly related to the 4 heads

Admin processes - Run admin/management tasks as one-off processes.

- States: manage the admin process as code so the state change is immutable

After this review, what I probably meant to say is that resilience/zero-downtime is about irreversibility.
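
To make the "circuit breaker, retries" point under Backing Services concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how any particular library does it: the names `CircuitBreaker`, `CircuitOpenError`, and `call_with_retries`, and the thresholds, are all hypothetical.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call without trying the backing service."""


class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive failures."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock              # injectable so the cool-down can be tested
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call (half-open).
            # A single further failure will re-open the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result


def call_with_retries(breaker, fn, attempts=3, delay=0.0, sleep=time.sleep):
    """Retry `fn` through the breaker with exponential backoff between attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # retrying against an open circuit is pointless
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay * (2 ** attempt))
    raise last_error
```

Retries cover short, temporary faults; the breaker stops the retries from piling onto a backing service that is really down, and lets traffic resume once the cool-down has passed.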


I think of a complex system as one for which the outputs are _incongruent_ with the inputs. That is to say: it's impossible to attribute outputs or outcomes to the interactions among and between inputs within the system. A system can be made _less_ complex by understanding and modeling interactions therein. Describing a system as complex doesn't imply sophistication; rather, it's a bad thing and should be addressed.


Wouldn’t it be great if the people using the system could decide to reverse or opt-out of some new feature that public web companies provide?

I’m reading Johann Hari’s book “Stolen Focus” and the data he presents on the epidemic of attention span decline makes me wish I could turn off the attention-grabbing features being added to all the social media and commerce systems.

Sorry if this is off topic or a stretch of intent but maybe it will spark some interest.


That's definitely a different topic.


I once said I would pay for Twitter if I had the option of locking myself out for a day or a week. I don't think you can expect that kind of feature from tech companies - guarding your focus is really on you as a consumer.

I actually referenced Hari's book in my own post about this: https://renegadeotter.com/2023/08/24/getting-your-focus-back.html


Any chance there is a video somewhere of Professor Enrico Zaninotto's keynote talk?


Thank you, I've read the transcript. I was hoping for the video. I know it's old but I had to ask.


Nope, it was before we invented visuals 🤪


You wrote that the complexity head to cut off, if you’re Facebook, is the irreversibility head, because the other 3 are not approachable. Is that true of any multi-datacenter-scale computing organization? Or was Facebook different because it was harder to predict its future feature set and scale? Do other computing organizations have hope of attacking state, interdependencies, or uncertainty?


Different sources of complexity make sense to address at different scales and in different contexts. For example, an individual programmer might write automated tests to reduce uncertainty. Teams with lots of interdependencies might choose to duplicate effort to reduce them.


I work for an online dating company that has ~40 dating properties serving a large, global community. We've focused on the reversibility aspect because, frankly, it's just plain easier to control. As a baseline, we've automated testing and deployment so we can take changes (or fixes) to production multiple times an hour if necessary (although our goal is once a week). We use a combination of feature flags, A/B percentage tests, and predicate-based rules, all controllable by an "admin" app that the business team can use to adjust pretty much any behavior they want. The predicate-based rules use an English-like query language that allows them to target segments of users based on pretty much any attributes of their profile, their purchase history, or their behavior on the site.

It allows us to try out new ideas on smaller sites, in specific geographies, or on subsets of the user population, and either roll them out further if they look good or roll them back if they don't. It dramatically reduces the risk of any changes we make, and it encourages everyone in the company to suggest possible improvements (both to the user experience and to our bottom line) since we can experiment pretty freely and reverse anything that doesn't work.
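
For readers who have not seen this pattern, here is a rough sketch in Python of how a feature flag can combine a percentage rollout with a predicate rule. It is not our actual system: the names `Flag` and `bucket` and the user fields are hypothetical, and a plain Python callable stands in for the English-like query language.

```python
import hashlib


def bucket(user_id, flag_name):
    """Map a user to a stable bucket in [0, 100) for a given flag."""
    key = f"{flag_name}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


class Flag:
    """Hypothetical flag: on for `percent` of the users matching `predicate`."""

    def __init__(self, name, percent=0, predicate=None, enabled=True):
        self.name = name
        self.percent = percent      # 0..100, share of matching users who see it
        self.predicate = predicate  # callable(user) -> bool, or None for everyone
        self.enabled = enabled      # kill switch: False turns the flag off for all

    def is_on(self, user):
        if not self.enabled:
            return False
        if self.predicate is not None and not self.predicate(user):
            return False
        # Hashing keeps each user's answer stable between requests, and raising
        # `percent` only ever adds users; it never removes anyone already in.
        return bucket(user["id"], self.name) < self.percent


# Example: try an idea on 20% of Italian users, then reverse it.
flag = Flag("new_checkout", percent=20, predicate=lambda u: u["country"] == "IT")
user = {"id": 42, "country": "IT"}
seen_before = flag.is_on(user)
flag.percent = 0  # rolling back is a data change, not a redeploy
seen_after = flag.is_on(user)  # always False once percent is 0
```

The point of the design is that widening, narrowing, or reversing an experiment only changes flag data, so the deployed code stays the same.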

We rely very heavily on observability at all levels to see how any experiment impacts performance across all sorts of dimensions (both infrastructure and business).

We're not even "multi-datacenter scale" and our entire company is fewer than a dozen people.
