Friday, March 27, 2026

Cloud Native Without Enforcement: Why principles are not enough

Cloud Native is often used as a direction, but rarely defined in a way that influences actual decisions. It appears in strategies and architecture documents, yet teams still make their own interpretations in practice. Platforms diverge, implementations vary, and architecture struggles to keep pace.

This is not caused by a lack of intent. It is caused by a lack of enforceable constraints. Most teams understand the direction and are willing to work towards it. They adopt containers, introduce automation, and try to align with what is described as Cloud Native. The issue is that none of this is bounded by clear, enforceable constraints. There are no hard rules that define what is allowed and what is not. As a result, teams are left to interpret principles on their own.

One team may run containers on a managed platform, another may run them on virtual machines. One team may use standardized pipelines, another may rely on manual steps. Both can argue they are working “Cloud Native,” because there is no shared definition that can be verified. In that situation, alignment becomes optional. Decisions are made locally, based on speed, familiarity, or immediate constraints, rather than on a consistent architectural baseline.

Without enforceable constraints, architecture cannot guide behavior. It can only describe an intention, while implementation continues to diverge.

As long as Cloud Native is described in general terms, it does not guide behavior. Statements such as “we use containers” or “we automate everything” do not determine what is allowed and what is not. They leave too much room for interpretation. In that space, teams optimize locally, often with valid reasons, but without overall consistency.

The result is predictable: multiple platforms, inconsistent deployment models, and unclear boundaries between workloads. Architecture becomes descriptive instead of directive.

This perspective is based on my practical experience in enterprise architecture within a multi-tenant government environment, where platform choices, integration patterns, and delivery models are not theoretical concerns but daily decisions. In that context, the gap between architectural intent and implementation becomes visible quickly.


The effect becomes visible when you look at how teams actually implement these principles. The following illustrates how this plays out in practice when no enforceable constraints are in place.

Without enforceable constraints

This diagram shows what happens when Cloud Native is defined as a direction, but not enforced through concrete rules. Teams interpret the principles in their own way, leading to different implementation choices and a fragmented platform landscape.


Cloud Native is defined at a high level, but not translated into rules that guide implementation. Teams interpret the direction based on their own context, priorities, and constraints. This leads to different approaches to platforms, deployment, and integration. Over time, these differences accumulate into a fragmented landscape where consistency is lost and architectural control becomes difficult to maintain.


With enforceable constraints

This diagram shows how clear, enforceable constraints translate Cloud Native from a general direction into consistent implementation. Workloads are classified, and rules are applied through the platform and delivery process, resulting in predictable and controlled outcomes across teams.


This does not remove flexibility, but it makes deviations explicit and manageable.


From intent to enforceable constraints

If Cloud Native is to have any impact, it needs to be defined in terms of what can be enforced. The question is not what we prefer, but what we are willing to require.

A first step is to make explicit that not every workload is Cloud Native. Treating all systems the same creates ambiguity and weakens decisions. A simple classification such as Cloud Native, transitional, and legacy removes that ambiguity. Each category comes with consequences. A Cloud Native workload is not just containerized. It runs on the designated platform, follows a defined deployment model, and meets requirements for isolation and lifecycle management. Transitional and legacy workloads are handled differently, without forcing them into a model they do not fit.
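To make the idea concrete, here is a minimal sketch of such a classification, assuming a few verifiable workload attributes. The attribute names and category rules are illustrative, not an official taxonomy:

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    containerized: bool
    on_designated_platform: bool
    pipeline_deployed: bool

def classify(w: Workload) -> str:
    """Assign a workload to a category based on verifiable attributes."""
    # "Cloud Native" requires more than containers: platform and
    # deployment model must also match the defined constraints.
    if w.containerized and w.on_designated_platform and w.pipeline_deployed:
        return "cloud-native"
    if w.containerized:
        return "transitional"
    return "legacy"

print(classify(Workload("billing", True, True, True)))    # cloud-native
print(classify(Workload("reports", True, False, False)))  # transitional
```

The point is not the code itself, but that each category is decided by checkable attributes rather than by interpretation.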

Deployment is where most of the inconsistency becomes visible. As long as manual deployments are allowed, consistency remains optional. Requiring all changes to go through standardized pipelines creates a single path to production. This is where behavior becomes predictable and where policies can be applied in a consistent way. It also makes it clear how systems are built and operated.
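One way to make the single path to production verifiable is to require pipeline provenance metadata on every deployment. The sketch below assumes a hypothetical annotation key that would be stamped by the CI/CD system, never by hand:

```python
# The annotation key "deploy.example.org/pipeline-run" is hypothetical;
# in practice it would be written by the pipeline itself, so its absence
# signals a deployment that bypassed the standard path.
def deployed_via_pipeline(manifest: dict) -> bool:
    annotations = manifest.get("metadata", {}).get("annotations", {})
    return "deploy.example.org/pipeline-run" in annotations

manifest = {"metadata": {"annotations": {"deploy.example.org/pipeline-run": "run-4217"}}}
print(deployed_via_pipeline(manifest))        # True
print(deployed_via_pipeline({"metadata": {}}))  # False
```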

Isolation needs to be defined beyond logical separation. In many environments, constructs such as namespaces are treated as sufficient, while actual isolation depends on network controls, access boundaries, and runtime constraints. Without a defined baseline, shared platforms are difficult to govern. Setting minimum requirements makes it clear which workloads are allowed and under which conditions.
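A baseline like this can be expressed as a checkable rule rather than a document. The sketch below assumes a hypothetical set of required controls per namespace; the control names are illustrative:

```python
# Illustrative minimum isolation baseline for a shared platform.
REQUIRED_CONTROLS = {"network_policy", "rbac_boundary", "resource_quota"}

def meets_isolation_baseline(namespace: dict) -> tuple[bool, set]:
    """Return whether a namespace meets the baseline, and what is missing."""
    present = set(namespace.get("controls", []))
    missing = REQUIRED_CONTROLS - present
    return (not missing, missing)

ok, missing = meets_isolation_baseline(
    {"name": "team-a", "controls": ["network_policy", "rbac_boundary"]}
)
print(ok, missing)  # False {'resource_quota'}
```

Returning what is missing, not just a pass/fail, is what makes the baseline governable on a shared platform.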

Integration follows the same pattern. Without constraints, systems connect directly, driven by immediate needs. Over time, this leads to tight coupling and limited visibility. Defining APIs and messaging as the standard integration model introduces consistency and makes deviations explicit. It also aligns with established practices around controlled access and traceability, as reflected in frameworks such as ISO/IEC 27001 and the NIST Cybersecurity Framework.




Maturity models, such as those from the Cloud Native Computing Foundation (see also my post on the CNCF Maturity Model), are widely used in the Cloud Native ecosystem to describe progress and capability. They provide structure, but they do not define enforceable boundaries. In practice, teams at the same maturity level can still make very different implementation choices. Without constraints, maturity does not lead to consistency.

The Cloud Native Maturity Model

The platform is where these constraints become real. It defines how workloads are deployed, how isolation is implemented, and how integrations are exposed. If teams are free to select their own platforms, differences in behavior will follow. By setting platform boundaries, architecture ensures that constraints are not only defined but also applied.

Enterprise Architecture: Making Cloud Native enforceable

This changes what enterprise architecture actually does. Instead of describing intent, it defines the conditions under which solutions are acceptable. That reduces interpretation and limits unnecessary variation. It also makes deviations visible, so they can be discussed and managed.

In reality, platform direction is often not fully established. Multiple solutions coexist, and teams move forward because delivery cannot wait. Architecture defines direction, while teams are already moving. Ignoring that does not help. Defining constraints that apply regardless of the final platform choice is what keeps control during that transition.

Enterprise architecture does not need to prescribe every detail. It needs to define boundaries that can be verified and enforced. Within those boundaries, teams remain free to design and deliver.

How to see this in practical terms?

In practice, this does not require a complete redesign. It starts by making a small number of decisions explicit. Define which workloads are allowed on which platform. Require all deployments to go through pipelines. Make integration standards non-optional. These constraints don't need to be perfect, but they need to be enforced.

Cloud Native is not a label or a technology choice. It is a set of constraints. Without those constraints, architecture describes intent. With them, it shapes outcomes.

Monday, December 29, 2025

Policy Belongs in the Pipeline

A practical perspective on build-time governance in CI/CD

Governance comes up quickly when people discuss modern CI/CD. Compliance too. Everyone agrees it matters.

And yet, when you look at how pipelines are actually built, policy enforcement is often surprisingly thin. Not absent, just fragile. That gap is not usually caused by bad intent. More often, responsibility simply ends up in the wrong place.

Where policy enforcement tends to drift

In many environments, policy enforcement gradually assumes familiar forms.

  • rules that live in documents instead of pipelines
  • scripts added late in the delivery flow
  • checks that only run after deployment
  • tools developers are expected to install locally

None of these is an unreasonable choice on its own. The problem is what they have in common.

When something fails, it becomes difficult to answer basic questions: where the rule was enforced, when it was evaluated, and why the pipeline made that decision.

That uncertainty is rarely a tooling problem. It is almost always an architectural one.

Stepping back from tools

At some point I stopped asking which tool would solve this best. That question tends to lead nowhere.

A more useful question turned out to be simpler: Where should policy enforcement actually live?

Not on developer machines. Not only at runtime. And not as an afterthought added to an otherwise finished pipeline.

The answer I keep coming back to is uncomplicated: policy enforcement belongs inside the CI/CD pipeline, at build time. Once you accept that, many design decisions stop being optional.

What changes when policy is embedded in the pipeline

Integrating policy enforcement into the pipeline forces transparency. Inputs must be explicit, and implicit assumptions can no longer be relied on. Hidden state becomes a liability.

Decisions must also be predictable. If the same pipeline is executed twice with identical inputs, the outcome should remain consistent. 

Moreover, when a rule is violated, the pipeline should stop immediately; no warnings, no deferrals, just a halt. This approach may seem strict, but without such clarity, governance quickly becomes negotiable.
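A minimal sketch of such a fail-fast gate could look like the following. The rule names and the internal registry prefix are assumptions for illustration, not from the reference implementation:

```python
import sys

# Illustrative rules evaluated against explicit inputs; real policies
# would live in policy-as-code (e.g. Rego), not inline lambdas.
RULES = {
    "image_from_approved_registry": lambda i: i["image"].startswith("registry.internal/"),
    "no_latest_tag": lambda i: not i["image"].endswith(":latest"),
}

def evaluate(inputs: dict) -> list[str]:
    """Return the names of violated rules; an empty list means pass."""
    return [name for name, check in RULES.items() if not check(inputs)]

violations = evaluate({"image": "registry.internal/app:1.4.2"})
if violations:
    print("policy violations:", violations)
    sys.exit(1)  # fail fast: halt the pipeline, no warnings, no deferrals
print("all policies passed")
```

The same inputs always produce the same decision, and a violation halts the run rather than degrading into a warning.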



Why local enforcement keeps failing

One pattern that repeatedly causes trouble is policy enforcement that depends on local developer setups.

Different machines behave differently. Versions drift. People work around issues “just this once”.

Over time, ownership becomes unclear. Was the rule enforced by the pipeline, by the tool, or by the developer?

By enforcing policy only inside the pipeline, those questions largely disappear. There is one execution context. One decision point. One place to look.

Developers write code. Pipelines enforce policy. That separation turns out to be surprisingly powerful.

Explainability is not a nice-to-have

Another thing that becomes obvious very quickly: pipelines that fail without explanation do not earn trust. “Policy check failed” is not an answer. It is a conversation starter, and usually an unproductive one. If work is blocked, teams need to understand why, immediately and in context. Not by reading a document. Not after escalation. But as part of the pipeline output itself.

Policy-as-code makes that possible, but only if explanation is treated as part of enforcement, not an add-on.
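One way to treat explanation as part of enforcement is to make every policy decision carry its reason, not just a pass/fail flag. A minimal sketch, with an assumed registry rule and hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    allowed: bool
    reason: str  # always populated, so a failure explains itself

def check_image_policy(image: str) -> Decision:
    # Illustrative rule; the registry prefix is an assumption.
    if not image.startswith("registry.internal/"):
        return Decision(False, f"image '{image}' is not from the approved registry "
                               "(expected prefix 'registry.internal/')")
    return Decision(True, "image source approved")

d = check_image_policy("docker.io/nginx:1.25")
print(d.allowed, "-", d.reason)
```

Because the reason is part of the return type, a pipeline cannot emit a denial without also emitting an explanation.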

A deliberately small experiment

To test these ideas, I built a compact reference implementation with a deliberately narrow scope. It enforces policy inside the pipeline, requires explicit inputs, fails fast, and explains its decisions. It deliberately avoids trying to be comprehensive, and it avoids abstractions that obscure what is actually happening. The goal was not a complete solution, but to make the trade-offs visible.

What stood out

Even in a limited setup, a few things became very clear.

  • pipeline tasks are stateless unless you make state explicit
  • pipeline definitions and pipeline execution are not the same thing
  • changing code does not automatically change behavior
  • governance only works when people understand it

None of this is new. But it is easy to overlook when governance is discussed in abstract terms.

Closing thought

There is a growing emphasis on "shifting left". But what matters more than timing is responsibility: who enforces the rules, and at which point in the process.

If governance is truly important, it must be built into the delivery process from the outset: early, explicit, and visible to everyone.

And CI/CD pipelines should not be viewed merely as mechanisms for delivering software. They are governance boundaries, enforcing accountability and standards throughout the delivery lifecycle.

The reference implementation discussed here is available as open source. Feedback and alternative perspectives are welcome. If you want to contribute, I’m most interested in:

  • alternative policy examples
  • clearer policy–pipeline contracts
  • cases where this approach breaks down

Pull requests, issues, and disagreement are all equally welcome.

Repo is located at: https://github.com/mnemonic01/opa-tekton-policies.git


Sunday, June 1, 2025

DevOpsDays Singapore 2025: A Decade of DevOps Evolution and the AI Revolution



DevOpsDays Singapore 2025 celebrated its 10th anniversary with the theme "DevOps Meets AI"

As the DevOps community in Singapore celebrates a significant milestone, the 10th anniversary of DevOpsDays Singapore, this year's event brought together attendees for two days of insightful discussions, workshops, and networking opportunities. Held on May 14-15, 2025, at the Stephen Riady Auditorium @ NTUC, the conference centered on the theme "DevOps Meets AI: Transforming Engineering with Generative AI Tools."


This landmark edition marked a decade of DevOps evolution in Singapore, bringing back the workshop track to commemorate the occasion. The event, organized in partnership with TTAB (Tech Talent Assembly), an association for ICT professionals in Singapore, offered a comprehensive exploration of how artificial intelligence is reshaping DevOps practices and engineering workflows.

In this blog post, I'll take you through the highlights of DevOpsDays Singapore 2025, covering the keynotes, expert talks, workshops, and open space discussions that made this anniversary edition particularly memorable. From AI-savvy operating models to practical implementations of AI agents, the conference provided valuable insights into the intersection of DevOps and artificial intelligence, addressing both the opportunities and challenges in this rapidly evolving landscape.


The integration of DevOps and AI was a central theme throughout the conference


Personally, this was also a reunion with my good friend Santosh Dhuraij, and of course with beautiful Singapore.



DevOpsDays Singapore 2025 – Where DevOps Meets AI

The first day of DevOpsDays Singapore 2025 began with an enthusiastic welcome address by Desmond Tan, Deputy Secretary-General of NTUC and Senior Minister of State in the Prime Minister’s Office, setting the stage for the anniversary celebration and introducing the conference theme. Tan highlighted the significance of reaching the 10-year milestone and emphasized how the return of the workshop track would provide hands-on learning opportunities for attendees.


The 10th edition of DevOpsDays Singapore was more than a milestone celebration — it marked a pivotal shift from traditional DevOps practices to the strategic integration of artificial intelligence across engineering workflows.

AI as a Strategic Force in DevOps

A recurring theme throughout the event was how AI is no longer just another tool but a fundamental force reshaping DevOps at the organizational level. The message was clear: effective AI adoption requires a rethinking of how teams are structured, how work flows, and how decisions are made.

Rather than rushing into advanced automation or agentic AI, speakers emphasized the importance of building a strong foundation — including high-quality data, responsible governance, and cross-functional collaboration. Organizations need to evolve from static operating models to AI-aware, continuously adaptive structures.

Prompt Engineering and LLMs: Emerging Skills for Engineers

With the rise of generative AI, prompt engineering was positioned as a core capability for future DevOps professionals. Hands-on workshops showed how to craft effective prompts for large language models (LLMs), using techniques like few-shot learning and chain-of-thought prompting.

Beyond prompts, sessions also explored how to build production-grade LLM applications using retrieval-augmented generation (RAG), vector databases, and observability patterns. The focus was on turning LLM prototypes into reliable, maintainable services integrated into real-world DevOps pipelines.

DevSecOps in the Age of AI

AI is transforming DevSecOps by both introducing new tools and creating new risks. On the one hand, machine learning enhances vulnerability detection, compliance checks, and threat modeling. On the other hand, AI-generated code, autonomous agents, and model hallucinations introduce new attack surfaces.

Security-focused sessions, including interactive workshops, reinforced the need for security-by-design in environments where AI is embedded deeply in workflows. As DevOps shifts left, security must keep pace.

On-Prem and Edge AI: Beyond the Cloud

While much of the AI conversation centers around cloud platforms, several talks focused on scenarios where internet access is limited or prohibited — such as defense, healthcare, or financial services. Presenters shared techniques for deploying AI systems in air-gapped environments, including offline model updates and secure data pipelines.

Edge AI was another focal point, with lightweight Kubernetes distributions enabling inference and decision-making close to data sources. These solutions are critical for real-time, low-latency AI workloads in constrained environments.

Infrastructure for AI Workloads

Supporting AI means rethinking infrastructure. From smart Kubernetes resource management using predictive algorithms to scalable distributed storage using Ceph, the sessions underscored that platform engineering must adapt to AI’s unique demands.

Infrastructure-as-code practices were also revisited — showcasing how tools like Terraform and AWS CDK can be combined using AI to streamline infrastructure provisioning and improve security configurations.

Community-Driven Collaboration and Open Spaces

True to the DevOpsDays spirit, open space discussions provided some of the most dynamic and relevant exchanges. These sessions reaffirmed the maturity and collaborative nature of the DevOps community in Singapore — curious, hands-on, and willing to tackle emerging challenges together.


Final Reflections

DevOpsDays Singapore 2025 wasn’t just a celebration of the past 10 years — it was a blueprint for what’s next. The convergence of DevOps and AI is already happening, and this community is leading the charge with a combination of strategic vision and practical experimentation. The future of DevOps is not just faster and more secure — it’s smarter. 

As DevOpsDays Singapore looks ahead to its next decade, the 2025 edition demonstrated that the community remains vibrant, engaged, and committed to sharing knowledge and advancing practices. The integration of AI into DevOps represents both a challenge and an opportunity—one that the Singapore DevOps community is well-positioned to navigate based on the insights and connections fostered at this milestone event.

For those who couldn't attend or who want to revisit the content, many of the presentations and workshop materials will likely be available on the DevOpsDays Singapore website in the coming weeks. The conversations started at the conference will continue in local meetups and online communities, ensuring that the learning and collaboration extend well beyond these two days in May 2025. I certainly hope to be there in 2026. Together with Santosh, I closed the day with a visit to Zuhlke Engineering for some interesting lectures on security, before heading back to the Netherlands.

Thanks to all the organizers, and in particular Sergiu Bodu!

Sunday, February 9, 2025

DORA explained

Where DORA meets DORA: DevOps and Security 


In the DevOps world, the acronym DORA refers to two critical yet distinct concepts:

  1. DevOps Research and Assessment (DORA) Metrics – A set of key performance indicators (KPIs) used to measure software delivery performance.
  2. Digital Operational Resilience Act (DORA) – A regulatory framework introduced by the European Union to strengthen the operational resilience of financial institutions.

Both are crucial for organizations that want to achieve high-performance software delivery while ensuring security, compliance, and resilience in their operations. In this article, we’ll explore both meanings of DORA, their significance in the DevOps ecosystem, and why organizations should adopt them.


DORA Metrics: Measuring DevOps Performance


What Are DORA Metrics?

DORA Metrics were developed by the DevOps Research and Assessment (DORA) team, founded by Dr. Nicole Forsgren. These metrics are used to measure software delivery performance and operational efficiency in DevOps teams.

The four key DORA Metrics are:

  1. Deployment Frequency (DF): How often code is deployed to production. High-performing teams deploy multiple times a day.
  2. Lead Time for Changes (LTC): The time it takes for a code change to go from commit to production. Shorter lead times indicate efficient workflows.
  3. Change Failure Rate (CFR): The percentage of deployments that result in failures, such as incidents or rollbacks. Lower rates mean more stable releases.
  4. Mean Time to Recovery (MTTR): The time it takes to recover from failures. Fast recovery improves reliability and user trust.

These metrics help organizations evaluate their DevOps maturity and optimize software development and deployment processes.
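As a rough illustration, the four metrics can be computed directly from deployment records. The field names below are assumptions for the sketch; real data would come from CI/CD and incident tooling:

```python
from datetime import datetime, timedelta

# Hypothetical deployment records over an observation window.
deployments = [
    {"committed": datetime(2025, 2, 1, 9),  "deployed": datetime(2025, 2, 1, 15), "failed": False},
    {"committed": datetime(2025, 2, 2, 10), "deployed": datetime(2025, 2, 2, 12), "failed": True,
     "recovered": datetime(2025, 2, 2, 13)},
]
days_observed = 30

# DF: deployments per day over the window.
deployment_frequency = len(deployments) / days_observed
# LTC: average time from commit to production.
lead_time = sum((d["deployed"] - d["committed"] for d in deployments), timedelta()) / len(deployments)
# CFR: share of deployments that caused a failure.
failures = [d for d in deployments if d["failed"]]
change_failure_rate = len(failures) / len(deployments)
# MTTR: average time from failed deployment to recovery.
mttr = sum((d["recovered"] - d["deployed"] for d in failures), timedelta()) / len(failures)

print(f"DF: {deployment_frequency:.2f}/day, LTC: {lead_time}, "
      f"CFR: {change_failure_rate:.0%}, MTTR: {mttr}")
```

Even this toy version shows why the metrics are useful: all four fall out of data the delivery process already produces.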








Why Should DevOps Teams Adopt DORA Metrics?

  • Data-Driven Decision-Making: Helps teams identify bottlenecks and inefficiencies.
  • Improved Software Quality: Reduces failures and enhances customer satisfaction.
  • Faster Time-to-Market: Shorter lead times enable faster innovation.
  • Operational Resilience: Ensures teams can quickly recover from incidents.

By continuously measuring and improving these metrics, DevOps teams can enhance their agility and reliability.



DORA: Digital Operational Resilience Act


What is the Digital Operational Resilience Act (DORA)?


The Digital Operational Resilience Act (DORA) is a European Union (EU) regulation designed to improve cybersecurity and operational resilience in the financial sector. It applies to banks, insurance companies, fintech firms, and third-party IT service providers.

The act was introduced in response to the increasing threats posed by cyberattacks and IT failures, ensuring that financial institutions can withstand, respond to, and recover from operational disruptions.








Key Requirements of DORA

  1. ICT Risk Management: Organizations must implement strong IT security measures to protect critical systems.
  2. Incident Reporting: Mandatory reporting of major cyber incidents to regulators.
  3. Operational Resilience Testing: Firms must conduct regular stress testing and cyber resilience exercises.
  4. Third-Party Risk Management: Financial institutions must assess and manage risks from external vendors and cloud providers.
  5. Information Sharing: Encourages collaboration among financial entities to share threat intelligence.

Why Should DevOps Teams Care About DORA?

For DevOps teams working in the financial sector, compliance with DORA is essential to ensure their systems are secure, resilient, and compliant with EU regulations.

  • Enhanced Security: Aligns DevOps practices with robust security measures.
  • Resilience by Design: Promotes secure software development and operational resilience.
  • Regulatory Compliance: Avoids legal penalties and ensures business continuity.
  • Risk Mitigation: Reduces vulnerabilities from third-party dependencies and IT failures.

By integrating DORA compliance into DevOps workflows, teams can improve both their software delivery capabilities and their ability to withstand cyber threats.



Why DevOps Should Adopt Both DORAs 


Although the two DORAs are distinct concepts, they complement each other. Adopting both helps organizations achieve high-performance software delivery while ensuring security and compliance.


  1. Measuring and Improving Performance: DORA Metrics help teams optimize their software delivery pipelines.
  2. Enhancing Security and Compliance: DORA (the regulation) ensures that teams develop secure, resilient, and compliant systems.
  3. Reducing Downtime and Failures: A focus on both operational resilience and DevOps performance minimizes disruptions and improves service reliability.
  4. Future-Proofing Digital Services: As cyber threats increase, integrating DORA regulations into DevOps protects businesses from operational risks.


By adopting both DORA frameworks, organizations can create a robust, efficient, and resilient DevOps culture that drives innovation while ensuring security and compliance.



Conclusion

In the DevOps landscape, DORA Metrics provide a framework for measuring and improving software delivery performance, while the Digital Operational Resilience Act (DORA) ensures that organizations are prepared for operational and cybersecurity risks.

For DevOps teams—especially those in financial services and regulated industries—adopting both DORA approaches is crucial for building a secure, high-performing, and resilient digital ecosystem.

By leveraging DORA Metrics for efficiency and complying with DORA regulations for security, organizations can achieve the perfect balance between speed, reliability, and compliance in their DevOps practices.



 

Sources:

https://dora.dev/

https://www.eba.europa.eu/regulation-and-policy/operational-resilience
