Observability Is Not an Island: Team Collaboration and Cultural Transformation

This article was last updated on: June 29, 2026

Honestly, I’ve been chatting with a few folks on the front lines of operations recently, and there’s a common theme: companies either have no one dedicated to observability, or they’ve set up a “centralized observability team” that ends up spending all day fixing Grafana dashboards, configuring alert rules, and digging through metric fields — ultimately becoming a “tool operations center.” They’re doing plenty of work, but business teams still complain, and incidents still go undetected.

This reminds me of a classic question: Who should actually own observability? Today let’s talk about this topic, drawing on pitfalls I’ve personally encountered and SpotOn’s real-world case study, to share my perspective on organizational models for observability.

Background: The “Centralized” Curse of Observability

Many companies, especially mid-to-large enterprises, immediately think “we need to set up a monitoring team” or “we need an observability platform team” when they start looking into observability. And what happens?

│ 📝 Notes: The worst case I’ve seen is where the centralized team becomes a “configuration administrator” + “alert courier,” and business teams don’t even know what’s being monitored for their services or how alerts are configured.

Put simply, the essence of observability is not “having a team that can do monitoring,” but “every team being able to understand the health of their own services.” Just as quality is everyone’s responsibility rather than the QA department’s alone, the same applies to observability.

Observability Takes a Village — It’s Not an Island

One comment I’ve seen really resonates: centralized teams often devolve into tool operations centers rather than enablers, ultimately becoming bottlenecks in the development process.

Observability spans multiple stages:

Development: instrumentation (Tracing), logging standards (Logging)
CI/CD: services exposing metrics (Metrics)
Testing: validating SLO configurations
Production: alert response, incident investigation (Alerts)

None of these stages can exist in isolation from business teams. If an organization “extracts” observability and tosses it to a centralized team, it quickly becomes:

Dev teams write code without caring about instrumentation → “the observability team will handle it anyway”
The observability team doesn’t understand the business → alert rules are either too many or too few — one bottleneck becomes two
Both sides start pointing fingers → 🤷‍♂️

So, what’s the right approach?

Platform Engineering + Convention-Over-Configuration = The Remedy

Rather than building a massive centralized team, build an observability platform team. What’s the difference?

Centralized Team

Responsibilities: configuring metrics, alert rules, dashboards
Problem: directly managing observability for hundreds of services — simply unscalable
Outcome: becomes a bottleneck

Platform Engineering Team

Responsibilities: designing reusable components, convention-over-configuration templates, best practices
Goal: enable each business team to self-configure while ensuring data consistency
Advantage: each team manages its own observability; the platform team only provides “tools” and “standards”

It’s just like when we did DevOps — provide pipeline templates and let teams write their own jobs, rather than having ops write a Jenkinsfile for every project.

│ 📝 Notes: “Convention over configuration” here means you only need to declare something like service: my-app and team: team-x in your code (much as you would declare a ServiceMonitor, which is a Prometheus Operator CRD), and the platform automatically provisions default alert rules, dashboards, and log collection for you.
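
To make the note above concrete, here is a minimal Python sketch of what the platform side might do with such a declaration. Everything in it is an assumption for illustration: the ServiceDeclaration class, the default_alert_rules function, the http_requests_total metric with a code label, the service label on up, and the thresholds are placeholders, not SpotOn's setup or any platform's real defaults. The rule dictionaries only borrow the field names of a Prometheus alerting rule (alert, expr, for, labels, annotations).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDeclaration:
    """The only thing a business team has to write (hypothetical schema)."""
    service: str
    team: str


def default_alert_rules(decl: ServiceDeclaration) -> list[dict]:
    """Expand a declaration into platform-owned default alert rules.

    Metric names, label names, and thresholds are placeholder assumptions.
    """
    selector = f'service="{decl.service}"'
    common_labels = {"service": decl.service, "team": decl.team}
    return [
        {
            "alert": "HighErrorRate",
            # Ratio of 5xx responses to all responses over five minutes.
            "expr": (
                f'sum(rate(http_requests_total{{{selector},code=~"5.."}}[5m]))'
                f" / sum(rate(http_requests_total{{{selector}}}[5m])) > 0.05"
            ),
            "for": "10m",
            "labels": {**common_labels, "severity": "page"},
            "annotations": {"summary": f"{decl.service}: 5xx ratio above 5%"},
        },
        {
            "alert": "TargetDown",
            "expr": f"up{{{selector}}} == 0",
            "for": "5m",
            "labels": {**common_labels, "severity": "page"},
            "annotations": {"summary": f"{decl.service}: scrape target is down"},
        },
    ]


if __name__ == "__main__":
    for rule in default_alert_rules(ServiceDeclaration("my-app", "team-x")):
        print(rule["alert"], "->", rule["expr"])
```

The design point is that the team owns two lines of declaration while the platform owns the expansion, so tuning a default becomes one platform-side change rather than an edit per service.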

Real-World Case Study: SpotOn’s Observability Transformation

I recently watched SpotOn’s talk (How SpotOn Consolidated Observability Tools & Drove Observability Culture Change with Grafana Cloud), where they consolidated from a chaotic multi-tool state into Grafana Cloud.

SpotOn’s Approach

Tool consolidation: unified scattered Datadog, New Relic, and homegrown tools into Grafana Cloud
Platform-ification: the platform team provided reusable dashboard templates and alert rule presets
Cultural transformation: shifted from “bottom-up reactive alerting” to “top-down decision support”

One insight particularly worth learning: Observability isn’t about piling up dashboards — it’s about providing high-quality data to drive organizational decisions. This “decision support” angle is truly a key point many teams overlook.

What Worked and the Pitfalls They Encountered

👍 Pros:

Tool consolidation reduced operational complexity
The platform engineering model significantly lowered the onboarding barrier for teams
After the cultural shift, teams proactively optimized their own SLOs

👎 Cons:

Cultural change met significant resistance; initially some teams felt “we don’t need monitoring”
Convention-over-configuration has non-trivial maintenance costs; the platform team needs to continuously update best practices

My Reflections

Looking at SpotOn’s case, truly effective observability isn’t pushed top-down — it’s operated with an “internal product” mindset. The platform team should design observability services like building a product, focusing on the “users” (i.e., business teams) and their experience and satisfaction.

│ 🤔 A key question: Is your observability platform something teams “have to use,” or something they “want to use”?

How to Implement: Cultural Transformation Is Key

Many teams’ observability status quo looks like this:

A flood of alerts, but no one knows what they mean for the business
Dashboards look impressive, but leadership still can’t tell “is our system actually healthy?”
Monitoring gaps are only discovered after an incident occurs

This is the classic case of “monitoring for the sake of monitoring.”

How to Change? Three Steps

Define objectives: Make it clear that observability exists to support decisions, not to accumulate tools.
Build habits:
- Regular retrospectives: discuss alert response performance and SLO attainment rates
- Best practice sharing: let high-performing teams share their experiences
Establish feedback loops: The platform engineering team must continuously accept feedback from users (business teams) and iteratively improve templates and rules
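
For the retrospectives mentioned in the steps above, “SLO attainment rate” can be computed in a few lines. The sketch below is one common way to do it for a request-based SLO, not a prescribed formula from the talk; the function names and the good/total event counts are my own assumptions.

```python
def _check_counts(good_events: int, total_events: int) -> None:
    """Reject impossible inputs before doing any arithmetic."""
    if total_events < 0 or not 0 <= good_events <= total_events:
        raise ValueError("need 0 <= good_events <= total_events")


def slo_attainment(good_events: int, total_events: int) -> float:
    """Fraction of events that met the SLO; 1.0 when there was no traffic."""
    _check_counts(good_events, total_events)
    if total_events == 0:
        return 1.0
    return good_events / total_events


def error_budget_remaining(good_events: int, total_events: int, target: float) -> float:
    """Share of the error budget still unspent (negative means overspent).

    The budget is the number of bad events the target allows:
    (1 - target) * total_events.
    """
    _check_counts(good_events, total_events)
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    allowed_bad = (1.0 - target) * total_events
    if allowed_bad == 0:
        return 1.0
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    print(f"attainment: {slo_attainment(999_400, 1_000_000):.4%}")
    print(f"budget left: {error_budget_remaining(999_400, 1_000_000, 0.999):.1%}")
```

As a worked example: with 1,000,000 requests, 999,400 of them good, and a 99.9% target, the budget allows 1,000 bad requests and 600 were spent, so attainment is 99.94% and roughly 40% of the error budget remains. That second number is usually the more useful one to bring to a retrospective.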

Operational Recommendations for Platform Teams

Design observability services with an “internal product” mindset, including documentation, templates, skills, APIs, best practices, and audit mechanisms
Drive cultural adoption through community operations: hold regular observability workshops, compile best practices, appoint “observability ambassadors”
Enable rather than enforce: give business teams the feeling that “I can handle observability on my own”
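
An “audit mechanism” does not have to be heavy. As a rough sketch, assuming a platform convention that every alert rule must carry service and team labels plus summary and runbook_url annotations (these required fields and both function names are hypothetical), a small lint can report gaps per owning team instead of blocking anyone:

```python
REQUIRED_LABELS = ("service", "team")              # assumed platform convention
REQUIRED_ANNOTATIONS = ("summary", "runbook_url")  # assumed platform convention


def audit_alert_rule(rule: dict) -> list[str]:
    """Return human-readable findings for one alert rule; empty means it passes."""
    name = rule.get("alert", "<unnamed>")
    findings = []
    for key in REQUIRED_LABELS:
        if not rule.get("labels", {}).get(key):
            findings.append(f"{name}: missing label '{key}'")
    for key in REQUIRED_ANNOTATIONS:
        if not rule.get("annotations", {}).get(key):
            findings.append(f"{name}: missing annotation '{key}'")
    return findings


def audit_report(rules: list[dict]) -> dict[str, list[str]]:
    """Group findings by owning team so each team sees only its own gaps."""
    report: dict[str, list[str]] = {}
    for rule in rules:
        findings = audit_alert_rule(rule)
        if findings:
            # Rules without a team label are collected under "unowned".
            team = rule.get("labels", {}).get("team") or "unowned"
            report.setdefault(team, []).extend(findings)
    return report
```

Reporting rather than rejecting fits the “enable rather than enforce” recommendation: teams get a list of what to fix, and rules that land under "unowned" tell the platform team where onboarding has not reached yet.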

Final Thoughts

Observability can be both hard and simple. The key isn’t how many tools you use, but how teams are organized and how culture is built. I led an observability team for a while myself and deeply understand this — when the direction is right, the path becomes easy. Don’t mask strategic laziness with tactical diligence.

Core takeaways:

There is no centralized observability team — only a platform engineering team + individual business teams
Convention-over-configuration is the key to lowering barriers; the platform team “builds the wheels,” business teams “drive the car”
The ultimate goal is to provide high-quality data to drive decisions, not to pile up dashboards
Cultural transformation is harder than tool consolidation, but far more worthwhile

Looking back after all this effort, teams that persisted in transforming observability from a “tool” into a “culture” all reaped the rewards in the end.

🎉🎉🎉

#Observability #Grafana #SRE #SLO #Team Collaboration #Platform Engineering

https://e-whisper.com/posts/26761/

Author

east4ming

Posted on

May 16, 2026
