Observability Is Not an Island: Team Collaboration and Cultural Transformation
Last updated: May 17, 2026
Honestly, I’ve been chatting with a few folks on the front lines of operations recently, and a common theme keeps coming up. Companies either have no one dedicated to observability, or they’ve set up a “centralized observability team” that spends all day fixing Grafana dashboards, configuring alert rules, and digging through metric fields, and ends up as a “tool operations center.” The team does plenty of work, but business teams still complain, and incidents still go undetected.
This reminds me of a classic question: Who should actually own observability? Today let’s talk about this topic, drawing on pitfalls I’ve personally encountered and SpotOn’s real-world case study, to share my perspective on organizational models for observability.
Background: The “Centralized” Curse of Observability
Many companies, especially mid-to-large enterprises, immediately think “we need to set up a monitoring team” or “we need an observability platform team” when they start looking into observability. And what happens?
│ 📝 Notes: The worst case I’ve seen is where the centralized team becomes a “configuration administrator” + “alert courier,” and business teams don’t even know what’s being monitored for their services or how alerts are configured.
Put simply, the essence of observability is not “having a team that can do monitoring,” but “every team being able to understand the health of their own services.” Just as quality is everyone’s responsibility rather than the QA department’s alone, the same applies to observability.
Observability Takes a Village — It’s Not an Island
One comment I’ve seen really resonates: centralized teams often devolve into tool operations centers rather than enablers, ultimately becoming bottlenecks in the development process.
Observability spans multiple stages:
- Development: instrumentation (Tracing), logging standards (Logging)
- CI/CD: services exposing metrics (Metrics)
- Testing: validating SLO configurations
- Production: alert response, incident investigation (Alerts)
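To make the testing stage concrete, here is a minimal sketch of an SLO configuration check that a business team could run in CI. The schema (service, team, objective, window_days) and the function name validate_slo are assumptions made up for illustration, not a standard format.

```python
def validate_slo(slo: dict) -> list[str]:
    """Return a list of problems found in one SLO definition (empty if valid).

    The field names below are a hypothetical schema, not a standard.
    """
    errors = []

    # Every SLO must say which service it covers and which team owns it.
    for key in ("service", "team", "objective", "window_days"):
        if key not in slo:
            errors.append(f"missing field: {key}")

    # The objective is expressed as a ratio, e.g. 0.999 for "three nines".
    objective = slo.get("objective")
    if objective is not None:
        is_number = isinstance(objective, (int, float)) and not isinstance(objective, bool)
        if not (is_number and 0 < objective < 1):
            errors.append("objective must be a ratio strictly between 0 and 1")

    # The evaluation window is a positive whole number of days.
    window = slo.get("window_days")
    if window is not None:
        is_int = isinstance(window, int) and not isinstance(window, bool)
        if not (is_int and window > 0):
            errors.append("window_days must be a positive integer")

    return errors


if __name__ == "__main__":
    # A common mistake: writing the objective as a percentage and omitting the owner.
    print(validate_slo({"service": "my-app", "objective": 99.9, "window_days": 30}))
```

Running a check like this in the pipeline keeps ownership with the business team while still catching malformed definitions before they reach production.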
None of these stages can exist in isolation from business teams. If an organization “extracts” observability and tosses it to a centralized team, it quickly becomes:
- Dev teams write code without caring about instrumentation → “the observability team will handle it anyway”
- The observability team doesn’t understand the business → alert rules are either too many or too few, and one bottleneck becomes two
- Both sides start pointing fingers → 🤷‍♂️
So, what’s the right approach?
Platform Engineering + Convention-Over-Configuration = The Remedy
Rather than building a massive centralized team, build an observability platform team. What’s the difference?
Centralized Team
- Responsibilities: configuring metrics, alert rules, dashboards
- Problem: directly managing observability for hundreds of services — simply unscalable
- Outcome: becomes a bottleneck
Platform Engineering Team
- Responsibilities: designing reusable components, convention-over-configuration templates, best practices
- Goal: enable each business team to self-configure while ensuring data consistency
- Advantage: each team manages its own observability; the platform team only provides “tools” and “standards”
It’s just like when we did DevOps — provide pipeline templates and let teams write their own jobs, rather than having ops write a Jenkinsfile for every project.
│ 📝 Notes: “Convention-over-configuration” here means that you only need to declare something like service: my-app and team: team-x in your code (similar in spirit to a Prometheus Operator CRD such as ServiceMonitor), and the platform automatically provisions default alert rules, dashboards, and log collection for you.
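Here is a rough sketch of what that expansion could look like on the platform side. Everything in it is illustrative: the ServiceDeclaration class, the default threshold, the http_requests_total metric with service and code labels, and the assumption that the scrape job is named after the service are all conventions I made up for the example. The generated dicts only mirror the general shape of a Prometheus alerting rule (alert, expr, for, labels, annotations).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDeclaration:
    """The only thing a business team writes (hypothetical schema)."""
    service: str
    team: str
    error_rate_threshold: float = 0.05  # platform default; teams may override


def default_alert_rules(decl: ServiceDeclaration) -> list[dict]:
    """Expand a declaration into platform-default alert rules."""
    selector = f'service="{decl.service}"'
    # Ratio of 5xx responses to all responses over the last 5 minutes.
    # Metric and label names are assumed conventions, not a given.
    error_ratio = (
        f'sum(rate(http_requests_total{{{selector},code=~"5.."}}[5m]))'
        f' / sum(rate(http_requests_total{{{selector}}}[5m]))'
    )
    # Every generated rule carries the owner, so alerts route to the right team.
    labels = {"service": decl.service, "team": decl.team, "severity": "page"}
    return [
        {
            "alert": "HighErrorRate",
            "expr": f"{error_ratio} > {decl.error_rate_threshold}",
            "for": "10m",
            "labels": labels,
            "annotations": {"summary": f"{decl.service} error rate is above threshold"},
        },
        {
            "alert": "TargetDown",
            # Assumes the scrape job is named after the service.
            "expr": f'up{{job="{decl.service}"}} == 0',
            "for": "5m",
            "labels": labels,
            "annotations": {"summary": f"{decl.service} is not being scraped"},
        },
    ]


if __name__ == "__main__":
    for rule in default_alert_rules(ServiceDeclaration("my-app", "team-x")):
        print(rule["alert"], "->", rule["expr"])
```

The point of the design is that the business team touches two fields and, at most, one threshold, while the platform team owns the rule templates and can improve them for everyone at once.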
Real-World Case Study: SpotOn’s Observability Transformation
I recently watched SpotOn’s talk (How SpotOn Consolidated Observability Tools & Drove Observability Culture Change with Grafana Cloud), where they consolidated from a chaotic multi-tool state into Grafana Cloud.
SpotOn’s Approach
- Tool consolidation: unified scattered Datadog, New Relic, and homegrown tools into Grafana Cloud
- Platform-ification: the platform team provided reusable dashboard templates and alert rule presets
- Cultural transformation: shifted from “bottom-up reactive alerting” to “top-down decision support”
One insight particularly worth learning: Observability isn’t about piling up dashboards — it’s about providing high-quality data to drive organizational decisions. This “decision support” angle is truly a key point many teams overlook.
Gains and Pitfalls
👍 Pros:
- Tool consolidation reduced operational complexity
- The platform engineering model significantly lowered the onboarding barrier for teams
- After the cultural shift, teams proactively optimized their own SLOs
👎 Cons:
- Cultural change met significant resistance; initially some teams felt “we don’t need monitoring”
- Convention-over-configuration has non-trivial maintenance costs; the platform team needs to continuously update best practices
My Reflections
Looking at SpotOn’s case, truly effective observability isn’t pushed top-down — it’s operated with an “internal product” mindset. The platform team should design observability services like building a product, focusing on the “users” (i.e., business teams) and their experience and satisfaction.
│ 🤔 A key question: Is your observability platform something teams “have to use,” or something they “want to use”?
How to Implement: Cultural Transformation Is Key
Many teams’ observability status quo looks like this:
- A flood of alerts, but no one knows what they mean for the business
- Dashboards look impressive, but leadership still can’t tell “is our system actually healthy?”
- Monitoring gaps are only discovered after an incident occurs
This is the classic case of “monitoring for the sake of monitoring.”
How to Change? Three Steps
- Define objectives: Make it clear that observability exists to support decisions, not to accumulate tools.
- Build habits:
  - Regular retrospectives: discuss alert response performance and SLO attainment rates
  - Best practice sharing: let high-performing teams share their experiences
- Establish feedback loops: The platform engineering team must continuously accept feedback from users (business teams) and iteratively improve templates and rules
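For the retrospectives, it helps to agree on how “SLO attainment rate” is calculated. Below is a minimal sketch using the common event-based definition (good events divided by total events) plus the share of error budget consumed. The names SloReport and slo_report are hypothetical, and treating an empty window as “met” is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    attainment: float       # fraction of good events in the window
    budget_consumed: float  # share of the error budget spent (1.0 = all of it)
    met: bool               # True when attainment >= objective


def slo_report(good_events: int, total_events: int, objective: float) -> SloReport:
    """Summarize one SLO window for a retrospective."""
    if not 0 < objective < 1:
        raise ValueError("objective must be strictly between 0 and 1")
    if total_events <= 0:
        # Assumption: no traffic means nothing was violated.
        return SloReport(attainment=1.0, budget_consumed=0.0, met=True)

    attainment = good_events / total_events
    bad_events = total_events - good_events
    # The error budget is the number of bad events the objective tolerates.
    allowed_bad = total_events * (1 - objective)
    return SloReport(
        attainment=attainment,
        budget_consumed=bad_events / allowed_bad,
        met=attainment >= objective,
    )


if __name__ == "__main__":
    # 500 failed requests out of 1,000,000 against a 99.9% objective.
    print(slo_report(good_events=999_500, total_events=1_000_000, objective=0.999))
```

In that example the objective tolerates about 1,000 bad requests, so 500 failures means the SLO is met with roughly half the error budget spent, which is a far more useful retrospective talking point than a raw alert count.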
Operational Recommendations for Platform Teams
- Design observability services with an “internal product” mindset: including documentation, templates, skills, APIs, best practices, and audit mechanisms
- Drive cultural adoption through community operations: hold regular observability workshops, compile best practices, appoint “observability ambassadors”
- Enable rather than enforce: give business teams the feeling that “I can handle observability on my own”
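As one possible shape for the audit mechanism mentioned above, here is a small sketch that compares a service inventory against existing alert rules and reports uncovered services per owning team. The inventory format (service name mapped to team) and the convention that each rule carries a service label are assumptions for illustration.

```python
def find_monitoring_gaps(
    inventory: dict[str, str],
    alert_rules: list[dict],
) -> dict[str, list[str]]:
    """Return {team: [services with no alert rule]}.

    inventory maps service name -> owning team (hypothetical format).
    Each alert rule is expected to carry a "service" label.
    """
    # Collect every service that at least one rule refers to.
    covered = {
        rule["labels"]["service"]
        for rule in alert_rules
        if "service" in rule.get("labels", {})
    }

    gaps: dict[str, list[str]] = {}
    for service, team in inventory.items():
        if service not in covered:
            gaps.setdefault(team, []).append(service)

    # Sort for stable, readable reports.
    return {team: sorted(services) for team, services in gaps.items()}


if __name__ == "__main__":
    inventory = {"checkout": "team-x", "search": "team-y", "billing": "team-x"}
    rules = [{"alert": "HighErrorRate", "labels": {"service": "checkout"}}]
    print(find_monitoring_gaps(inventory, rules))
```

A report like this is meant as a nudge rather than a gate: it shows each team where its own gaps are, which fits the “enable rather than enforce” stance.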
Final Thoughts
Observability can be both hard and simple. The key isn’t how many tools you use, but how teams are organized and how culture is built. I led an observability team for a while myself and deeply understand this — when the direction is right, the path becomes easy. Don’t mask strategic laziness with tactical diligence.
Core takeaways:
- Skip the centralized observability team: pair a platform engineering team with business teams that own their own observability
- Convention-over-configuration is the key to lowering barriers; the platform team “builds the wheels,” business teams “drive the car”
- The ultimate goal is to provide high-quality data to drive decisions, not to pile up dashboards
- Cultural transformation is harder than tool consolidation, but far more worthwhile
Looking back after all this effort, teams that persisted in transforming observability from a “tool” into a “culture” all reaped the rewards in the end.
🎉🎉🎉