Can AI Replace Ops Engineers?

This article was last updated on: May 17, 2026

Yes, it can.

With only one prerequisite:

Your company doesn’t adopt a “defensive ops” strategy.

│ 📝 Disclaimer:
│
│ - Artisanal craftsmanship, purely handwritten
│ - This article is 100% written by me manually
│ - This article is NOT AI-generated

Background

The trend of AI + AI IDE/CLI replacing developers is already quite obvious.

As an ops engineer who believes in staying vigilant in peacetime, I naturally started thinking seriously 🤔 about this question: Can AI replace ops engineers?

To find out, I handed several real-world cases to AI for execution, including common ops tasks:

Database migration
Application upgrade
Deploying new services
…

It turned out that I had greatly underestimated AI’s capabilities: the actual outcome was even better than my best-case expectations.

Let’s take a look.

Real-World Cases

Case 1: Upgrading LobeChat from v1 to v2

Introduction to LobeChat

LobeHub (called LobeChat in v1, renamed to LobeHub in v2) is practically tailor-made for tinkerers like us. Honestly, switching back and forth between windows with ChatGPT is too cumbersome. But LobeHub is different — it lets you build your own AI team.

Imagine: you can create an Agent dedicated to writing code, one responsible for document organization, and another to help with data analysis — and they can collaborate with each other! It feels like playing StarCraft, except your “units” are all AIs.

What excites me most is its self-hosting capability. A single Docker command gets the entire service up and running, with data completely under your own control. For those concerned about privacy, this is a godsend.

I mainly use these features: various assistants (roasting people, polishing articles, analyzing international affairs, Wang Yangming’s philosophy of mind teachings, financial management…), plus RAG document resource management capabilities.

If you’re also tired of switching between various AI tools, or want a fully private AI workspace, I strongly recommend trying LobeHub. The 74.4k stars on GitHub aren’t for nothing, and the community is very active.

One-line summary: LobeHub transforms you from “using AI” to “managing an AI team”, fully self-hosted with your data under your own control.

My upgraded LobeChat v2 looks like this:

My self-hosted LobeChat v2

My Deployment Setup

LobeChat v1 had a Docker Compose deployment option. I rewrote it as K8s Manifests and deployed it on my Homelab. See my previous configuration here: homelab2/apps/lobe-chat at 3855a4c141a4c9cd8c503d891be38a032766bb15 · east4ming/homelab2

What Changed in V2?

│ 📚️ Reference documentation:
│
│ Migrate from v1.x Local Database to … · LobeHub

LobeChat v2 underwent massive changes, making this migration so challenging that even I found it daunting 😖:

PostgreSQL needed to be upgraded from 16 to 17, and not vanilla PostgreSQL — from pgvector 16 to ParadeDB 17 😱
Data must not be lost
LobeChat’s authentication system underwent a major overhaul: switching from NextAuth to Better Auth
The official documentation (referenced above) is just one page — a descriptive overview rather than detailed step-by-step instructions. And it doesn’t apply to my case since I’m not using Docker Compose for deployment… 😑
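
The PostgreSQL item in the list above is the riskiest one. As a rough sketch only (these are not the commands actually used in this migration), a dump-and-restore path between the old and new servers could be assembled as below. The connection URLs and the dump path are hypothetical placeholders, and the sketch assumes the target server already has whatever extensions the data depends on.

```python
from typing import List


def build_dump_command(source_url: str, dump_path: str) -> List[str]:
    """Build a pg_dump call that writes a custom-format archive."""
    return [
        "pg_dump",
        "--format=custom",         # archive format that pg_restore can read
        "--no-owner",              # do not emit ownership statements for old roles
        f"--file={dump_path}",
        f"--dbname={source_url}",  # hypothetical connection URL of the old server
    ]


def build_restore_command(target_url: str, dump_path: str) -> List[str]:
    """Build a pg_restore call that stops at the first error."""
    return [
        "pg_restore",
        "--exit-on-error",         # fail loudly instead of continuing past errors
        "--no-owner",
        f"--dbname={target_url}",  # hypothetical connection URL of the new server
        dump_path,
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    print(" ".join(build_dump_command("postgresql://old-host/lobechat", "/backup/lobechat.dump")))
    print(" ".join(build_restore_command("postgresql://new-host/lobechat", "/backup/lobechat.dump")))
```

Building the argument lists without executing them keeps the plan reviewable: a human can read and approve the exact commands before anything touches production data.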

AI Enters the Stage

🎉 Despite all these difficulties, with AI’s help, the migration was bumpy but ultimately completed successfully 🎉🎉🎉

I used Claude Code with DeepSeek via API as the model (later, after trying other models, I realized DeepSeek isn’t currently in the top tier — but even so, it did a great job). I also used the planning-with-files skill:

task_plan.md → Track phases and progress
findings.md → Store research and findings
progress.md → Session log and test results
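
To make the three-file layout concrete, here is a minimal sketch that creates those files if they are missing. It is not the planning-with-files skill's own implementation; the function name and the file contents are assumptions for illustration, and only the file names and their purposes come from the listing above.

```python
from pathlib import Path
from typing import List

# File names and purposes as listed above.
PLANNING_FILES = {
    "task_plan.md": "Track phases and progress",
    "findings.md": "Store research and findings",
    "progress.md": "Session log and test results",
}


def scaffold_planning_files(directory: Path) -> List[str]:
    """Create any missing planning file and return the names that were created."""
    directory.mkdir(parents=True, exist_ok=True)
    created = []
    for name, purpose in PLANNING_FILES.items():
        path = directory / name
        if not path.exists():  # never overwrite notes from an earlier session
            path.write_text(f"# {name}\n\n{purpose}\n", encoding="utf-8")
            created.append(name)
    return created


if __name__ == "__main__":
    print(scaffold_planning_files(Path("planning")))
```

Existing files are left alone, so rerunning the scaffold at the start of a new session keeps whatever the previous session wrote down.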

I used the planning-with-files skill for three reasons:

This is a very challenging ops migration task that consumes a large amount of context
Ops work always calls for proper planning
The skill ensures there are requirements and design plans and, most importantly, that migration progress is tracked in real time through tasks, so context is not lost

planning-with-files

AI first planned these 3 files:

I won’t paste the full content here to save readers’ time. If you’re interested, click the links above 👆 to check them out.

findings

Summary of the Lobe Chat v1 to v2 Production Migration Plan

Core Objectives
Key Changes and Requirements
Completed Preparation Work
Technical Decisions and Risk Mitigation
Follow-up Plans

progress

Migration Project Summary Report

Project Overview
Key Steps and Results
- Preparation and Assessment (Phase 1)
- Database Upgrade (Phase 2)
- Authentication System Migration (Phase 3)
- Deployment and Verification (Phase 4)
- Wrap-up and Monitoring (Phase 5)
Final Status

task_plan

Task Overview
Key Progress and Completion Status
Core Decisions and Considerations
Environment Information
- Production domain: west-beta.ts.net
- Network configuration: Tailscale Ingress and ExternalServices
- User email
Summary

To sum up, objectively speaking, it wrote much better than I could (otherwise I wouldn’t still be an ops engineer 😂), and the considerations were very thorough.

Execution

The execution process followed the planned documentation step by step.

I took a shortcut here: I first manually disabled ArgoCD’s auto-sync feature. Then I had AI modify the K8s manifests and, after each modification, deploy directly via kubectl or run the PostgreSQL migration commands.
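
For readers unfamiliar with the auto-sync switch: in an Argo CD Application, automated sync is controlled by the `spec.syncPolicy.automated` field, and removing it returns the app to manual sync. The sketch below shows that change on an in-memory manifest. I did this step by hand, so this is an illustration rather than what was run; the application name and the other field values in the example are hypothetical.

```python
import copy
from typing import Any, Dict


def disable_auto_sync(application: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an Argo CD Application manifest with automated sync removed."""
    patched = copy.deepcopy(application)  # leave the caller's manifest untouched
    sync_policy = patched.get("spec", {}).get("syncPolicy")
    if isinstance(sync_policy, dict):
        sync_policy.pop("automated", None)  # no 'automated' block means manual sync
    return patched


if __name__ == "__main__":
    # Hypothetical manifest, trimmed to the fields that matter here.
    app = {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": "lobe-chat"},
        "spec": {
            "syncPolicy": {
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true"],
            }
        },
    }
    print(disable_auto_sync(app)["spec"]["syncPolicy"])
```

Other settings under `syncPolicy` are kept, so restoring the removed block after the migration brings auto-sync back unchanged.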

In the end, it was indeed completed successfully. 🎉🎉🎉

Summary Report

It also generated additional related documentation:

Honestly, I could have thought of items 1, 4, and 5, but “User Feedback Collection” and “User Migration Notification” were aspects I genuinely wouldn’t have considered 😂.

👍️ AI Strengths

AI can complete 90% of the work. The remaining 10% required my intervention, and I don’t think that was AI’s fault: my GitOps repo lacked visibility in certain areas, so AI misunderstood them and made misjudgments. More details below.
AI produced very thorough planning
AI kept documentation updated in real-time before, during, and after the migration (I can only guarantee keeping an Excel todo checklist updated at best)

👎 Weaknesses

Although my deployment information is mostly in Git, there were still gaps, so AI lacked some critical production environment information (e.g., data backup mechanisms, secrets, data in PVs). This one is my fault.
Although the planning documents repeatedly stressed the importance of data, during execution AI would still casually run commands like rm, delete, and drop. Permissions must therefore be strictly controlled.
AI’s biggest problem right now is that it will deceive. When a step failed, AI would sometimes pretend not to notice, carry on with the following steps, and mark the failed step as ✔️ completed in the documentation. (For example, my database dump restore failed, and it simply skipped it and marked it as complete 😂😂😂)
The model’s 128K context was exhausted multiple times, which risks losing critical information. So you must tell AI to document as it goes.
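
Two of the weaknesses above (casual destructive commands, and failed steps marked as done) can be partly guarded against with a thin wrapper around command execution. The following is a minimal sketch under my own assumptions, not something used in this migration: the keyword list is deliberately crude, and `needs_approval`, `run_step`, and the `approve` callback are hypothetical names.

```python
import re
import subprocess
from typing import Callable, List

# Crude keyword match for the commands named above; a real guard needs more care.
DESTRUCTIVE = re.compile(r"\b(rm|delete|drop)\b", re.IGNORECASE)


def needs_approval(command: str) -> bool:
    """Return True if the command text contains a destructive keyword."""
    return bool(DESTRUCTIVE.search(command))


def run_step(label: str, command: List[str], approve: Callable[[str], bool]) -> str:
    """Run one step and return a checklist line that reflects the real outcome."""
    text = " ".join(command)
    if needs_approval(text) and not approve(text):
        return f"- [ ] {label} (blocked, needs human approval)"
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except OSError as error:  # e.g. the binary does not exist
        return f"- [ ] {label} (failed, {error.__class__.__name__})"
    if result.returncode != 0:
        return f"- [ ] {label} (failed, exit code {result.returncode})"
    return f"- [x] {label}"  # ticked only after a zero exit code


if __name__ == "__main__":
    # Deny every destructive command by default.
    print(run_step("delete old PVC", ["kubectl", "delete", "pvc", "data"], lambda cmd: False))
```

The point of the design is that the checkbox is derived from the exit code rather than written by the agent, so a failed restore cannot be recorded as complete.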

🫣 Detailed records are here:

Merge pull request ‘feat(lobe-chat): 实现v2迁移准备工作’ (“implement v2 migration preparation work”) (#312) from lobehub-… · east4ming/homelab2@ba0d16c

Migration PR

Summary

AI took one full day and ¥6.50 in API fees to complete this task successfully.

Cost

Case 2: Deploying a New Service — Online Documentation Website

Compared to the previous task, this one was simpler, and AI performed even better: 100% completion with zero human intervention!

Task Overview

This was a project at my company. I have a gitops-monitor repo at work that contains everything I’m responsible for in ops. I wanted to generate an online documentation website using MkDocs, based on the Markdown documents in my repo, with bilingual Chinese-English support.

This time I switched to using the company-provided Kiro IDE.

AI Enters the Stage

Kiro Specs

Kiro Specs first generated 3 documents:

Requirements → Design → Tasks

Requirements: equivalent to the refined requirements you produce after business/leadership raises a need
Design: equivalent to your migration plan
Tasks: equivalent to the actual Excel task checklist used during migration

Requirements

Requirements document contents:

Introduction
Glossary
Requirements
- MkDocs Project Configuration
- Chinese-English Bilingual Support
- Git Branch Management
- Kubernetes Deployment Manifests
- AWS ALB Ingress Configuration
- MkDocs Static Files and Deployment Process

Here you can see that it refined my vague description into specific, concrete requirements, each with user stories and acceptance criteria. 👍️

Design

Technical Design Document

Overview
Architecture (here AI drew a flowchart 👍️)
Component Design
- MkDocs Configuration — mkdocs.yml
- Kubernetes Deployment Manifests — planned to write a mkdocs Helm chart (including: Deployment, PVC, ConfigMap, Service, Ingress)
- ArgoCD Auto-discovery (AI analyzed and found that ArgoCD uses auto-discovery, so no additional ArgoCD configuration was needed)
- Build and Deployment
- i18n Bilingual Implementation
- Git Branch Strategy (feature branch for development, merged to main branch via PR for deployment)
Implementation Tasks

Tasks

Implementation Plan

Overview
Task Checklist (7 major tasks plus subtasks, with inter-task dependencies noted; each item is checked off as - [x] in real time upon completion)
Notes
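
Because the task list uses plain Markdown checkboxes, its progress is easy to verify independently of what the AI reports. Here is a small sketch of such a check; the sample task names are made up for illustration and are not the actual contents of the Tasks document.

```python
import re
from typing import List, Tuple

# Matches "- [ ] text" and "- [x] text", including indented subtasks.
CHECKBOX = re.compile(r"^\s*- \[( |x|X)\] (.+)$")


def parse_checklist(markdown: str) -> List[Tuple[bool, str]]:
    """Return (done, text) for every checkbox line in the document."""
    items = []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match:
            items.append((match.group(1).lower() == "x", match.group(2).strip()))
    return items


def summarize(markdown: str) -> Tuple[int, int]:
    """Return (completed, total) over all checkbox lines."""
    items = parse_checklist(markdown)
    return sum(1 for done, _ in items if done), len(items)


if __name__ == "__main__":
    # Hypothetical excerpt of a tasks file.
    sample = "- [x] 1. Configure MkDocs\n  - [x] 1.1 Write mkdocs.yml\n- [ ] 2. Write Helm chart\n"
    done, total = summarize(sample)
    print(f"{done}/{total} tasks complete")
```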

Execution

The coding phase doesn’t need much elaboration — this is AI’s forte. It wrote:

mkdocs.yml
Helm chart

It completed the task excellently with zero human intervention. 🎉🎉🎉

👍️ Strengths

AI can complete 100% of the work with zero human intervention
AI produced very thorough planning
AI kept documentation updated in real-time before, during, and after the deployment

👎 Weaknesses

None

Summary

This task was simpler, the repo had complete information, and the model used was stronger. AI perfectly 👐 completed the task. Zero weaknesses.