operations/README.md
About Operations Documentation
What's Here
Operational guides for running and maintaining Dynaplex in production and development environments.
Current docs:
- operations.md - Production operations guide
- documentation-setup.md - Maintaining documentation
What Belongs Here
Operations documentation covers:
- Production operations - Running services in production
- Monitoring and observability - Health checks, metrics, logs
- Incident response - What to do when things go wrong
- Maintenance procedures - Regular tasks and upkeep
- Process documentation - Team workflows and standards
Examples of operations docs:
- "Monitoring Dynaplex services"
- "Incident response runbook"
- "Database backup procedures"
- "Service health check guide"
- "Documentation maintenance process"
- "Release management workflow"
What Doesn't Belong Here
❌ How to build features → Use how-to/
❌ Learning materials → Use tutorials/
❌ API reference → Use reference/
❌ Architecture explanations → Use explanation/
How to Think About This
Analogy: the operations manual for a factory, which explains how to keep the line running, not how to design the product.
- Operations: "How to monitor production systems"
- Not operations: "How to build a new service" (that's how-to)
- Not operations: "What is monitoring?" (that's explanation)
- Not operations: "Metrics API reference" (that's reference)
Key characteristics:
- For operators - SRE, DevOps, on-call engineers
- Production-focused - Real systems, real data
- Process-oriented - Repeatable procedures
- Always up-to-date - Critical for reliability
- Actionable - Clear steps to take
Writing Guidelines
Structure
For runbooks:
# [Service/System] Operations
## Overview
[What this system does, criticality]
## Monitoring
[Where to check health, key metrics]
## Common Issues
### Issue: [Problem]
**Symptoms:** [How you know]
**Diagnosis:** [How to confirm]
**Resolution:** [Steps to fix]
## Escalation
[When to escalate, who to contact]
For procedures:
# [Procedure Name]
## When to Use
[Circumstances requiring this procedure]
## Prerequisites
[Access, tools, permissions needed]
## Steps
1. [Action with expected result]
2. [Action with expected result]
## Verification
[How to confirm success]
## Rollback
[How to undo if needed]
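The two templates above lend themselves to a simple automated check. The following is a minimal sketch, assuming docs are Markdown files that use the `## ` headings exactly as written in the templates. The names `REQUIRED_SECTIONS` and `missing_sections` are hypothetical and not part of any existing Dynaplex tooling.

```python
import re

# Required level-2 headings, copied from the runbook and procedure templates.
REQUIRED_SECTIONS = {
    "runbook": ["Overview", "Monitoring", "Common Issues", "Escalation"],
    "procedure": ["When to Use", "Prerequisites", "Steps", "Verification", "Rollback"],
}

# Matches "## Title" lines only; "# Title" and "### Title" are ignored.
_H2 = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(markdown_text: str, doc_type: str) -> list[str]:
    """Return the required sections absent from the text, in template order."""
    found = {match.group(1) for match in _H2.finditer(markdown_text)}
    return [name for name in REQUIRED_SECTIONS[doc_type] if name not in found]
```

A check like this could run in review or CI: an empty list means every template section is present, while a non-empty list names what to add (for example, a procedure with no Rollback section).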
Style
- Be extremely clear and precise
- Assume the operator is under pressure
- Include expected results for each step
- Provide specific commands, not descriptions
- Test procedures regularly
- Keep current with production
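One way to make "specific commands" and "expected results" concrete is to record each step as data and check it mechanically. This is a minimal sketch under stated assumptions: `Step` and `run_step` are hypothetical names, and "expected result" is simplified here to a substring of the command's standard output plus a zero exit code.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Step:
    description: str    # what the operator is doing
    command: list[str]  # the exact command to run, as an argument list
    expected: str       # text that must appear in stdout on success


def run_step(step: Step) -> bool:
    """Run one step and report whether its expected result was observed."""
    result = subprocess.run(step.command, capture_output=True, text=True)
    ok = result.returncode == 0 and step.expected in result.stdout
    print(f"[{'OK' if ok else 'FAIL'}] {step.description}")
    return ok
```

The same pairing works in a hand-followed procedure: write the literal command, then state on the next line what the operator should see if it worked.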
Types of Operations Docs
1. Runbooks
Guides for operating specific services:
- Service overview
- Monitoring and alerts
- Common issues and fixes
- Escalation procedures
2. Procedures
Step-by-step processes:
- Deployment procedures
- Backup and restore
- Certificate renewal
- Database migrations
3. Process Documentation
Team workflows:
- Documentation standards
- Release process
- Incident response process
- On-call rotation
4. Standards
Operational standards:
- Naming conventions
- Tagging standards
- Security requirements
- Compliance checklists
When to Create Operations Docs
Create operations docs when:
- ✅ Launching a new service to production
- ✅ Documenting an incident response
- ✅ Establishing a new process
- ✅ Something requires regular maintenance
- ✅ On-call engineers need guidance
Don't create operations docs for:
- ❌ Development processes (use how-to)
- ❌ Architectural concepts (use explanation)
Maintaining Operations Docs
Operations docs require special attention:
- Test regularly - Run through procedures
- Update immediately - After incidents or changes
- Review with team - Ensure accuracy
- Version control - Track changes over time
- Make accessible - Easy to find during incidents
Critical: Out-of-date operations docs are dangerous!
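A periodic staleness report can prompt the reviews described above. The sketch below is an assumption-laden illustration, not existing tooling: `stale_docs` is a hypothetical helper, the age threshold is whatever the team chooses, and file modification time is only a rough proxy for "last reviewed" (a fresh checkout resets it, so last-commit dates from version control would be more reliable).

```python
import time
from pathlib import Path


def stale_docs(root, max_age_days, now=None):
    """Return Markdown files under root not modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        path for path in Path(root).rglob("*.md")
        if path.stat().st_mtime < cutoff
    )
```

For example, `stale_docs("operations", 90)` would list every `.md` file under `operations/` that has not changed in roughly three months, which gives the team a concrete review queue.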
Organization
Organize by:
- Service/component - Per-service runbooks
- Process type - Deployment, monitoring, incident response
- Responsibility - Dev, ops, security
Checklist for Good Operations Docs
- Tested - Procedure works as written
- Current - Matches production reality
- Complete - All steps included
- Clear - An operator under stress can follow it
- Specific - Actual commands, not descriptions
- Safe - Includes rollback procedures
- Accessible - Easy to find when needed
Related to Diátaxis
Operations docs don't fit cleanly into Diátaxis, whose four categories are tutorials, how-to guides, reference, and explanation. They're closest to how-to guides, with these differences:
- Focus on running, not building
- For operators, not developers
- Must be battle-tested and current
- Used during incidents (high pressure)
Questions? See Documentation Setup (documentation-setup.md).