operations/README.md

About Operations Documentation

What's Here

Operational guides for running and maintaining Dynaplex in production and development environments.

Current docs:

  • operations.md - Production operations guide
  • documentation-setup.md - Maintaining documentation

What Belongs Here

Operations documentation covers:

  • Production operations - Running services in production
  • Monitoring and observability - Health checks, metrics, logs
  • Incident response - What to do when things go wrong
  • Maintenance procedures - Regular tasks and upkeep
  • Process documentation - Team workflows and standards

Examples of operations docs:

  • "Monitoring Dynaplex services"
  • "Incident response runbook"
  • "Database backup procedures"
  • "Service health check guide"
  • "Documentation maintenance process"
  • "Release management workflow"

What Doesn't Belong Here

❌ How to build features → Use how-to/
❌ Learning materials → Use tutorials/
❌ API reference → Use reference/
❌ Architecture explanations → Use explanation/

How to Think About This

Analogy: Operations manual for a factory

  • Operations: "How to monitor production systems"
  • Not operations: "How to build a new service" (that's how-to)
  • Not operations: "What is monitoring?" (that's explanation)
  • Not operations: "Metrics API reference" (that's reference)

Key characteristics:

  • For operators - SRE, DevOps, on-call engineers
  • Production-focused - Real systems, real data
  • Process-oriented - Repeatable procedures
  • Always up-to-date - Critical for reliability
  • Actionable - Clear steps to take

Writing Guidelines

Structure

For runbooks:

# [Service/System] Operations

## Overview
[What this system does, criticality]

## Monitoring
[Where to check health, key metrics]

## Common Issues
### Issue: [Problem]
**Symptoms:** [How you know]
**Diagnosis:** [How to confirm]
**Resolution:** [Steps to fix]

## Escalation
[When to escalate, who to contact]

For procedures:

# [Procedure Name]

## When to Use
[Circumstances requiring this procedure]

## Prerequisites
[Access, tools, permissions needed]

## Steps
1. [Action with expected result]
2. [Action with expected result]

## Verification
[How to confirm success]

## Rollback
[How to undo if needed]
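
The two templates above lend themselves to a mechanical check. The following is a minimal sketch, not part of the Dynaplex tooling: it assumes docs are written in Markdown with `## ` section headings exactly as named in the templates, and the names `REQUIRED_SECTIONS` and `missing_sections` are hypothetical.

```python
from __future__ import annotations

import re

# Required second-level headings, copied from the two templates above.
REQUIRED_SECTIONS = {
    "runbook": ["Overview", "Monitoring", "Common Issues", "Escalation"],
    "procedure": ["When to Use", "Prerequisites", "Steps", "Verification", "Rollback"],
}


def missing_sections(markdown_text: str, doc_type: str) -> list[str]:
    """Return the required '## ' headings that are absent from the document."""
    found = {
        match.group(1).strip()
        for match in re.finditer(
            r"^##[ \t]+(.+?)[ \t]*$", markdown_text, flags=re.MULTILINE
        )
    }
    # Preserve template order so the report reads top to bottom.
    return [name for name in REQUIRED_SECTIONS[doc_type] if name not in found]


# A draft procedure that skips two sections.
draft = "# Backup\n\n## When to Use\n...\n\n## Steps\n1. ...\n\n## Verification\n...\n"
print(missing_sections(draft, "procedure"))  # ['Prerequisites', 'Rollback']
```

A check like this could run in review or CI so that a procedure without a Rollback section is caught before it is needed during an incident.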

Style

  • Be extremely clear and precise
  • Assume the operator is under pressure
  • Include expected results for each step
  • Provide specific commands, not descriptions
  • Test procedures regularly
  • Keep current with production

Types of Operations Docs

1. Runbooks

Guides for operating specific services:

  • Service overview
  • Monitoring and alerts
  • Common issues and fixes
  • Escalation procedures

2. Procedures

Step-by-step processes:

  • Deployment procedures
  • Backup and restore
  • Certificate renewal
  • Database migrations

3. Process Documentation

Team workflows:

  • Documentation standards
  • Release process
  • Incident response process
  • On-call rotation

4. Standards

Operational standards:

  • Naming conventions
  • Tagging standards
  • Security requirements
  • Compliance checklists

When to Create Operations Docs

Create operations docs when:

  • ✅ Launching a new service to production
  • ✅ Documenting an incident response
  • ✅ Establishing a new process
  • ✅ Something requires regular maintenance
  • ✅ On-call engineers need guidance

Don't create operations docs for:

  • ❌ Development processes (use how-to)
  • ❌ Architectural concepts (use explanation)

Maintaining Operations Docs

Operations docs require special attention:

  • Test regularly - Run through procedures
  • Update immediately - After incidents or changes
  • Review with team - Ensure accuracy
  • Version control - Track changes over time
  • Make accessible - Easy to find during incidents

Critical: Out-of-date operations docs are dangerous!
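
A simple staleness report can help catch docs that have drifted. The sketch below is an assumption-laden example, not an existing tool: it supposes each doc records a last-reviewed date, the 90-day threshold is an arbitrary default, and the review dates shown are made up for illustration.

```python
from __future__ import annotations

from datetime import date, timedelta


def stale_docs(
    last_reviewed: dict[str, date], today: date, max_age_days: int = 90
) -> list[str]:
    """Return doc names last reviewed more than max_age_days ago, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    overdue = [
        (reviewed, name)
        for name, reviewed in last_reviewed.items()
        if reviewed < cutoff
    ]
    return [name for _, name in sorted(overdue)]


# Illustrative review dates only; real values would come from doc metadata.
reviews = {
    "operations.md": date(2024, 1, 15),
    "documentation-setup.md": date(2024, 5, 20),
}
print(stale_docs(reviews, today=date(2024, 6, 1)))  # ['operations.md']
```

Sorting oldest first puts the riskiest doc at the top of the list, which is where a reviewer with limited time should start.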

Organization

Organize by:

  • Service/component - Per-service runbooks
  • Process type - Deployment, monitoring, incident response
  • Responsibility - Dev, ops, security

Checklist for Good Operations Docs

  • Tested - Procedure works as written
  • Current - Matches production reality
  • Complete - All steps included
  • Clear - An operator under stress can follow it
  • Specific - Actual commands, not descriptions
  • Safe - Includes rollback procedures
  • Accessible - Easy to find when needed

Operations docs don't fit cleanly into Diátaxis (which focuses on software documentation). They are closest to how-to guides, with these differences:

  • Focus on running, not building
  • Written for operators, not developers
  • Must be battle-tested and current
  • Used during incidents, under high pressure

Questions? See Documentation Setup