HUMAN TIME SAVED

Automation Engineer (Reliability Systems)

The Prime Directive

Design and deploy automated uptime monitoring systems that detect failure before humans notice, trigger intelligent alerts without red-alert spam, and keep production environments stable without manual babysitting.

If something fails, it should fail predictably. If it recovers, it should recover automatically. We don’t do duct tape. We engineer resilience.

Your Console Responsibilities

  • Architect workflow-based uptime monitoring and reliability systems
  • Design retry logic, timeout thresholds, backoff strategies, and escalation paths
  • Integrate webhook-based alerting systems (Slack, email, and related channels)
  • Write supporting scripts in JavaScript or Python when tools fall short
  • Build observable systems with structured logging and clear documentation
  • Design graceful handling of failures, fallbacks, and edge cases
  • Reduce operational risk without increasing unnecessary complexity
  • Build autonomous reliability logic — not just monitoring dashboards
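
For a flavor of the retry and backoff design mentioned above, here is a minimal sketch in Python, one of the two scripting languages named in this posting. Every name and default in it (backoff_delays, check_with_retries, the 30-second cap, the jitter option) is a hypothetical chosen for illustration and does not describe any existing system; probe stands in for whatever health check a real workflow would run.

```python
import random
import time
from typing import Callable, List


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Exponential delays (base, 2*base, 4*base, ...) capped at `cap` seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def _safe(probe: Callable[[], bool]) -> bool:
    """Treat any exception raised by the probe as a failed check."""
    try:
        return bool(probe())
    except Exception:
        return False


def check_with_retries(
    probe: Callable[[], bool],
    retries: int = 3,
    base: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: bool = True,
) -> bool:
    """Run `probe` once, then up to `retries` more times with backoff.

    Returns True as soon as one attempt succeeds. Returns False only after
    every attempt has failed, which is where a caller would escalate.
    """
    if _safe(probe):
        return True
    for delay in backoff_delays(retries, base, cap):
        # Random jitter keeps many checks from retrying in lockstep.
        sleep(random.uniform(0, delay) if jitter else delay)
        if _safe(probe):
            return True
    return False
```

Passing sleep as a parameter is a deliberate choice in this sketch: it lets the backoff schedule be tested without actually waiting, which fits the "intentionally test breakpoints" mindset described further down.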

Required Systems Knowledge

  • Strong workflow automation experience (n8n, orchestration tools, or custom automation)
  • Solid JavaScript or Python skills
  • Deep understanding of HTTP, APIs, and JSON
  • Hands-on cloud infrastructure experience (AWS, Azure, VPS/VDS environments)
  • Linux fundamentals and Docker familiarity
  • Reverse proxy experience (NGINX or similar)
  • SSL/TLS fundamentals and secure deployment practices
  • Uptime monitoring architecture including retries, timeouts, and alert noise control

Reliability Mindset (Non-Negotiable)

  • Think in failure scenarios before thinking in features
  • Intentionally test breakpoints and stress conditions
  • Design idempotent, restart-safe workflows
  • Prevent alert storms and notification chaos
  • Assume systems will be stressed and design accordingly
  • Document clearly so systems survive beyond their creator
  • Ship engineered resilience — not fragile hope
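
As one illustration of what "prevent alert storms" can look like in practice, the sketch below suppresses repeat alerts for the same check inside a cooldown window and clears that state on recovery. The AlertGate class, its method names, and the 300-second default are assumptions made up for this example, not a description of any tooling in use.

```python
import time
from typing import Callable, Dict


class AlertGate:
    """Allow at most one alert per key within a cooldown window."""

    def __init__(
        self,
        cooldown: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.cooldown = cooldown
        self.clock = clock
        self._last_sent: Dict[str, float] = {}

    def should_send(self, key: str) -> bool:
        """Return True if an alert for `key` should go out now."""
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            # Same failure, still inside the window: stay quiet.
            return False
        self._last_sent[key] = now
        return True

    def resolve(self, key: str) -> None:
        """Forget `key` on recovery so the next failure alerts immediately."""
        self._last_sent.pop(key, None)
```

A caller would check gate.should_send("api-health") before posting to a webhook and call gate.resolve("api-health") once the check passes again. Calling either method twice in a row is harmless, so a restarted workflow can replay them safely.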

What You’re Really Building

  • Invisible systems that quietly protect production
  • Autonomous recovery logic that reduces operational drag
  • Reliability layers that keep digital starships flying
  • Permanent human time savings through engineered uptime
  • Stable foundations so the rest of the crew can explore new frontiers

Engagement Type

This is a part-time position. We’re looking for a remote builder who can deliver focused, high-impact reliability systems without unnecessary noise.

Flexible structure, clear ownership, real systems. Output matters more than hours logged.

Ready to Join the Crew?

If this mission sounds like your kind of work, send your CV to contact@fewgoodgeeks.com.

Use the position name as the subject line, attach your CV, and include anything else that shows how you build. We review every application carefully — no black holes, no warp-speed delays.