Automation Engineer (Reliability Systems)

The Prime Directive

Design and deploy automated uptime monitoring systems that detect failure before humans notice, trigger intelligent alerts without red-alert spam, and keep production environments stable without manual babysitting.

If something fails, it should fail predictably. If it recovers, it should recover automatically. We don’t do duct tape. We engineer resilience.

Your Console Responsibilities

Architect workflow-based uptime monitoring and reliability systems
Design retry logic, timeout thresholds, backoff strategies, and escalation paths
Integrate webhook-based alerting systems (Slack, email, and related channels)
Write supporting scripts in JavaScript or Python when tools fall short
Build observable systems with structured logging and clear documentation
Design graceful handling of failures, fallbacks, and edge cases
Reduce operational risk without increasing unnecessary complexity
Build autonomous reliability logic — not just monitoring dashboards

Required Systems Knowledge

Strong workflow automation experience (n8n, orchestration tools, or custom automation)
Solid JavaScript or Python skills
Deep understanding of HTTP, APIs, and JSON
Hands-on cloud infrastructure experience (AWS, Azure, VPS/VDS environments)
Linux fundamentals and Docker familiarity
Reverse proxy experience (NGINX or similar)
SSL/TLS fundamentals and secure deployment practices
Uptime monitoring architecture including retries, timeouts, and alert noise control

Reliability Mindset (Non-Negotiable)

Think in failure scenarios before thinking in features
Intentionally test breakpoints and stress conditions
Design idempotent, restart-safe workflows
Prevent alert storms and notification chaos
Assume systems will be stressed and design accordingly
Document clearly so systems survive beyond their creator
Ship engineered resilience — not fragile hope

What You’re Really Building

Invisible systems that quietly protect production
Autonomous recovery logic that reduces operational drag
Reliability layers that keep digital starships flying
Permanent human time savings through engineered uptime
Stable foundations so the rest of the crew can explore new frontiers

Engagement Type

This is a part-time position. We’re looking for a remote builder who can deliver focused, high-impact reliability systems without unnecessary noise.

Flexible structure, clear ownership, real systems. Output matters more than hours logged.

Ready to Join the Crew?

If this mission sounds like your kind of work, email contact@fewgoodgeeks.com.

Use the position name as the subject line, attach your CV, and include anything else that shows how you build. We review every application carefully: no black holes, no light-year delays.