
Welcome to OneBlog Engineering: Building Scalable, Resilient Services with Elixir

OneBlog runs Elixir on BEAM in AWS Fargate, using PostgreSQL for relational data and MongoDB for event logs. Services talk via GenServer RPC with Libcluster‑based discovery, are observed via OpenTelemetry, secured with Erlang cookies, and deployed through canary CI/CD. Future work includes hot‑code upgrades, edge‑aware clustering, and serverless GenServers.

October 22, 2025

I'm Justin Bean, the CTO at OneBlog. Welcome to our engineering blog.

First, an introduction to our stack at OneBlog. Our main language is Elixir, a functional language whose syntax is heavily influenced by Ruby but whose philosophy, inherited from Erlang, is that a system should be built from many small, isolated processes rather than a few large ones. This design aligns naturally with the microservices mindset, where each piece of functionality runs in isolation, can be restarted independently, and communicates through well‑defined interfaces. The BEAM VM that powers Elixir gives us lightweight concurrency, fault tolerance, and hot code upgrades, all of which are essential for the kind of always‑on, high‑traffic platform we run at OneBlog.
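To make the "many small processes" idea concrete, here's a minimal, self‑contained sketch (not OneBlog production code): it spawns ten thousand processes, each holding its own private counter, and interacts with them purely through message passing.

```elixir
# Each BEAM process is cheap (a few KB) and fully isolated: no shared
# memory, only messages. Module and message names here are illustrative.

defmodule Counter do
  # Every spawned process runs this receive loop over its own state.
  def loop(n) do
    receive do
      :increment ->
        loop(n + 1)

      {:get, caller} ->
        send(caller, {:value, n})
        loop(n)
    end
  end
end

# Spawn 10,000 independent counter processes.
pids = for _ <- 1..10_000, do: spawn(fn -> Counter.loop(0) end)

# Increment each one -- message passing is the only way in.
Enum.each(pids, &send(&1, :increment))

# Ask each process for its value and collect the replies.
values =
  for pid <- pids do
    send(pid, {:get, self()})

    receive do
      {:value, n} -> n
    end
  end

IO.inspect(Enum.sum(values))
```

On a laptop this runs in well under a second, which is the property that makes "one process per unit of work" a practical design rather than a slogan.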

We deploy our Elixir code to AWS using containers that run as Fargate tasks in Elastic Container Service (ECS). This serverless container model lets us focus on code rather than managing EC2 instances, auto‑scaling groups, or patch cycles. Each Fargate task is a self‑contained runtime environment that includes the BEAM, our compiled releases, and a minimal OS layer. Because tasks are immutable, deployments are a simple “push‑new‑image, retire‑old‑tasks” workflow. Rollbacks are as easy as re‑tagging a previous image and redeploying.

Our data layer consists of two primary stores. The core relational data lives in Amazon RDS for PostgreSQL, which gives us ACID guarantees, powerful SQL querying, and a mature ecosystem of extensions (such as pg_partman for partitioning and pg_stat_statements for query analysis). For event‑driven features—particularly anything that involves AI—we record every interaction in a MongoDB/DocumentDB collection. This event log is the single source of truth for audit trails, debugging sessions, and, eventually, event replay for downstream analytics pipelines. By separating transactional data from immutable event streams we keep our write path fast while still supporting rich, time‑travel queries.
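To illustrate the event‑log side of that split, here's a sketch of the kind of append‑only document we write per interaction. The module and field names are ours for illustration, not OneBlog's actual schema; in production the relational side would go through Ecto to PostgreSQL, and this map would be inserted into the MongoDB/DocumentDB collection.

```elixir
# Transactional data gets UPDATEs in Postgres; events never do -- each AI
# interaction becomes one immutable document. Names here are illustrative.

defmodule EventLog do
  @doc "Builds the document to append to the event collection."
  def build_event(type, payload) do
    %{
      event_type: type,
      payload: payload,
      # Recorded once, never mutated -- the log is append-only.
      inserted_at: DateTime.utc_now() |> DateTime.to_iso8601(),
      schema_version: 1
    }
  end
end

event =
  EventLog.build_event("ai.recommendation", %{
    model_version: "rec-v42",
    input: %{user_id: 123},
    output: %{post_ids: [9, 17, 3]}
  })

IO.inspect(event.event_type)
```

Keeping a `schema_version` on every document is what lets downstream consumers evolve without rewriting history.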

You're reading this right now on our “client” app, and the content is managed through our content management system, or “CMS.” These are two separate Elixir codebases that live in distinct repositories, have independent CI pipelines, and are versioned independently. Instead of connecting them through private or public HTTP APIs, we use the power of Elixir to serve content directly through RPC (remote procedure calls) built on GenServer, an OTP behaviour that ships with the language. Each service runs one or more GenServer processes that expose a thin, well‑specified interface. When the client needs a piece of content, it makes a GenServer call to the CMS node, passes a simple request struct, and receives a response struct. This approach eliminates HTTP overhead, reduces latency (sub‑millisecond round‑trips inside the same VPC), and leverages BEAM's built‑in distribution layer for authenticated node‑to‑node communication.
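A single‑node sketch of that interface looks like this. The module name, request shape, and seeded content are illustrative; in production the client would address the server on the remote node with `GenServer.call({CMS.ContentServer, cms_node}, ...)`, but the callback code is identical.

```elixir
# Minimal GenServer exposing a thin content-lookup interface.
# CMS.ContentServer and the "welcome" entry are illustrative names.

defmodule CMS.ContentServer do
  use GenServer

  ## Client API -- what the client app calls.

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, Keyword.put_new(opts, :name, __MODULE__))
  end

  def get_content(slug, server \\ __MODULE__) do
    # Cross-node form: GenServer.call({__MODULE__, cms_node}, {:get_content, slug})
    GenServer.call(server, {:get_content, slug})
  end

  ## Server callbacks

  @impl true
  def init(:ok) do
    # Seed one piece of content in place of a real backing store.
    {:ok, %{"welcome" => "Hello from the CMS"}}
  end

  @impl true
  def handle_call({:get_content, slug}, _from, state) do
    case Map.fetch(state, slug) do
      {:ok, body} -> {:reply, {:ok, body}, state}
      :error -> {:reply, {:error, :not_found}, state}
    end
  end
end

{:ok, _pid} = CMS.ContentServer.start_link()
IO.inspect(CMS.ContentServer.get_content("welcome"))
# {:ok, "Hello from the CMS"}
```

Because the call site is the same whether the server is local or remote, services can be co‑located in development and distributed in production without code changes.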

On startup, our apps go through a service registry and discovery process powered by Libcluster and the AWS ECS API. When a new Fargate task boots, Libcluster queries the ECS “list‑tasks” endpoint, extracts the private IP addresses of all tasks that belong to the same service, and automatically connects the nodes into a fully connected mesh (BEAM distribution links every node to every other node by default).
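The wiring for that looks roughly like the snippet below. `Cluster.Supervisor` and the topology format are Libcluster's real API, but the ECS strategy module and its options are placeholders: Libcluster itself ships strategies such as `Gossip` and `Kubernetes`, and ECS‑based discovery comes from a separate package or an in‑house strategy, so treat the names as a sketch rather than a drop‑in config.

```elixir
# Illustrative Libcluster wiring inside the OTP application module.
# MyECS.Strategy and its config keys are placeholders, not a real package.

defmodule OneBlog.Application do
  use Application

  @impl true
  def start(_type, _args) do
    topologies = [
      oneblog: [
        # Placeholder: a strategy that polls the ECS ListTasks API
        # and connects to the private IPs of sibling tasks.
        strategy: MyECS.Strategy,
        config: [cluster: "oneblog", service_name: "cms"]
      ]
    ]

    children = [
      {Cluster.Supervisor, [topologies, [name: OneBlog.ClusterSupervisor]]}
      # ... the rest of the supervision tree
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: OneBlog.Supervisor)
  end
end
```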

Security is baked into every layer. Because our services communicate over the BEAM’s distribution protocol, we enforce node authentication using Erlang’s built‑in cookie mechanism, which is rotated automatically via a Secrets Manager rotation Lambda. All container images are scanned for vulnerabilities with Amazon ECR image scanning, and we enforce a “no‑root” user policy inside each container. Network traffic between services stays within a private VPC, protected by security groups that only allow the ports required for BEAM clustering and our custom RPC endpoints.

One of the most exciting parts of our stack is how we handle AI‑driven features. When a user triggers an AI workflow—say, generating a personalized blog recommendation—we log the request and the model’s output as a discrete event in MongoDB/DocumentDB. This immutable log enables three powerful capabilities:

  1. Auditing: We can reconstruct exactly what model version, input data, and parameters were used for any given recommendation, satisfying compliance requirements.
  2. Troubleshooting: If a user reports a bad recommendation, we can replay the event through a sandboxed model version to debug the issue without affecting live traffic.
  3. Event replay: Downstream analytics pipelines consume the event stream to train new recommendation models, run A/B tests, or generate business intelligence reports.
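Capability 2 can be sketched in a few lines: read a logged event back, run its recorded input through a sandboxed model function, and diff the fresh output against what was served at the time. The event shape and the stub model below are illustrative, not OneBlog's production code.

```elixir
# Sketch of event replay against a sandboxed model, off the live path.
# The event map and the model function are illustrative stand-ins.

defmodule Replay do
  @doc """
  Re-runs the logged input through `model_fun` and compares the fresh
  output with what was recorded for the original request.
  """
  def replay(event, model_fun) do
    fresh_output = model_fun.(event.input)

    %{
      model_version: event.model_version,
      recorded: event.output,
      replayed: fresh_output,
      match?: fresh_output == event.output
    }
  end
end

# A logged event, as it might be read back from the event store.
event = %{
  model_version: "rec-v42",
  input: %{user_id: 123},
  output: %{post_ids: [9, 17, 3]}
}

# A sandboxed "model": here just a deterministic stub.
sandboxed_model = fn %{user_id: _} -> %{post_ids: [9, 17, 3]} end

result = Replay.replay(event, sandboxed_model)
IO.inspect(result.match?)
```

A mismatch between `recorded` and `replayed` is exactly the signal you want when bisecting a bad recommendation across model versions.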

All of these efforts are driven by a simple mantra: let the language and runtime do the heavy lifting, so our engineers can focus on delivering value to our users. Elixir’s concurrency model, combined with AWS’s managed services, gives us a platform that scales horizontally, recovers gracefully, and stays observable from the moment a request hits the edge to the moment it lands in a persistent store.

More on our distributed approach in the next article.