Our stack at Escape is written in multiple languages because each team has specific needs. We use TypeScript for its vibrant ecosystem, Python for cybersecurity research, and Go for performance-sensitive tasks. To orchestrate tasks across languages, we first developed a simple request-response protocol over HTTP, but it became unsustainable as the Escape codebase grew rapidly. We evaluated several technologies to replace our homegrown protocol, and two emerged as the most promising options: Connect and Temporal. The title gives away which one we chose, but the reasons are far from obvious.
Temporal vs Connect (gRPC)
On the one hand, Connect is a gRPC-compatible implementation that allows direct communication between two processes. On the other hand, Temporal is a centralized task orchestration system. Temporal does not even advertise itself as cross-language capable! You'd have to dig into its GitHub repositories to find out that it's possible.
Since we already use Protobuf internally, any gRPC implementation would have been the expected next step, but our requirements took us in the opposite direction.
- Cross-language type safety: This requirement should favor gRPC over the other technologies, since type-safe calls between languages are exactly what it was designed for. However, we already meet this requirement regardless of the transport, because we use Protobuf to safely serialize and deserialize messages.
- Observability: This is where Temporal's centralized model starts to shine. Having an orchestrator keeps logs and results in the same place. In addition, Temporal ships with a simple and clean web UI that lets a developer see what has been exchanged between all the processes involved.
- Reliability and scalability: Temporal puts all pending tasks into queues, meaning that a process restart will not make the remote procedure call fail, just delay it a bit. The queuing mechanism also lets multiple workers handle tasks out of the box, whereas the same feature with gRPC would have required deploying a load balancer.
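To make the reliability argument concrete, here is a deliberately simplified, in-memory model of queue-based dispatch. It is not Temporal's API or implementation: `Task`, `TaskQueue`, `WorkerCrashed`, and `run_workers` are hypothetical names for illustration only. It shows the two properties listed above: a task only leaves the queue once a worker completes it, so a worker dying mid-task delays the call instead of failing it, and several workers can share one queue without a load balancer in front of them.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Task:
    task_id: str
    payload: str
    attempts: int = 0  # how many times a worker died while holding this task


class TaskQueue:
    """Toy queue (hypothetical): a task is only gone once some worker completes it."""

    def __init__(self) -> None:
        self._pending: deque[Task] = deque()
        self.results: dict[str, str] = {}

    def put(self, task: Task) -> None:
        self._pending.append(task)

    def poll(self) -> Optional[Task]:
        return self._pending.popleft() if self._pending else None

    def complete(self, task: Task, result: str) -> None:
        self.results[task.task_id] = result

    def requeue(self, task: Task) -> None:
        # The worker died before finishing: put the task back for any worker.
        task.attempts += 1
        self._pending.append(task)


class WorkerCrashed(Exception):
    """Stands in for a worker process being restarted mid-task."""


def run_workers(queue: TaskQueue, workers: list[Callable[[str], str]]) -> None:
    """Several workers poll the same queue in turn; no load balancer involved."""
    turn = 0
    while (task := queue.poll()) is not None:
        handler = workers[turn % len(workers)]
        turn += 1
        try:
            queue.complete(task, handler(task.payload))
        except WorkerCrashed:
            queue.requeue(task)


def restarting_worker(payload: str) -> str:
    raise WorkerCrashed("process restarted mid-task")


def healthy_worker(payload: str) -> str:
    return payload.upper()


queue = TaskQueue()
task = Task("scan-1", "example.com")
queue.put(task)
run_workers(queue, [restarting_worker, healthy_worker])
```

Here the first worker crashes, the task goes back to the queue, and the second worker completes it. In Temporal itself, redelivery is driven by timeouts and retry policies rather than by an explicit crash signal like `WorkerCrashed`.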
It turned out that we needed more than just a request-response protocol. We also had uses for Temporal features beyond RPC, and since Escape is a small company, we preferred to introduce one complex tool rather than two.
What is Temporal?
Temporal is a distributed, scalable, and reliable platform for building durable workflow applications. It provides a programming model that makes it easy to write code that can handle complex and long-running workflows, even in the face of failures.
Temporal is used by companies like Netflix, Uber, and Snapchat to build critical applications such as order processing, payment processing, and customer onboarding.
A Temporal application is built around two concepts: workflows and activities.
- Workflows are deterministic sequences of instructions that can be resumed and monitored with minimal effort. They execute side effects through activities, which can be non-deterministic.
- Activities are units of work performed asynchronously by a Temporal worker. Activities are typically used to perform complex tasks, such as calling an external service, transcoding a media file, or sending an email message. They can be stateful, whereas a workflow's state is entirely reconstructed from its event history.
A workflow execution can last for days, even months, thanks to the reentrant property of workflows: when its worker stops, the workflow resumes exactly where it was before, replaying all the instructions in order on a new worker. You can sleep for days without worrying that your process will be restarted!
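The replay idea can be illustrated with a toy engine, assuming a much simpler model than Temporal's: the workflow is a Python generator that yields activity requests, and the hypothetical `ToyEngine` stores each activity result in an in-memory event history. When a new worker runs the workflow again from the start, recorded results are fed back instead of re-executing the activities, and a mismatch between a replayed request and the history is reported as non-determinism. A real Temporal history lives on the server and records many more event types, so treat this as an intuition aid rather than a description of the internals.

```python
from __future__ import annotations

from typing import Callable, Generator, Optional, Tuple

# Hypothetical model: a workflow yields (activity_name, argument) requests
# and receives each activity's result back through send().
WorkflowGen = Generator[Tuple[str, str], str, str]


def onboarding_workflow(user: str) -> WorkflowGen:
    # Deterministic: the same inputs and results always yield the same requests.
    account = yield ("create_account", user)
    receipt = yield ("send_welcome_email", account)
    return f"{account} onboarded ({receipt})"


class ToyEngine:
    """Runs workflows and persists every activity result in an event history."""

    def __init__(self, activities: dict[str, Callable[[str], str]]) -> None:
        self.activities = activities
        # (activity_name, result) pairs; a real Temporal history lives on the server.
        self.history: list[Tuple[str, str]] = []
        # Activities that really executed, i.e. side effects that happened.
        self.executed: list[str] = []

    def run(self, workflow: WorkflowGen, crash_before_step: Optional[int] = None) -> Optional[str]:
        step = 0
        try:
            name, arg = next(workflow)
            while True:
                if step < len(self.history):
                    # Replay: reuse the recorded result instead of re-running the activity.
                    recorded_name, result = self.history[step]
                    if recorded_name != name:
                        raise RuntimeError(f"non-deterministic workflow: {name} != {recorded_name}")
                else:
                    if crash_before_step is not None and step >= crash_before_step:
                        return None  # simulate the worker process dying here
                    result = self.activities[name](arg)
                    self.executed.append(name)
                    self.history.append((name, result))
                step += 1
                name, arg = workflow.send(result)
        except StopIteration as finished:
            return finished.value


activities = {
    "create_account": lambda user: f"acct-{user}",
    "send_welcome_email": lambda account: f"mail sent to {account}",
}
engine = ToyEngine(activities)

# The first worker runs one activity, then "crashes" before the second one.
first = engine.run(onboarding_workflow("alice"), crash_before_step=1)
# A new worker replays from the start: create_account is not executed again.
second = engine.run(onboarding_workflow("alice"))
```

The `create_account` activity ran only once even though the workflow function executed twice, which is why side effects belong in activities and why workflow code has to be deterministic.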
There is much more to learn about what Temporal is and what it can do, but this short section should give you an idea of the power of the tool. See the Temporal documentation, including its diagrams, for more information.
Temporal for cross-language RPC
What may not be obvious from the examples above is that workflows and activities can be written in different languages, as can the code that starts a workflow and the workflow itself.
Thanks to this ability, we built RPC on top of Temporal.
As long as the caller and the worker agree on the workflow name and the task queue name, Temporal does all the heavy lifting of delivering messages reliably, even across different programming languages. Temporal comes with sensible defaults, but parameters such as retries and timeouts can be customized to your needs.
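The contract is small: both sides only share a workflow name, a task queue name, and a serializable payload. In the Python SDK, for example, a worker can register a workflow under an explicit name with `@workflow.defn(name=...)`, and a client written in another language can start it by that string. The sketch below models only the naming contract, without a Temporal server; `ToyBroker`, `ToyWorker`, the `ScanEndpoint` workflow, and the `scanner` queue are hypothetical, and JSON stands in for the Protobuf messages mentioned earlier.

```python
from __future__ import annotations

import json
from collections import defaultdict, deque
from typing import Any, Callable, Optional


class ToyBroker:
    """Stand-in for the Temporal server: named task queues and nothing else."""

    def __init__(self) -> None:
        self.queues: dict[str, deque[tuple[str, str]]] = defaultdict(deque)

    def start(self, task_queue: str, workflow_name: str, payload: dict[str, Any]) -> None:
        # The caller never imports the worker's code: two names and a payload suffice.
        self.queues[task_queue].append((workflow_name, json.dumps(payload)))


class ToyWorker:
    """Polls a single task queue and dispatches tasks by workflow name."""

    def __init__(self, broker: ToyBroker, task_queue: str) -> None:
        self.broker = broker
        self.task_queue = task_queue
        self.handlers: dict[str, Callable[[dict[str, Any]], Any]] = {}

    def register(self, name: str, handler: Callable[[dict[str, Any]], Any]) -> None:
        self.handlers[name] = handler

    def poll_once(self) -> Optional[Any]:
        queue = self.broker.queues[self.task_queue]
        if not queue:
            return None
        name, raw = queue.popleft()
        handler = self.handlers.get(name)
        if handler is None:
            # Unknown workflow name: put the task back untouched.
            queue.appendleft((name, raw))
            return None
        return handler(json.loads(raw))


broker = ToyBroker()

# Worker side (in our stack, this could be a Python process).
worker = ToyWorker(broker, task_queue="scanner")
worker.register("ScanEndpoint", lambda args: {"url": args["url"], "issues": 0})

# Caller side (this could be TypeScript): only names and a payload cross the boundary.
broker.start("scanner", "ScanEndpoint", {"url": "https://example.com"})
result = worker.poll_once()

# A typo in the workflow name: this worker never picks the task up.
broker.start("scanner", "ScanEndpoints", {"url": "https://example.com"})
stuck = worker.poll_once()
```

In this toy, a one-character typo in the workflow name leaves the task unclaimed, so it pays to define these names once and share them between languages.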
Temporal can also be used to migrate between languages, since legacy and modern workers can poll the same queue regardless of their implementation language. Although we have not yet used Temporal for this purpose, this potential use case also weighs in its favor.
Pros and cons
It may seem like we have nothing but compliments for Temporal, but all software has its drawbacks, Temporal included. Let's start by adding a few more qualities to the list compiled in this article:
- Temporal is incremental: It's not an all-or-nothing technology; we've only started using it in a few places where we need it most.
- It's easy for basic stuff: We only need a subset of Temporal's features, and we were able to learn them very quickly. Creating basic workflows like the ones above requires only limited knowledge of the Temporal API.
However, the following negative aspects can be noted:
- The documentation is vast and incomplete: There are several official tutorial websites, and some pages have a large "WORK IN PROGRESS" banner at the top. It's hard to know what to learn and how to learn it.
- There seem to be a lot of gotchas: There are many ways to do the same thing, and some may appear more idiomatic than others to experienced developers. The lack of clear guidelines makes it hard to know best practices, and we learn them the hard way, through trial and error.
- Temporal Cloud is really expensive: It starts at $225/month, which means small businesses and hobbyists will have to host their instance themselves. Although well documented, setting up and keeping Temporal up to date is a technical and time-consuming task.
We still believe that Temporal was the right choice for our technical stack and needs, but you may want to weigh these drawbacks before adopting it.
We hope you enjoyed this article and found it enlightening. We're always excited to hear back and open to feedback, so feel free to send comments wherever you came across this article! If you want hands-on experience, you can find all the code from this article, in a working environment, in a GitHub repository.
Want to learn more about new technologies being implemented at Escape? We have written several times about introducing new technologies into the Escape stack:
- Using Protobuf with TypeScript, a way to ensure safe serialization and deserialization of network data.
- Set up a design system using Storybook and Figma, how our frontend developers introduced tools to improve brand consistency across Escape.
- Code quality controls in our Node.js monorepo, how we created a consistent coding style across a large codebase.