Data access should be an infrastructure problem

The dire state of data access

Modern cloud-native software systems are characterized by an abundance of data systems. It sometimes seems that many companies, especially those that adopt a microservices-oriented architecture early on, have more databases than engineers to manage them.

Much has been written about the complexity and overhead induced by adopting a microservices architecture. In fact, an entire landscape of cloud-native projects and companies has risen in the wake of the microservices revolution. New fields such as container orchestration, service mesh, and distributed tracing were created to deal with the sprawling complexity of these architectures.

The proliferation of data systems in typical cloud-native applications is a direct consequence of widely accepted design principles such as “Database per Service”. In addition, performance and cost considerations for applications running at a large scale have led to the adoption of many specialized data storage technologies (such as in-memory key-value, document, and object/blob stores) that now run side by side with more traditional relational databases. Furthermore, many applications opt to read and write data through third-party APIs that abstract the concrete storage technology away from them.

With this plethora of data technologies that a single application must access, data access becomes very complex. Functionality that relational database engines have supported for decades, such as reading, writing, or deleting a record, transactions (performing multiple changes that must all succeed or all fail), and joins between different collections of data, has now become a set of distributed systems problems, which are notoriously hard to design and implement correctly.
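To make the point about joins concrete, here is a minimal sketch of what a single SQL join turns into once the two collections live in different stores. Everything in it is hypothetical: the store contents, the field names, and the `join_orders_with_users` helper are invented for illustration, and in-memory structures stand in for what would be network calls to separate systems.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for two separately owned data stores.
# In a real system each would sit behind its own service or driver.
USERS_DB = {1: {"name": "Ada"}, 2: {"name": "Linus"}}  # e.g. a relational store
ORDERS_STORE = [  # e.g. a document store
    {"order_id": "a1", "user_id": 1, "total": 30},
    {"order_id": "a2", "user_id": 1, "total": 12},
    {"order_id": "b1", "user_id": 3, "total": 99},  # user missing from USERS_DB
]


@dataclass
class OrderWithUser:
    order_id: str
    user_name: str
    total: int


def join_orders_with_users(orders, users):
    """Application-side equivalent of an inner join on user_id.

    Returns the joined rows plus the ids of orders whose user could not
    be found, since no foreign key spans the two stores.
    """
    joined, orphans = [], []
    for order in orders:
        user = users.get(order["user_id"])
        if user is None:
            orphans.append(order["order_id"])
            continue
        joined.append(OrderWithUser(order["order_id"], user["name"], order["total"]))
    return joined, orphans


joined, orphans = join_orders_with_users(ORDERS_STORE, USERS_DB)
```

Even in this toy form, the application has to decide what to do with rows that reference missing data, a case a single relational database could rule out with a foreign key constraint. With real network calls in place of the dictionaries, partial failures, retries, and stale reads would have to be handled as well.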

Like it or not, microservices architectures, along with their accompanying disintegrated data model, are here to stay. Even if monolithic deployments come back into fashion (e.g. “The Majestic Monolith”), it is very hard to imagine the same happening on the data infrastructure layer.

Technical infrastructure

Wikipedia defines infrastructure as “the set of fundamental facilities and systems that support the sustainable functionality of households and firms. Serving a country, city, or other area, including the services and facilities necessary for its economy to function.” In other words, infrastructure is the common layer of systems that we need to sustain the functionality of our lives and companies. We cannot even begin to think about running a successful business in our society without the fundamental infrastructure that modern cities provide: electricity, fresh water, waste removal, roads and transportation, communication networks, etc.

The evolution of software engineering can be thought of as a continuous process of taking the most common hard problems and turning them into technical infrastructure. Consider some seminal technical projects from recent years: “public cloud” services (such as AWS EC2 or Google Cloud), data infrastructure (such as Apache Spark or Kafka), and networking and messaging (such as gRPC or Envoy). The common thread between these technologies is that they have all become the technical infrastructure for today’s applications: they are the water lines, sewage systems, and electricity grids of our industry.

The Operational Data Graph

We founded Ariga out of frustration with the immense complexity of building software that is backed by a disintegrated data model. Such software is complex to design, build, and manage in production; it is complex to understand, secure, and govern. Ariel and I have always been obsessed with improving the process of building software, looking for ways to push the envelope on how to make hard things simpler, safer, and cheaper.

Together we steward Ent, a Linux Foundation-backed project that is loved by software engineers in companies of all sizes and shapes. Ent is an entity framework for Go that provides application developers with robust, privacy-aware APIs for working with their data. As we near our 10,000th stargazer on GitHub, we are looking to expand Ent’s success to more programming languages and distributed data architectures.

With our team of hard-core data and infrastructure engineers, we are working on a platform focused on developer experience: one that enables developers to manage, access, and maintain a complex data architecture as if it were a single database. We call it the Operational Data Graph.

This platform provides companies with a simple way to build and maintain data topologies that capture all of the company’s data systems in one connected graph. Next, the platform unlocks management capabilities to safely evolve the structure and schemas of the different components in the graph. Finally, developers can automatically expose different parts of the graph as APIs in many popular standards (such as REST/OpenAPI, GraphQL, gRPC, etc.) with our high-performance, privacy-aware access layer.

With this platform, we hope to usher in an era in software engineering where data access becomes an infrastructure problem, allowing developers to once again focus on providing value to their users instead of solving the same distributed systems problems again and again.