Programmatic inspection of databases in Go using Atlas

Database inspection is the process of connecting to a database to extract metadata about the way data is structured inside it. In this post, we will present some use cases for inspecting a database, demonstrate why it is a non-trivial problem to solve, and finally show how it can be solved using Atlas, an open-source package (and command-line tool) written in Go that we are maintaining at Ariga.

As an infrastructure engineer, I have wished many times to have a simple way to programmatically inspect a database. Database schema inspection can be useful for many purposes. For instance, you might use it to create visualizations of data topologies, or to find table columns that are no longer in use and can be deprecated. Perhaps you would like to automatically generate resources from this schema (such as documentation or GraphQL schemas), or to locate fields that might carry personally-identifiable information for compliance purposes. Whatever your use case may be, having a robust way to get the schema of your database is the foundation for many kinds of infrastructure applications.

When we started working on the core engine for Atlas, we quickly discovered that there wasn’t any established tool or package that could parse the information schema of popular databases and return a data structure representing it. Why is this the case? After all, most databases provide some command-line tool to perform inspection. For example, psql, the standard CLI for Postgres, supports the \d command to describe a table:

postgres=# \d users;
                       Table "public.users"
 Column |          Type          | Collation | Nullable | Default
--------+------------------------+-----------+----------+---------
 id     | integer                |           | not null |
 name   | character varying(255) |           |          |
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)

So what makes inspection a non-trivial problem to solve? In this post, I will discuss two aspects that I think are interesting. The first is the variance in how databases expose schema metadata and the second is the complexity of the data model that is required to represent a database schema.

How databases expose schema metadata

Most of the SQL that we use in day-to-day applications is pretty standard. However, when it comes to exposing schema metadata, database engines vary greatly in the way they work. The way to retrieve information about things like available schemas and tables, column types and their default values and many other aspects of the database schema looks completely different in each database engine. For instance, consider this query (source) which can be used to get the metadata about table columns from a Postgres database:

SELECT t1.table_name,
       t1.column_name,
       t1.data_type,
       t1.is_nullable,
       t1.column_default,
       t1.character_maximum_length,
       t1.numeric_precision,
       t1.datetime_precision,
       t1.numeric_scale,
       t1.character_set_name,
       t1.collation_name,
       t1.udt_name,
       t1.is_identity,
       t1.identity_start,
       t1.identity_increment,
       t1.identity_generation,
       col_description(to_regclass("table_schema" || '.' || "table_name")::oid, "ordinal_position") AS comment,
       t2.typtype,
       t2.oid
FROM "information_schema"."columns" AS t1
         LEFT JOIN pg_catalog.pg_type AS t2
                   ON t1.udt_name = t2.typname
WHERE table_schema = $1
  AND table_name IN (%s)
ORDER BY t1.table_name, t1.ordinal_position

As you can see, while it’s definitely possible to get the needed metadata, information about the schema is stored in multiple tables in a way that isn’t particularly well documented, and often requires delving into the actual source code to understand fully. Here’s a query to get similar information from MySQL (source):

SELECT `TABLE_NAME`,
       `COLUMN_NAME`,
       `COLUMN_TYPE`,
       `COLUMN_COMMENT`,
       `IS_NULLABLE`,
       `COLUMN_KEY`,
       `COLUMN_DEFAULT`,
       `EXTRA`,
       `CHARACTER_SET_NAME`,
       `COLLATION_NAME`
FROM `INFORMATION_SCHEMA`.`COLUMNS`
WHERE `TABLE_SCHEMA` = ?
  AND `TABLE_NAME` IN (%s)
ORDER BY `ORDINAL_POSITION`

While this query is much shorter, you can see that it’s completely different from the one we ran to inspect Postgres column metadata. This demonstrates just one way in which inspecting Postgres is different from inspecting MySQL.
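Even the `IN (%s)` slot in the two queries above has to be filled differently per engine: Postgres uses numbered arguments ($1, $2, ...), while MySQL uses `?`. Here is a minimal sketch of that detail. The `placeholders` helper and the dialect names are our own illustration, not part of the Atlas API:

```go
package main

import (
	"fmt"
	"strings"
)

// placeholders returns n comma-separated bind placeholders for the given
// dialect. start is the index of the first placeholder and only matters for
// Postgres-style numbered arguments. This is a hypothetical helper.
func placeholders(dialect string, start, n int) (string, error) {
	var format func(i int) string
	switch dialect {
	case "postgres":
		format = func(i int) string { return fmt.Sprintf("$%d", start+i) }
	case "mysql":
		format = func(int) string { return "?" }
	default:
		return "", fmt.Errorf("unsupported dialect %q", dialect)
	}
	parts := make([]string, n)
	for i := range parts {
		parts[i] = format(i)
	}
	return strings.Join(parts, ", "), nil
}

func main() {
	// In the Postgres query, $1 is the schema name, so table names start at $2.
	pg, _ := placeholders("postgres", 2, 3)
	my, _ := placeholders("mysql", 1, 3)
	fmt.Println(pg) // $2, $3, $4
	fmt.Println(my) // ?, ?, ?
}
```

The result can be substituted into the query text with fmt.Sprintf before the statement is executed with the matching list of arguments.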

Mapping database schemas into a useful data structure

To be a solid foundation for building infrastructure, inspection must produce a useful data structure that can be traversed and analyzed to provide insights; in other words, a graph representing the data topology. As mentioned above, such graphs can be used to create ERD (entity-relation diagram) charts, such as the schema visualizations on the Atlas Management UI:

Schema ERD

Let’s consider some aspects of database schemas that such a data structure should capture:

  • Databases are split into logical schemas.
  • Schemas contain tables, and may have attributes (such as default collation).
  • Tables contain columns, indexes and constraints.
  • Columns are complex entities: they have types, which may be standard to the database engine (and version) or custom data types defined by the user. In addition, columns may have attributes, such as default values, which may be a literal or an expression (it is important to be able to discern between now() and "now()").
  • Indexes contain references to columns of the table they are defined on.
  • Foreign keys contain references to columns in other tables, which may reside in other schemas.
  • …and much, much more!
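To make the now() versus "now()" point concrete, here is a minimal sketch of how a column default could be modeled so that a literal and an expression never get confused. The type names (Expr, Literal, RawExpr, Column) are simplified illustrations of the idea and should not be read as the exact Atlas types:

```go
package main

import (
	"fmt"
	"strings"
)

// Expr is a hypothetical interface for a column's default value.
type Expr interface {
	SQL() string
}

// Literal is a constant value, e.g. the string "now()".
type Literal struct{ V string }

// RawExpr is an expression that the database evaluates, e.g. now().
type RawExpr struct{ X string }

// SQL quotes the literal, doubling any embedded single quotes.
func (l Literal) SQL() string { return "'" + strings.ReplaceAll(l.V, "'", "''") + "'" }

// SQL returns the expression unchanged.
func (r RawExpr) SQL() string { return r.X }

// Column is a simplified column description.
type Column struct {
	Name    string
	Type    string
	Null    bool
	Default Expr // nil means the column has no default
}

// defaultClause renders the DEFAULT clause of a column, or "" if there is none.
func defaultClause(c Column) string {
	if c.Default == nil {
		return ""
	}
	return "DEFAULT " + c.Default.SQL()
}

func main() {
	created := Column{Name: "created_at", Type: "timestamp", Default: RawExpr{X: "now()"}}
	label := Column{Name: "label", Type: "text", Default: Literal{V: "now()"}}
	fmt.Println(defaultClause(created)) // DEFAULT now()
	fmt.Println(defaultClause(label))   // DEFAULT 'now()'
}
```

Keeping the two cases as distinct types means a consumer of the inspected schema can tell them apart with a type switch instead of guessing from the raw string.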

Capturing any one of these aspects boils down to figuring out the correct query for the specific database engine you are working with. Providing developers with a data structure that captures all of them, and doing it well across different versions of multiple database engines, is, as we’ve learned, not an easy task. This is a perfect opportunity for an infrastructure project: a problem that is annoyingly complex to solve and that, if solved well, becomes a foundation for many kinds of applications. This was one of our motivations for creating Atlas (GitHub) - an open-source project that we maintain here at Ariga.

Using Atlas, database schemas can be inspected to produce Go structs representing a graph of the database schema topology. Notice the many cyclic references that make it hard to print (but very ergonomic to traverse :-)):

&schema.Realm{
    Schemas: {
        &schema.Schema{
            Name:   "test",
            Tables: {
                &schema.Table{
                    Name:    "users",
                    Schema:  &schema.Schema{(CYCLIC REFERENCE)},
                    Columns: {
                        &schema.Column{
                            Name: "id",
                            Type: &schema.ColumnType{
                                Type: &schema.IntegerType{
                                    T:        "int",
                                    Unsigned: false,
                                },
                                Null: false,
                            },
                        },
                    },
                    PrimaryKey: &schema.Index{
                        Unique: false,
                        Table:  &schema.Table{(CYCLIC REFERENCE)},
                        Attrs:  nil,
                        Parts:  {
                            &schema.IndexPart{
                                SeqNo: 0,
                                Desc:  false,
                                C:     &schema.Column{(CYCLIC REFERENCE)},
                            },
                        },
                    },
                },
                &schema.Table{
                    Name:    "posts",
                    Schema:  &schema.Schema{(CYCLIC REFERENCE)},
                    Columns: {
                        &schema.Column{
                            Name: "id",
                            Type: &schema.ColumnType{
                                Type: &schema.IntegerType{
                                    T:        "int",
                                    Unsigned: false,
                                },
                                Null: false,
                            },
                        },
                        &schema.Column{
                            Name: "author_id",
                            Type: &schema.ColumnType{
                                Type: &schema.IntegerType{
                                    T:        "int",
                                    Unsigned: false,
                                },
                                Null: true,
                            },
                        },
                    },
                    PrimaryKey: &schema.Index{
                        Unique: false,
                        Table:  &schema.Table{(CYCLIC REFERENCE)},
                        Parts:  {
                            &schema.IndexPart{
                                SeqNo: 0,
                                Desc:  false,
                                C:     &schema.Column{(CYCLIC REFERENCE)},
                            },
                        },
                    },
                    ForeignKeys: {
                        &schema.ForeignKey{
                            Symbol:  "owner_id",
                            Table:   &schema.Table{(CYCLIC REFERENCE)},
                            Columns: {
                                &schema.Column{(CYCLIC REFERENCE)},
                            },
                            RefTable:   &schema.Table{(CYCLIC REFERENCE)},
                            RefColumns: {
                                &schema.Column{(CYCLIC REFERENCE)},
                            },
                            OnDelete: "SET NULL",
                        },
                    },
                },
            },
        },
    },
}
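The back-references are what make traversal convenient: from any foreign key we can reach both tables and their schemas without extra lookups. Here is a minimal sketch that lists the edges of an ERD from such a graph. The types are a trimmed-down imitation of the structure printed above, defined locally so that the example is self-contained; they are not the real ariga.io/atlas/sql/schema types:

```go
package main

import "fmt"

// Simplified stand-ins for the inspected graph (hypothetical types).
type Schema struct {
	Name   string
	Tables []*Table
}

type Table struct {
	Name        string
	Schema      *Schema // back-reference to the parent schema
	Columns     []*Column
	ForeignKeys []*ForeignKey
}

type Column struct{ Name string }

type ForeignKey struct {
	Symbol     string
	Table      *Table
	Columns    []*Column
	RefTable   *Table
	RefColumns []*Column
}

// edges returns one "schema.table.column -> schema.table.column" entry for
// each column pair of each foreign key in the schema.
func edges(s *Schema) []string {
	var out []string
	for _, t := range s.Tables {
		for _, fk := range t.ForeignKeys {
			for i, c := range fk.Columns {
				if i >= len(fk.RefColumns) {
					break
				}
				out = append(out, fmt.Sprintf("%s.%s.%s -> %s.%s.%s",
					t.Schema.Name, t.Name, c.Name,
					fk.RefTable.Schema.Name, fk.RefTable.Name, fk.RefColumns[i].Name))
			}
		}
	}
	return out
}

// sample builds the users/posts graph from the printed example.
func sample() *Schema {
	s := &Schema{Name: "test"}
	userID := &Column{Name: "id"}
	users := &Table{Name: "users", Schema: s, Columns: []*Column{userID}}
	authorID := &Column{Name: "author_id"}
	posts := &Table{Name: "posts", Schema: s, Columns: []*Column{{Name: "id"}, authorID}}
	posts.ForeignKeys = []*ForeignKey{{
		Symbol:     "owner_id",
		Table:      posts,
		Columns:    []*Column{authorID},
		RefTable:   users,
		RefColumns: []*Column{userID},
	}}
	s.Tables = []*Table{users, posts}
	return s
}

func main() {
	for _, e := range edges(sample()) {
		fmt.Println(e) // test.posts.author_id -> test.users.id
	}
}
```

Because each table points back to its schema, the same loop would also report references that cross schema boundaries.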

Inspecting databases in Go using Atlas

While Atlas is commonly used as a CLI tool, all of Atlas’s core-engine capabilities are available as a Go module that you can use programmatically. Let’s get started with database inspection in Go:

To install Atlas, use:

go get ariga.io/atlas@master

Drivers

Atlas currently supports three core capabilities for working with SQL schemas.

  • “Inspection” - Connecting to a database and understanding its schema.
  • “Plan” - Comparing two schemas and producing the set of changes needed to reconcile the target schema with the source schema.
  • “Apply” - Creating a concrete set of SQL queries to migrate the target database.

In this post we will dive into inspection with Atlas. The way inspection is done varies greatly between the different SQL databases. Atlas currently has four supported drivers:

  • MySQL
  • MariaDB
  • PostgreSQL
  • SQLite

Atlas drivers are built on top of the standard library database/sql package. To initialize the different drivers, we need to initialize a sql.DB and pass it to the Atlas driver constructor. For example:

package tutorial

import (
	"database/sql"
	"log"
	"testing"

	"ariga.io/atlas/sql/sqlite"
	_ "github.com/mattn/go-sqlite3"
)

func Test(t *testing.T) {
	// Open a "connection" to sqlite.
	db, err := sql.Open("sqlite3", "file:example.db?cache=shared&_fk=1&mode=memory")
	if err != nil {
		log.Fatalf("failed opening db: %s", err)
	}
	// Open an atlas driver.
	driver, err := sqlite.Open(db)
	if err != nil {
		log.Fatalf("failed opening atlas driver: %s", err)
	}
	// ... do stuff with the driver
	_ = driver // Placeholder use so that the snippet compiles as-is.
}

Inspection

As we mentioned above, inspection is one of Atlas’s core capabilities. Consider the Inspector interface in the sql/schema package:

package schema

// Inspector is the interface implemented by the different database
// drivers for inspecting multiple tables.
type Inspector interface {
	// InspectSchema returns the schema description by its name. An empty name means the
	// "attached schema" (e.g. SCHEMA() in MySQL or CURRENT_SCHEMA() in PostgreSQL).
	// A NotExistError error is returned if the schema does not exist in the database.
	InspectSchema(ctx context.Context, name string, opts *InspectOptions) (*Schema, error)

	// InspectRealm returns the description of the connected database.
	InspectRealm(ctx context.Context, opts *InspectRealmOption) (*Realm, error)
}

As you can see, the Inspector interface provides methods for inspecting at different levels:

  • InspectSchema - provides inspection capabilities for a single schema within a database server.
  • InspectRealm - inspects the entire connected database server.
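One benefit of having inspection behind an interface is that code consuming it does not need a live database in tests. Here is a minimal sketch of that idea using a trimmed-down, hypothetical version of the interface (a single method, no options argument) and an in-memory implementation. None of these names are the real Atlas types:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Schema is a simplified stand-in for an inspected schema.
type Schema struct {
	Name   string
	Tables []string
}

// ErrNotExist is returned when the requested schema is unknown.
var ErrNotExist = errors.New("schema does not exist")

// Inspector is a reduced, hypothetical version of the interface above.
type Inspector interface {
	InspectSchema(ctx context.Context, name string) (*Schema, error)
}

// memInspector serves schemas from memory instead of a database.
type memInspector struct {
	attached string // name used when an empty name is requested
	schemas  map[string]*Schema
}

func (m memInspector) InspectSchema(ctx context.Context, name string) (*Schema, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	if name == "" {
		name = m.attached // an empty name means the "attached schema"
	}
	s, ok := m.schemas[name]
	if !ok {
		return nil, fmt.Errorf("%w: %q", ErrNotExist, name)
	}
	return s, nil
}

// tableCount depends only on the interface, not on a concrete driver.
func tableCount(i Inspector, name string) (int, error) {
	s, err := i.InspectSchema(context.Background(), name)
	if err != nil {
		return 0, err
	}
	return len(s.Tables), nil
}

// isNotExist reports whether err wraps ErrNotExist.
func isNotExist(err error) bool { return errors.Is(err, ErrNotExist) }

func newMemInspector() Inspector {
	return memInspector{
		attached: "main",
		schemas:  map[string]*Schema{"main": {Name: "main", Tables: []string{"users", "posts"}}},
	}
}

func main() {
	ins := newMemInspector()
	n, err := tableCount(ins, "")
	fmt.Println(n, err) // 2 <nil>
	_, err = tableCount(ins, "missing")
	fmt.Println(isNotExist(err)) // true
}
```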

Each database driver (for example MySQL, Postgres or SQLite) implements this interface. Let’s see how we can use this interface by inspecting a “dummy” SQLite database. Continuing on the example from above:

package tutorial

import (
	"context"
	"database/sql"
	"log"
	"testing"

	"ariga.io/atlas/sql/schema"
	"ariga.io/atlas/sql/sqlite"
	_ "github.com/mattn/go-sqlite3"
	"github.com/stretchr/testify/require"
)

func TestInspect(t *testing.T) {
	// Open a "connection" to sqlite, as in the previous example.
	db, err := sql.Open("sqlite3", "file:example.db?cache=shared&_fk=1&mode=memory")
	if err != nil {
		log.Fatalf("failed opening db: %s", err)
	}
	ctx := context.Background()
	// Create an "example" table for Atlas to inspect.
	_, err = db.ExecContext(ctx, "create table example ( id int not null );")
	if err != nil {
		log.Fatalf("failed creating example table: %s", err)
	}
	// Open an atlas driver.
	driver, err := sqlite.Open(db)
	if err != nil {
		log.Fatalf("failed opening atlas driver: %s", err)
	}
	// Inspect the created table.
	sch, err := driver.InspectSchema(ctx, "main", &schema.InspectOptions{
		Tables: []string{"example"},
	})
	if err != nil {
		log.Fatalf("failed inspecting schema: %s", err)
	}
	tbl, ok := sch.Table("example")
	require.True(t, ok, "expected to find example table")
	require.EqualValues(t, "example", tbl.Name)
	id, ok := tbl.Column("id")
	require.True(t, ok, "expected to find id column")
	require.EqualValues(t, &schema.ColumnType{
		Type: &schema.IntegerType{T: "int"}, // An integer type, specifically "int".
		Null: false,                         // The column has NOT NULL set.
		Raw:  "INT",                         // The raw type inspected from the DB.
	}, id.Type)
}

The full source code for this example is available in the atlas-examples repo.

And voila! In this example, we first created a table named “example” by executing a query directly against the database. Next, we used the driver’s InspectSchema method to inspect the schema of the table we created. Finally, we made some assertions on the returned schema.Table instance to verify that it was inspected correctly.

Inspecting using the CLI

If you don’t want to write any code and just want to get a document representing your database schema, you can always use the Atlas CLI to do it for you. To get started, head over to the docs.

Wrapping up

In this post we presented the Go API of Atlas, which we initially built around our use case of building a new database migration tool, as part of the Operational Data Graph Platform that we are creating here at Ariga. As we mentioned in the beginning of this post, there are a lot of cool things you can build if you have proper database inspection, which raises the question, what will you build with it?

Getting involved with Atlas

Announcing Atlas v0.3.2: multi-schema support

Last week we released v0.3.2 of the Atlas CLI.

Atlas is an open source tool that helps developers manage their database schemas. Atlas plans database migrations for you based on your desired state. The two main commands are inspect and apply. The inspect command inspects your database, and the apply command runs a migration based on an HCL document that describes your desired state.

The most notable change in this version is the ability to interact with multiple schemas in both database inspection and migration (the apply command).

Some other interesting features include:

  • schema apply --dry-run - running schema apply in dry-run mode connects to the target database and prints the SQL migration to bring the target database to the desired state without prompting the user to approve it.
  • schema fmt - adds basic formatting capabilities to .hcl files.
  • schema diff - Connects to two given databases, inspects them, calculates the difference in their schemas, and prints a plan of SQL statements needed to migrate the “from” database to the state of the “to” database.

In this post we will explore the topic of multi-schema support. We will start with a brief explanation of database schemas, and then present the difference between how MySQL and PostgreSQL treat “schemas”. We will then show how the existing schema inspect and schema apply commands work with multi-schema support, and wrap up with some plans for future releases.

What is a database schema?

Within the context of relational (SQL) databases, a database schema is a logical unit within a physical database instance (server/cluster) that forms a namespace of sorts. Inside each schema you can describe the structure of the tables, relations, indexes and other attributes that belong to it. In other words, the database schema is a “blueprint” of the data structure inside a logical container (Note: in Oracle databases a schema is linked to the user, so it carries a different meaning which is out of scope for this post). As you can guess from the title of this post, many popular relational databases allow users to host multiple (logical) schemas on the same (physical) database.

Where are database schemas used in practice?

Why is this level of logical division necessary? Isn’t it enough to be able to physically split data into different database instances? In my career, I’ve seen multiple scenarios in which organizations opt to split a database into multiple schemas.

First, grouping different parts of your application into logical units makes it simpler to reason about and govern. For instance, it is possible to create multiple user accounts in our database and give each of them permission to access a subset of the schemas in the database. This way, each user can only touch the parts of the database they need, preventing the practice of creating an almighty super-user account that has no permission boundary.

An additional pattern I’ve seen used is in applications with a multi-tenant architecture, where each tenant has its own schema with the exact same table structure (or some might have a different structure since they use different versions of the application). This pattern is used to create a stronger boundary between the different tenants (customers), preventing the scenario where one tenant accidentally has access to another’s data that is incidentally hosted on the same machine.

Another useful feature of schemas is the ability to divide the same server into different environments for different development states. For example, you can have a “dev” and “staging” schema inside the same server.

What are the differences between schemas in MySQL and PostgreSQL?

A common source of confusion for developers (especially when switching teams or companies) is the difference between the meaning of schemas in MySQL and PostgreSQL. Both are currently supported by Atlas, and have some differences that should be clarified.

Looking at the MySQL glossary, it states:

“In MySQL, physically, a schema is synonymous with a database. You can substitute the keyword SCHEMA instead of DATABASE in MySQL SQL syntax, for example using CREATE SCHEMA instead of CREATE DATABASE”

As we can see, MySQL doesn’t distinguish between schemas and databases in the terminology, but the underlying meaning is still the same - a logical boundary for resources and permissions.

To demonstrate this, open your favorite MySQL shell and run:

mysql> create schema atlas;
Query OK, 1 row affected (0.00 sec)

To create a table in our new schema, assuming we have the required permissions, we can switch to the context of the schema that we just created, and create a table:

USE atlas;
CREATE TABLE some_name (
    id int not null
);

Alternatively, we can prefix the schema, by running:

CREATE TABLE atlas.cli_versions
(
    id      bigint auto_increment primary key,
    version varchar(255) not null
);

This prefix is important since, as we said, schemas are logical boundaries (unlike database servers). Therefore, we can create references between them using foreign keys from tables in SchemaA to SchemaB. Let’s demonstrate this by creating another schema with a table and connecting it to a table in the atlas schema:

CREATE SCHEMA atlantis;

CREATE TABLE atlantis.ui_versions
(
    id               bigint auto_increment
        primary key,
    version          varchar(255) not null,
    atlas_version_id bigint       null,
    constraint ui_versions_atlas_version_uindex
        unique (atlas_version_id)
);

Now let’s link atlantis to atlas:

alter table atlantis.ui_versions
    add constraint ui_versions_cli_versions_id_fk
        foreign key (atlas_version_id) references atlas.cli_versions (id)
            on delete cascade;

That’s it! We’ve created 2 tables in 2 different schemas with a reference between them.

How does PostgreSQL treat schemas?

When booting a fresh PostgreSQL server, we get a default logical schema called “public”. If you wish to split your database into logical units as we’ve shown with MySQL, you can create a new schema:

CREATE SCHEMA atlas;

Unlike MySQL, Postgres provides an additional level of abstraction: databases.
In Postgres, a single physical server can host multiple databases. Unlike schemas (which are basically the same as in MySQL), a table in one PostgreSQL database cannot reference a table in another.

In Postgres, the following statement will create an entirely new database, where we can place different schemas and tables that may contain references between them:

create database releases;

When we run this statement, the database will be created with the default Postgres metadata tables and the default public schema.

In Postgres, you can grant permissions on entire databases, schemas, and/or tables, and of course on other objects in the Postgres schema.

Another distinction from MySQL is that in addition to sufficient permissions, a user must have the schema name inside their search_path in order to use it without a prefix.
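To illustrate how search_path affects name resolution, here is a minimal sketch of the lookup: an unqualified table name resolves to the first schema on the path that contains it. This is a simplified model written for this post (it ignores permissions and special entries such as "$user" or implicit system schemas), and the `resolve` function is our own invention:

```go
package main

import "fmt"

// resolve returns the schema-qualified name of an unqualified table by
// scanning the schemas in searchPath in order. tables maps a schema name to
// the tables it contains. Hypothetical helper for illustration only.
func resolve(table string, searchPath []string, tables map[string][]string) (string, bool) {
	for _, s := range searchPath {
		for _, t := range tables[s] {
			if t == table {
				return s + "." + table, true
			}
		}
	}
	return "", false
}

func main() {
	tables := map[string][]string{
		"public": {"users"},
		"atlas":  {"cli_versions"},
	}
	// With only "public" on the path, cli_versions cannot be found unprefixed.
	_, ok := resolve("cli_versions", []string{"public"}, tables)
	fmt.Println(ok) // false

	// After adding "atlas" to the path, the unprefixed name resolves.
	name, _ := resolve("cli_versions", []string{"public", "atlas"}, tables)
	fmt.Println(name) // atlas.cli_versions
}
```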

To sum up, both MySQL and Postgres allow the creation of separate logical schemas within a physical database server, and schemas can refer to one another via foreign keys. PostgreSQL supports an additional level of separation by allowing users to create completely different databases on the server.

Atlas multi-schema support

As we have shown, having multiple schemas in the same database is a common scenario with popular relational databases. Previously, the Atlas CLI only supported inspecting or applying changes to a single schema (even though this has been long supported in the Go API). With this release, we have added support for inspecting and applying multiple schemas with a single .hcl file.

Next, let’s demonstrate how we can use the Atlas CLI to inspect and manage a database with multiple schemas.

Start by downloading and installing the latest version of the CLI. For the purpose of this demo, we will start with a fresh MySQL database running in a local Docker container:

docker run --name atlas-db  -p 3306:3306 -e MYSQL_ROOT_PASSWORD=pass -e MYSQL_DATABASE=example mysql:8

By passing example in the MYSQL_DATABASE environment variable, a new schema named “example” is created. Let’s verify this by using the atlas schema inspect command. In previous versions of Atlas, users had to specify the schema name as part of the DSN for connecting to the database, for example:

atlas schema inspect -d "mysql://root:pass@tcp(localhost:3306)/example" 

Starting with v0.3.2, users can omit the schema name from the DSN to instruct Atlas to inspect the entire database. Let’s try this:

$ atlas schema inspect -d "mysql://root:pass@tcp(localhost:3306)/" > atlas.hcl
$ cat atlas.hcl
schema "example" {
  charset   = "utf8mb4"
  collation = "utf8mb4_0900_ai_ci"
}
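The difference between the two invocations is only the part of the DSN after the last slash. Here is a minimal sketch of how a tool could tell "inspect one schema" apart from "inspect everything". The `schemaFromDSN` function is a hypothetical illustration and is not how the Atlas CLI actually parses its input:

```go
package main

import (
	"fmt"
	"strings"
)

// schemaFromDSN returns the schema name at the end of a DSN such as
// "mysql://root:pass@tcp(localhost:3306)/example". An empty result means
// that no schema was named. Hypothetical helper for illustration only.
func schemaFromDSN(dsn string) string {
	rest := dsn
	// Drop the "scheme://" prefix so that its slashes are not counted.
	if i := strings.Index(rest, "://"); i >= 0 {
		rest = rest[i+3:]
	}
	// Drop any query parameters.
	if i := strings.Index(rest, "?"); i >= 0 {
		rest = rest[:i]
	}
	i := strings.LastIndex(rest, "/")
	if i < 0 {
		return ""
	}
	return rest[i+1:]
}

func main() {
	fmt.Printf("%q\n", schemaFromDSN("mysql://root:pass@tcp(localhost:3306)/example")) // "example"
	fmt.Printf("%q\n", schemaFromDSN("mysql://root:pass@tcp(localhost:3306)/"))        // ""
}
```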

Let’s verify that this works correctly by editing the atlas.hcl that we have created above and adding a new schema:

schema "example" {
  charset   = "utf8mb4"
  collation = "utf8mb4_0900_ai_ci"
}
schema "example_2" {
  charset   = "utf8mb4"
  collation = "utf8mb4_0900_ai_ci"
}

Next, we will use the schema apply command to apply our changes to the database:

atlas schema apply -d "mysql://root:pass@tcp(localhost:3306)/" -f atlas.hcl

Atlas plans a migration to add the new DATABASE (recall that in MySQL DATABASE and SCHEMA are synonymous) to the server. When prompted to approve the migration, we choose “Apply”:

-- Planned Changes:
-- Add new schema named "example_2"
CREATE DATABASE `example_2`
✔ Apply
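Conceptually, the plan above is the difference between the schemas that exist on the server and the schemas declared in the HCL file. Here is a minimal sketch of that comparison for schema additions only. The real planner in Atlas handles far more (tables, columns, indexes, drops, and so on), and `planSchemas` is a hypothetical function written for this post:

```go
package main

import "fmt"

// planSchemas returns the MySQL statements needed to create every schema
// that appears in desired but not in current. Only additions are handled.
func planSchemas(current, desired []string) []string {
	exists := make(map[string]bool, len(current))
	for _, s := range current {
		exists[s] = true
	}
	var stmts []string
	for _, s := range desired {
		if !exists[s] {
			stmts = append(stmts, fmt.Sprintf("CREATE DATABASE `%s`", s))
			exists[s] = true // Guard against duplicates in desired.
		}
	}
	return stmts
}

func main() {
	current := []string{"example"}
	desired := []string{"example", "example_2"}
	for _, stmt := range planSchemas(current, desired) {
		fmt.Println(stmt) // CREATE DATABASE `example_2`
	}
}
```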

To verify that schema inspect works properly with multiple schemas, let’s re-run:

atlas schema inspect -d "mysql://root:pass@tcp(localhost:3306)/"

Observe that both schemas are inspected:

schema "example" {
  charset   = "utf8mb4"
  collation = "utf8mb4_0900_ai_ci"
}
schema "example_2" {
  charset   = "utf8mb4"
  collation = "utf8mb4_0900_ai_ci"
}

To learn more about the different options for working with multiple schemas in inspect and apply commands, consult the CLI Reference Docs.

What’s next for multi-schema support?

I hope you agree that multi-schema support is a great improvement to the Atlas CLI, but there is more to come in this area. In our previous blog post we shared that Atlas also has a Management UI (-w option in the CLI), and multi-schema support is not present there yet - stay tuned for updates on multi-schema support for the UI in an upcoming release!

Getting involved with Atlas

Announcing Atlas v0.3.0: A UI-powered schema migration experience

Earlier this week we released v0.3.0 of the Atlas CLI. This version features a ton of improvements to database inspection, diffing and migration planning. You can read about those in the release notes page, but we wanted to take the time and introduce the biggest feature in this release, the Atlas Management UI.

To recap, Atlas is an open source CLI tool that helps developers manage their database schemas. Contrary to existing tools, Atlas intelligently plans schema migrations for you, based on your desired state. Atlas currently has two main commands: inspect and apply. The inspect command inspects your database, generating an Atlas HCL document. The apply command allows you to migrate your schema from its current state in the database to your desired state by providing an HCL file with the relevant schema.

In this post we will showcase the latest addition to the CLI’s feature set, the Management UI. Until now, you could use Atlas to manage your schemas via your terminal. While this is the common interface for many infrastructure management workflows, we believe that a visual, integrated environment can be beneficial in many use-cases.

Inspecting our database using the Atlas UI

Let’s see how we can use the Atlas UI to inspect our database.

For the purpose of demonstration let’s assume that you have a locally running MySQL database. If you want to follow along, check out the Setting Up tutorial on the Atlas website for instructions on starting up a MySQL database locally using Docker.

We will be working with a MySQL database that has the following tables:

CREATE TABLE users (
    id int PRIMARY KEY,
    name varchar(100)
);
CREATE TABLE blog_posts (
    id int PRIMARY KEY,
    title varchar(100),
    body text,
    author_id int,
    FOREIGN KEY (author_id) REFERENCES users(id)
);

To inspect the database, we can use the atlas schema inspect command. Starting with this version, we can add the -w flag to open the (local) web UI:

atlas schema inspect -d "mysql://root:pass@tcp(localhost:3306)/example" -w

Our browser will open automatically, and we should see this output in the CLI:

Atlas UI available at: http://127.0.0.1:5800/projects/25769803777/schemas/1
Press Ctrl+C to stop

inspect_image

We can see that our schema has been inspected, and that it’s currently synced. On the bottom-left part of the screen the UI displays an ERD (Entity-relation Diagram) showing the different tables and the connections between them (via foreign-keys). On the bottom-right, we can see the current schema, described using the Atlas DDL. In addition, on the top-right, we see the “Activity & History” panel that holds an audit history for all changes to our schema.

Migrating our database schema with the Atlas Management UI

Visualizing the current schema of the database is great. Now let’s see how we can use the UI to initiate a change (migration) to our schema.

Click on the Edit Schema button in the top-right corner and add the following two tables to our schema:

table "categories" {
  schema = schema.example
  column "id" {
    null = false
    type = int
  }
  column "name" {
    null = true
    type = varchar(100)
  }
  primary_key {
    columns = [table.categories.column.id, ]
  }
}

table "post_categories" {
  schema = schema.example
  column "post_id" {
    type = int
  }
  column "category_id" {
    type = int
  }
  foreign_key "post_category_post" {
    columns     = [table.post_categories.column.post_id, ]
    ref_columns = [table.blog_posts.column.id, ]
  }
  foreign_key "post_category_category" {
    columns     = [table.post_categories.column.category_id, ]
    ref_columns = [table.categories.column.id, ]
  }
}

Click the Save button and go back to the schema page. Observe that a few things changed on the screen:

The UI after saving

First, we can see that the UI states that our schema is “Out of Sync”. This is because there is a difference between our desired schema, the one we are currently working on, and the inspected schema, which is the actual, current schema of our database.

Second, we can see that our ERD has changed, reflecting the addition of the categories and post_categories tables to our schema. The two added tables are now shown in green. By clicking the “expand” icon on the top-right corner of the ERD panel, we can open a more detailed view of our schema.

ERD displaying diff

Going back to our schema page, click “Migrate Schema” to initiate a migration that applies the changes we want to make to our schema. Next, Atlas will set up the migration. Click “Plan Migration” to see the migration plan to get to the desired schema:

Migration Prep

Atlas displays the diff in the schema in HCL on the left pane, and the planned SQL statements on the right. Click “Apply Migration” to begin executing the plan.

Migration Plan

In the final screen of the migration flow, Atlas displays informative logs about the migration process. In this case, our migration completed successfully! Let’s click “Done” to return to the schema detail page.

Applying Migration

As expected, after executing our migration plan, our database and desired schema are now synced!

Post Migrations

Wrapping Up

In this post, we’ve introduced the Atlas Management UI and showed one of the possible workflows that are supported in it. There’s much more inside, and we invite you to install it today and give it a try.

What next?

Meet Atlas CLI: Inspect and Apply changes to your database schema

At Ariga, we are building a new kind of platform that we call an Operational Data Graph. This platform enables software engineers to manage, maintain and access complex data architectures as if they were one database. Today, we are open-sourcing a CLI for Atlas, one of the fundamental building blocks of our platform.

During my career, the scope of what is expected of me as a software engineer has increased significantly. Developers are no longer expected just to write code; we are expected to provision infrastructure, manage databases, define deployments and monitor systems in production.

Nowadays, one of the responsibilities we have as software engineers is to manage the database schema of our applications. Once seen as falling strictly under the domain of DBAs, today developers everywhere are responsible for defining database schemas and changing them over time. Because an application’s database carries its state, all clients and servers are severely impacted if it stops functioning properly. Therefore, over the years many techniques and tools were developed to deal with this process, which is called migrating the database.

In the last few years we have seen a lot of progress in the field of tools for provisioning infrastructure. From early projects such as Chef and Puppet to more recent work such as Terraform, the industry has put a lot of thought and effort into building tools that simplify and standardize the process. The common thread between all of these projects is that, instead of relying on manual installation and configuration of software and services, they are based on machine-readable definition files, a concept also known as infrastructure-as-code (IaC).

Enter: Atlas

Atlas is at the core of Ariga’s platform. In this post, I would like to share with you the work we’ve done so far to provide a solid foundation for managing databases in a way that’s akin to infrastructure-as-code practices.

  • The Atlas DDL (Data-definition Language): we have created the Atlas DDL, a new configuration language designed to capture an organization’s data topology, including relational database schemas. This language is currently described in an HCL syntax (similar to Terraform), but will support more syntaxes such as JSON and TypeScript in the future. The Atlas DDL currently supports defining schemas for SQL databases such as MySQL, Postgres, SQLite and MariaDB, but in the future, we plan to add support for other types of databases. For example:
table "users" {
  schema = "default"
  column "id" {
    type = "int"
  }
  column "name" {
    type = "string"
  }
  column "manager_id" {
    type = "int"
  }
  primary_key {
    columns = [
      table.users.column.id
    ]
  }
  index "idx_name" {
    columns = [
      table.users.column.name
    ]
    unique = true
  }
  foreign_key "manager_fk" {
    columns = [table.users.column.manager_id]
    ref_columns = [table.users.column.id]
    on_delete = "CASCADE"
    on_update = "NO ACTION"
  }
}
  • The Atlas CLI: on top of the building blocks provided by the DDL, we started building our CLI tool to support the two most basic functions:

    • “Schema Inspect” - Create a schema specification file from a database.
    • “Schema Apply” - Migrate a database to a new desired state.

Many infrastructure-as-code projects have taken the declarative approach, in which the developer articulates the desired state of the system and the tool is responsible for figuring out a plan to get there. As we discussed above, changing database schemas safely is a delicate practice, so we had to build the Atlas CLI to be smart enough to understand the nuance of changes for each type of database.
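
To make the declarative idea concrete, here is a minimal sketch in Go of what "figuring out a plan" can mean: compare the current state with the desired state and list the changes needed to get from one to the other. The Table and Column types and the planChanges function are hypothetical names invented for this illustration. They are not the types or the diffing logic that Atlas uses, which must also handle indexes, foreign keys and database-specific nuances.

```go
package main

import (
	"fmt"
	"sort"
)

// Column and Table are hypothetical, simplified types for this sketch;
// they are not the types used by Atlas.
type Column struct {
	Name string
	Type string
}

type Table struct {
	Name    string
	Columns []Column
}

// planChanges compares the current state with the desired state and
// returns a human-readable list of the changes needed to reach the
// desired state. An empty result means the two states are in sync.
func planChanges(current, desired map[string]Table) []string {
	var plan []string
	for name, want := range desired {
		have, ok := current[name]
		if !ok {
			plan = append(plan, "add table "+name)
			continue
		}
		haveCols := make(map[string]Column, len(have.Columns))
		for _, c := range have.Columns {
			haveCols[c.Name] = c
		}
		wantCols := make(map[string]bool, len(want.Columns))
		for _, c := range want.Columns {
			wantCols[c.Name] = true
			old, exists := haveCols[c.Name]
			switch {
			case !exists:
				plan = append(plan, fmt.Sprintf("add column %s.%s", name, c.Name))
			case old.Type != c.Type:
				plan = append(plan, fmt.Sprintf("modify column %s.%s", name, c.Name))
			}
		}
		for _, c := range have.Columns {
			if !wantCols[c.Name] {
				plan = append(plan, fmt.Sprintf("drop column %s.%s", name, c.Name))
			}
		}
	}
	for name := range current {
		if _, ok := desired[name]; !ok {
			plan = append(plan, "drop table "+name)
		}
	}
	// Map iteration order is random, so sort to get a stable listing.
	// A real planner would order changes by dependency instead.
	sort.Strings(plan)
	return plan
}

func main() {
	current := map[string]Table{}
	desired := map[string]Table{
		"users": {Name: "users", Columns: []Column{{"id", "int"}, {"name", "string"}}},
	}
	fmt.Println(planChanges(current, desired)) // [add table users]
	fmt.Println(planChanges(desired, desired)) // []
}
```

The developer only ever edits the desired state. The plan is derived from it, and planning against a state that already matches yields nothing to do.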

Atlas in action

Let’s see how Atlas CLI works with real databases. Let’s start a MySQL container:

docker run --name atlas-db -p 3306:3306 -e MYSQL_ROOT_PASSWORD=pass -e MYSQL_DATABASE=example mysql:8.0.27

Connect to our database using a native client to validate:

docker exec -it atlas-db mysql --password='pass' example
mysql> show tables;
Empty set (0.00 sec)

mysql>

Let’s see how Atlas inspects it:

atlas schema inspect -d "mysql://root:pass@tcp(localhost:3306)/example" > atlas.hcl

As expected, an empty schema:

# cat atlas.hcl
schema "example" {
}

Let’s update our schema to:

# cat atlas.hcl
table "users" {
  schema = "example"
  column "id" {
    null = false
    type = "int"
  }
  column "name" {
    null = false
    type = "string"
    size = 255
  }
  column "manager_id" {
    null = false
    type = "int"
  }
  primary_key {
    columns = [table.users.column.id, ]
  }
  foreign_key "manager_fk" {
    columns     = [table.users.column.manager_id, ]
    ref_columns = [table.users.column.id, ]
    on_update   = "NO ACTION"
    on_delete   = "CASCADE"
  }
  index "idx_name" {
    unique  = true
    columns = [table.users.column.name, ]
  }
  index "manager_fk" {
    unique  = false
    columns = [table.users.column.manager_id, ]
  }
}
schema "example" {
}

And apply our changes!

atlas schema apply -d "mysql://root:pass@tcp(localhost:3306)/example" -f atlas.hcl



-- Planned Changes:
-- Add Table : users
CREATE TABLE `example`.`users` (`id` int NOT NULL, `name` varchar(255) NOT NULL, `manager_id` int NOT NULL, PRIMARY KEY (`id`), UNIQUE INDEX `idx_name` (`name`), CONSTRAINT `manager_fk` FOREIGN KEY (`manager_id`) REFERENCES `example`.`users` (`id`) ON UPDATE NO ACTION ON DELETE CASCADE) ;
Use the arrow keys to navigate: ↓ ↑ → ←
? Are you sure?:
  ▸ Apply
    Abort

Of course we are sure!

Using the MySQL client to examine our database:

mysql> describe users;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| id         | int          | NO   | PRI | NULL    |       |
| name       | varchar(255) | NO   | UNI | NULL    |       |
| manager_id | int          | NO   | MUL | NULL    |       |
+------------+--------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql>

Let’s make sure that it has the FK:

mysql> show create table users;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                                                                                                                                                            |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| users | CREATE TABLE `users` (
  `id` int NOT NULL,
  `name` varchar(255) NOT NULL,
  `manager_id` int NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_name` (`name`),
  KEY `manager_fk` (`manager_id`),
  CONSTRAINT `manager_fk` FOREIGN KEY (`manager_id`) REFERENCES `users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql>

Now let’s see that Atlas inspects this correctly:

atlas schema inspect -d "mysql://root:pass@tcp(localhost:3306)/example" > atlas.hcl
# cat atlas.hcl
table "users" {
  schema = "example"
  column "id" {
    null = false
    type = "int"
  }
  column "name" {
    null = false
    type = "string"
    size = 255
  }
  column "manager_id" {
    null = false
    type = "int"
  }
  primary_key {
    columns = [table.users.column.id, ]
  }
  foreign_key "manager_fk" {
    columns     = [table.users.column.manager_id, ]
    ref_columns = [table.users.column.id, ]
    on_update   = "NO ACTION"
    on_delete   = "CASCADE"
  }
  index "idx_name" {
    unique  = true
    columns = [table.users.column.name, ]
  }
  index "manager_fk" {
    unique  = false
    columns = [table.users.column.manager_id, ]
  }
}
schema "example" {
}

Let’s see what happens when we try to reapply the same change:

atlas schema apply -d "mysql://root:pass@tcp(localhost:3306)/example" -f atlas.hcl
Schema is synced, no changes to be made

In this example we have shown how we can inspect a MySQL database schema and apply a change.
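
The planned statement printed during the apply step is, at its simplest, a table description rendered as SQL. The sketch below shows that rendering step in Go for columns and a primary key only. The Column type and the createTable function are hypothetical names made up for this illustration and are not how Atlas generates its statements. The sketch also assumes identifiers contain no backticks and that column types are already valid MySQL types.

```go
package main

import (
	"fmt"
	"strings"
)

// Column is a hypothetical, simplified column description for this sketch.
type Column struct {
	Name string
	Type string // assumed to already be a MySQL type, e.g. "int" or "varchar(255)"
	Null bool   // whether the column accepts NULL values
}

// createTable renders a MySQL CREATE TABLE statement for the given columns
// and primary key. Identifiers are wrapped in backticks without escaping,
// so names containing backticks are not supported.
func createTable(schema, table string, cols []Column, pk []string) string {
	parts := make([]string, 0, len(cols)+1)
	for _, c := range cols {
		def := fmt.Sprintf("`%s` %s", c.Name, c.Type)
		if !c.Null {
			def += " NOT NULL"
		}
		parts = append(parts, def)
	}
	if len(pk) > 0 {
		quoted := make([]string, len(pk))
		for i, name := range pk {
			quoted[i] = "`" + name + "`"
		}
		parts = append(parts, "PRIMARY KEY ("+strings.Join(quoted, ", ")+")")
	}
	return fmt.Sprintf("CREATE TABLE `%s`.`%s` (%s)", schema, table, strings.Join(parts, ", "))
}

func main() {
	cols := []Column{
		{Name: "id", Type: "int"},
		{Name: "name", Type: "varchar(255)"},
		{Name: "manager_id", Type: "int"},
	}
	fmt.Println(createTable("example", "users", cols, []string{"id"}))
}
```

Running it prints a statement with the same shape as the planned change above, minus the unique index and the foreign key constraint, which this sketch leaves out.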

What’s Next?

The Atlas DDL opens up a world of tools and services, and with the help of our community, we are planning to push the development ecosystem forward. A list of tools that are on our road map includes:

  • Integrations with Terraform, GitHub Actions and Kubernetes.
  • Extended migration logic, such as renaming columns, adding or dropping nullability and altering enums.
  • Toolsets for examining the migration history and reproducing it.

We hope that you find Atlas CLI as exciting as we do, and we invite you to contribute your ideas and code.

Two GitHub features that make bug reporting and triaging easier than ever

At Ariga, our software solution is built heavily on top of Ent. Ent is a simple yet powerful entity framework (ORM) for Go that makes it easy to build and maintain applications with large data models. Ent makes it possible to define any data model or graph structure in Go code easily. Ent generates an idiomatic and statically-typed API for working with databases that keeps Go developers productive and happy. As we rely on Ent for building our Operational Data Graph platform, we are deeply invested in the project and are committed to making it and its community successful.

In this blog post, we would like to share with you our success story using two GitHub tools, template repositories and Codespaces, that help us maintain Ent by improving contributor productivity and the overall velocity of the project.

Our Story

As a project maintainer, it is in your best interest to ensure that your users are happy and receiving the help they need. By addressing bug reports and solving them, you are able to improve the quality of the project as well as assist users. The optimal way to receive a bug report is in the form of a reproducible example. Generally, this requires receiving code that is runnable in order to see a live example of the problem.

On GitHub, it is common practice to ask the user to open an issue and provide a temporary repository with the code to reproduce the bug. In Ent, there is a special bug reporting issue template that guides the user on how to properly describe the problem. Using the issue template makes the project maintainer’s life easy by ensuring that each issue includes all the necessary information and is written in an organized manner.

Once the project maintainer gets hold of the repository, they can clone it locally and start debugging, rather than create a new project from scratch to try to reproduce the issue. However, this method can be risky. Although most users are friendly and don’t have bad intentions, you should never trust too easily. You don’t know what’s in the code you are about to execute and what harm it may do. It is always possible to check the code, or to run it in an isolated environment, yet it takes time and effort to go to these lengths to ensure your safety.

In August 2021, GitHub released a new feature, Codespaces. GitHub Codespaces is a tool that offers cloud-powered shared development environments right from within a web browser, giving access to an entire Visual Studio Code experience. Directly from any repository you are able to run your code without any installations, as if you were fully configured and running on your own local machine.

Soon after Codespaces was released, Ariel (Ariga’s co-founder and Ent’s creator) decided to give it a try. He opened a repository provided by a user, intending to debug an issue, straight in Codespaces (you can view the issue here). With a click of a button, without any prior setup needed, he was in a completely ready-to-go environment and was able to start debugging right away! Within a minute, the bug was triaged and Ariel was able to get back to the user with the answer. This is any project maintainer’s dream! You want to be able to address as many issues as possible, diligently, and in a timely manner, without having to risk your own safety in the process.

Taking this idea a step further, I thought about using template repositories to refine bug reporting even more. Template repositories are a feature designed to “make boilerplate code management and distribution a first-class citizen on GitHub”. These repositories are used as templates across multiple projects for simple code reuse, and they enable developers to easily create new repositories with the same directory structure, branches and files. Generally, template repositories are used as a way to kickstart projects quickly; however, the Ent project came up with another use for them.

Ent launched the new ent/bug template repository. This template has Ent pre-installed, a basic user schema, and test connections to different databases. Now users can easily create a new repository from the bug template and add in their code. This way, users can share the necessary code for reproducing the bug, without having to spend much time on setting up a new project.

By utilizing the strengths of both Codespaces and template repositories, we have made both bug reporting and triaging much easier tasks. Using both of these tools is a win-win situation for project maintainers and the users alike. Project maintainers can now significantly cut down the time required for setting up environments to triage bugs, making the entire process much smoother, faster and safer. Similarly, users now have the ability to save time and effort and are able to consistently provide reproducible bug reports.

We highly encourage you to try using these tools in your own project and see what life changers they are!

Feel free to find me on the #ent channel on the Gophers Slack workspace if you want to discuss this topic or ask any further questions. I’d love to hear from you!