Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 38 additions & 0 deletions .github/workflows/pull_request.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
name: Pull Request

on:
pull_request:
branches:
- master

jobs:
check_migrations:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- name: Changed files
id: changed_files
uses: tj-actions/changed-files@ed68ef82c095e0d48ec87eccea555d944a631a4c
with:
since_last_remote_commit: true
files: '**/migrations/*.py'

- name: Link to migration docs
if: steps.changed_files.outputs.any_changed == 'true'
uses: mozilla/addons/.github/actions/pr-comment@main
env:
docs_base_url: https://mozilla.github.io/addons-server
database_path: /topics/development/database.html
with:
repo: ${{ github.repository }}
github_token: ${{ secrets.GITHUB_TOKEN }}
pr: ${{ github.event.pull_request.number }}
edit_last: true
body: |
## ⚠️ Data migration detected ⚠️

It looks like there are data migrations in your PR.
Please **read our policies** on database migrations before merging.

[Migration Docs](${{ env.docs_base_url }}${{ env.database_path }})
133 changes: 133 additions & 0 deletions docs/topics/development/database.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
# Database

Information about the database and how to work with it.

## Migrations

MySQL does not support transactional DDL (Data Definition Language statements like `ALTER TABLE`, `CREATE TABLE`, etc.).

Therefore, Django migrations run against MySQL are not truly atomic, even if `atomic = True` is set on the migration class.

Here's what that means:

- Implicit Commits: MySQL automatically commits the transaction before and after most DDL statements.
- Partial Application: If a migration involves multiple operations (e.g., adding a field, then creating an index) and fails during the second operation, the first operation (adding the field) will not be rolled back automatically. The database schema will be left in an intermediate state.
- `atomic = True` Limitation: While Django attempts to wrap the migration in a transaction, the underlying database behavior with DDL prevents full atomicity for schema changes in MySQL. DML operations (like `RunPython` or data migrations) within that transaction might be rolled back, but the DDL changes won't be.

### Writing migrations

Keep these tips in mind when writing migrations:

1. Always create a migration via `./manage.py makemigrations`

This ensures that the migration is created with the correct dependencies, sequential naming etc. (to create an empty migration that executes arbitrary code, use `./manage.py makemigrations --empty <name>`).

2. If you need to execute arbitrary code, use `RunPython` or `RunSQL` operations.

These operations are not wrapped in a transaction, so you need to be careful with how you write them and be mindful that the state of the database might not be consistent every time the migration is run. Validate assumptions and be careful.

If your migration requires `RunPython`, make that the only operation in the migration. Split database table modification to a separate migration to ensure a partial application due to failure does not result in an invalid database state.

3. Large data migrations should be run via `tasks`

This ensures they run on an environment that supports long running work. In production (kubernetes) pods are disposable, so you should not assume you can run a long migration in an arbitrary pod.

### Standard migrations

Some standard migrations for common changes are covered with custom classes.

1) Modifying waffle switches

Create/Delete/Rename a waffle switch can be done with a dedicated migration class.
The class can be generated with a custom management command:

```bash
./manage.py migrate_waffle test_switch --action rename --new_name test_switch_2
```

```python
from django.db import migrations
import olympia.core.db.migrations


class Migration(migrations.Migration):

dependencies = [
]

operations = [
olympia.core.db.migrations.CreateWaffleSwitch(
name='test_switch',
),
]
```

2) Back/Forward filling data (RunPython)

Migrations that modify significant amounts of data should be run via tasks.
You can execute the task via a migration using the custom migration class.

The task should accept no arguments and will not be retried if it fails.
Like all migrations, it should be idempotent and code deployed after the migration
has run should not rely on the task having been completted successfully. Only
rely on the task being queued.

```python
from django.db import migrations
import olympia.core.db.migrations

class Migration(migrations.Migration):

dependencies = [
]

operations = [
olympia.core.db.migrations.MigrationTask(
'olympia.accounts.tasks.backfill_user_data',
)
]
```

### Testing migrations

You can test migrations locally. This should not be considered a safe verification that a migration will work in production because the data in your local database will differ significantly from production. But this will ensure your migration runs against the same schema and with some seeded data.

- Find the current version of AMO on production.

```bash
curl -s "https://addons.mozilla.org/__version__"
```

- Checkout the tag

```bash
git checkout <tag>
```

- Make up initialized database

```bash
make up INIT_CLEAN=True
```

- Checkout your branch and run the migrations

```bash
git checkout <branch>
make up
```

### Deploying migrations

Migrations are deployed to production automatically via the deployment pipeline.
Importantly, migrations are deployed before new code is deployed. That menas:

- If a migration fails, it will cancel the deployment.
- Migrations must be backwards compatible with the previous version of the code. see [testing migrations](#testing-migrations)

Migrations run on dedicated pods in kubernetes against the primary database. That means:

- Changes to the database schema will not be immediately reflected across all replicas immediately after the migration is deployed.
- Long running migrations could be interupted by kubernetes and should be avoided. See [writing migrations](#writing-migrations)

If you have an important data migration, consider shipping it in a dedicated release to ensure the database is migrated before required code changes are deployed. See [cherry-picking](https://mozilla.github.io/addons/server/push_duty/cherry-picking.html) for details.
1 change: 1 addition & 0 deletions docs/topics/development/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,4 +29,5 @@ waffle
remote_settings
../../../README.rst
authentication
database
```
20 changes: 20 additions & 0 deletions src/olympia/accounts/migrations/0002_migration_task.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Generated by Django 4.2.20 on 2025-04-17 18:58

from django.db import migrations
import olympia.core.db.migrations


def migration_task(apps, schema_editor):
pass


class Migration(migrations.Migration):

dependencies = [
('accounts', '0001_remove_old_2fa_waffle_switch'),
]

operations = [
olympia.core.db.migrations.MigrationTask(
),
]
12 changes: 12 additions & 0 deletions src/olympia/amo/management/commands/run_task.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
from django.core.management.base import BaseCommand

from olympia.amo.celery import app

class Command(BaseCommand):
help = 'Queue a task to run'

def add_arguments(self, parser):
parser.add_argument('task_name', type=str, help='Name of the task to run')

def handle(self, *args, **options):
app.send_task(options['task_name'])
61 changes: 55 additions & 6 deletions src/olympia/core/db/migrations.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,35 +27,84 @@ def inner(apps, schema_editor):

class DeleteWaffleSwitch(migrations.RunPython):
def __init__(self, name, **kwargs):
self.name = name
super().__init__(
delete_waffle_switch(name),
reverse_code=create_waffle_switch(name),
delete_waffle_switch(self.name),
reverse_code=create_waffle_switch(self.name),
**kwargs,
)

def deconstruct(self):
return (
self.__class__.__name__,
(self.name,),
{},
)

def describe(self):
return 'Delete Waffle Switch (Python operation)'


class CreateWaffleSwitch(migrations.RunPython):
def __init__(self, name, **kwargs):
self.name = name
super().__init__(
create_waffle_switch(name),
reverse_code=delete_waffle_switch(name),
create_waffle_switch(self.name),
reverse_code=delete_waffle_switch(self.name),
**kwargs,
)

def deconstruct(self):
return (
self.__class__.__name__,
(self.name,),
{},
)

def describe(self):
return 'Create Waffle Switch (Python operation)'


class RenameWaffleSwitch(migrations.RunPython):
def __init__(self, old_name, new_name, **kwargs):
self.old_name = old_name
self.new_name = new_name
super().__init__(
rename_waffle_switch(old_name, new_name),
reverse_code=rename_waffle_switch(new_name, old_name),
rename_waffle_switch(self.old_name, self.new_name),
reverse_code=rename_waffle_switch(self.new_name, self.old_name),
**kwargs,
)

def deconstruct(self):
return (
self.__class__.__name__,
(self.old_name, self.new_name),
{},
)

def describe(self):
return 'Rename Waffle Switch, safely (Python operation)'



class MigrationTask(migrations.RunPython):
def __init__(self, **kwargs):
self.func_name = 'migration_task'
super().__init__(self.run_task, **kwargs)

def run_task(self, apps, schema_editor):
import importlib
from olympia.core.tasks import migration_task

module = importlib.import_module(self.__module__)
try:
breakpoint()
func = getattr(module, self.func_name)
migration_task.apply(func)
except AttributeError:
raise ValueError(f'Function {self.func_name} not found in module {self.__module__}. Create this function in the module and re-run the migration.')

def deconstruct(self):
return (self.__class__.__name__, (), {})


88 changes: 88 additions & 0 deletions src/olympia/core/management/commands/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
from functools import cached_property

from django.core.management.base import BaseCommand
from django.db import connections
from django.db.migrations.graph import MigrationGraph
from django.db.migrations.loader import MigrationLoader
from django.db.migrations.migration import Migration
from django.db.migrations.operations.base import Operation
from django.db.migrations.writer import MigrationWriter

import olympia.core.logger


class BaseMigrationCommand(BaseCommand):
log = olympia.core.logger.getLogger('z.core')

def add_arguments(self, parser):
parser.add_argument(
'app_label',
type=str,
help='The app label of the migration',
)
parser.add_argument(
'--dry-run',
action='store_true',
help='Dry run the migration',
)
self.extend_arguments(parser)

@cached_property
def graph(self) -> MigrationGraph:
return MigrationLoader(connections['default'], ignore_no_migrations=True).graph

def migration(self, name: str, app_label: str) -> Migration:
return Migration(name, app_label)

def writer(self, migration: Migration) -> MigrationWriter:
return MigrationWriter(migration)

def print(self, filename: str, output: str) -> None:
self.stdout.write(f'{filename}: \n')
self.stdout.write(output)

def get_name(self, *args, **options) -> str:
"""
Return the name of the migration excluding the migration number.
"""
raise NotImplementedError

def get_operation(self, *args, **options) -> Operation:
"""
Return the operation to be performed in the migration.
"""
raise NotImplementedError

def extend_arguments(self, parser) -> None:
"""
Extend the arguments of the command.
"""
raise NotImplementedError

def handle(self, *args, **options):
app_label = options.get('app_label')
dry_run = options.get('dry_run', False)
name = self.get_name(*args, **options)

leaf_nodes = self.graph.leaf_nodes(app_label)
migration_number = 1

if leaf_nodes and (node := leaf_nodes[-1]):
migration_number = int(node[1].split('_', 1)[0]) + 1

migration_name = f'{migration_number:04d}_{name}'

migration = self.migration(migration_name, app_label)
migration.dependencies = leaf_nodes
migration.operations = [self.get_operation(*args, **options)]

writer = self.writer(migration)
filename = writer.path
output = writer.as_string()

if dry_run:
self.print(filename, output)
return output

with open(filename, 'w') as f:
f.write(output)
Loading
Loading