Solution 1:

For now, I found this command-line solution, which runs aws glue batch-delete-partition iteratively on batches of 25 partitions using xargs (here I am assuming there are at most 1000 partitions):

aws glue get-partitions --database-name=<my-database> --table-name=<my-table> | jq -cr '[ { Values: .Partitions[].Values } ]' > partitions.json

seq 0 25 1000 | xargs -I _ bash -c "cat partitions.json | jq -c '.[_:_+25] | select(length > 0)'" | while read -r X; do aws glue batch-delete-partition --database-name=<my-database> --table-name=<my-table> --partitions-to-delete="$X"; done
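The jq step above reshapes the get-partitions output into a list of {"Values": [...]} objects, and the seq/xargs loop then slices that list 25 entries at a time. As a rough sketch of the same reshaping and slicing without the 1000-partition cap, the Python below takes the raw JSON printed by aws glue get-partitions and yields one compact JSON string per batch. The function names to_delete_specs and delete_batches are hypothetical, not part of any AWS tool.

```python
import json
from typing import Any, Dict, Iterator, List


def to_delete_specs(get_partitions_output: str) -> List[Dict[str, Any]]:
    """Python equivalent of jq '[ { Values: .Partitions[].Values } ]'."""
    data = json.loads(get_partitions_output)
    # Keep only the partition key values; everything else (StorageDescriptor, etc.) is dropped.
    return [{"Values": p["Values"]} for p in data.get("Partitions", [])]


def delete_batches(specs: List[Dict[str, Any]], size: int = 25) -> Iterator[str]:
    """Yield compact JSON arrays of at most `size` specs, like jq -c '.[i:i+25]'."""
    # range() stops at len(specs), so no empty trailing batches are produced.
    for start in range(0, len(specs), size):
        yield json.dumps(specs[start:start + size], separators=(",", ":"))
```

Each yielded string has the shape the CLI expects for --partitions-to-delete, so a small wrapper script could print them one per line and feed the same while read -r loop, with the batch count derived from the data instead of a fixed 1000.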

Hope it helps someone, but I'd prefer a more elegant solution.

Solution 2:

Using Python 3 with boto3 looks a little nicer, albeit not by much :)

Unfortunately, AWS doesn't provide a way to delete all partitions at once: batch_delete_partition accepts at most 25 partitions per request, so they have to be deleted in batches. Note that the code below only deletes the first page of partitions returned by get_partitions.
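The first-page limitation comes from get_partitions returning one page of results plus a NextToken when more remain. A minimal sketch of following that token by hand, assuming glue_client is a boto3 Glue client such as boto3.client("glue", "us-west-2"); get_all_partitions is a hypothetical helper name, and the client is passed in as a parameter so the function can be exercised without AWS.

```python
from typing import Any, Dict, List


def get_all_partitions(glue_client: Any, database: str, table: str) -> List[Dict[str, Any]]:
    """Collect every partition by following NextToken until Glue stops returning one."""
    partitions: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {"DatabaseName": database, "TableName": table}
    while True:
        response = glue_client.get_partitions(**kwargs)
        partitions.extend(response["Partitions"])
        token = response.get("NextToken")
        if not token:
            # No token means this was the last page.
            return partitions
        # Ask for the next page on the following iteration.
        kwargs["NextToken"] = token
```

The same 25-at-a-time batching can then run over the complete list rather than over a single page.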

import boto3

glue_client = boto3.client("glue", "us-west-2")

def get_and_delete_partitions(database, table, batch=25):
    # A single get_partitions call returns only the first page of results
    partitions = glue_client.get_partitions(
        DatabaseName=database,
        TableName=table)["Partitions"]

    # batch_delete_partition accepts at most 25 partitions per call
    for i in range(0, len(partitions), batch):
        to_delete = [{"Values": p["Values"]} for p in partitions[i:i + batch]]
        glue_client.batch_delete_partition(
            DatabaseName=database,
            TableName=table,
            PartitionsToDelete=to_delete)
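batch_delete_partition does not raise when only some partitions in a batch fail; it lists them in the response's Errors field, where each entry carries PartitionValues and ErrorDetail. A hedged variant of the loop above that collects those errors instead of discarding them; delete_in_batches is a hypothetical name, and the client is passed in so the function is easy to test without AWS.

```python
from typing import Any, Dict, List


def delete_in_batches(glue_client: Any, database: str, table: str,
                      partitions: List[Dict[str, Any]], batch: int = 25) -> List[Dict[str, Any]]:
    """Delete partitions in batches and return any per-partition errors Glue reports."""
    errors: List[Dict[str, Any]] = []
    for i in range(0, len(partitions), batch):
        to_delete = [{"Values": p["Values"]} for p in partitions[i:i + batch]]
        response = glue_client.batch_delete_partition(
            DatabaseName=database,
            TableName=table,
            PartitionsToDelete=to_delete)
        # Partial failures come back in "Errors" rather than as exceptions.
        errors.extend(response.get("Errors", []))
    return errors
```

Usage would look like failed = delete_in_batches(glue_client, "<my-database>", "<my-table>", partitions), after which each entry's PartitionValues shows which partitions were left behind.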

EDIT: To delete all partitions (not just the first page), using a paginator makes it cleaner.

import boto3

glue_client = boto3.client("glue", "us-west-2")

def delete_partitions(database, table, partitions, batch=25):
    # batch_delete_partition accepts at most 25 partitions per call
    for i in range(0, len(partitions), batch):
        to_delete = [{"Values": p["Values"]} for p in partitions[i:i + batch]]
        glue_client.batch_delete_partition(
            DatabaseName=database,
            TableName=table,
            PartitionsToDelete=to_delete)

def get_and_delete_partitions(database, table):
    # The paginator follows NextToken, so every page of partitions is visited
    paginator = glue_client.get_paginator('get_partitions')
    itr = paginator.paginate(DatabaseName=database, TableName=table)

    for page in itr:
        delete_partitions(database, table, page["Partitions"])
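The loop above deletes each page while the paginator is still fetching the next one. If you would rather not modify the table while it is still being listed, one option is to collect every page first and delete afterwards. A minimal sketch, assuming the same boto3 Glue client; delete_all_partitions is a hypothetical name, and the optional expression argument is passed through as the Expression filter that get_partitions accepts, so only matching partitions are removed.

```python
from typing import Any, Dict, List, Optional


def delete_all_partitions(glue_client: Any, database: str, table: str,
                          expression: Optional[str] = None, batch: int = 25) -> int:
    """List matching partitions first, then delete them in batches; return how many were sent."""
    params: Dict[str, Any] = {"DatabaseName": database, "TableName": table}
    if expression:
        # e.g. "year = '2020'"; Glue then lists only partitions matching the filter
        params["Expression"] = expression
    paginator = glue_client.get_paginator("get_partitions")
    values: List[Dict[str, Any]] = [
        {"Values": p["Values"]}
        for page in paginator.paginate(**params)
        for p in page["Partitions"]
    ]
    # Deletion starts only after the listing has been fully consumed.
    for i in range(0, len(values), batch):
        glue_client.batch_delete_partition(
            DatabaseName=database,
            TableName=table,
            PartitionsToDelete=values[i:i + batch])
    return len(values)
```

The trade-off is memory: every partition's key values are held in a list before any deletion happens, which is usually fine since only the Values are kept.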