Delete all data for a kind in Google App Engine

I would like to wipe out all data for a specific kind in Google App Engine. What is the best way to do this? I wrote a delete script (hack), but since there is so much data is timeout's out after a few hundred records.


Solution 1:

I am currently deleting the entities by their key, and it seems to be faster.

from google.appengine.ext import db

class bulkdelete(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        try:
            while True:
                q = db.GqlQuery("SELECT __key__ FROM MyModel")
                assert q.count()
                db.delete(q.fetch(200))
                time.sleep(0.5)
        except Exception, e:
            self.response.out.write(repr(e)+'\n')
            pass

from the terminal, I run curl -N http://...

Solution 2:

You can now use the Datastore Admin for that: https://developers.google.com/appengine/docs/adminconsole/datastoreadmin#Deleting_Entities_in_Bulk

Solution 3:

If I were a paranoid person, I would say Google App Engine (GAE) has not made it easy for us to remove data if we want to. I am going to skip discussion on index sizes and how they translate a 6 GB of data to 35 GB of storage (being billed for). That's another story, but they do have ways to work around that - limit number of properties to create index on (automatically generated indexes) et cetera.

The reason I decided to write this post is that I need to "nuke" all my Kinds in a sandbox. I read about it and finally came up with this code:

package com.intillium.formshnuker;

import java.io.IOException;
import java.util.ArrayList;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;

import com.google.appengine.api.labs.taskqueue.QueueFactory;
import com.google.appengine.api.labs.taskqueue.TaskOptions.Method;

import static com.google.appengine.api.labs.taskqueue.TaskOptions.Builder.url;

@SuppressWarnings("serial")
public class FormsnukerServlet extends HttpServlet {

 public void doGet(final HttpServletRequest request, final HttpServletResponse response) throws IOException {

  response.setContentType("text/plain");

  final String kind = request.getParameter("kind");
  final String passcode = request.getParameter("passcode");

  if (kind == null) {
   throw new NullPointerException();
  }

  if (passcode == null) {
   throw new NullPointerException();
  }

  if (!passcode.equals("LONGSECRETCODE")) {
   response.getWriter().println("BAD PASSCODE!");
   return;
  }

  System.err.println("*** deleting entities form " + kind);

  final long start = System.currentTimeMillis();

  int deleted_count = 0;
  boolean is_finished = false;

  final DatastoreService dss = DatastoreServiceFactory.getDatastoreService();

  while (System.currentTimeMillis() - start < 16384) {

   final Query query = new Query(kind);

   query.setKeysOnly();

   final ArrayList<Key> keys = new ArrayList<Key>();

   for (final Entity entity: dss.prepare(query).asIterable(FetchOptions.Builder.withLimit(128))) {
    keys.add(entity.getKey());
   }

   keys.trimToSize();

   if (keys.size() == 0) {
    is_finished = true;
    break;
   }

   while (System.currentTimeMillis() - start < 16384) {

    try {

     dss.delete(keys);

     deleted_count += keys.size();

     break;

    } catch (Throwable ignore) {

     continue;

    }

   }

  }

  System.err.println("*** deleted " + deleted_count + " entities form " + kind);

  if (is_finished) {

   System.err.println("*** deletion job for " + kind + " is completed.");

  } else {

   final int taskcount;

   final String tcs = request.getParameter("taskcount");

   if (tcs == null) {
    taskcount = 0;
   } else {
    taskcount = Integer.parseInt(tcs) + 1;
   }

   QueueFactory.getDefaultQueue().add(
    url("/formsnuker?kind=" + kind + "&passcode=LONGSECRETCODE&taskcount=" + taskcount).method(Method.GET));

   System.err.println("*** deletion task # " + taskcount + " for " + kind + " is queued.");

  }

  response.getWriter().println("OK");

 }

}

I have over 6 million records. That's a lot. I have no idea what the cost will be to delete the records (maybe more economical not to delete them). Another alternative would be to request a deletion for the entire application (sandbox). But that's not realistic in most cases.

I decided to go with smaller groups of records (in easy query). I know I could go for 500 entities, but then I started receiving very high rates of failure (re delete function).

My request from GAE team: please add a feature to delete all entities of a kind in a single transaction.

Solution 4:

Try using App Engine Console then you dont even have to deploy any special code

Solution 5:

Presumably your hack was something like this:

# Deleting all messages older than "earliest_date"
q = db.GqlQuery("SELECT * FROM Message WHERE create_date < :1", earliest_date)
results = q.fetch(1000)

while results:
    db.delete(results)
    results = q.fetch(1000, len(results))

As you say, if there's sufficient data, you're going to hit the request timeout before it gets through all the records. You'd have to re-invoke this request multiple times from outside to ensure all the data was erased; easy enough to do, but hardly ideal.

The admin console doesn't seem to offer any help, as (from my own experience with it), it seems to only allow entities of a given type to be listed and then deleted on a page-by-page basis.

When testing, I've had to purge my database on startup to get rid of existing data.

I would infer from this that Google operates on the principle that disk is cheap, and so data is typically orphaned (indexes to redundant data replaced), rather than deleted. Given there's a fixed amount of data available to each app at the moment (0.5 GB), that's not much help for non-Google App Engine users.