Solution 1:

For future Googlers, here is an algorithm that I ported from Quasimondo. It's kind of a mix between a box blur and a gaussian blur, it's very pretty and quite fast too.

Update for people encountering the ArrayIndexOutOfBoundsException problem : @anthonycr in the comments provides this information :

I found that by replacing Math.abs with StrictMath.abs or some other abs implementation, the crash does not occur.

/**
 * Stack Blur v1.0 from
 * http://www.quasimondo.com/StackBlurForCanvas/StackBlurDemo.html
 * Java Author: Mario Klingemann <mario at quasimondo.com>
 * http://incubator.quasimondo.com
 *
 * created Feburary 29, 2004
 * Android port : Yahel Bouaziz <yahel at kayenko.com>
 * http://www.kayenko.com
 * ported april 5th, 2012
 *
 * This is a compromise between Gaussian Blur and Box blur
 * It creates much better looking blurs than Box Blur, but is
 * 7x faster than my Gaussian Blur implementation.
 *
 * I called it Stack Blur because this describes best how this
 * filter works internally: it creates a kind of moving stack
 * of colors whilst scanning through the image. Thereby it
 * just has to add one new block of color to the right side
 * of the stack and remove the leftmost color. The remaining
 * colors on the topmost layer of the stack are either added on
 * or reduced by one, depending on if they are on the right or
 * on the left side of the stack.
 *  
 * If you are using this algorithm in your code please add
 * the following line:
 * Stack Blur Algorithm by Mario Klingemann <[email protected]>
 */

public Bitmap fastblur(Bitmap sentBitmap, float scale, int radius) {

    int width = Math.round(sentBitmap.getWidth() * scale);
    int height = Math.round(sentBitmap.getHeight() * scale);
    sentBitmap = Bitmap.createScaledBitmap(sentBitmap, width, height, false);

    Bitmap bitmap = sentBitmap.copy(sentBitmap.getConfig(), true);

    if (radius < 1) {
        return (null);
    }

    int w = bitmap.getWidth();
    int h = bitmap.getHeight();

    int[] pix = new int[w * h];
    Log.e("pix", w + " " + h + " " + pix.length);
    bitmap.getPixels(pix, 0, w, 0, 0, w, h);

    int wm = w - 1;
    int hm = h - 1;
    int wh = w * h;
    int div = radius + radius + 1;

    int r[] = new int[wh];
    int g[] = new int[wh];
    int b[] = new int[wh];
    int rsum, gsum, bsum, x, y, i, p, yp, yi, yw;
    int vmin[] = new int[Math.max(w, h)];

    int divsum = (div + 1) >> 1;
    divsum *= divsum;
    int dv[] = new int[256 * divsum];
    for (i = 0; i < 256 * divsum; i++) {
        dv[i] = (i / divsum);
    }

    yw = yi = 0;

    int[][] stack = new int[div][3];
    int stackpointer;
    int stackstart;
    int[] sir;
    int rbs;
    int r1 = radius + 1;
    int routsum, goutsum, boutsum;
    int rinsum, ginsum, binsum;

    for (y = 0; y < h; y++) {
        rinsum = ginsum = binsum = routsum = goutsum = boutsum = rsum = gsum = bsum = 0;
        for (i = -radius; i <= radius; i++) {
            p = pix[yi + Math.min(wm, Math.max(i, 0))];
            sir = stack[i + radius];
            sir[0] = (p & 0xff0000) >> 16;
            sir[1] = (p & 0x00ff00) >> 8;
            sir[2] = (p & 0x0000ff);
            rbs = r1 - Math.abs(i);
            rsum += sir[0] * rbs;
            gsum += sir[1] * rbs;
            bsum += sir[2] * rbs;
            if (i > 0) {
                rinsum += sir[0];
                ginsum += sir[1];
                binsum += sir[2];
            } else {
                routsum += sir[0];
                goutsum += sir[1];
                boutsum += sir[2];
            }
        }
        stackpointer = radius;

        for (x = 0; x < w; x++) {

            r[yi] = dv[rsum];
            g[yi] = dv[gsum];
            b[yi] = dv[bsum];

            rsum -= routsum;
            gsum -= goutsum;
            bsum -= boutsum;

            stackstart = stackpointer - radius + div;
            sir = stack[stackstart % div];

            routsum -= sir[0];
            goutsum -= sir[1];
            boutsum -= sir[2];

            if (y == 0) {
                vmin[x] = Math.min(x + radius + 1, wm);
            }
            p = pix[yw + vmin[x]];

            sir[0] = (p & 0xff0000) >> 16;
            sir[1] = (p & 0x00ff00) >> 8;
            sir[2] = (p & 0x0000ff);

            rinsum += sir[0];
            ginsum += sir[1];
            binsum += sir[2];

            rsum += rinsum;
            gsum += ginsum;
            bsum += binsum;

            stackpointer = (stackpointer + 1) % div;
            sir = stack[(stackpointer) % div];

            routsum += sir[0];
            goutsum += sir[1];
            boutsum += sir[2];

            rinsum -= sir[0];
            ginsum -= sir[1];
            binsum -= sir[2];

            yi++;
        }
        yw += w;
    }
    for (x = 0; x < w; x++) {
        rinsum = ginsum = binsum = routsum = goutsum = boutsum = rsum = gsum = bsum = 0;
        yp = -radius * w;
        for (i = -radius; i <= radius; i++) {
            yi = Math.max(0, yp) + x;

            sir = stack[i + radius];

            sir[0] = r[yi];
            sir[1] = g[yi];
            sir[2] = b[yi];

            rbs = r1 - Math.abs(i);

            rsum += r[yi] * rbs;
            gsum += g[yi] * rbs;
            bsum += b[yi] * rbs;

            if (i > 0) {
                rinsum += sir[0];
                ginsum += sir[1];
                binsum += sir[2];
            } else {
                routsum += sir[0];
                goutsum += sir[1];
                boutsum += sir[2];
            }

            if (i < hm) {
                yp += w;
            }
        }
        yi = x;
        stackpointer = radius;
        for (y = 0; y < h; y++) {
            // Preserve alpha channel: ( 0xff000000 & pix[yi] )
            pix[yi] = ( 0xff000000 & pix[yi] ) | ( dv[rsum] << 16 ) | ( dv[gsum] << 8 ) | dv[bsum];

            rsum -= routsum;
            gsum -= goutsum;
            bsum -= boutsum;

            stackstart = stackpointer - radius + div;
            sir = stack[stackstart % div];

            routsum -= sir[0];
            goutsum -= sir[1];
            boutsum -= sir[2];

            if (x == 0) {
                vmin[y] = Math.min(y + r1, hm) * w;
            }
            p = x + vmin[y];

            sir[0] = r[p];
            sir[1] = g[p];
            sir[2] = b[p];

            rinsum += sir[0];
            ginsum += sir[1];
            binsum += sir[2];

            rsum += rinsum;
            gsum += ginsum;
            bsum += binsum;

            stackpointer = (stackpointer + 1) % div;
            sir = stack[stackpointer];

            routsum += sir[0];
            goutsum += sir[1];
            boutsum += sir[2];

            rinsum -= sir[0];
            ginsum -= sir[1];
            binsum -= sir[2];

            yi += w;
        }
    }

    Log.e("pix", w + " " + h + " " + pix.length);
    bitmap.setPixels(pix, 0, w, 0, 0, w, h);

    return (bitmap);
}

Solution 2:

Android Blur Guide 2016

with Showcase/Benchmark App and Source on Github. Also check out the Blur framework I'm currently working on: Dali.

After experimenting a lot I can now safely give you some solid recommendations that will make your life easier in Android when using the Android Framework.

Load and Use a downscaled Bitmap (for very blurry images)

Never use a the full size of a Bitmap. The bigger the image the more needs to be blurred and also the higher the blur radius needs to be and usually, the higher the blur radius the longer the algorithm takes.

final BitmapFactory.Options options = new BitmapFactory.Options();
options.inSampleSize = 8;
Bitmap blurTemplate = BitmapFactory.decodeResource(getResources(), R.drawable.myImage, options);

This will load the bitmap with inSampleSize 8, so only 1/64 of the original image. Test what inSampleSize suits your needs, but keep it 2^n (2,4,8,...) to avoid degrading quality due to scaling. See Google doc for more

Another really big advantage is that bitmap loading will be really fast. In my early blur testing I figured that the longest time during the whole blur process was the image loading. So to load a 1920x1080 image from disk my Nexus 5 needed 500ms while the blurring only took another 250 ms or so.

Use Renderscript

Renderscript provides ScriptIntrinsicBlur which is a Gaussian blur filter. It has good visual quality and is just the fastest you realistically get on Android. Google claims to be "typically 2-3x faster than a multithreaded C implementation and often 10x+ faster than a Java implementation". Renderscript is really sophisticated (using the fastest processing device (GPU, ISP, etc.), etc.) and there is also the v8 support library for it making it compatible down to 2.2. Well at least in theory, through my own tests and reports from other devs it seems that it is not possible to use Renderscript blindly, since the hardware/driver fragmentation seems to cause problems with some devices, even with higher sdk lvl (e.g. I had troubles with the 4.1 Nexus S) so be careful and test on a lot of devices. Here's a simple example that will get you started:

//define this only once if blurring multiple times
RenderScript rs = RenderScript.create(context);

(...)
//this will blur the bitmapOriginal with a radius of 8 and save it in bitmapOriginal
final Allocation input = Allocation.createFromBitmap(rs, bitmapOriginal); //use this constructor for best performance, because it uses USAGE_SHARED mode which reuses memory
final Allocation output = Allocation.createTyped(rs, input.getType());
final ScriptIntrinsicBlur script = ScriptIntrinsicBlur.create(rs, Element.U8_4(rs));
script.setRadius(8f);
script.setInput(input);
script.forEach(output);
output.copyTo(bitmapOriginal);

When using the v8 support with Gradle, which is specifically recommended by Google "because they include the latest improvements", you only need to add 2 lines to your build script and use android.support.v8.renderscript with current build tools (updated syntax for android Gradle plugin v14+)

android {
    ...
    defaultConfig {
        ...
        renderscriptTargetApi 19
        renderscriptSupportModeEnabled true
    }
}

Simple benchmark on a Nexus 5 - comparing RenderScript with different other java and Renderscript implementations:

The average runtime per blur on different pic sizes The average runtime per blur on different pic sizes

Megapixels per sec that can be blurred Megapixels per sec that can be blurred

Each value is the avg of 250 rounds. RS_GAUSS_FAST is ScriptIntrinsicBlur (and nearly always the fastest), others that start with RS_ are mostly convolve implementations with simple kernels. The details of the algorithms can be found here. This is not purely blurring, since a good portion is garbage collection that is measured. This can be seen in this here (ScriptIntrinsicBlur on a 100x100 image with about 500 rounds)

enter image description here

The spikes are gc.

You can check for yourself, the benchmark app is in the playstore: BlurBenchmark

Reuses Bitmap wherever possible (if prio: performance > memory footprint)

If you need multiple blurs for a live blur or similar and your memory allows it do not load the bitmap from drawables multiple times, but keep it "cached" in a member variable. In this case always try to use the same variables, to keep garbage collecting to a minimum.

Also check out the new inBitmap option when loading from a file or drawable which will reuse the bitmap memory and save garbage collection time.

For blending from sharp to blurry

The simple and naive method is just to use 2 ImageViews, one blurred, and alpha fade them. But if you want a more sophisticated look that smoothly fades from sharp to blurry, then check out Roman Nurik's post about how to do it like in his Muzei app.

Basically he explains that he pre-blurs some frames with different blur extents and uses them as keyframes in an animation that looks really smooth.

Diagram where Nurik exaplains his approach

Solution 3:

This is a shot in the dark, but you might try shrinking the image and then enlarging it again. This can be done with Bitmap.createScaledBitmap(Bitmap src, int dstWidth, int dstHeight, boolean filter). Make sure and set the filter parameter to true. It'll run in native code so it might be faster.