Format USB and confirm all zeros

I'll throw my hat into the ring here as well. One alternative that I love to use is scrub. It is in the repositories, so to install it, run the following in a terminal window:

sudo apt-get install scrub

scrub supports many different types of scrubbing patterns.

Available patterns are:
  nnsa          3-pass   NNSA NAP-14.1-C
  dod           3-pass   DoD 5220.22-M
  bsi           9-pass   BSI
  usarmy        3-pass   US Army AR380-19
  random        1-pass   One Random Pass
  random2       2-pass   Two Random Passes
  schneier      7-pass   Bruce Schneier Algorithm
  pfitzner7     7-pass   Roy Pfitzner 7-random-pass method
  pfitzner33   33-pass   Roy Pfitzner 33-random-pass method
  gutmann      35-pass   Gutmann
  fastold       4-pass   pre v1.7 scrub (skip random)
  old           5-pass   pre v1.7 scrub
  dirent        6-pass   dirent
  fillzero      1-pass   Quick Fill with 0x00
  fillff        1-pass   Quick Fill with 0xff
  custom        1-pass   custom="string" 16b max, use escapes \xnn, \nnn, \\

To use scrub to fill the drive with all zeros, first make sure the drive is not mounted. Then run the following line (the -p option selects the pattern to use):

sudo scrub -p fillzero /dev/sdX

You should then see something like this:

scrub: using Quick Fill with 0x00 patterns
scrub: please verify that device size below is correct!
scrub: scrubbing /dev/sdh 31260704768 bytes (~29GB)
scrub: 0x00    |.....                                           |

Some of the scrubbing patterns include a verify pass to confirm that the scrub succeeded.

If you would like, you can append hexdump (as in Byte Commander's answer) or any of the other answers' commands at the end for verification.
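As a rough sketch of what such a verification step looks like, here it is run against an ordinary all-zero file instead of a real device (the file name testfile is just a throwaway example):

```shell
# Create a 1 MiB stand-in file full of zero bytes (on a real drive you
# would point hexdump at /dev/sdX instead)
dd if=/dev/zero of=testfile bs=1M count=1

# An all-zero target collapses to a single data row, a '*' marking the
# repeated lines, and the total size as the final offset
hexdump testfile
```

On a device that was successfully scrubbed with fillzero, hexdump should print exactly this shape of output.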

Hope this helps!


Apply dd and tr for visual inspection:

dd if=/dev/sdb | tr '\0' 0

Apply dd and grep for automatic checking:

dd if=/dev/sdb | grep -zq . && echo non zero

The above is significantly slower than the optimized command below:

grep -zq . /dev/sdb && echo non zero

grep -z reads in null-delimited lines. If all bytes are null, then each line is empty, so . should never match.
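You can convince yourself of that behaviour with a couple of throwaway pipelines (printf '\0' emits a literal null byte):

```shell
# Three null bytes: grep -z sees only empty lines, so . never matches
printf '\0\0\0' | grep -zq . && echo non zero || echo all zero
# prints: all zero

# One non-null byte in between: . matches, so the data is not all zeros
printf '\0a\0' | grep -zq . && echo non zero || echo all zero
# prints: non zero
```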

Of course, this won't be true for a formatted partition - the filesystem will be using some bytes, and they will be non-null.


My suggestion would be hexdump. It displays the content of any file or device in hexadecimal format as rows of 16 bytes, but if two consecutive lines are identical, it omits them.

Here's example output for the 512 MB file virtualdevice in the current directory of my HDD, which is filled with zeroes only. The leftmost column is the offset of the line in hexadecimal notation, and the 8 following columns are the actual data, grouped in pairs of two bytes (4 hexadecimal characters):

$ hexdump ./virtualdevice 
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
20000000
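In case you want to reproduce this, the test file can be created with dd (a sketch; the name virtualdevice and the 512 MB size simply match my example above):

```shell
# Write 512 MB of zero bytes into a regular file in the current directory
dd if=/dev/zero of=./virtualdevice bs=1M count=512
```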

Performance:

I made the effort to compare my solution with the others, in terms of real run time and CPU time, using the example file described above (512 MB, containing only binary zeroes, located on an HDD).

I measured every solution with the time command, twice with a freshly cleared disk cache and twice with the file already cached. The row names match those of the time command, and the additional row CPU is simply the sum of the USER + SYS times. It can exceed the REAL time because I'm running a dual-core machine.

For most people, the interesting figures are REAL (the time from start to finish, as if measured with a stopwatch; this also includes IO wait and CPU time used by other processes) and CPU (the CPU time actually occupied by the command).

Summary:

  • The best performance comes from muru's optimized second version (grep -zq . DEVICE), which uses remarkably little CPU time.
  • Rank 2 is shared by cmp /dev/zero DEVICE (kos' optimized solution) and my own hexdump DEVICE; there is hardly any difference between them.
  • Piping the data from dd to cmp (dd if=/dev/zero | cmp - DEVICE, kos' unoptimized solution) is very inefficient; the piping seems to consume a lot of processing time.
  • Using dd together with grep shows by far the worst performance of the tested commands.

Conclusion:

Although the most critical part of operations like these is the IO access time, there are significant differences in the processing speed and efficiency of the tested approaches.

If you are very impatient, use the second version of muru's answer (grep -zq . DEVICE)!
But you can just as well use either the second version of kos' answer (cmp /dev/zero DEVICE) or my own (hexdump DEVICE), as they perform almost as well.
However, my approach has the advantage that you immediately see the file contents and can estimate how many bytes differ from zero and where they are located. If the data varies a lot, though, the output will grow large and will probably slow things down.

What you should avoid in any case is using dd with pipes. The performance of dd could probably be improved by setting a suitable buffer size, but why do it the complicated way?
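For completeness, the buffer-size tweak would look something like this (a sketch on a small stand-in file; bs=4M is an arbitrary choice, and on a real drive the input would be your device instead):

```shell
# Small all-zero stand-in for the device
dd if=/dev/zero of=zerofile bs=1M count=8

# The same dd-into-grep pipeline, but reading 4 MiB at a time instead of
# the 512-byte default, which cuts down the syscall overhead of the pipe
dd if=zerofile bs=4M | grep -zq . && echo non zero || echo all zero
```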

Please also note again that the test was run on a file on my disk rather than on an actual device, and that the file contained only zeroes. Both affect the performance.

Here are the detailed results:

  • hexdump ./virtualdevice (my own solution):

            |    Uncached:      |    Cached:
     Time:  |  Run 1:   Run 2:  |  Run 1:   Run 2:
    --------+-------------------+------------------
       REAL |  7.689s   8.668s  |  1.868s   1.930s
       USER |  1.816s   1.720s  |  1.572s   1.696s
        SYS |  0.408s   0.504s  |  0.276s   0.220s
        CPU |  2.224s   2.224s  |  1.848s   1.916s
    
  • dd if=./virtualdevice | grep -zq . && echo non zero (muru's unoptimized solution):

            |    Uncached:      |    Cached:
     Time:  |  Run 1:   Run 2:  |  Run 1:   Run 2:
    --------+-------------------+------------------
       REAL |  9.434s  11.004s  |  8.802s   9.266s
       USER |  2.264s   2.364s  |  2.480s   2.528s
        SYS | 12.876s  12.972s  | 12.676s  13.300s
        CPU | 15.140s  15.336s  | 15.156s  15.828s
    
  • grep -zq . ./virtualdevice && echo non zero (muru's optimized solution):

            |    Uncached:      |    Cached:
     Time:  |  Run 1:   Run 2:  |  Run 1:   Run 2:
    --------+-------------------+------------------
       REAL |  8.763s   6.485s  |  0.770s   0.833s
       USER |  0.644s   0.612s  |  0.528s   0.544s
        SYS |  0.440s   0.476s  |  0.236s   0.264s
        CPU |  1.084s   1.088s  |  0.764s   0.808s
    
  • dd if=/dev/zero | cmp - ./virtualdevice (kos' solution unoptimized):

            |    Uncached:      |    Cached:
     Time:  |  Run 1:   Run 2:  |  Run 1:   Run 2:
    --------+-------------------+------------------
       REAL |  7.678s   6.539s  |  3.151s   3.147s
       USER |  2.348s   2.228s  |  2.164s   2.324s
        SYS |  3.672s   3.852s  |  3.792s   3.516s
        CPU |  6.020s   6.080s  |  5.956s   5.840s
    
  • cmp /dev/zero ./virtualdevice (kos' solution optimized):

            |    Uncached:      |    Cached:
     Time:  |  Run 1:   Run 2:  |  Run 1:   Run 2:
    --------+-------------------+------------------
       REAL |  6.340s   9.183s  |  1.660s   1.660s
       USER |  1.356s   1.384s  |  1.216s   1.288s
        SYS |  0.640s   0.596s  |  0.428s   0.360s
        CPU |  1.996s   1.980s  |  1.644s   1.648s
    

Commands used:

For all five tests I ran the following procedure twice to reduce inaccuracies, replacing <COMMAND> with the exact command from the heading of each table.

  • Let the kernel drop all disk caches:

    sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
    
  • First timed run (uncached), file gets loaded into the cache during this:

    time <COMMAND>
    
  • Second timed run (cached). This time most of the data is taken from the disk cache in RAM, therefore it's much faster than when accessing the disk directly:

    time <COMMAND>