Multiplying a huge number times random() (Python)

Problem: Generate large binary strings (length 2000+). Do it quickly, as this generateRandom() function will be called 300,000 times in the algorithm.

Attempted Solutions: Generate 3 or 4 binary numbers and append them all together 500 times. This is awfully slow.

Make a single call to random.random() and multiply it by a huge number. Convert to binary once and be done. This works for smaller numbers, but because the binary string must be a certain length, the number to convert to binary must be truly enormous (2 ** len(binString)).

Current Code (works for smaller numbers):

binaryRepresentation = ''

binaryRepresentation += bin(int(random.random() * (2 ** binLength)))[2:].zfill(binLength)

Error that I need help fixing: This call throws a "long int too large to convert to float" error with the large numbers. Is there a way to make the overall algorithm more efficient, or to make this large number convertible to a float?

Thank you!

Try os.urandom(250), which returns 250 random bytes (2000 bits) as a bytestring. Measure whether it is fast enough for your purposes; the available "randomness" might diminish the more you call it.

To avoid the "long int too large to convert to float" error, don't use floats.
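The failure is easy to reproduce. A Python float is normally an IEEE 754 double, whose largest finite value is just under 2 ** 1024, so 2 ** 2000 cannot be converted for the multiplication. A minimal sketch, assuming Python 3 (where the message reads "int too large to convert to float") and with illustrative variable names:

```python
import random
import sys

bin_length = 2000

# For IEEE 754 doubles max_exp is 1024: 2 ** (max_exp - 1) is representable
# as a float, 2 ** max_exp is not.
max_exp = sys.float_info.max_exp

try:
    # The int operand has to be converted to float before multiplying.
    value = int(random.random() * (2 ** bin_length))
    overflowed = False
except OverflowError:
    overflowed = True

# Shorter lengths stay within float range, which is why the original code
# works for small strings.
small = int(random.random() * (2 ** 64))
```

Even when the multiplication does not overflow, random.random() carries only 53 bits of precision, so the low-order bits of a longer result are always zero rather than random.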

If you need an integer with k random bits instead of a binary string:

import random
r = random.SystemRandom()

n = r.getrandbits(2000) # uses os.urandom() under the hood

To get a string of "0"s and "1"s:

k = 2000
binstr = "{:0{}b}".format(r.getrandbits(k), k)
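Wrapped up as a function like the one in the question, this might look like the sketch below. The name generate_random, its default length, and the secure flag are assumptions for illustration. The module-level random.getrandbits (Mersenne Twister) is offered as an alternative for cases where cryptographic-quality randomness is not needed; it is usually faster, but measure it yourself.

```python
import random

_system_random = random.SystemRandom()

def generate_random(k=2000, secure=True):
    """Return a string of exactly k random '0'/'1' characters."""
    # SystemRandom draws from os.urandom(); the random module itself uses
    # the Mersenne Twister generator.
    source = _system_random if secure else random
    # The format spec pads with leading zeros to width k, so the length is always k.
    return "{:0{}b}".format(source.getrandbits(k), k)

bits = generate_random(2000)
```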

Note: you can't use randint/randrange for large numbers if getrandbits is not used:

import random

class R(random.Random):
    def random(self): # override random to suppress getrandbits usage
        return random.random()

r = R()
r.randrange(2**2000) # -> OverflowError: long int too large to convert to float


The b2a_bin() extension creates binary strings ("01") directly from bytestrings without building an intermediate Python integer. It is 3-20 times faster than these pure Python analogs:

def b2a_bin_bin(data):
    return bin(int.from_bytes(data, 'big', signed=False)
               )[2:].zfill(len(data)*8).encode('ascii', 'strict')

def b2a_bin_format(data):
    n = int.from_bytes(data, 'big', signed=False)
    return "{:0{}b}".format(n, len(data)*8).encode('ascii', 'strict')


>>> import os
>>> from b2a_bin import b2a_bin
>>> b2a_bin(b'\x0a')
>>> b2a_bin(os.urandom(5))

To go from J.F. Sebastian's answer to a binary string (string with 0 and 1 characters in it):

>>> import random
>>> r = random.SystemRandom()
>>> bin(r.getrandbits(2000))[2:].zfill(2000)  # a different 2000-character string on every call

With this benchmark:

import random
import time

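def run(n):
    r = random.SystemRandom()
    for i in range(n):
        if i % 30000 == 0:
            print(i)  # progress indicator
        # the operation being timed: 2000 random bits as a "01" string
        bin(r.getrandbits(2000))[2:].zfill(2000)

s = time.time()
run(300000)
e = time.time()
print("Took %.2fs" % (e - s,))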

The result was "Took 12.32s".

Just getting the random bits without any string conversion (only r.getrandbits(2000)) took 7.77s, so if you could find a way to use the random bits as a long then you'd save yourself some time.

Re-running the benchmark using os.urandom(250) instead (without additional processing) took only 3.59s, so that seems to be the fastest option.
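If the algorithm can work on the bits directly, the string conversion can be skipped altogether. A rough sketch, assuming Python 3 and assuming that testing, counting, or flipping bits is the kind of operation needed (a guess, since the question does not say):

```python
import os

K = 2000  # number of random bits; a multiple of 8 here

raw = os.urandom(K // 8)          # 250 random bytes
n = int.from_bytes(raw, "big")    # the same bits as a single integer

bit_17 = (n >> 17) & 1            # value of bit 17 (0 or 1)
ones = bin(n).count("1")          # number of set bits
flipped = n ^ ((1 << K) - 1)      # invert all K bits
```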

Is random.randrange really too slow? Let's see how slow it really is.

import random

word_size = 2048
word_max = 2 ** word_size

def random_bits(n):
    """Return a string consisting of `n` zeroes and ones (chosen randomly)."""
    def words():
        s, m, r = word_size, word_max, n % word_size
        for _ in range(n // s):
            yield bin(random.randrange(m))[2:].zfill(s)
        if r:  # skip the remainder word when n is a multiple of word_size
            yield bin(random.randrange(2 ** r))[2:].zfill(r)
    return ''.join(words())

>>> from timeit import Timer
>>> Timer(lambda:random_bits(2000)).timeit(number=300000)

10 seconds doesn't seem an absurd amount of time for choosing 600 million random bits. So perhaps you can say more about your speed requirement. Is this really too slow?
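Timings like these depend heavily on the Python version and the machine, so it is worth repeating the comparison locally. A small harness, assuming Python 3; the candidate functions and the reduced iteration count are illustrative choices, not the code that produced the numbers quoted in this thread:

```python
import os
import random
from timeit import timeit

K = 2000
_sysrand = random.SystemRandom()

def via_system_random():
    return "{:0{}b}".format(_sysrand.getrandbits(K), K)

def via_mersenne_twister():
    return "{:0{}b}".format(random.getrandbits(K), K)

def via_urandom():
    n = int.from_bytes(os.urandom(K // 8), "big")
    return "{:0{}b}".format(n, K)

candidates = (via_system_random, via_mersenne_twister, via_urandom)
# 10,000 calls each keeps the run short; scale up to 300,000 for a realistic figure.
timings = {f.__name__: timeit(f, number=10000) for f in candidates}

# Print the candidates from fastest to slowest.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print("{:22s} {:.3f}s".format(name, seconds))
```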