What is the number of $n \times n$ binary matrices $A$ such that $\det(A) = \text{perm}(A)$?

To address questions (1) and (2), let's start with hardmath's comment. The set of matrices with zero permanent is a subset of the set we want to count, and it is easier to describe. Even so, the number of such matrices is recorded in https://oeis.org/A088672 only up to $6\times6$, combining the efforts of three contributors. In the literature, we find this article:

  • C. J. Everett and P. R. Stein, The asymptotic number of (0,1)-matrices with zero permanent, Discrete Math. 6 (1973), 29–34.

The authors apply standard techniques such as inclusion-exclusion, but they obtain only asymptotic bounds. All this suggests to me that there is no obvious, simple formula or algorithm. Perhaps a simple formula exists, but finding it will take some novel insight into these problems.

Now on to brute force. For a $0$-$1$ matrix, $\text{perm}(A)-\det(A)=2\sum_{\sigma\text{ odd}}\prod_i a_{i\sigma(i)}$, and every term is nonnegative, so $\det(A)=\text{perm}(A)$ exactly when no odd permutation $\sigma$ has $a_{i\sigma(i)}=1$ for all $i$. If we naively iterate through all $2^{n^2}$ matrices and check each one against all $n!/2$ odd permutations, we can compute only a few terms of the sequence. Even at 20 billion checks per second, computing $a(7)$ would take about two years, and $a(8)$ about 600,000 years.
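For very small $n$, the naive approach can be sketched as follows (a toy illustration of the check, not the optimized code from the repository):

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Naive count of n x n binary matrices with det = perm.
// For a 0/1 matrix, det(A) = perm(A) exactly when no odd permutation
// sigma has a_{i,sigma(i)} = 1 for every i. Only practical for n <= 4 or so.
std::uint64_t count_naive(int n) {
  // Precompute all odd permutations of {0,...,n-1} via inversion parity.
  std::vector<std::vector<int>> odd_perms;
  std::vector<int> p(n);
  std::iota(p.begin(), p.end(), 0);
  do {
    int inv = 0;
    for (int i = 0; i < n; ++i)
      for (int j = i + 1; j < n; ++j)
        if (p[i] > p[j]) ++inv;
    if (inv % 2 == 1) odd_perms.push_back(p);
  } while (std::next_permutation(p.begin(), p.end()));

  std::uint64_t count = 0;
  for (std::uint64_t m = 0; m < (1ULL << (n * n)); ++m) {
    // Bit i*n + j of m encodes entry a_{ij}.
    bool ok = true;
    for (const auto& q : odd_perms) {
      bool covered = true;
      for (int i = 0; i < n; ++i)
        if (!((m >> (i * n + q[i])) & 1)) { covered = false; break; }
      if (covered) { ok = false; break; }
    }
    if (ok) ++count;
  }
  return count;
}
```

This reproduces the small values listed below, e.g. $a(2)=12$ and $a(3)=343$.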

Now, it's hard to erase that $2^{n^2}$ term. Still, there are several key speedups, each of which makes roughly another term feasible. They can be implemented in this order:

  1. Instead of iterating over possible values of the last row, compute how many spaces on the last row are "free", meaning that putting a $1$ there would not complete an odd permutation with the given values in the previous $n-1$ rows. If $k$ is the number of free spaces, then $2^k$ is the number of values of the last row that avoid all odd permutations, so we can immediately add that number to our running total. This increases our efficiency by a factor of $2^n/n$.
  2. Dynamic programming. As we iterate through the rows of the matrix, filling in values, reduce the set of odd permutations to check by grouping them according to their values in the remaining rows. By the time we enumerate the next-to-last row, we have only $n^2$ permutations to check. This increases our efficiency by a factor of $n!/n^2$.
  3. Column swaps. Modulo the sign of the permutations, we really only have to inspect representative matrices from orbits under the action of the symmetry group $(S_n\times S_n)\rtimes\mathbb Z/2$, which acts by permuting the rows and columns and transposing the matrix. In other words, we only need to check bipartite graphs up to graph isomorphism. Now, it's awkward to solve the graph isomorphism problem in the inner loop of the algorithm. A better tactic is to break the orbits into more predictable chunks. First, we don't want to give up the big speedup from projecting out the last row, so the symmetry group is reduced to $S_{n-1}\times S_n$. Let's exploit the larger $S_n$ factor first, which swaps columns. We do this by enumerating columns only in increasing order, where a column's value is read as a binary expansion with the top row as the most significant bit. This ordering can be checked row by row, and it dramatically reduces the number of matrices considered, by a factor of almost $n!$. I say "almost" for two reasons. First, some matrices have large stabilizers under this action, equivalently small orbits. Second, we now have to keep track of even permutation matrices as well as odd ones, since some orbits include odd permutations of the columns. It also takes a little effort to enforce the ordering and compute the stabilizers. Those caveats aside, this improves our efficiency by a factor of $n!$.
  4. Row swaps. We can't get a similar $(n-1)!$ speedup by enforcing the same ordering among rows, because the earlier column permutations don't preserve the binary value of a row. We can get close, though, by ordering rows according to their Hamming weight, which is preserved by column permutations. There are only $n+1$ possible weights, so this scheme results in unfortunately large stabilizers, and therefore unfortunately small orbits. But even if the typical stabilizer has order $2^{n-1}$, that's still a speedup of $(n-1)!/2^{n-1}$, which is well worthwhile.
  5. Code optimization. A subset of the set of $2\times n$ partial permutation matrices can be represented as an $n^2$-bit field. If $n\leq8$, this fits in a single $64$-bit machine register, and with $256$-bit SIMD instructions we can process $4$ such registers per cycle. Use a low-level language with support for generic programming, like C++, to eliminate dynamic allocation, keep arrays small, and optimize each row individually. Also, the problem is embarrassingly parallel across multiple CPU cores, or even multiple machines.
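As a concrete illustration of speedup 1 above, here is a minimal sketch (again, not the repository's code) that enumerates only the first $n-1$ rows and, for each prefix, adds $2^k$ where $k$ is the number of free positions in the last row:

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Speedup 1: never enumerate the last row. A column j of the last row is
// "free" if no odd permutation sigma with sigma(n-1) = j is already
// completed by the first n-1 rows; with k free positions, exactly 2^k
// choices of last row avoid all odd permutations.
std::uint64_t count_with_free_positions(int n) {
  // Precompute all odd permutations of {0,...,n-1}.
  std::vector<std::vector<int>> odd_perms;
  std::vector<int> p(n);
  std::iota(p.begin(), p.end(), 0);
  do {
    int inv = 0;
    for (int i = 0; i < n; ++i)
      for (int j = i + 1; j < n; ++j)
        if (p[i] > p[j]) ++inv;
    if (inv % 2 == 1) odd_perms.push_back(p);
  } while (std::next_permutation(p.begin(), p.end()));

  std::uint64_t total = 0;
  const std::uint64_t prefixes = 1ULL << (n * (n - 1));
  for (std::uint64_t m = 0; m < prefixes; ++m) {
    // Bit i*n + j of m encodes entry a_{ij} of rows 0..n-2.
    int blocked = 0;  // last-row columns that would complete an odd perm
    for (const auto& q : odd_perms) {
      bool partial = true;
      for (int i = 0; i + 1 < n; ++i)
        if (!((m >> (i * n + q[i])) & 1)) { partial = false; break; }
      if (partial) blocked |= 1 << q[n - 1];
    }
    int k = n - __builtin_popcount(blocked);  // free positions
    total += 1ULL << k;
  }
  return total;
}
```

Compared with the naive loop, this trades $2^{n^2}$ iterations for $2^{n(n-1)}$, with a cheap popcount at the end of each prefix.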

Improvements 1-4 are implemented in https://github.com/Culter/PermDet. They result in the following values:

$a(0)=1$
$a(1)=2$
$a(2)=12$
$a(3)=343$
$a(4)=34997$
$a(5)=12515441$
$a(6)=15749457081$
$a(7)=72424550598849$
$a(8)=1282759836215548737$

In fact, $a(9)$ is well within reach of this algorithm, but it would require arithmetic beyond $64$ bits, which I haven't implemented.
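On GCC and Clang, one workable route is to accumulate in the nonstandard `unsigned __int128` type and convert to decimal only when printing. The helper below is a hypothetical sketch, not taken from the repo:

```cpp
#include <string>

// Convert an unsigned 128-bit value (GCC/Clang extension) to a decimal
// string, since iostreams cannot print __int128 directly.
std::string to_decimal(unsigned __int128 x) {
  if (x == 0) return "0";
  std::string s;
  while (x > 0) {
    s += char('0' + int(x % 10));  // least significant digit first
    x /= 10;
  }
  return std::string(s.rbegin(), s.rend());
}
```

This avoids pulling in an arbitrary-precision library, at the cost of portability to compilers without `__int128`.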