Second part of the factorial sum divisibility question

Which primes $p$ divide the sum of factorials $1! + 2! + 3! + 4! + 5! + \cdots + (p-1)!$? This is related to my previous question.


Solution 1:

Note that heuristically, if these numbers were equidistributed mod $p$ we would expect each one to be zero (i.e., expect $p$ to divide the result) with probability $\approx 1/p$; presuming that all the values are independent (which seems a reasonable assumption), we shoud expect to have $\sum_{p\lt n}\frac{1}{p}\approx M+\ln \ln n$ of them less than $n$, where $M$ is the Meissel-Mertens Constant — this sum is approximately 2.887 for $n=10^6$, so while we would heuristically 'expect' another one in that range it's not surprising that there aren't any, and despite the apparent lack of any more solutions, because of the divergence of the series the 'expected' number of solutions is still infinite!

Added: I put together my own software implementation in C++, confirming the results (that only $p=3$ and $p=11$ work) for $p\lt 5\times 10^5$ but also looking at the distribution of results $\mod p$ to see if any patterns arose. I computed the value of $\dfrac{\sum_{1}^{p-1}i!\bmod p}{p}$ (the division by $p$ serving to 'normalize' values into the range $[0, 1)$ ) for all $p$ in that range and then grouped them in bins; these are the plots with 100 bins and 256 bins. (These bin counts should be small enough relative to the prime values in consideration that there shouldn't be any substantial aliasing effects in binning the data.) The results are a bit scattershot but there doesn't appear to be any skew towards particular values (e.g. $\frac{p-1}2$) which would show up as a spike in the bins. If I have more time I may take a look at the square of the sum and particularly even the inverse of the sum $\pmod p$, but since the latter would actually require me writing a quick GCD algorithm to find the inverse it'll have to wait for later.

(Note: the sharp drop at the last element is an artifact of my adding a 0 value to the end of the bin data so my spreadsheet didn't automatically normalize the range; it's not an actual value.)

enter image description hereenter image description here

Solution 2:

I have a few small analytic comments and a relatively aggressive numerical search.

Firstly, I checked all odd $p$ (not necessarily prime) up to $10^6$ and found only the same five as Yoni above, namely 3, 9, 11, 33, and 99. This took ~1 hour using 1 core of my MacBook Pro, with an early unoptimized version of my code.

Secondly, of course we have: \begin{equation} p\nmid\sum_{n=1}^{p-1}n!\quad \forall p\in\mathbb{N}, p \textrm{ even} \end{equation} because all of the terms in the sum are even except the first one, so the sum is always odd.

Thirdly, a numerical search is sped-up slightly using the identity: \begin{equation} \sum_{n=1}^{p-1} n! = 1+2[1+3[1+4[...[1+(p-1)]]]] \end{equation} I've written software to check primes $p$ for $p|\sum_{n=1}^{p-1} n!$ using the above identity, and set it up to use multiple threads in an embarrassingly-parallel fashion. Using this software I've shown: \begin{equation} \textrm{3 and 11 are the only primes }p<10^7\textrm{ that divide }\sum_{n=1}^{p-1} n! \end{equation} If you want to compile it yourself then you'll have to link with the primesieve library.

#include <stdio.h>
#include <stdlib.h>
#include <primesieve/soe/PrimeSieve.h>


// To compile: g++ -O3 factorialSum.cpp -lprimesieve -o factorialSum
//         or: g++ -O3 -static factorialSum.cpp -lprimesieve -o factorialSum


// We perform arithmetic at u1 precision; p is restricted to u2 precision
typedef uint_fast64_t u1;
typedef uint_fast32_t u2;


// This evaluates $\sum_{n=1}^{p-1} n!$ based on the definition.
// We don't actually use this in our program below.
u2 factorialSum_modp_method1(u2 p){
  u1 factorial_modp=1;
  u1 factorialSum_modp=0;
  for(u1 n=1; n<=p-1; n++){
    factorial_modp*=n;
    factorial_modp%=p;
    factorialSum_modp+=factorial_modp;
    factorialSum_modp%=p;
  }
  return factorialSum_modp;
}


// This evaluates $\sum_{n=1}^{p-1} n! = 1+2[1+3[1+4[...[1+(p-1)]]]]$ via the RHS of the expression.
// This is somewhat faster than 'method1' above.
u2 factorialSum_modp_method2(u2 p){
  u1 factorialSum_modp=1;
  for(u1 n=p-1; n>1; n--)
    factorialSum_modp=(factorialSum_modp*n+1)%p;
  return factorialSum_modp;
}


// This prints out p iff p divides $\sum_{n=1}^{p-1} n!$.
// We pass this as an arument to PrimeSieve.generatePrimes().
void check_factorialSum_modp(u2 p){
  if(factorialSum_modp_method2(p)==0)
    printf("%u\n", (unsigned)p);
}


void error_incorrectUsage(char *args[]){
  printf("Usage: %s thread numThreads blockSize pMin pMax\n", args[0]);
  printf("       thread is a positive integer 0, ..., numThreads-1\n");
  printf("       numThreads is a natural number, 1, 2, ...\n");
  printf("       blockSize is the interleave size between threads\n");
  printf("       pMin and pMax are integers, the upper and lower limits on candidate primes p\n\n");
  printf("e.g.:  %s 0 1 10000 0 100000\n", args[0]);
  printf("       will check [0, 100000] for primes p that divide $\\sum_{n=1}^{p-1} n!$ \n");
  printf("       (expression written LaTeX).\n\n");
  printf("In order to use multiple threads, run multiple instances with different values of 'thread'.\n");
  exit(1);
}


int main(int nargs, char *args[]){

  // Parse command line arguments and check for proper usage
  if( !(nargs==6) ) error_incorrectUsage(args);
  char *error1, *error2, *error3, *error4, *error5;
  u2 thread     = strtoul(args[1], &error1, 10);
  u2 numThreads = strtoul(args[2], &error2, 10);
  u2 blockSize  = strtoul(args[3], &error3, 10);
  u2 pMin       = strtoul(args[4], &error4, 10);
  u2 pMax       = strtoul(args[5], &error5, 10);
  if(*error1 || *error2 || *error3 || *error4 || *error5) error_incorrectUsage(args);

  // Sieve intervals [pLower, pUpper] that belong to this thread
  PrimeSieve ps;
  for(u2 pLower=pMin+thread*blockSize; pLower<=pMax-blockSize; pLower+=numThreads*blockSize){
    u2 pUpper=pLower+blockSize;
    ps.generatePrimes(pLower, pUpper, check_factorialSum_modp);
    printf("interval [%u,%u] completed on thread %u\n", pLower, pUpper, thread);
  }

}