Fast exact bigint factorial
I have a fixed-point bignumber library and want to implement fast factorial with no precision loss.
After some math tricks on paper I got this formula:
(4N)!=((2N)!).((2N)!).{ (2N+1).(2N+3).(2N+5)...(4N-1) }.(2^N)/(N!)
This is already pretty fast, and with some programming tricks the complexity nears ~ O(log(n))
.
To be clear, my current implementation is this:
//---------------------------------------------------------------------------
longnum fact(const DWORD &x,longnum &h) // h return (x>>1)! to speed up computation
{
if (x==0) { h=1; return 1; }
if (x==1) { h=1; return 1; }
if (x==2) { h=1; return 2; }
if (x==3) { h=1; return 6; }
if (x==4) { h=2; return 24; }
int N4,N2,N,i; longnum c,q;
N=(x>>2);
N2=N<<1;
N4=N<<2;
h=fact(N2,q); // get 2N! and N!
c=h*h; for (i=(N2+1)|1;i<=N4;i+=2) c*=i; c/=q; // c= ((2N!)^2)*T1 / N!
for (i=N4+1;i<=x;i++) c*=i; c.round(); c<<=N ; // convert 4N! -> x!, cut off precision losses
for (i=(N2+1)|1,N2=x>>1;i<=N2;i++) h*=i; h.round(); // convert 2N! -> (x/2)!, cut off precision losses
return c;
}
//---------------------------------------------------------------------------
longnum fact(const DWORD &x)
{
longnum tmp;
return fact(x,tmp);
}
//---------------------------------------------------------------------------
Now my question:
-
Is there a fast way to obtain
N!
from this term:T1 = { (2N+1).(2N+3).(2N+5)...(4N-1) }
?Already answered.
So to be clear, I need to extract this unknown term:
T2 = (4N)! / (((2N)!).((2N)!))
so:
(4N)! = (((2N)!).((2N)!)).T2
This would help a lot because then it would not be needed to compute .../(N!)
for factorial.
The T1
term is always integer-decomposable to this:
T1 = T2 * N!
Finally, it hit me :) I have done a little program for primes decomposition of factorials and then suddenly all becomes much clearer:
4! = 2!.2!.(2^1).(3^1) = 24
8! = 4!.4!.(2^1).(5^1).(7^1) = 40320
12! = 6!.6!.(2^2).(3^1).(7^1).(11^1) = 479001600
16! = 8!.8!.(2^1).(3^2).(5^1).(11^1).(13^1) = 20922789888000
20! = 10!.10!.(2^2).(11^1).(13^1).(17^1).(19^1) = 2432902008176640000
24! = 12!.12!.(2^2).(7^1).(13^1).(17^1).(19^1).(23^1) = 620448401733239439360000
28! = 14!.14!.(2^3).(3^3).(5^2).(17^1).(19^1).(23^1) = 304888344611713860501504000000
32! = 16!.16!.(2^1).(3^2).(5^1).(17^1).(19^1).(23^1).(29^1).(31^1) = 263130836933693530167218012160000000
36! = 18!.18!.(2^2).(3^1).(5^2).(7^1).(11^1).(19^1).(23^1).(29^1).(31^1) = 371993326789901217467999448150835200000000
40! = 20!.20!.(2^2).(3^2).(5^1).(7^1).(11^1).(13^1).(23^1).(29^1).(31^1).(37^1) = 815915283247897734345611269596115894272000000000
After analyzing the prime exponents of the T2
term (the rest after half factorials ^ 2) I derive the formula for them:
T2(4N) = multiplication(i=2,3,5,7,11,13,17,...) of ( i ^ sum(j=1,2,3,4,5,...) of (4N/(i^j))-(2N/(i^j)) )
- where multiplication is through all
primes <= 4N
- where sumation is until
i^j <= 4N
The problem is that the divisions 4N/(i^j)
and 2N/(i^j)
must be done in integer math so they cannot be simplified easily.
So I have another question:
-
How can I compute this:
exponent(i) = sum(j=1,2,3,4,5,...) of (N/(i^j))
effectively?i
is any prime wherei<=N
. It should be easy.
Now I calculate the exponent e
for prime i
inside the T2(N)
term like this (but this is too complex for my taste):
for (e=0,a=N/i,b=(N>>1)/i;(a)||(b);e+=a-b-b,a/=i,b/=i);
... I will try implement T2
into fact(x)
and compare speeds ...
I think your over thinking this, the good thing about calculating factorial is you can use the last calculation for calculating the new ones, so clearly the best way to do this is caching the results, this will also be much easier to implement than your solution.
I also saw on another question that you can speed up each single run by using big num multiplication the least amount of times, the way to do this would be keep multiplying until you reach the size of a big num then start multiplying the next numbers until you get a bignum. Repeat this and only at the very end multiply all the big nums you have left together.
My solution is simple but as with most programming problems it already has an accepted fastest solution. You can use a technique called prime swing which I haven't attempted to understand but it's all over the internet so you should have no trouble finding it
I have a solution:
(4N!)=((2N!)^2) . mul(i=all primes<=4N) of [i^sum(j=1,2,3,4,5,...4N>=i^j) of [(4N/(i^j))%2]]
sub-terms of T2
are always prime^exponent
where exponent can be computed on small integers like this:
for (e=0,j=N4;j;e+=j&1,j/=p);
where e
is exponent, p
is prime and N4
is 4*N
Code for the new equation:
// edit beg:
// Sorry, forget to copy sorted list of all primes up to max n here it is
// end of table is marked with 0
// Primes are in DWORDs so they only 4Byte per number
// so the table is very small compared with lookup table for the same max n!
// and also primes are needed for many other routines in bignum
// can compute n! for n <= max prime in table
DWORD _arithmetics_primes[]={2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,0};
// edit end.
longnum fact(const DWORD &x)
{
if (x<=4)
{
if (x==4) return 24;
if (x==3) return 6;
if (x==2) return 2;
if (x==1) return 1;
if (x==0) return 1;
}
int N4,N2,p,i,j,e; longnum c,pp;
N4=(x>>2)<<2;
N2=N4>>1;
c=fact(N2); c*=c; // c=((2N)!)^2;
for (i=0;;i++) // c*= T2
{
p=_arithmetics_primes[i];
if (!p) break;
if (p>N4) break;
for (e=0,j=N4;j;e+=j&1,j/=p);
if (e) // c*=p^e
{
if (p==2) c<<=e;
else for (pp=p;;)
{
if (int(e&1)) c*=pp;
e>>=1; if (!e) break;
pp*=pp;
}
}
}
for (i=N4+1;i<=x;i++) { c*=i; } c.round();
return c;
}
Here are rough time measurements for the first 128 factorials so you can estimate real complexity.
Fixed point 768.128 bits arithmetics ... 231.36 decimals.
[ 0.001 ms ] 1! = 1
[ 0.000 ms ] 2! = 2
[ 0.000 ms ] 3! = 6
[ 0.000 ms ] 4! = 24
[ 0.006 ms ] 5! = 120
[ 0.006 ms ] 6! = 720
[ 0.007 ms ] 7! = 5040
[ 0.005 ms ] 8! = 40320
[ 0.006 ms ] 9! = 362880
[ 0.007 ms ] 10! = 3628800
[ 0.008 ms ] 11! = 39916800
[ 0.012 ms ] 12! = 479001600
[ 0.013 ms ] 13! = 6227020800
[ 0.014 ms ] 14! = 87178291200
[ 0.016 ms ] 15! = 1307674368000
[ 0.014 ms ] 16! = 20922789888000
[ 0.015 ms ] 17! = 355687428096000
[ 0.017 ms ] 18! = 6402373705728000
[ 0.019 ms ] 19! = 121645100408832000
[ 0.016 ms ] 20! = 2432902008176640000
[ 0.017 ms ] 21! = 51090942171709440000
[ 0.019 ms ] 22! = 1124000727777607680000
[ 0.021 ms ] 23! = 25852016738884976640000
[ 0.023 ms ] 24! = 620448401733239439360000
[ 0.025 ms ] 25! = 15511210043330985984000000
[ 0.027 ms ] 26! = 403291461126605635584000000
[ 0.029 ms ] 27! = 10888869450418352160768000000
[ 0.032 ms ] 28! = 304888344611713860501504000000
[ 0.034 ms ] 29! = 8841761993739701954543616000000
[ 0.037 ms ] 30! = 265252859812191058636308480000000
[ 0.039 ms ] 31! = 8222838654177922817725562880000000
[ 0.034 ms ] 32! = 263130836933693530167218012160000000
[ 0.037 ms ] 33! = 8683317618811886495518194401280000000
[ 0.039 ms ] 34! = 295232799039604140847618609643520000000
[ 0.041 ms ] 35! = 10333147966386144929666651337523200000000
[ 0.039 ms ] 36! = 371993326789901217467999448150835200000000
[ 0.041 ms ] 37! = 13763753091226345046315979581580902400000000
[ 0.044 ms ] 38! = 523022617466601111760007224100074291200000000
[ 0.046 ms ] 39! = 20397882081197443358640281739902897356800000000
[ 0.041 ms ] 40! = 815915283247897734345611269596115894272000000000
[ 0.044 ms ] 41! = 33452526613163807108170062053440751665152000000000
[ 0.046 ms ] 42! = 1405006117752879898543142606244511569936384000000000
[ 0.049 ms ] 43! = 60415263063373835637355132068513997507264512000000000
[ 0.048 ms ] 44! = 2658271574788448768043625811014615890319638528000000000
[ 0.050 ms ] 45! = 119622220865480194561963161495657715064383733760000000000
[ 0.054 ms ] 46! = 5502622159812088949850305428800254892961651752960000000000
[ 0.056 ms ] 47! = 258623241511168180642964355153611979969197632389120000000000
[ 0.056 ms ] 48! = 12413915592536072670862289047373375038521486354677760000000000
[ 0.060 ms ] 49! = 608281864034267560872252163321295376887552831379210240000000000
[ 0.063 ms ] 50! = 30414093201713378043612608166064768844377641568960512000000000000
[ 0.066 ms ] 51! = 1551118753287382280224243016469303211063259720016986112000000000000
[ 0.065 ms ] 52! = 80658175170943878571660636856403766975289505440883277824000000000000
[ 0.069 ms ] 53! = 4274883284060025564298013753389399649690343788366813724672000000000000
[ 0.072 ms ] 54! = 230843697339241380472092742683027581083278564571807941132288000000000000
[ 0.076 ms ] 55! = 12696403353658275925965100847566516959580321051449436762275840000000000000
[ 0.077 ms ] 56! = 710998587804863451854045647463724949736497978881168458687447040000000000000
[ 0.162 ms ] 57! = 40526919504877216755680601905432322134980384796226602145184481280000000000000
[ 0.095 ms ] 58! = 2350561331282878571829474910515074683828862318181142924420699914240000000000000
[ 0.093 ms ] 59! = 138683118545689835737939019720389406345902876772687432540821294940160000000000000
[ 0.089 ms ] 60! = 8320987112741390144276341183223364380754172606361245952449277696409600000000000000
[ 0.093 ms ] 61! = 507580213877224798800856812176625227226004528988036003099405939480985600000000000000
[ 0.098 ms ] 62! = 31469973260387937525653122354950764088012280797258232192163168247821107200000000000000
[ 0.096 ms ] 63! = 1982608315404440064116146708361898137544773690227268628106279599612729753600000000000000
[ 0.090 ms ] 64! = 126886932185884164103433389335161480802865516174545192198801894375214704230400000000000000
[ 0.100 ms ] 65! = 8247650592082470666723170306785496252186258551345437492922123134388955774976000000000000000
[ 0.104 ms ] 66! = 544344939077443064003729240247842752644293064388798874532860126869671081148416000000000000000
[ 0.111 ms ] 67! = 36471110918188685288249859096605464427167635314049524593701628500267962436943872000000000000000
[ 0.100 ms ] 68! = 2480035542436830599600990418569171581047399201355367672371710738018221445712183296000000000000000
[ 0.121 ms ] 69! = 171122452428141311372468338881272839092270544893520369393648040923257279754140647424000000000000000
[ 0.109 ms ] 70! = 11978571669969891796072783721689098736458938142546425857555362864628009582789845319680000000000000000
[ 0.119 ms ] 71! = 850478588567862317521167644239926010288584608120796235886430763388588680378079017697280000000000000000
[ 0.104 ms ] 72! = 61234458376886086861524070385274672740778091784697328983823014963978384987221689274204160000000000000000
[ 0.124 ms ] 73! = 4470115461512684340891257138125051110076800700282905015819080092370422104067183317016903680000000000000000
[ 0.113 ms ] 74! = 330788544151938641225953028221253782145683251820934971170611926835411235700971565459250872320000000000000000
[ 0.118 ms ] 75! = 24809140811395398091946477116594033660926243886570122837795894512655842677572867409443815424000000000000000000
[ 0.118 ms ] 76! = 1885494701666050254987932260861146558230394535379329335672487982961844043495537923117729972224000000000000000000
[ 0.123 ms ] 77! = 145183092028285869634070784086308284983740379224208358846781574688061991349156420080065207861248000000000000000000
[ 0.129 ms ] 78! = 11324281178206297831457521158732046228731749579488251990048962825668835325234200766245086213177344000000000000000000
[ 0.133 ms ] 79! = 894618213078297528685144171539831652069808216779571907213868063227837990693501860533361810841010176000000000000000000
[ 0.121 ms ] 80! = 71569457046263802294811533723186532165584657342365752577109445058227039255480148842668944867280814080000000000000000000
[ 0.119 ms ] 81! = 5797126020747367985879734231578109105412357244731625958745865049716390179693892056256184534249745940480000000000000000000
[ 0.131 ms ] 82! = 475364333701284174842138206989404946643813294067993328617160934076743994734899148613007131808479167119360000000000000000000
[ 0.150 ms ] 83! = 39455239697206586511897471180120610571436503407643446275224357528369751562996629334879591940103770870906880000000000000000000
[ 0.141 ms ] 84! = 3314240134565353266999387579130131288000666286242049487118846032383059131291716864129885722968716753156177920000000000000000000
[ 0.148 ms ] 85! = 281710411438055027694947944226061159480056634330574206405101912752560026159795933451040286452340924018275123200000000000000000000
[ 0.154 ms ] 86! = 24227095383672732381765523203441259715284870552429381750838764496720162249742450276789464634901319465571660595200000000000000000000
[ 0.163 ms ] 87! = 2107757298379527717213600518699389595229783738061356212322972511214654115727593174080683423236414793504734471782400000000000000000000
[ 0.211 ms ] 88! = 185482642257398439114796845645546284380220968949399346684421580986889562184028199319100141244804501828416633516851200000000000000000000
[ 0.151 ms ] 89! = 16507955160908461081216919262453619309839666236496541854913520707833171034378509739399912570787600662729080382999756800000000000000000000
[ 0.157 ms ] 90! = 1485715964481761497309522733620825737885569961284688766942216863704985393094065876545992131370884059645617234469978112000000000000000000000
[ 0.166 ms ] 91! = 135200152767840296255166568759495142147586866476906677791741734597153670771559994765685283954750449427751168336768008192000000000000000000000
[ 0.161 ms ] 92! = 12438414054641307255475324325873553077577991715875414356840239582938137710983519518443046123837041347353107486982656753664000000000000000000000
[ 0.169 ms ] 93! = 1156772507081641574759205162306240436214753229576413535186142281213246807121467315215203289516844845303838996289387078090752000000000000000000000
[ 0.173 ms ] 94! = 108736615665674308027365285256786601004186803580182872307497374434045199869417927630229109214583415458560865651202385340530688000000000000000000000
[ 0.188 ms ] 95! = 10329978488239059262599702099394727095397746340117372869212250571234293987594703124871765375385424468563282236864226607350415360000000000000000000000
[ 0.181 ms ] 96! = 991677934870949689209571401541893801158183648651267795444376054838492222809091499987689476037000748982075094738965754305639874560000000000000000000000
[ 0.187 ms ] 97! = 96192759682482119853328425949563698712343813919172976158104477319333745612481875498805879175589072651261284189679678167647067832320000000000000000000000
[ 0.194 ms ] 98! = 9426890448883247745626185743057242473809693764078951663494238777294707070023223798882976159207729119823605850588608460429412647567360000000000000000000000
[ 0.201 ms ] 99! = 933262154439441526816992388562667004907159682643816214685929638952175999932299156089414639761565182862536979208272237582511852109168640000000000000000000000
[ 0.185 ms ] 100! = 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
[ 0.191 ms ] 101! = 9425947759838359420851623124482936749562312794702543768327889353416977599316221476503087861591808346911623490003549599583369706302603264000000000000000000000000
[ 0.202 ms ] 102! = 961446671503512660926865558697259548455355905059659464369444714048531715130254590603314961882364451384985595980362059157503710042865532928000000000000000000000000
[ 0.207 ms ] 103! = 99029007164861804075467152545817733490901658221144924830052805546998766658416222832141441073883538492653516385977292093222882134415149891584000000000000000000000000
[ 0.242 ms ] 104! = 10299016745145627623848583864765044283053772454999072182325491776887871732475287174542709871683888003235965704141638377695179741979175588724736000000000000000000000000
[ 0.210 ms ] 105! = 1081396758240290900504101305800329649720646107774902579144176636573226531909905153326984536526808240339776398934872029657993872907813436816097280000000000000000000000000
[ 0.215 ms ] 106! = 114628056373470835453434738414834942870388487424139673389282723476762012382449946252660360871841673476016298287096435143747350528228224302506311680000000000000000000000000
[ 0.221 ms ] 107! = 12265202031961379393517517010387338887131568154382945052653251412013535324922144249034658613287059061933743916719318560380966506520420000368175349760000000000000000000000000
[ 0.217 ms ] 108! = 1324641819451828974499891837121832599810209360673358065686551152497461815091591578895743130235002378688844343005686404521144382704205360039762937774080000000000000000000000000
[ 0.226 ms ] 109! = 144385958320249358220488210246279753379312820313396029159834075622223337844983482099636001195615259277084033387619818092804737714758384244334160217374720000000000000000000000000
[ 0.232 ms ] 110! = 15882455415227429404253703127090772871724410234473563207581748318444567162948183030959960131517678520479243672638179990208521148623422266876757623911219200000000000000000000000000
[ 0.240 ms ] 111! = 1762952551090244663872161047107075788761409536026565516041574063347346955087248316436555574598462315773196047662837978913145847497199871623320096254145331200000000000000000000000000
[ 0.213 ms ] 112! = 197450685722107402353682037275992488341277868034975337796656295094902858969771811440894224355027779366597957338237853638272334919686385621811850780464277094400000000000000000000000000
[ 0.231 ms ] 113! = 22311927486598136465966070212187151182564399087952213171022161345724023063584214692821047352118139068425569179220877461124773845924561575264739138192463311667200000000000000000000000000
[ 0.240 ms ] 114! = 2543559733472187557120132004189335234812341496026552301496526393412538629248600474981599398141467853800514886431180030568224218435400019580180261753940817530060800000000000000000000000000
[ 0.252 ms ] 115! = 292509369349301569068815180481773552003419272043053514672100535242441942363589054622883930786268803187059211939585703515345785120071002251720730101703194015956992000000000000000000000000000
[ 0.248 ms ] 116! = 33931086844518982011982560935885732032396635556994207701963662088123265314176330336254535971207181169698868584991941607780111073928236261199604691797570505851011072000000000000000000000000000
[ 0.598 ms ] 117! = 3969937160808720895401959629498630647790406360168322301129748464310422041758630649341780708631240196854767624444057168110272995649603642560353748940315749184568295424000000000000000000000000000
[ 0.259 ms ] 118! = 468452584975429065657431236280838416439267950499862031533310318788629800927518416622330123618486343228862579684398745837012213486653229822121742374957258403779058860032000000000000000000000000000
[ 0.261 ms ] 119! = 55745857612076058813234317117419771556272886109483581752463927935846946310374691578057284710599874844234646982443450754604453404911734348832487342619913750049708004343808000000000000000000000000000
[ 0.254 ms ] 120! = 6689502913449127057588118054090372586752746333138029810295671352301633557244962989366874165271984981308157637893214090552534408589408121859898481114389650005964960521256960000000000000000000000000000
[ 0.263 ms ] 121! = 809429852527344373968162284544935082997082306309701607045776233628497660426640521713391773997910182738287074185078904956856663439318382745047716214841147650721760223072092160000000000000000000000000000
[ 0.270 ms ] 122! = 98750442008336013624115798714482080125644041369783596059584700502676714572050143649033796427745042294071023050579626404736512939596842694895821378210620013388054747214795243520000000000000000000000000000
[ 0.281 ms ] 123! = 12146304367025329675766243241881295855454217088483382315328918161829235892362167668831156960612640202170735835221294047782591091570411651472186029519906261646730733907419814952960000000000000000000000000000
[ 0.290 ms ] 124! = 1506141741511140879795014161993280686076322918971939407100785852066825250652908790935063463115967385069171243567440461925041295354731044782551067660468376444194611004520057054167040000000000000000000000000000
[ 0.322 ms ] 125! = 188267717688892609974376770249160085759540364871492425887598231508353156331613598866882932889495923133646405445930057740630161919341380597818883457558547055524326375565007131770880000000000000000000000000000000
[ 0.303 ms ] 126! = 23721732428800468856771473051394170805702085973808045661837377170052497697783313457227249544076486314839447086187187275319400401837013955325179315652376928996065123321190898603130880000000000000000000000000000000
[ 0.313 ms ] 127! = 3012660018457659544809977077527059692324164918673621799053346900596667207618480809067860692097713761984609779945772783965563851033300772326297773087851869982500270661791244122597621760000000000000000000000000000000
[ 0.307 ms ] 128! = 385620482362580421735677065923463640617493109590223590278828403276373402575165543560686168588507361534030051833058916347592172932262498857766114955245039357760034644709279247692495585280000000000000000000000000000000
refernce 128! = 385620482362580421735677065923463640617493109590223590278828403276373402575165543560686168588507361534030051833058916347592172932262498857766114955245039357760034644709279247692495585280000000000000000000000000000000
My measurements reveal that N!
uses
- max of
2.2N
fast low level long operations (+,-,<<,>>
) - slightly less than
N/2
long multiplications, but most of them are convenient in size which speeds up the multiplication, so the measured times do not match the obviousO(N/2*O(multiplication))
. After using analysis on the times the observed complexity best matchesO(N.log^2(N))
but still the error is too big so it will be most likely some slightly weirder function...
Also I have tried factorial as non recursive multiplication of primes only (similar to T2
term), but the results was much slower.
P.S.: Code posted in the question is also 100%
working, but slower than new one (even if it uses fewer multiplications - because of more memory needed for recursion and not optimized multiplicants order).