Optimize table layout

I tried to implement Rahul's suggestion to view it as a convex optimization problem. The results are mixed. I can easily do small tables like 30 by 30, but 300 by 300 can be done with only about 1% precision if you are willing to wait 1 minute and getting down from there will take eternity. That is primarily due to the inefficiency of the solution finder I'm using (which is more or less just cycling over variables and optimizing over certain subsets of them; I wish I could find a better way or, at least, accelerate convergence somewhat). Nevertheless it is a good exercise in convex programming, so I'll post the details here. The algorithm can be modified to take into account "natural" restrictions of the kind $w_j\ge W_j$ or $h_i\ge H_i$ (width/height should not be too small) and the modification has pretty much the same rate of performance as far as I can tell from simulations, but I'll restrict myself to the original question here.

Let $w_j$ be the unknown widths and $a_{ij}$ be the known areas. We want to minimize $\sum_i\max_j \frac{a_{ij}}{w_j}$. It is useful to consider the dual problem. I will spare you from the general theory of duality and will just note that $$ \max_j \frac{a_{ij}}{w_j}=\max\left\{\sum_j b_{ij}\frac{a_{ij}}{w_j}:b_{ij}\ge 0, \sum_j b_{ij}=1\right\} $$ so if we consider all admissible vectors $w=(w_1,\dots,w_n)$ (non-negative entries, total sum $1$) and all admissible matrices $b=(b_{ij})$ (non-negative entries, all row sums equal to $1$), we can write our problem as that of finding $$ \min_w\max_b \sum_{i,j} b_{ij}\frac{a_{ij}}{w_j}\,. $$ The dual problem to that is finding $$ \max_b \min_w\sum_{i,j} b_{ij}\frac{a_{ij}}{w_j}\,. $$ The inner $\min_w$ is here easy to find: if we denote $S_j=\sum_i b_{ij}a_{ij}$, then it is just $(\sum_j \sqrt{S_j})^2$ with unique optimal $w_j$ proportional to $\sqrt{S_j}$.

There are two things one should understand about duality. The first one is that every admissible matrix $b$ (computed or just taken from the ceiling) can serve as the certificate of the impossibility to do better than a certain number in the original problem, i.e., the minimax is never less than the maximin. This is pretty trivial: just use the given $b$ to estimate the minimax from below. The second one is that the true value of minimax is actually the same as the true value of maximin (under some mild assumptions that certainly hold in our case). This is a somewhat non-trivial statement.

Together these two observations allow one to use the following strategy. We shall try to solve the dual problem. For every approximation $b$ to the solution, we will look at the easily computable lower bound $(\sum_j\sqrt{S_j})^2$ it produces and at the corresponding minimizer $w$. For that $w$ we can easily compute the original function $\sum_j\max_i\frac{a_{i,j}}{w_j}$. If its value is reasonably close to the lower bound, we know that we should look no further.

Now, of course, the question is how to maximize $\sum_j\sqrt S_j$ under our constraints on $b$. It doesn't look like an attractive problem because the number of unknowns increased from $n$ to $mn$. Still, one can notice that if we fix all rows of $b$ except, say, the $i$'th one, then the optimization of the $i$'th row is rather straightforward. Indeed, the corresponding problem is of the following kind:

**Find $\max\sum_j\sqrt{a_jb_j+c_j}$ where $a_j,c_j\ge 0$ are given and $b_j\ge 0$ are the unknowns subject to the constraint $\sum_j b_j=1$. Using the standard Lagrange multiplier mumbo-jumbo, we conclude that the optimal $b_j$ must satisfy the equations $\frac{a_{j}}{\sqrt{a_jb_j+c_j}}=\lambda$ whenever $b_j>0$ and the inequalities $\frac{a_{j}}{\sqrt{a_jb_j+c_j}}\le \lambda$ whenever $b_j=0$. Thus, the optimizer is just a vector $b_j=\max(\Lambda a_{j}-\frac{c_j}{a_j},0)$ with an unknown $\Lambda=\frac 1{\lambda^2}>0$ that should be found from the constraint $\sum_j b_j=1$. This is a one-variable equation for the root of a monotone function, so it can be easily solved in various ways.

Thus, we can optimize each row of $b$ with other rows fixed rather quickly. The natural idea is then just to cycle over rows optimizing each one in turn. It takes about 20 full cycles to get the lower bound and the value of the function within 1% range from each other on a random matrix (structured matrices seem to be even better) up to the size of 300 by 300 at least.

This is the description. The code (in Asymptote) is below.

srand(seconds());

int m=50, n=55;

real[][] a, b;
for(int i=0;i<m;++i)
{
    a[i]=new real[]; b[i]=new real[];
    for(int j=0; j<n; ++j)
    {
        a[i][j]=unitrand();
        b[i][j]=1/n;
    }
    //a[i][rand()%n]=2;
    a[i]*=unitrand();
}

real[] p, S;

for(int k=0;k<101;++k)
{
    for(int j=0;j<n;++j)
    {
        real s=0;
        for(int i=0;i<m;++i)
            s+=a[i][j]*b[i][j];
        S[j]=s;
        p[j]=sqrt(S[j]);
    }
    if(k%10==0)
    {
        write("*** Iteration "+string(k)+" ***");
        write(sum(map(sqrt,S))^2);
    }

    p/=sum(p);

    real s=0; 
    for(int i=0;i<m;++i)
    {
        real M=0; 
        for(int j=0;j<n;++j)
        {
            real h=a[i][j]/p[j];
            if(h>M)
                M=h;
        }
        s+=M;
    }
    if(k%10==0)
        write(s);
    //pause();

    for(int i=0;i<m;++i)
    {
        real[] A,V,C,B;
        for(int j=0;j<n;++j)
        {
            A[j]=a[i][j];
            V[j]=S[j]-a[i][j]*b[i][j];
            C[j]=V[j]/a[i][j];
        }
        real aa=(sum(C)+1)/sum(A);
        real da=1;
        while(da>1/10^10)
        {
            for(int j=0;j<n;++j)
            {
                B[j]=aa*A[j]-C[j];
                if(B[j]<0)
                {
                    A[j]=0;
                    B[j]=0;
                }
            }
            da=sum(B)-1; aa-=da/sum(A); 
        }
        for(int j=0;j<n;++j)
        {
            b[i][j]=B[j];
            S[j]=V[j]+a[i][j]*B[j];
        }
    }
}

write("************");

pause();

This problem can be solved readily using a convex programming library like CVX or CVXPY, after applying the transformation $a_{ij}\le h_i w_j \iff \log a_{ij} \le \log h_i + \log w_j$ to convert it to a convex problem. Here is CVXPY code for fedja's problem:

import cvxpy as cp
import numpy as np
from math import *

# Problem data.
m = 50
n = 37
# np.random.seed(0)
A = np.ones((m,n))
for i in range(m):
    for j in range(n):
        A[i,j] = 1 + 0.0001*sin(i + j*j)
wmax = 1

# Construct the problem.
h = cp.Variable((m,1))
w = cp.Variable((1,n))
objective = cp.Minimize(cp.sum(h))
H = cp.hstack([h for _ in range(n)])
W = cp.vstack([w for _ in range(m)])
constraints = [cp.log(A) <= cp.log(H) + cp.log(W), cp.sum(w) <= wmax]
problem = cp.Problem(objective, constraints)

problem.solve(verbose=True)

print("Optimal value", problem.value)
# print("Optimal widths", w.value)
# print("Optimal heights")
# print(h.value)
[...solver output, remove 'verbose=True' to hide...]
Maximum number of iterations reached, recovering best iterate (98) and stopping.

Close to OPTIMAL (within feastol=2.5e-07, reltol=5.8e-12, abstol=1.1e-08).
Runtime: 0.491104 seconds.

Optimal value 1850.1460524691356

(Note that this is not exactly a feasible solution: some constraints are violated by ${\sim}10^{-5}$. A feasible solution could be recovered by increasing the row heights slightly.)

Upper and lower bounds on $h_i$ and $w_j$ can easily be added as well.


Here is just the code that is (presumably) finding the exact answer for not too large matrices. All explanations will come later. Again it is in Asymptote. Why not in C? I know C++ but having a slow computer with an interpreted rather than compiled language allows you to watch the program as it goes (if you bother to output the work protocol to the screen, of course) and see many nuances that otherwise would be easily missed. I found at least 6 logical bugs in the original version this way (I hope the remaining number is less). The progress is not linear because the program tries to locate what Matt calls "king cells" in the matrix and the "vert=" line is a better mark of progress than the difference between the upper and the lower bounds. The final array of widths is $P$ and the program terminates when the relative error is $10^{-15}$ (be careful here: due to the rounding errors it may continue to run beyond that point, but once you see "no conflict" and "nothing to do" repeating again and again, the job is actually done and you are watching the geometric convergence game end (which can be also played differently, but who cares?).

srand(seconds());

int m=30, n=17, K=100001, k, SK=10, Count=0, proccnt=0, Failtime=1000000, I=0,J=0, cycletime=0; 
real M=0, Mdel=0, TTT=0, B;
int time=0, failtime=0, successtime=0; 
int tt=1, ttt=1, blcount=0, ulttol=3;


int start=seconds();

int[][] blacklist;
for(int i=0;i<m;++i) blacklist[i]=array(n,1);

real[][] a, aaa,  b , bb, bbb, db, dbb, bres;
real[] AA;

/// Initiating the matrix ////////////

real delta=0.0001;

for(int i=0;i<m;++i)
{
real u=unitrand(), v=unitrand();
a[i]=new real[]; b[i]=new real[];
for(int j=0; j<n; ++j) 
{
a[i][j]=1+delta*sin((i+j^2)); 
b[i][j]=1/n;
}
//a[rand()%(i+1)][rand()%n]=2;
//a[i]*=unitrand();
}

////////////////////////////////////////////


aaa=copy(a); bres=copy(b);
real kill=1/1000;



int [][] temp; bool[] conf=array(n,true); 
bool fast=true, notreset=true, confl=true;

for(int i=0;i<m;++i) temp[i]=array(n,0);

int[] Ind; for(int i=0;i<m;++i) Ind[i]=i;

real Norm(real[][] b)
{
real[] S;
for(int j=0;j<n;++j)
{
real s=0; for(int i=0;i<m;++i) s+=a[i][j]*b[i][j]; S[j]=s;
}
return sum(map(sqrt,S))^2;
}


void shuffle()
{
for(int kk=0;kk<m;++kk) {int a=rand()%m, b=rand()%m; int II=Ind[a]; Ind[a]=Ind[b]; Ind[b]=II;}
}

bool[] conflict(real[][] b)
{
bool[] conf=array(n,false);

int count=0; 

for(int i=0;i<m;++i) 
{
if(min(b[i])<0) {write("karaul"); pause();}
b[i]=max(b[i],array(n,0));
count+=sum(map(sgn,b[i]));
}
int[] pres=array(m,1);
int[][] sb;
for(int i=0;i<m;++i) {sb[i]=map(sgn,b[i]); sb[i][n]=1;}


for(int I=1;I<m;++I)
for(int i=0; i<I; ++i)
{
if(pres[i]>0 && sum(sb[i]*sb[I])>sb[i][n]*sb[I][n]) {pres[i]=0; sb[I]=sb[i]+sb[I];}
}

int vert,edgecnt,Vert=0,Edgecnt=0; int comp=sum(map(sgn,pres));
for(int i=0;i<m;++i) 
{
if(pres[i]>0) 
{
vert=sum(sb[i])-sb[i][n];
Vert+=vert;
edgecnt=-sb[i][n];
for(int j=0;j<n;++j) edgecnt+=max(2*sb[i][j]-1,0); 
Edgecnt+=edgecnt;
if(edgecnt>vert-1) for(int j=0;j<n;++j) {if(sb[i][j]>0) conf[j]=true;}
}
}
int alive=0; for(int i=0;i<m;++i) for(int j=0;j<n;++j)
if(conf[j] && b[i][j]>0 && blacklist[i][j]<=ttt) ++alive;
write("vert="+string(Vert)+"("+string(alive)+") edgecnt="+string(Edgecnt)+" comp="+ string(comp));
return conf;
}





real[] p, P, S;

for(k=0;k<K;++k)
{

void procedure()
{
for(int j=0;j<n;++j)
{
real s=0; for(int i=0;i<m;++i) s+=aaa[i][j]*b[i][j]; S[j]=s;
}
for(int i:Ind)
{
real aa;
real[] A,V,C,B;
for(int j=0;j<n;++j) {A[j]=aaa[i][j]; V[j]=S[j]-aaa[i][j]*b[i][j]; C[j]=V[j]/aaa[i][j];}
real aa=(k==0?(sum(C)+1)/sum(A):AA[i]);

int countbound=40;

for(int j=0;j<n;++j) B[j]=max(aa*A[j]-C[j],0);
if(sum(B)>1/2)
{
if(sum(B)<1)
{
real sl=0;
for(int j=0;j<n;++j) sl+=A[j]*sgn(B[j]);
aa+=1.0001*((1-sum(B))/sl); countbound=4;
}
}
else aa=(sum(C)+1)/sum(A);

real da=1;
int count=0;

while(da>0 && count<countbound)
{
++count; 
//write(da,count); //pause();
for(int j=0;j<n;++j) {B[j]=aa*A[j]-C[j]; if(B[j]<0) {B[j]=0; A[j]=0; C[j]=0;}}
if(sum(A)>0) {da=sum(B)-1; aa-=da/sum(A);}
else {write("alert"); pause(); for(int j=0;j<n;++j) {if(b[i][j]>0) A[j]=aaa[i][j];} aa=(sum(C)+1)/sum(A); } 
//write(sum(B),aa,da,sum(A),sum(C));
}
for(int j=0;j<n;++j) {b[i][j]=B[j]; S[j]=V[j]+aaa[i][j]*B[j];}
Count+=count; 

if(abs(sum(b[i])-1)>0.1) {write("rough!"); pause();}
AA[i]=aa; b[i]/=sum(b[i]);
}
++proccnt;
}

bool check()
{
bool check=false;
for(int i=0;i<m && !check;++i) for(int j=0;j<n;++j) check=check || (bres[i][j]>0 && b[i][j]==0);
return check;
}




void fix()
{
for(int i=0;i<m;++i) for(int j=0;j<n;++j) 
{
if(b[i][j]==0 && conf[j]) aaa[i][j]=a[i][j]*kill;
//if(b[i][j]==0) blacklist[i][j]=1;
}
}


void advance(bool adv=true)
{
for(int kk=0;kk<(adv?ttt:tt)*SK;++kk) procedure(); bres=copy(b); if(adv) {write("advancing with speed "+string(TTT)); fix();}
}


void reset(bool hard=true)
{
if(!confl) write("nothing to do"); else write("RESETTING "+(hard?"HARD":"SOFT")); 
fast=true; if(hard) blcount=0;   
//shuffle();
aaa=copy(a); for(int kk=0;kk<(confl && hard?ttt:1)*SK;++kk) procedure(); 
if(confl && hard) ttt*=2;  
fix(); 
}

real minb=1, shift=0;

TTT=1;

while (TTT>1/3) 
{ 
TTT=0;
//bbb=copy(b); advance(false); 
bb=copy(b); advance(false); bres=copy(b);

for(int i=0;i<m;++i) 
{
db[i]=b[i]-bb[i]; 
//dbb[i]=bb[i]-bbb[i]; 
shift=max(shift,max(map(abs,db[i]))); temp[i]=array(n,0);
}

for(int i=0;i<m;++i) for(int j=0;j<n;++j)
{
if(b[i][j]>0 && db[i][j]<0 && bb[i][j]>0) 
{
real u=-db[i][j]/b[i][j];
//v=-dbb[i][j]/bb[i][j]; 
if(u>TTT && u>0 && aaa[i][j]>a[i][j]/2 && blacklist[i][j]<=ttt) {TTT=u; I=i; J=j; minb=min(minb,b[i][j]);}
}
}
tt=(confl?blacklist[I][J]:1);
if(TTT>1/3) advance(); 
else if(TTT==0 || blcount>ulttol) reset();
else write('\n \naccelerating from speed '+string(TTT)+
"; position=("+string(I)+","+string(J)+"); cycle count= "+string(2*tt*SK)); 
}

time=seconds()-start; if(time>Failtime) {write('\n\nTotal failure'); pause(); Failtime*=2;} 

write("time= "+string(time)+", cycling "+string(cycletime)+
" seconds, failures =  "+string(failtime)+ ", successes= "+string(successtime));

write("count="+string(Count/m/proccnt)); 

conf=conflict(b);

for(int j=0;j<n;++j)
{
real s=0; for(int i=0;i<m;++i) s+=aaa[i][j]*b[i][j]; S[j]=s; p[j]=sqrt(s);  
}

p/=sum(p); 
if(k==0) P=copy(p); 
write(Mdel); 

{
real s=0, sss=0; 
for(int i=0;i<m;++i)
{
real M=0; 
for(int j=0;j<n;++j) {real h=a[i][j]/p[j]; if(h>M) M=h;}
sss+=M;
}


for(int i=0;i<m;++i)
{
real M=0; 
for(int j=0;j<n;++j) {real h=a[i][j]/P[j]; if(h>M) M=h;}
s+=M;
}

if(sss<s) P=copy(p); 
write(s,s-Mdel); 
if(s-Mdel<1/10^15*s) {write("******it took "+string(seconds()-start)+" seconds******");
pause();}
}

confl=false; for(int j=0;j<n;++j) confl=confl || conf[j]; 
if(!confl) {write("no conflict"); reset();} else fix();

if(fast)
{
for(int i=0;i<m;++i) for(int j=0;j<n;++j)
{
if(conf[j] && b[i][j]>0 && bb[i][j]>0) 
{
real u=-db[i][j]/b[i][j]; 
//v=-dbb[i][j]/bb[i][j]; 
if(u>TTT/10 && aaa[i][j]>a[i][j]/2 && blacklist[i][j]<=ttt) temp[i][j]=1;
}
}
}

if(confl) temp[I][J]=1;

void loop()
{
bres=copy(b); Mdel=Norm(b); B=b[I][J]; if(B==0) B=1;

int cyclestart=seconds();

for(int i=0; i<m;++i) for(int j=0; j<n; ++j) if(temp[i][j]>0) aaa[i][j]=a[i][j]*kill;

for(int kk=0;kk<tt*SK;++kk) procedure(); 

if(b[I][J]>0 && confl) {write("Too weak killing!"); pause(); kill/=10;}

for(int i=0; i<m ;++i) for(int j=0; j<n; ++j) if(temp[i][j]>0) aaa[i][j]=a[i][j];

for(int kk=0;kk<tt*SK;++kk) procedure();

cycletime+=seconds()-cyclestart+1;

M=Norm(b); 
}

loop(); real rat=b[I][J]/B;

while (rat>0 && rat<0.9 && M>Mdel) {write("Repeating..."); loop(); rat=b[I][J]/B;}

if(confl && rat>0 && M>Mdel) {write("BLACKLISTING!"); blacklist[I][J]=2*ttt; ++blcount; if(blcount>0) reset((blcount>4?true:false));} 


int bl=0; for (int i=0;i<m;++i) 
bl+=sum(map(sgn,max(blacklist[i]-array(n,ttt),array(n,0)))); 
write(string(bl)+"  vertices blacklisted");


if(M>Mdel) 
{
if(rat==0) {fast=true; blcount=0;}
if(confl) write("Success!"+(b[I][J]==0?" Vertex is gone": "Vertex stays with ratio "+string(b[I][J]/B)+
" and abs value "+string(b[I][J])));
if(!check()) tt*=2; 
Mdel=M; successtime+=2*tt*SK; notreset=true;} 
else 
{
b=copy(bres); fast=false; failtime+=2*tt*SK;
blacklist[I][J]=2*tt;
if(confl) write("Failure! "+string(Mdel-M)+" short...");   
if (tt<ttt) tt*=2; else 
if (TTT>0 && confl) 
{
write("BLACKLISTING!"); blacklist[I][J]=2*ttt; ++blcount; if(blcount>0) reset((blcount>ulttol?true:false));
//write(tt,ttt); pause(); 
} 
else reset(); 
//else {tt*=2;}
}


}

I know that adding a second answer to the same thread is somewhat frowned upon but I felt like a couple of things here merit a special discussion. To avoid any issues with undeserved reputation points, etc., I'll make it a community wiki. Also I apologize in advance that I do not have a continuous stretch of time to type the full thing in one go, so I'll type it by parts, which will, probably, bump it to the front page more often than necessary.

Before I go into mathematics, let me just say that Rahul's answer is both excellent and terrible. It is excellent because it allows one to draw from readily existing sources and just avoid any hard thinking and it is terrible for the same very reason. The code he offers doesn't solve the problem. It merely restates it in the language understandable to the machine, after which the problem is delegated to a black box that spits out an uncheckable answer (even apparently bogus sometimes, as our discussion with Rahul shows, though I still believe that it might be an issue with human programming rather than with the solver itself) and you are left with no better understanding of the matters than you had in the first place. Of course, most of the available solvers are far superior to anything you or I can write ourselves when we have a whole bunch of complicated problems with some crazy constraints and objective functions and need one solver that works for all of them. However I'm really curious what is the price one has to pay for creating a Universal Monster instead of a small application that is aimed at a specific question (and what is the price one has to pay for delegating tasks to such a monster instead of trying to find one's own approach if not to all, then at least to some questions). That's why I wanted to see what is the best precision one can obtain using the standard software on a particular matrix for which I can find an exact solution using a few tricks.

So, the questions I'm going to address now are adding natural additional constraints and the speed of convergence. Note that I can easily add only lower bounds $w_j\ge W_j$ and $h_i\ge H_i$ but not the upper ones. You'll see why in a minute.

Adding the height restrictions is easy. The duality is ultimately just a statement that you need to consider all "trivial lower bounds" and switch from minimax to maximin (the devil is, of course, in the details starting with the exact definition of "trivial lower bounds"). The objective function now is $\sum_i\max(H_i,\max_j\frac {a_{ij}}{w_j})$ and we can use the same trick to estimate it from below by $\sum_{i}[c_iH_i+\sum_jb_{ij}\frac {a_{ij}}{w_j}]$ where now $c_i,b_{ij}\ge 0$ and $c_i+\sum_j b_{ij}=1$. If we had no width restrictions, the story would be almost exactly as before: we would just add terms with the same $j$, use the relation between $c$ and $b$ and get $$ \sum_i H_i+\sum_j \frac{S_j}{w_j}-\sum_i H_i\sum_j b_{ij} $$ with $S_j=\sum_i a_{ij}b_{ij}$ as before. The minimum over $w_j$ is again attained when they are proportional to $\sqrt{S_j}$, so the functional to maximize is $$ \left[\sum_j\sqrt{S_j}\right]^2-\sum_i H_i\sum_{j}b_{ij}=\sigma^2-\sum_i H_i\sum_{j}b_{ij} $$ We can consider one row and take the derivatives, as before, and see that two cases are possible: either we have $\sum_{j}b_{ij}<1$, in which case we have the equations $\frac \sigma{\sqrt{S_j}}=H_i$ for all $j$ with $b_{ij}>0$ and the corresponding inequalities for $b_{ij}=0$, or we have the inequalities everywhere but the constraint $\sum_j b_{ij}=1$ instead. Both cases result in a one-parametric family of vectors to consider and we just should check which constraint is stronger. Note also that we do not need to get an exact maximizer in the row at each step. It is enough to move in the direction of the maximizer and not to overshoot. So, in effect we can view $\sigma$ as a constant when recalculating $b_{ij}$ (the non-overshooting property requires a proof, of course). That's what I'm using in the code though, of course, it is still a story about finding the root of a monotone function of one variable. Since we'll not get a final answer in one step, we'd better not to waste two much time on finding that root with high precision.

The tricky part is to incorporate the width restrictions. Of course, I can formally write $\min_w$ with the restricted domain but then I'll not be able to compute it easily and all my nice formulae and the speech about admissible $b_{ij}$ forming a one-parameter family will go down the drain. So we need to be a bit inventive here. Note that we can add any sum $\sum_j\tau_j(\frac{W_j}{w_j}-1)$ with non-negative $\tau_j$ to our lower bound because this quantity is never positive for $w_j\ge W_j$. This will allow us to bring $\tau$'s and $b$'s together and to redefine $S_j$ as $\tau_jW_j+\sum_{i}a_{ij}b_{ij}$, so that we would have the expression $$ \left[\sum_j\sqrt{S_j}\right]^2-\sum_i H_i\sum_{j}b_{ij}-\sum_j\tau_j $$ to maximize. Again, it is quite a story about why the minimax is the same as the maximin here, but it is at least clear that any such expression can serve as a lower bound for the original problem. Note that $\tau$ enters it in exactly the same way as each row of $b$ and the only difference is that we do not have the restriction that the sum of $\tau_j$ is bounded by $1$ (in fact, those numbers can be as large as they wish), so updating $\tau$'s can be done in pretty much the same way as updating $b$'s.

There is one important catch in this new setup however. Note that we may have the situation when all $b$'s and $\tau$'s are $0$, in which case $w_j$ cannot be determined as "proportional to $\sqrt{S_j}$" because anything is proportional to a string of zeroes. It really happens if (and only if) the constant height restrictions are the strongest constraint, so all weight goes on them. In this case we have no real competition between $w_j$, just the restriction that they shouldn't force the height of any cell to be above the corresponding $H_i$, so we can just initially put $w_j=\max_i \frac{a_{ij}}{H_i}$. The sum will be automatically not greater than $1$ and we can then just scale it to $1$ by enlarging each $w_j$.

The code is below (again in Asymptote, and again not combed, but, apparently, working). Feel free to edit and rewrite it in C#, etc. if you are still interested in how it all works :-). The next question to discuss is the convergence rate. With this simple iteration scheme, it is not good at all (something like $1$ over the number of iterations). I was curious for a while whether one could invent something that would facilitate finding the "exact" (technically machine precision) solutions for reasonable size matrices and after experimenting with few ideas I found something that works at least up to size 50 by 50 on my old laptop though, to be honest, I do not quite understand why exactly it works (however, as before, it outputs both the answer and the certificate of optimality, so technically it doesn't matter how it finds them; the result is completely verifiable when it is achieved).

srand(seconds());

int m=50, n=75, K=201, cc=20;

real[] H,P;
for(int i=0;i<m;++i) H[i]=n*unitrand();
for(int j=0;j<n;++j) P[j]=unitrand();
P*=unitrand()/sum(P);

real[][] a, b;
for(int i=0;i<m;++i)
{
a[i]=new real[]; b[i]=new real[];
if(i<m) {for(int j=0; j<n; ++j) {a[i][j]=unitrand(); b[i][j]=1/n;}}
//a[i][rand()%n]=2;
a[i]*=unitrand();
}

real[] p,t,S;
for(int j=0;j<n;++j) t[j]=0;

for(int k=0;k<K;++k)
{
for(int j=0;j<n;++j)
{
real s=P[j]*t[j]; for(int i=0;i<m;++i) s+=a[i][j]*b[i][j]; S[j]=s;
}


for(int j=0;j<n;++j)
{
p[j]=P[j]; for(int i=0;i<m;++i) p[j]=max(p[j],a[i][j]/(H[i]+1/10^10));
}
if(sum(p)<1) p/=sum(p);
else {p=map(sqrt,S); p/=sum(p);}





if(k%cc==0)
{
write("*** Iteration "+string(k)+" ***");
real s=sum(map(sqrt,S))^2-sum(t)+sum(H);
for(int i=0;i<m;++i) s-=H[i]*sum(b[i]);
write(s);
}

for(int kk=0;kk<20;++kk)
{
p=max(p,P);
p/=sum(p);
}
real s=0; 
for(int i=0;i<m;++i)
{
real M=H[i]; 
for(int j=0;j<n;++j) {real h=a[i][j]/p[j]; if(h>M) M=h;}
s+=M;
}
if(k%cc==0) write(s);
//pause();

real SRS=sum(map(sqrt,S));
for(int kk=0;kk<5;++kk)
{
real[] V,T;
for(int j=0;j<n;++j) {V[j]=S[j]-t[j]*P[j]; t[j]=(P[j]>0?max(SRS^2*P[j]-V[j]/P[j],0):0); S[j]=V[j]+t[j]*P[j];}
SRS=sum(map(sqrt,S));
}

for(int i=0;i<m;++i)
{
real[] A,V,C,B;
for(int j=0;j<n;++j) {A[j]=a[i][j]; V[j]=S[j]-a[i][j]*b[i][j]; C[j]=V[j]/a[i][j];}
if(H[i]>0) 
{
for(int j=0;j<n;++j) B[j]=max(SRS^2/H[i]^2*A[j]-C[j],0);
}
if(H[i]==0 || sum(B)>1)
{
real aa=(sum(C)+1)/sum(A);
real da=1;
while(da>1/10^10)
{
for(int j=0;j<n;++j) {B[j]=aa*A[j]-C[j]; if(B[j]<0) {A[j]=0;B[j]=0;}}
da=sum(B)-1; aa-=da/sum(A); 
}
}
for(int j=0;j<n;++j) {b[i][j]=B[j]; S[j]=V[j]+a[i][j]*B[j];}
SRS=sum(map(sqrt,S));
}


}

write("************");

write(t,P,p);

pause();

Here is just the code that is (presumably) finding the exact answer for not too large matrices. All explanations will come later. Again it is in Asymptote. Why not in C? I know C++ but having a slow computer with an interpreted rather than compiled language allows you to watch the program as it goes (if you bother to output the work protocol to the screen, of course) and see many nuances that otherwise would be easily missed. I found at least 6 logical bugs in the original version this way (I hope the remaining number is less). The progress is not linear because the program tries to locate what Matt calls "king cells" in the matrix and the "vert=" line is a better mark of progress than the difference between the upper and the lower bounds. The final array of widths is $P$ and the program terminates when the relative error is $10^{-15}$ (be careful here: due to the rounding errors it may continue to run beyond that point, but once you see "no conflict" and "nothing to do" repeating again and again, the job is actually done and you are watching the geometric convergence game end (which can be also played differently, but who cares?).

srand(seconds());

int m=30, n=17, K=100001, k, SK=10, Count=0, proccnt=0, Failtime=1000000, I=0,J=0, cycletime=0; 
real M=0, Mdel=0, TTT=0, B;
int time=0, failtime=0, successtime=0; 
int tt=1, ttt=1, blcount=0, ulttol=3;


int start=seconds();

int[][] blacklist;
for(int i=0;i<m;++i) blacklist[i]=array(n,1);

real[][] a, aaa,  b , bb, bbb, db, dbb, bres;
real[] AA;

/// Initiating the matrix ////////////

real delta=0.0001;

for(int i=0;i<m;++i)
{
real u=unitrand(), v=unitrand();
a[i]=new real[]; b[i]=new real[];
for(int j=0; j<n; ++j) 
{
a[i][j]=1+delta*sin((i+j^2)); 
b[i][j]=1/n;
}
//a[rand()%(i+1)][rand()%n]=2;
//a[i]*=unitrand();
}

////////////////////////////////////////////


aaa=copy(a); bres=copy(b);
real kill=1/1000;



int [][] temp; bool[] conf=array(n,true); 
bool fast=true, notreset=true, confl=true;

for(int i=0;i<m;++i) temp[i]=array(n,0);

int[] Ind; for(int i=0;i<m;++i) Ind[i]=i;

real Norm(real[][] b)
{
real[] S;
for(int j=0;j<n;++j)
{
real s=0; for(int i=0;i<m;++i) s+=a[i][j]*b[i][j]; S[j]=s;
}
return sum(map(sqrt,S))^2;
}


void shuffle()
{
for(int kk=0;kk<m;++kk) {int a=rand()%m, b=rand()%m; int II=Ind[a]; Ind[a]=Ind[b]; Ind[b]=II;}
}

bool[] conflict(real[][] b)
{
bool[] conf=array(n,false);

int count=0; 

for(int i=0;i<m;++i) 
{
if(min(b[i])<0) {write("karaul"); pause();}
b[i]=max(b[i],array(n,0));
count+=sum(map(sgn,b[i]));
}
int[] pres=array(m,1);
int[][] sb;
for(int i=0;i<m;++i) {sb[i]=map(sgn,b[i]); sb[i][n]=1;}


for(int I=1;I<m;++I)
for(int i=0; i<I; ++i)
{
if(pres[i]>0 && sum(sb[i]*sb[I])>sb[i][n]*sb[I][n]) {pres[i]=0; sb[I]=sb[i]+sb[I];}
}

int vert,edgecnt,Vert=0,Edgecnt=0; int comp=sum(map(sgn,pres));
for(int i=0;i<m;++i) 
{
if(pres[i]>0) 
{
vert=sum(sb[i])-sb[i][n];
Vert+=vert;
edgecnt=-sb[i][n];
for(int j=0;j<n;++j) edgecnt+=max(2*sb[i][j]-1,0); 
Edgecnt+=edgecnt;
if(edgecnt>vert-1) for(int j=0;j<n;++j) {if(sb[i][j]>0) conf[j]=true;}
}
}
int alive=0; for(int i=0;i<m;++i) for(int j=0;j<n;++j)
if(conf[j] && b[i][j]>0 && blacklist[i][j]<=ttt) ++alive;
write("vert="+string(Vert)+"("+string(alive)+") edgecnt="+string(Edgecnt)+" comp="+ string(comp));
return conf;
}





real[] p, P, S;

for(k=0;k<K;++k)
{

void procedure()
{
for(int j=0;j<n;++j)
{
real s=0; for(int i=0;i<m;++i) s+=aaa[i][j]*b[i][j]; S[j]=s;
}
for(int i:Ind)
{
real aa;
real[] A,V,C,B;
for(int j=0;j<n;++j) {A[j]=aaa[i][j]; V[j]=S[j]-aaa[i][j]*b[i][j]; C[j]=V[j]/aaa[i][j];}
real aa=(k==0?(sum(C)+1)/sum(A):AA[i]);

int countbound=40;

for(int j=0;j<n;++j) B[j]=max(aa*A[j]-C[j],0);
if(sum(B)>1/2)
{
if(sum(B)<1)
{
real sl=0;
for(int j=0;j<n;++j) sl+=A[j]*sgn(B[j]);
aa+=1.0001*((1-sum(B))/sl); countbound=4;
}
}
else aa=(sum(C)+1)/sum(A);

real da=1;
int count=0;

while(da>0 && count<countbound)
{
++count; 
//write(da,count); //pause();
for(int j=0;j<n;++j) {B[j]=aa*A[j]-C[j]; if(B[j]<0) {B[j]=0; A[j]=0; C[j]=0;}}
if(sum(A)>0) {da=sum(B)-1; aa-=da/sum(A);}
else {write("alert"); pause(); for(int j=0;j<n;++j) {if(b[i][j]>0) A[j]=aaa[i][j];} aa=(sum(C)+1)/sum(A); } 
//write(sum(B),aa,da,sum(A),sum(C));
}
for(int j=0;j<n;++j) {b[i][j]=B[j]; S[j]=V[j]+aaa[i][j]*B[j];}
Count+=count; 

if(abs(sum(b[i])-1)>0.1) {write("rough!"); pause();}
AA[i]=aa; b[i]/=sum(b[i]);
}
++proccnt;
}

bool check()
{
bool check=false;
for(int i=0;i<m && !check;++i) for(int j=0;j<n;++j) check=check || (bres[i][j]>0 && b[i][j]==0);
return check;
}




void fix()
{
for(int i=0;i<m;++i) for(int j=0;j<n;++j) 
{
if(b[i][j]==0 && conf[j]) aaa[i][j]=a[i][j]*kill;
//if(b[i][j]==0) blacklist[i][j]=1;
}
}


void advance(bool adv=true)
{
for(int kk=0;kk<(adv?ttt:tt)*SK;++kk) procedure(); bres=copy(b); if(adv) {write("advancing with speed "+string(TTT)); fix();}
}


void reset(bool hard=true)
{
if(!confl) write("nothing to do"); else write("RESETTING "+(hard?"HARD":"SOFT")); 
fast=true; if(hard) blcount=0;   
//shuffle();
aaa=copy(a); for(int kk=0;kk<(confl && hard?ttt:1)*SK;++kk) procedure(); 
if(confl && hard) ttt*=2;  
fix(); 
}

real minb=1, shift=0;

TTT=1;

while (TTT>1/3) 
{ 
TTT=0;
//bbb=copy(b); advance(false); 
bb=copy(b); advance(false); bres=copy(b);

for(int i=0;i<m;++i) 
{
db[i]=b[i]-bb[i]; 
//dbb[i]=bb[i]-bbb[i]; 
shift=max(shift,max(map(abs,db[i]))); temp[i]=array(n,0);
}

for(int i=0;i<m;++i) for(int j=0;j<n;++j)
{
if(b[i][j]>0 && db[i][j]<0 && bb[i][j]>0) 
{
real u=-db[i][j]/b[i][j];
//v=-dbb[i][j]/bb[i][j]; 
if(u>TTT && u>0 && aaa[i][j]>a[i][j]/2 && blacklist[i][j]<=ttt) {TTT=u; I=i; J=j; minb=min(minb,b[i][j]);}
}
}
tt=(confl?blacklist[I][J]:1);
if(TTT>1/3) advance(); 
else if(TTT==0 || blcount>ulttol) reset();
else write('\n \naccelerating from speed '+string(TTT)+
"; position=("+string(I)+","+string(J)+"); cycle count= "+string(2*tt*SK)); 
}

time=seconds()-start; if(time>Failtime) {write('\n\nTotal failure'); pause(); Failtime*=2;} 

write("time= "+string(time)+", cycling "+string(cycletime)+
" seconds, failures =  "+string(failtime)+ ", successes= "+string(successtime));

write("count="+string(Count/m/proccnt)); 

conf=conflict(b);

for(int j=0;j<n;++j)
{
real s=0; for(int i=0;i<m;++i) s+=aaa[i][j]*b[i][j]; S[j]=s; p[j]=sqrt(s);  
}

p/=sum(p); 
if(k==0) P=copy(p); 
write(Mdel); 

{
real s=0, sss=0; 
for(int i=0;i<m;++i)
{
real M=0; 
for(int j=0;j<n;++j) {real h=a[i][j]/p[j]; if(h>M) M=h;}
sss+=M;
}


for(int i=0;i<m;++i)
{
real M=0; 
for(int j=0;j<n;++j) {real h=a[i][j]/P[j]; if(h>M) M=h;}
s+=M;
}

if(sss<s) P=copy(p); 
write(s,s-Mdel); 
if(s-Mdel<1/10^15*s) {write("******it took "+string(seconds()-start)+" seconds******");
pause();}
}

confl=false; for(int j=0;j<n;++j) confl=confl || conf[j]; 
if(!confl) {write("no conflict"); reset();} else fix();

if(fast)
{
for(int i=0;i<m;++i) for(int j=0;j<n;++j)
{
if(conf[j] && b[i][j]>0 && bb[i][j]>0) 
{
real u=-db[i][j]/b[i][j]; 
//v=-dbb[i][j]/bb[i][j]; 
if(u>TTT/10 && aaa[i][j]>a[i][j]/2 && blacklist[i][j]<=ttt) temp[i][j]=1;
}
}
}

if(confl) temp[I][J]=1;

void loop()
{
bres=copy(b); Mdel=Norm(b); B=b[I][J]; if(B==0) B=1;

int cyclestart=seconds();

for(int i=0; i<m;++i) for(int j=0; j<n; ++j) if(temp[i][j]>0) aaa[i][j]=a[i][j]*kill;

for(int kk=0;kk<tt*SK;++kk) procedure(); 

if(b[I][J]>0 && confl) {write("Too weak killing!"); pause(); kill/=10;}

for(int i=0; i<m ;++i) for(int j=0; j<n; ++j) if(temp[i][j]>0) aaa[i][j]=a[i][j];

for(int kk=0;kk<tt*SK;++kk) procedure();

cycletime+=seconds()-cyclestart+1;

M=Norm(b); 
}

loop(); real rat=b[I][J]/B;

while (rat>0 && rat<0.9 && M>Mdel) {write("Repeating..."); loop(); rat=b[I][J]/B;}

if(confl && rat>0 && M>Mdel) {write("BLACKLISTING!"); blacklist[I][J]=2*ttt; ++blcount; if(blcount>0) reset((blcount>4?true:false));} 


int bl=0; for (int i=0;i<m;++i) 
bl+=sum(map(sgn,max(blacklist[i]-array(n,ttt),array(n,0)))); 
write(string(bl)+"  vertices blacklisted");


if(M>Mdel) 
{
if(rat==0) {fast=true; blcount=0;}
if(confl) write("Success!"+(b[I][J]==0?" Vertex is gone": "Vertex stays with ratio "+string(b[I][J]/B)+
" and abs value "+string(b[I][J])));
if(!check()) tt*=2; 
Mdel=M; successtime+=2*tt*SK; notreset=true;} 
else 
{
b=copy(bres); fast=false; failtime+=2*tt*SK;
blacklist[I][J]=2*tt;
if(confl) write("Failure! "+string(Mdel-M)+" short...");   
if (tt<ttt) tt*=2; else 
if (TTT>0 && confl) 
{
write("BLACKLISTING!"); blacklist[I][J]=2*ttt; ++blcount; if(blcount>0) reset((blcount>ulttol?true:false));
//write(tt,ttt); pause(); 
} 
else reset(); 
//else {tt*=2;}
}


}