What is the probability of finding a sorted sequence of length $L$ in an array of $N$ randomly generated numbers?

Solution 1:

I disagree with your analysis. First of all, assuming that you are given $k$ distinct numbers, and you chose all $k$ numbers, one at a time, without replacement, then the probability that the numbers that you chose are in strictly ascending order is $\displaystyle \frac{1}{k!}.$ This is because there are $(k!)$ ways that the selection of numbers may be ordered, only one of which is in strictly ascending order.

I am assuming that all $k$ numbers are distinct.

Then, if you instead draw a subset of $L$ numbers, there are (similarly) $L!$ ways that these numbers might be drawn, only one of which is in strictly ascending order.

Therefore, the probability that the $L$ (distinct) numbers are drawn (without replacement) in strictly ascending order is $\displaystyle \frac{1}{L!}.$


Assuming that (so far) I have interpreted the specific problem as you intend, then it seems as if the question being posed is:

Suppose that $L \leq N \leq k$, where there are $k$ total distinct numbers, of which $N$ numbers will be drawn without replacement. What is the probability that there will be a subsequence of at least $L$ consecutive numbers, within the sequence of $N$ numbers, such that the L consecutive numbers are in strictly ascending order.


If this is the problem to be attacked, then one approach is Inclusion-Exclusion.

That is, for $1 \leq i \leq (N - L + 1)$, let $A_i$ denote the set of all possible sequences of the $N$ numbers such that the subsequence that starts at position $i$, and lasts for $L$ positions is in strictly ascending order.

Given a set $S$ with a finite number of elements, let $|S|$ denote the number of elements in $S$.

Then, the final computation will be $$\frac{|A_1 \cup A_2 \cup \cdots \cup A_{(N-L+1)}|}{N!}. \tag1 $$


Edit
In the above computation, I took the shortcut that it is irrelevant how many elements $k$ are in the superset. Once the $N$ (distinct) items are drawn, without replacement, the problem is totally encapsulated by a consideration of the $(N!)$ ways that these $N$ items may be ordered.


The computation of the numerator in (1) above is standard:

Let $T_1 = |A_1| + |A_2| + \cdots + |A_{(N-L+1)}|.$

Let $\displaystyle T_2 = \sum_{1 \leq i_1 < i_2 \leq (N-L+1)} |A_{i_1} \cap A_{i_2}|.$

That is, $T_2$ represents the sum of $\binom{N - L + 1}{2}$ terms.

Similarly, for $3 \leq r \leq (N-L+1)$,

let $\displaystyle T_r = \sum_{1 \leq i_1 < i_2 < \cdots < i_r \leq (N - L + 1)} ~|A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_r}|.$

That is, $T_r$ represents the sum of $\binom{N - L + 1}{r}$ terms.

Then, the numerator in (1) above equals

$$\sum_{a = 1}^{(N-L+1)} (-1)^{a-1}T_a.$$


Then, it becomes a simple matter of trying for elegant shortcuts.

For example, $A_1$ leaves $(N-L)$ elements unordered, and so $|A_1| = \binom{N}{L} \times (N-L)!.$ That is, there are $\binom{N}{L}$ choices for the first $L$ items in $A_1$, these first $L$ items can only be ordered in $1$ way, and the other $N-L$ items can be ordered in any way.

By symmetry, you have that $|A_1| = |A_2| = \cdots = |A_{(N-L+1)}|.$

Therefore, $T_1 = (N-L+1) \times |A_1|$.

While the computation of $T_2$ admits some shortcuts, it is nowhere as easy as the computation of $T_1$. That is, when computing $|A_{i_1} \cap A_{i_2}| ~: ~(i_1 < i_2)$ you have to be concerned about two things:

  • Is there an intersection? That is, is $(i_1) + (L-1) \geq i_2$.

  • If so, what is the length of the intersection. That is, how much is $(i_1) + (L) - (i_2).$

Unfortunately, these considerations snowball, when similarly attacking the computations of $T_3, T_4, \cdots.$


Generally, besides Inclusion-Exclusion, the two other approaches to this problem that I am familiar with are the direct approach, and recursion. Clearly, Inclusion-Exclusion is no walk in the park. I do think that it is better than the direct approach.

That means that recursion is sort of a wild card, that may yield easier analysis.