Please ask for clarification using the "request clarification" button
if you have any questions about the answer.
Let me begin by rephrasing your question a bit.
We are given as input positive integers n, n_a and n_b such that
n_a+n_b= n.
I've just changed your "m" to "n_a" since it is easier for me to
remember what it means that way.
For each k, let S_k denote the string ab^ka , that is, 'a' followed k
'b's followed by a.
Let R be the set of words using the characters 'a' and 'b' of length n
that contain exactly n_a 'a' characters and n_b 'b' characters .
The question is, to compute, for each k,
(1) The total number of occurrences of the substring S_k in any
word in the set R
(2) The ratio of the value in (1) to the total number of
occurrences of any substring S_r in any word in the set R .
Let us begin by computing 1.
First, a basic fact of combinatorics is that the number of ways of
choosing a set of k elements from a larger set of n elements is n
choose k , usually written
n
( )
k
where the parentheses are supposed to contain both values but are hard
to draw.
I will write this value as choose(n,k) .
choose(n,k) is equal to (n!/(k!)(n-k)!) . Here m! , m factorial, is
1*2*3*...*(m-1)*m .
First of all, the number of words in R is just choose(n,n_a).
How many words of R can be formed with the string S_k occurring, say,
at the beginning of the word?
Well, if n_a is <2 or n_b < (k-2) then obviously the answer is 0.
Otherwise, we start the word with
S_k....
and we have left over to form the remainder of the word n_a-2 a
characters, n_b-(k-2) b characters, and n-k total characters.
Hence, the total number of words that can be formed from the remainder
is:
choose (n-k, n_a-2)
Now, the same reasoning holds no matter what place we put S_k. There
are n-k+1 possible positions in the string where S_k can begin.
Hence, the total number of possibilities for placing S_k, that is the
value (1), is:
((n-k+1)*choose(n-k,n_a-2)
unless n_a<2 or n_b<(k-2), in which case the answer is 0 .
For part (2) , we want an analytical expression for the sum of all
occurrences of such words S_k.
Consider any word of length n . A word S_k, for some k, begins at each
position in the word except for positions beginning with the letter
'b' or the last 'a'. Since there are n positions in the word, there
are
n-(n_b+1)=na-1
words in some S_k that occur in the word. Since there are
choose(n,n_a) such words, the total number of S_k occurrences in R is:
choose(n,n_a)*(n_a-1)
The desired frequency for substrings of length k is the quotient of
the values in (1) and (2) above, namely:
((n-k+1)*choose(n-k,n_a-2))/(choose(n,n_a)*(n_a-1))
To check this, I implemented the formula in a small Java program. I
ran it with the example you gave, and discovered that I believe the
number of "aa" occurrences and the number of "abba" occurrences was
miscounted (that's why we have computers, after all).
Here is the Java program, followed by some sample runs:
public class Prob{
public static long factorial(long i){
if (i<=1) return 1;
return i*factorial(--i);
}
public static long choose(long n, long k){
if (k>n) return 0;
if (n==0) return 0;
return factorial(n)/(factorial(k)*factorial(n-k));
}
public static long count(long substringlen, long nas, long nbs){
if (nas<2||nbs<(substringlen-2))return 0;
long stringlen=nas+nbs;
return (stringlen-substringlen+1)*choose(stringlen-substringlen,nas-2);
}
public static long counttotal(long nas, long nbs){
long stringlen=nas+nbs;
return choose(stringlen,nas)*(nas-1);
}
public static void main(String[] args){
int stringlength=Integer.parseInt(args[0]);
int nas=Integer.parseInt(args[1]);
int nbs=stringlength-nas;
long counttotal=counttotal(nas,nbs);
long sum=0; //check that the sums agree
for (int substringlen=2;substringlen<=stringlength;++substringlen){
String result="a";
for (int i=0;i<substringlen-2;i++)
result+="b";
result+="a";
long count = count(substringlen,nas,nbs);
System.out.println(result+": "+count+"/"+counttotal);
sum+=count;
}
System.out.println("\nCounted "+sum+" strings; expected:
"+counttotal);
}
}
java Prob 6 3
aa: 20/40
aba: 12/40
abba: 6/40
abbba: 2/40
abbbba: 0/40
java Prob 10 3
aa: 72/240
aba: 56/240
abba: 42/240
abbba: 30/240
abbbba: 20/240
abbbbba: 12/240
abbbbbba: 6/240
abbbbbbba: 2/240
abbbbbbbba: 0/240
Counted 240 strings; expected: 240
SEARCH STRATEGY
---------------
An internet search on substrings and substring occurrences was
unsuccessful.
At first I thought an analytical solution would not be possible. Then
I realized the counting argument above yields the solution. |