Dear nat54321,
In a collection of strings, there may be several groups of anagrams,
but a given string may belong to no more than one such group. For the
strings belonging to an anagram group, it is the case that each one
has the same number of characters of each type. Even if a string does
not have the same character composition as any another string in the
whole collection, it belongs to an anagram group of which it is the
only member. However, I assume that you are not interested in finding
these singular anagram groups.
An easy way to identify anagram groups is to take the first string,
then compare it to every string occurring later in the array. Each
string that has the same character composition as the first string
belongs to an anagram group together with the first string.
Furthermore, each of those strings is barred from future
consideration. We now take the second string, unless it has been
barred, and compare it to all later strings in the array. Then we move
on to the third string, and so on. This is called an O(n^2) algorithm
because if there are n strings in the array, it takes on the order of
n^2 string comparisons to find all the anagram groups.
The following is a pseudocode description of this algorithm, which
finds and displays all non-singular anagram groups.
ANAGRAMS (input: collection of strings STRS):
loop L:
let S be a string in STRS
let ANAS be an string collection containing only S
remove S from STRS
for each string T in STRS:
let T' be a copy of T
for each character C in S:
if T contains C:
remove C from T
else:
break
if T' is empty:
add T to ANAS
remove T from STRS
if ANAS contains more than one string:
print ANAS
if STRS is empty:
return
repeat loop L
In C, I would implement the algorithm as follows.
/* begin anagrams.c */
#include<stdlib.h>
#include<stdio.h>
#include<string.h>
void anagrams(char **strs, int num) {
int *used = (int *) calloc(num, sizeof(int)), i, j, k, l, len;
int *anas = (int *) calloc(num, sizeof(int)), ana_num;
for (i = 0; i < num; i++) {
len = strlen(strs[i]);
ana_num = 0;
for (j = i+1; j < num; j++) {
if (strlen(strs[j]) != len || used[j])
continue;
for (k = 0; k < len; k++) {
for (l = 0; l < len; l++)
if (strs[j][l] == strs[i][k])
break;
if (l == len)
break;
}
if (k == len)
anas[ana_num++] = j;
}
if (ana_num > 0) {
printf("anagrams: %s", strs[i]);
for (k = 0; k < ana_num; k++) {
printf(" %s", strs[anas[k]]);
used[anas[k]] = 1;
}
printf("\n");
}
}
free(used);
free(anas);
}
int main() {
char *strs[] = {"eat", "robe", "tea", "peels", "none", "ate",
"sleep", "bore", "one"};
anagrams(strs, 8);
return 0;
}
/* end anagrams.c */
If I compile and execute this program, the output is as follows.
anagrams: eat tea ate
anagrams: robe bore
anagrams: peels sleep
And that is the correct result.
If you feel that my answer is incomplete or inaccurate in any way, please
post a clarification request so that I have a chance to meet your needs
before you assign a rating.
Regards,
leapinglizard |