Google Answers Logo
View Question
 
Q: Computer Algorithm Question ( Answered 5 out of 5 stars,   2 Comments )
Question  
Subject: Computer Algorithm Question
Category: Computers > Algorithms
Asked by: nat54321-ga
List Price: $10.00
Posted: 15 Sep 2004 14:32 PDT
Expires: 15 Oct 2004 14:32 PDT
Question ID: 401705
What is alogirthm to find "Given an array of strings, identify all of
the  anagrams in the array."
Answer  
Subject: Re: Computer Algorithm Question
Answered By: leapinglizard-ga on 15 Sep 2004 16:09 PDT
Rated:5 out of 5 stars
 
Dear nat54321,

In a collection of strings, there may be several groups of anagrams,
but a given string may belong to no more than one such group. For the
strings belonging to an anagram group, it is the case that each one
has the same number of characters of each type. Even if a string does
not have the same character composition as any another string in the
whole collection, it belongs to an anagram group of which it is the
only member. However, I assume that you are not interested in finding
these singular anagram groups.

An easy way to identify anagram groups is to take the first string,
then compare it to every string occurring later in the array. Each
string that has the same character composition as the first string
belongs to an anagram group together with the first string.
Furthermore, each of those strings is barred from future
consideration. We now take the second string, unless it has been
barred, and compare it to all later strings in the array. Then we move
on to the third string, and so on. This is called an O(n^2) algorithm
because if there are n strings in the array, it takes on the order of
n^2 string comparisons to find all the anagram groups.

The following is a pseudocode description of this algorithm, which
finds and displays all non-singular anagram groups.

ANAGRAMS (input: collection of strings STRS):
  loop L:
    let S be a string in STRS
    let ANAS be an string collection containing only S
    remove S from STRS
    for each string T in STRS:
      let T' be a copy of T
      for each character C in S:
        if T contains C:
          remove C from T
        else:
          break
      if T' is empty:
        add T to ANAS
        remove T from STRS
    if ANAS contains more than one string:
      print ANAS
    if STRS is empty:
      return
    repeat loop L

In C, I would implement the algorithm as follows.


/* begin anagrams.c */

#include<stdlib.h>
#include<stdio.h>
#include<string.h>

void anagrams(char **strs, int num) {
    int *used = (int *) calloc(num, sizeof(int)), i, j, k, l, len;
    int *anas = (int *) calloc(num, sizeof(int)), ana_num;
    for (i = 0; i < num; i++) {
        len = strlen(strs[i]);
        ana_num = 0;
        for (j = i+1; j < num; j++) {
            if (strlen(strs[j]) != len || used[j])
                continue;
            for (k = 0; k < len; k++) {
                for (l = 0; l < len; l++)
                    if (strs[j][l] == strs[i][k])
                        break;
                if (l == len)
                    break;
            }
            if (k == len)
                anas[ana_num++] = j;
        }
        if (ana_num > 0) {
            printf("anagrams: %s", strs[i]);
            for (k = 0; k < ana_num; k++) {
                printf(" %s", strs[anas[k]]);
                used[anas[k]] = 1;
            }
            printf("\n");
        }
    }
    free(used);
    free(anas);
}

int main() {
    char *strs[] = {"eat", "robe", "tea", "peels", "none", "ate",
"sleep", "bore", "one"};
    anagrams(strs, 8);
    return 0;
}

/* end anagrams.c */


If I compile and execute this program, the output is as follows.

anagrams: eat tea ate
anagrams: robe bore
anagrams: peels sleep

And that is the correct result.

If you feel that my answer is incomplete or inaccurate in any way, please
post a clarification request so that I have a chance to meet your needs
before you assign a rating.

Regards,

leapinglizard
nat54321-ga rated this answer:5 out of 5 stars
good answer. thanks

Comments  
Subject: Re: Computer Algorithm Question
From: gopman-ga on 15 Sep 2004 17:37 PDT
 
An alternate solution (there are lots of ways to do this):

For each word in the list, form a new word by sorting the letters in
that word. Each set of anagrams would have the same
sorted-letter-word.

You could then sort the list of words alphabetically by the sorted-letter-words.

This is somewhat faster than the n-squared solution. It is order N * log N.
Subject: Re: Computer Algorithm Question
From: leapinglizard-ga on 15 Sep 2004 17:58 PDT
 
Good suggestion, gopman. Thank you. If I had thought of it at the
time, I would surely have implemented the sorting approach. I like O(n
log n) much better than O(n^2).

leapinglizard

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy