Q: profanity/obscenity word and sentence DETECTOR ( No Answer,   3 Comments )
Question  
Subject: profanity/obscenity word and sentence DETECTOR
Category: Computers > Algorithms
Asked by: spaarky-ga
List Price: $9.50
Posted: 22 Nov 2005 05:23 PST
Expires: 22 Dec 2005 05:23 PST
Question ID: 596171
I am looking for a profanity detector program, ideally open source and
written in PHP.

Ideally it should go beyond just identifying banned words and look
for possibly offensive sentences, things like 'you are a load of
rubbish'. I accept this isn't the best example, but it kind of makes
the point.

As mentioned, it should be a detector, not a filter. Basically it
should be able to be fed a chunk of text (1-5 sentences, for example)
and give a score on how likely the text is to contain bad words or
malice. It doesn't have to be perfect; false positives are fine, but
it should have minimal false negatives. (as always ;-)

If it helps, I can feed it a large number of text snippets that are
known to be clean. From these it could possibly glean the types of
sentences that are acceptable.

(as a small bonus it would be good if it could detect marketing speak ;-)

It would only need to work on English (UK specifically). The text will
be largely generic descriptive text, but will likely contain jargon
and/or place names.

Any ideas for such a framework? Suggestions are also welcome for
frameworks that could accomplish this without too much effort, even if
they are not really designed for it.
Answer  
There is no answer at this time.

Comments  
Subject: Re: profanity/obscenity word and sentence DETECTOR
From: dmrmv-ga on 22 Nov 2005 09:39 PST
 
DansGuardian (dansguardian.org) is an open source web filtering
program licensed under GPL and free for non-commercial use. You can
choose to filter on weighted phrase lists where words or phrases are
assigned weighted values and the sum of the values determines whether
the page passes the filter. Pages that fail can be logged and the
triggering words or phrases and their sum are also logged. It comes
with sample phrase lists but your own can be added easily. This is
basically what you are describing, except you probably want to examine
a text file directly.

You could use the algorithm from the DG source code under GPL to write
your own implementation, or you could set up DG normally with your
phrase lists and request your text files from a web server. This will
pass them through the filter and you can log files that fail because
your phrase list has been matched.
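
The weighted phrase list idea described above can be sketched in a few
lines. This is only an illustration of the general technique, not
DansGuardian's actual code or list format; the phrases, the weights,
the threshold of 50 and the function names are all made-up
assumptions. It is written in Python for brevity and would port to PHP
fairly directly.

```python
import re

# Hypothetical phrase list: phrase -> weight. Values are illustrative only.
# A negative weight lets a reassuring phrase pull the score down.
PHRASE_WEIGHTS = {
    "rubbish": 20,
    "load of rubbish": 40,
    "idiot": 50,
    "well done": -30,
}


def score_text(text, weights=PHRASE_WEIGHTS):
    """Return (total, matches); matches lists (phrase, weight, count) hits."""
    # Lowercase and collapse punctuation so phrases match on whole words.
    normalized = " ".join(re.findall(r"[a-z0-9']+", text.lower()))
    total = 0
    matches = []
    for phrase, weight in weights.items():
        # Lookarounds make sure the phrase is not part of a longer word.
        pattern = r"(?<!\S)" + re.escape(phrase) + r"(?!\S)"
        count = len(re.findall(pattern, normalized))
        if count:
            total += weight * count
            matches.append((phrase, weight, count))
    return total, matches


def is_flagged(text, threshold=50, weights=PHRASE_WEIGHTS):
    """A detector, not a filter: report whether the summed score is too high."""
    total, _ = score_text(text, weights)
    return total >= threshold
```

Because `score_text` also returns the matching phrases, the triggering
phrases and their sum can be logged in the same spirit as the logging
described above.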
Subject: Re: profanity/obscenity word and sentence DETECTOR
From: spaarky-ga on 24 Nov 2005 06:01 PST
 
Thanks.

That's the sort of thing I'm looking for, although I was hoping for
something more freeform that tries to analyse the structure of
sentences. But I guess that's not an easy task with the English
language.

I could probably use that quite well, if nothing else the wordlists
they have generated.
Subject: Re: profanity/obscenity word and sentence DETECTOR
From: bozo99-ga on 23 Dec 2005 18:57 PST
 
Analysis of sentences tends to involve Markov chains.
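
One way to read the Markov chain suggestion is a first-order (bigram)
word model trained only on text known to be clean, which then scores
how familiar a new snippet's word sequence looks. The sketch below is
an assumption about how that might be done, not an established
profanity tool; the class name, tokenizer and smoothing are all
illustrative choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase words wrapped in start and end markers."""
    return ["<s>"] + re.findall(r"[a-z']+", text.lower()) + ["</s>"]


class BigramModel:
    """First-order Markov chain over words, trained on known-clean text."""

    def __init__(self):
        self.contexts = Counter()  # how often each word starts a bigram
        self.bigrams = Counter()   # how often each (previous, next) pair occurs

    def train(self, text):
        tokens = tokenize(text)
        self.contexts.update(tokens[:-1])
        self.bigrams.update(zip(tokens, tokens[1:]))

    def avg_log_prob(self, text):
        """Average log-probability per transition; lower means less familiar."""
        tokens = tokenize(text)
        pairs = list(zip(tokens, tokens[1:]))
        # Approximate vocabulary size, plus one slot for unseen words.
        vocab_size = len(self.contexts) + 1
        total = 0.0
        for prev, word in pairs:
            # Add-one smoothing so unseen transitions get a small probability.
            seen = self.bigrams[(prev, word)] + 1
            total += math.log(seen / (self.contexts[prev] + vocab_size))
        return total / len(pairs)
```

A low score only says that the snippet is unlike the training text, so
on its own this flags unusual wording rather than abuse as such.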

Detection of abuse could be attempted using a Bayesian assessment tool
trained on sets of acceptable and abusive messages.  There are many
such tools available, intended as spam filters.
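
As a rough illustration of the Bayesian approach, here is a minimal
naive Bayes sketch in Python. It is not taken from any particular spam
filter; the class name, the "clean" and "abusive" labels and the
add-one smoothing are assumptions made for the example, and a real
tool would need far more training data than a handful of messages.

```python
import math
import re
from collections import Counter

LABELS = ("clean", "abusive")


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesDetector:
    """Tiny naive Bayes scorer trained on acceptable and abusive messages."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def abuse_probability(self, text):
        """Return the estimated probability (0..1) that text is abusive."""
        if any(self.doc_counts[label] == 0 for label in LABELS):
            raise ValueError("train on at least one message of each kind first")
        vocab = set(self.word_counts["clean"]) | set(self.word_counts["abusive"])
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        log_scores = {}
        for label in LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Start from the prior: the share of training messages with this label.
            log_p = math.log(self.doc_counts[label] / total_docs)
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                log_p += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = log_p
        # Normalise the two log scores into a probability.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights["abusive"] / sum(weights.values())
```

Since false positives are acceptable but false negatives are not, the
cut-off applied to the returned probability could be set well below
0.5.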
