Subject: profanity/obscenity word and sentence DETECTOR
Category: Computers > Algorithms
Asked by: spaarky-ga
List Price: $9.50
Posted: 22 Nov 2005 05:23 PST
Expires: 22 Dec 2005 05:23 PST
Question ID: 596171
I am looking for a profanity detector program, ideally open source and written in PHP. Ideally it should go beyond just identifying banned words and also look for offensive sentences, things like 'you are a load of rubbish'. I accept this isn't the best example, but it kinda makes the point. As mentioned, it should be a detector, not a filter: it should be able to be fed a chunk of text (1-5 sentences, for example) and give a score for how likely the text is to contain bad words or malice. It doesn't have to be perfect; false positives are fine, but it should have minimal false negatives (as always ;-). If it helps, I can feed it a large number of text snippets that are known to be clean. From these it could possibly glean the types of sentences that are acceptable. (As a small bonus, it would be good if it could detect marketing speak ;-) It would only need to work on English (UK specifically). The text will be largely generic descriptive text, but will likely contain jargon and/or place names. Any ideas for such a framework? Suggestions are also welcome for frameworks that could accomplish this without too much effort, even if they are not really designed for it.
There is no answer at this time.
Subject: Re: profanity/obscenity word and sentence DETECTOR
From: dmrmv-ga on 22 Nov 2005 09:39 PST
DansGuardian (dansguardian.org) is an open source web filtering program licensed under the GPL and free for non-commercial use. You can choose to filter on weighted phrase lists, where words or phrases are assigned weighted values and the sum of the values determines whether the page passes the filter. Pages that fail can be logged, and the triggering words or phrases and their sum are also logged. It comes with sample phrase lists, but your own can be added easily. This is basically what you are describing, except that you probably want to examine a text file directly. You could use the algorithm from the DG source code under the GPL to write your own implementation, or you could set up DG normally with your phrase lists and request your text files from a web server. This will pass them through the filter, and you can log the files that fail because your phrase list has been matched.
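To make the weighted phrase idea concrete, here is a minimal sketch in Python (the question asked for PHP; Python is used only for illustration, and the logic ports directly). It is not DansGuardian's code or its list format. The phrases, weights, threshold and the function names `normalise`, `score_text` and `is_flagged` are all invented for this example.

```python
import re

# Hypothetical phrase weights, invented for illustration only.
# Positive values push the score up; negative values mark reassuring phrases.
PHRASE_WEIGHTS = {
    "rubbish": 20,
    "load of rubbish": 40,
    "idiot": 50,
    "thank you": -10,
}


def normalise(text):
    """Lowercase the text and reduce it to letter/digit tokens joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def score_text(text, weights=PHRASE_WEIGHTS):
    """Return (total, hits): the summed weight and a list of (phrase, count, weight)."""
    cleaned = normalise(text)
    total = 0
    hits = []
    for phrase, weight in weights.items():
        # Match the phrase only when it is bounded by whitespace or the string ends,
        # so "rubbish" does not match inside a longer token.
        pattern = r"(?<!\S)" + re.escape(phrase) + r"(?!\S)"
        count = len(re.findall(pattern, cleaned))
        if count:
            total += count * weight
            hits.append((phrase, count, weight))
    return total, hits


def is_flagged(text, threshold=50, weights=PHRASE_WEIGHTS):
    """Flag the text when its summed weight reaches the (assumed) threshold."""
    total, _ = score_text(text, weights)
    return total >= threshold
```

Calling `score_text("You are a load of rubbish!")` returns the total plus the list of triggering phrases, which is the information that would be logged. In this sketch overlapping phrases ("rubbish" and "load of rubbish") both count; whether that is desirable is a design choice. Since false positives are acceptable but false negatives are not, the threshold would be set low.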
Subject: Re: profanity/obscenity word and sentence DETECTOR
From: spaarky-ga on 24 Nov 2005 06:01 PST
Thanks. That's the sort of thing I'm looking for, although I was hoping for something more freeform that tries to analyse the structure of sentences. But I guess that's not an easy task with the English language. I could probably use that quite well, if nothing else the word lists they have generated.
Subject: Re: profanity/obscenity word and sentence DETECTOR
From: bozo99-ga on 23 Dec 2005 18:57 PST
Analysis of sentences tends to involve Markov chains. Detection of abuse could be attempted using a Bayesian assessment tool trained on sets of acceptable and abusive messages. Many such tools are available, mostly intended as spam filters.
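As a rough illustration of the Bayesian approach, the sketch below is a small two-class naive Bayes scorer in Python with add-one smoothing. It is an assumption about how such a tool might work, not the code of any particular spam filter. The class name `NaiveBayesDetector`, its methods and the training sentences in the usage note are all made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesDetector:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    LABELS = ("clean", "abusive")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Add one labelled example; label must be 'clean' or 'abusive'."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        vocab = set(self.word_counts["clean"]) | set(self.word_counts["abusive"])
        # The extra +1 reserves probability mass for words never seen in training.
        denom = sum(counts.values()) + len(vocab) + 1
        # Smoothed class prior, so an untrained class does not cause log(0).
        prior = (self.doc_counts[label] + 1) / (sum(self.doc_counts.values()) + 2)
        score = math.log(prior)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        return score

    def abuse_probability(self, text):
        """Return the estimated P(abusive | text) as a value between 0 and 1."""
        tokens = tokenize(text)
        diff = self._log_score(tokens, "clean") - self._log_score(tokens, "abusive")
        if diff > 700:  # avoid overflow in math.exp for overwhelmingly clean text
            return 0.0
        return 1.0 / (1.0 + math.exp(diff))
```

To use it, call `train(text, "clean")` on the known-clean snippets and `train(text, "abusive")` on a set of abusive examples, then call `abuse_probability(text)` on new text and flag anything above a chosen cut-off. Because false negatives matter more than false positives here, that cut-off would sit well below 0.5. A model trained on single words ignores word order; counting word pairs instead of single words is one way to move towards the Markov chain idea.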