Question
Given a set of words tagged for part of speech, I want to find the ones that are obscenities in mainstream English. How might I do this? Should I just build a huge list and check each word against it? Or should I use a regex to capture the variations on each root?
If it makes things easier: I don't need to filter anything out, only to get a count. A few false positives are not the end of the world, as long as the rate is inflated more or less uniformly.
Recommended answer
Use a huge list, and think about the target audience. Is there a third-party service specialising in this that you could use rather than rolling your own?
Some quick things to think about:
- The Scunthorpe problem (and follow the links to "Swear filter" for more)
- British or American English? "fanny", "fag", etc.
- Political correctness: "black" or "Afro-American"?
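
To make the Scunthorpe problem concrete, here is a minimal Python sketch, not a definitive implementation. It compares naive substring matching against whole-token matching on a root plus an optional suffix. The word list `ROOTS`, the suffix set, and both function names are assumptions made up for illustration; a real list would be far larger and would need to be chosen per dialect and audience.

```python
import re

# Hypothetical, deliberately tiny and mild word list (an assumption for
# illustration only); a real list would be much larger and dialect-specific.
ROOTS = {"damn", "hell"}

# Assumed suffix set for catching simple variations on each root.
SUFFIXES = r"(?:s|ed|ing|ish)?"

# One alternation over all roots, followed by an optional suffix.
TOKEN_PATTERN = re.compile(
    "(?:" + "|".join(sorted(map(re.escape, ROOTS))) + ")" + SUFFIXES,
    re.IGNORECASE,
)


def count_naive(tokens):
    """Substring matching: counts any token containing a root anywhere."""
    return sum(1 for t in tokens if any(root in t.lower() for root in ROOTS))


def count_whole_tokens(tokens):
    """Whole-token matching: the entire token must be root + optional suffix."""
    return sum(1 for t in tokens if TOKEN_PATTERN.fullmatch(t))


# Part-of-speech tagged input, as in the question; the tags are not used here.
tagged = [("Damn", "UH"), ("shell", "NN"), ("hello", "UH"),
          ("damned", "JJ"), ("Hell", "NN"), ("table", "NN")]
tokens = [word for word, _tag in tagged]

print(count_naive(tokens))         # counts "shell" and "hello" as hits too
print(count_whole_tokens(tokens))  # counts only "Damn", "damned", "Hell"
```

Substring matching inflates the count unevenly, because the inflation depends on which innocent words happen to contain a root. Whole-token matching avoids that, but it still cannot tell which dialect or audience a word offends.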
Edit:
- Be very careful. Normal words can offend, whether by choice or ignorance