Natural language processing: finding obscenities in English?

Question

Given a set of words tagged for part of speech, I want to find those that are obscenities in mainstream English. How might I do this? Should I just make a huge list, and check for the presence of anything in the list? Should I try to use a regex to capture a bunch of variations on a single root?

If it makes it easier, I don't want to filter anything out, just get a count. So if there are some false positives, it's not the end of the world, as long as the rate is more or less uniformly over-exaggerated.

Recommended answer

Use a huge list, and think of the target audience. Is there a third-party service specialising in this that you could use rather than rolling your own?
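
As a rough illustration of the "huge list" approach for the counting use case in the question, here is a minimal sketch. The word list is a deliberately mild, hypothetical stand-in (a real list would be far larger and audience-specific), and the `(word, tag)` input format and the function name `count_obscenities` are assumptions, not something from the original posts.

```python
from collections import Counter

# Hypothetical, deliberately mild stand-in list; a real one would be
# much larger and chosen with the target audience in mind.
OBSCENITIES = frozenset({"damn", "hell", "crap"})


def count_obscenities(tagged_words):
    """Count tokens whose lowercased form appears in the word list.

    tagged_words: iterable of (word, pos_tag) pairs (assumed format).
    The POS tag is ignored here; it is only carried along.
    """
    counts = Counter()
    for word, _tag in tagged_words:
        token = word.lower()
        if token in OBSCENITIES:
            counts[token] += 1
    return counts


tagged = [("Damn", "UH"), ("the", "DT"), ("shell", "NN"),
          ("hell", "NN"), ("damn", "UH")]
counts = count_obscenities(tagged)
total = sum(counts.values())
```

Because the lookup is on whole tokens in a set, "shell" is not counted even though it contains a listed word, and each lookup stays cheap however large the list grows.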

Some simple thoughts:

  • The Scunthorpe problem (and follow the links to "Swear filter" for more)
  • British or American English? fanny, fag etc
  • Political correctness: "black" or "Afro-American"?
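
The Scunthorpe problem is easiest to see by comparing a regex that looks for a root anywhere in a token with one that must match the whole token. The sketch below is only an illustration: the roots are mild, hypothetical placeholders, and the suffix list is an assumption that would need tuning for a real vocabulary.

```python
import re

# Hypothetical mild roots standing in for a real list.
ROOTS = ["damn", "hell"]
ROOT_ALTERNATION = "|".join(map(re.escape, ROOTS))

# Naive pattern: the root may appear anywhere inside the token.
naive = re.compile(ROOT_ALTERNATION, re.IGNORECASE)

# Anchored pattern: the whole token must be a root plus an optional
# suffix (the suffix list is an assumption for illustration).
anchored = re.compile(
    r"(?:%s)(?:s|ed|ing|ish)?" % ROOT_ALTERNATION, re.IGNORECASE
)


def count_matches(tokens, pattern, whole_token):
    """Count tokens matched by pattern, as a substring or as a whole token."""
    match = pattern.fullmatch if whole_token else pattern.search
    return sum(1 for token in tokens if match(token))


tokens = ["Hello", "shell", "damned", "hellish", "Damn", "seashells"]
naive_count = count_matches(tokens, naive, whole_token=False)
anchored_count = count_matches(tokens, anchored, whole_token=True)
```

On this sample the naive pattern flags all six tokens, while the anchored pattern flags only "damned", "hellish" and "Damn". Even for a count that tolerates false positives, substring matching inflates the rate unevenly, since the inflation depends on which innocent words happen to contain a root.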

Edit:

  • Be very careful, and again here. Normal words can offend, whether by choice or ignorance.

08-30 22:47