How to handle Twitter emoticons when processing strings in Java

Problem description

I'm working on processing Tweets from Twitter and storing them in a database (MySQL).

I have my process running perfectly but sometimes I get an error like this one:

2012-08-31 08:11:23,303 WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper  - SQL Error: 1366, SQLState: HY000
2012-08-31 08:11:23,304 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper  - Incorrect string value: '\xF0\x9F\x98\x9D #...' for column 'twe_text' at row 1

When looking for the problematic tweet in my logs I find the following one:

2012-08-31 08:11:22,971 INFO com.myapp.TweetLoaderJob  - Text for tweet 241175722096480256: RT @totallytoyosi_: My goodies, my goodies, not your goodies  <U+1F61D> #m&ms #sweeties #goodies #food  @ The Ritzy Cinema Café, Brixton htt ...

And finally, looking up what <U+1F61D> is, I discovered that it is an emoticon that Twitter sends as-is.

I have debugged using only this specific tweet, and Eclipse does not seem to recognize this character. So the question is: how can I handle this exception? I looked into configuring my MySQL database, but I cannot change the encoding (it's a requirement), so my options are to skip this kind of tweet or to suppress this problematic character.

But how can I do that if Java does not recognize it?

Solution

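You could filter your strings and remove the undesired part before storing them in your database, for example with a simple regexp like <U\+[^>]+> (the + after U must be escaped, otherwise it is read as a quantifier). Note that this pattern only matches the literal text <U+1F61D> as printed in the log. The bytes in the MySQL error, \xF0\x9F\x98\x9D, are the 4-byte UTF-8 encoding of U+1F61D, which suggests the string being stored holds the actual character. MySQL's utf8 character set stores at most 3 bytes per character, so in that case the characters to remove are those above U+FFFF.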
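As a minimal sketch of that filtering step, the class below shows two variants. It assumes the tweet text may contain either the real emoticon character (a code point above U+FFFF, which is what the 4-byte sequence \xF0\x9F\x98\x9D in the MySQL error points to) or the literal placeholder text printed in the log. The class name TweetTextFilter and both method names are hypothetical and do not come from the original question.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class TweetTextFilter {

    // Literal placeholder as printed in the log, e.g. "<U+1F61D>".
    // The '+' after 'U' is escaped so it is not read as a quantifier.
    private static final Pattern LOG_PLACEHOLDER =
            Pattern.compile("<U\\+[0-9A-Fa-f]+>");

    private TweetTextFilter() {
    }

    // Removes every supplementary code point (above U+FFFF). These need
    // 4 bytes in UTF-8, which MySQL's 3-byte utf8 character set rejects.
    // Unpaired surrogates are not supplementary code points and are kept.
    static String stripSupplementary(String text) {
        StringBuilder cleaned = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (!Character.isSupplementaryCodePoint(codePoint)) {
                cleaned.appendCodePoint(codePoint);
            }
            // Advance by 1 or 2 chars depending on the code point's width.
            i += Character.charCount(codePoint);
        }
        return cleaned.toString();
    }

    // Removes the textual form "<U+XXXX>" if it appears literally in the text.
    static String stripLogPlaceholders(String text) {
        return LOG_PLACEHOLDER.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Build U+1F61D from its code point to avoid source-encoding issues.
        String emoticon = new String(Character.toChars(0x1F61D));
        String tweet = "not your goodies " + emoticon + " #goodies #food";
        System.out.println(stripSupplementary(tweet));
    }
}
```

Calling stripSupplementary on the tweet text just before handing it to Hibernate would keep accented characters such as the é in "Café", because they lie below U+FFFF. Walking the string by code point rather than by char keeps each surrogate pair together, so a pair is either kept or dropped as a whole.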

