Exploring the Performance of Farasa and CAMeL Taggers for Arabic Dialect Tweets
[Abstract] In Natural Language Processing (NLP), Part-of-Speech (POS) tagging is a fundamental requirement for many applications, such as information extraction, machine translation, and grammar checking. Successful POS taggers have been developed for many languages, including Arabic. The spread of social media has increased the diversity of dialects in written text, as people use them in their online communications. As a result, it has become more difficult for researchers to classify some words that humans understand but computers do not. In addition, most Arabic POS research focuses on Modern Standard Arabic (MSA), while Dialect Arabic (DA) receives less attention. This paper evaluates the performance of two Arabic taggers on dialect Arabic tweets and determines which is more appropriate, which in turn will help improve existing taggers for dialect Arabic tweets. We used the Farasa and CAMeL taggers, which are commonly used to analyze Arabic texts and are considered the best taggers for Arabic. The results indicate that the CAMeL tagger performed better than the Farasa tagger, with accuracies of 92% and 83%, respectively. In other words, a hybrid POS tagger trained on both MSA and DA returns better results than one trained on MSA alone.
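As an illustration of the kind of comparison the abstract reports, the following is a minimal sketch of token-level accuracy computed for two taggers against a gold annotation. It does not call Farasa or CAMeL; the function name `token_accuracy`, the tag labels, and the toy tag sequences are all hypothetical, and the paper's actual evaluation procedure may differ.

```python
from typing import Sequence


def token_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of tags")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical toy data: one five-token tweet with made-up tags.
# These are NOT outputs of Farasa or CAMeL and NOT data from the paper.
gold_tags = ["NOUN", "VERB", "PART", "NOUN", "ADJ"]
tagger_a_tags = ["NOUN", "VERB", "PART", "NOUN", "NOUN"]  # one error
tagger_b_tags = ["NOUN", "NOUN", "PART", "VERB", "NOUN"]  # three errors

accuracy_a = token_accuracy(gold_tags, tagger_a_tags)
accuracy_b = token_accuracy(gold_tags, tagger_b_tags)

print(f"tagger A accuracy: {accuracy_a:.2f}")
print(f"tagger B accuracy: {accuracy_b:.2f}")
```

In a real comparison, the two taggers' outputs would likely need to be aligned to the same tokenization and mapped to a shared tag set before being scored this way.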
[Subject classification] Computer Science (General)
[Keywords] Dialect Arabic tweets; POS; POS tagging; MSA tagger; Farasa tagger; CAMeL tagger