emrQA: A large corpus for question answering on electronic medical records
[Abstract] We propose a novel methodology for generating large-scale, domain-specific question answering (QA) datasets by repurposing existing annotations created for other NLP tasks. We demonstrate an instance of this methodology by generating a large-scale QA dataset for electronic medical records, leveraging the expert annotations on clinical notes that the community-shared i2b2 datasets provide for various NLP tasks. The resulting corpus (emrQA) has 1 million question-logical form pairs and 400,000+ question-answer evidence pairs. We characterize the dataset and explore its learning potential by training baseline models for question-to-logical-form and question-to-answer mapping.
[Keywords] Electronic Medical Records, Question Answering, Logical Forms, Semantic Parsing, Dataset Generation, Closed Domain, i2b2