EdVidParse: Detecting People and Content in Educational Videos
[Abstract] There are thousands of hours of educational content on the Internet, with services such as edX, Coursera, Berkeley WebCasts, and others offering hundreds of courses to hundreds of thousands of learners. This scale has led researchers to study how effective video-based learning is. Although educational videos vary in style, they share two common elements: people and textual content. People present material to learners in the form of text, graphs, charts, tables, and diagrams. Given an annotation of the people and textual content in an educational video, researchers can study the relationship between video learning and retention. This thesis presents EdVidParse, an automatic tool that takes an educational video and annotates it with bounding boxes around the people and textual content. EdVidParse uses internal features from deep convolutional neural networks to estimate the bounding boxes, achieving an average precision (AP) of 0.43 on a test set. Three applications of EdVidParse are presented: identifying the video type, identifying people and textual content for interface design, and removing a person from a picture-in-picture video. EdVidParse provides an easy interface for identifying people and textual content in educational videos, for use in video annotation, interface design, and video reconfiguration.
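The abstract reports detection quality as a single AP score but does not state the evaluation protocol. As an illustration of what such a score measures, here is a minimal sketch assuming PASCAL VOC-style evaluation: a detection counts as correct if its intersection-over-union (IoU) with a not-yet-matched ground-truth box is at least 0.5, and AP is the area under the interpolated precision-recall curve. The function names `iou` and `average_precision`, the corner box format, the 0.5 threshold, and the single-class, single-image scope are all assumptions, not EdVidParse's actual evaluation code.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Tuple[float, Box]],
                      ground_truth: List[Box],
                      iou_thresh: float = 0.5) -> float:
    """VOC-style all-point interpolated AP for one class (assumed protocol).

    detections: (confidence score, box) pairs; ground_truth: reference boxes.
    """
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    tp_flags = []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Find the ground-truth box with the highest overlap.
        overlaps = [iou(box, gt) for gt in ground_truth]
        best_j = max(range(len(ground_truth)), key=lambda j: overlaps[j])
        if overlaps[best_j] >= iou_thresh and not matched[best_j]:
            matched[best_j] = True  # true positive
            tp_flags.append(1)
        else:
            tp_flags.append(0)  # false positive (miss or duplicate)
    # Precision and recall after each ranked detection.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / len(ground_truth))
    # Interpolate: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum precision over each recall increment.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
print(average_precision(dets, gt))  # 0.8333... (5/6)
```

In this toy case the false positive ranked between the two correct detections pulls AP below 1.0 even though every ground-truth box is eventually found. A full evaluation would pool ranked detections across all test frames before computing the curve.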
[Publishing institution] Massachusetts Institute of Technology