FlexGP 2.0: multiple levels of parallelism in distributed machine learning via genetic programming
[Abstract] This thesis presents FlexGP 2.0, a distributed cloud-backed machine learning system. FlexGP 2.0 features multiple levels of parallelism which provide a significant improvement in accuracy versus elapsed time. The amount of computational resources in FlexGP 2.0 can be scaled along several dimensions to support large, complex data. FlexGP 2.0's core genetic programming (GP) learner includes multithreaded C++ model evaluation and a multi-objective optimization algorithm which is extensible to pursue any number of objectives simultaneously in parallel. FlexGP 2.0 parallelizes the entire learner to obtain a large distributed population size and leverages communication between learners to increase performance via transfer of search progress between learners. FlexGP 2.0 factors training data to boost performance and enable support for increased data size and complexity. Several experiments are performed which verify the efficacy of FlexGP 2.0's multilevel parallelism. Experiments run on a large dataset from a real-world regression problem. The results demonstrate both less time to achieve the same accuracy and overall increased accuracy, and illustrate the value of FlexGP 2.0 as a platform for machine learning.
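The abstract's reference to multithreaded C++ model evaluation can be illustrated with a minimal sketch. This is not FlexGP 2.0's actual code: the Model, fitness, and evaluate_population names are hypothetical stand-ins, and a real GP learner would evaluate compiled expression trees rather than a one-parameter linear model. The sketch only shows the pattern of splitting a population's fitness evaluations across hardware threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GP individual; FlexGP evaluates evolved
// expression trees, but a fixed-form model keeps this sketch short.
struct Model {
    double w;
    double predict(double x) const { return w * x; }
};

// Mean squared error of one model over (x, y) training points.
double fitness(const Model& m, const std::vector<std::pair<double, double>>& data) {
    double err = 0.0;
    for (const auto& [x, y] : data) {
        double d = m.predict(x) - y;
        err += d * d;
    }
    return err / static_cast<double>(data.size());
}

// Evaluate the whole population in parallel by giving each thread a
// contiguous slice of individuals; slices are disjoint, so no locking
// is needed for the writes into `scores`.
std::vector<double> evaluate_population(
    const std::vector<Model>& pop,
    const std::vector<std::pair<double, double>>& data,
    unsigned n_threads) {
    std::vector<double> scores(pop.size());
    std::vector<std::thread> workers;
    std::size_t chunk = (pop.size() + n_threads - 1) / n_threads;
    for (unsigned t = 0; t < n_threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(pop.size(), begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                scores[i] = fitness(pop[i], data);
        });
    }
    for (auto& w : workers) w.join();
    return scores;
}
```

In this reading, the same pattern repeats at the coarser levels the abstract describes: whole learners run in parallel on separate cloud instances, each holding a factored slice of the training data, and periodically exchange their best models so that search progress transfers between learners.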
[Institution] Massachusetts Institute of Technology