{"id":7250,"date":"2013-03-18T14:44:29","date_gmt":"2013-03-18T13:44:29","guid":{"rendered":"http:\/\/www.itwriting.com\/blog\/?p=7250"},"modified":"2013-03-18T14:44:29","modified_gmt":"2013-03-18T13:44:29","slug":"intel-xeon-phi-shines-vs-nvidia-gpu-accelerators-in-ohio-state-university-tests","status":"publish","type":"post","link":"https:\/\/www.itwriting.com\/blog\/7250-intel-xeon-phi-shines-vs-nvidia-gpu-accelerators-in-ohio-state-university-tests.html","title":{"rendered":"Intel Xeon Phi shines vs NVidia GPU accelerators in Ohio State University tests"},"content":{"rendered":"<p>Which is better for massively parallel computing, a GPU accelerator board from NVidia, or Intel\u2019s new Xeon Phi? On the eve of NVidia\u2019s <a href=\"http:\/\/www.gputechconf.com\/page\/home.html\" target=\"_blank\">GPU Technology Conference<\/a> comes a paper which Intel will enjoy. Erik Saule, Kamer Kaya, and Umit V. Catalyurek from the Ohio State University have issued a paper with performance comparisons between Xeon Phi, NVIDIA Tesla C2050 and NVIDIA Tesla K20. The K20 has 2,496 CUDA cores, versus a mere 61 processor cores on the Xeon Phi, yet on the particular calculations under test the researchers got generally better performance from Xeon Phi.<\/p>\n<p>In the case of sparse-matrix vector multiplication (SpMV):<\/p>\n<blockquote>\n<p>For GPU architectures, the K20 card is typically faster than the C2050 card. It performs better for 18 of the 22 instances. It obtains between 4.9 and 13.2GFlop\/s and the highest performance on 9 of the instances. Xeon Phi reaches the highest performance on 12 of the instances and it is the only architecture which can obtain more than 15GFlop\/s.<\/p>\n<\/blockquote>\n<p>and in the case of sparse-matrix matrix multiplication (SpMM):<\/p>\n<blockquote>\n<p>The K20 GPU is often more than twice faster than C2050, which is much better compared with their relative performances in SpMV. 
The Xeon Phi coprocessor gets the best performance in 14 instances where this number is 5 and 3 for the CPU and GPU configurations, respectively. Intel Xeon Phi is the only architecture which achieves more than 100GFlop\/s.<\/p>\n<\/blockquote>\n<p>Note that this is a limited test, and the authors observe that SpMV computation is known to be a difficult case for GPU computing:<\/p>\n<blockquote>\n<p>the irregularity and sparsity of SpMV-like kernels create several problems for these architectures.<\/p>\n<\/blockquote>\n<p>They also note that memory latency is the biggest factor slowing performance:<\/p>\n<blockquote>\n<p>At last, for most instances, the SpMV kernel appears to be memory latency bound rather than memory bandwidth bound<\/p>\n<\/blockquote>\n<p>It is difficult to compare like with like. The Xeon Phi implementation uses <a href=\"http:\/\/openmp.org\/wp\/\" target=\"_blank\">OpenMP<\/a>, whereas the GPU implementation uses <a href=\"https:\/\/developer.nvidia.com\/cusparse\" target=\"_blank\">cuSPARSE<\/a>. I would also be interested to know whether as much effort was made to optimise for the GPU as for the Xeon Phi.<\/p>\n<p>Still, this is a real-world test that, if nothing else, demonstrates that in the right circumstances the smaller number of cores in a Xeon Phi does not prevent it from comparing favourably against a GPU accelerator:<\/p>\n<blockquote>\n<p>When compared with cutting-edge processors and accelerators, its SpMV, and especially SpMM, performance are superior thanks to its wide registers and vectorization capabilities. We believe that Xeon Phi will gain more interest in HPC community in the near future.<\/p>\n<\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Which is better for massively parallel computing, a GPU accelerator board from NVidia, or Intel\u2019s new Xeon Phi? On the eve of NVidia\u2019s GPU Technology Conference comes a paper which Intel will enjoy. Erik Saule, Kamer Kaya, and Umit V. 
C atalyurek from the Ohio State University have issued a paper with performance comparisons between &hellip; <a href=\"https:\/\/www.itwriting.com\/blog\/7250-intel-xeon-phi-shines-vs-nvidia-gpu-accelerators-in-ohio-state-university-tests.html\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Intel Xeon Phi shines vs NVidia GPU accelerators in Ohio State University tests<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[39,79,80],"tags":[453,486,654,902,1051],"class_list":["post-7250","post","type-post","status-publish","format-standard","hentry","category-hardware","category-software","category-software-development","tag-high-performance-computing","tag-intel","tag-nvidia","tag-tesla","tag-xeon-phi"],"_links":{"self":[{"href":"https:\/\/www.itwriting.com\/blog\/wp-json\/wp\/v2\/posts\/7250","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.itwriting.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.itwriting.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.itwriting.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.itwriting.com\/blog\/wp-json\/wp\/v2\/comments?post=7250"}],"version-history":[{"count":0,"href":"https:\/\/www.itwriting.com\/blog\/wp-json\/wp\/v2\/posts\/7250\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.itwriting.com\/blog\/wp-json\/wp\/v2\/media?parent=7250"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.itwriting.com\/blog\/wp-json\/wp\/v2\/categories?post=7250"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.itwriting.com\/blog\/wp-json\/wp\/v2\/tags?post=7250"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}