Tag Archives: intel

The mysterious microcode: Intel is issuing updates for all its CPUs from the last five years but you might not benefit

The Spectre and Meltdown security holes found in Intel and, to a lesser extent, AMD CPUs are not only among the most serious, but also among the most confusing tech issues that I can recall.

We are all used to the idea of patching to fix security holes, but normally that is all you need to do. Run Windows Update, or on Linux apt-get update, apt-get upgrade, and you are done.

This one is not like that. The reason is that you need to update the firmware; that is, the low-level software that drives the CPU. Intel calls this microcode.

So when Intel CEO Brian Krzanich says:

By Jan. 15, we will have issued updates for at least 90 percent of Intel CPUs introduced in the past five years, with updates for the remainder of these CPUs available by the end of January. We will then focus on issuing updates for older products as prioritized by our customers.

what he means is that Intel has issued new microcode for those CPUs to mitigate the newly discovered security holes related to speculative execution (where CPUs gain performance by making calculations ahead of time and throwing them away if they turn out not to be needed).

Intel’s customers are not you and me, the users, but rather the companies that purchase CPUs, which in most cases are the big PC manufacturers together with numerous device manufacturers. My Synology NAS has an Intel CPU, for example.

So if you have a PC or server from Vendor A, then when Intel has new microcode it is available to Vendor A. How it gets to your PC or server which you bought from Vendor A is another matter.

There are several ways this can happen. One is that the manufacturer can issue a BIOS update. This is the normal approach, but it does mean that you have to wait for the update, find it and apply it. Unlike Windows patches, BIOS updates do not come down via Windows Update, but have to be applied via another route, normally a utility supplied by the manufacturer. There are thousands of different PC models, and there is no guarantee that any specific model will receive an updated BIOS, nor that all users will find and apply it even if one is issued. Your chances are better if your PC is from a big name than if it carries a brand nobody has heard of and was bought from a supermarket or on eBay.

Are there other ways to apply the microcode? Yes. If you are technical you might be able to hack the BIOS but, leaving that aside, some operating systems can apply new microcode on boot. Therefore VMware was able to state:

The ESXi patches for this mitigation will include all available microcode patches at the time of release and the appropriate one will be applied automatically if the system firmware has not already done so.

Linux can do this as well. Such updates are volatile; they have to be re-applied on every boot. But there is little harm in that.

What about Windows? Unfortunately there is no supported way to do this. However, there is an experimental VMware utility that will do it:

This Fling is a Windows driver that can be used to update the microcode on a computer system’s central processor(s) (“CPU”). This type of update is most commonly performed by a system’s firmware (“BIOS”). However, if a newer BIOS cannot be obtained from a system vendor then this driver can be a potential substitute.

Check the comments – interest in this utility has jumped following the publicity around Spectre/Meltdown. If working exploits start circulating you can expect that interest to spike further.

This is a techie and unsupported solution though and comes with a health warning. Most users will never find it or use it.

That said, there is no inherent reason why Microsoft could not come up with a similar solution for PCs and servers for which no BIOS update is available, and even deliver it through Windows Update. If users do start to suffer widespread security problems which require Intel’s new microcode, it would not surprise me if something appears. If it does not, large numbers of PCs will remain unprotected.

Why patching to protect against Spectre and Meltdown is challenging

The tech world has been buzzing with news of bugs (or design flaws, take your pick) in mainly Intel CPUs, going way back, which enable malware to access memory in the computer that should be inaccessible.

How do you protect against this risk? The industry has done a poor job in communicating what users (or even system admins) should do.

A key reason why this problem is so serious is that it risks a nightmare scenario for public cloud vendors, or any hosting company. This is where software running in a virtual machine is able to access memory, and potentially introduce malware, in either the host server or other virtual machines running on the same server. The nature of public cloud is that anyone can run up a virtual machine and do what they will, so protecting against this issue is essential. The biggest providers, including AWS, Microsoft and Google, appear to have moved quickly to protect their public cloud platforms. For example:

The majority of Azure infrastructure has already been updated to address this vulnerability. Some aspects of Azure are still being updated and require a reboot of customer VMs for the security update to take effect. Many of you have received notification in recent weeks of a planned maintenance on Azure and have already rebooted your VMs to apply the fix, and no further action by you is required.

With the public disclosure of the security vulnerability today, we are accelerating the planned maintenance timing and will begin automatically rebooting the remaining impacted VMs starting at 3:30pm PST on January 3, 2018. The self-service maintenance window that was available for some customers has now ended, in order to begin this accelerated update.

Note that this fix is at the hypervisor (host) level. It does not patch your VMs on Azure. So do you also need to patch your VM? Yes, you should; and your client PCs as well. For example, KB4056890 (for Windows Server 2016 and Windows 10 1607), KB4056891 (for Windows 10 1703), or KB4056892 (for Windows 10 1709). This is where it gets complex though, for two main reasons:

1. The update will not be applied unless your antivirus vendor has set a special registry key. The reason is that the update may crash your computer if the antivirus software accesses memory in a certain way, which it may do. So you have to wait for your antivirus vendor to set this key, or remove your third-party antivirus and use the built-in Windows Defender.

2. The software patch is not complete protection. You also need to update your BIOS, if an update is available. Whether or not it is available may be uncertain. For example, I am pretty sure that I found the right update for my HP PC, based on the following clues:

– The update was released on December 20 2017

– The description of the update is “Provides improved security”

image

So now is the time, if you have not done so already, to go to the support sites for your servers and PCs, or your motherboard vendor if you assembled your own, see if there is a BIOS update, try to figure out if it addresses Spectre and Meltdown, and apply it.

If you cannot find an update, you are not fully protected.

It is not an easy process and realistically many PCs will never be updated, especially older ones.

What is most disappointing is the lack of clarity or alerts from vendors about the problem. I visited the HPE support site yesterday in the hope of finding up-to-date information on HP’s server patches, only to find a maze of twisty little link passages, all alike, none of which led to the information I sought. The only thing you can do is to trawl through the driver downloads for your server in the hope of finding a BIOS update.

Common sense suggests that PCs and laptops will be a bigger risk than private servers, since unlike public cloud vendors you do not allow anyone out there to run up VMs.

At this point it is hard to tell how big a problem this will be. Best practice though suggests updating all your PCs and servers immediately, as well as checking that your hosting company has done the same. In this particular case, achieving this is challenging.

PS kudos to BleepingComputer for this nice article and links; the kind of practical help that hard-pressed users and admins need.

There is also a great list of fixes and mitigations for various platforms here:

https://github.com/hannob/meltdownspectre-patches

PPS see also Microsoft’s guidance on patching servers here:

https://support.microsoft.com/en-us/help/4072698/windows-server-guidance-to-protect-against-the-speculative-execution

and PCs here:

https://support.microsoft.com/en-us/help/4073119/protect-against-speculative-execution-side-channel-vulnerabilities-in

There is a handy PowerShell module called SpeculationControl which you can install and run to check status. I was able to confirm that the HP BIOS update mentioned above is the right one. Just run PowerShell with admin rights and type:

install-module speculationcontrol

then type

get-speculationcontrolsettings

image

Thanks to @teroalhonen on Twitter for the tip.

Imagination at Mobile World Congress 2015: what is the strategy?

At MWC earlier this month I met with Imagination, best known for its PowerVR graphics design but also now the owner of the MIPS CPU. Apple is a shareholder and uses Imagination graphics technology in the iPhone and iPad. This market is highly competitive though, especially since ARM has its own Mali GPU. “You need complete platforms, you need a processor,” Tony King-Smith, executive VP of Technology Marketing, told me. “All the markets that matter to us are integrating towards a single chip. For a single chip you need some mix of central processing, communications, and multimedia.”

MIPS is a supported CPU for Android 2.3 or higher but most Android devices run ARM or Intel CPUs. Why no MIPS devices at MWC?

“There is one and a half to two years between a licensee picking up the IP, and delivering silicon based on it,” an Imagination spokesperson said. “We are engaged with customers but until something shows up we cannot disclose any names. Next year we are going to see some progress and potentially something I can show you.” Watch this space then.

What is Imagination’s strategy overall? King-Smith told me that the company is well placed to satisfy the need for optimisation and differentiation in an increasingly mature mobile market. It is also eyeing the IoT (Internet of things) space with interest. “Wearables need completely new architectures,” said King-Smith. “Not just tweaking a mobile chip. That’s where we’re going.”    

I was also interested to see a real demo of Vulkan, the successor to OpenGL, on the Imagination stand, based on the preliminary specification. “It will enable people to make more use of our platform”, said King-Smith, because of the lower level access it offers to the GPU.

image

For more on Vulkan see this piece on the Reg.

What about the Creator board which Imagination has released, a low-priced starter kit along the lines of Raspberry Pi but of course with MIPS and more powerful graphics? It is an effort to build the ecosystem, said King-Smith. “It is a means for us to deliver our IP and make it easier for developers to engage with us. We also want to enable start-ups and new solutions.” It is primarily for developing and testing ideas, then, but if you want to go into production with it, that is fine too. “That board has been designed to ramp in volume,” King-Smith told me.

Here comes Steam Machine: a quick look

At CES in Las Vegas I got a first look at Valve’s Steam Machine. This Brix Pro model comes from Gigabyte and will cost $499 for bare bones – no RAM or storage.

I was surprised by how small the thing is – quite cheap-looking in fact, especially when compared to something like Microsoft’s Xbox One which is large and sleek, and costs a similar amount (though smaller is good in most ways).

image

Next to it you can see the controller, which gives you an idea of the scale.

Ports on the back are HDMI, DisplayPort (better quality), Ethernet and two more USB 3.0 ports.

image

and on the front, two USB ports and an audio socket (supporting digital as well as analogue).

image

Power is on top.

What counts though is the spec: Core i5-4570R (an i7 is also available), Intel Iris Pro 5200 graphics with 128MB eDRAM, and Wi-Fi included. Max RAM is 16GB. It is going to cost at least $100–$150 extra to make it a working box.

Intel showed the Brix Pro driving a large display at 1080p.

image

However, I was told that the little box has enough power to drive a 4K display as well as a second display at 1080p. In principle, you could have a Steam Machine with three 4K displays for the perfect setup; Intel said that its Iris Pro 5200 is capable of this, though not in this particular configuration.

Running Linux (SteamOS) and tapping into the huge Steam community and app store, Steam Machine is one to watch.

China’s Tianhe-2 Supercomputer takes top ranking, a win for Intel vs Nvidia

The International Supercomputing Conference (ISC) is under way in Leipzig, and one of the announcements is that China’s Tianhe-2 is now the world’s fastest supercomputer according to the Top 500 list.

This has some personal interest for me, as I visited its predecessor Tianhe-1A in December 2011, on a press briefing organised by NVidia which was, I guess, on a diplomatic mission to promote Tesla, the GPU accelerator boards used in Tianhe-1A (which was itself the world’s fastest supercomputer for a period).

It appears that the mission failed, insofar as Tianhe-2 uses Intel Phi accelerator boards rather than Nvidia Tesla.

Tianhe-2 has 16,000 nodes, each with two Intel Xeon IvyBridge processors and three Xeon Phi processors for a combined total of 3,120,000 computing cores.

says the press release. Previously, the world’s fastest was the US Titan, which does use NVidia GPUs.

Nvidia has reason to worry. Tesla boards are present on 39 of the top 500, whereas Xeon Phi is only on 11, but it has not been out for long and is growing fast. A newly published paper shows Xeon Phi besting Tesla on sparse matrix-vector multiplication:

we demonstrate that our implementation is 3.52x and 1.32x faster, respectively, than the best available implementations on dual Intel® Xeon® Processor E5-2680 and the NVIDIA Tesla K20X architecture.

In addition, Intel has just announced the successor to Xeon Phi, codenamed Knights Landing. Knights Landing can function as the host CPU as well as an accelerator board, and has integrated on-package memory to reduce data transfer bottlenecks.

Nvidia does not agree that Xeon Phi is faster:

The Tesla K20X is about 50% faster in Linpack performance, and in terms of real application performance we’re seeing from 2x to 5x faster performance using K20X versus Xeon Phi accelerator.

says the company’s Roy Kim, Tesla product manager. The truth I suspect is that it depends on the type of workload and I would welcome more detail on this.

It is also worth noting that Tianhe-2 does not better Titan on power/performance ratio.

  • Tianhe-2: 3,120,000 cores, 1,024,000 GB Memory, Linpack perf 33,862.7 TFlop/s, Power 17,808 kW.
  • Titan: 560,640 cores, 710,144 GB Memory, Linpack perf 17,590 TFlop/s, Power 8,209 kW.

Intel fights back against iOS with free tools for HTML5 cross-platform mobile development

Today at its Software Conference in Paris Intel presented its HTML5 development tools.

image

There are several components, starting with the XDK, a cross-platform development kit based on HTML5, CSS and JavaScript designed to be packaged as mobile apps using Cordova, the open source variant of PhoneGap.

There is an intriguing comment here:

The XDK is fully compatible with the PhoneGap HTML5 cross platform development project, providing many features that are missing from the open source project.

PhoneGap is Adobe’s commercial variant of Cordova. It looks as if Intel is doing its own implementation of features which are in PhoneGap but not Cordova, which might not please Adobe. Apparently code that Intel adds will be fed back into Cordova in due course.

Intel has its own JavaScript app framework, formerly called jqMobi and now called Intel’s App Framework. This is an open source framework hosted on GitHub.

There are also developer tools which run as an extension to Google Chrome, and a cloud-based build service which targets the following platforms:

  • Apple App Store
  • Google Play
  • Nook Store
  • Amazon Appstore for Android
  • Windows 8 Store
  • Windows Phone 8

And web applications:

  • Facebook
  • Intel AppUp
  • Chrome Store
  • Self-hosted

The build service lets you compile and deploy for these platforms without requiring a local install of the various mobile SDKs. It is free and according to Intel’s Thomas Zipplies there are no plans to charge in future. The build service is Intel’s own, and not related to Adobe’s PhoneGap Build, other than the fact that both share common source in Cordova. This also is unlikely to please Adobe.

You can start a new app in the browser, using a wizard.

image

Intel also has an iOS to HTML5 porting tool in beta, called the App Porter Tool. The aim is to convert Objective C to JavaScript automatically, and while the tool will not convert all the code successfully it should be able to port most of it, reducing the overall porting effort.

Given that the XDK supports Windows 8 modern apps and Windows Phone 8, this is also a route to porting from iOS to those platforms.

Why is Intel doing this, especially on a non-commercial basis? According to Zipplies, it is a reaction to “walled garden” development platforms, which while not specified must include Apple iOS and to some extent Google Android.

Note that both iOS and almost all Android devices run on ARM, so another way of looking at this is that Intel would rather have developers work on cross-platform apps than have them develop exclusively for ARM devices.

Zipplies also says that Intel can optimise the libraries in the XDK to improve performance on its processors.

You can access the HTML5 development tools here.

Intel Xeon Phi shines vs NVidia GPU accelerators in Ohio State University tests

Which is better for massively parallel computing, a GPU accelerator board from NVidia, or Intel’s new Xeon Phi? On the eve of NVidia’s GPU Technology Conference comes a paper which Intel will enjoy. Erik Saule, Kamer Kaya, and Ümit V. Çatalyürek from the Ohio State University have issued a paper with performance comparisons between Xeon Phi, NVIDIA Tesla C2050 and NVIDIA Tesla K20. The K20 has 2,496 CUDA cores, versus a mere 61 processor cores on the Xeon Phi, yet on the particular calculations under test the researchers got generally better performance from Xeon Phi.

In the case of sparse-matrix vector multiplication (SpMV):

For GPU architectures, the K20 card is typically faster than the C2050 card. It performs better for 18 of the 22 instances. It obtains between 4.9 and 13.2GFlop/s and the highest performance on 9 of the instances. Xeon Phi reaches the highest performance on 12 of the instances and it is the only architecture which can obtain more than 15GFlop/s.

and in the case of sparse-matrix matrix multiplication (SpMM):

The K20 GPU is often more than twice faster than C2050, which is much better compared with their relative performances in SpMV. The Xeon Phi coprocessor gets the best performance in 14 instances where this number is 5 and 3 for the CPU and GPU configurations, respectively. Intel Xeon Phi is the only architecture which achieves more than 100GFlop/s.

Note that this is a limited test, and that the authors note that SpMV computation is known to be a difficult case for GPU computing:

the irregularity and sparsity of SpMV-like kernels create several problems for these architectures.

They also note that memory latency is the biggest factor slowing performance:

At last, for most instances, the SpMV kernel appears to be memory latency bound rather than memory bandwidth bound

It is difficult to compare like with like. The Xeon Phi implementation uses OpenMP, whereas the GPU implementation uses cuSPARSE. I would also be interested to know whether as much effort was made to optimise for the GPU as for the Xeon Phi.

Still, this is a real-world test that, if nothing else, demonstrates that in the right circumstances the smaller number of cores in a Xeon Phi do not prevent it comparing favourably against a GPU accelerator:

When compared with cutting-edge processors and accelerators, its SpMV, and especially SpMM, performance are superior thanks to its wide registers and vectorization capabilities. We believe that Xeon Phi will gain more interest in HPC community in the near future.

Images of Eurora, the world’s greenest supercomputer

Yesterday I was in Bologna for the press launch of Eurora at Cineca, a non-profit consortium of universities and other public bodies. The claim is that Eurora is the world’s greenest supercomputer.

image

Eurora is a prototype deployment of Aurora Tigon, made by Eurotech. It is a hybrid supercomputer, with 128 CPUs supplemented by 128 NVidia Kepler K20 GPUs.

What makes it green? Of course, being new is good, as processor efficiency improves with every release, and “green-ness” is measured in floating point operations per watt. Eurora does 3150 Mflop/s per watt.

There is more though. Eurotech is a believer in water cooling, which is more efficient than air. Further, it is easier to do something useful, such as generating energy, with the hot water you generate than with hot air.

Other factors include underclocking slightly, and supplying 48 volt DC power in order to avoid power conversion steps.

Eurora is composed of 64 nodes. Each node has a board with 2 Intel Xeon E5-2687W CPUs, an Altera Stratix V FPGA (Field Programmable Gate Array), an SSD drive, and RAM soldered to the board; apparently soldering the RAM is more efficient than using DIMMs.

image

Here is the FPGA:

image

and one of the Intel-confidential CPUs:

image

On top of this board goes a water-cooled metal block. This presses against the CPU and other components for efficient heat exchange. There is no fan.

Then on top of that go the K20 GPU accelerator boards. The design means that these can be changed for Intel Xeon Phi accelerator boards. Eurotech is neutral in the NVidia vs Intel accelerator wars.

image

Here you can see where the water enters and leaves the heatsink. When you plug a node into the rack, you connect it to the plumbing as well as the electrics.

image

Here are 8 nodes in a rack.

image

Under the floor is a whole lot more plumbing. This is inside the Aurora cabinet where pipes and wires rise from the floor.

image

Here is a look under the floor outside the cabinet.

image

while in the corner of the room is a sort of pump room that circulates the water, monitors the system, adds chemicals to prevent algae from growing, and no doubt does a few other things.

image

The press was asked NOT to operate this big red switch:

image

I am not sure whether the switch we were not meant to operate is the upper red button, or the lower red lever. To be on the safe side, I left them both as-is.

So here is a thought. Apparently Eurora is 15 times more energy-efficient than a typical desktop. If the mobile revolution continues and we all use tablets, which also tend to be relatively energy-efficient, could we save substantial energy by using the cloud when we need more grunt (whether processing or video) than a tablet can provide?

Programming NVIDIA GPUs and Intel MIC with directives: OpenACC vs OpenMP

Last month I was at Intel’s software conference learning about Many Integrated Core (MIC), the company’s forthcoming accelerator card for HPC (High Performance Computing). This month I am in San Jose for NVIDIA’s GPU Technology Conference learning about the latest developments in NVIDIA’s platform for accelerated massively parallel computing using GPU cards and the CUDA architecture. The approaches taken by NVIDIA and Intel have much in common – focus on power efficiency, many cores, accelerator boards with independent memory space controlled by the CPU – but also major differences. Intel’s boards have familiar x86 processors, whereas NVIDIA’s have GPUs which require developers to learn CUDA C or an equivalent such as OpenCL.

In order to simplify this, NVIDIA and partners Cray, CAPS and PGI announced OpenACC last year, a set of directives which when added to C/C++ code instruct the compiler to run code parallelised on the GPU, or potentially on other accelerators such as Intel MIC. The OpenACC folk have stated from the outset their hope and intention that OpenACC will converge with OpenMP, an existing standard for directives enabling shared memory parallelisation. OpenMP is not suitable for accelerators since these have their own memory space.

One thing that puzzled me though: Intel clearly stated at last month’s event that it would support OpenMP (not OpenACC) on MIC, due to go into production at the end of this year or early next. How can this be?

I took the opportunity here at NVIDIA’s conference to ask Duncan Poole, who is NVIDIA’s Senior Manager for High Performance Computing and also the President of OpenACC, about what is happening with these two standards. How can Intel implement OpenMP on MIC, if it is not suitable for accelerators?

“I think OpenMP in the form that’s being discussed inside of the sub-committee is suitable. There’s some debate about some of the specific features that continues. Also, in the OpenMP committee they’re trying to address the concerns of TI and IBM so it’s a broader discussion than just the Intel architecture. So OpenMP will be useful on this class of processor. What we needed to do is not wait for it. That standard, if we’re lucky it will be draft at the end of this year, and maybe a year later will be ratified. We want to unify this developer base now,” Poole told me.

How similar will this adapted OpenMP be to what OpenACC is now?

“It’s got the potential to be quite close. The guy that drafted OpenACC is the head of that sub-committee. There’ll probably be changes in keywords, but there’s also some things being proposed now that were not conceived of. So there’s good debate going on, and I expect that we’ll benefit from it.

“Some of the features for example that are shared by Kepler and MIC with respect to nested parallelism are very useful. Nested parallelism did not exist at the time that we started this work. So there’ll be an evolution that will happen and probably a logical convergence over time.

If OpenMP is not set to support accelerators until two years hence, what can Intel be doing with it?

“It will be a vendor implementation of a pre-release standard. Something like that,” said Poole, emphasising that he cannot speak for Intel. “To be complementary to Intel, they have some good ideas and it’s a good debate right now.”

Incidentally, I also asked Intel about OpenACC last month, and was told that the company has no plans to implement it on its compilers. OpenMP is the standard it supports.

The topic is significant, in that if a standard set of directives is supported across both Intel and NVIDIA’s HPC platforms, developers can easily port code from one to the other. You can do this today with OpenCL, but converting an application to use OpenCL to enhance performance is a great deal more effort than adding directives.