Tag Archives: azure

The case of the disappearing Azure AD application registration

Some time ago I wrote a simple web application which runs on Microsoft Azure and uses Azure Active Directory for authentication. The application is used constantly and has proved reliable; however yesterday it stopped working. A quick debug session showed that the problem was an Azure AD permissions error.

In order to use Azure AD, applications have to be registered in the Azure management portal. I use the old portal for this; I am not sure that the functionality exists in the new portal yet. There is a nice how-to here.

image

One of the elements in the registration is a key which has a maximum lifetime of 2 years:

image

My application was deployed about two years ago so I went to the portal to see if it had expired.

What I found surprised me. The application was not listed at all. It had disappeared.

Instead of simply obtaining a new key and updating my application config, I had to create a new application registration and update several keys in the config, which was an annoyance.

There is a wider point here, in the whole category of dealing with “things that expire”. Some time ago, Microsoft suffered an extended Azure outage because of an expired certificate. It is a shame that Microsoft insists on a maximum 2 year lifetime for this key but does not provide a check box for “alert me when this key is about to expire”, how difficult would that be?

Problems like this also mean that things which “just work” may not continue to do so. Of course a well organised enterprise setup can deal with this type of problem, but imagine, for example, the case of a small business with an application running on Azure where the developers have gone out of business, perhaps, or are no longer available. In fact the only code I needed to change was in web.config, but I can imagine it could take some time to figure out what to do and what to change.

Microsoft’s story continues: Windows down, cloud up in financials Oct-Dec 2015

Microsoft has reported its latest financial results, for the quarter ending December 31st 2015.

Here are the latest figures (see end of post for what is in the segments):

Quarter ending  December 31st 2015 vs quarter ending December 31st 2014, $millions

Segment Revenue Change Operating income Change
Productivity and Business Processes 6690 -132 6460 -528
Intelligent Cloud 6343 +302 4977 +272
More Personal Computing 12660 -622 3542 +528
Corporate and Other -1897 -2222 -1897 -1980

A few points to note.

Revenue is down: Revenue overall was $million 23.8, $million 2.67 down on the same quarter in 2014. This is because cloud revenue has increased by less than personal computing has declined. The segments are rather opaque. We have to look at Microsoft’s comments on its results to get a better picture of how the company’s business is changing.

Windows: Revenue down 5% “due primarily to lower phone and Windows revenue and negative impact from foreign currency”.

Windows 10: Not much said about this specifically, except that search revenue grew 21% overall, and “nearly 30% of search revenue in the month of December was driven by Windows 10 devices.” That enforced Cortana/Bing search integration is beginning to pay off.

Surface: Revenue up 29%, but not enough to offset a 49% decline in phone revenue.

Azure: Azure revenue grew 140%, compute usage doubled year on year, Azure SQL database usage increased by 5 times year on year.

Office 365: 59% growth in commercial seats.

Server products: Revenue is up 5% after allowing for currency movements.

Xbox: Xbox Live revenue is growing (up 30% year on year) but hardware revenue declined, by how much is undisclosed. Microsoft attributes this to “lower volumes of Xbox 360” which is lame considering that the shiny Xbox One is also available.

Further observations

This is a continuing story of cloud growth and consumer decline, with Microsoft’s traditional business market somewhere in between. The slow, or not so slow, death of Windows Phone is sad to see; Microsoft’s dismal handling of its Nokia acquisition is among its biggest mis-steps and hugely costly.

CEO Satya Nadella came from the server side of the business and seems to be shaping the company in that direction, if he had any choice.

Azure and Office 365 are its big success stories. Nadella said in the earnings call that “the enterprise cloud opportunity is massive, larger than any market we’ve ever participated in.”

A reminder of Microsoft’s segments:

Productivity and Business Processes: Office, both commercial and consumer, including retail sales, volume licenses, Office 365, Exchange, SharePoint, Skype for Business, Skype consumer, OneDrive, Outlook.com. Microsoft Dynamics including Dynamics CRM, Dynamics ERP, both online and on-premises sales.

Intelligent Cloud: Server products not mentioned above, including Windows server, SQL Server, Visual Studio, System Center, as well as Microsoft Azure.

More Personal Computing: What a daft name, more than what? Still, this includes Windows in all its non-server forms, Windows Phone both hardware and licenses, Surface hardware, gaming including Xbox, Xbox Live, and search advertising.

Microsoft financials July-Sept 2015: decline of Windows hits home, cloud rises

Microsoft has reported its financials for its first quarter. Making sense of these is harder than usual because the company has changed its segment breakdown (and the names are misleading). The new segments are as follows:

Productivity and Business Processes: Office, both commercial and consumer, including retail sales, volume licenses, Office 365, Exchange, SharePoint, Skype for Business, Skype consumer, OneDrive, Outlook.com. Microsoft Dynamics including Dynamics CRM, Dynamics ERP, both online and on-premises sales.

Intelligent Cloud: Server products not mentioned above, including Windows server, SQL Server, Visual Studio, System Center, as well as Microsoft Azure.

More Personal Computing: What a daft name, more than what? Still, this includes Windows in all its non-server forms, Windows Phone both hardware and licenses, Surface hardware, gaming including Xbox, Xbox Live, and search advertising.

Here are the latest figures:

Quarter ending  Sept 30th 2015 vs quarter ending Sept 30th 2014, $millions

Segment Revenue Change Operating income Change
Productivity and Business Processes 6306 -184 3105 -233
Intelligent Cloud 5892 +417 2400 +294
More Personal Computing 9381 -1855 1562 -57
Corporate and Other -1200 -1200 -1274 -55

A few points to note.

Death of Windows Phone: Microsoft acquired Nokia’s Devices and Services business in April 2014. In fiscal year 2015, according to Microsoft’s 10-Q report, the company “eliminated approximately 19,000 positions in fiscal year 2015, including approximately 13,000 professional and factory positions related to the Nokia Devices and Services business.” This was rationalisation following the acquisition; the real blow came a year later. “In June 2015, management approved a plan to restructure our phone business to better focus and align resources (the “Phone Hardware Restructuring Plan”), under which we will eliminate up to 7,800 positions in fiscal year 2016.”

Windows Phone is not quite dead, but Microsoft seems to have given up on the idea of competing with Android and iOS in the mainstream. Year on year, phone revenue is down 58%, Lumia units down from 9.3 million to 5.8 million, non-Lumia phones down from 42.9 million to 25.5 million. This is what happens when you tell the world you are giving up.

Windows: Revenue down 7% “driven by declines in the business and consumer PC markets”.

Surface: Revenue down by 26% because Surface Pro 3 launched in June 2014; this should pick up following the launch of new Surface hardware recently.

Cloud: Microsoft’s “Commercial cloud” comprises Office 365 Commercial, Azure and Dynamics CRM online. All are booming. Azure revenue and usage more than doubled year on year, with 121% revenue growth. In addition, Office 365 consumer subscribers increased by 3 million in the quarter, to 18.2 million, an increase of nearly 20%.

Server products: Revenue is up 6% thanks to “higher revenue from premium versions of Microsoft SQL Server, Windows Server, and System Center”

Xbox: Steady, with Live revenue up 17%, Minecraft adding 17% to game revenue, and hardware revenue down 17% because of Xbox 360 declining (and by implication, not being replaced by Xbox One, a worrying trend).

Further observations

Is Microsoft now facing permanent long (but slow) decline in Windows as a client or standalone operating system? It certainly looks that way. The last hope is that Windows 10 in laptop, tablet and hybrid forms wins some users over from Mac computers and iPad/Android tablets. Despite some progress, Microsoft still has work to do before Windows delivers the smooth appliance-like experience of competing tablets, so I do not regard this as likely. The app ecosystem is also a problem. Tablets need Universal Windows Platform (UWP) apps but developers can still target more Windows users with desktop apps, discouraging UWP development.

Microsoft is also busy removing the advantage of Windows by stepping up its first-party Mac, iOS and Android application development, though this makes sense as a way of promoting Office 365.

That leads on to the next question. If Windows continues to decline, can Microsoft still grow with Office 365 and Azure? Of course it is possible, and on these figures that strategy looks to be going reasonably well. That said, you can expect both Google to continue integrating Android and of course Chromebook with its rival cloud services. Apple today does not compete so much in the cloud, but may do in future. If the future Microsoft has to relying on third-party operating systems for user interaction it will be a long-term weakness.

Microsoft completes Visual Studio 2015

Microsoft has completed Visual Studio 2015, the latest version of its all-encompassing development tool. You can download it here. Today is also the release day for TypeScript 1.5 (a language which compiles to JavaScript)

image

Windows 10 is released in just 9 days, so all eyes will be on this and its new/old app platform – the Universal Windows Platform, based on the Windows Runtime, as found in Windows 8, but considerably revised so that developers can in theory write one app and run it on any Windows 10 device, from PC to tablet to phone to Xbox to HoloLens, and sell or distribute it from a unified Windows Store.

Microsoft CEO Satya Nadella recently confirmed that the Windows Store is a key part of the Windows 10 strategy:

Why then make all these changes to the Start Menu with Windows 10? It’s not because I just want to bring back the old. It’s because that’s the best way to improve the liquidity [of] our store. Windows 8 was great except that nobody discovered the store. In Windows 10, the store is right there and done in a tasteful way.

The Store is more visible in Windows 10 than in 8 because in Windows 10 there are no longer two separate environments (Metro and desktop), but only one (desktop). Windows Runtime apps run in desktop windows. This makes the experience a little worse for tablet users, but the advantage is that now desktop users are more likely to interact with the Store, and more likely to use the apps they install, since they run in a familiar environment.

Another key change is “Project Centennial”, which I wrote up for the Register here. This lets developers package desktop apps for delivery from the Store, using app virtualisation (based on an Enterprise product called App-V). If Microsoft gets this right, Project Centennial will be the preferred way to deliver most desktop apps, since it is both easier and safer for the user.

If the Store does take off (and if it does not, Windows 10 will in part have failed), then Visual Studio will be the key tool for created or repackaging apps for Windows.

Windows 10 is important, but so too is Azure, Microsoft’s cloud platform. Visual Studio has a key role here, too. Microsoft has an entire stack, including Windows as both operating system and development environment, Visual Studio for coding and testing, and Azure for hosting cloud applications. Since the early days of Azure, the development experience has improved, so that with a modest understanding of the ASP.NET MVC framework you can go from an idea to a working demo, hosted on Azure, that you can show customers, in a short space of time.

There is also a new Cloud Explorer in Visual Studio which lets you view Azure resources from the IDE.

image

Mobile is Microsoft’s weak point, but the the company has made efforts to support Android and iOS both through mobile service back-ends hosted on Azure, and by supporting various approaches to building cross-platform apps. Visual Studio 2015 includes Xamarin project types, though out of the box these just tell you to go and install Xamarin, which lets you build Android and iOS apps with C#, subject to a separate Xamarin subscription.

Another option is to use Microsoft’s new iOS tools to code in Visual Studio while targeting Apple’s mobile platform, though this does require a Mac running a remote agent.

There is also Visual Studio Tools for Apache Cordova, where you code in JavaScript and wrap the results as native apps for both Windows and mobile platforms.

Visual Studio comes with an Android emulator, based on Hyper-V, for debugging either Xamarin or Cordova apps. Xamarin also offers its own emulator and I am not sure how these compare.

In addition to the above, Visual Studio 2015 introduces C# 6.0, Visual Basic 12, the Roslyn compiler platform which enables new IDE features, and .NET Core which is an open source, cross-platform fork of the .NET Framework. Thanks to .NET Core, the latest version of ASP.NET runs on Mac and Linux as well as Windows.

Despite Microsoft’s new cross-platform focus, Visual Studio itself runs only on Windows. In a world of Mac-wielding developers that is a problem, so the company has come up with Visual Studio code, an editor with some IDE features that runs on Window, Mac and Linux. Other options for non-Windows developers are to run Windows in an emulator such as Parallels, or on a virtual machine hosted in the code (Azure has suitable pre-baked images with Visual Studio), or to use third-party tools.

Visual Studio is a critical product then, but is it really done? Although you can download the final product today, many parts are not available (Project Centennial) or still in beta (ASP.NET 5 is beta 5). This is a milestone though, and credit to the team for bringing it out in advance of Windows 10 (I recall some Windows releases where Visual Studio was still in preview on release day).

Why Windows Server is going Nano: think automation, Cloud OS

Yesterday Microsoft announced Windows Nano Server which is essentially an installation option that is even more stripped-down than Server Core. Server Core, introduced with Windows Server 2008, removed the GUI in order to make the OS lighter weight and more secure. It is particularly suitable for installations that do nothing more than run Hyper-V to host VMs. You want your Hyper-V host to be rock-solid and removing unnecessary clutter makes sense.

There was more to the strategy than that though, and it was at last week’s ChefConf in Santa Clara (attended by both Windows Server architect Jeffrey Snover and Azure CTO Mark Russinovich) that the pieces fell into place for me. Here are two key areas which Snover has worked on over the last 16 years or so (he joined Microsoft in 1999):

  • PowerShell, first announced as “Monad” in August 2002 and presented at the PDC conference in September 2003. Originally presented as a scripting platform, it is now described as an “automation engine”, though it is still pretty good for scripting.
  • Windows Server componentisation, that is, the ability to configure Windows Server by adding and removing components. Server Core was a sign of progress here, especially in the Server 2012 version where you can move seamlessly between Core and full Windows Server by adding or removing the various pieces. It is still not perfect, mainly because of dependencies that make you drag in more than you might really want when enabling a specific feature.
  • PowerShell Desired State Configuration, introduced in Server 2012 R2, which puts these together by letting you define the state of a server in a declarative configuration file and apply it to an OS instance.

I am not sure how much of this strategy was in Snover’s mind when he came up with PowerShell, but today it looks far-sighted. The role of a server OS has changed since Windows first entered this market, with Windows NT in 1993. Today, when most server instances are virtual, the focus is on efficiency (making maximum use of the hardware) and agility (quick configuration and on-demand scaling). How is that achieved? Two things:

1. For efficiency, you want an OS that runs only what is necessary to run the applications it is hosting, and on the hypervisor side, the ability to load the right number of VMs to make maximum use of the hardware.

2. For agility, you want fully automated server deployment and configuration. We take this for granted in cloud platforms such as Amazon Web Services and Azure, in that you can run up a new server instance in a few minutes. However, there is still manual configuration on the server once launched. Azure web apps (formerly web sites) are better: you just upload your application. Better still, you can scale it by adding or removing instances with a script or through the web-based management portal. Web apps are limited though and for more complex applications you may need full access to the server. Greater ability to automate the server means that the web app experience can become the norm for a wider range of applications.

Nano Server is more efficient. Look at these stats (compared to full Server):

  • 93 percent lower VHD size
  • 92 percent fewer critical bulletins
  • 80 percent fewer reboots

Microsoft has removed not only the GUI, but also 32-bit support and MSI (I presume the Windows Installer services). Nano Server is designed to work well both sides of the hypervisor, either hosting Hyper-V or itself running in a VM.

Microsoft has also improved automation:

All management is performed remotely via WMI and PowerShell. We are also adding Windows Server Roles and Features using Features on Demand and DISM. We are improving remote manageability via PowerShell with Desired State Configuration as well as remote file transfer, remote script authoring and remote debugging.

Returning for a moment to ChefConf, the DevOps concept is that you define the configuration of your application infrastructure in code, as well as that for the application itself. Deployment can then be automated. Or you could use the container concept to build your application as a deployable package that has no dependencies other than a suitable host – this is where Microsoft’s other announcement from yesterday comes in, Hyper-V Containers which provide a high level of isolation without quite being a full VM. Or the already-announced Windows Server Containers which are similar but a bit less isolated.

image

This is the right direction for Windows Server though the detail to be revealed at the Build and Ignite conferences in a few weeks time will no doubt show limitations.

A bigger issue though is whether the Windows Server ecosystem is ready to adapt. I spoke to an attendee at ChefConf who told me his Windows servers were more troublesome than Linux,. Do you use Server Core I asked? No he said, we like to be able to log on to the GUI. It is hard to change the culture so that running a GUI on the server is no longer the norm. The same applies to third-party applications: what will be the requirements if you want to install on Nano Server (no MSI)? Even if Microsoft has this right, it will take a while for its users to catch up.

Reserved IPs and other Microsoft Azure annoyances

I have been doing a little work with Microsoft’s Azure platform recently. A common requirement is that you want a VM which is internet-accessible with a custom domain, for which the best solution is to create a A record in your DNS pointing to the IP number of the VM. In order to do this reliably, you need to reserve an IP number for the VM; otherwise Azure may assign a different IP number if you shut it down and later restart it. If you keep it running you can keep the IP number, but this also means you are have to pay for the VM continuously.

Azure now offers reserved IP numbers. Useful; but note that you can only link a VM with a reserved IP number when it is created, and to do this you have to create the VM with PowerShell.

What if you want to assign a reserved IP number to an existing VM? One suggestion is that you can capture an image from the VM, and then create a new VM from the image, complete with reserved IP. I went partially down this route but came unstuck because Azure for some reason captured the image into a different region (West Europe) than the region where the VM used to be (North Europe). When I ran the magic PowerShell script, it complained that the image was in the wrong region. I then found a post explaining how to move images between regions, which I did, but the metadata of the moved image was not quite the same and creating a new VM from the image did not work. At this point I realised that it would  be easier to recreate the VM from scratch.

Note that when reserved IP number were announced in May 2014, program manager Mahesh Thiagarajan said:

The platform doesn’t support reserving the IP address of the existing Cloud Services or Virtual machines. We expect to announce support for this in the near future.

You can debate what is meant by “near future” and whether Microsoft has already failed this expectation.

There is another wrinkle here that I am not clear about. Some Azure VMs have special pricing, such as those with SQL Server pre-installed. The special pricing is substantial, often forming the largest part of the price, since it includes licensing fees. What happens to the special pricing if you fiddle with cloning VMs, creating new VMs with existing VHDs, moving VMs between regions, or the like? If the special pricing is somehow lost, how do you restore it so SQL Server (for example) is still properly licensed? I imagine this would mean a call to support. I have not seen any documentation relating to this in posts like this about moving a virtual machine into a virtual network.

And there’s another thing. If you want your VM to be in a virtual network, you have to do that when you create it as well; it is a similar problem.

While I am in complaining mode, here is another. Creating a VM with PowerShell is easy enough, but you do need to know the image name you are using. This is not shown in the friendly portal GUI:

image

In order to get the image names, I ran a PowerShell script that exports the available images to a file. I was surprised how many there are: the resulting output has around 13,500 lines and finding what you want is tedious.

Azure is mostly very good in my experience, but I would like to see these annoyances fixed. I would be interested to hear of other things that make the cloud admin or developer’s life harder than it should be.

SSD storage has come to Azure VMs, along with faster Azure SQL

Microsoft has introduced SSD storage for Azure VMs. This is a catch-up with Amazon which has been offering this at least since June 2014. It is an important feature though, and now in preview. The SSDs are part of the Azure storage service but can only be used for disks attached to VMs, not for general-purpose block files. There are three virtual disks available:

  P10 P20 P30
Disk size 128GB 512GB 1TB
IOPS 500 2300 5000
Throughput 100 MB/s 150 MB/s 200 MB/s

Price is $6.90 per 100GB per month, which if I am reading this right is less than Amazon’s $0.10 per GB per month ($10 per 100GB) as shown here.

One obvious use case is for SQL Server running on a VM. This generally performs better than Microsoft’s Azure SQL database service. That said, Microsoft is also previewing an improved Azure SQL which supports most of the features of SQL Server 2014, including .NET stored procedures and in-memory columnstore queries. Microsoft’s Scott Guthrie says performance is better:

Our internal benchmark tests (using over 600 million rows of data) show query performance improvements of around 5x with today’s preview relative to our existing Premium Tier SQL Database offering and up to 100x performance improvements when using the new In-memory columnstore technology.

If you can make it work, Azure SQL is better sense than running SQL Server in a VM with all the hassles of server patching and of course Microsoft’s licensing fees; but the performance has to be there. Another factor which drives users to the VM option is that SQL Reporting Service is not available in Azure SQL.

Quick reflections on Amazon re:Invent, open source, and Amazon Web Services

Last week I was in Las Vegas for my first visit to Amazon’s annual developer conference re:Invent. There were several announcements, the biggest being a new relational database service called RDS Aurora – a drop-in replacement for MySQL but with 3x write performance and 5x read performance as well as resiliency benefits – and EC2 Container Service, for deploying and managing Docker app containers. There is also AWS Lambda, a service which runs code in response to events.

You could read this news anywhere, but the advantage of being in Vegas was to immerse myself in the AWS culture and get to know the company better. Amazon is both distinctive and disruptive, and threes things that its retail operation and its web services have in common are large scale, commodity pricing, and customer focus.

Customer focus? Every company I have ever spoken to says it is customer focused, so what is different? Well, part of the press training at Amazon seems to be that when you ask about its future plans, the invariable answer is “what customers demand.” No doubt if you could eavesdrop at an Amazon executive meeting you would find that this is not entirely true, that there are matters of strategy and profitability which come into play, but this is the story the company wants us to hear. It also chimes with that of the retail operation, where customer service is generally excellent; the company would rather risk giving a refund or replacement to an undeserving customer and annoy its suppliers than vice versa. In the context of AWS this means something a bit different, but it does seem to me part of the company culture. “If enough customers keep asking for something, it’s very likely that we will respond to that,” marketing executive Paul Duffy told me.

That said, I would not describe Amazon as an especially open company, which is one reason I was glad to attend re:Invent. I was intrigued for example that Aurora is a drop-in replacement for an open source product, and wondered if it actually uses any of the MySQL code, though it seems unlikely since MySQL’s GPL license would require Amazon to publish its own code if it used any MySQL code; that said, the InnoDB storage engine code at least used to be available under a dual license so it is possible. When I asked Duffy though he said:

We don’t … at that level, that’s why we say it is compatible with MySQL. If you run the MySQL compatibility tool that will all check out. We don’t disclose anything about the inner workings of the service.

This of course touches on the issue of whether Amazon takes more from the open source community than it gives back.

image
Senior VP of AWS Andy Jassy

Someone asked Senior VP of AWS Andy Jassy, “what is your strategy of contributing to the open source ecosystem”, to which he replied:

We contribute to the open source ecosystem for many years. Zen, MySQL space, Linux space, we’re very active contributors, and will continue to do so in future.

That was it, that was the whole answer. Aurora, despite Duffy’s reticence, seems to be a completely new implementation of the MySQL API and builds on its success and popularity; could Amazon do more to share some of its breakthroughs with the open source community from which MySQL came? I think that is arguable; but Amazon is hard to hate since it tends to price so competitively.

Is Amazon worried about competition from Microsoft, Google, IBM or other cloud providers? I heard this question asked on several occasions, and the answer was generally along the lines that AWS is too busy to think about it. Again this is perhaps not the whole story, but it is true that AWS is growing fast and dominates the market to the extent that, say, Azure’s growth does not keep it awake at night. That said, you cannot accuse Amazon of complacency since it is adding new services and features at a high rate; 449 so far in 2014 according to VP and Distinguished Engineer James Hamilton, who also mentioned 99% usage growth in EC2 year on year, over 1,000,000 active customers, and 132% data transfer growth in the S3 storage service.

Cloud thinking

Hamilton’s session on AWS Innovation at Scale was among the most compelling of those I attended. His theme was that cloud computing is not just a bunch of hosted servers and services, but a new model of computing that enables new and better ways to run applications that are fast, resilient and scalable. Aurora is actually an example of this. Amazon has separated the storage engine from the relational engine, he explained, so that only deltas (the bits that have changed) are passed down for storage. The data is replicated 6 times across three Amazon availability zones, making it exceptionally resilient. You could not implement Aurora on-premises; only a cloud provider with huge scale can do it, according to Hamilton.

image
Distinguished Engineer James Hamilton

Hamilton was fascinating on the subject of networking gear – the cards, switches and routers that push bits across the network. Five years ago Amazon decided to build its own, partly because it considered the commercial products to be too expensive. Amazon developed its own custom network protocol stack. It worked out a lot cheaper, he said, since “even the support contract for networking gear was running into 10s of millions of dollars.” The company also found that reliability increased. Why was that? Hamilton quipped about how enterprise networking products evolve:

Enterprise customers give lots of complicated requirements to networking equipment producers who aggregate all these complicated requirements into 10s of billions of lines of code that can’t be maintained and that’s what gets delivered.

Amazon knew its own requirements and built for those alone. “Our gear is more reliable because we took on an easier problem,” he said.

AWS is also in a great position to analyse performance. It runs so much kit that it can see patterns of failure and where the bottlenecks lie. “We love metrics,” he said. There is an analogy with the way the popularity of Google search improves Google search; it is a virtuous circle that is hard for competitors can replicate.

Closing reflections

Like all vendor-specific conferences there was more marketing that I would have liked at re:Invent, but there is no doubting the excellence of the platform and its power to disrupt. There are aspects of public cloud that remain unsettling; things can go wrong and there will be nothing you can do but wait for them to be fixed. The benefits though are so great that it is worth the risk – though I would always advocate having some sort of plan B and off-cloud (or backup with another cloud provider) if that is feasible.

Microsoft’s Azure outage: a troubling account of what went wrong

Microsoft’s Jason Zander has published an account of what went wrong yesterday, causing failure of many Azure services for a number of hours. The incident is described as running from 0.51 AM to 11.45 AM on November 19th though the actual length of the outage varied; an Azure application which I developed was offline for 3.5 hours.

Customers are not happy. From the comments:

So much for traffic manager for our VM’s running SQL server in a high availability SQL cluster $6k per month if every data center goes down. We were off for 3 hrs during the worst time of day for us; invoicing and loading for 10,000 deliveries. CEO is wanting to pull us out of the cloud.

So what went wrong? It was a bug in an update to the Storage Service, which impacts other services such as VMs and web sites since they have a dependency on the Storage Service. The update was already in production but only for Azure Tables; this seems to have given the team the confidence to deploy the update generally but a bug in the Blob service caused it to loop and stop responding.

Here is the most troubling line in Zander’s report:

Unfortunately the issue was wide spread, since the update was made across most regions in a short period of time due to operational error, instead of following the standard protocol of applying production changes in incremental batches.

In other words, this was not just a programming error, it was an operational error that meant the usual safeguards whereby a service in one datacenter takes over when another fails did not work.

Then there is the issue of communication. This is critical since while customers understand that sometimes things go wrong, they feel happier if they know what is going on. It is partly human nature, and partly a matter of knowing what mitigating action you need to take.

In this case Azure’s Service Health Dashboard failed:

There was an Azure infrastructure issue that impacted our ability to provide timely updates via the Service Health Dashboard. As a mitigation, we leveraged Twitter and other social media forums.

This is an issue I see often; online status dashboards are great for telling you all is well, but when something goes wrong they are the first thing to fall over, or else fail to report the problem. In consequence users all pick up the phone simultaneously and cannot get through. Twitter is no substitute; frankly if my business were paying thousands every month to Microsoft for Azure services I would find it laughable to be referred to Twitter in the event of a major service interruption.

Zander also says that customers were unable to create support cases. Hmm, it does seem to me that Microsoft should isolate its support services from its production services in some way so that both do not fail at once.

Still, of the above it is the operational error that is of most concern.

What are the wider implications? There are two takes on this. One is to say that since Azure is not reliable try another public cloud, probably Amazon Web Services. My sense is that the number and severity of AWS outages has reduced over the years. Inherently though, it is always possible that human error or a hardware failure can have a cascading effect; there is no guarantee that AWS will not have its own equally severe outage in future.

The other take is to give up on the cloud, or at least build in a plan B in the event of failure. Hybrid cloud has its merits in this respect.

My view in general though is that cloud reliability will get better and that its benefits exceed the risk – though when I heard last week, at Amazon Re:Invent, of large companies moving their entire datacenter infrastructure to AWS I did think to myself, “that’s brave”.

Finally, for the most critical services it does make sense to spread them across multiple public clouds (if you cannot fallback to on-premises). It should not be necessary, but it is.

An Azure Web Site is a VM which supports multiple applications

This will be unnecessary for Azure experts, but I have seen some misunderstanding on this point, hence this post.

A “web site” is a unit of service on the Azure cloud platform which represents a web application hosted on IIS, Microsoft’s web server (but see below). You write a standard ASP.NET application and deploy it. Azure takes care of configuring the host VM, the server operating system, and IIS.

Using a web site is preferable to creating your own VM and installing IIS on it, for several reasons. One is that you do not have to worry about patching and maintaining the operating system. Another is that web sites can be scaled, manually or automatically, with an option for scheduling so that you can scale down the site for periods of low demand.

image

The main reason for using a VM rather than a web site is if the app has dependencies that fall outside what a web site can handle.

Another thing to know about Azure web sites is that they have four “plan modes,” but only two are worth considering for production. The Free and Shared modes host your application on a shared VM, and quotas are applied. If Azure decides your site is out of quota, it will stop responding. Fine for a prototype, but not something you want customers or users to see. This feature is not shown clearly on the table of features but it is in note 2:

Shared Instance: Free and Shared (Preview) tiers include 60 minutes and 240 minutes of CPU capacity per day, respectively. The Shared (Preview) Website rates are applied per website instance.

The Basic tier on the other hand is decent. It is a dedicated VM, and you can scale it (manually) to 3 instances. It costs around 25% less than a Standard tier site.

Why go Standard? You get 50B storage thrown in (a Basic tier site has 10GB), auto-backup, auto-scale up to 10 instances, and a fixed IP address for SSL. If you have to buy a fixed IP address for a single instance Basic tier site, the price goes above a Standard tier site, except for a Large instance.

Currently a Basic tier web site costs from £35.64 to £141.92 per month, and a Standard tier from £47.10 to £189.65, depending on the size of the VM.

It is a significant cost, but what may not be obvious is that you can deploy multiple applications to a single web site, which makes my statement above, “A ‘web site’ is a unit of service on the Azure cloud platform which represents a web application hosted on IIS”, not quite correct.

When you create a new web site, if you have one already, you can choose a “web hosting plan”. Here is an example:

image

In this case, there are two pre-existing web site VMs, one in East Asia and one in Europe. If you choose one of these two, the new web site will be added to that VM. If you choose “Create new web hosting plan”, you will create a new dedicated instance (or free, or Shared). Adding to an existing VM means no extra cost.

If you are a developer, it may well be better to run a single Basic VM for prototyping, and add multiple sites, rather than risking a free or shared instance which might be out of quota when you demonstrate it to your customer.

What is the limit to the number of web sites you can add? There is none, other than the overloading the VM and getting unresponsive applications.

Postscript: the Web Site service is interesting as an example which blurs the boundaries between IaaS (Infrastructure as a service) and PaaS (Platform as a service). It is more PaaS than IaaS, in that you do not have to worry about maintaining the OS, but more IaaS than PaaS, in that you are still having to think about individual VMs. It would be more purist if Microsoft abstracted away the VMs and simply guaranteed a certain level of service, or scaled up automatically and billed for what you use. On the other hand, the Web Site concept puts a lot of control in the hands of the developer/admin and help them to make best use of the resources, while still removing most of the maintenance burden. I think it is a good compromise.