Tag Archives: cloud computing

AWS Summit London: cloud growth, understanding Lambda, Machine Learning

I attended the Amazon Web Services (AWS) London Summit. Not much news there, since the big announcements were the week before in San Francisco, but a chance to drill into some of the AWS services and keep up to date with the platform.

image

The keynote by CTO Werner Vogels was a bit too much relentless promotion for my taste, but I am interested in the idea he put forward that cloud computing will gradually take over from on-premises and that more and more organisations will go “all in” on Amazon’s cloud. He cited some examples (Netflix, Intuit, Tibco, Splunk) though I am not quite clear whether these companies have 100% of their internal IT systems on AWS, or merely that they run the entirety of their services (their product) on AWS. The general argument is compelling, especially when you consider the number of services now on offer from AWS and the difficulty of replicating them on-premises (I wrote this up briefly on the Reg). I don’t swallow it wholesale though; you have to look at the costs carefully, and for me the loss of control when you base your IT infrastructure on a public cloud provider is an even bigger negative than security.

As it happens, the ticket systems for my train into London were down that morning, which meant that purchasers of advance tickets online could not collect their tickets.

image

The consequences of this outage were not too serious, in that the trains still ran, but of course there were plenty of people travelling without tickets (I was one of them) and ticket checking was much reduced. I am not suggesting that this service runs on AWS (I have no idea) but it did get me thinking about the impact on business when applications fail; and that led me to the question: what are the long-term implications of our IT systems and even our economy becoming increasingly dependent on a (very) small number of companies for their health? It seems to me that the risks are difficult to assess, no matter how much respect we have for the AWS engineers.

I enjoyed the technical sessions more than the keynote. I attended Dean Bryen’s session on AWS Lambda, “Event-driven code in the cloud”, where I discovered that the scope of Lambda is greater than I had previously realised. Lambda lets you write code that runs in response to events, but what is also interesting is that it is a platform as a service offering, where you simply supply the code and AWS runs it for you:

AWS Lambda runs your custom code on a high-availability compute infrastructure and administers all of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling, code, and security patches.

This is a different model than running applications in EC2 (Elastic Compute Cloud) VMs or even in Docker containers, which are also VM based. Of course we know that Lambda ultimately runs in VMs as well, but these details are abstracted away and scaling is automatic, which arguably is a better model for cloud computing. Azure Cloud Services or Heroku apps are somewhat like this, but neither is very pure; with Azure Cloud Services you still have to worry about how many VMs you are using, and with Heroku you have to think about dynos (app containers). Google App Engine is another example and autoscales, though you are charged by application instance count so you still have to think in those terms. With Lambda you are charged based on the number of requests, the duration of your code, and the amount of memory allocated, making it perhaps the best abstracted of all these PaaS examples.

But Lambda is just for event-handling, right? Not quite; it now supports synchronous as well as asynchronous event handling and you could create large applications on the service if you chose. It is well suited to services for mobile applications, for example. Java support is on the way, as an alternative to the existing Node.js support. I will be interested to see how this evolves.
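As an aside, you do not need to be writing Node.js to call a Lambda function: any AWS SDK client can invoke one. Here is a rough sketch in C# using the AWS SDK for .NET; the function name and payload are invented for the example, and the exact types may vary between SDK versions.

using System.IO;
using System.Threading.Tasks;
using Amazon;
using Amazon.Lambda;
using Amazon.Lambda.Model;

public static class LambdaCaller
{
    // Invoke a (hypothetical) "ProcessOrder" Lambda function synchronously and return its JSON result.
    // Credentials come from the app config, environment or instance profile.
    public static async Task<string> ProcessOrderAsync()
    {
        using (var client = new AmazonLambdaClient(RegionEndpoint.EUWest1))
        {
            var request = new InvokeRequest
            {
                FunctionName = "ProcessOrder",                   // hypothetical function name
                InvocationType = InvocationType.RequestResponse, // synchronous; use InvocationType.Event for fire-and-forget
                Payload = "{ \"orderId\": 123 }"                 // JSON passed to the function as its event
            };

            var response = await client.InvokeAsync(request);
            using (var reader = new StreamReader(response.Payload))
            {
                return reader.ReadToEnd();                       // the function's return value, as JSON
            }
        }
    }
}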

I also went along to Carlos Conde’s session on Amazon Machine Learning (one instance in which AWS has trailed Microsoft Azure, which already has a machine learning service). Machine learning is not that easy to explain in simple terms, but I thought Conde did a great job. He showed us a spreadsheet which was a simple database of contacts with fields for age, income, location, job and so on. There was also a Boolean field for whether they had purchased a certain financial product after it had been offered to them. The idea was to feed this spreadsheet to the machine learning service, and then to upload a similar table but of different contacts and without the last field. The job of the service was to predict whether or not each contact listed would purchase the product. The service returned results with this field populated along with a confidence indicator. A simple example with obvious practical benefit, presuming of course that the prediction has reasonable accuracy.
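The same kind of trained model can also be queried programmatically. Purely as a heavily hedged sketch of the shape of it (the model ID, endpoint URL and field names below are invented, and the .NET SDK types may differ slightly), a real-time prediction for a single contact looks something like this:

using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon;
using Amazon.MachineLearning;
using Amazon.MachineLearning.Model;

public static class PurchasePredictor
{
    // Ask a (hypothetical) trained model whether one unlabelled contact is likely to buy
    public static async Task<string> PredictAsync()
    {
        using (var client = new AmazonMachineLearningClient(RegionEndpoint.USEast1))
        {
            var request = new PredictRequest
            {
                MLModelId = "ml-EXAMPLEMODELID",   // hypothetical model trained on the labelled contacts
                PredictEndpoint = "https://realtime.machinelearning.us-east-1.amazonaws.com",
                Record = new Dictionary<string, string>
                {
                    { "age", "42" },
                    { "income", "52000" },
                    { "location", "London" },
                    { "job", "engineer" }
                }
            };

            var response = await client.PredictAsync(request);
            // PredictedLabel is the yes/no answer; PredictedScores carries the confidence
            return response.Prediction.PredictedLabel;
        }
    }
}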

Reserved IPs and other Microsoft Azure annoyances

I have been doing a little work with Microsoft’s Azure platform recently. A common requirement is that you want a VM which is internet-accessible with a custom domain, for which the best solution is to create an A record in your DNS pointing to the IP number of the VM. In order to do this reliably, you need to reserve an IP number for the VM; otherwise Azure may assign a different IP number if you shut it down and later restart it. If you keep it running you can keep the IP number, but this also means you have to pay for the VM continuously.

Azure now offers reserved IP numbers. Useful; but note that you can only link a VM with a reserved IP number when it is created, and to do this you have to create the VM with PowerShell.

What if you want to assign a reserved IP number to an existing VM? One suggestion is that you can capture an image from the VM, and then create a new VM from the image, complete with reserved IP. I went partially down this route but came unstuck because Azure for some reason captured the image into a different region (West Europe) than the region where the VM used to be (North Europe). When I ran the magic PowerShell script, it complained that the image was in the wrong region. I then found a post explaining how to move images between regions, which I did, but the metadata of the moved image was not quite the same and creating a new VM from the image did not work. At this point I realised that it would be easier to recreate the VM from scratch.

Note that when reserved IP numbers were announced in May 2014, program manager Mahesh Thiagarajan said:

The platform doesn’t support reserving the IP address of the existing Cloud Services or Virtual machines. We expect to announce support for this in the near future.

You can debate what is meant by “near future” and whether Microsoft has already failed this expectation.

There is another wrinkle here that I am not clear about. Some Azure VMs have special pricing, such as those with SQL Server pre-installed. The special pricing is substantial, often forming the largest part of the price, since it includes licensing fees. What happens to the special pricing if you fiddle with cloning VMs, creating new VMs with existing VHDs, moving VMs between regions, or the like? If the special pricing is somehow lost, how do you restore it so SQL Server (for example) is still properly licensed? I imagine this would mean a call to support. I have not seen any documentation relating to this in posts like this about moving a virtual machine into a virtual network.

And there’s another thing. If you want your VM to be in a virtual network, you have to do that when you create it as well; it is a similar problem.

While I am in complaining mode, here is another. Creating a VM with PowerShell is easy enough, but you do need to know the image name you are using. This is not shown in the friendly portal GUI:

image

In order to get the image names, I ran a PowerShell script that exports the available images to a file. I was surprised how many there are: the resulting output has around 13,500 lines and finding what you want is tedious.

Azure is mostly very good in my experience, but I would like to see these annoyances fixed. I would be interested to hear of other things that make the cloud admin or developer’s life harder than it should be.

Quick reflections on Amazon re:Invent, open source, and Amazon Web Services

Last week I was in Las Vegas for my first visit to Amazon’s annual developer conference re:Invent. There were several announcements, the biggest being a new relational database service called RDS Aurora – a drop-in replacement for MySQL but with 3x write performance and 5x read performance as well as resiliency benefits – and EC2 Container Service, for deploying and managing Docker app containers. There is also AWS Lambda, a service which runs code in response to events.

You could read this news anywhere, but the advantage of being in Vegas was to immerse myself in the AWS culture and get to know the company better. Amazon is both distinctive and disruptive, and three things that its retail operation and its web services have in common are large scale, commodity pricing, and customer focus.

Customer focus? Every company I have ever spoken to says it is customer focused, so what is different? Well, part of the press training at Amazon seems to be that when you ask about its future plans, the invariable answer is “what customers demand.” No doubt if you could eavesdrop at an Amazon executive meeting you would find that this is not entirely true, that there are matters of strategy and profitability which come into play, but this is the story the company wants us to hear. It also chimes with that of the retail operation, where customer service is generally excellent; the company would rather risk giving a refund or replacement to an undeserving customer and annoy its suppliers than vice versa. In the context of AWS this means something a bit different, but it does seem to me part of the company culture. “If enough customers keep asking for something, it’s very likely that we will respond to that,” marketing executive Paul Duffy told me.

That said, I would not describe Amazon as an especially open company, which is one reason I was glad to attend re:Invent. I was intrigued for example that Aurora is a drop-in replacement for an open source product, and wondered if it actually uses any of the MySQL code, though it seems unlikely since MySQL’s GPL license would require Amazon to publish its own code if it used any MySQL code; that said, the InnoDB storage engine code at least used to be available under a dual license so it is possible. When I asked Duffy though he said:

We don’t … at that level, that’s why we say it is compatible with MySQL. If you run the MySQL compatibility tool that will all check out. We don’t disclose anything about the inner workings of the service.

This of course touches on the issue of whether Amazon takes more from the open source community than it gives back.

image
Senior VP of AWS Andy Jassy

Someone asked Senior VP of AWS Andy Jassy, “what is your strategy of contributing to the open source ecosystem”, to which he replied:

We contribute to the open source ecosystem for many years. Xen, MySQL space, Linux space, we’re very active contributors, and will continue to do so in future.

That was it, that was the whole answer. Aurora, despite Duffy’s reticence, seems to be a completely new implementation of the MySQL API and builds on its success and popularity; could Amazon do more to share some of its breakthroughs with the open source community from which MySQL came? I think that is arguable; but Amazon is hard to hate since it tends to price so competitively.

Is Amazon worried about competition from Microsoft, Google, IBM or other cloud providers? I heard this question asked on several occasions, and the answer was generally along the lines that AWS is too busy to think about it. Again this is perhaps not the whole story, but it is true that AWS is growing fast and dominates the market to the extent that, say, Azure’s growth does not keep it awake at night. That said, you cannot accuse Amazon of complacency since it is adding new services and features at a high rate; 449 so far in 2014 according to VP and Distinguished Engineer James Hamilton, who also mentioned 99% usage growth in EC2 year on year, over 1,000,000 active customers, and 132% data transfer growth in the S3 storage service.

Cloud thinking

Hamilton’s session on AWS Innovation at Scale was among the most compelling of those I attended. His theme was that cloud computing is not just a bunch of hosted servers and services, but a new model of computing that enables new and better ways to run applications that are fast, resilient and scalable. Aurora is actually an example of this. Amazon has separated the storage engine from the relational engine, he explained, so that only deltas (the bits that have changed) are passed down for storage. The data is replicated 6 times across three Amazon availability zones, making it exceptionally resilient. You could not implement Aurora on-premises; only a cloud provider with huge scale can do it, according to Hamilton.

image
Distinguished Engineer James Hamilton

Hamilton was fascinating on the subject of networking gear – the cards, switches and routers that push bits across the network. Five years ago Amazon decided to build its own, partly because it considered the commercial products to be too expensive. Amazon developed its own custom network protocol stack. It worked out a lot cheaper, he said, since “even the support contract for networking gear was running into 10s of millions of dollars.” The company also found that reliability increased. Why was that? Hamilton quipped about how enterprise networking products evolve:

Enterprise customers give lots of complicated requirements to networking equipment producers who aggregate all these complicated requirements into 10s of billions of lines of code that can’t be maintained and that’s what gets delivered.

Amazon knew its own requirements and built for those alone. “Our gear is more reliable because we took on an easier problem,” he said.

AWS is also in a great position to analyse performance. It runs so much kit that it can see patterns of failure and where the bottlenecks lie. “We love metrics,” he said. There is an analogy with the way the popularity of Google search improves Google search; it is a virtuous circle that is hard for competitors to replicate.

Closing reflections

Like all vendor-specific conferences there was more marketing than I would have liked at re:Invent, but there is no doubting the excellence of the platform and its power to disrupt. There are aspects of public cloud that remain unsettling; things can go wrong and there will be nothing you can do but wait for them to be fixed. The benefits though are so great that it is worth the risk – though I would always advocate having some sort of plan B, whether off-cloud or backed up with another cloud provider, if that is feasible.

How is Microsoft Azure doing? Some stats from Satya Nadella and Scott Guthrie

Microsoft financials are hard to parse these days, with figures broken down into broad categories that reveal little about what is succeeding and what is not.

image
CEO Satya Nadella speaks in San Francisco

At a cloud platform event yesterday in San Francisco, CEO Satya Nadella and VP of cloud and enterprise Scott Guthrie offered some figures. Here is what I gleaned:

  • Projected revenue of $4.4Bn if current trends continue (“run rate”)
  • Annual investment of $4.5Bn
  • Over 10,000 new customers per week
  • 1,200,000 SQL databases
  • Over 30 trillion storage objects
  • 350 million users in Azure Active Directory
  • 19 Azure datacentre regions, up to 600,000 servers in each region

image

Now, one observation from the above is that Microsoft says it is spending more on Azure than it is earning – not unreasonable at a time of fast growth.

However, I do not know how complete the figures are. Nadella said Office 365 runs on Azure (though this may be only partially true; it certainly was only partially true in the past); but I doubt that all Office 365 revenue is included in the above.

What about SQL Server licensing, for example: does Microsoft count it under SQL Server, or Azure, or both, depending on which marketing event it is?

If you know the answer to this, I would love to hear.

At the event, Guthrie (I think) made a bold statement. He said that there would only be three vendors in hyper-scale cloud computing, being Microsoft, Amazon and Google.

IBM for one would disagree; but there are huge barriers to entry even for industry giants.

I consider Microsoft’s progress extraordinary. Guthrie said that it was just two years ago that he announced the remaking of Azure – this is when things like Azure stateful VMs and the new portal arrived. Prior to that date, Azure stuttered.

Now, here is journalist and open source advocate Matt Asay:

Microsoft used to be evil. Then it was irrelevant. Now it looks like a winner.

He quotes Bill Bennett

Microsoft has created a cloud computing service that makes creating a server as simple as setting up a Word document

New features are coming apace to Azure, and Guthrie showed this slide of what has been added in the last 12 months:

image

The synergy of Azure with Visual Studio, Windows Server and IIS is such that it is a natural choice for Microsoft-platform developers hosting web applications, and Azure VMs are useful for experimentation.

Does anything spoil this picture? Well, when I sat down to write what I thought would be a simple application, I ran into familiar problems: half-baked samples, ever-changing APIs and libraries, beta code evangelised by Microsoft folk with little indication of what to do if you would rather not use it in production, and so on.

There is also a risk that as Azure services multiply, working out what to use and when becomes harder, and complexity increases.

Azure also largely means Windows – and yes, I heard yesterday that 20% of Azure VMs run Linux – but if you have standardised on Linux servers and use a Mac or Linux for development, Azure looks to me less attractive than AWS which has more synergy with that approach.

Still, it is a bright spot in Microsoft’s product line and right now I expect its growth to continue.

Adobe opens up Creative Cloud to app developers

At the Adobe Max conference in Los Angeles, Adobe has announced enhancements and additions to its Creative Cloud service, which includes core applications such as Photoshop, Illustrator, InDesign and Dreamweaver, mobile apps for Apple’s iPad, and the online portfolio site Behance. Creative Cloud is also the mechanism by which Adobe has switched its customers from perpetual software licences to subscription, even for desktop applications.

One of today’s announcements is a public preview version of the Creative SDK for iOS, with an Android version also available on request. Nothing for Windows Phone, though Adobe does seem interested in supporting high-end Windows tablets such as Surface Pro 3, thanks to their high quality screens and pen input support.

image

The Creative SDK lets developers integrate apps with Adobe’s cloud, including access to cloud storage, import and export of PSD (Photoshop) layers, and image processing using cloud services. It also gives developers the ability to support Adobe hardware such as Ink and Slide, which offers accurate drawing even on iOS tablets designed exclusively for touch control.

Adobe’s brand guidelines forbid the use of Adobe product names like Photoshop or Illustrator in your app name, but do allow phrases such as “Photoshop enabled” and “Creative Cloud connected.”

Other Adobe announcements today include:

Mobile app changes

Adobe’s range of mobile apps has been revised:

  • Adobe Sketch is now Photoshop Sketch and lets you send drawings to Photoshop.
  • Adobe Line is now Illustrator Line and lets you send sketches to Illustrator.
  • Adobe Ideas is now Illustrator Draw, again with Illustrator integration.
  • Adobe Kuler is now Adobe Colour CC and lets you capture colours and save them as themes for use elsewhere.
  • Adobe Brush CC and Adobe Shape CC are new apps for creating new brushes and shapes respectively. For example, you could convert a photo into vector art that you can use for drawing in Illustrator.
  • Adobe Premiere Clip is a simple video editor for iOS that allows export to Premiere Pro CC.
  • Lightroom Mobile has been updated to enable comments on photos shared online, and synchronisation with Lightroom desktop.

There are now a confusingly large number of ways you can draw or paint on the iPad using an Adobe app, but the common theme is better integration with the desktop Creative Cloud applications.

Desktop app enhancements

On the desktop app side, Adobe announcements include Windows 8 touch support in Illustrator, Photoshop, Premiere Pro and After Effects; 3D print features in Photoshop CC; a new curvature tool in Illustrator; and HiDPI (high resolution display support) in After Effects.

New cloud services

New Adobe cloud services include Creative Cloud Libraries, a design asset management service that connects with both mobile and desktop Adobe apps, and Creative Cloud Extract, which converts Photoshop PSD images into assets that web designers and developers can use, such as colours, fonts and CSS files.

Adobe’s Creative Cloud is gradually growing its capabilities, even though Adobe’s core products remain desktop applications, and its move to subscription licensing has been executed smoothly and effectively despite annoying some users. The new SDK is mainly an effort to hook more third-party apps into the Adobe design workflow, though the existence of hosted services for image processing is an intriguing development.

It is a shame though that the new SDK is so platform-specific, causing delays to the Android version and lack of support for other platforms such as Windows Phone.

Adobe actually has its own cross-platform mobile toolkit, called PhoneGap, though I imagine Adobe’s developers feel that native code rather than JavaScript is the best fit for design-oriented apps.

Microsoft Azure: new preview portal is “designed like an operating system” but is it better?

How important is the Azure portal, the web-based user interface for managing Microsoft’s cloud computing platform? You can argue that it is not all that important. Developers and users care more about the performance and reliability of the services themselves. You can also control Azure services through PowerShell scripts.

My view is the opposite though. The portal is the entry point for Azure and a good experience makes developers more likely to continue. It is also a dashboard, with an overview of everything you have running (or not running) on Azure, the health of your services, and how much they are costing you. I also think of the portal as an index of resources. Can you do this on Azure? Browsing through the portal gives you a quick answer.

The original Azure portal was pretty bad. I wish I had more screenshots; this 2009 post comparing getting started on Google App Engine with Azure may bring back some memories. In 2011 there were some big management changes at Microsoft, and Scott Guthrie moved over to Azure along with various other executives. Usability and capability improved fast, and one of the notable changes was the appearance of a new portal. Written in HTML 5, it was excellent, showing all the service categories in a left-hand column. Select a category, and all your services in that category are listed. Select a service and you get a detailed dashboard. This portal has evolved somewhat since it was introduced, notably through the addition of many more services, but the design is essentially the same.

image

The New button lets you create a new service:

image

The portal also shows credit status right there – no need to hunt through links to account management pages:

image

It is an excellent portal, in other words, logically laid out, easy to use, and effective.

That is the old portal though. Microsoft has introduced a new portal, first demonstrated at the Build conference in April. The new portal is at http://portal.azure.com, versus http://manage.windowsazure.com for the old one.

The new portal is different in look and feel:

image

Why a new portal and how does it work? Microsoft’s Justin Beckwith, a program manager, has a detailed explanatory post. He says that the old portal worked well at first but became difficult to manage:

As we started ramping up the number of services in Azure, it became infeasible for one team to write all of the UI. The teams which owned the service were now responsible (mostly) for writing their own UI, inside of the portal source repository. This had the benefit of allowing individual teams to control their own destiny. However – it now mean that we had hundreds of developers all writing code in the same repository. A change made to the SQL Server management experience could break the Azure Web Sites experience. A change to a CSS file by a developer working on virtual machines could break the experience in storage. Coordinating the 3 week ship schedule became really hard. The team was tracking dependencies across multiple organizations, the underlying REST APIs that powered the experiences, and the release cadence of ~40 teams across the company that were delivering cloud services.

The new portal is the outcome of some deep thinking about the future. It is architected, according to Beckwith, more like an operating system than like a web application.

The new portal is designed like an operating system. It provides a set of UI widgets, a navigation framework, data management APIs, and other various services one would expect to find with any UI framework. The portal team is responsible for building the operating system (or the shell, as we like to call it), and for the overall health of the portal.

Each service has its own extension, or “application”, which runs in an iframe (inline frame) and is isolated from other extensions. Unusually, the iframes are not used to render content, but only to run scripts. These scripts communicate with the main frame using the window.postMessage API call – familiar territory for Windows developers, since messages also drive the Windows desktop operating system.

Microsoft is also using TypeScript, a high-level language that compiles to JavaScript, and open source resources including Less and Knockout.

Beckwith’s post is good reading, but the crunch question is this: how does the new portal compare to the old one?

I get the sense that Microsoft has put a lot of effort into the new portal (which is still in preview) and that it is responsive to feedback. I expect that the new portal will in time be excellent. Currently though I have mixed feelings about it, and often prefer to use the old portal. The new portal is busier, slower and more confusing. Here is the equivalent to the previous New screen shown above:

image

The icons are prettier, but there is something suspiciously like an ad at top right; I would rather see more services, with bigger text and smaller icons; the text conveys more information.

Let’s look at scaling a website. In the old portal, you select a website, then click Scale in the top menu to get to a nice scaling screen where you can set up autoscaling, define the number of instances and so on.

How do you find this in the new portal? You get this screen when you select a website (I have blanked out the name of the site).

image

This screen scrolls vertically and if you scroll down you can find a small Scale panel. Click it and you get to the scaling panel, which has a nicely done UI though the way panels constantly appear and disappear is something you have to get used to.

There are also additional scaling options in the preview portal (the old one only offers scaling based on CPU usage):

image

The preview portal also integrates with Visual Studio Online for cloud-based devops.

The challenge for Microsoft is that the old portal set a high bar for clarity and usability. The preview portal does more than the old, and is more fit for purpose as the number and capability of Azure services increases, but its designers need to resist the temptation to let prettiness obstruct performance and efficiency.

Developers can give feedback on the portal here.

Developing an app on Microsoft Azure: a few quick reflections

I have recently completed (if applications are ever completed) an application which runs on Microsoft’s Azure platform. I used lots of Microsoft technology:

  • Visual Studio 2013
  • Visual Studio Online with Team Foundation version control
  • ASP.NET MVC 4.0
  • Entity Framework 4.0
  • Azure SQL
  • Azure Active Directory
  • Azure Web Sites
  • Azure Blob Storage
  • Microsoft .NET 4.5 with C#

The good news: the app works well and performance is good. The application handles the upload and download of large files by authorised users, and replaces a previous solution using a public file sending service. We were pleased to find that the new application is a little faster for upload and download, as well as offering better control over user access and a more professional appearance.

There were some complications though. The requirement was for internal users to log in with their Office 365 (Azure Active Directory) credentials, but for external users (the company’s customers) to log in with credentials stored in a SQL Server database – in other words, hybrid authentication. It turns out you can do this reasonably seamlessly by implementing IPrincipal in a custom class to support the database login. This is largely uncharted territory though in terms of official documentation and took some effort.
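To give an idea of the shape of it (the class and property names here are invented for illustration, not taken from the real application), the database-backed login boils down to a custom identity and principal:

using System;
using System.Security.Principal;

// Identity for an external customer authenticated against the SQL Server database
public class CustomerIdentity : IIdentity
{
    public CustomerIdentity(string name) { Name = name; }
    public string Name { get; private set; }
    public string AuthenticationType { get { return "CustomDatabase"; } }
    public bool IsAuthenticated { get { return !string.IsNullOrEmpty(Name); } }
}

// Principal wrapping that identity, with roles read from the database rather than Azure AD
public class CustomerPrincipal : IPrincipal
{
    private readonly string[] roles;

    public CustomerPrincipal(CustomerIdentity identity, string[] roles)
    {
        Identity = identity;
        this.roles = roles ?? new string[0];
    }

    public IIdentity Identity { get; private set; }

    public bool IsInRole(string role)
    {
        return Array.IndexOf(roles, role) >= 0;
    }
}

Assign an instance of this to HttpContext.User when the database login succeeds, leave the Azure AD principal alone for internal users, and the rest of the MVC pipeline, [Authorize] attributes included, treats both kinds of user the same way.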

Second, Microsoft’s Azure Active Directory support for custom applications is half-baked. You can create an application that supports Azure AD login in a few moments with Visual Studio, but it does not give you any access to metadata such as the security groups to which the user belongs. I have posted about this in more detail here. There is an API of course, but it is currently a moving target: be prepared for some hassle if you try this.

Third, while Azure Blob Storage itself seems to work well, most of the resources for developers seem to have little idea of what a large file is. Since a primary use case for cloud storage is to cover scenarios where email attachments are not good enough, it seems to me that handling large files (by which I mean multiple GB) should be considered normal rather than exceptional. By way of mitigation, the API itself has been written with large files in mind, so it all works fine once you figure it out. More on this here.

What about Visual Studio? The experience has been good overall. Once you have configured the project correctly, you can update the site on Azure simply by hitting Publish and clicking Next a few times. There is some awkwardness over configuration for local debugging versus deployment. You probably want to connect to a local SQL Server and the Azure storage emulator when debugging, and the Azure hosted versions after publishing. Visual Studio has a Web.Debug.Config and a Web.Release.Config which let you apply a transformation to your main Web.Config when publishing – though note that these do not have any effect when you simply run your project in Release mode. The correct usage is to set Web.Config to what you want for debugging, and apply the deployment configuration in Web.Release.Config; then it all works.

The piece that caused me most grief was a setting for <wsFederation>. When a user logs in with Azure AD, they get redirected to a Microsoft site to log in, and then back to the application. Applications have to be registered in Azure AD for this to work. There is some uncertainty though about whether the reply attribute, which specifies the redirection back to the app, needs to be set explicitly or not. In practice I found that it does need to be explicit, otherwise you get redirected to the deployed site even when debugging locally – not good.

I have mixed feelings about Team Foundation version control. It works, and I like having a web-based repository for my code. On the other hand, it is slow, and Visual Studio sulks from time to time and requires you to re-enter credentials (Microsoft seems to love making you do that). If you have a less than stellar internet connection (or even a good one), Visual Studio freezes from time to time since the source control stuff is not good at working in the background. It usually unfreezes eventually.

As an experiment, I set the project to require a successful build before check-in. The idea is that you cannot check in a broken build. However, this build has to take place on the server, not locally. So you try to check in, Visual Studio says a build is required, and prompts you to initiate it. You do so, and a build is queued. Some time later (5-10 minutes) the build completes and a dialog appears behind the IDE saying that you need to reconcile changes – even if there are none. Confusing.

What about Entity Framework? I have mixed feelings here too, and have posted separately on the subject. I used code-first: just create your classes and add them to your DbContext and all the data access code is handled for you, kind-of. It makes sense to use EF in an ASP.NET MVC project since the framework expects it, though it is not compulsory. I do miss the control you get from writing your own SQL though; and found myself using the SqlQuery method on occasion to recover some of that control.
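For the flavour of it, here is a minimal sketch of the pattern (entity and table names invented): a code-first DbContext for the routine work, with Database.SqlQuery as the escape hatch when you want to write the SQL yourself.

using System;
using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;

// Code-first entity; EF maps a FileTransfers table from this class
public class FileTransfer
{
    public int Id { get; set; }
    public string FileName { get; set; }
    public long SizeBytes { get; set; }
    public DateTime UploadedUtc { get; set; }
}

public class TransferContext : DbContext
{
    public DbSet<FileTransfer> FileTransfers { get; set; }
}

public static class TransferQueries
{
    // LINQ against the DbSet for routine queries
    public static List<FileTransfer> Recent(TransferContext db)
    {
        return db.FileTransfers.OrderByDescending(t => t.UploadedUtc).Take(20).ToList();
    }

    // Raw SQL when you want control over exactly what is sent to the database
    public static List<FileTransfer> LargerThan(TransferContext db, long bytes)
    {
        return db.Database.SqlQuery<FileTransfer>(
            "SELECT * FROM FileTransfers WHERE SizeBytes > @p0 ORDER BY SizeBytes DESC", bytes).ToList();
    }
}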

Finally, a few notes on ASP.NET MVC. I mostly like it; the separation between Razor views (essentially HTML templates into which you pour your data at runtime) and the code which implements your business logic and data access is excellent. The code can get convoluted though. Have a look at this useful piece on the ASP.NET MVC WebGrid and this remark:

grid.Column("Name",
  format: @<text>@Html.ActionLink((string)item.Name,
  "Details", "Product", new { id = item.ProductId }, null)</text>),

The format parameter is actually a Func, but the Razor view engine hides that from us. But you’re free to pass a Func—for example, you could use a lambda expression.

The code works fine but is it natural and intuitive? Why, for example, do you have to cast the first argument to ActionLink to a string for it to work (I can confirm that it is necessary), and would you have worked this out without help?

I also hit a problem restyling the pages generated by Visual Studio, which use the Twitter Bootstrap framework. The problem is that bootstrap.css is a generated file and it does not make sense to edit it directly. Rather, you should edit some variables and use them as input to regenerate it. I came up with a solution which I posted on Stack Overflow, but no comments yet – perhaps this post will stimulate some, as I am not sure if I found the best approach.

My sense is that while ASP.NET MVC is largely a thing of beauty, it has left behind more casual developers who want a quick and easy way to write business applications. Put another way, the framework is somewhat challenging for newcomers and that in turn affects the breadth of its adoption.

Developing on Azure and using Azure AD makes perfect sense for businesses which are using the Microsoft platform, especially if they use Office 365, and the level of integration on offer, together with the convenience of cloud hosting and anywhere access, is outstanding. There remain some issues with the maturity of the frameworks, ever-changing libraries, and poor or confusing documentation.

Since this area is strategic for Microsoft, I suggest that it would benefit the company to work hard on pulling it all together more effectively.

Notes from the field: putting Azure Blob storage into practice

I rashly agreed to create a small web application that uploads files into Azure storage. Azure Blob storage is Microsoft’s equivalent to Amazon’s S3 (Simple Storage Service), a cloud service for storing files of up to 200GB.

File upload performance can be an issue, though if you want to test how fast your application can go, try it from an Azure VM: performance is fantastic, as you would expect from an Azure to Azure connection in the same region.

I am using ASP.NET MVC and thought a sample like this official one, Uploading large files using ASP.NET Web API and Azure Blob Storage, would be all I needed. It is a start, but the method used only works for small files. What it does is:

1. Receive a file via HTTP Post.

2. Once the file has been received by the web server, call CloudBlob.UploadFile to upload the file to Azure blob storage.

What’s the problem? Leaving aside the fact that CloudBlob is deprecated (you are meant to use CloudBlockBlob), there are obvious problems with files that are more than a few MB in size. The expectation today is that users see some sort of progress bar when uploading, and a well-written application will be resistant to brief connection breaks. Many users have asymmetric internet connections (such as ADSL) with slow upload; large files will take a long time and something can easily go wrong. The sample is not resilient at all.

Another issue is that web servers do not appreciate receiving huge files in one operation. Imagine you are uploading the ISO for a DVD, perhaps a 3GB file. The simple approach of posting the file and having the web server upload it to Azure blob storage introduces obvious strain and probably will not work, even if you do mess around with maxRequestLength and maxAllowedContentLength in ASP.NET and IIS. I would not mind so much if the sample were not called “Uploading large files”; the author perhaps has a different idea of what is a large file.

Worth noting too that one developer hit a bug with blobs greater than 5.5MB when uploaded over HTTPS, which most real-world businesses will require.

What then are you meant to do? The correct approach, as far as I can tell, is to send your large files in small chunks called blocks. These are uploaded to Azure using CloudBlockBlob.PutBlock. You identify each block with an ID string, and when all the blocks are uploaded, you call CloudBlockBlob.PutBlockList with a list of IDs in the correct order.
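In outline, and simplifying considerably (the container name and block size are arbitrary, and this is not the production code), the upload side looks something like this:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class BlockUploader
{
    public static void Upload(string connectionString, string filePath)
    {
        CloudBlobContainer container = CloudStorageAccount.Parse(connectionString)
            .CreateCloudBlobClient()
            .GetContainerReference("uploads");
        container.CreateIfNotExists();
        CloudBlockBlob blob = container.GetBlockBlobReference(Path.GetFileName(filePath));

        const int blockSize = 4 * 1024 * 1024; // 4MB per block
        var blockIds = new List<string>();
        var buffer = new byte[blockSize];

        using (FileStream fs = File.OpenRead(filePath))
        {
            int index = 0, read;
            while ((read = fs.Read(buffer, 0, blockSize)) > 0)
            {
                // Block IDs must be base64 strings, all the same length for a given blob
                string blockId = Convert.ToBase64String(Encoding.UTF8.GetBytes(index.ToString("d6")));
                blob.PutBlock(blockId, new MemoryStream(buffer, 0, read), null);
                blockIds.Add(blockId);
                index++;
            }
        }

        // Nothing becomes readable until the block list is committed, in file order
        blob.PutBlockList(blockIds);
    }
}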

This is the approach taken by Suprotim Agarwal in his example of uploading big files, which works and is a great deal better than the Microsoft sample. It even has a progress bar and some retry logic. I tried this approach, with a few tweaks. Using a 35MB file, I got about 80 KB/s with my ADSL broadband, a bit worse than the performance I usually get with FTP.

Can performance be improved? I wondered what benefit you get from uploading blocks in parallel. Azure Storage does not mind what order the blocks are uploaded. I adapted Agarwal’s sample to use multiple AJAX calls each uploading a block, experimenting with up to 8 simultaneous uploads from the browser.

The initial results were disappointing. Eventually I figured out that I was not actually achieving parallel uploads at all. The reason is that the application uses ASP.NET session state, and IIS will block multiple connections in the same session unless you mark your ASP.NET MVC controller class with the SessionStateBehavior.ReadOnly attribute.
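For reference, that attribute goes on the controller class itself, something like this (controller name invented):

using System.Web.Mvc;
using System.Web.SessionState;

// Read-only session state lets IIS process several requests from the same session in parallel
[SessionState(SessionStateBehavior.ReadOnly)]
public class UploadController : Controller
{
    // upload actions go here
}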

I fixed that, and now I do get multiple parallel uploads. Performance improved to around 105 KB/s, worthwhile though not dramatic.

What about using a Windows desktop application to upload large files? I was surprised to find little improvement. But can parallel uploading help here too? The answer is that it should happen anyway, handled by the .NET client library, according to this document:

If you are writing a block blob that is no more than 64 MB in size, you can upload it in its entirety with a single write operation. Storage clients default to a 32 MB maximum single block upload, settable using the SingleBlobUploadThresholdInBytes property. When a block blob upload is larger than the value in this property, storage clients break the file into blocks. You can set the number of threads used to upload the blocks in parallel using the ParallelOperationThreadCount property.
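Going by that documentation, and hedging on exact library versions, letting the client library do the chunking amounts to little more than this:

using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class LibraryUploader
{
    public static void Upload(string connectionString, string filePath)
    {
        CloudBlobClient client = CloudStorageAccount.Parse(connectionString).CreateCloudBlobClient();

        // Anything over 1MB is split into blocks, uploaded on up to four threads
        client.DefaultRequestOptions.SingleBlobUploadThresholdInBytes = 1024 * 1024;
        client.DefaultRequestOptions.ParallelOperationThreadCount = 4;

        CloudBlockBlob blob = client.GetContainerReference("uploads")
                                    .GetBlockBlobReference(Path.GetFileName(filePath));

        using (FileStream fs = File.OpenRead(filePath))
        {
            blob.UploadFromStream(fs); // no progress event here, as noted below
        }
    }
}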

It sounds as if there is little advantage in writing your own chunking code, except that if you just call the UploadFromFile or UploadFromStream methods of CloudBlockBlob, you do not get any progress notification event (though you can get a retry notification from an OperationContext object passed to the method). Therefore I looked around for a sample using parallel uploads, and found this one from Microsoft MVP Tyler Doerksen, using C#’s Parallel.For.

Be warned: it does not work! Doerksen’s approach is to read the entire file into memory (not great, but not as bad as on a web server), send it in chunks using CloudBlockBlob.PutBlock, adding the block ID to a collection at the same time, and then call CloudBlockBlob.PutBlockList. The reason it does not work is that the order of the iterations in Parallel.For is indeterminate, so the block IDs are unlikely to be in the right order.
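One way to make the ordering deterministic, sketched here rather than quoted from anyone’s working code, is to generate the block IDs up front, indexed by position, so that the list passed to PutBlockList is in file order regardless of which thread finishes first. This version keeps the whole file in memory, as Doerksen’s sample does.

using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob;

public static class ParallelUploader
{
    public static void Upload(CloudBlockBlob blob, string filePath)
    {
        byte[] data = File.ReadAllBytes(filePath); // whole file in memory, as in the original sample
        const int blockLength = 4 * 1024 * 1024;
        int blockCount = (int)Math.Ceiling((double)data.Length / blockLength);

        // Block IDs are fixed by position before any upload starts
        var blockIds = new string[blockCount];
        for (int i = 0; i < blockCount; i++)
        {
            blockIds[i] = Convert.ToBase64String(Encoding.UTF8.GetBytes(i.ToString("d6")));
        }

        Parallel.For(0, blockCount, x =>
        {
            int offset = x * blockLength;
            int length = Math.Min(blockLength, data.Length - offset);
            blob.PutBlock(blockIds[x], new MemoryStream(data, offset, length), null);
        });

        blob.PutBlockList(blockIds); // array order is file order, whatever order the blocks arrived in
    }
}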

I fixed this, it tested OK, and then I decided to further improve it by reading each chunk from the file within the loop, rather than loading the entire file into memory. I then puzzled over why my code was broken. The files uploaded, but they were corrupt. I worked it out. In the following code, fs is a FileStream object:

fs.Position = x * blockLength;
bytesread = fs.Read(chunk, 0, currentLength);

Spot the problem? Since fs is a variable declared outside the loop, other threads were setting its position during the read operation, with random results. I fixed it like this:

lock (fs)
{
    // lock the stream so no other thread can move the position between the seek and the read
    fs.Position = x * blockLength;
    bytesread = fs.Read(chunk, 0, currentLength);
}

and the file corruption disappeared.

I am not sure why, but the manually coded parallel uploads seem to slightly but not dramatically improve performance, to around 100-105 KB/s, almost exactly what my ASP.NET MVC application achieves over my broadband connection.

image

There is another approach worth mentioning. It is possible to bypass the web server and upload directly from the browser to Azure storage. To do this, you need to allow cross-origin resource sharing (CORS) as explained here. You also need to issue a Shared Access Signature, a temporary key that allows read-write access to Azure storage. A guy called Blair Chen seems to have this all figured out, as you can see from his Azure speed test and jazure JavaScript library, which makes it easy to upload a blob from the browser.
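Issuing the Shared Access Signature on the server is the easy part; here is a sketch with the storage library (the container name and one-hour lifetime are arbitrary choices):

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class SasIssuer
{
    // Returns a short-lived, write-only URL the browser can upload to directly.
    // CORS must also be enabled on the storage account for this to work from script.
    public static string GetUploadUrl(string connectionString, string blobName)
    {
        CloudBlockBlob blob = CloudStorageAccount.Parse(connectionString)
            .CreateCloudBlobClient()
            .GetContainerReference("uploads")
            .GetBlockBlobReference(blobName);

        string sas = blob.GetSharedAccessSignature(new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Write,
            SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddHours(1)
        });

        return blob.Uri + sas; // hand this URL to the JavaScript uploader
    }
}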

I was contemplating going that route, but it seems that performance is no better (judging by the Test Upload Big Files section of Chen’s speed test), so I should probably be content with the parallel JavaScript upload solution, which avoids fiddling with CORS.

Overall, has my experience with the Blob storage API been good? I have not found any issues with the service itself so far, but the documentation and samples could be better. This page should be the jumping off point for all you need to know for a basic application like mine, but I did not find it easy to find good samples or documentation for what I thought would be a common scenario, uploading large files with ASP.NET MVC.

Update: since writing this post I have come across this post by Rob Gillen which addresses the performance issue in detail (and links to working Parallel.For code); however I suspect that since the post is four years old the conclusions are no longer valid, because of improvements to the Azure storage client library.

Microsoft Azure: growing but still has image problems

I attended a Microsoft Cloud Day in London organised by the Azure User Group; I booked this when Technical Fellow Mark Russinovich was set to attend, but regrettably he cancelled at a late stage. I skipped the substitute keynote by UK Microsoftie Dave Coplin as I heard the very same talk earlier this month, so arrived mid-morning at the venue in Whitechapel; not that easy to find amid the stalls of Whitechapel Market (well, not quite), but if you seek out the Whitechapel branch of the Foxcroft and Ginger cafe (not known to Here Maps on Windows Phone, incidentally) then you will find premises upstairs with logos for Barclays Accelerator and Microsoft Ventures; something to do with assisting the flow of cash from corporate giants desperate for community engagement to business start-ups desperate for cash.

Giving technical presentations is hard, and while I admired Richard Conway’s efforts at showing how, with some PowerShell, he could transform a large dataset into rows of numbers using the magic of Azure HDInsight, I didn’t think it quite worked. Beat Schwegler dived into code to explain the how and why of Azure Notification Hubs, a service which delivers push notifications to mobile apps; useful material, but it could have been compressed. Then there was Richard Astbury at software development company two10degrees, who talked about Project Orleans, high scale applications via “an Actor Model framework of programmable in-memory objects”; we learned about grains and silos (or their software equivalents) in a session that was mostly new to me.

At the break I chatted with a somewhat bemused attendee who had come in the hope of learning about whether he should migrate some or all of his small company’s server requirements to Azure. I explained about Office 365 and Azure Active Directory which he said was more relevant to him than the intricacies of software development. It turns out that the Azure User Group is really about software development using Azure services, which is only one perspective on Microsoft’s cloud platform.

For me the most intriguing presentation was from Michael Delaney at ElevateDirect, a young business which has a web application to assist businesses in finding employees directly rather than via recruitment agencies. His company picked Amazon Web Services (AWS) over Azure two and a half years ago, but is now moving to Microsoft’s cloud.

image
Michael Delaney, CTO and co-founder ElevateDirect

Why did he pick AWS? He is not a typical Microsoft-platform person, preferring open source products including Linux, Apache Solr, Python and MySQL. When he chose AWS, Azure was not a suitable platform for a mainly Linux-based application. However, he does prefer C# to Java. According to Delaney, AWS is a Java-first platform and he found this getting in the way of development.

Azure today, says Delaney, has the first-class support for Linux that it lacked a few years back, and is a better platform for C# applications than AWS even though AWS does support Windows servers.

Migrating the application was relatively straightforward, he said, with the biggest issue being the move from Amazon S3 (Simple Storage Service) to Azure Storage, though he overcame this by abstracting the storage API behind his own wrapper code.

Azure is not all the way there though. Delaney is disappointed with the relational database options on offer, essentially SQL Server or third-party managed MySQL from ClearDB. He would like to see options for PostgreSQL and others. He would also like the open source Elastic Search to be offered as an Azure service.

There was a panel discussion later at which the question of Azure’s market perception was discussed. Most businesses, according to one attendee, think of AWS as the only option for cloud, even if they are Microsoft-platform businesses for whom Azure might be more suitable. It is a branding problem caused by the AWS first-mover advantage and market dominance, said Microsoft’s Steve Plank.

I would add that Azure is relatively new, at least in its new incarnation offering full IaaS (infrastructure as a service). AWS is also ahead on the number and variety of services on offer, and has not really messed up, which means there is little incentive for existing users to move unless, like Delaney, they find some aspect of Microsoft’s platform (in his case C#) particularly compelling.

This leads me back to the bemused attendee. It seems to me that Azure’s biggest advantage is Azure Active Directory and seamless integration with Office 365. Having said that, it is not difficult to host an application on AWS that uses Azure Active Directory, but there may be some advantage in working with a single cloud provider (and you can expect fast low-latency networking between Azure and Office 365).

Amazon AWS and the continuing trend towards cloud services. Desktops next?

It was a lightbulb moment. The problem: how to migrate a document store from one Office 365 (hosted SharePoint) instance to another. Copy it all out and copy it back in, obviously, but that is painful over ADSL (which is all I had at my disposal) since the “asymmetric” part of ADSL means slow uploads; and download from Office 365 was not that fast either.

Solution: use an Azure virtual machine. VM hosted by Microsoft, SharePoint hosted by Microsoft, result – a fast connection between the two. I ran up the VM in a few minutes using Microsoft’s nice Azure portal, used Remote Desktop to connect, and copied the documents out and back in no time.

There is a general point here. If you are contemplating cloud-hosted VDI (Virtual Desktop Infrastructure), there is huge advantage in having the server applications and data close to the VDI instances. All you then need is a connection good enough to work on that remote desktop, which is relatively lightweight. If the cloud vendor is doing its job, the internal connections in that cloud should be fast. In addition, from the client’s perspective, most of the data is download, transferring the screen image to the client, rather than upload, transmitting mouse and keyboard interactions, so that is a good use case for ADSL.

The further implication is that the more you use cloud services, the more attractive hosted desktops become. Desktops are expensive to manage, which is why I would expect a service like Amazon Workspaces, hosted Windows desktops as a service, to find a ready market – even at $600 per year for a desktop with Office Professional 2010 preinstalled, or $420 per year if you install and license Office yourself, or use Open Office or some other alternative.

Workspaces are currently in limited preview, which means a closed beta, but there are hints that a public beta is coming soon.

Adopting this kind of setup means a massive dependency on Amazon of course, which is a concern if you worry about that kind of thing (and I think you should); but how much business is now dependent on one of the major cloud providers (I tend to think of Amazon, Microsoft and Google as the top three) already?

Thinking back to my Office 365 example, it also seems to me that Microsoft will make a serious play for cloud VDI in the not too distant future, since it makes so much sense. The problem for Microsoft is further cannibalisation of its on-premises business, and further disruption for Microsoft partners, but if the alternative is giving away business to Amazon, it has little choice.

I was at an Amazon Web Services briefing today and asked whether we might see an Office 365-like package from AWS in future. Unlikely, I was told; but many customers do use AWS for hosting the likes of Exchange and SharePoint.

The really clever thing for Amazon would be a package that looked like Office 365, but using either open source or internally developed applications that removed the need to pay license fees to Microsoft.

What else is new from AWS? I have no exclusives to share, since Amazon has a policy of never pre-announcing new features or services. There were a few statistics, one of which is that Redshift, hosted data warehousing, is Amazon’s fastest-growing product.

Amazon also talked about Kinesis, which lets you analyse streams of data in a 24-hour window. For example, if you want to analyse the output from thousands of sensors (say, weather sensors) but do not need to store the data, you can use Kinesis. If you do want to store the data, you can integrate with Redshift or DynamoDB, two of Amazon’s database services.

The company also talked up its Relational Database Service (RDS), where you purchase a managed database service which can currently be MySQL, PostgreSQL, Oracle or Microsoft SQL Server. Amazon handles all the infrastructure management so you only need worry about your data and applications.

RDS pricing ranges from $25 a month for MySQL to $514 a month for SQL Server Standard (which is actually more expensive than Oracle at $223 per month for the same instance size). Higher capacity instances cost more of course. SQL Server Web edition comes down below Oracle at $194 per month, but I was surprised to see how high the SQL Server costs are. Note that these prices include all the CALs (Client Access Licenses). The prices are actually charged per hour, e.g. $0.715 for SQL Server Standard (which works out at around $514 over a 720-hour month), so you could save money if your business can turn off or reduce the service out of working hours, for example.

How much premium does Amazon charge for its managed RDS versus what you would pay for equivalent capacity in a VM that you manage yourself? I asked this question but did not receive a meaningful reply; you need to do your own homework.

My reflection on this is that just as supermarkets make more money from pre-packaged ready meals than from basic groceries, so too the cloud providers can profit by bundling management and applications into their products rather than offering only basic infrastructure services. You still have the choice; but database admin costs money too.

Finally, we took a quick look at AppStream, which is a proprietary protocol, SDK and service for multimedia applications. You write applications such as games that render video on the server and stream it efficiently to the client, which could be a smartphone or low-power tablet. In this case again, you are taking a total dependency on Amazon to enable your application to run.

If you are interested in AWS, look out for a summit near you. There is one in London on 30th April. Or go to the re:Invent conference in Las Vegas in November.

My overall reflection is that the momentum behind AWS and its pace of innovation is impressive; yet it also seems to me that rivals like Microsoft and Google are becoming more effective. The cloud computing market is such that there is room for all to grow.