Category Archives: azure

Reserved IPs and other Microsoft Azure annoyances

I have been doing a little work with Microsoft’s Azure platform recently. A common requirement is that you want a VM which is internet-accessible with a custom domain, for which the best solution is to create a A record in your DNS pointing to the IP number of the VM. In order to do this reliably, you need to reserve an IP number for the VM; otherwise Azure may assign a different IP number if you shut it down and later restart it. If you keep it running you can keep the IP number, but this also means you are have to pay for the VM continuously.

Azure now offers reserved IP numbers. Useful; but note that you can only link a VM with a reserved IP number when it is created, and to do this you have to create the VM with PowerShell.

What if you want to assign a reserved IP number to an existing VM? One suggestion is that you can capture an image from the VM, and then create a new VM from the image, complete with reserved IP. I went partially down this route but came unstuck because Azure for some reason captured the image into a different region (West Europe) than the region where the VM used to be (North Europe). When I ran the magic PowerShell script, it complained that the image was in the wrong region. I then found a post explaining how to move images between regions, which I did, but the metadata of the moved image was not quite the same and creating a new VM from the image did not work. At this point I realised that it would  be easier to recreate the VM from scratch.

Note that when reserved IP number were announced in May 2014, program manager Mahesh Thiagarajan said:

The platform doesn’t support reserving the IP address of the existing Cloud Services or Virtual machines. We expect to announce support for this in the near future.

You can debate what is meant by “near future” and whether Microsoft has already failed this expectation.

There is another wrinkle here that I am not clear about. Some Azure VMs have special pricing, such as those with SQL Server pre-installed. The special pricing is substantial, often forming the largest part of the price, since it includes licensing fees. What happens to the special pricing if you fiddle with cloning VMs, creating new VMs with existing VHDs, moving VMs between regions, or the like? If the special pricing is somehow lost, how do you restore it so SQL Server (for example) is still properly licensed? I imagine this would mean a call to support. I have not seen any documentation relating to this in posts like this about moving a virtual machine into a virtual network.

And there’s another thing. If you want your VM to be in a virtual network, you have to do that when you create it as well; it is a similar problem.

While I am in complaining mode, here is another. Creating a VM with PowerShell is easy enough, but you do need to know the image name you are using. This is not shown in the friendly portal GUI:

image

In order to get the image names, I ran a PowerShell script that exports the available images to a file. I was surprised how many there are: the resulting output has around 13,500 lines and finding what you want is tedious.

Azure is mostly very good in my experience, but I would like to see these annoyances fixed. I would be interested to hear of other things that make the cloud admin or developer’s life harder than it should be.

SSD storage has come to Azure VMs, along with faster Azure SQL

Microsoft has introduced SSD storage for Azure VMs. This is a catch-up with Amazon which has been offering this at least since June 2014. It is an important feature though, and now in preview. The SSDs are part of the Azure storage service but can only be used for disks attached to VMs, not for general-purpose block files. There are three virtual disks available:

  P10 P20 P30
Disk size 128GB 512GB 1TB
IOPS 500 2300 5000
Throughput 100 MB/s 150 MB/s 200 MB/s

Price is $6.90 per 100GB per month, which if I am reading this right is less than Amazon’s $0.10 per GB per month ($10 per 100GB) as shown here.

One obvious use case is for SQL Server running on a VM. This generally performs better than Microsoft’s Azure SQL database service. That said, Microsoft is also previewing an improved Azure SQL which supports most of the features of SQL Server 2014, including .NET stored procedures and in-memory columnstore queries. Microsoft’s Scott Guthrie says performance is better:

Our internal benchmark tests (using over 600 million rows of data) show query performance improvements of around 5x with today’s preview relative to our existing Premium Tier SQL Database offering and up to 100x performance improvements when using the new In-memory columnstore technology.

If you can make it work, Azure SQL is better sense than running SQL Server in a VM with all the hassles of server patching and of course Microsoft’s licensing fees; but the performance has to be there. Another factor which drives users to the VM option is that SQL Reporting Service is not available in Azure SQL.

Microsoft’s Azure outage: a troubling account of what went wrong

Microsoft’s Jason Zander has published an account of what went wrong yesterday, causing failure of many Azure services for a number of hours. The incident is described as running from 0.51 AM to 11.45 AM on November 19th though the actual length of the outage varied; an Azure application which I developed was offline for 3.5 hours.

Customers are not happy. From the comments:

So much for traffic manager for our VM’s running SQL server in a high availability SQL cluster $6k per month if every data center goes down. We were off for 3 hrs during the worst time of day for us; invoicing and loading for 10,000 deliveries. CEO is wanting to pull us out of the cloud.

So what went wrong? It was a bug in an update to the Storage Service, which impacts other services such as VMs and web sites since they have a dependency on the Storage Service. The update was already in production but only for Azure Tables; this seems to have given the team the confidence to deploy the update generally but a bug in the Blob service caused it to loop and stop responding.

Here is the most troubling line in Zander’s report:

Unfortunately the issue was wide spread, since the update was made across most regions in a short period of time due to operational error, instead of following the standard protocol of applying production changes in incremental batches.

In other words, this was not just a programming error, it was an operational error that meant the usual safeguards whereby a service in one datacenter takes over when another fails did not work.

Then there is the issue of communication. This is critical since while customers understand that sometimes things go wrong, they feel happier if they know what is going on. It is partly human nature, and partly a matter of knowing what mitigating action you need to take.

In this case Azure’s Service Health Dashboard failed:

There was an Azure infrastructure issue that impacted our ability to provide timely updates via the Service Health Dashboard. As a mitigation, we leveraged Twitter and other social media forums.

This is an issue I see often; online status dashboards are great for telling you all is well, but when something goes wrong they are the first thing to fall over, or else fail to report the problem. In consequence users all pick up the phone simultaneously and cannot get through. Twitter is no substitute; frankly if my business were paying thousands every month to Microsoft for Azure services I would find it laughable to be referred to Twitter in the event of a major service interruption.

Zander also says that customers were unable to create support cases. Hmm, it does seem to me that Microsoft should isolate its support services from its production services in some way so that both do not fail at once.

Still, of the above it is the operational error that is of most concern.

What are the wider implications? There are two takes on this. One is to say that since Azure is not reliable try another public cloud, probably Amazon Web Services. My sense is that the number and severity of AWS outages has reduced over the years. Inherently though, it is always possible that human error or a hardware failure can have a cascading effect; there is no guarantee that AWS will not have its own equally severe outage in future.

The other take is to give up on the cloud, or at least build in a plan B in the event of failure. Hybrid cloud has its merits in this respect.

My view in general though is that cloud reliability will get better and that its benefits exceed the risk – though when I heard last week, at Amazon Re:Invent, of large companies moving their entire datacenter infrastructure to AWS I did think to myself, “that’s brave”.

Finally, for the most critical services it does make sense to spread them across multiple public clouds (if you cannot fallback to on-premises). It should not be necessary, but it is.

An Azure Web Site is a VM which supports multiple applications

This will be unnecessary for Azure experts, but I have seen some misunderstanding on this point, hence this post.

A “web site” is a unit of service on the Azure cloud platform which represents a web application hosted on IIS, Microsoft’s web server (but see below). You write a standard ASP.NET application and deploy it. Azure takes care of configuring the host VM, the server operating system, and IIS.

Using a web site is preferable to creating your own VM and installing IIS on it, for several reasons. One is that you do not have to worry about patching and maintaining the operating system. Another is that web sites can be scaled, manually or automatically, with an option for scheduling so that you can scale down the site for periods of low demand.

image

The main reason for using a VM rather than a web site is if the app has dependencies that fall outside what a web site can handle.

Another thing to know about Azure web sites is that they have four “plan modes,” but only two are worth considering for production. The Free and Shared modes host your application on a shared VM, and quotas are applied. If Azure decides your site is out of quota, it will stop responding. Fine for a prototype, but not something you want customers or users to see. This feature is not shown clearly on the table of features but it is in note 2:

Shared Instance: Free and Shared (Preview) tiers include 60 minutes and 240 minutes of CPU capacity per day, respectively. The Shared (Preview) Website rates are applied per website instance.

The Basic tier on the other hand is decent. It is a dedicated VM, and you can scale it (manually) to 3 instances. It costs around 25% less than a Standard tier site.

Why go Standard? You get 50B storage thrown in (a Basic tier site has 10GB), auto-backup, auto-scale up to 10 instances, and a fixed IP address for SSL. If you have to buy a fixed IP address for a single instance Basic tier site, the price goes above a Standard tier site, except for a Large instance.

Currently a Basic tier web site costs from £35.64 to £141.92 per month, and a Standard tier from £47.10 to £189.65, depending on the size of the VM.

It is a significant cost, but what may not be obvious is that you can deploy multiple applications to a single web site, which makes my statement above, “A ‘web site’ is a unit of service on the Azure cloud platform which represents a web application hosted on IIS”, not quite correct.

When you create a new web site, if you have one already, you can choose a “web hosting plan”. Here is an example:

image

In this case, there are two pre-existing web site VMs, one in East Asia and one in Europe. If you choose one of these two, the new web site will be added to that VM. If you choose “Create new web hosting plan”, you will create a new dedicated instance (or free, or Shared). Adding to an existing VM means no extra cost.

If you are a developer, it may well be better to run a single Basic VM for prototyping, and add multiple sites, rather than risking a free or shared instance which might be out of quota when you demonstrate it to your customer.

What is the limit to the number of web sites you can add? There is none, other than the overloading the VM and getting unresponsive applications.

Postscript: the Web Site service is interesting as an example which blurs the boundaries between IaaS (Infrastructure as a service) and PaaS (Platform as a service). It is more PaaS than IaaS, in that you do not have to worry about maintaining the OS, but more IaaS than PaaS, in that you are still having to think about individual VMs. It would be more purist if Microsoft abstracted away the VMs and simply guaranteed a certain level of service, or scaled up automatically and billed for what you use. On the other hand, the Web Site concept puts a lot of control in the hands of the developer/admin and help them to make best use of the resources, while still removing most of the maintenance burden. I think it is a good compromise.

Microsoft Azure: new preview portal is “designed like an operating system” but is it better?

How important is the Azure portal, the web-based user interface for managing Microsoft’s cloud computing platform? You can argue that it is not all that important. Developers and users care more about the performance and reliability of the services themselves. You can also control Azure services through PowerShell scripts.

My view is the opposite though. The portal is the entry point for Azure and a good experience makes developers more likely to continue. It is also a dashboard, with an overview of everything you have running (or not running) on Azure, the health of your services, and how much they are costing you. I also think of the portal as an index of resources. Can you do this on Azure? Browsing through the portal gives you a quick answer.

The original Azure portal was pretty bad. I wish I had more screenshots; this 2009 post comparing getting started on Google App Engine with Azure may bring back some memories. In 2011 there were some big management changes at Microsoft, and Scott Guthrie moved over to Azure along with various other executives. Usability and capability improved fast, and one of the notable changes was the appearance of a new portal. Written in HTML 5, it was excellent, showing all the service categories in a left-hand column. Select a category, and all your services in that category are listed. Select a service and you get a detailed dashboard. This portal has evolved somewhat since it was introduced, notably through the addition of many more services, but the design is essentially the same.

image

The New button lets you create a new service:

image

The portal also shows credit status right there – no need to hunt through links to account management pages:

image

It is an excellent portal, in other words, logically laid out, easy to use, and effective.

That is the old portal though. Microsoft has introduced a new portal, first demonstrated at the Build conference in April. The new portal is at http://portal.azure.com, versus http://manage.windowsazure.com for the old one.

The new portal is different in look and feel:

image

Why a new portal and how does it work? Microsoft’s Justin Beckwith, a program manager, has a detailed explanatory post. He says that the old portal worked well at first but became difficult to manage:

As we started ramping up the number of services in Azure, it became infeasible for one team to write all of the UI. The teams which owned the service were now responsible (mostly) for writing their own UI, inside of the portal source repository. This had the benefit of allowing individual teams to control their own destiny. However – it now mean that we had hundreds of developers all writing code in the same repository. A change made to the SQL Server management experience could break the Azure Web Sites experience. A change to a CSS file by a developer working on virtual machines could break the experience in storage. Coordinating the 3 week ship schedule became really hard. The team was tracking dependencies across multiple organizations, the underlying REST APIs that powered the experiences, and the release cadence of ~40 teams across the company that were delivering cloud services.

The new portal is the outcome of some deep thinking about the future. It is architected, according to Beckwith, more like an operating system than like a web application.

The new portal is designed like an operating system. It provides a set of UI widgets, a navigation framework, data management APIs, and other various services one would expect to find with any UI framework. The portal team is responsible for building the operating system (or the shell, as we like to call it), and for the overall health of the portal.

Each service has its own extension, or “application”, which runs in an iframe (inline frame) and is isolated from other extensions. Unusually, the iframes are not used to render content, but only to run scripts. These scripts communicate with the main frame using the window.postMessage API call – familiar territory for Windows developers, since messages also drive the Windows desktop operating system.

Microsoft is also using TypeScript, a high-level language that compiles to JavaScript, and open source resources including Less and Knockout.

Beckwith’s post is good reading, but the crunch question is this: how does the new portal compare to the old one?

I get the sense that Microsoft has put a lot of effort into the new portal (which is still in preview) and that it is responsive to feedback. I expect that the new portal will in time be excellent. Currently though I have mixed feeling about it, and often prefer to use the old portal. The new portal is busier, slower and more confusing. Here is the equivalent to the previous New screen shown above:

image

The icons are prettier, but there is something suspiciously like an ad at top right; I would rather see more services, with bigger text and smaller icons; the text conveys more information.

Let’s look at scaling a website. In the old portal, you select a website, then click Scale in the top menu to get to a nice scaling screen where you can set up autoscaling, define the number of instances and so on.

How do you find this in the new portal? You get this screen when you select a website (I have blanked out the name of the site).

image

This screen scrolls vertically and if you scroll down you can find a small Scale panel. Click it and you get to the scaling panel, which has a nicely done UI though the way panels constantly appear and disappear is something you have to get used to.

There are also additional scaling options in the preview portal (the old one only offers scaling based on CPU usage):

image

The preview portal also integrates with Visual Studio online for cloud-based devops.

The challenge for Microsoft is that the old portal set a high bar for clarity and usability. The preview portal does more than the old, and is more fit for purpose as the number and capability of Azure services increases, but its designers need to resist the temptation to let prettiness obstruct performance and efficiency.

Developers can give feedback on the portal here.

Microsoft integrates Azure websites with hybrid cloud

Microsoft has announced the integration of Azure websites with Azure virtual networks, including access to on-premise resources if you have a site-to-site VPN.

The Virtual Network feature grants your website access to resources running your VNET that includes being able to access web services or databases running on your Azure Virtual Machines. If your VNET is connected to your on premise network with Site to Site VPN, then your Azure Website will now be able to access on premise systems through the Azure Websites Virtual Network feature.

Azure websites let you deploy web applications running on IIS (Microsoft’s web server) hosted in Microsoft’s cloud. The application platform can be framework can be ASP.NET, Java, PHP, Node.js or Python. There are Free, Shared and Basic tiers which are mainly for prototyping, and a Standard tier which has auto-scaling features, managed through Microsoft’s web portal:

image

The development tool is Visual Studio, which now has strong integration with Azure.

Integration with virtual networks is a significant feature. You could now host what is in effect an intranet application on Azure if it is convenient. If it is only used in working hours, say, or mainly used in the first couple of hours in the morning, you could scale it accordingly.

Have a look at that web configuration page above, and compare it with the intricacies of System Center. It is a huge difference and shows that some parts of Microsoft have learned that usability matters, even for systems aimed at IT professionals.

Developing an app on Microsoft Azure: a few quick reflections

I have recently completed (if applications are ever completed) an application which runs on Microsoft’s Azure platform. I used lots of Microsoft technology:

  • Visual Studio 2013
  • Visual Studio Online with Team Foundation version control
  • ASP.NET MVC 4.0
  • Entity Framework 4.0
  • Azure SQL
  • Azure Active Directory
  • Azure Web Sites
  • Azure Blob Storage
  • Microsoft .NET 4.5 with C#

The good news: the app works well and performance is good. The application handles the upload and download of large files by authorised users, and replaces a previous solution using a public file sending service. We were pleased to find that the new application is a little faster for upload and download, as well as offering better control over user access and a more professional appearance.

There were some complications though. The requirement was for internal users to log in with their Office 365 (Azure Active Directory) credentials, but for external users (the company’s customers) to log in with credentials stored in a SQL Server database – in other words, hybrid authentication. It turns out you can do this reasonably seamlessly by implementing IPrincipal in a custom class to support the database login. This is largely uncharted territory though in terms of official documentation and took some effort.

Second, Microsoft’s Azure Active Directory support for custom applications is half-baked. You can create an application that supports Azure AD login in a few moments with Visual Studio, but it does not give you any access to metadata like to which security groups the user belongs. I have posted about this in more detail here. There is an API of course, but it is currently a moving target: be prepared for some hassle if you try this.

Third, while Azure Blob Storage itself seems to work well, most of the resources for developers seem to have little idea of what a large file is. Since a primary use case for cloud storage is to cover scenarios where email attachments are not good enough, it seems to me that handling large files (by which I mean multiple GB) should be considered normal rather than exceptional. By way of mitigation, the API itself has been written with large files in mind, so it all works fine once you figure it out. More on this here.

What about Visual Studio? The experience has been good overall. Once you have configured the project correctly, you can update the site on Azure simply by hitting Publish and clicking Next a few times. There is some awkwardness over configuration for local debugging versus deployment. You probably want to connect to a local SQL Server and the Azure storage emulator when debugging, and the Azure hosted versions after publishing. Visual Studio has a Web.Debug.Config and a Web.Release.Config which lets you apply a transformation to your main Web.Config when publishing – though note that these do not have any effect when you simply run your project in Release mode. The correct usage is to set Web.Config to what you want for debugging, and apply the deployment configuration in Web.Release.Config; then it all works.

The piece that caused me most grief was a setting for <wsFederation>. When a user logs in with Azure AD, they get redirected to a Microsoft site to log in, and then back to the application. Applications have to be registered in Azure AD for this to work. There is some uncertainty though about whether the reply attribute, which specifies the redirection back to the app, needs to be set explicitly or not. In practice I found that it does need to be explicit, otherwise you get redirected to the deployed site even when debugging locally – not good.

I have mixed feelings about Team Foundation version control. It works, and I like having a web-based repository for my code. On the other hand, it is slow, and Visual Studio sulks from time to time and requires you to re-enter credentials (Microsoft seems to love making you do that). If you have a less than stellar internet connection (or even a good one), Visual Studio freezes from time to time since the source control stuff is not good at working in the background. It usually unfreezes eventually.

As an experiment, I set the project to require a successful build before check-in. The idea is that you cannot check in a broken build. However, this build has to take place on the server, not locally. So you try to check in, Visual Studio says a build is required, and prompts you to initiate it. You do so, and a build is queued. Some time later (5-10 minutes) the build completes and a dialog appears behind the IDE saying that you need to reconcile changes – even if there are none. Confusing.

What about Entity Framework? I have mixed feelings here too, and have posted separately on the subject. I used code-first: just create your classes and add them to your DbContext and all the data access code is handled for you, kind-of. It makes sense to use EF in an ASP.NET MVC project since the framework expects it, though it is not compulsory. I do miss the control you get from writing your own SQL though; and found myself using the SqlQuery method on occasion to recover some of that control.

Finally, a few notes on ASP.NET MVC. I mostly like it; the separation between Razor views (essentially HTML templates into which you pour your data at runtime) and the code which implements your business logic and data access is excellent. The code can get convoluted though. Have a look at this useful piece on the ASP.NET MVC WebGrid and this remark:

grid.Column("Name",
  format: @<text>@Html.ActionLink((string)item.Name,
  "Details", "Product", new { id = item.ProductId }, null)</text>),

The format parameter is actually a Func, but the Razor view engine hides that from us. But you’re free to pass a Func—for example, you could use a lambda expression.

The code works fine but is it natural and intuitive? Why, for example, do you have to cast the first argument to ActionLink to a string for it to work (I can confirm that it is necessary), and would you have worked this out without help?

I also hit a problem restyling the pages generated by Visual Studio, which use the twitter Bootstrap framework. The problem is that bootstrap.css is a generated file and it does not make sense to edit it directly. Rather, you should edit some variables and use them as input to regenerate it. I came up with a solution which I posted on stackoverflow but no comments yet – perhaps this post will stimulate some, as I am not sure if I found the best approach.

My sense is that what ASP.NET MVC is largely a thing of beauty, it has left behind more casual developers who want a quick and easy way to write business applications. Put another way, the framework is somewhat challenging for newcomers and that in turn affects the breadth of its adoption.

Developing on Azure and using Azure AD makes perfect sense for businesses which are using the Microsoft platform, especially if they use Office 365, and the level of integration on offer, together with the convenience of cloud hosting and anywhere access, is outstanding. There remain some issues with the maturity of the frameworks, ever-changing libraries, and poor or confusing documentation.

Since this area is strategic for Microsoft, I suggest that it would benefit the company to work hard on pulling it all together more effectively.

A note on Azure storage and downloading large files

I have written a simple ASP.NET MVC application for upload and download of files to/from Azure storage.

Getting large file upload to work was the first exercise, described here. That is working well; but what about download?

If your files in Azure storage are public, you can simply serve an URL to the file. If it is not public though, you have a couple of choices:

1. Download the file under application control, by writing to Response.OutputStream or using a FileResult action.

2. Issue a Shared Access Signature (SAS) to the client which enables it to retrieve the file directly from Azure storage. The SAS is sent as an URL argument which tells Azure storage that the request is authorised. The browser downloads the file directly, so it makes no difference to your web application if the file is large.

Note that if you use the first option, it will not work with large files if you simply call DownloadToStream or similar:

container.GetBlockBlobReference(FileName).DownloadToStream(Response.OutputStream);

Why not? Well, the way this code works is that it downloads the large file to the web server, then sends it to the browser. What if your large file is 5GB? The browser will wait a long time for the first byte to be served (giving the user an unresponsive page); but before that happens, the web application will probably throw an exception because it does not like downloading such a large file.

This means the SAS option is a good one, though note that you have to specify an expiry time which could cause problems for users on a slow connection.

Another option is to serve the file in chunks. Use CloudBlockBlob.DownloadRangeToStream to write to Response.OutputStream in a loop until the download is complete. Call Response.Flush() after each chunk to send the chunk to the browser immediately.

This gives the user a nice responsive download experience complete with a cancel option as provided by the browser, and does not crash the application on the server. It seems to me a reasonable approach if the web application is also hosted on Azure and therefore has a fast connection to Azure storage.

What about resuming a failed download? The SAS approach should work as Azure supports it. You could also support this in your app with some additional work since Resume means reading the Range header in a GET request. I have not tried doing this but you might find some clues here.

Developing an ASP.NET MVC app with Azure Active Directory: an ordeal

Regular readers will know that I am working on a simple (I thought) ASP.NET MVC application which is hosted on Azure and uses Azure Blob Storage.

So far so good; but since this business uses Office 365 it seemed to me logical to have users log in using Azure Active Directory (AD). Visual Studio 2013, with the latest update, has a nice wizard to set this up. Just complete the following dialog when starting your new project:

image

This worked fairly well, and users can log in successfully using Azure AD and their normal Office 365 credentials.

I love this level of integration and it seems to me key and strategic for the Microsoft platform. If an employee leaves, or changes role, just update Active Directory and all application access comes into line automatically, whether on premise or in the cloud.

The next stage though was to define some user types; to keep things simple, let us say we have an AppAdmin role for users with full access to the application, and an AppUser role for users with limited access. Other users in the organisation do not need access at all and should not be able to log in.

The obvious way to do this is with AD groups, but I was surprised to discover that there is no easy way to discover to which groups an AD user belongs. The Azure AD integration which the wizard generates is only half done. Users can log in, and you can programmatically retrieve basic information including the firstname, lastname, User Principal Name and object ID, but nothing further.

Fair enough, I thought, there will be some libraries out there that fill the gap; and this is how the nightmare begins. The problem is that this is the cutting edge of .NET cloud development and is an area of rapid change. Yes there are samples out there, but each one (including the official ones on MSDN) seems to be written at a different time, with a different approach, with different .NET assembly dependencies, and varying levels of alpha/beta/experimental status.

The one common thread is that to get the AD group information you need to use the Graph API, a REST API for querying and even writing to Azure Active Directory. In January 2013, Microsoft identity expert Vittorio Bertocci (Principal Program Manager in the Windows Azure Active Directory team at Microsoft) wrote a helpful post about how to restore IsInRole() and [Authorize] in ASP.NET apps using Azure AD – exactly what I wanted to do. He describes essentially a manual approach, though he does make use of a library called Azure Authentication Library (AAL) which you can find on Nuget (the package manager for .NET libraries used by Visual Studio) described as a Beta.

That would probably work, but AAL is last year’s thing and you are meant to use ADAL (Active Directory Authentication Library) instead. ADAL is available in various versions ranging from 1.0.3 which is a finished release, to 2.6.2 which is an alpha release. Of course Bertocci has not updated his post so you can use the obsolete AAL beta if you dare, or use ADAL if you can figure out how to amend the code and which version is the best/safest to employ. Or you can write your own wrapper for the Graph API and bypass all the Nuget packages.

I searched for a better sample, but it gets worse. If you browse around MSDN you will probably come across this article along with this sample which is a Task Tracker application using Azure AD, though note the warnings:

NOTE: This sample is outdated. Its technology, methods, and/or user interface instructions have been replaced by newer features. To see an updated sample that builds a similar application, see WebApp-GraphAPI-DotNet.

Despite the warnings, the older sample is widely referenced in Microsoft posts like this one by Rick Anderson.

OK then, let’s look at at the shiny new sample, even though it is less well documented. It is called WebApp-GraphAPI-DotNet and includes code to get the user profile, roles, contacts and groups from Azure AD using the latest Graph API client: Microsoft.Azure.ActiveDirectory.GraphClient. This replaces an older effort called the GraphHelper which you will find widely used elsewhere.

If you dig into this new sample though, you will find a ton of dependencies on pre-release assemblies. You are not just dealing the Graph API, but also with OWIN (Open Web Interface for .NET), which seems to be Microsoft’s current direction for communication between web applications.

After messing around with Nuget packages and trying to get WebApp-GraphAPI-DotNet working I realised that I was not happy with all this preview code which is likely to break as further updates come along. Further, it does far more than I want. All I need is actually contained in Bertocci’s January 2013 post about getting back IsInRole.

I ended up patching together some code using the older GraphHelper (as found in the obsolete Task Tracker application) and it is working. I can now use IsInRole based on AD groups.

This is a mess. It is a simple requirement and it should not be necessary to plough through all these complicated and conflicting documents and samples to achieve it.

Notes from the field: putting Azure Blob storage into practice

I rashly agreed to create a small web application that uploads files into Azure storage. Azure Blob storage is Microsoft’s equivalent to Amazon’s S3 (Simple Storage Service), a cloud service for storing files of up to 200GB.

File upload performance can be an issue, though if you want to test how fast your application can go, try it from an Azure VM: performance is fantastic, as you would expect from an Azure to Azure connection in the same region.

I am using ASP.NET MVC and thought a sample like this official one, Uploading large files using ASP.NET Web API and Azure Blob Storage, would be all I needed. It is a start, but the method used only works for small files. What it does is:

1. Receive a file via HTTP Post.

2. Once the file has been received by the web server, calls CloudBlob.UploadFile to upload the file to Azure blob storage.

What’s the problem? Leaving aside the fact that CloudBlob is deprecated (you are meant to use CloudBlockBlob), there are obvious problems with files that are more than a few MB in size. The expectation today is that users see some sort of progress bar when uploading, and a well-written application will be resistant to brief connection breaks. Many users have asynchronous internet connections (such as ADSL) with slow upload; large files will take a long time and something can easily go wrong. The sample is not resilient at all.

Another issue is that web servers do not appreciate receiving huge files in one operation. Imagine you are uploading the ISO for a DVD, perhaps a 3GB file. The simple approach of posting the file and having the web server upload it to Azure blob storage introduces obvious strain and probably will not work, even if you do mess around with maxRequestLength and maxAllowedContentLength in ASP.NET and IIS. I would not mind so much if the sample were not called “Uploading large files”; the author perhaps has a different idea of what is a large file.

Worth noting too that one developer hit a bug with blobs greater than 5.5MB when uploaded over HTTPS, which most real-world businesses will require.

What then are you meant to do? The correct approach, as far as I can tell, is to send your large files in small chunks called blocks. These are uploaded to Azure using CloudBlockBlob.PutBlock. You identify each block with an ID string, and when all the blocks are uploaded, called CloudBlockBlob.PutBlockList with a list of IDs in the correct order.

This is the approach taken by Suprotim Agarwal in his example of uploading big files, which works and is a great deal better than the Microsoft sample. It even has a progress bar and some retry logic. I tried this approach, with a few tweaks. Using a 35MB file, I got about 80 KB/s with my ADSL broadband, a bit worse than the performance I usually get with FTP.

Can performance be improved? I wondered what benefit you get from uploading blocks in parallel. Azure Storage does not mind what order the blocks are uploaded. I adapted Agarwal’s sample to use multiple AJAX calls each uploading a block, experimenting with up to 8 simultaneous uploads from the browser.

The initial results were disappointing. Eventually I figured out that I was not actually achieving parallel uploads at all. The reason is that the application uses ASP.NET session state, and IIS will block multiple connections in the same session unless you mark your ASP.NET MVC controller class  with the SessionStateBehavior.ReadOnly attribute.

I fixed that, and now I do get multiple parallel uploads. Performance improved to around 105 KB/s, worthwhile though not dramatic.

What about using a Windows desktop application to upload large files? I was surprised to find little improvement. But can parallel uploading help here too? The answer is that it should happen anyway, handled by the .NET client library, according to this document:

If you are writing a block blob that is no more than 64 MB in size, you can upload it in its entirety with a single write operation. Storage clients default to a 32 MB maximum single block upload, settable using the SingleBlobUploadThresholdInBytes property. When a block blob upload is larger than the value in this property, storage clients break the file into blocks. You can set the number of threads used to upload the blocks in parallel using the ParallelOperationThreadCount property.

It sounds as if there is little advantage in writing your own chunking code, except that if you just call the UploadFromFile or UploadFromStream methods of CloudBlockBlob, you do not get any progress notification event (though you can get a retry notification from an OperationContext object passed to the method). Therefore I looked around for a sample using parallel uploads, and found this one from Microsoft MVP Tyler Doerksen, using C#’s Parallel.For.

Be warned: it does not work! Doerksen’s approach is to upload the entire file into memory (not great, but not as bad as on a web server), send it in chunks using CloudBlockBlob.PutBlock, adding the block ID to a collection at the same time, and then to call CloudBlockBlob.PutBlockList. The reason it does not work is that the order of the loops in Parallel.For is indeterminate, so the block IDs are unlikely to be in the right order.

I fixed this, it tested OK, and then I decided to further improve it by reading each chunk from the file within the loop, rather than loading the entire file into memory. I then puzzled over why my code was broken. The files uploaded, but they were corrupt. I worked it out. In the following code, fs is a FileStream object:

fs.Position = x * blockLength;
bytesread = fs.Read(chunk, 0, currentLength);

Spot the problem? Since fs is a variable declared outside the loop, other threads were setting its position during the read operation, with random results. I fixed it like this:

lock (fs)
{
fs.Position = x * blockLength;
bytesread = fs.Read(chunk, 0, currentLength);
}

and the file corruption disappeared.

I am not sure why, but the manually coded parallel uploads seem to slightly but not dramatically improve performance, to around 100-105 KB/s, almost exactly what my ASP.NET MVC application achieves over my broadband connection.

image

There is another approach worth mentioning. It is possible to bypass the web server and upload directly from the browser to Azure storage. To do this, you need to allow cross-origin resource sharing (CORS) as explained here. You also need to issue a Shared Access Signature, a temporary key that allows read-write access to Azure storage. A guy called Blair Chen seems to have this all figured out, as you can see from his Azure speed test and jazure JavaScript library, which makes it easy to upload a blob from the browser.

I was contemplating going that route, but it seems that performance is no better (judging by the Test Upload Big Files section of Chen’s speed test), so I should probably be content with the parallel JavaScript upload solution, which avoids fiddling with CORS.

Overall, has my experience with the Blob storage API been good? I have not found any issues with the service itself so far, but the documentation and samples could be better. This page should be the jumping off point for all you need to know for a basic application like mine, but I did not find it easy to find good samples or documentation for what I thought would be a common scenario, uploading large files with ASP.NET MVC.

Update: since writing this post I have come across this post by Rob Gillen which addresses the performance issue in detail (and links to working Parallel.For code); however I suspect that since the post is four years old the conclusions are no longer valid, because of improvements to the Azure storage client library.