Monday 25 September 2017

Scaling Azure SQL databases up and down

I've recently been working on a project where we push data from an on-premises database to a set of Azure SQL databases daily, and use each Azure SQL database as the source for a Power BI dataset. Each dataset gets refreshed once a day.
Our Azure SQL data marts are pretty small - all but one of them are under 2 GB, and the largest one is about 3 GB. The ETL and Power BI refresh happen overnight - the ETL starts at 1 am, and the Power BI refresh is scheduled at 3 am.
In my SSIS packages, I am loading a few tables that have anywhere from a few thousand to hundreds of thousands of rows, and a few tables that have fewer than 100 rows.
For the small tables, I just truncate them and insert the new rows.
For the large tables, I use some change detection logic to determine which rows are the same in Azure as in the source, and which ones are new or changed, and which ones should no longer exist in Azure. That pattern will be the subject of another blog post.
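As a teaser, here is a minimal sketch of the idea using MERGE. The table and column names are hypothetical, not the actual data mart schema, and the real packages use change detection logic in SSIS rather than necessarily a single statement:

-- Hypothetical tables: dbo.Customer is the Azure copy of the data,
-- dbo.Customer_Staging holds the rows just pushed from the source.
create table dbo.Customer (
   CustomerId   int           not null primary key,
   CustomerName nvarchar(100) not null,
   City         nvarchar(50)  not null
);
create table dbo.Customer_Staging (
   CustomerId   int           not null primary key,
   CustomerName nvarchar(100) not null,
   City         nvarchar(50)  not null
);

-- Update changed rows, insert new ones, and delete rows
-- that no longer exist in the source.
merge dbo.Customer as target
using dbo.Customer_Staging as source
   on target.CustomerId = source.CustomerId
when matched and (target.CustomerName <> source.CustomerName
               or target.City <> source.City) then
   update set CustomerName = source.CustomerName,
              City = source.City
when not matched by target then
   insert (CustomerId, CustomerName, City)
   values (source.CustomerId, source.CustomerName, source.City)
when not matched by source then
   delete;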
What I found was that I was frequently exceeding the DTU allocation for my Azure SQL databases, and Azure was throttling the databases' response times to keep them within their DTU limits.
I decided that I could decrease the overnight data refresh window by scaling up the Azure SQL databases before loading them, and scaling them down again after the Power BI refresh was complete.
After a failed attempt at using Azure Automation and adapting a runbook that uses AzureRM PowerShell scripts to set database properties, I happened across this T-SQL equivalent:
ALTER DATABASE MyAzureDb MODIFY (Edition='basic', Service_objective='basic')
When you run this statement, it returns immediately. In the background, Azure makes a copy of the current database at the new service objective level and, when the copy is ready, swaps it in for the current database. Depending on how much activity is happening at the time of the switch, the change may be transparent to existing connections. Microsoft does warn that some transactions may be rolled back, so I wanted to wait until the scaling request was complete before I started the heavy-duty ETL (even though I did build retry logic into my SSIS packages).
Finding the T-SQL that told me when a scale request was complete was a little more difficult. If you try this:
select DATABASEPROPERTYEX('MyAzureDb','Edition') as Edition, DATABASEPROPERTYEX('MyAzureDb','Service_Objective') as ServiceObjective
... you will see the old service objective until the switch occurs. However, if you happen to issue another ALTER DATABASE statement to change the objective again while the switch is still happening, you'll get an error message like this:
A service objective assignment on server 'yyyyy' and database 'CopyOfXXXX' is already in progress. Please wait until the service objective assignment state for the database is marked as 'Completed'.
So I went looking for this "service objective assignment state", which was a little difficult to find. Eventually, I came across the Azure SQL-only view sys.dm_operation_status, which lists the operations applied to Azure SQL. It has a row for every operation, so if you've done a few service objective changes, you'll see a row for each. Basically, I needed to find the most recent operation and see whether it has an IN_PROGRESS status.
with currentDb as (
   select *,
      row_number() over (partition by resource_type, major_resource_id, minor_resource_id order by start_time desc) as rowNum
   from sys.dm_operation_status)
select
   major_resource_id as DbName,
   operation,
   state,
   state_desc,
   percent_complete,
   error_code,
   error_desc,
   error_severity,
   start_time
from 
   currentdb
where 
   rowNum = 1
   and resource_type_desc = 'Database'
You must be in master to get any results back from sys.dm_operation_status. This is what it returns for me:
You can see that I kicked off some service objective changes that were in progress when I captured this image.
In my scenario, I had several databases to scale, and metadata in my on-premises database specified the scale-up level (S2) and the base level (most of them were Basic, but one was larger than 2 GB so it had to be S0). I wrote a stored procedure that returned a script containing all the ALTER DATABASE statements, plus a WHILE loop that slept for 10 seconds as long as any rows in the sys.dm_operation_status view were IN_PROGRESS.
In an SSIS package, I called the stored procedure in the on-premises database to get the script, and then executed the script on the Azure SQL server (in the master database context). The script would run as long as the service objective changes were still in progress in any database - in my case, only a couple of minutes. (It takes much less time to change service objectives if you stay within the same edition - Standard or Premium. Moving between Basic and Standard, or vice versa, probably involves more changes in Azure.)
If I had needed to, I could have separated the ALTER DATABASE script from the WHILE IN_PROGRESS script and waited only on a specific database rather than on every database I changed. I just kept it simple.
And to help you out, I'm going to simplify it even further - here is a script to create a stored proc that takes the database name and service objective as input parameters, and returns a script that you execute on your Azure SQL server. The stored procedure can be created in an Azure SQL database if you like, or somewhere on your own server. Or you can just take this code and adapt it for your own use. The output script will change the service objective level for one database and wait until the change is complete.
create proc Control.AzureDbScale (@dbName sysname, @serviceObjective nvarchar(20))
as
-- @serviceObjective is one of:
-- 'basic' | 'S0' | 'S1' | 'S2' | 'S3' | 'P1' | 'P2' | 'P3' | 'P4' | 'P6' | 'P11' | 'P15'

-- Build the script from a readable template: the {{tokens}} are replaced
-- with the parameter values, and the double quotes become single quotes
-- in the final REPLACE.
declare @sql nvarchar(3000)
select @sql = replace(replace(replace(replace(
   'Alter database [{{dbName}}] modify (Edition="{{edition}}", service_objective="{{service}}");
   waitfor delay "00:00:05"
   while (exists(select top 1 1 from (
      select
         *,
         row_number() over (partition by resource_type, major_resource_id, minor_resource_id order by start_time desc) as rowNum
      from sys.dm_operation_status) currentdb
      where rowNum = 1
         and resource_type_desc = "Database"
         and major_resource_id = "{{dbName}}"
         and state=1))
      begin
         waitfor delay "00:00:10";
      end'

      , '{{dbName}}', @DbName)
      , '{{edition}}', case when @serviceObjective like 'S%' then 'standard'
                            when @serviceObjective like 'P%' then 'premium'
                       else 'basic'
                       end)
      , '{{service}}', @serviceObjective)
      , '"', '''') 

select @sql as Script
I should give a shout-out to one of the Speaker Idol presenters at the PASS Summit last week, whose topic was writing more readable dynamic SQL. I've used their technique above: {{tokens}} in the string portion, then REPLACE to swap the tokens for values. I've written my fair share of sprocs that generate and/or execute dynamic SQL, and I always try to make it readable; this technique is one I'll add to my toolbelt.
Here is a sample execution:
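Since the proc just fills in the template above, a call like the following (MyAzureDb is a placeholder name - use your own database) should return a Script column along these lines:

exec Control.AzureDbScale @dbName = N'MyAzureDb', @serviceObjective = N'S0'

-- Returned Script column: the template with the tokens replaced
Alter database [MyAzureDb] modify (Edition='standard', service_objective='S0');
waitfor delay '00:00:05'
while (exists(select top 1 1 from (
   select
      *,
      row_number() over (partition by resource_type, major_resource_id, minor_resource_id order by start_time desc) as rowNum
   from sys.dm_operation_status) currentdb
   where rowNum = 1
      and resource_type_desc = 'Database'
      and major_resource_id = 'MyAzureDb'
      and state=1))
   begin
      waitfor delay '00:00:10';
   end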
If you take the result and execute it while connected to an Azure SQL server, in the master database context, it will change that database to service objective level S0 and wait until the change is complete.
This should help anyone who wants to scale an Azure SQL database and wait until the operation has completed.

Scaling out with Azure SQL Database


You can easily scale out Azure SQL databases using the Elastic Database tools. These tools and features let you use the virtually unlimited database resources of Azure SQL Database to create solutions for transactional workloads, especially Software as a Service (SaaS) applications. The Elastic Database features are described in the numbered list below.
The graphic below shows an architecture that includes the Elastic Database features in relation to a collection of databases.
In this graphic, colors of the database represent schemas. Databases with the same color share the same schema.
  1. A set of Azure SQL databases is hosted on Azure using a sharding architecture.
  2. The Elastic Database client library is used to manage a shard set.
  3. A subset of the databases are put into an elastic pool. (See What is a pool?).
  4. An Elastic Database job runs scheduled or ad-hoc T-SQL scripts against all databases.
  5. The split-merge tool is used to move data from one shard to another.
  6. The Elastic Database query allows you to write a query that spans all databases in the shard set.
  7. Elastic transactions allow you to run transactions that span several databases.
Elastic Database tools

Why use the tools?

Achieving elasticity and scale for cloud applications has been straightforward for VMs and blob storage: simply add or subtract units, or increase power. But it has remained a challenge for stateful data processing in relational databases. Challenges emerged in these scenarios:
  • Growing and shrinking capacity for the relational database part of your workload.
  • Managing hotspots that affect a specific subset of data, such as a particularly busy end-customer (tenant).
Traditionally, scenarios like these have been addressed by investing in larger-scale database servers to support the application. However, this option is limited in the cloud where all processing happens on predefined commodity hardware. Instead, distributing data and processing across many identically-structured databases (a scale-out pattern known as "sharding") provides an alternative to traditional scale-up approaches both in terms of cost and elasticity.

Horizontal and vertical scaling

The figure below shows the horizontal and vertical dimensions of scaling, which are the basic ways elastic databases can be scaled.
Horizontal versus Vertical Scaleout
Horizontal scaling refers to adding or removing databases in order to adjust capacity or overall performance. This is also called “scaling out”. Sharding, in which data is partitioned across a collection of identically structured databases, is a common way to implement horizontal scaling.
Vertical scaling refers to increasing or decreasing the performance level of an individual database—this is also known as “scaling up.”
Most cloud-scale database applications will use a combination of these two strategies. For example, a Software as a Service application may use horizontal scaling to provision new end-customers and vertical scaling to allow each end-customer’s database to grow or shrink resources as needed by the workload.
  • Horizontal scaling is managed using the Elastic Database client library.
  • Vertical scaling is accomplished using Azure PowerShell cmdlets to change the service tier, or by placing databases in an elastic pool.

Sharding

Sharding is a technique to distribute large amounts of identically-structured data across a number of independent databases. It is especially popular with cloud developers creating Software as a Service (SaaS) offerings for end customers or businesses. These end customers are often referred to as “tenants”. Sharding may be required for any number of reasons:
  • The total amount of data is too large to fit within the constraints of a single database
  • The transaction throughput of the overall workload exceeds the capabilities of a single database
  • Tenants may require physical isolation from each other, so separate databases are needed for each tenant
  • Different sections of a database may need to reside in different geographies for compliance, performance or geopolitical reasons.
In other scenarios, such as ingestion of data from distributed devices, sharding can be used to fill a set of databases that are organized temporally. For example, a separate database can be dedicated to each day or week. In that case, the sharding key can be an integer representing the date (present in all rows of the sharded tables) and queries retrieving information for a date range must be routed by the application to the subset of databases covering the range in question.
Sharding works best when every transaction in an application can be restricted to a single value of a sharding key. That ensures that all transactions will be local to a specific database.
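As a hedged sketch of the temporal example above (the table and column names are invented), every sharded table carries the integer date key so the application can route range queries to the right shard databases:

-- Invented example: each shard database holds one day or week of readings.
-- The sharding key (an integer date) is present in every row.
create table dbo.DeviceReadings (
   DateKey      int           not null,  -- e.g. 20170925
   DeviceId     bigint        not null,
   ReadingTime  datetime2     not null,
   ReadingValue decimal(18,4) not null,
   constraint PK_DeviceReadings primary key (DateKey, DeviceId, ReadingTime)
);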

Multi-tenant and single-tenant

Some applications use the simplest approach of creating a separate database for each tenant. This is the single-tenant sharding pattern, which provides isolation, backup/restore ability and resource scaling at the granularity of the tenant. With single-tenant sharding, each database is associated with a specific tenant ID value (or customer key value), but that key need not always be present in the data itself. It is the application’s responsibility to route each request to the appropriate database - and the client library can simplify this.
Single tenant versus multi-tenant
Other scenarios pack multiple tenants together into databases, rather than isolating them into separate databases. This is a typical multi-tenant sharding pattern - and it may be driven by the fact that an application manages large numbers of very small tenants. In multi-tenant sharding, the rows in the database tables are all designed to carry a key identifying the tenant ID or sharding key. Again, the application tier is responsible for routing a tenant’s request to the appropriate database, and this can be supported by the elastic database client library. In addition, row-level security can be used to filter which rows each tenant can access, as sketched below - for details, see Multi-tenant applications with elastic database tools and row-level security. Redistributing data among databases may be needed with the multi-tenant sharding pattern, and this is facilitated by the elastic database split-merge tool. To learn more about design patterns for SaaS applications using elastic pools, see Design Patterns for Multi-tenant SaaS Applications with Azure SQL Database.
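Here is a sketch of the row-level security piece. The table and key names are hypothetical, and it assumes the application sets the tenant ID in SESSION_CONTEXT when it opens a connection; a predicate function plus a security policy then restricts each tenant to its own rows:

-- Hypothetical multi-tenant table: every row carries the tenant key.
create table dbo.Orders (
   OrderId  bigint        not null primary key,
   TenantId int           not null,
   Amount   decimal(18,2) not null
);
go

-- The application stores the caller's tenant ID per connection, e.g.:
-- exec sp_set_session_context N'TenantId', 42;
create function dbo.fn_TenantFilter (@TenantId int)
returns table
with schemabinding
as
return select 1 as IsVisible
       where @TenantId = cast(session_context(N'TenantId') as int);
go

-- Attach the predicate so every query on dbo.Orders is filtered by tenant.
create security policy dbo.TenantIsolationPolicy
   add filter predicate dbo.fn_TenantFilter(TenantId) on dbo.Orders
   with (state = on);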

Move data from multiple to single-tenancy databases

When creating a SaaS application, it is typical to offer prospective customers a trial version of the software. In this case, it is cost-effective to use a multi-tenant database for the data. However, when a prospect becomes a customer, a single-tenant database is better since it provides better performance. If the customer had created data during the trial period, use the split-merge tool to move the data from the multi-tenant to the new single-tenant database.

Next steps

For a sample app that demonstrates the client library, see Get started with Elastic Database tools.
To convert existing databases to use the tools, see Migrate existing databases to scale-out.
To see the specifics of the elastic pool, see Price and performance considerations for an elastic pool, or create a new pool with elastic pools.

Additional resources

Not using elastic database tools yet? Check out our Getting Started Guide. For questions, please reach out to us on the SQL Database forum and for feature requests, please add them to the SQL Database feedback forum.

Thursday 14 September 2017

Application Design: Going Stateless on Azure


The components of a cloud application are distributed and deployed among multiple cloud resources (virtual machines) to benefit from the elastic, demand-driven environment. One of the most important factors in this elastic cloud is the ability to add or remove application components and resources as and when required to fulfil scalability needs.
However, when components are removed, any internal state they hold may be lost.
That's when an application needs to move its internal state from an in-memory store to a persistent data store, so that scalability and reliability are assured even when components are removed or failures occur. In this article, we will understand 'being stateless' and explore strategies like Database-driven State Management and Cache-driven State Management.

Being stateless


Statelessness refers to the fact that no data is preserved in application memory between multiple runs of a strategy (i.e. an action). When the same strategy is executed multiple times, no data from one run is carried over to the next. Statelessness allows our system to execute the first run of the strategy on one resource in the cloud (say X), the second on another available resource (say Y, or even on X again), and so on.
This doesn’t mean that applications should not have any state. It merely means that the actions should be designed to be stateless and should be provided with the necessary context to build up the state.
If our application has a series of such actions (say A1, A2, A3...), each action (say A1) receives context information (say C1), executes, and builds up the context (say C2) for the next action (say A2). However, action A2 should not depend directly on action A1; it should be executable independently using the context C2 available to it.

How can we make our application stateless?


The conventional approach to making applications stateless is to push state out of the web/service application tier to somewhere else - either into configuration or into a persistent store. As shown in the diagram below, the user request is routed through the App Tier, which can refer to configuration to decide which persistent store (like a database) holds the state. Finally, an application utility service (preferably isolated from the App Tier) can perform state management.


The App Utility Service (in the above diagram) takes on the onus of state management. It requires the execution context from the App Tier so that it can trigger either a data-driven or an event-driven state machine. An example state machine for a bug management system would have four states, as shown below.

To achieve this statelessness, there are several strategies for pushing application state out of the application tier. Let's consider a few of them.

Database-driven State Management


Taking the same bug management system as an example, we can derive the state using simple data structures stored in database tables.
Current State | Event     | Action       | Next State
------------- | --------- | ------------ | ----------
START         | NewBug    | OpenNew      | Bug Opened
Bug Opened    | Assigned  | AssignForFix | Fix Needed
Bug Opened    | Not A Bug | MarkClosed   | Bug Closed
Fix Needed    | Resolved  | MarkResolved | Bug Fixed
Fix Needed    | ReOpened  | AssignForFix | Fix Needed
Bug Fixed     | Tested    | MarkClosed   | Bug Closed
Bug Fixed     | ReOpened  | MarkOpen     | Fix Needed
Bug Closed    |           |              | END

The above structure only defines the finite states that a bug resolution can visit. Each action needs to be context-aware (i.e. have minimal bug information and sometimes the state from which the action was invoked) so that it can independently process the bug and identify the next state (especially when multiple end states are possible).
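A minimal T-SQL sketch of that structure (the names are illustrative) stores the transitions as rows and resolves the next state with a simple lookup:

-- Illustrative transition table for the bug state machine above.
create table dbo.BugStateTransition (
   CurrentState varchar(20) not null,
   EventName    varchar(20) not null,
   ActionName   varchar(20) not null,
   NextState    varchar(20) not null,
   constraint PK_BugStateTransition primary key (CurrentState, EventName)
);

insert dbo.BugStateTransition values
   ('START',      'NewBug',    'OpenNew',      'Bug Opened'),
   ('Bug Opened', 'Assigned',  'AssignForFix', 'Fix Needed'),
   ('Bug Opened', 'Not A Bug', 'MarkClosed',   'Bug Closed'),
   ('Fix Needed', 'Resolved',  'MarkResolved', 'Bug Fixed'),
   ('Fix Needed', 'ReOpened',  'AssignForFix', 'Fix Needed'),
   ('Bug Fixed',  'Tested',    'MarkClosed',   'Bug Closed'),
   ('Bug Fixed',  'ReOpened',  'MarkOpen',     'Fix Needed');

-- Given a bug's current state and an incoming event, look up what to do next.
select ActionName, NextState
from dbo.BugStateTransition
where CurrentState = 'Bug Opened'
  and EventName    = 'Assigned';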
When we look at database-driven state management on Azure, we can leverage one of these out-of-the-box solutions:
  • Azure SQL Database: The best choice when we want to work with relational, structured data using relations, indexes, constraints, etc. It is a complete SQL Server database engine hosted on Azure.
  • Azure Storage Tables: Works great when we want to work with structured data without relationships, possibly at larger volumes. Storage Tables often deliver better performance at lower cost, especially for data without relationships. Further reading on this topic: SQL Azure and Microsoft Azure Table Storage by Joseph Fultz.
  • DocumentDB: DocumentDB, a NoSQL database, pitches itself as a solution for storing unstructured (schema-free) data with rich query capabilities at blazing speeds. Unlike other document-based NoSQL databases, it allows creation of stored procedures and querying with SQL statements.
Depending on our tech stack, the size of the state, and the expected number of state retrievals, we can choose one of the above solutions.
While moving state management to a database works for most scenarios, there are times when these reads and writes to the database may slow down the performance of our application. Considering that state is transient data, and most of it does not need to persist across two sessions of the user, there is a need for a cache system that serves state objects at low latency.

Cache-driven State Management


Persisting state data in a cache store is also an excellent option available to developers. Web developers have been storing state data (like user preferences and shopping carts) in cache stores ever since ASP.NET was introduced. By default, ASP.NET stores state in the memory of the hosting application pool. In-memory state storage is problematic in the cloud for the following reasons:
  • The frequency at which the ASP.NET worker process recycles is beyond the application's control, and a recycle can wipe out the in-memory cache
  • With a load balancer in the cloud, there isn't any guarantee that the host that processed the first request will also receive the second one, so the in-memory information on multiple servers may not be in sync
This typical in-memory state management is referred to as 'In-Role' cache when the application is hosted on the Azure platform.
The alternative to in-memory state management is out-of-proc management, where state is managed either by a separate service or in SQL Server - something we discussed in the last section. This mechanism assures resiliency at the cost of performance: for every request, there is an additional network call to retrieve state information before the request is processed, and another network call to store the new state.
The need of the hour is a high-performance, in-memory, distributed caching service that can leverage Azure infrastructure to act as a low-latency state store - like Azure Redis Cache.
Based on the tenancy of the application, we can have a single node or multiple nodes (primary/secondary) of Redis Cache to store data types such as lists, hashes, sets, sorted sets and bitmaps.

Azure Redis Cache supports master-slave replication, with very fast non-blocking first synchronization and auto-reconnection on net split. So when we choose multiple nodes for Redis cache management, we ensure that our application state is not held on a single server: it gets replicated to the slave nodes in real time. Redis also promises to promote a slave node automatically when the master node goes offline.

Fault tolerance with State Management Strategies

With both database-driven and cache-driven state management, we also need to handle temporary service interruptions - possibly due to network connections, layers of load balancers in the cloud, or some backbone service these solutions use. To give a seamless experience to our end users, our application design should handle these transient failures gracefully.
Handling database transient errors
Using the Transient Fault Handling Application Block with plain vanilla ADO.NET, we can define a policy that retries execution of a database command, with a wait period between tries, to provide a reliable connection to the database. Or, if our application uses Entity Framework, we can include SqlAzureExecutionStrategy, an execution strategy that retries 3 times with an exponential wait between tries.
Every retry consumes compute power and slows down the application. So we should also define a circuit breaker: a policy that stops retrying so that failed requests do not flood the service. There is no one-size-fits-all solution for breaking the retries.
There are two ways to implement a circuit breaker for state management:
  • Fallback or fail silent – If there is a fallback mechanism that can complete the requested functionality without state management, the application should attempt it. For example, when the database is unavailable, the application can fall back on a cache object. If no fallback is available, the application can fail silent (i.e. return a void state for the request).
  • Fail fast – Error out immediately to avoid flooding the service with retries, and give the user a friendly response to try again later.
Handling cache transient errors
Azure Redis Cache internally uses a ConnectionMultiplexer that automatically reconnects to the cache after a disconnection or network glitch. However, StackExchange.Redis does not retry the get and set commands. To overcome this limitation, we can use a library such as Polly, which provides policies like Retry, Retry Forever, Wait and Retry, and Circuit Breaker in a fluent manner.
The take-away!
The key take-away is to design applications on the assumption that cloud infrastructure is elastic, and to leverage its benefits without compromising stability and user experience. It is therefore extremely important to think about where application information is stored, its access mechanisms, exception handling, and dynamic demand.

Wednesday 6 September 2017

Azure – 24 Must Know Cloud Patterns With Sample Code

Your application may start with a single idea expressed as a single website: a site with some business logic tied to a database. Those standalone applications have a way of accumulating features.
Or your application may want to be “cloud ready” from the beginning. The vision may begin with a set of servers, each doing a specific task, each able to scale to meet demand and provide reliability. As soon as you take that second step, it’s time to look to well-known practices.
Microsoft’s Patterns and Practices team has put together architectural guidance to help you design your cloud applications, Cloud Design Patterns: Prescriptive Architecture Guidance for Cloud Applications. Each pattern is provided in a common format that describes the context and problem, the solution, issues and considerations for applying the pattern, and an example based on Azure.
It also discusses the benefits and considerations for each pattern. Most of the patterns have code samples or snippets that show how to implement the patterns using the features of Microsoft Azure.
Although the guidance helps you adopt Azure, the patterns are relevant to all kinds of distributed systems, whether they are hosted on Azure or on other cloud platforms.

Resources

The Patterns and Practices team provides the guide itself, code samples, and a set of primer and guidance topics, all covered below.
My goal here is to give you a quick idea of each pattern and provide links so you can get started when that pattern fits your need.

The Patterns

Cache-aside Pattern. Load data on demand into a cache from a data store. This pattern can improve performance and also helps to maintain consistency between data held in the cache and the data in the underlying data store.
Circuit Breaker Pattern. Handle faults that may take a variable amount of time to rectify when connecting to a remote service or resource. This pattern can improve the stability and resiliency of an application.
Compensating Transaction Pattern. Undo the work performed by a series of steps, which together define an eventually consistent operation, if one or more of the operations fails. Operations that follow the eventual consistency model are commonly found in cloud-hosted applications that implement complex business processes and workflows.
Competing Consumers Pattern. Enable multiple concurrent consumers to process messages received on the same messaging channel. This pattern enables a system to process multiple messages concurrently to optimize throughput, to improve scalability and availability, and to balance the workload. This was the original pattern that I taught as an evangelist: one or more applications (or instances) generate tasks (usually in a Web role) and then send a message to the queue. A worker role listens on the queue for work to do and then executes the task.
Competing Consumers Pattern

Compute Resource Consolidation Pattern. Consolidate multiple tasks or operations into a single computational unit. This pattern can increase compute resource utilization, and reduce the costs and management overhead associated with performing compute processing in cloud-hosted applications.
Command and Query Responsibility Segregation (CQRS) Pattern. Segregate operations that read data from operations that update data by using separate interfaces. This pattern can maximize performance, scalability, and security; support evolution of the system over time through higher flexibility; and prevent update commands from causing merge conflicts at the domain level.
CQRS Pattern
Event Sourcing Pattern. Use an append-only store to record the full series of events that describe actions taken on data in a domain, rather than storing just the current state, so that the store can be used to materialize the domain objects. This pattern can simplify tasks in complex domains by avoiding the requirement to synchronize the data model and the business domain; improve performance, scalability, and responsiveness; provide consistency for transactional data; and maintain full audit trails and history that may enable compensating actions.
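A minimal sketch of an append-only event store in T-SQL (the schema here is hypothetical; real implementations often use a purpose-built store):

-- Hypothetical append-only event store: current state is rebuilt by
-- replaying an aggregate's events in order; rows are never updated.
create table dbo.EventStore (
   EventId     bigint identity(1,1) primary key,
   AggregateId uniqueidentifier not null,
   EventType   varchar(50)      not null,
   EventData   nvarchar(max)    not null,  -- serialized event payload
   OccurredAt  datetime2        not null default sysutcdatetime()
);

-- Replay all events for one aggregate, oldest first, to materialize it.
declare @aggregate uniqueidentifier = newid();  -- placeholder id
select EventType, EventData, OccurredAt
from dbo.EventStore
where AggregateId = @aggregate
order by EventId;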
External Configuration Store Pattern. Move configuration information out of the application deployment package to a centralized location. This pattern can provide opportunities for easier management and control of configuration data, and for sharing configuration data across applications and application instances.
Federated Identity Pattern. Delegate authentication to an external identity provider. This pattern can simplify development, minimize the requirement for user administration, and improve the user experience of the application.
Federated Identity Pattern
Gatekeeper Pattern. Protect applications and services by using a dedicated host instance that acts as a broker between clients and the application or service, validates and sanitizes requests, and passes requests and data between them. This pattern can provide an additional layer of security, and limit the attack surface of the system.
Health Endpoint Monitoring Pattern. Implement functional checks within an application that external tools can access through exposed endpoints at regular intervals. This pattern can help to verify that applications and services are performing correctly.
Index Table Pattern. Create indexes over the fields in data stores that are frequently referenced by query criteria. This pattern can improve query performance by allowing applications to more quickly retrieve data from a data store.
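In data stores without secondary indexes (such as Azure Table storage), the index table is a separate table you maintain yourself; in a relational store the same idea is simply a secondary index. A hedged sketch with an invented table:

-- Invented example: queries frequently filter orders by customer.
create table dbo.CustomerOrders (
   OrderId    bigint        not null primary key,
   CustomerId int           not null,
   OrderDate  date          not null,
   Amount     decimal(18,2) not null
);

-- The nonclustered index plays the role of the index table;
-- the INCLUDE columns let the common query avoid key lookups.
create index IX_CustomerOrders_CustomerId
   on dbo.CustomerOrders (CustomerId)
   include (OrderDate, Amount);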
Leader Election Pattern. Coordinate the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances. This pattern can help to ensure that tasks do not conflict with each other, cause contention for shared resources, or inadvertently interfere with the work that other task instances are performing.
Materialized View Pattern. Generate pre-populated views over the data in one or more data stores when the data is formatted in a way that does not favor the required query operations. This pattern can help to support efficient querying and data extraction, and improve application performance.
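SQL Server's indexed views are one concrete form of this pattern; here is a hedged sketch with an invented schema:

-- Invented source table; the view pre-aggregates it for reporting queries.
create table dbo.Sale (
   SaleId     bigint        not null primary key,
   StoreId    int           not null,
   SaleAmount decimal(18,2) not null
);
go

-- Indexed view: schema-bound, and the unique clustered index
-- physically persists (materializes) the aggregated results.
create view dbo.vStoreSales
with schemabinding
as
select StoreId,
       sum(SaleAmount) as TotalSales,
       count_big(*)    as SaleCount  -- COUNT_BIG is required with GROUP BY
from dbo.Sale
group by StoreId;
go

create unique clustered index IX_vStoreSales on dbo.vStoreSales (StoreId);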
Pipes and Filters Pattern. Decompose a task that performs complex processing into a series of discrete elements that can be reused. This pattern can improve performance, scalability, and reusability by allowing task elements that perform the processing to be deployed and scaled independently.
Priority Queue Pattern. Prioritize requests sent to services so that requests with a higher priority are received and processed more quickly than those of a lower priority. This pattern is useful in applications that offer different service level guarantees to individual types of client.
Queue-based Load Leveling Pattern. Use a queue that acts as a buffer between a task and a service that it invokes in order to smooth intermittent heavy loads that may otherwise cause the service to fail or the task to timeout. This pattern can help to minimize the impact of peaks in demand on availability and responsiveness for both the task and the service.
Retry Pattern. Enable an application to handle temporary failures when connecting to a service or network resource by transparently retrying the operation in the expectation that the failure is transient. This pattern can improve the stability of the application.
Retry Pattern

Runtime Reconfiguration Pattern. Design an application so that it can be reconfigured without requiring redeployment or restarting the application. This helps to maintain availability and minimize downtime.
Scheduler Agent Supervisor Pattern. Coordinate a set of actions across a distributed set of services and other remote resources, attempt to transparently handle faults if any of these actions fail, or undo the effects of the work performed if the system cannot recover from a fault. This pattern can add resiliency to a distributed system by enabling it to recover and retry actions that fail due to transient exceptions, long-lasting faults, and process failures.
Sharding Pattern. Divide a data store into a set of horizontal partitions, or shards. This pattern can improve scalability when storing and accessing large volumes of data.
Sharding Pattern
Static Content Hosting Pattern. Deploy static content to a cloud-based storage service that can deliver it directly to the client. This pattern can reduce the requirement for potentially expensive compute instances.
Throttling Pattern. Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service. This pattern can allow the system to continue to function and meet service level agreements, even when an increase in demand places an extreme load on resources.
Valet Key Pattern. Use a token or key that provides clients with restricted direct access to a specific resource or service in order to offload data transfer operations from the application code. This pattern is particularly useful in applications that use cloud-hosted storage systems or queues, and can minimize cost and maximize scalability and performance.
Valet Key Pattern

Guidance

Patterns and Practices also offers a set of conceptual introductions to cloud computing. They call these primer and guidance topics; they relate to specific areas of application development, such as caching, data partitioning, and autoscaling.
Asynchronous Messaging Primer. Messaging is a key strategy employed in many distributed environments such as the cloud. It enables applications and services to communicate and cooperate, and can help to build scalable and resilient solutions. Messaging supports asynchronous operations, enabling you to decouple a process that consumes a service from the process that implements the service.
Autoscaling Guidance. Constantly monitoring performance and scaling a system to adapt to fluctuating workloads to meet capacity targets and optimize operational cost can be a labor-intensive process. It may not be feasible to perform these tasks manually. This is where autoscaling is useful.
Caching Guidance. Caching is a common technique that aims to improve the performance and scalability of a system by temporarily copying frequently accessed data to fast storage located close to the application. Caching is most effective when an application instance repeatedly reads the same data, especially if the original data store is slow relative to the speed of the cache, it is subject to a high level of contention, or it is far away resulting in network latency.
Compute Partitioning Guidance. When deploying an application to the cloud it may be desirable to allocate the services and components it uses in a way that helps to minimize running costs while maintaining the scalability, performance, availability, and security of the application.
Data Partitioning Guidance. In many large-scale solutions, data is divided into separate partitions that can be managed and accessed separately. The partitioning strategy must be chosen carefully to maximize the benefits while minimizing adverse effects. Partitioning can help to improve scalability, reduce contention, and optimize performance.
Data Replication and Synchronization Guidance. When you deploy an application to more than one datacenter, such as cloud and on-premises locations, you must consider how you will replicate and synchronize the data each instance of the application uses in order to maximize availability and performance, ensure consistency, and minimize data transfer costs between locations.
Instrumentation and Telemetry Guidance. Most applications will include diagnostics features that generate custom monitoring and debugging information, especially when an error occurs. This is referred to as instrumentation, and is usually implemented by adding event and error handling code to the application. The process of gathering remote information that is collected by instrumentation is usually referred to as telemetry.
Multiple Datacenter Deployment Guidance. Deploying an application to more than one datacenter can provide benefits such as increased availability and a better user experience across wider geographical areas. However, there are challenges that must be resolved, such as data synchronization and regulatory limitations.
Service Metering Guidance. You may need to meter the use of applications or services in order to plan future requirements; to gain an understanding of how they are used; or to bill users, organization departments, or customers. This is a common requirement, particularly in large corporations and for independent software vendors and service providers.

Code Samples

You can also download the code samples: ten example applications that demonstrate the implementation of some of the patterns in this guide are available for you to download and run on your own computer or in your own Microsoft Azure subscription.