Thursday, 14 September 2017

Application Design: Going Stateless on Azure


The components of a cloud application are distributed and deployed among multiple cloud resources (virtual machines) to benefit from the elastic demand driven environment. One of the most important factor in this elastic cloud is the ability to add or remove application components and resources as and when required to fulfil scalability needs.
However, while removing the components, this internal state or information may be lost.
That’s when the application needs to segregate their internal state from an in-memory store to a persistent data store so that the scalability and reliability are assured even in case of reduction of components as well as in the case of failures.  In this article, we will understand ‘being stateless’ and will explore strategies like Database-driven State Management, and Cache driven State Management.

Being stateless


Statelessness refers to the fact that no data is preserved in the application memory itself between multiple runs of the strategy (i.e. action). When same strategy is executed multiple times, no data from a run of strategy is carried over to another. Statelessness allows our system to execute the first run of the strategy on a resource (say X) in cloud, the second one on another available resource (say Y, or even on X) in cloud and so on.
This doesn’t mean that applications should not have any state. It merely means that the actions should be designed to be stateless and should be provided with the necessary context to build up the state.
If our application has a series of such actions (say A1, A2, A3…) to be performed, each action (say A1) receives context information (say C1), executes the action and builds up the context (say C2) for next action (say A2). However, Action A2 should not necessarily depend on Action A1 and should be able to be executed independently using context C2 available to it.

How can we make our application stateless?


The conventional approach to having stateless applications is to push the state from web/services out of the application tier to somewhere else – either in configuration or persistent store. As shown in diagram below, the user request is routed through App Tier that can refer to the configuration to decide the persistent store (like, database) to store the state. Finally, an application utility service (preferably, isolated from application tier) can perform state management


The App Utility Service (in the above diagram) takes the onus of state management. It requires the execution context from App Tier so that it can trigger either a data-driven state machine or an event-drive state machine. An example of state machine for bug management system would have 4 states as shown below

To achieve this statelessness in application, there are several strategies to push the application state out of the application tier. Let’s consider few of them.

Database-drive State Management


Taking the same bug management system as an example, we can derive the state using simple data structures stored in database tables.
Current State
Event
Action
Next State
START
NewBug
OpenNew
Bug Opened
Bug Opened
Assigned
AssignForFix
Fix Needed
Not A Bug
MarkClosed
Bug Closed
Fix Needed
Resolved
MarkResolved
Bug Fixed
ReOpened
AssignForFix
Fix Needed
Bug Fixed
Tested
MarkClosed
Bug Closed
ReOpened
MarkOpen
Fix Needed
Bug Closed
END

The above structure only defines the finite states that a bug resolution can visit. Each action needs to be context-aware (i.e. minimal bug information and sometimes the state from which the action was invoked) so that it can independently process the bug and identify the next state (especially when multiple end-states are possible).
When we look at database-drive state management on Azure, we can leverage one of these out-of-the-box solutions
  • Azure SQL Database: The Best choice when we want to work with relational & structured data using relations, indexes, constraints, etc. It is a complete suite of MS-SQL database hosted on Azure.
  • Azure Storage Tables: Works great when we want to work with structured data without relationships, possibly with larger volumes. A lot of times better performance at lower cost is observed with Storage Tables especially when used for data without relationships. Further read on this topic – SQL Azure and Microsoft Azure Table Storage by Joseph Fultz
  • DocumentDB: DocumentDB, a NoSQL database, pitches itself as a solution to store unstructured data (schema-free) and can have rich query capabilities at blazing speeds. Unlike other document based NoSQL databases, it allows creation of stored procedures and querying with SQL statements.
Depending on our tech stack, size of the state and the expected number of state retrievals, we can choose one of the above solutions.
While moving the state management to database works for most of the scenarios, there are times when these read-writes to database may slow down the performance of our application. Considering state is transient data and most of it is not required to be persisted across two sessions of the user, there is a need of a cache system that provides us state objects with low-latency speeds.

Cache driven state management


To persist state data using a cache store is also an excellent option available to developers.  Web developers have been storing state data (like, user preferences, shopping carts, etc.) in cache stores ever since ASP.NET was introduced.  By default, ASP.NET allows state storage in memory of the hosting application pool.  In-memory state storage is required following reasons:
  • The frequency at which ASP.NET worker process recycles is beyond the scope of application and it can cause the in-memory cache to be wiped off
  • With a load balancer in the cloud, there isn’t any guarantee that the host that processed first request will also receive the second one. So there are chances that the in-memory information on multiple servers may not be in sync
The typical in-memory state management is referred as ‘In-role’ cache when this application is hosted on Azure platform.
Other alternatives to in-memory state management are out-of-proc management where state is managed either by a separate service or in SQL server – something that we discussed in the last section.  This mechanism assures resiliency at the cost of performance.  For every request to be processed, there will be additional network calls to retrieve state information before the request is processed, and another network call to store the new state.
The need of the hour is to have a high-performance, in-memory or distributed caching service that can leverage Azure infrastructure to act as a low-latency state store – like, Azure Redis Cache.
Based on the tenancy of the application, we can have a single node or multiple node (primary/secondary) node of Redis Cache to store data types such as lists, hashed sets, sorted sets and bitmaps.

Azure Redis cache supports master-slave replication with very fast non-blocking first synchronization and auto-reconnection on net split. So, when we choose multiple-nodes for Redis cache management, we are ensuring that our application state is not managed on single server. Our application state get replicated on multiple nodes (i.e. slaves) at real-time. It also promises to bring up the slave node automatically when the master node is offline.

Fault tolerance with State Management Strategies

With both database-driven state management and cache-driven state management, we also need to handle temporary service interruptions – possibly due to network connections, layers of load-balancers in the cloud or some backbone services that these solutions use. To give a seamless experience to our end-users, our application design should cater to handle these transient failures.
Handling database transient errors
Using Transient Fault Handling Application Block, with plain vanilla ADO.NET, we can define policy to retry execution of database command and wait period between tries to provide a reliable connection to database. Or, if our application is using any version of Entity Framework, we can include SqlAzureExecutionStrategy, an execution strategy that configures the policy to retry 3 times with an exponential wait between tries.
Every retry consumes computation power and slows down the application performance. So, we should define a policy, a circuit breaker that prevents throttling of service by processing the failed requests. There is no-one-size-fits all solution to breaking the retries.
There are 2 ways to implement a circuit breaker for state management –
  • Fallback or Fail silent– If there is a fallback mechanism to complete the requested functionality without the state management, the application should attempt executing it. For example, when the database is not available, the application can fallback on cache object. If no fallback is available, our application can fail silent (i.e. return a void state for a request).
  • Fail fast – Error out the user to avoid flooding the retry service and provide a friendly response to try later.     
Handling cache transient errors
Azure Redis cache internally uses ConnectionMultiplexer that automatically reconnects to Redis cache should there be disconnection or Internet glitch. However, the StackExchange.Redis does not retry for the get and set commands. To overcome this limitation, we can use library such as Polly that provide policies like Retry, Retry Forever, Wait and Retry and Circuit Breaker in a fluent manner.
The take-away!
The key take-away is to design applications considering that the infrastructure in cloud is elastic and that our applications should be designed to leverage its benefits without compromising the stability and user experience. It is, hence, utmost important to think about application information storage, its access mechanisms, exception handling and dynamic demand.

No comments:

Post a Comment