Saturday, September 06, 2008

Spring in Finance eXchange

There is a free full-day event, SPRING IN FINANCE EXCHANGE, on 10-10-08 by Spring Source. It seems interesting. Click here to see the programme and register before it is too late.

Patterns of Enterprise Application Architecture by Martin Fowler

In this text, I will share my thoughts and some of the important points from Martin Fowler's book Patterns of Enterprise Application Architecture. Like some of my other texts, this is an ongoing one; I will try to update it regularly while I read the book.



Enterprise Applications

The following are the important aspects of enterprise applications:

  • Persistent Data
  • Huge Volume of Data
  • Concurrent Access to Data and Resources
  • Multiple User Interfaces
  • Integration with other Enterprise Applications
  • The same data in various syntactic and semantic formats
  • Complex Business Logic/Illogic

In light of these, we can say that, for example, the following are not enterprise applications: web browsers, word/image/video processors, games, operating systems, compilers, digital TV software... And the following can be given as examples of enterprise applications: " ... payroll, patient records, shipping tracking, cost analysis, credit scoring, insurance, supply chain, accounting, customer service, and foreign exchange trading."

The performance of an enterprise application is one of the vital factors in its success. The following have to be kept in mind during design in terms of performance:

  • Response Time: The amount of time it takes to process a request.
  • Responsiveness: How quickly the system acknowledges that a request has been received. Generally, the time to acknowledge is shorter than the response time.
  • Latency: The minimum time required to get any response from a remote system, even if there is no work to be done. Remote calls add latency, so they should be kept to a minimum.
  • Throughput: The amount of work done in a given time.
  • Load: How much stress the system is under at a given point in time, for example due to concurrent requests.
  • Load Sensitivity: How the response time varies with the load.
  • Efficiency: Performance (response time or throughput) per unit of resource.
  • Capacity: The maximum effective load or throughput of a system.
  • Scalability: How performance is affected if resources are added or removed. Hardware resources especially must be kept in mind when considering scalability.

In terms of performance, the ultimate target in enterprise applications is to maximize throughput or minimize response time. There is a trade-off between throughput and response time; therefore, the balance between the two has to be decided based on the constraints and requirements of the application domain.
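
As a rough worked example of how some of these measures relate, the sketch below computes throughput, mean response time and efficiency from a set of made-up measurements. All the numbers and variable names are hypothetical and only serve the arithmetic.

```python
# Hypothetical measurements for one server over a 60-second window.
requests_completed = 1200        # requests finished in the window
window_seconds = 60              # length of the measurement window
cpus = 4                         # resources used to serve the load
total_response_seconds = 300.0   # sum of all individual response times

# Throughput: work done per unit of time (requests per second).
throughput = requests_completed / window_seconds

# Mean response time: average time to process one request.
mean_response_time = total_response_seconds / requests_completed

# Efficiency: performance divided by resources (here, throughput per CPU).
efficiency = throughput / cpus
```

Doubling the CPUs and measuring the throughput again would give a first indication of scalability.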

Chapter 1. Layering

Layering is one of the fundamental patterns in enterprise applications. Layering means representing an application as loosely coupled and highly cohesive components, each of which sits on top of a lower component. Each layer is only aware of the layer below it, and it provides an interface for the upper layers to communicate with it. Layering provides:

  • Abstraction: Each layer is responsible for a set of tasks and does not have to know the implementation details of other layers.
  • Substitution: A layer's implementation can be changed without much effect on the other layers.
  • Minimized dependencies between layers.
  • Standardization: By providing an interface, each layer creates a kind of de facto standard, a contract for the other layers.
  • Reusability: A layer can be used by several higher-level layers.

On the other hand, extra layering can degrade performance, since at each layer the data or inputs have to be transformed into the layer-specific format.

The Three Principal Layers

There are three principal layers:

  • Presentation: Displaying or providing information to the user. Generally sits on the client side.
  • Domain: Business logic, which generally sits on the server side.
  • Data Source: Communication with databases, messaging systems and transaction managers; again, generally on the server side.
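
A minimal sketch of the three layers, assuming a made-up customer discount rule. The class names, the in-memory dictionary standing in for a database, and the discount threshold are all hypothetical; the point is only that each layer talks to the one directly below it.

```python
class CustomerDataSource:
    """Data source layer: hides how records are stored (here, a plain dict)."""

    def __init__(self):
        self._rows = {1: {"name": "Alice", "orders_total": 1200.0}}

    def find(self, customer_id):
        return self._rows[customer_id]


class CustomerDomain:
    """Domain layer: holds the business rule and knows only the layer below."""

    def __init__(self, data_source):
        self._data_source = data_source

    def discount_rate(self, customer_id):
        row = self._data_source.find(customer_id)
        # Hypothetical rule: 10% discount above 1000 in total orders.
        return 0.10 if row["orders_total"] > 1000 else 0.0


class CustomerPresentation:
    """Presentation layer: formats domain results for the user."""

    def __init__(self, domain):
        self._domain = domain

    def render_discount(self, customer_id):
        rate = self._domain.discount_rate(customer_id)
        return f"Discount: {rate:.0%}"


page = CustomerPresentation(CustomerDomain(CustomerDataSource()))
message = page.render_discount(1)
```

Swapping the dictionary for a real database would only touch CustomerDataSource, which is the substitution benefit listed above.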

Chapter 2. Organizing Domain Logic

There are three separate patterns for organizing domain logic:

  • Transaction Script: Based on a simple procedural approach
  • Domain Model: Based on object-oriented modelling
  • Table Module: A hybrid of Transaction Script and Domain Model

A common approach is to put a Service Layer on top of the above patterns in the domain logic. A service layer provides a clear API and a place for transaction control and security.
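
To make the difference concrete, here is a small sketch of the same money transfer written first as a Transaction Script and then as a Domain Model behind a Service Layer. The transfer example and all names (transfer_script, Account, TransferService) are my own illustration, not taken from the book.

```python
# Transaction Script: one procedure per business operation.
def transfer_script(balances, src, dst, amount):
    if balances[src] < amount:
        raise ValueError("insufficient funds")
    balances[src] -= amount
    balances[dst] += amount


# Domain Model: objects carry both the data and the behaviour.
class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if self.balance < amount:
            raise ValueError("insufficient funds")
        self.balance -= amount

    def deposit(self, amount):
        self.balance += amount


# Service Layer: a thin API on top of the domain objects.
class TransferService:
    def __init__(self, accounts):
        self._accounts = accounts

    def transfer(self, src, dst, amount):
        # Transaction control and security checks would be placed here.
        self._accounts[src].withdraw(amount)
        self._accounts[dst].deposit(amount)
```

With the script, the rule lives in the procedure; with the domain model, the rule lives in Account and the service only coordinates.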

Chapter 3. Mapping to Relational Databases

This chapter elaborates on mapping patterns and issues between the domain layer and the data source layer, such as architectural, behavioral, structural, decorative, connections and schemas. Fortunately, many of these concerns and patterns are implemented and addressed by the latest ORM frameworks (such as Hibernate), unless you want to create your own ORM layer or framework.

Chapter 4. Web Presentation

The most important aspect of web presentation is the separation of business logic from web presentation by using patterns similar to MVC (Model, View, Input Controller). In the case of a web application, MVC works as follows:

  1. A request comes to the controller, which extracts the required information from the request.
  2. The controller forwards it to the business logic, to an appropriate model object.
  3. The model object fetches persistent data via data access objects and aggregates/formats the data for the response object.
  4. Control returns to the controller, which decides which view will be used to display the response.
  5. The controller passes the response data to the view.
  6. The view is rendered and returned to the client.
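
The six steps above can be sketched roughly as follows. The request is assumed to be a plain dictionary and the catalogue an in-memory dictionary; the function names are hypothetical and no real web framework is involved.

```python
def find_product(product_id):
    """Model (steps 2-3): fetch the data for the response.

    A hypothetical in-memory catalogue stands in for data access objects.
    """
    catalogue = {"42": {"name": "Keyboard", "price": 30}}
    return catalogue.get(product_id)


def product_view(product):
    """View (step 6): render the response data as HTML."""
    return f"<h1>{product['name']}</h1><p>Price: {product['price']}</p>"


def not_found_view(product_id):
    return f"<h1>No product {product_id}</h1>"


def product_controller(request):
    # Step 1: extract the required information from the request.
    product_id = request["params"]["id"]
    # Steps 2-3: ask the model for the data.
    product = find_product(product_id)
    # Step 4: decide which view to use.
    if product is None:
        return not_found_view(product_id)
    # Steps 5-6: pass the data to the view and return the result.
    return product_view(product)
```

Because find_product knows nothing about HTML, it can be tested without any presentation code.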

Separating the model from the presentation is also good practice in terms of testing. Each part, especially the business model, can be tested independently without dealing with presentation issues.

View Patterns

  • Transform View: Similar to XSLT, it employs a transformation scheme which is applied to the inputs.
  • Template View: A structured page with embedded markers indicating where dynamic content should go. Server page technologies such as ASP, PHP and JSP implement this pattern. While this pattern allows flexible and powerful coding, unfortunately, it also leads to messy presentation code.

In addition to these patterns, a view is generated either in a single step (stage) or in two steps. In single step view generation, there is one view module for each user interface screen, and display and presentation decisions are taken only in this module. In a two step view, each first-stage module is responsible for a specific view, and its result is then passed to a second stage where the global, common view is created. This is a vital advantage of the two step view because it provides highly cohesive view modules.
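
A rough sketch of a two step view built on a template, using Python's standard string.Template for the embedded markers. The order data, the shape of the intermediate "screen" dictionary and the function names are assumptions for illustration.

```python
from string import Template

# Second stage template: the common page layout with embedded markers.
PAGE = Template("<html><body><h1>$title</h1>$body</body></html>")


def order_first_stage(order):
    """First stage: turn domain data into a logical screen, with no HTML."""
    return {
        "title": f"Order {order['id']}",
        "fields": [("Customer", order["customer"]), ("Total", order["total"])],
    }


def second_stage(screen):
    """Second stage: render any logical screen with the common layout."""
    rows = "".join(f"<p>{label}: {value}</p>" for label, value in screen["fields"])
    return PAGE.substitute(title=screen["title"], body=rows)


html = second_stage(order_first_stage({"id": 7, "customer": "Alice", "total": 99}))
```

Changing the look of every page then means editing only PAGE and second_stage.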

Input Controller Patterns

The input controller handles an HTTP request, analyses it and then decides what to do with it. There are two patterns for the input controller:

  • Page Controller: For every page there is an input controller which creates the model, processes it, and then creates a view object and returns it.
  • Front Controller: A single centralized object intercepts all requests and, after analysing them, creates separate handlers to process each request.
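
A minimal sketch of a Front Controller, assuming requests are plain dictionaries with a "path" key. The class, the registration method and the two handlers are hypothetical names for illustration only.

```python
class FrontController:
    """Single entry point that dispatches each request to a handler."""

    def __init__(self):
        self._handlers = {}

    def register(self, path, handler):
        self._handlers[path] = handler

    def handle(self, request):
        # Analyse the request and pick the handler responsible for it.
        handler = self._handlers.get(request["path"])
        if handler is None:
            return "404 Not Found"
        return handler(request)


def home_handler(request):
    return "Welcome"


def profile_handler(request):
    return f"Profile of {request['user']}"


front = FrontController()
front.register("/", home_handler)
front.register("/profile", profile_handler)
```

With a Page Controller, home_handler and profile_handler would each be wired to their own page directly, without the shared dispatch step.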

Chapter 5. Concurrency

Concurrency Issues

  • Lost Updates
  • Inconsistent Reads
  • Deadlocks/Livelocks

Execution Contexts

  • Request
  • Session
  • Process
  • Thread
  • Transaction

Isolation and Immutability

Isolation and immutability are two solutions for concurrency problems. With isolation, a shared resource is isolated so that only one active agent can access it, such as a process in an operating system. The other approach is to make the shared resource immutable. If no active agent can change the shared resource, then there won't be any lost update or inconsistent read problems.

Optimistic and Pessimistic Concurrency Control

If we cannot make a shared resource isolated or immutable, then we have to apply either optimistic or pessimistic concurrency control.

In optimistic concurrency control, two or more active agents are allowed to work on a shared resource at the same time, and conflicts are detected afterwards. If there is a conflict, the user is asked to make a decision (to merge or cancel), similar to source control systems such as CVS or SVN.

In pessimistic concurrency control, once an active agent starts to work on a shared resource, the agent locks it, and other agents cannot access it until the active agent unlocks it. Unlike the optimistic approach, this approach prevents conflicts instead of detecting them, but it reduces concurrency, as a shared resource is accessed by only one agent at a time.

The frequency and severity of conflicts are the two major factors in deciding which approach to use. If conflicts are rare or the severity of a conflict is low, then the optimistic approach can be chosen. But if a conflict is painful for the user, then the pessimistic approach is the answer. Both approaches come with additional problems; locking, for example, can lead to deadlocks and livelocks.
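
A small sketch of optimistic concurrency control using a version number per record. This is only one common way to detect conflicts, and the class and exception names are hypothetical. It is a single-threaded illustration; a real store would have to make the version check and the write atomic.

```python
class ConflictError(Exception):
    """Raised when someone else changed the record since it was read."""


class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Returns (version, value); version 0 means "not written yet".
        return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            # Conflict detected: the caller must re-read, merge or cancel.
            raise ConflictError(f"{key} is at version {current_version}")
        self._data[key] = (current_version + 1, value)


store = VersionedStore()
version_a, _ = store.read("doc")   # agent A reads version 0
version_b, _ = store.read("doc")   # agent B reads version 0 too
store.write("doc", version_a, "text from A")   # A wins, version becomes 1
try:
    store.write("doc", version_b, "text from B")   # B's version is stale
    conflict = False
except ConflictError:
    conflict = True
```

A pessimistic variant would instead make agent B wait (or fail) at read time until agent A releases its lock.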

Transaction

A transaction is one of the primary techniques for concurrency control. A transaction is a sequence of work with well-defined start and end points, which begins and ends in a consistent state. Either all of the work in a transaction is carried out completely, or none of it is if one step fails (rollback). A transaction can be defined by the following four properties (ACID):

  • Atomicity: The transaction as a whole is an atomic process. Namely, if a step in the transaction fails, then all other steps are rolled back. A transaction finishes successfully with a commit statement.
  • Consistency: The system state must be consistent and noncorrupt at both the start and the completion of a transaction.
  • Isolation: The results of the internal steps in a transaction are not visible to other transactions until it finishes with a commit statement.
  • Durability: The commit statement must make the result of the transaction persistent.
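
Atomicity can be illustrated with Python's built-in sqlite3 module and an in-memory database. The account table, the CHECK constraint and the transfer function are made up for this sketch; the connection is used as a context manager, which commits on success and rolls back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; returns False if it was rolled back."""
    try:
        with conn:  # commit if the block succeeds, rollback on exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # This step fails the CHECK constraint if funds are insufficient,
            # and the credit above is rolled back together with it.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
```

Calling transfer(conn, 1, 2, 500) leaves both balances untouched, even though the first UPDATE had already run when the second one failed.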

Databases, message queues, ATMs and printers are sample transactional resources. A transaction should be as short as possible. If a transaction spans more than one request, then it is called a long transaction. And if a transaction's lifecycle is bound to a single request, then it is called a request transaction; in other words, it starts and finishes with the request. Another variation is the late transaction, which is opened only for the updates, with the reads done outside it. It does not prevent inconsistent reads.

Transaction Isolation Levels

Isolation levels are defined in terms of three factors:

  • Dirty Read: You are permitted to read uncommitted (dirty) data written by another transaction. Data integrity is compromised: foreign keys may be violated and unique keys ignored.
  • Non-Repeatable Read: If you read the same row at two different times, T1 and T2, within one transaction, you may get different data each time, because another transaction has updated the row in between.
  • Phantom Read: If you run the same query at time T1 and later at T2, the data in the rows you read before will be the same, but additional related rows may have been added to the table by another transaction.

The ISO SQL-92 standard defines four transaction isolation levels (from low to high):

  • Read Uncommitted
  • Read Committed
  • Repeatable Read
  • Serializable

Isolation Level     Dirty Read   Unrepeatable Read   Phantom
Read Uncommitted    YES          YES                 YES
Read Committed      NO           YES                 YES
Repeatable Read     NO           NO                  YES
Serializable        NO           NO                  NO


Chapter 6. Session State

A session in a distributed system can be either:


  • Stateless: The system does not retain state between requests. When a request invokes a method, no state from previous requests is known to the objects used by the method. By default, the HTTP protocol is stateless.
  • Stateful: The system stores or keeps track of state or information between requests.

A stateful system requires more resources, as each stateful object has to store all of its state. On the other hand, a stateless object can serve other requests too. But in real-life problems, we need to store state. Therefore, the best approach would be to store the state in a way that keeps the server itself stateless.

Session States

Session state is the state that is bound to a session and isolated from other concurrent sessions. The lifecycle of session state is limited to the session's, so if you want to keep state beyond the business transaction, it should be persisted on another medium such as a database.

Session state in a business transaction has to obey the fundamental rules of transactions (ACID) when the business transaction finishes. For example, during a business transaction, session state may be invalid or inconsistent, but before the commit, it has to be consistent with the rest of the data. A more important concern is the isolation between session states. Operations in a business transaction must not cause inconsistent data because of multiple concurrent reads and updates. Session state must be kept isolated from other sessions.

For performance reasons, some data can be stored in sessions as part of a cache mechanism between requests. But this data is not session state.

Methods to Store Session State

  • Client Session State: Storing data on the client side. The most common methods are encoding data in the URL, cookies or hidden form fields in HTML. Often this session data has to be converted to the right format on the server side. If the amount of data is large and it is sent frequently, this approach suffers from bandwidth problems. It also raises security and data integrity issues, unless the data is encrypted.
  • Server Session State: For example, storing data in the server's memory or, for further persistence, storing a serialized object on the filesystem or in a database table where the session id is the primary key and the serialized object is the value. For session migration (transferring a session to another server), session state has to be stored in a shared resource or memory. This approach is good when session state is continuously processed.
  • Database Session State: This is also server side, but the object's state is mapped to columns in a table for longer persistence. Special attention has to be paid to securing the isolation of session data in the database. In terms of performance, this approach is appropriate when session data is idle most of the time, for example in a public retail system.
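
As a sketch of client session state, the functions below pack the state into a token that could be carried in a cookie or a hidden form field, and sign it with an HMAC so the server can detect tampering. The secret key and function names are hypothetical. Note that signing only covers integrity: the state is still readable by the client, so confidential data would need encryption on top.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"hypothetical-server-secret"  # assumption: known only to the server


def encode_state(state):
    """Serialize the state and append a signature the client cannot forge."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def decode_state(token):
    """Verify the signature and convert the token back to server-side data."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("session state was tampered with")
    return json.loads(base64.urlsafe_b64decode(payload))


token = encode_state({"cart": [1, 2]})
```

The token grows with the state, which is the bandwidth problem mentioned above.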

Session data has to be cleared after some timeout or if the request is cancelled. With the client session state approach, this is not as big a concern as with the others. A timeout has to be put in place for server and database session state.
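
A minimal sketch of server session state kept in memory with an idle timeout. The class name, the use of uuid for session ids and the injectable clock are assumptions made for illustration; a real server would also need locking and a periodic sweep of expired sessions.

```python
import time
import uuid


class ServerSessionStore:
    """In-memory session store that drops sessions idle for too long."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock       # injectable so the timeout can be tested
        self._sessions = {}       # session id -> (last access time, state dict)

    def create(self):
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = (self._clock(), {})
        return session_id

    def get(self, session_id):
        """Return the state dict, or None if the session is unknown or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        last_access, state = entry
        if self._clock() - last_access > self._timeout:
            del self._sessions[session_id]   # clear expired session data
            return None
        self._sessions[session_id] = (self._clock(), state)  # refresh idle timer
        return state
```

Each successful get resets the idle timer, so only sessions that stay unused for longer than the timeout are cleared.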

These three approaches can be used together. But generally, server and client session state are the most used in practice. As pointed out above, if the data is small and not complex, client session state is a good candidate. If you need failover and clustering, and isolation between sessions is not a problem, then database session state can be used.