
On November 20th, Octo Technology hosted a very smart breakfast event (almost a mini-USI) on the topic of the Web Giants (Les Géants du Web).

A topic the Octos have covered many times on their blog and to which they have devoted a remarkable collective book, available on Amazon or as a free download (with registration). In the great Octo tradition, every attendee of this breakfast left with a copy.

In a nutshell, today's major web players (Amazon, Facebook, Twitter, Google, etc.) have managed to free themselves from the dogmas of the past and approach subjects with fresh eyes, bringing "new, radical, effective solutions to old IT problems", to quote the introduction of the book.

(Editor's note: they have also found new solutions for taking a few liberties with taxation, but that is not the subject here.)

A presentation of the topic was followed by a round table with executives from French companies. Fabien Chazot (Meetic), Jean-Marc Potdevin (Viadeo, who signs the book's preface), Ismaël Héry (LeMonde.fr) and Stéphane Priolet (CDiscount) answered the questions of moderator Eric Biernat and shared their hands-on experience with these practices. A particularly enriching morning, summarized one click away from here…


As the USI conference prepares to celebrate its fifth anniversary with the 2012 edition, #hypertextual takes the opportunity to talk with the person behind this information systems industry event, a singular one on this side of the Atlantic in that geeks and bosses are treated as equals there.

François Hisquin founded Octo (the company behind USI) in 1998. Octo is a "Cabinet d'Architectes en Systèmes d'Information" (an information systems architecture firm): François insists on this point, as he really wanted to set his young company apart from the French IT services companies (SSII), a label "that bothered me", as he readily concedes.

The firm has placed the well-being of its consultants at the heart of its culture. What may sound like HR or marketing bullshit to a few grumblers is a reality. Octo walks the talk and aligns its actions with its principles, with a tangible result: the company has ranked first in the Great Place to Work list for the last two years.

The collective book written by various Octos (as our interviewee affectionately calls them), Partageons ce qui nous départage, is an excellent distillation of this culture. An approach at once irreverent ("Clients are like children: it's not because you tell them no that they stop loving you", heard at USI 2008), passionate and sparkling, which has adopted and made its own a set of methodological tools (Jim McCarthy's core protocols, the ROTI, agile methods…).

We also learn in this enlightening book that the company would rather pull its consultants out of profitable engagements than let them wither there if they are unhappy: our consultants first, clients second, echoing Vineet Nayar's famous maxim.

What comes out of this interview is particularly instructive, because this approach, which may seem iconoclastic at first glance, serves their goal wonderfully: the pursuit of excellence. François Hisquin adds another name to the #hypertextual interview roster and explains how, here and now…

In the previous episodes of this series, we've addressed the availability, scalability and performance aspects of HA architecture. In this one we'll concentrate on the future of these architectures and the emerging technologies that tackle specific HA constraints.

Technology Trends

The future is a G word: GRID. Grids of memory, grids of CPU and grids of disks. The main limitation today is the one-to-one relationship between an application and the physical server on which it is deployed. Hence the main market trend today: virtualizing servers through network-accessed processing, memory and storage.

Network Accessed Processing

This is the grid of CPUs. Azul Systems offers a sort of Java mainframe: a box containing 768 CPUs and 700 GB of memory. Applications are deployed on blades as usual, but these blades contain a proxy to the Azul box: whenever CPU processing is required, the blade proxy hands it over to the Azul box, which is configured to allocate a certain number of CPUs to that very app.

WebSphere XD also offers new possibilities for CPU and server virtualization.

Network Accessed Memory

Terracotta offers a solution for network-accessed memory: a server managing objects that live in network memory. Different applications running on different JVMs and different servers can thus share the same instance of a given object. Client applications just need to import the Terracotta client libraries and define, in an XML descriptor file, the objects and attributes to be shared, and that's it!
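
As an illustration, here is a minimal sketch of what this looks like on the Java side; the class, field and method names are hypothetical, and the actual sharing (which fields are roots, which synchronized blocks are cluster-wide locks) is declared in the Terracotta XML descriptor rather than in code:

    import java.util.HashMap;
    import java.util.Map;

    public class SharedCatalog {

        // Declared as a shared "root" in the Terracotta XML descriptor (name is illustrative):
        // every JVM declaring the same root ends up working on the same instance.
        private static final Map<String, String> CATALOG = new HashMap<String, String>();

        public void publish(String productId, String label) {
            // The synchronized block is declared as a lock in the descriptor, so the
            // change made here is propagated to the other JVMs sharing the root.
            synchronized (CATALOG) {
                CATALOG.put(productId, label);
            }
        }

        public String lookup(String productId) {
            synchronized (CATALOG) {
                return CATALOG.get(productId);
            }
        }
    }

The application code stays plain Java; the clustering concern lives entirely in configuration.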

The main issue with the open source version is its 2-minute start-up time, which would make the Terracotta server a major Single Point Of Failure in the system.

Network Accessed Storage

A SAN (Storage Area Network) offers a very robust and efficient solution for network storage. Communication is Fibre Channel based and therefore very fast… but very expensive. SANs are already in common use and have paved the way for the two other network-accessed services above.

In the previous posts of this series we've addressed the availability and scalability aspects of high availability systems. In this (rather lengthy) one we'll focus on the performance side of things.

Again, performance is something to be defined in context quite early in the project. For instance, a requirement such as "3 s response time" is not precise enough; "3 s response time with 200 simultaneous users" is a valid requirement.

History

Performance has been a common issue for the last decade or so, with the emergence of multi-tier IT systems. It was not such a problem in the past, or rather it was addressed as the core issue and fixed once and for all during the mainframe years (1965-1989). It was also largely skipped during the client-server years (the 90s), as no real software architecture was in place: back then, nobody would react if SQL code was found in the presentation layer.

Nowadays, the standard 5-layer software architecture (Client, Application, Business, Integration and Resources) has naturally emerged as the de facto solution. The basic principle is to ensure isolation between layers, giving the software architecture greater modularity and robustness. With the considerable literature available, software architecture constraints now carry much more weight in development, and that's an excellent thing.

However, such a complex software architecture requires extreme caution from the very early stages of architecture and design, as a laid-back approach can prove quite expensive in terms of performance.

Usual Suspects

From experience, when it comes to performance, the usual suspects in this 5-layer architecture are:

  1. Integration tier
  2. Database tier: execution plans, indexes, table schemas, DBA involvement early in the process
  3. Business and Application tier: algorithms, transactions, resource connections and APIs

(Figure: performance hotspots across the software layers)

Integration tier

Java enterprise frameworks (Hibernate, Entity Beans) have been developed to abstract data persistence away from the application developer. On one hand, that's a good thing: application developers can focus on their own business problems and forget about SQL.

On the other hand, usage of these frameworks may not be optimised, and application developers can lose track of what they actually do: the SQL issued, the number of objects kept in memory, etc. The solution is to keep fine-grained control over these frameworks and over the SQL that is actually generated and executed behind the scenes. Close collaboration with an experienced DBA (Database Administrator) is strongly recommended here, from the early steps of design.
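
As a sketch of what that fine-grained control can look like with Hibernate (the CustomerOrder entity and its fields are hypothetical; hibernate.show_sql and hibernate.format_sql are standard Hibernate settings), the idea is to log the generated SQL for the DBA and to fetch only the columns actually needed instead of whole entities:

    import java.util.List;

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.Configuration;

    public class OrderDao {

        private final SessionFactory sessionFactory = new Configuration()
                .configure()                                  // reads hibernate.cfg.xml
                .setProperty("hibernate.show_sql", "true")    // print the generated SQL
                .setProperty("hibernate.format_sql", "true")  // make it readable for the DBA
                .buildSessionFactory();

        /** Fetch only the two columns we need instead of loading full entities. */
        public List<Object[]> pendingOrderSummaries() {
            Session session = sessionFactory.openSession();
            try {
                return session
                        .createQuery("select o.id, o.total from CustomerOrder o where o.status = :s")
                        .setParameter("s", "PENDING")
                        .list();
            } finally {
                session.close();
            }
        }
    }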

For the record, it's worth mentioning that internet giants such as Amazon simply don't use such frameworks, precisely so as not to lose control over a piece of processing that is critical in terms of performance. Read this article on that very subject.

Database tier

Another drawback is that, with SQL no longer at the centre of development concerns, we may end up with applications that are not really optimised from the database perspective: the database schema and query execution plans are inappropriate and inefficient.

Here again, it is recommended to have a DBA involved from the very early stages of the project (architecture and design) to validate the database schema and suggest execution plans that optimise database usage.
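
For instance, a simple way to let the DBA see the plan the database actually chooses for a candidate query is to run it through EXPLAIN PLAN. This sketch assumes an Oracle database (other engines have their own EXPLAIN variants):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PlanChecker {

        /** Print the execution plan Oracle chooses for a candidate query. */
        public static void explain(Connection conn, String sql) throws Exception {
            Statement stmt = conn.createStatement();
            try {
                stmt.execute("EXPLAIN PLAN FOR " + sql);
                ResultSet rs = stmt.executeQuery("SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)");
                while (rs.next()) {
                    System.out.println(rs.getString(1)); // one line of the formatted plan
                }
            } finally {
                stmt.close();
            }
        }
    }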

Stored Procedures

At first sight, stored procedures may be considered antiquities inherited from the client/server years, sounding like heretical software components from the 5-layer perspective.

However, from a pure performance perspective they are still top-drawer solutions. Handling the data on the database server saves a lot of application time and network traffic.

The aim here is obviously not to develop the whole business layer as stored procedures but, rather, to optimise a few heavy processes. Besides, it proves to be an excellent antidote to the "DB request machine-gun" anti-pattern.
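
As a minimal sketch, calling such a procedure from Java is a one-round-trip operation via JDBC's CallableStatement; the procedure name and parameters below are hypothetical:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.Types;

    public class InvoiceBatch {

        /**
         * Let the database aggregate a whole month server-side instead of issuing
         * one query per invoice from the application ("DB request machine-gun").
         */
        public static int closeMonth(Connection conn, int year, int month) throws Exception {
            CallableStatement call = conn.prepareCall("{call close_invoices(?, ?, ?)}");
            try {
                call.setInt(1, year);
                call.setInt(2, month);
                call.registerOutParameter(3, Types.INTEGER); // number of invoices closed
                call.execute();
                return call.getInt(3);
            } finally {
                call.close();
            }
        }
    }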

Application and business tier

There are a few standard Java dos and don'ts to improve algorithm path length and CPU/memory usage.

The most important ones, though, concern sessions, transactions, APIs and resource connections:

  1. Avoid stateful components whenever possible (stateful session EJBs are performance killers). Whenever handling sessions (e.g. HTTP sessions), make sure they don't contain too many objects.
  2. Transactions must be kept as short as possible.
  3. Resource connections are a very precious asset. Their use should be limited to the actual need and kept as short as possible.
  4. Whenever developing a service API, always provide both a unitary treatment service (i.e. perform one action for one given item) and a multiple treatment interface (i.e. execute one request with multiple inputs and return multiple replies), as sketched after this list. This basic principle saves a lot of time by avoiding machine-gun requests to the service.
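
A minimal sketch of point 4 (the service and its types are hypothetical): the bulk variant lets callers price a whole basket in one round trip instead of looping over the unitary call:

    import java.util.List;
    import java.util.Map;

    /** Service exposing both a unitary and a bulk variant of the same operation. */
    public interface PriceService {

        /** Unitary treatment: one price for one product. */
        PriceQuote quoteFor(String productId);

        /** Multiple treatment: one round trip for a whole basket of products. */
        Map<String, PriceQuote> quotesFor(List<String> productIds);
    }

    class PriceQuote {
        final String productId;
        final double amount;

        PriceQuote(String productId, double amount) {
            this.productId = productId;
            this.amount = amount;
        }
    }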

Lastly, when building a new enterprise system on a platform such as JEE or .NET, we need to keep in mind that these were made for real-time processing, not for long-running processes. So if a piece of the application takes time (i.e. above 5 seconds), it must not be deployed on the application server: it will monopolise precious resources that cannot serve real-time requests in the meantime. Such a component should be deployed as a stand-alone server instead.

The Asynchronous trick

This is a great load-balancing tool. When addressing a long-running process, it is always worth considering deploying it as an asynchronous service: a very natural and elegant load-balancing mechanism that does not impact the QoS of real-time services.

The biggest problem here is to convince the business to accept an asynchronous solution for rather peripheral services (e.g. reporting).

JMS is the natural solution when developing asynchronous services in the Java world. One still has to be careful when choosing a JMS implementation.
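
As a sketch (the JNDI names and the queue are deployment-specific assumptions), a real-time component hands the long-running work over to a queue and returns immediately; a separate worker consumes the queue at its own pace:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.DeliveryMode;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.naming.InitialContext;

    public class ReportRequester {

        /** Hand a long-running report request over to an asynchronous worker via JMS. */
        public static void requestReport(String reportId) throws Exception {
            InitialContext ctx = new InitialContext();
            ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
            Queue queue = (Queue) ctx.lookup("jms/ReportRequests");

            Connection connection = factory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                // Persistent delivery so the request survives a broker restart.
                producer.setDeliveryMode(DeliveryMode.PERSISTENT);

                TextMessage message = session.createTextMessage(reportId);
                producer.send(message);
            } finally {
                connection.close();
            }
        }
    }

The persistent delivery mode is exactly why the choice of provider matters, as discussed next.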

A basic rule of thumb: whenever you're dealing with transactional messages with high availability and robustness concerns, rule out JMS implementations without a persistence mechanism. JBoss JMS, for instance, is not as mature as the JBoss servlet or EJB implementations. Today the best JMS solution on the market is SonicMQ.

In the previous post of this series we've seen how to increase the availability of your IT system. In this one we'll focus on how to scale it.

Don’t believe the RAC !

You may be tempted to think: the bottleneck of any enterprise application being I/O, and therefore the database, I'll start by clustering that layer. Especially if you have opened the door to Oracle sales representatives trying to sell you their (admittedly excellent) Oracle RAC.

Well, actually the best way to cluster your infrastructure is not bottom-up but top-down. So the order in which to scale your 5 standard software layers (client, application, business, integration, database) is as follows:

  1. client/web layer
  2. application / business layer
  3. database layer

There is an excellent (and free) IBM Redbook describing the path to a high availability system, and this is the approach it recommends. Big Blue has been developing such HA solutions for about 40 years; you can trust them on this issue.

Scaling Dimensions

Okay, now that we know where to start with clustering, a question remains: which type of clustering should we use? There are two:

  1. Symmetric Multi-Processing (SMP, scale up, vertical): upgrading one single server with more CPUs and more memory
  2. Massively Parallel Processing (MPP, scale out, horizontal): installing multiple servers in parallel

Roughly speaking, scaling up is recommended for data-centric applications. This is because you don't really want file lock management to be carried out and synchronised across multiple servers. So if you need to scale your database, your ERP or your CRM, scale it up.

On the other hand, scaling out is recommended for web servers. In that case, the goal is to share the workload of unrelated requests. Google is a perfect example: many simultaneous, unrelated requests are processed at the same time, so many different servers behind an upfront load balancer will do the trick.

There is a third, mixed scalability approach: scaling both up (bigger servers) and out (more of them). It perfectly addresses the scalability constraints of application servers. However, do yourself a favor and think twice before clustering EJBs.

So we know we'll start by scaling our web servers horizontally, then mixed-scale our application servers, and lastly scale up our database servers. Now let's see how we should implement this clustering.

Cluster modes

There are two ways to set up clusters:

  1. Active/Passive: one server is up and running with all traffic directed to it, while the other one is idle, ready to take over if the first server has an outage.
  2. Active/Active: all servers are up and running, sharing the workload.

Active/Passive is the simplest approach and the favorite of operations teams: it is easier to manage. On the other hand it is expensive, as half of the CPU power is paid for while doing nothing most of the time. Besides, when considering such a solution, the transparent failover and switch-over time issues need to be addressed. In any case, this Active/Passive setup is the recommended approach for asynchronous servers such as JMS.

Active/Active is the most complicated from the operations team's perspective. However, it optimises the investment and the paid-CPU/used-CPU ratio. A margin must still be kept, though, so that the remaining servers can absorb the load of any server that stops. It is the recommended approach for application servers.

Database Clustering

So you have scaled up your database onto the biggest server on the face of the earth, but it still cannot cope with your traffic: you need to set up a database cluster.

There are four clustering solutions available to you:

  • Shared nothing: each server has its own tables and memory. This is also called partitioning. You need to make sure that there are no joins between tables and that the tables are completely independent from one another; otherwise you may end up doing 2 I/Os on 2 different servers for a single call. This is the DB2 approach.
  • Redundant: each server has a full copy of the database. This approach implies that every change in one copy of the database is propagated to the other servers, which may prove quite cumbersome. This is the MySQL approach.
  • Shared disk: all the data resides on a shared drive (such as a SAN, Storage Area Network) and all servers access this disk in read mode while only one accesses it in write mode. This is the Sybase clustering solution.
  • Shared disk and memory: this is the purest version of clustering; only Oracle RAC offers this feature. Servers are just CPU boxes sharing network memory and network storage. It is a complete failover solution, but extremely complicated to operate and very expensive.

In the next post we’ll concentrate on the performance of HA architectures.

I've been lucky enough to attend a 2-day High Availability Architecture training course given by Médéric Morel from the French IT company SQLI. Médéric has also contributed to an excellent book on the subject (sorry, it's in French). Let me put it plainly: one of the best IT training sessions I've ever had. Based on what I've learned during these 2 days, this post series aims to introduce the following aspects of HA architecture:

  1. Availability
  2. Scalability
  3. Performances
  4. Technology Trends

The basic principle when designing this type of architecture: there is no single definitive good solution. There are just good practices that are appropriate for your business and budget. It's up to you to choose among the existing possibilities and strike the right balance between development/operability costs and business needs.

Only the Availability subject is addressed in this post. Sequels will be published to address the remaining ones.

Availability

It is critical to define the availability target (i.e. the proportion of time the system is available to the user) at the very early stages of the project. This really is a business input that IT architects need to address, bearing in mind that the cost/availability curve is roughly linear up to the 99.99% range, at which point cost explodes as you try to squeeze out the last thousandth.

The main issue with multi-tier applications is that, by piling up servers, the computed availability of the whole system decreases. The overall availability ratio is the product of all component availabilities. For instance, a platform with 2 servers each having a 99% availability rate has an overall availability of 0.99 × 0.99 ≈ 0.98, i.e. 98%.

The natural response would then be to get the checkbook out and set up servers in parallel to ensure high availability. But there are wiser things to do first.
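
To make the arithmetic concrete, here is a minimal sketch of both calculations, using the figures from the example above (the parallel formula is idealised and assumes perfect failover):

    public class Availability {

        // Serial chain: the platform is down as soon as any tier is down.
        static double serial(double... tiers) {
            double availability = 1.0;
            for (double tier : tiers) {
                availability *= tier;
            }
            return availability;
        }

        // n identical servers in parallel: the service is down only if all of them
        // are down at the same time (idealised, assuming transparent failover).
        static double parallel(double serverAvailability, int n) {
            return 1.0 - Math.pow(1.0 - serverAvailability, n);
        }

        public static void main(String[] args) {
            System.out.println(serial(0.99, 0.99)); // 2 tiers at 99% -> ~0.98, i.e. 98%
            System.out.println(parallel(0.99, 2));  // 2 parallel 99% servers -> 0.9999, i.e. 99.99%
        }
    }

This is exactly why the checkbook reflex is so tempting: two 99% servers in parallel jump to 99.99%. But as the figures below show, most outages don't come from hardware in the first place.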

Sequence of actions

Fact: only 20% of system outages come from hardware problems, while 40% come from human mistakes (operations teams) and the remaining 40% from application problems. As a result, the recommended course of action is as follows:

  1. Improve integration and operations practices
  2. Improve monitoring
  3. Scale the system
  4. Set up a disaster recovery procedure

The first action to improve system availability is to set up standards and normalise processes at all levels of the operations teams, in order to avoid the human mistakes that cause outages, and to improve detection, diagnosis and reactivity during incidents so as to reduce the time during which customers can't access the service. Hence the success of ITIL methods.

The second action is to closely monitor the system at the five defined monitoring levels: availability, response time, application details, business details and user experience.

There are many solutions available that address these monitoring requirements at each level. For most non-critical applications, a good old status page, like Apache's, pinged by a robot on a regular basis, is quite often good enough.
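
A minimal sketch of such a robot (the URL, the 3-second timeout and the 1-minute period are arbitrary choices), which simply checks that the status page answers 200 OK:

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class StatusRobot {

        /** Return true if the status page answers 200 OK within 3 seconds. */
        public static boolean isUp(String statusUrl) {
            try {
                HttpURLConnection conn = (HttpURLConnection) new URL(statusUrl).openConnection();
                conn.setConnectTimeout(3000);
                conn.setReadTimeout(3000);
                conn.setRequestMethod("GET");
                return conn.getResponseCode() == 200;
            } catch (Exception e) {
                return false;
            }
        }

        public static void main(String[] args) throws Exception {
            // Placeholder URL; in practice schedule this check with cron or a timer,
            // and raise an alert instead of just printing.
            while (true) {
                System.out.println(isUp("http://example.com/server-status") ? "UP" : "DOWN");
                Thread.sleep(60000);
            }
        }
    }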

Now that you know how to improve your system's availability, the next post will focus on how to scale it.
