Spotlight: Performance Engineering and the Chaos Monkey

A key part of the performance engineering discipline is performance testing. A performance engineer can design and execute performance tests for almost any software or hardware component. Sometimes the greater challenge is finding or building the test harness that will be used to test the system. Collaborative’s performance engineering teams routinely lead testing projects in large Enterprise labs under controlled conditions. We have extended our performance engineering services to address stability, performance, and scalability in the Cloud, and we also use the Cloud for performance load generation.
Production systems in the Cloud need to be designed differently; see the Amazon guide on Architecture. Your performance engineering testing strategy must now be extended to deal with the Chaos Monkey, that is, with the unknown. For instance, while a performance test is running, what happens if a component slows down or simply disappears? How does the rest of the system handle this condition? Components are more likely to disappear in the Cloud than within Enterprise IT.
The Chaos Monkey was conceived and implemented by the Netflix team, who moved their entire production system to the Amazon Cloud. They use it to test how stable their applications are and how those applications react to degraded or missing technical components.
The Performance Engineering testing strategy must address the following:
· the technical components of the application and risk assigned to each component
· the workload profile for each test (the business transactions that will be executed during the test), and the workload intensity
· the size of the database and the data required to execute a test
· the categories of tests to be executed
· and now, stability within the Cloud
The strategy must now account for the new, more resilient application architecture: the performance engineering test plan must include managing components (for instance, stopping them or slowing them down), not just generating load.
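To make the idea of managing components during a test concrete, here is a minimal sketch in Python. It is a simulation only, not a real load tool or the Netflix Chaos Monkey: the `Component` class, its three states, the `run_test` function, and the 100 ms timeout are all assumptions made for illustration. Latency is simulated as a number rather than measured, so the result is repeatable.

```python
from collections import Counter
from dataclasses import dataclass


class ComponentDown(Exception):
    """Raised when a stopped component is called."""


@dataclass
class Component:
    """Hypothetical stand-in for one technical component under test."""
    name: str
    base_latency_ms: float = 20.0
    state: str = "healthy"      # "healthy", "slow", or "stopped"
    slow_factor: float = 10.0   # latency multiplier while "slow"

    def call(self) -> float:
        """Return the simulated latency in ms, or raise if stopped."""
        if self.state == "stopped":
            raise ComponentDown(self.name)
        if self.state == "slow":
            return self.base_latency_ms * self.slow_factor
        return self.base_latency_ms


def run_test(component, requests, faults, timeout_ms=100.0):
    """Send `requests` calls; `faults` maps a request index to a new state."""
    outcomes = Counter()
    for i in range(requests):
        if i in faults:
            # The test plan manages the component, not just the load.
            component.state = faults[i]
        try:
            latency = component.call()
        except ComponentDown:
            outcomes["failed"] += 1
            continue
        outcomes["ok" if latency <= timeout_ms else "timed_out"] += 1
    return outcomes


if __name__ == "__main__":
    pricing = Component("pricing-service")  # hypothetical component name
    # Slow the component at request 300, stop it at 600, restore it at 800.
    fault_schedule = {300: "slow", 600: "stopped", 800: "healthy"}
    print(run_test(pricing, 1000, fault_schedule))
```

With this schedule, 500 requests succeed, 300 exceed the timeout, and 200 fail outright. In a real test, the interesting measurement is how the callers of the component behave during those windows, which this sketch does not model.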

From components to system of systems
Your testing strategy can range from isolating particular components or subcomponents of the system under test to exercising an entire application. At the component level, the target might be a single web service that only performs calculations, or a web service that accesses a database; the test harness can hit the web service directly and bypass the client front-end. At the application level, the strategy can cover all the representative user profiles, with the test harness exercising the client front-end system.
Your strategy may also need to include testing a system of systems. For example, placing and executing a trade on a brokerage platform requires a number of applications to work together, as does processing a prescription.
A solid performance engineering test strategy defines a series of incremental tests, each one a building block toward large-scale system testing. If you are moving to the Cloud, you must now include Chaos Monkey testing as part of that strategy. This will help identify and then manage the risks associated with loosely coupled, Cloud-based applications.
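A component-level harness that bypasses the front-end can be quite small. The sketch below, in Python, times direct calls to a single target and summarizes the latencies. The `calculate_quote` function is a hypothetical stand-in for a calculation-only web service; in a real harness the target would be a function that issues the request to your service. The names `measure` and `summarize` are also assumptions for illustration.

```python
import statistics
import time


def calculate_quote(quantity: int, unit_price: float) -> float:
    """Hypothetical stand-in for a calculation-only web service."""
    return quantity * unit_price


def measure(target, args_list):
    """Call `target` once per argument tuple; return latencies in ms."""
    latencies = []
    for args in args_list:
        start = time.perf_counter()
        target(*args)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Return the sample count, median, and 95th percentile."""
    # n=20 yields 19 cut points (5%, 10%, ..., 95%); the last is p95.
    cuts = statistics.quantiles(latencies, n=20)
    return {
        "count": len(latencies),
        "p50_ms": statistics.median(latencies),
        "p95_ms": cuts[18],
    }


if __name__ == "__main__":
    workload = [(quantity, 9.99) for quantity in range(1, 201)]
    print(summarize(measure(calculate_quote, workload)))
```

Reporting percentiles rather than an average is a deliberate choice here: a slow or degraded component tends to show up in the tail of the distribution first.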
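One way to picture the incremental approach is as an ordered list of test stages, where a later stage only runs once the earlier ones have passed. The Python sketch below is an assumption about how such a sequence could be driven, not a prescribed process; the stage names are hypothetical, and each check is a placeholder for a real test run.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_stages(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order; skip every stage after the first failure."""
    results = []
    blocked = False
    for name, check in stages:
        if blocked:
            results.append((name, "skipped"))
            continue
        passed = check()
        results.append((name, "passed" if passed else "failed"))
        blocked = not passed
    return results


if __name__ == "__main__":
    # Placeholder checks; each would launch a real test and judge its result.
    plan = [
        ("single web service", lambda: True),
        ("full application via front-end", lambda: True),
        ("system of systems", lambda: True),
        ("chaos: component stopped under load", lambda: True),
    ]
    for name, status in run_stages(plan):
        print(f"{name}: {status}")
```

Placing the chaos stage last reflects the building-block idea: there is little value in removing components from a system whose individual services have not yet passed their own tests.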

