New technology – JanWiersma.com

The tale of Services vs. Cloud product organizations

As companies transition their product delivery methodology from on-premise software to an as-a-service (PaaS/SaaS) model, they are confronted with very different motions across their Sales, Marketing, Development, Services and Support organizations.

One example of how differently ‘execution’ is done in these models is the way Services and Product are managed across the organization. For larger on-premise software companies it is not uncommon to see a ratio of Professional Services (PS) bookings to software bookings of >3, meaning that customers pay more for implementation and assistance in managing the software than for the software itself.

The Cloud delivery model has a very different PS dynamic, as Waterstone notes in its 2015 report, Changing Professional Services Economics:

“There is growing preference for Cloud- and SaaS-based solutions that, on average, have a PS attach rate around 0.5x to 1.0X (versus the 2.9x PS attach rate commonly seen with traditional licensed products).”

That finding is not surprising, as Cloud is all about low-friction onboarding through self-service and automation. This means getting the human out of the equation, as the human is a limiting factor in scalability and raises cost.

Cloud is all about minimizing the time from idea to revenue, while being able to scale rapidly and keep cost low.

The definition of ‘the product’ in a Cloud world therefore isn’t only about the bits & bytes, but includes successful onboarding of the customer and maximizing their usage.


The problem with the Docker hype

Remember when the Cloud hype kicked off and we all looked mesmerised at the Cloud Unicorn companies (like Netflix) that got great benefit from Cloud usage? We all wanted that so badly. We wanted to get out of the pain of high maintenance cost and the lack of agility. Amazon, Google, Microsoft Azure all seemed to provide that. Just by the click of a button.

In 2012 I wrote a short whitepaper on what it takes to move an application to the cloud, based on my painful experience with some Enterprise IT moves to the cloud. I stated, “The idea that one can just move applications without change is flawed.” There was not enough benefit in moving the monolithic, 10-year-old application to the cloud. That type of move may deliver small cost savings, but it is really a hosting exercise. It could even be dangerous to move that way, because the application may not be suitable for the cloud provider’s reference architecture. That could, for example, lead to availability and performance issues. The unicorn benefits could only be gained if you changed your way of working and thinking.

The same goes for the Docker hype now;

I agree with all the potential that Docker unlocks: portability & abstraction. It is a game changer and some even say ‘Docker changes everything’.

Hearing people talk about Docker at large conferences (like AWS re:Invent) feels like the first phase of the Cloud hype all over again. They state ‘Just docker-ize your app’ and all will be great. I have had surreal conversations with people who try to put anything and everything in a container: ‘Yes, just put that big monolithic app in a container.’

People seem to forget that Docker is an enabler for architecture elements like portability and microservices (which in turn lead to scalability).

I highly recommend reading James Lewis & Martin Fowler’s article on microservices first: http://martinfowler.com/articles/microservices.html

Then see this:

Because what is the problem with the Docker hype currently? It makes it about the tool. And the tool alone will not fix your problem.

Other things to consider around Docker: Docker Misconceptions

Where is the rack density trend going ?…

When debating capacity management in the datacenter the amount of watts consumed per rack is always a hot topic.

Years ago we could get away with building datacenters that supported 1 or 2 kW per rack in cooling and energy supply. In the last few years, demand for racks of around 5-7 kW seems to have become the standard. Five years ago I witnessed the introduction of blade servers first hand. This generated much debate in the datacenter industry, with some claiming we would all have 20+ kW racks in 5 years. This never happened… well, at least not on a massive scale…

So what is the trend in energy consumption on a rack basis?

Readers of my Dutch datacenter blog know I have been watching and trending energy development in the server and storage industry for a long time. To update my trend analysis I wanted to start with a consumption trend for the last 10 years. I could use the hardware specs found on the internet for servers, but most published energy consumption values are ‘name-plate ratings’. Green Grid’s whitepaper #23 states correctly:

Regardless of how they are chosen, nameplate values are generally accepted as representing power consumption levels that exceed actual power consumption of equipment under normal usage. Therefore, these over-inflated values do not support realistic power prediction.

I have known HP’s ProLiant portfolio for a long time and have successfully used their HP Power Calculator tools (now: HP Power Advisor). They display the nameplate values as well as the power used at different utilizations, and I know from experience these values are pretty accurate. So that seems as good a starting point as any…

I decided to go for 3 form factors:

1U servers (or pizzabox servers)
Blade servers. Selecting basic x86 systems
Density Optimized servers. These can be defined as;

..minimalist server designs that resemble blades in that they have skinny form factors but they take out all the extra stuff that hyperscale Web companies like Google and Amazon don’t want in their infrastructure machines because they have resiliency and scale built into their software stack and have redundant hardware and data throughout their clusters….These density-optimized machines usually put four server nodes in a 2U rack chassis or sometimes up to a dozen nodes in a 4U chassis and have processors, memory, a few disks, and some network ports and nothing else per node.[They may include low-power microprocessors]

For the 1U server I selected the HP DL360, a well-known mainstream ‘pizzabox’ server. For the blade servers I selected the HP BL20p (p-class) and HP BL460c (c-class). The Density Optimized server could only be the recently introduced (5U) HP Moonshot.

Server configuration guidelines:

Single power supply (no redundancy) and platinum rated when available.
No additional NICs or other modules.
Always selecting the power optimized CPU and memory options when available.
Always selecting the smallest disk. SSD when available.
Blade server enclosures:

Pass-through devices, no active SAN/LAN switches in the enclosures
No redundancy and onboard management devices.
C7000 for c-class servers
Converted the blade chassis power consumption, fully loaded with the calculated server, back to power per 1U.

Used the ‘released’ date of the server type found in the Quickspec documentation.
Collected data of server utilization at 100%, 80%, 50%. All converted to the usage at 1U for trend analysis.
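
The conversion from a fully loaded blade chassis back to power per 1U is plain division. A minimal sketch, assuming a 10U enclosure height for the C7000; the chassis wattage used here is an illustrative number, not a measurement, and the function name is made up for this example:

```python
def watts_per_rack_unit(chassis_watts: float, chassis_height_u: int) -> float:
    """Spread the power draw of a fully loaded enclosure evenly over its rack units."""
    if chassis_height_u <= 0:
        raise ValueError("chassis height must be a positive number of rack units")
    return chassis_watts / chassis_height_u


# Hypothetical input: a fully loaded 10U blade enclosure drawing 3976 W.
per_u = watts_per_rack_unit(3976.0, 10)
print(f"{per_u:.1f} W per 1U")
```

Normalizing everything to 1U is what makes a blade enclosure comparable with a pizzabox server in the same trend line.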

This resulted in the following table:

Server type	Year	CPU Core count	CPU type	RAM (GB)	HD (GB)	100% Util (Watt for 1U)	80% Util (Watt for 1U)	50% Util (Watt for 1U)
HP BL20p	2002	1	2x Intel PIII	4	2x 36	328.00
HP DL360	2003	1	2x Intel PII	4	2x 18	176.00
HP DL360G3	2004	1	2x Intel Xeon 2.8 GHz	8	2x 36	360.00
HP BL20pG4	2006	1	2x Intel Xeon 5110	8	2x 36	400.00
HP BL460c G1	2006	4	2x Intel L5320	8	2x 36	397.60	368.80	325.90
HP DL360G5	2008	2	2x Intel L5240	8	2x 36	238.00	226.00	208.00
HP BL460c G5	2009	4	2x Intel L5430	8	2x 36	368.40	334.40	283.80
HP DL360G7	2011	4	2x Intel L5630	8	2x 60 SSD	157.00	145.00	128.00
HP BL460c G7	2011	6	2x Intel L5640	8	2x 120 SSD	354.40	323.90	278.40
HP BL460c Gen8	2012	6	2x Intel 2630L	8	2x 100 SSD	271.20	239.10	190.60
HP DL360e Gen8*	2013	6	2x Intel 2430L	8	2x 100 SSD	170.00	146.00	113.00
HP DL360p Gen8*	2013	6	2x Intel 2630L	8	2x 100 SSD	252.00	212.00	153.00
HP Moonshot	2013	2	Intel Atom S1260	8	1x 500	177.20	172.40	165.20

* HP split the DL360 into a stripped-down version (the ‘e’) and an extended version (the ‘p’)

And a nice graph:

The graph shows an interesting peak around 2004-2006. After that the power consumption declined. This is mostly due to power optimized CPU and memory modules. The introduction of Solid State Disks (SSD) is also a big contributor.

Obviously people will argue that:

the performance for most systems is task specific
and blades provide higher density (more CPU cores) per rack,
and some systems provide more performance and maybe more performance/Watt,
etc…

Well; datacenter facility guys couldn’t care less about those arguments. For them it’s about the power per 1U or the power per rack and its trend.
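
To put the per-1U numbers in rack terms, the arithmetic is just multiplication. A small sketch, assuming a rack with 40 usable U filled entirely with one server type and using the 100% utilization values from the table above (the function name is only for illustration):

```python
# Watt per 1U at 100% utilization, taken from the table above.
WATT_PER_U = {
    "HP DL360e Gen8": 170.0,
    "HP BL460c Gen8": 271.2,
    "HP Moonshot": 177.2,
}


def rack_kw(watt_per_u: float, usable_u: int = 40) -> float:
    """Total rack power in kW when every usable rack unit draws watt_per_u."""
    return watt_per_u * usable_u / 1000.0


for server, watts in WATT_PER_U.items():
    print(f"{server}: {rack_kw(watts):.2f} kW per 40U rack")
```

This ignores switches, PDUs and empty slots, so real racks will land somewhat differently; it is the trend that matters for the facility.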

With a depreciation time of 10-15 years on facility equipment, the datacenter needs to support many IT refresh cycles. IT guys getting faster CPUs, more memory and bigger disks is really nice, and it’s even better if the performance/watt ratio is great… but if the overall rack density goes up, then facilities needs to support it.

To provide more perspective on the CPU density per rack, I plotted the number of CPU cores in a filled 40U rack against the total power at 40U:

Still impressive numbers: between 240 and 720 CPU cores in 40U of modern day equipment.

Next I wanted to test my hypothesis, so I looked at an active 10,000+ server deployment consisting of 1-10 year old servers from Dell/IBM/HP/SuperMicro. I ranked them in age groups 2003-2013 and sorted them by form factor: 1U Rackmount, Blades and Density Optimized. I selected systems with roughly the same hardware config (2 CPU, 2 HD, 8GB RAM). For most age groups the actual power consumption (@ 100, 80, 50%) seemed off by 10%-15%, but the trend remained the same, especially among form factors.

It also confirmed that after the drop, due to energy optimized components and SSD, the power consumption per U is now rising slightly again.

Density in general seemed to rise with lots more CPU cores per rack, but at a higher power consumption cost on a per rack basis.

Let’s take out the crystal ball

The price of compute & storage continues to drop, especially if you look at Amazon and Google.

Google and Microsoft have consistently been dropping prices over the past several months. In November, Google dropped storage prices by 20 percent.

For AWS, the price drops are consistent with its strategy. AWS believes it can use its scale, purchasing power and deeper efficiencies in the management of its infrastructure to continue dropping prices. [Techcrunch]

If you follow Jevons Paradox then this will lead to more compute and storage consumption.

All this compute and storage capacity still needs to be provisioned in datacenters around the world. The last time IT experienced growth pain at the intersection between IT & Facility, it accelerated the development of blade servers to optimize the physical space used (that was a bad cure for some… but that is beside the point now). The current rapid growth has accelerated the development of Density Optimized servers, which strike a better balance between performance, physical space and energy usage. All major vendors and projects like Open Compute are working on this, and the segment showed 66.4% year-over-year revenue growth in 4Q12.

Blades also continue to gain market share, and they now account for 16.3% of total server revenue:

"Both types of modular form factors outperformed the overall server market, indicating customers are increasingly favoring specialization in their server designs" said Jed Scaramella, IDC research manager, Enterprise Servers "Density Optimized servers were positively impacted by the growth of service providers in the market. In addition to HPC, Cloud and IT service providers favor the highly efficient and scalable design of Density Optimized servers. Blade servers are being leveraged in enterprises’ virtualized and private cloud environments. IDC is observing an increased interest from the market for converged systems, which use blades as the building block. Enterprise IT organizations are viewing converged systems as a method to simplify management and increase their time to value." [IDC]

With cloud providers going for Density Optimized and enterprise IT for blade servers, the market is clearly moving to optimizing rack space. We will see a steady rise in demand for kW/rack with Density Optimized already at 8-10kW/rack and blades 12-16kW/rack (@ 46U).

There will still be room in the market for the ‘normal’ rackmount server like the 1U, but the 2012 and 2013 models already show signs of a rise in watt/U for those systems also.

For the datacenter owner this will mean either supplying more cooling & power to meet demand or leaving racks (half) empty, if you haven’t built for these consumption values already.

In the long run we will follow the Gartner curve from 2007:

With the market currently being in the ‘drop’ phase (a little behind on the prediction…) and moving towards the ‘increase’ phase.

Density Optimized servers (aka microservers) market is booming

IDC starts tracking hyperscale server market

Documentation and disclaimer on the HP Power Advisor

Google’s BMS got hacked. Is your datacenter BMS next?

A recent US Congressional survey stated that power companies are targeted by cyber attacks 10,000 times per month.

After the 2010 discovery of the Stuxnet virus the North American Electric Reliability Corporation (NERC) established both mandatory standards and voluntary measures to protect against such cyber attacks, but most utility providers haven’t implemented NERC’s voluntary recommendations.

Stuxnet hit the (IT) newspaper front-pages around September 2010, when Symantec announced the discovery. It represented one of the most advanced and sophisticated viruses ever found. One that targeted specific PLC devices in nuclear facilities in Iran:

Stuxnet is a threat that was primarily written to target an industrial control system or set of similar systems. Industrial control systems are used in gas pipelines and power plants. Its final goal is to reprogram industrial control systems (ICS) by modifying code on programmable logic controllers (PLCs) to make them work in a manner the attacker intended and to hide those changes from the operator of the equipment.

DatacenterKnowledge picked up on it in 2011, asking ‘is your datacenter ready for stuxnet?’

After this article the datacenter industry didn’t seem to worry much about the subject. Most of us deemed the chance of being hacked by a highly sophisticated virus, attacking our specific PLCs or facility controls, very low.

Recently security company Cylance published the results of a successful hack attempt on a BMS system located at a Google office building. This successful hack attempt shows a far greater threat for our datacenter control systems.

The road towards TCP/IP

In the last few years the world of BMS & SCADA systems has changed radically. The old (legacy) systems consisted of vendor-specific protocols, specific hardware and separate networks. Modern-day SCADA networks consist of normal PCs and servers that communicate through IT standard protocols like IP, and share networks with normal IT services.

IT standards have also invaded facility equipment: the modern-day UPS and CRAC is by default equipped with an onboard web server and is able to send warnings using another IT standard: SNMP.

The move towards IT standards and TCP/IP networks has provided us with many advantages:

Convenience: you are now able to manage your facility systems with your iPad or just a web browser. You can even enable remote access using Internet for your maintenance provider. Just connect the system to your Internet service provider, network or Wi-Fi and you are all set. You don’t even need to have the IT guys involved…
Optimize: you are now able to do cross-system data collection so you can monitor and optimize your systems. Preferably in an integrated way so you can have a birds-eye view of the status of your complete datacenter and automate the interaction between systems.

Many of us end-users have pushed the facility equipment vendors towards this IT enabled world and this has blurred the boundary between IT networks and BMS/SCADA networks.

In the past the complexity of protocols like BACnet and Modbus, which tie everything together, scared most hackers away. We all relied on ‘security through obscurity’, but modern SCADA networks no longer provide this (false) sense of security.

Moving towards modern SCADA.

The transition towards modern SCADA networks and systems is approached in many different ways. Some vendors implemented embedded Linux systems on facility equipment. Others consolidated and connected legacy systems & networks on standard Windows or Linux servers acting as gateways.

This transition has not been easy for most BMS and SCADA vendors. A quick round among my datacenter peers provides the following stories:

BMS vendors installing old OS versions (Windows/Linux) because the BMS application doesn’t support the updated ones.
BMS vendors advising against OS updates (security, bug fix or end-of-support) because it will break their BMS application.
BMS vendors unable to provide details on what ports to enable on firewalls; ‘just open all ports and it will work’.
Facility equipment vendors without software update policies.
Facility equipment vendors without bug fix deployment mechanisms; having to update dozens of facility systems manually.

And these stories all apply to modern day, currently used, BMS&SCADA systems.

Vulnerability patching.

Older versions of the SNMP protocol had several known vulnerabilities that affected almost every platform with an SNMP implementation, including Windows/Linux/Unix/VMS.

It’s not uncommon to find these old SNMP implementations still operational in facility equipment. With the lack of software update policies, that also include the underlying (embedded) OS, new security vulnerabilities will also be neglected by most vendors.

The OS implementation from most BMS vendors also isn’t hardened against cyber attacks. Default ports are left open, default accounts are still enabled.

This is all great news for most hackers. It’s much easier for them to attack a standard OS like a Windows or Linux server. There are lots of tools available to make the life of the hacker easier, and he doesn’t have to learn complex protocols like Modbus or BACnet. This is by far the best attack surface in modern-day facility system environments.

The introduction of DCIM software will move us even further from legacy SCADA towards an integrated & IT-enabled datacenter facility world. You will definitely want to have your ‘birds-eye DCIM view’ of your datacenter anywhere you go, so it will need to be accessible and connected. All DCIM solutions run on mainstream OSes, and most of them come with IT industry standard databases. Those configurations provide another excellent attack surface, if not managed properly.

ISO 27001

Some might say: ‘I’m fully covered because I got an ISO 27001 certificate’.

The scope of an ISO 27001 audit and certificate is set by the organization pursuing the certification. For most datacenter facilities the scope is limited to physical security (like access control and CCTV) and its processes and procedures. IT systems and IT security measures are excluded because those are part of the IT domain and not facilities. So don’t assume that BMS and SCADA systems are included in most ISO 27001 certified datacenter installations.

Natural evolution

Most of the security and management issues are a normal part of the transition into a larger-scale, connected IT world for facility systems.

The same lack of awareness of security, patching, managing and hardening of systems was seen in the IT industry 10-15 years ago. The move from a central mainframe world to decentralized servers and networks, combined with the introduction of the Internet, forced IT administrators to focus on managing the security of their systems.

In the past I have heard Facility departments complain that IT guys should involve them more because IT didn’t understand power and cooling. With the introduction of a more software enabled datacenter the Facility guys now need to do the same and get IT more involved; they have dealt with all of this before…

Examples of what to do:

Separate your systems and divide the network. Your facility system should not share its network with other (office) IT services. The separate networks can be connected using firewalls or other gateways to enable information exchange.
Assess your real needs: not everything needs to be connected to the Internet. If facility systems can’t be hardened by the vendor or your own IT department, then don’t connect them to the Internet. Use firewalls and Intrusion Detection Systems (IDS) to secure your system if you do connect them to the Internet.
Involve your IT security staff. Have facilities and IT work together on implementing and maintaining your BMS/SCADA/DCIM systems.
Create awareness by urging your facility equipment vendor or DCIM vendor to provide a software update & security policy.
Include the facility-systems in the ISO 27001 scope for policies and certification.
Make arrangements with your BMS and/or DCIM vendor about the management of the underlying OS. Preferably this is handled by your internal IT guys, who should already know everything about patching and hardening IT systems. If the vendor provides you with an appliance, then the vendor needs to manage the patching and hardening of the system.
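
Before deciding whether a facility system can be connected at all, it helps to know what it actually exposes. A minimal sketch of such a check, assuming you are authorized to probe the device; the function name is made up for illustration, it only tries plain TCP connections, and UDP services such as SNMP are not covered:

```python
import socket


def open_tcp_ports(host: str, ports: list[int], timeout: float = 0.5) -> list[int]:
    """Return the ports on host that accept a TCP connection."""
    reachable = []
    for port in ports:
        try:
            # A successful connect means something is listening on this port.
            with socket.create_connection((host, port), timeout=timeout):
                reachable.append(port)
        except OSError:
            pass  # closed, filtered or unreachable
    return reachable
```

Calling it with the address of a BMS gateway and a short list such as 22, 23, 80, 443 and 502 (Modbus/TCP) gives a quick first impression; anything that answers and is not needed is a candidate for a firewall rule or for being switched off.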

If you would like to talk about the future of securing datacenter BMS/SCADA/DCIM systems, then join me at Observe Hack Make (OHM) 2013. OHM is a five-day outdoor international camping festival for hackers and makers, and those with an inquisitive mind. It starts July 31st 2013.

Note:
There are really good whitepapers on IDS systems (and firewalls) for securing Modbus and Bacnet protocols, if you do need to connect those networks to the internet. Example: Snort IDS for SCADA (pdf) or books about SCADA & security at Amazon.

Source:
A large part of this blog is based on a Dutch article on BMS/SCADA security from January 2012 by Jan Wiersma & Jeroen Aijtink (CISSP). The Dutch IT Security Association (PViB) nominated this article for ‘best security article of 2012’.

Datacenters & SmartGrid

The October issue of Datacenterworks this year carried an article about an ongoing TNO study on ‘Datacenters & smartgrid’. The study focuses on making energy consumption more flexible for mid-sized consumers such as cold-storage warehouses and datacenters. It runs under the fitting name ‘Flexiquest’.

Here is the PDF, starting at page 7:

Follow-up – Green software; by influencing behavior

Following the workshop on green software, and my blog post and presentation there, a few items were still missing from my previous blog post, and some interesting points came up in the discussion.

1. Virtualization.

As the gentlemen from Schuberg Philis showed in their presentation, many system resources on servers are underused. Virtualization can clearly play a role here; it makes better use of the available hardware. Since a server that is doing nothing (idle) still consumes 40-60% of its maximum energy, this yields energy savings.
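
To get a feel for what that idle overhead means for consolidation, here is a rough sketch. It assumes a simple linear power model between idle and full load, an idle draw of 50% of maximum (the middle of the 40-60% range mentioned above) and a made-up maximum of 300 W per server; none of these numbers come from a measurement:

```python
def server_watts(utilization: float, max_watts: float, idle_fraction: float = 0.5) -> float:
    """Estimated draw under a linear model between idle and full load."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    idle_watts = max_watts * idle_fraction
    return idle_watts + (max_watts - idle_watts) * utilization


# Hypothetical scenario: ten servers at 10% load versus the same work on two at 50%.
before = 10 * server_watts(0.10, max_watts=300.0)
after = 2 * server_watts(0.50, max_watts=300.0)
print(f"before: {before:.0f} W, after: {after:.0f} W")
```

Most of the saving in such a scenario comes from no longer paying the idle draw of the servers that were switched off, not from the load itself.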

2. Stripping and tuning your server OS.

The discussion turned to the overhead of operating systems. Over the past years operating systems (OSes) such as Windows and Linux have grown into all-rounders. The OS vendor obviously doesn’t know in advance what the customer wants to run on it. One time it may be an email server, the next time a database server. By default a modern OS therefore ships with a long list of services or daemons that are switched on. On top of that often come new services from a virus scanner, an inventory tool, a monitoring tool, etc. All in all, a lot of computing resources are lost to all these services and daemons.

IT security specialists, however, have also been teaching us for years that you should strip server OSes of all this overhead, because doing so reduces the possible entry points for hackers. Various tools from IT vendors can help with this. Increasingly one can also opt for so-called virtual appliances. These virtual servers have already been stripped and made task-specific by the vendor.

Stripping and tuning your server OS after installing its specific role is therefore a necessity for both security and (energy) efficiency.

3. Can the energy consumption of software actually be ‘seen’?

There was a critical remark about whether the energy consumption of a query could actually be seen.

System administrators have known for some time that a system they are about to take into management must be given a baseline. This establishes the basic behavior of the system, so that thresholds can be set in monitoring and management applications to keep an eye on the new system. This way the administrators are not needlessly burdened with false alerts.

During a good DTAP (OTAP in Dutch) process with performance tests, this kind of baseline is established for entire application chains. This helps the test & performance analyst determine how the application chain works and whether it works well. These data can later be used to feed the administrator’s baseline.

Once a good baseline (on a stripped server) has been established, it is entirely possible for an application developer to relate the actions his code performs to the use of system resources.
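
One simple way to turn a baseline into a threshold is sketched below. It is an assumption on my side, not the method used in the workshop: take the mean of the baseline samples and add a few standard deviations, so that only readings clearly above normal behavior stand out. The sample values are hypothetical power readings in watts.

```python
from statistics import mean, stdev


def baseline_threshold(samples: list[float], sigmas: float = 3.0) -> float:
    """Alert threshold: baseline mean plus a number of standard deviations."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    return mean(samples) + sigmas * stdev(samples)


def exceeds_baseline(value: float, samples: list[float], sigmas: float = 3.0) -> bool:
    """True when a new reading is above the threshold derived from the baseline."""
    return value > baseline_threshold(samples, sigmas)


# Hypothetical idle readings of a stripped server, in watts.
idle_watts = [200.0, 202.0, 198.0, 201.0, 199.0]
print(exceeds_baseline(210.0, idle_watts))
```

With a stable baseline like this, the extra draw caused by a heavy query or a new release becomes visible against the normal noise.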

4. Server energy consumption.

As noted under 1, a lot of energy is also lost in systems that are doing nothing. During the discussion this opened the door to the development of efficient hardware. I have already written a lot about that in previous blog posts. What was still specifically worth mentioning:

The OpenCompute project has released an OpenRack design that, among other things, uses DC power in the rack. Servers for it will come to market from HP and Dell shortly. More soon…

The Schuberg Philis men also tipped me off to an Anandtech article about a test with OpenCompute server hardware; http://www.anandtech.com/show/4958/facebooks-open-compute-server-tested

5. IT energy consumption is not a focus for many.

Early in the meeting the point was made that energy consumption by IT would not be a primary focus for many companies. That is a point I endorse, certainly for companies where ICT is not their primary product but a supporting function. For many such companies the big costs (OPEX) are not in energy.

In 2009-2010 various people and organizations already concluded that this makes the adoption of energy efficiency slow to get going. As long as there is no major financial motivator behind it for a company, it will remain hard to get focus on this.

The rise of Cloud computing, and in particular of companies in the IaaS/PaaS space, helps to make this efficiency gain after all, seen holistically.

For the companies that deliver the IaaS/PaaS service, the datacenter and its energy consumption are a major cost item, certainly given the scale at which many of these infrastructures are built. Moving and consolidating IT services from the broom closet of non-IT companies to a Cloud computing provider therefore provides the right incentive in this area.

As mentioned in my previous blog post, the users of IaaS/PaaS services also get a good incentive to use efficient program code. The pay-per-use models in many Cloud computing offerings take care of that:

“the art of efficient programming is lost… Cloud will bring it back…”

Green software; by influencing behavior

Looking at energy consumption, we see that every watt saved at the source ultimately adds up to a multiple of that across the datacenter’s entire energy chain. Emerson calls this the Cascade Effect.

Now that we see PUE slowly dropping towards the (theoretical) 1, it becomes important to look at what other gains can be had in the datacenter. On the facility side the focus is shifting to, for example, WUE. On the IT side people were already working on using resources more efficiently by deploying, for example, virtualization and deduplication.

All of this, however, plays out at the IT infrastructure layer. So what could still be gained at the ‘real’ source in ICT: software and applications? By economizing here we should, according to the cascade effect, be able to save a multiple.

In early 2010 I already noted that various vendors, including Microsoft, Intel and HP, were exploring the possibilities by giving application developers insight into the energy consumption of their systems, by means of SDKs.

In recent years, during which I was responsible for various environments within commercial and government organizations, I had the opportunity to experiment with ideas about influencing IT usage, particularly by IT people themselves. This was done through transparency about the usage and consumption of ICT and by developing cost models on top of it.

The recently founded Knowledge Network Green Software is also working on this kind of development. In the run-up to a meeting on this topic on 8 May 2012, here is a summary of my experiences with developing green software by influencing consumption behavior.

The idea

By giving developers insight into consumption, you can make sure they handle their resources efficiently. This is a normal approach when it comes to IT resources such as CPU, RAM and disk I/O. The software developer checks the performance of the system (possibly with a software tester) so that the end product functions in a normal way.

The rise of cost models is also a trend in ICT. The introduction of virtualization in particular has encouraged this. On the one hand there was the wish from the business to move more towards a pay-per-use model, and on the other hand there was sometimes the wish from the IT department to rein in virtualization. So-called VM sprawl is a well-known term among IT people: the ease and low cost of a virtual server (VM) lead to hundreds of virtual servers of which nobody knows anymore whose they are and why they exist. To curb this behavior, the costs of VMs were made visible and (automatically) charged back.

What if we extend the developer’s performance insight with insight into power consumption, and attach a cost incentive by charging for kWh?

This could stimulate the same behavior as the introduction of smart energy meters in the home: steering behavior by providing insight. It works for the Eneco Toon and the Nuon E-manager…

OTAP

De experimenten werden uitgevoerd in een zo geheten OTAP omgeving (of DTAP in het Engels). Dit is een belangrijke basis aangezien een dergelijke omgeving zorgt voor gecontroleerde uitrol van software en een consistente omgeving. Dit betekend dat de omgeving waar in ontwikkeld, getest en geaccepteerd wordt gelijk is aan de productie omgeving.

In de OTA omgeving word de software ontwikkelaar of database query schrijver voorzien van enkele standaard metrieke voor zijn software ontwikkelaar; CPU , RAM en disk I/O gebruik.

Deze werd aangevuld met kWh gegevens van de gebruikte systemen. Al deze gegevens waren (near-)realtime beschikbaar. Zo was het effect van wijzigen van enkele regels code of een query op een database direct terug te zien.

To create the right incentive, all projects that used the OTA environment were charged for their resource usage. The charge consisted of a fixed component, covering depreciation of the hardware used and its management, and a variable component: the kWh consumption. It therefore pays for the project manager to have systems switched off at night and to encourage project members to write energy-efficient code.
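As a rough illustration of this charging model, the sketch below adds a fixed component to a variable kWh component. The class name, field names, amounts and tariff are hypothetical; the experiments themselves did not publish their rates or tooling.

```python
from dataclasses import dataclass


@dataclass
class ProjectUsage:
    name: str
    fixed_monthly_eur: float  # hypothetical share of depreciation and management
    kwh_used: float           # metered energy for the billing period


def kwh_from_samples(watts, interval_s):
    """Approximate energy (kWh) from evenly spaced power samples in watts."""
    # W * s = joules; 3,600,000 J = 1 kWh
    return sum(watts) * interval_s / 3_600_000


def monthly_charge(usage, eur_per_kwh):
    """Fixed component plus variable kWh component."""
    return usage.fixed_monthly_eur + usage.kwh_used * eur_per_kwh
```

Because only the kWh part varies, switching systems off at night or shipping a more efficient query shows up directly in the project's bill.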

After the normal OTA procedure the software is taken into production. During the experiments, measurements were taken again in production, to check whether modified versions of the software (releases) had really become more efficient in the production situation.

Technical setup

To make all this work technically, you do need an environment in which feedback can be given to the developer in near real time.

During the various experiments the following was used:

HP c-class blades, IPMI, OA, ILO – The HP ILO and OA (for blades) give real-time insight into energy consumption. This data can be retrieved via scripts (SSH), among other ways. For other systems one can fall back on IPMI. Many IPMI implementations make it possible to read energy consumption. Various (open source) tools are available to retrieve and process IPMI data.
Windows, Linux – As operating system (OS). These OSes can optionally use Microsoft’s Joulemeter or Intel’s Energy Checker SDK.
Oracle DB, Java – As database and default programming language.
HP OpenView, HP Insight Control – For collection, processing and dashboarding of the data. Open source products such as Cacti can of course be used for this as well.
VMware vCenter – For insight into virtual systems.
Visual Studio performance testing, HP LoadRunner (Mercury) – These development tools also offer many options for extracting real-time performance and usage data from systems.
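As one possible way to pull power figures over IPMI, the sketch below calls the open source ipmitool utility and parses its DCMI power reading. This is an assumption for illustration: the experiments do not state which IPMI tool was used, not every BMC supports DCMI power readings, and the output format matched here may differ between ipmitool versions and hardware.

```python
import re
import subprocess


def parse_dcmi_power(text):
    """Return the instantaneous power in watts, or None if no reading is found.

    Assumes a line of the form 'Instantaneous power reading: 220 Watts'.
    """
    match = re.search(r"Instantaneous power reading:\s*(\d+)\s*Watts", text)
    return int(match.group(1)) if match else None


def read_power_watts():
    """Query the local BMC; needs ipmitool installed and sufficient privileges."""
    result = subprocess.run(
        ["ipmitool", "dcmi", "power", "reading"],
        capture_output=True, text=True, check=True,
    )
    return parse_dcmi_power(result.stdout)
```

Sampling `read_power_watts()` at a fixed interval and feeding the values into a tool such as Cacti would give the kind of near-real-time view described above.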

Lessons from these experiments

The setups above worked very well. In most cases 20% of the energy could be saved without much effort, simply by writing the right queries and code. The software developers also became sharper at writing code without too much overhead.

The experiments also came with their share of (unresolved) challenges:

– Virtualization: measuring energy consumption with IPMI at hardware level is quite feasible. At VM level, however, it becomes a lot harder. VMware has already done some work in this area with Host Power Management in vSphere 5. See: http://www.vmware.com/files/pdf/hpm-perf-vsphere5.pdf. This is a good first step, but it needs further work. Now and then the total energy consumption at hardware level deviated from the total consumption of the VMs, or figures were reported that did not feel realistic. Microsoft is also well on its way with Joulemeter. The white papers from this Microsoft team are a real must-read on energy consumption and virtualization. It is unknown, however, how it integrates with Hyper-V. No energy measurement options for KVM or Xen were found at the time of the tests.

– Poor internal sensors: following the deviating energy figures in the virtualization environment, checks were done with external, calibrated energy meters. These sometimes showed a 40% difference between the power draw reported by HP OA/ILO at hardware level and the external meter. All in all it was concluded that some hardware implementations use poor-quality sensors.

– OTAP thinking: the experiments above only work if all steps of the OTAP process have the same, or nearly the same, environments. This means that not only software must be rolled out in a release-based manner; infrastructure must be handled the same way. During the tests, for example, there were energy-efficient queries that worked well in OTA but not in production. In one case production turned out to have an Oracle database patch that was not present in OTA, which caused a 60% increase in energy use. In another case a few Microsoft Windows patches were missing.

– Chain thinking & architecture: it soon became clear that energy consumption and efficiency must also be included in the overall architecture of an application and its infrastructure. Large savings can be made precisely in applications that use multiple systems to deliver their functionality. A three-tier architecture is not unusual these days: database – application – web server. The focus during development therefore needs to be broader than just that one database query. Integration and test specialists at infrastructure and software level can play a role here.

– Choice of hardware and software: the tests and experiments were carried out with components that were fixed in the architecture and standards at the time. No research was done into the effect that the selection of hardware, operating system, middleware, database, programming language or framework has on energy consumption. The choice of hardware was, however, made using SPECpower as one of the selection criteria.

– Openness and integration: integration between the various IT layers was the key to making all of this visible. The missing link, however, was integration with the physical data center. The PDU and other elements of the power chain also provide kWh data. These were hard to integrate, especially with protocols such as Modbus.
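The sensor check from the list above, comparing the internally reported power draw against an external calibrated meter, can be sketched as follows. The function names, the 10% tolerance and the host names are illustrative assumptions, not values from the experiments.

```python
def relative_deviation(reported_w, reference_w):
    """Deviation of the internal sensor relative to the calibrated meter."""
    return (reported_w - reference_w) / reference_w


def flag_bad_sensors(readings, tolerance=0.10):
    """Return hosts whose internal reading deviates more than `tolerance`.

    readings: dict mapping host name -> (reported_w, reference_w)
    """
    flagged = {}
    for host, (reported_w, reference_w) in readings.items():
        deviation = relative_deviation(reported_w, reference_w)
        if abs(deviation) > tolerance:
            flagged[host] = deviation
    return flagged
```

Running such a comparison once per hardware model before trusting its sensors for chargeback would catch deviations of the size described above.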

Integration and chain thinking

This last point is also raised in the DatacenterPulse Top10 (PDF) for 2012:

10. CONVERGED INFR. INTELLIGENCE • UPDATE: The Data Center Infrastructure is becoming a complex machine requiring connection up the stack • Treat the DC infrastructure as an IT system • Converge in the infrastructure instrumentation and control systems • Connect it into the IT systems for ultimate control • Standardize connections and protocols to connect components • What’s measured and controlled will be addressed and tuned

The last of these is the one-liner that it all revolves around: “What’s measured and controlled will be addressed and tuned”

Despite the challenges above, we will see increasingly efficient software emerge in the coming years. This may not be driven directly by the few cents saved on kWh, but rather by the pay-per-use cost model that cloud computing confronts us with. Rolling out inefficient code or frameworks will immediately result in a higher bill from Amazon (AWS). And nothing is as motivating as that…

CeBit 2012 & Datacenter

This year's CeBit 2012 also seemed to be suffering a bit from the crisis. Parts of halls were empty and it seemed quieter than usual.

There was nothing shocking in the data center space either. The big IT players are betting en masse on the cloud hype, the data center vendors on green and modular. So nothing new under the sun.

A few interesting points:

Temperature

Dell announced its 12th-generation servers. These servers not only support new CPUs and are again more powerful and more efficient, they also support Dell’s Fresh Air initiative. Under it, operation of the servers is guaranteed at (inlet) temperatures up to 35C continuously. For 900 hours per year 40C is acceptable, and for 90 hours per year 45C. All of this under full Dell warranty.

For low temperatures (such as our past winter) the standard lower limit is 10C, with 900 hours at 5C and 90 hours at –5C covered by the warranty.
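To get a feel for what these limits mean for a given site, the sketch below counts hourly inlet temperatures against the figures quoted above. It assumes, as an interpretation rather than Dell's official wording, that 10C to 35C is unlimited, that excursions down to 5C or up to 40C share a 900-hour yearly budget, and that excursions down to –5C or up to 45C share a 90-hour yearly budget.

```python
def fresh_air_check(hourly_inlet_c):
    """Check one year of hourly inlet temperatures (in C) against the limits.

    Returns (within_limits, hours_per_band).
    """
    hours = {"excursion_900h": 0, "excursion_90h": 0, "out_of_range": 0}
    for temp in hourly_inlet_c:
        if 10 <= temp <= 35:
            continue                      # continuous operating range
        elif 5 <= temp <= 40:
            hours["excursion_900h"] += 1  # assumed 900 h/year budget
        elif -5 <= temp <= 45:
            hours["excursion_90h"] += 1   # assumed 90 h/year budget
        else:
            hours["out_of_range"] += 1    # outside every quoted limit
    within_limits = (
        hours["excursion_900h"] <= 900
        and hours["excursion_90h"] <= 90
        and hours["out_of_range"] == 0
    )
    return within_limits, hours
```

Feeding in a year of local weather data, corrected for any heat picked up between outside air and server inlet, gives a first estimate of how many hours would still need conditioning.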

With these temperature limits it is therefore possible to run the servers on filtered outside air most of the time in the Dutch climate. Only in winter conditions do we still need to make sure the inlet temperature does not drop below the warranty limits.

Not only Dell's server lines (PowerEdge) but also storage and network components will support the Fresh Air initiative in the near future.

The company Iceotope announced a ready-for-market liquid cooling system for servers at this CeBit:

The worlds first truly modular liquid cooled data center. We’ve combined servers and liquid cooling into a scalable 2N solution ready to slot into any standard data center globally.

The Iceotope solution comprises a range of compute modules from Intel, AMD and other vendors inside a high density, low power and fully cooled cabinet.

Liquid is vastly more capable of transferring heat. It conducts the heat far better and needs much less energy to be moved around. Using liquid allows us to take heat away from the electronics quickly, efficiently and effectively.

We use 3M™ Novec™ because of its excellent convection properties, inert nature and its ability to insulate electrically.

They are not the first to try to bring such a solution to market. Hardcorecomputer and Green Revolution Cooling, among others, preceded them with liquid-cooled products. The nice thing about the Iceotope solution is that it accepts coolant of up to 45C, which then returns with a delta T of +5C. This makes it well suited for use with outside air and in warm climates.

The challenge with this kind of solution is the current ecosystem of the data center market: everything is built around air cooling, where standard components (such as the 19" rack) allow a whole range of vendors to work together. After choosing a rack (Minkels, APC, Rittal, …), the customer can for example choose from various server vendors (IBM, HP, Dell, …). The initial Iceotope model assumes that you buy the whole system from them, including servers, cooling and so on, partly because the system has to be leak-tight.

Iceotope video here.

LED

LED is increasingly present for lighting purposes. Besides the various Asian companies with LED solutions at CeBit, there were also several LED applications for data center lighting. Rittal showed a nice blue LED solution on its stand, but there were also applications that actually produced enough light to work by and looked ‘fancy’ as well.

Given the energy savings achievable with techniques such as LED and TL5, there is certainly a future in this for data center use. The only point of attention with these techniques is possible interference with data cabling.

C13 PDU Lock

Plenty of PDU vendors at CeBit this year. A few of them showed a handy clip that keeps your C13/14 plug from falling out of the PDU when, for example, you have to do maintenance on your system or install extra cables.

Solutions like this existed before as a separate clip-on retainer, but that was often forgotten or not really workable. This one is better integrated.

Modular

Of course there were the usual modular and container solutions again. The Rittal container (formerly Lampertz) could not be absent, naturally. The Chinese company Huawei had also brought a data center container: the IDS1000 AIO (PDF).

The PUE is said to be 1.4 for this 80 kW container. A maximum of 10 kW per rack is supported. The container can be placed in outdoor conditions from –40C to 55C. Availability is at Tier III level according to the specs.
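As a quick back-of-the-envelope check on these figures: PUE is total facility power divided by IT power, so the quoted values translate into facility power and overhead as sketched below. That the 80 kW refers to IT load, and that racks are filled to the 10 kW maximum, are assumptions made for illustration; the spec sheet may define them differently.

```python
def facility_power_kw(it_load_kw, pue):
    """Total power drawn by the container for a given IT load and PUE."""
    return it_load_kw * pue


def overhead_kw(it_load_kw, pue):
    """Power spent on cooling, distribution losses and so on."""
    return it_load_kw * (pue - 1.0)


def max_full_racks(it_load_kw, kw_per_rack):
    """Number of racks that can run at the per-rack maximum."""
    return int(it_load_kw // kw_per_rack)


total = facility_power_kw(80, 1.4)   # roughly 112 kW in total
extra = overhead_kw(80, 1.4)         # roughly 32 kW of overhead
racks = max_full_racks(80, 10)       # 8 racks at 10 kW each
```

In other words, under these assumptions roughly 40% extra power is needed on top of the IT load, which is what makes the PUE figure worth comparing between container offerings.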

Not very exciting in terms of design and execution, but certainly interesting given the ability of Chinese companies to produce large volumes cheaply.

Video of the container solution here.

DCIM

APC showed its software solution for data center management, so-called DCIM. Combining information that is relevant to IT and facilities staff in a single platform certainly has its advantages. The market for this kind of solution is warming up nicely, and now that more players are entering it, the solutions are becoming more mature and the prices more attractive.

Huawei also showed such a solution, in beta, which was delivered together with the container mentioned above. The beta product already looked very promising.

The rise of Asia (and especially China)

Besides the above-mentioned Huawei there were, as usual, a lot of Asian companies at CeBit. Where the focus used to be on data cabling and electronics, this year several Asian companies were trying to sell data center components: mainly IT racks (complete with hot/cold separation), cabling cabinets and PDUs. Some vendors were not yet at the quality level we are used to in Europe, but a number came pretty close.

It is clear that they have looked closely at the European products and, under the motto ‘better well stolen than badly invented’, are now trying to break into the data center market.

Why Facebook's move is interesting…

Last week Facebook announced the Open Compute initiative. In short, they published all the blueprints of their data center and servers with the message: here it is, use it, improve it…

In the days that followed, a flood of blogs and Twitter discussions about the initiative broke loose. After a lively discussion last weekend with DatacenterPulse members and cloud gurus, here is why this initiative is relevant to your (enterprise) IT environment:

Facebook belongs on my list of companies from the Formula 1 of IT: companies that are far ahead in technological development, often because of their scale, and from which interesting technology eventually emerges that is usable for enterprise IT and later for SMB IT.

In recent years these companies have brought out several interesting developments that are already having an impact on enterprise IT environments. One example is the use of data center containers. Of course a container full of 1,500 servers is a size too big for most organizations. But the modular concept behind it has changed the way data centers are built. If you look at the current portfolio of many data center vendors, you will always find a ‘modular’ solution somewhere. This innovation was picked up by end users and the market, after which products were developed on top of it that are applicable to ‘normal’ IT.

Another example is the development of Hadoop for handling large amounts of data. I wrote about this ‘little data elephant’ earlier.

Facebook is now giving us a look behind the scenes of its data center and servers. Google did this a little in the past too, but never as openly as Facebook is doing now. To start with, the complete (structural) design of the Facebook data center is on the internet, together with the server designs.

Facebook's announcement of its Open Compute initiative was received with moderate enthusiasm; the main question was: what am I (as a normal IT person) supposed to do with it?

Let me be clear: copying Facebook's designs one-to-one will not help an enterprise IT environment, any more than copying the Google or Amazon designs would (if those were public). The environments and goals are too different for that.

There is, however, a lot for the enterprise IT professional to learn from it. By taking a critical look behind the scenes of the Formula 1 you can see why they manage to keep tens of thousands of servers running with just a few administrators, roll them out quickly, and offer higher availability at lower cost.

In addition, it gives us as end users insight into the technological possibilities. With that we can put pressure on vendors to deliver better products. This fits entirely with the vision that DatacenterPulse pursues: “influence the datacenter industry through end users”

Two other Open* developments in IT that are worth following:

www.openstack.org

OpenStack is a collection of open source technologies delivering a massively scalable cloud operating system. OpenStack is currently developing two interrelated projects: OpenStack Compute and OpenStack Object Storage. OpenStack Compute is software to provision and manage large groups of virtual private servers, and OpenStack Object Storage is software for creating redundant, scalable object storage using clusters of commodity servers to store terabytes or even petabytes of data.

www.openflow.org

OpenFlow enables networks to evolve, by giving a remote controller the power to modify the behavior of network devices, through a well-defined "forwarding instruction set". The growing OpenFlow ecosystem now includes routers, switches, virtual switches, and access points from a range of vendors.

The latter interests me in particular, because there has been network innovation in speed (10G, etc.) but not really an open initiative covering the whole ecosystem. With this, the network really seems to be catching up with the cloud developments in storage and compute.

In the end we see a lot of interesting developments on the horizon that will help enterprise IT, especially in building private clouds and focusing on ‘big data’.

Infrastructure a commodity?

Facebook's openness in this area also points to another trend: having a data center and IT infrastructure is less and less of a strategic advantage. In fact Facebook's competitors can copy the model completely, but Facebook is essentially saying that this is no threat to its business. They focus on their most important asset: the data and their end users. Those do provide the strategic advantage, as the Huffington Post also concludes:

"[The Open Compute Project] really is a big deal because it constitutes a general shift in terms of what how we look at technology as a competitive advantage," O’Grady said. "For Facebook, the evidence is piling up that they don’t consider technology to be a competitive advantage. They view their competitive advantage in the marketplace to be their users."

That conclusion had already been drawn about infrastructure software (such as databases) at Facebook and a few others, in relation to the openness of their developments:

For Facebook, the value is not in the infrastructure – though Hip-Hop demonstrates the value of even marginal improvements in performance for high scale players – it is in the users and the data they generate. As Tim O’Reilly famously put it, “data is the Intel Inside.”
Read more: http://redmonk.com/sogrady/2011/03/11/how-important-is-software/#ixzz1JFQUwNwa

And all of that may well be the most important conclusion of all these publications and discussions: it is about the data… and the IT around it is becoming a commodity. With thanks to the Formula 1 of IT and cloud computing…

More from my cloud colleagues:

Google Admits "Data is the Intel Inside" < Tim O’Reilly, from 2007!!
Cloud Computing, Open* and the Integrator’s Dilemma
Poke Me On…
Facebook’s New Data Center – What can we learn from it?