JanWiersma.com – Page 3 – Cloudy strategies…

Where is the rack density trend going ?…

When debating capacity management in the datacenter the amount of watts consumed per rack is always a hot topic.

Years ago we could get away with building datacenters that supported 1 or 2 kW per rack in cooling and energy supply. Over the last few years, demand for racks of around 5-7 kW has become the standard. Five years ago I witnessed the introduction of blade servers first hand. This generated much debate in the datacenter industry, with some claiming we would all have 20+ kW racks within 5 years. This never happened… well, at least not on a massive scale…

So what is the trend in energy consumption on a rack basis ?

Readers of my Dutch datacenter blog know I have been watching and trending energy development in the server and storage industry for a long time. To update my trend analysis I wanted to start with a consumption trend for the last 10 years. I could use the hardware specs found on the internet for servers, but most published energy consumption values are ‘name-plate ratings’. Green Grid’s whitepaper #23 states correctly:

Regardless of how they are chosen, nameplate values are generally accepted as representing power consumption levels that exceed actual power consumption of equipment under normal usage. Therefore, these over-inflated values do not support realistic power prediction

I have known HP’s Proliant portfolio for a long time and successfully used their HP Power Calculator tools (now: HP Power Advisor). They display the nameplate values as well as the power used at different utilizations, and I know from experience these values are pretty accurate. So that seems as good a starting point as any…

I decided to go for 3 form factors:

1U servers (or pizzabox servers)
Blade servers. Selecting basic x86 systems
Density Optimized servers. These can be defined as;

..minimalist server designs that resemble blades in that they have skinny form factors but they take out all the extra stuff that hyperscale Web companies like Google and Amazon don’t want in their infrastructure machines because they have resiliency and scale built into their software stack and have redundant hardware and data throughout their clusters….These density-optimized machines usually put four server nodes in a 2U rack chassis or sometimes up to a dozen nodes in a 4U chassis and have processors, memory, a few disks, and some network ports and nothing else per node.[They may include low-power microprocessors]

For the 1U server I selected the HP DL360, a well-known mainstream ‘pizzabox’ server. For the blade servers I selected the HP BL20p (p-class) and HP BL460c (c-class). The Density Optimized server could only be the recently introduced (5U) HP Moonshot.

Guidelines for the server configurations:

Single power supply (no redundancy) and platinum rated when available.
No additional NICs or other modules.
Always selecting the power optimized CPU and memory options when available.
Always selecting the smallest disk. SSD when available.
Blade server enclosures:

Pass-through devices, no active SAN/LAN switches in the enclosures
No redundancy and onboard management devices.
C7000 for c-class servers
Converted the blade chassis power consumption, fully loaded with the calculated server, back to power per 1U.

Used the ‘released’ date of the server type found in the Quickspec documentation.
Collected data of server utilization at 100%, 80%, 50%. All converted to the usage at 1U for trend analysis.
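
The blade conversion to ‘Watt for 1U’ can be sketched as follows. This is a minimal illustration, not the actual Power Advisor calculation: the enclosure is assumed to be 10U high with 16 half-height blade bays (as for a C7000), and the wattage inputs are made-up placeholders.

```python
def watts_per_u(chassis_overhead_w: float, blade_w: float,
                blades_per_chassis: int, chassis_height_u: int) -> float:
    """Spread the power of a fully loaded blade chassis evenly over its rack units."""
    total_w = chassis_overhead_w + blade_w * blades_per_chassis
    return total_w / chassis_height_u

# Hypothetical inputs: 400 W enclosure overhead (fans, pass-through modules)
# and 200 W per blade at the chosen utilization.
example = watts_per_u(chassis_overhead_w=400.0, blade_w=200.0,
                      blades_per_chassis=16, chassis_height_u=10)
print(f"{example:.1f} W per 1U")
```

Normalizing to 1U this way makes a blade enclosure directly comparable with a 1U pizzabox server in the trend table.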

This resulted in the following table:

Server type	Year	CPU Core count	CPU type	RAM (GB)	HD (GB)	100% Util (Watt for 1U)	80% Util (Watt for 1U)	50% Util (Watt for 1U)
HP BL20p	2002	1	2x Intel PIII	4	2x 36	328.00
HP DL360	2003	1	2x Intel PII	4	2x 18	176.00
HP DL360G3	2004	1	2x Intel Xeon 2.8 GHz	8	2x 36	360.00
HP BL20pG4	2006	1	2x Intel Xeon 5110	8	2x 36	400.00
HP BL460c G1	2006	4	2x Intel L5320	8	2x 36	397.60	368.80	325.90
HP DL360G5	2008	2	2x Intel L5240	8	2x 36	238.00	226.00	208.00
HP BL460c G5	2009	4	2x Intel L5430	8	2x 36	368.40	334.40	283.80
HP DL360G7	2011	4	2x Intel L5630	8	2x 60 SSD	157.00	145.00	128.00
HP BL460c G7	2011	6	2x Intel L5640	8	2x 120 SSD	354.40	323.90	278.40
HP BL460c Gen8	2012	6	2x Intel 2630L	8	2x 100 SSD	271.20	239.10	190.60
HP DL360e Gen8*	2013	6	2x Intel 2430L	8	2x 100 SSD	170.00	146.00	113.00
HP DL360p Gen8*	2013	6	2x Intel 2630L	8	2x 100 SSD	252.00	212.00	153.00
HP Moonshot	2013	2	Intel Atom S1260	8	1x 500	177.20	172.40	165.20

* HP split the DL360 into a stripped down version (the ‘e’) and an extended version (the ‘p’)

And a nice graph:

The graph shows an interesting peak around 2004-2006. After that the power consumption declined. This is mostly due to power optimized CPU and memory modules. The introduction of Solid State Disks (SSD) is also a big contributor.

Obviously people will argue that:

the performance for most systems is task specific
and blades provide higher density (more CPU cores) per rack,
and some systems provide more performance and maybe more performance/Watt,
etc…

Well; datacenter facility guys couldn’t care less about those arguments. For them it’s about the power per 1U or the power per rack and its trend.

With a depreciation time of 10-15 years on facility equipment, the datacenter needs to support many IT refresh cycles. IT guys getting faster CPUs, more memory and bigger disks is really nice, and it’s even better if the performance/watt ratio is great… but if the overall rack density goes up, then facilities needs to support it.

To provide more perspective on the CPU density per rack, I plotted the number of CPU cores in a filled 40U rack vs. total power at 40U:

Still impressive numbers: between 240 and 720 CPU cores in 40U of modern day equipment.
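
The ‘total power at 40U’ side of that comparison follows directly from the per-1U values in the table above. A minimal sketch, using the 100% and 50% utilization figures of the most recent models; the function name is just an illustration:

```python
RACK_UNITS = 40

# (W per 1U at 100% utilization, W per 1U at 50% utilization), copied from the table above.
per_u_watts = {
    "HP DL360e Gen8": (170.00, 113.00),
    "HP DL360p Gen8": (252.00, 153.00),
    "HP BL460c Gen8": (271.20, 190.60),
    "HP Moonshot": (177.20, 165.20),
}

def rack_kw(watts_per_u: float, rack_units: int = RACK_UNITS) -> float:
    """Total rack power in kW for a rack filled with `rack_units` of identical equipment."""
    return watts_per_u * rack_units / 1000.0

for name, (full, half) in per_u_watts.items():
    print(f"{name:16s} {rack_kw(full):5.2f} kW at 100%  {rack_kw(half):5.2f} kW at 50%")
```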

Next I wanted to test my hypothesis, so I looked at an active 10,000+ server deployment consisting of 1-10 year old servers from Dell/IBM/HP/SuperMicro. I ranked them in age groups 2003-2013 and sorted them by form factor: 1U Rackmount, Blades and Density Optimized. I selected systems with roughly the same hardware config (2 CPU, 2 HD, 8GB RAM). For most age groups the actual power consumption (@ 100, 80, 50%) seemed off by 10%-15%, but the trend remained the same, especially among form factors.

It also confirmed that after the drop, due to energy optimized components and SSD, the power consumption per U is now rising slightly again.

Density in general seemed to rise with lots more CPU cores per rack, but at a higher power consumption cost on a per rack basis.

Let’s take out the crystal ball

The price of compute & storage continues to drop, especially if you look at Amazon and Google.

Google and Microsoft have consistently been dropping prices over the past several months. In November, Google dropped storage prices by 20 percent.

For AWS, the price drops are consistent with its strategy. AWS believes it can use its scale, purchasing power and deeper efficiencies in the management of its infrastructure to continue dropping prices. [Techcrunch]

If you follow Jevons Paradox then this will lead to more compute and storage consumption.

All this compute and storage capacity still needs to be provisioned in datacenters around the world. The last time IT experienced growth pain at the intersection between IT & Facility, it accelerated the development of blade servers to optimize the physical space used. (That was a bad cure for some… but that is beside the point now.) The current rapid growth has accelerated the development of Density Optimized servers that strike a better balance between performance, physical space and energy usage. All major vendors and projects like Open Compute are working on this, with revenue in this segment growing 66.4% year over year in 4Q12.

Blades also continue to gain market share and now account for 16.3% of total server revenue:

"Both types of modular form factors outperformed the overall server market, indicating customers are increasingly favoring specialization in their server designs" said Jed Scaramella, IDC research manager, Enterprise Servers "Density Optimized servers were positively impacted by the growth of service providers in the market. In addition to HPC, Cloud and IT service providers favor the highly efficient and scalable design of Density Optimized servers. Blade servers are being leveraged in enterprises’ virtualized and private cloud environments. IDC is observing an increased interest from the market for converged systems, which use blades as the building block. Enterprise IT organizations are viewing converged systems as a method to simplify management and increase their time to value." [IDC]

With cloud providers going for Density Optimized and enterprise IT for blade servers, the market is clearly moving to optimizing rack space. We will see a steady rise in demand for kW/rack with Density Optimized already at 8-10kW/rack and blades 12-16kW/rack (@ 46U).

There will still be room in the market for the ‘normal’ rackmount server like the 1U, but the 2012 and 2013 models already show signs of a rise in watt/U for those systems also.

For the datacenter owner this will mean either supplying more cooling & power to meet demand or leaving racks (half) empty, if you haven’t built for these consumption values already.
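
How empty ‘(half) empty’ gets can be estimated from the rack’s power and cooling budget. A rough sketch: the 271.2 W per 1U is the BL460c Gen8 value at 100% utilization from the table above, while the rack budgets and the function name are assumptions for illustration.

```python
import math

def usable_units(rack_budget_kw: float, watts_per_u: float, rack_units: int = 46) -> int:
    """Rack units that can be filled before the rack's power/cooling budget is exhausted."""
    by_power = math.floor(rack_budget_kw * 1000.0 / watts_per_u)
    return min(rack_units, by_power)

# Hypothetical rack budgets in kW, filled with blades at 271.2 W per 1U.
for budget_kw in (5.0, 7.0, 12.0):
    filled = usable_units(budget_kw, watts_per_u=271.2)
    print(f"{budget_kw:4.1f} kW budget: {filled} of 46U usable ({filled / 46:.0%})")
```

With a 5 kW budget, well under half of a 46U rack can be filled with this blade generation before power, not space, becomes the limit.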

In the long run we will follow the Gartner curve from 2007:

With the market currently being in the ‘drop’ phase (a little behind on the prediction…) and moving towards the ‘increase’ phase.

Density Optimized servers (aka microservers) market is booming

IDC starts tracking hyperscale server market

Documentation and disclaimer on the HP Power Advisor

Google’s BMS got hacked. Is your datacenter BMS next ?

A recent USA Congressional survey stated that power companies are targeted by cyber attacks 10,000 times per month.

After the 2010 discovery of the Stuxnet virus the North American Electric Reliability Corporation (NERC) established both mandatory standards and voluntary measures to protect against such cyber attacks, but most utility providers haven’t implemented NERC’s voluntary recommendations.

Stuxnet hit the (IT) newspaper front-pages around September 2010, when Symantec announced the discovery. It represented one of the most advanced and sophisticated viruses ever found. One that targeted specific PLC devices in nuclear facilities in Iran:

Stuxnet is a threat that was primarily written to target an industrial control system or set of similar systems. Industrial control systems are used in gas pipelines and power plants. Its final goal is to reprogram industrial control systems (ICS) by modifying code on programmable logic controllers (PLCs) to make them work in a manner the attacker intended and to hide those changes from the operator of the equipment.

DatacenterKnowledge picked up on it in 2011, asking ‘is your datacenter ready for stuxnet?’

After this article the datacenter industry didn’t seem to worry much about the subject. Most of us deemed the chance of being hacked by a highly sophisticated virus, attacking our specific PLCs or facility controls, very low.

Recently security company Cylance published the results of a successful hack attempt on a BMS system located at a Google office building. This successful hack attempt shows a far greater threat for our datacenter control systems.

The road towards TCP/IP

Over the last few years the world of BMS & SCADA systems has radically changed. The old (legacy) systems consisted of vendor specific protocols, specific hardware and separate networks. Modern day SCADA networks consist of normal PCs and servers that communicate through IT standard protocols like IP, and share networks with normal IT services.

IT standards have also invaded facility equipment: the modern day UPS and CRAC is by default equipped with an onboard webserver capable of sending warnings using another IT standard: SNMP.

The move towards IT standards and TCP/IP networks has provided us with many advantages:

Convenience: you are now able to manage your facility systems with your iPad or just a web browser. You can even enable remote access using Internet for your maintenance provider. Just connect the system to your Internet service provider, network or Wi-Fi and you are all set. You don’t even need to have the IT guys involved…
Optimize: you are now able to do cross-system data collection so you can monitor and optimize your systems. Preferably in an integrated way so you can have a birds-eye view of the status of your complete datacenter and automate the interaction between systems.

Many of us end-users have pushed the facility equipment vendors towards this IT enabled world and this has blurred the boundary between IT networks and BMS/SCADA networks.

In the past the complexity of protocols like BACnet and Modbus, which tie everything together, scared most hackers away. We all relied on ‘security through obscurity’, but modern SCADA networks no longer provide this (false) sense of security.

Moving towards modern SCADA.

The transition towards modern SCADA networks and systems is approached in many different ways. Some vendors implemented embedded Linux systems on facility equipment. Others consolidated and connected legacy systems & networks on standard Windows or Linux servers acting as gateways.

This transition has not been easy for most BMS and SCADA vendors. A quick round among my datacenter peers provides the following stories:

BMS vendors installing old OS versions (Windows/Linux) because the BMS application doesn’t support the updated ones.
BMS vendors advising against OS updates (security, bug fix or end-of-support) because it will break their BMS application.
BMS vendors unable to provide details on what ports to enable on firewalls: ‘just open all ports and it will work’.
Facility equipment vendors without software update policies.
Facility equipment vendors without bug fix deployment mechanisms; having to update dozens of facility systems manually.

And these stories all apply to modern day, currently used, BMS&SCADA systems.

Vulnerability patching.

Older versions of the SNMP protocol have had several known vulnerabilities that affected almost every platform that shipped an SNMP implementation, including Windows/Linux/Unix/VMS.

It’s not uncommon to find these old SNMP implementations still operational in facility equipment. With the lack of software update policies, that also include the underlying (embedded) OS, new security vulnerabilities will also be neglected by most vendors.

The OS implementation from most BMS vendors also isn’t hardened against cyber attacks. Default ports are left open, default accounts are still enabled.
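
A quick way to see which default ports a facility system leaves open is a plain TCP connect check. The sketch below uses only the Python standard library; the port shortlist is an assumption for illustration, and it should only be pointed at systems you are authorized to test.

```python
import socket

def open_tcp_ports(host: str, ports, timeout: float = 0.5):
    """Return the ports from `ports` that accept a TCP connection on `host`."""
    found = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            pass  # closed, filtered, or unreachable
    return found

# Hypothetical shortlist: FTP, Telnet, HTTP, HTTPS, Modbus/TCP, RDP.
PORTS_TO_CHECK = [21, 23, 80, 443, 502, 3389]
```

Calling `open_tcp_ports("192.0.2.10", PORTS_TO_CHECK)` (a documentation address; substitute your own BMS server) returns the subset of ports that answered. Anything on that list that the vendor cannot justify is a candidate for closing or firewalling. Note that SNMP runs over UDP and is not covered by this TCP check.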

This is all great news for most hackers. It’s much easier for them to attack a standard OS like a Windows or Linux server. There are lots of tools available to make the life of the hacker easier, and he doesn’t have to learn complex protocols like Modbus or BACnet. This is by far the best attack surface in modern day facility system environments.

The introduction of DCIM software will move us even more from the legacy SCADA towards an integrated & IT enabled datacenter facility world. You will definitely want to have your ‘birds-eye DCIM view’ of your datacenter anywhere you go, so it will need to be accessible and connected. All DCIM solutions run on mainstream OSs, and most of them come with IT industry standard databases. Those configurations provide another excellent attack surface, if not managed properly.

ISO 27001

Some might say: ‘I’m fully covered because I got an ISO 27001 certificate’.

The scope of ISO27001 audit and certificate is set by the organization pursuing the certification. For most datacenter facilities the scope is limited to the physical security (like access control, CCTV) and its processes and procedures. IT systems and IT security measures are excluded because those are part of the IT domain and not facilities. So don’t assume that BMS and SCADA systems are included in most ISO 27001 certified datacenter installations.

Natural evolution

Most of the security and management issues are a normal part of the transition in to a larger scale, connected IT world for facility systems.

The same lack of awareness on security, patching, managing and hardening of systems was seen in the IT industry 10-15 years ago. The move from a central mainframe world to decentralized servers and networks, combined with the introduction of the Internet, forced IT administrators to focus on managing the security of their systems.

In the past I have heard Facility departments complain that IT guys should involve them more because IT didn’t understand power and cooling. With the introduction of a more software enabled datacenter the Facility guys now need to do the same and get IT more involved; they have dealt with all of this before…

Examples of what to do:

Separate your systems and divide the network. Your facility system should not share its network with other (office) IT services. The separate networks can be connected using firewalls or other gateways to enable information exchange.
Assess your real needs: not everything needs to be connected to the Internet. If facility systems can’t be hardened by the vendor or your own IT department, then don’t connect them to the Internet. Use firewalls and Intrusion Detection Systems (IDS) to secure your system if you do connect them to the Internet.
Involve your IT security staff. Have facilities and IT work together on implementing and maintaining your BMS/SCADA/DCIM systems.
Create awareness by urging your facility equipment vendor or DCIM vendor to provide a software update & security policy.
Include the facility-systems in the ISO 27001 scope for policies and certification.
Make arrangements with your BMS and/or DCIM vendor about the management of the underlying OS. Preferably this is handled by your internal IT guys, who should already know everything about patching and hardening IT systems. If the vendor provides you with an appliance, then the vendor needs to manage the patching process and hardening of the system.
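
The first recommendation, keeping facility systems off the office network, is easy to verify against a device inventory. A minimal sketch using Python’s standard `ipaddress` module; the address ranges and device names are hypothetical.

```python
import ipaddress

# Hypothetical addressing plan: office LAN and a dedicated facility (BMS/SCADA) network.
OFFICE_NETS = [ipaddress.ip_network("10.10.0.0/16")]
FACILITY_NETS = [ipaddress.ip_network("10.99.0.0/24")]

def misplaced_devices(inventory):
    """Return facility devices whose address is outside the facility networks."""
    wrong = []
    for name, addr in inventory.items():
        ip = ipaddress.ip_address(addr)
        if not any(ip in net for net in FACILITY_NETS):
            in_office = any(ip in net for net in OFFICE_NETS)
            wrong.append((name, addr, "office LAN" if in_office else "unknown network"))
    return wrong

# Hypothetical inventory of facility systems and their IP addresses.
inventory = {
    "ups-a": "10.99.0.11",
    "crac-3": "10.99.0.23",
    "bms-server": "10.10.4.50",  # sits on the office LAN
}
for name, addr, where in misplaced_devices(inventory):
    print(f"{name} ({addr}) is on the {where}")
```

A check like this only covers addressing; the firewall or gateway between the two networks still decides what traffic is actually allowed.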

If you would like to talk about the future of securing datacenter BMS/SCADA/DCIM systems then join me at Observe Hack Make (OHM) 2013. OHM is a five-day outdoor international camping festival for hackers and makers, and those with an inquisitive mind. It starts July 31st 2013.

Note:
There are really good whitepapers on IDS systems (and firewalls) for securing Modbus and BACnet protocols, if you do need to connect those networks to the internet. Example: Snort IDS for SCADA (pdf) or books about SCADA & security at Amazon.

Source:
A large part of this blog is based on a Dutch article on BMS/SCADA security January 2012 by Jan Wiersma & Jeroen Aijtink (CISSP). The Dutch IT Security Association (PViB) nominated this article for ‘best security article of 2012’.

Hivos expert meeting on datacenters

Hivos is organizing an expert meeting, in which Datacenter Pulse will also take part:

On Friday 28 June, Hivos is organizing an “Expert meeting on datacenters: trends in the sector.”

Hivos invites you to join the discussion about provocative propositions and visions from experts in the sector.

Would you like to defend or attack a proposition yourself? That is possible too.

More information about the meeting can be found in the invitation (see attachment).

You can register by sending a message before 22 June to expertmeeting@hivos.nl containing your name and your company/organization.

The invitation is here: uitnodiging-def.pdf

DCIM–It’s not about the tool; it’s about the implementation


So you just finished your extensive purchase process and now the DCIM DVD is on your desk.

Guess what; the real work just started…

The DCIM solution you bought is just a tool, implementing it will require change in your organization. Some of the change will be small; for example no longer having to manually put data in an Excel file but have it automated in the DCIM tool. Some of the change will be bigger like defining and implementing new processes and procedures in the organization. A good implementation will impact the way everyone works in your datacenter organization. The positive outcome of that impact is largely determined by the way you handle the implementation phase.

These are some of the most important parts you should consider during the implementation period:

Implementation team.

The implementation team should consist of at least:

A project leader from the DCIM vendor (or partner).
An internal project leader.
DCIM experts from the DCIM vendor.
Internal senior users.

(Some can be combined roles)

Some of the DCIM vendors will offer to guide the implementation process themselves; others use third party partners.

During your purchase process it’s important to have the DCIM vendor explain in detail how they will handle the full implementation process. Who will be responsible for what part? What do they expect from your organization? How much time do they expect from your team? Do they have any reference projects of the same size and complexity?

The DCIM vendor (or its implementation partner) will make implementation decisions during the project that will influence the way you work. These decisions will give you either great ease of working with the tool or day-to-day frustration. It’s important that they understand your business and way of working. Not having any datacenter experience at all will not benefit your implementation process, so make sure they supply you with people that know datacenter basics and understand your (technical) language.

The internal senior users should be people that understand the basic datacenter parts (from a technical perspective) and really know the current internal processes. Ideal candidates are senior technicians, your quality manager, backoffice sales people (if you’re a co-lo) and site/operations managers.

The internal senior users also play an important role in the internal change process. They should be enthusiastic early adopters who really want to lead the change and believe in the solution.

Training.

After you have kicked off your implementation team, you should schedule training for your senior users and early adopters first. Have them trained by the DCIM vendor. This can be done on dummy (fictitious) data. This way your senior users can start thinking about the way the DCIM software should be used within your own organization. Include some Q&A and ‘play’ time at the end of the training. Having a sandbox installation of the DCIM software available for your senior users after the training also helps them to get more familiar with the tool and test some of their process ideas.

After you have done the loading of your actual data and you made your process decisions surrounding the DCIM tool, you can start training all your users.

Some of the DCIM vendors will tell you that training is not needed because their tool is so very user friendly. The software may be user friendly, but your users still need to be trained on the specific usage of the tool within your own organization.

Have the DCIM vendor trainer team up with your senior users in the actual training. This way you can make the training specific for your implementation and have the senior users at hand to answer any organization specific questions.

The training of general users is an important part of the change and process implementation in your organization.

Take any feedback during the general training seriously. Provide the users with a sandbox installation of the software so they can try things without breaking your production installation and data. This will give you broad support for the new way of working.

Data import and migration.

Based on the advice in the first article, you will already have identified your current data sources.

During the implementation process the current data will need to be imported into the DCIM data structure or integrated.

Before you import you will need to assess your data; are all the Excel, Visio and AutoCAD drawings accurate? Garbage import means garbage output in the DCIM tool.

Intelligent import procedures can help to clean your current data; connecting data sources and cross referencing them will show you the mismatches. For example: adding (DCIM) intelligence to importing multiple Excel sheets with fiber cables and then generating a report of fiber ports that have more than 1 cable connected to them (which would be impossible in real life).
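
That fiber port cross-check can be sketched in a few lines. The column names and cable data below are hypothetical, and the Excel sheets are assumed to have been exported to CSV first.

```python
import csv
import io
from collections import defaultdict

# Two hypothetical Excel exports (saved as CSV) listing fiber cables and their end ports.
sheet_a = "cable_id,port_a,port_b\nF-001,PP1:01,PP2:01\nF-002,PP1:02,PP2:02\n"
sheet_b = "cable_id,port_a,port_b\nF-107,PP1:02,PP3:05\nF-108,PP1:03,PP3:06\n"

def ports_with_multiple_cables(*csv_texts):
    """Map each port to the cables that claim it, keeping only ports claimed more than once."""
    cables_by_port = defaultdict(set)
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            cables_by_port[row["port_a"]].add(row["cable_id"])
            cables_by_port[row["port_b"]].add(row["cable_id"])
    return {port: sorted(cables)
            for port, cables in cables_by_port.items() if len(cables) > 1}

for port, cables in ports_with_multiple_cables(sheet_a, sheet_b).items():
    print(f"{port}: {', '.join(cables)}")
```

Here port PP1:02 is claimed by a cable in each sheet, so one of the two records must be wrong and needs checking on the datacenter floor before import.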

Your DCIM vendor or its partner should be able to facilitate the import. Make sure you cover this in the procurement negotiations; what kind of data formats can they import? Should you supply the data in a specific format?

This also brings us back to the basic datacenter knowledge of the DCIM vendor/partner. I have seen people import Excel lists of fiber cable and connect them to PDU’s… The DCIM vendor/partner should provide you a worry free import experience.

Create phases in the data import and have your (already trained) senior users perform acceptance tests. They know your datacenter layout and can check for inconsistencies.

Prepare to be flexible during the import; not everything can be modeled the way you want it in the software.

For example, when I bought my first DCIM tool in 2006 it couldn’t model blade servers yet and we needed a workaround for it. Make sure the workarounds are known and supported by the DCIM vendor; you don’t want to create all your workaround assets again when the software update that supports the correct models finally arrives. The DCIM vendor should be able to migrate this for you.

Integration.

The first article did a drill down of the importance of integration. Make sure your DCIM vendor can accommodate your integration wishes.

Integration can be very complex and can mess up your data (or worse) if not done correctly. Test extensively, on non-production data, before you go live with integration connections.

The integration part of the implementation process is very suitable for a phased approach. You don’t need all the integrations on day one of the implementation.

Involve IT Information architects if you have them within your company and make sure external vendors of the affected systems are connected to the project.

Roadmap and influence.

Ask for a software development roadmap and make sure your wishes are included before you buy. The roadmap should show you when new features will be available in the next major release of your DCIM tool.

The DCIM vendor should also provide you with a release cycle displaying the scheduled minor releases with bug fixes. When you receive a new release it should include release-notes mentioning the specific bugs that are fixed and the new features included in that new release. Ask the DCIM vendor for an example of the roadmap and release-notes.

During the purchase process you may have certain feature requests that the vendor is not able to fulfill yet. Especially new technology, like the blade server example I used earlier, will take some time to appear in the DCIM software release. This is not a big problem as long as the DCIM vendor is able to model it within reasonable time.

One way to handle missing features is to make sure it’s on the software development roadmap and make the delivery schedule part of your purchase agreement.

After you signed the purchase order your influence on the roadmap will become smaller. They will tell you it doesn’t… but it does… Urge your DCIM vendor to start a user-group. This group should be fully facilitated by the vendor and provide valuable input for the roadmap and the future of the DCIM tool. A strong user-group can be of great value to the DCIM vendor AND its customers.

Got any questions on real world DCIM ? Please post them on the Datacenter Pulse network: http://www.linkedin.com/groups?gid=841187 (members only)

Before you jump in to the DCIM hype…

You’re ready to enter the great world of DCIM software and jump right into the hype ?

Do you actually know what you need from a DCIM solution ? What are your functional requirements ?

So before you jump in, let’s take a step back and look at DataCenter Information Management from a 40,000-foot level: the datacenter facility information architecture.

Let’s start with ‘data’;

Data is all around us in the datacenter environment. It’s on the post-it notes on your desk, the dozen Excel files you manage to report and collect measurements and the collection of electrical and architectural drawings sitting in your desk drawer.

A modern day datacenter is filled with sensors connected to control systems. Some of the equipment is connected to central SCADA or BMS systems, some handle all the process control locally at the equipment. HVAC, electrical distribution and conversion systems, access control and CCTV; they all generate data streams. With the growth of datacenters in square meters and megawatts, the amount of data grows too.

The introduction of PUE and focus on energy efficiency have shown us the importance of data and especially data analysis. For most of us this has introduced even more data points, but enabled us to do better analysis of our datacenter’s performance. So; more data has enabled more efficiency and a better return on investment. Some of us could even say they entered the BigData era with datacenter facility data.
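
As a reminder of how little data the basic metric needs: PUE is the total facility energy divided by the energy delivered to the IT equipment. A minimal sketch; the meter readings are hypothetical.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy divided by IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh

# Hypothetical monthly meter readings in kWh: (total facility, IT equipment).
readings = {"January": (720_000, 450_000), "July": (810_000, 450_000)}
for month, (total, it) in readings.items():
    print(f"{month}: PUE {pue(total, it):.2f}")
```

The analysis only becomes interesting once the two totals are broken down further (cooling, distribution losses, lighting), which is exactly where the extra data points come in.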

DCIM can play a role in the analysis of all this data, but it’s important to know where your data is first. Where is the current data stored ? What are the data streams within your datacenter ? What data is actually available and what data actually matters to your operation ? It’s a false assumption that all the data needs to be pulled in to a DCIM solution; that depends on your processes and your information requirements.

Process

Every datacenter has its collection of structured activities or tasks that produce a specific service or product for our internal or external customer. These are the primary processes focusing on the services your datacenter needs to provide. Examples are operations processes like Work Orders or Capacity Management.

These primary processes are assisted by supporting processes that make the core (primary) processes work and optimize them. Examples are Quality, Accounting or Recruitment processes.

Identifying the primary and supporting processes in your datacenter enables you to optimize them by executing them in a consistent way every time and checking the output.

If you run an ISO9001 certified shop, you will definitely know what I’m talking about.

To run the processes we need information. Information is used in our processes to make decisions. The needed information can be collected and supplied by an individual or an (IT) system.

When data is collected it’s not yet information. Applying knowledge creates information from data. IT systems can assist us to create information from data, with built-in or collected knowledge.

Identifying your datacenter processes also enables you to get a grip on the information that is needed to move the processes forward. Is this information available ? What is the quality of the information and process output ? How much time does it take to make it available ? Can this be optimized ?

DCIM solutions can assist you in creating information from data and provide information and process optimization. Most of the DCIM solutions depend on built-in knowledge on how datacenters work and operate, to facilitate this and optimize processes.

DCIM is only one of the applications used to support and optimize our datacenter processes. To support the full stack of processes we need a whole range of applications and tools. These applications can be everything from Planning to Asset Management to Customer Relationship Management (CRM) to SCADA/BMS tools.

Most of us already have some type of SCADA or BMS system running in our datacenter to control and monitor our facility infrastructure. This SCADA or BMS system will handle typical facility protocols like Modbus, BACnet or LonWorks. The programming logic used in most SCADA/BMS systems is not something found in typical DCIM solutions.

With the growing number of sensors and their data, the SCADA/BMS system must be able to handle hundreds of measurements per minute. It must store and analyze the provided data and be able to react to it to control things like remote valves and pumps. This functionality is also typically not found in DCIM solutions. (So SCADA/BMS does not equal (!=) DCIM.)

Anyone running a production datacenter will already have a collection of applications to support their datacenter processes. You may have a ticketing system, a CRM application, MS Office applications, etc. Sometimes DCIM is perceived as the only tool you need to manage your datacenter, but it will definitely not replace all your current tools and applications.

Model

Now that you have identified your data, processes and current applications, it’s time to focus on what you need DCIM for anyway: define your functional requirements.

One way of assisting you in this definition is creating your own datacenter facility information model.

IT architects are trained in creating information models, so if you have any walking around ask them to assist you.

An example of such a model is the one that the 451 Group created for their DCIM analysis. It is featured in the DCK Guide to Data Center Infrastructure Management (DCIM). (The model doesn’t cover the full scope for every organization, but it helps me to explain what I mean in this blog…)

The model displays the functionality fields that would typically exist when operating a datacenter.

You can use a model like this to identify what functionality you currently don’t have (from a process and application perspective) and what can be optimized.

It also enables you to plot your current tools on the model and identify gaps and overlap. In this example I have plotted one of my SCADA/BMS systems on the (slightly modified) model:

I have also plotted the DCIM need for that project:

Using models like this will give you a sense of what you actually expect from a DCIM solution and assist in creating your functional requirements for DCIM tool selection (RFP/RFI).
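The gap and overlap analysis described above can also be done with a few lines of code once the model is written down. This is only a minimal sketch: the functional areas and tool names below are hypothetical placeholders and are not taken from the 451 Group model, so replace them with the fields of your own information model.

```python
from collections import Counter

# Hypothetical functional areas; replace with the fields of your own model.
MODEL_AREAS = {"monitoring", "control", "asset_management",
               "capacity_planning", "reporting"}

# Hypothetical mapping of current tools to the areas they cover.
TOOL_COVERAGE = {
    "scada_bms": {"monitoring", "control"},
    "asset_db": {"asset_management"},
    "spreadsheets": {"asset_management", "capacity_planning"},
}


def find_gaps_and_overlap(areas, coverage):
    """Return (areas no tool covers, areas covered by more than one tool)."""
    counts = Counter(
        area for covered in coverage.values() for area in covered & areas
    )
    gaps = areas - set(counts)
    overlap = {area for area, n in counts.items() if n > 1}
    return gaps, overlap


gaps, overlap = find_gaps_and_overlap(MODEL_AREAS, TOOL_COVERAGE)
print(sorted(gaps))     # ['reporting']
print(sorted(overlap))  # ['asset_management']
```

The gaps are candidates for your DCIM functional requirements; the overlap tells you where you need to decide which tool stays in the lead.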

Integration is key

Modern day IT information management consists of collections of applications and datastores, connected for optimal information exchange. IT information and business architects have already tried the ‘one application to rule them all’ approach before and failed. Because creating information islands also doesn’t work, we need to enable applications and information stores to talk to each other.

You may have some customer information about the usage of datacenter racks in a CRM system like Salesforce. You may already have some asset information about your CRACs in an asset management system or maybe a procurement system. This is all interesting and relevant information for your ‘datacenter view on the world’. Connecting all the systems and datastores could get really ugly, time consuming and error-prone:

IT architects struggled with this some time ago when integrating general business applications. That struggle gave rise to things like service-oriented architecture (SOA), the enterprise service bus (ESB) and the application programming interface (API). All fancy words (and IT loves its three-letter acronyms) for IT architectural models that enable applications to talk to each other.

The DCIM solution you select needs to be able to integrate into your current world of IT applications and datastores.

When looking at integration, you need to decide what information is authoritative and how the information will flow. Example: you may have an asset management system containing unique asset names and numbers for your large datacenter assets like pumps, CRACs and PDUs. You would want this information to be pushed out to the DCIM solution but changes in the asset names should only be possible in the asset management system. Your asset management system would then be considered authoritative for those information fields and information will only be pushed from the asset system to DCIM and not vice versa (flow).
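To make the idea of an authoritative source and a one-way flow concrete, here is a minimal sketch. The field names, record layout and the `push_to_dcim` helper are assumptions for illustration only; a real integration would go through the APIs of your asset management and DCIM products.

```python
# Fields for which the asset management system is authoritative (assumed names).
AUTHORITATIVE_FIELDS = ("asset_name", "asset_number")


def push_to_dcim(asset_records, dcim_records):
    """One-way flow: overwrite authoritative fields in DCIM, never the reverse."""
    # Start from a copy of the DCIM data so DCIM-only fields are preserved.
    updated = {asset_id: dict(rec) for asset_id, rec in dcim_records.items()}
    for asset_id, asset in asset_records.items():
        record = updated.setdefault(asset_id, {})
        for field in AUTHORITATIVE_FIELDS:
            record[field] = asset[field]
    return updated


asset_system = {
    "A-001": {"asset_name": "CRAC-North-1", "asset_number": "100234"},
}
dcim = {
    "A-001": {
        "asset_name": "crac north (edited in DCIM)",  # local edit, not allowed
        "asset_number": "100234",
        "rack_row": "B",  # DCIM-only field, stays untouched
    },
}

synced = push_to_dcim(asset_system, dcim)
print(synced["A-001"]["asset_name"])  # CRAC-North-1
print(synced["A-001"]["rack_row"])    # B
```

The name edited in DCIM is overwritten on the next push, while the DCIM-only field survives. That is exactly the behaviour you want to agree on per information field before you start integrating.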

Integration also means you don’t have to pull all the data from every available data source into your DCIM solution. Select only the information and data that would really add value to your DCIM usage. Also be aware that integration is not the only way to aggregate data. Reporting tools (sometimes part of the DCIM solution) can collect data from multiple data sources and combine them in one nice report, without the need to duplicate information by pulling a copy into the DCIM database.

The 451 Group model does an excellent job of displaying this need for integration, showing the “integration and reporting” layer across all layers.

Using your own information model you can also plot integration and data sources.

Integration within the full datacenter stack (from facilities to IT) is also key for the future of datacenter efficiency, as I mentioned in my “Where is the open datacenter facility API ?” blog.

So, to summarize:

Look at what data you currently have, where it is stored and how that data flows across your infrastructure.
Look at the information and functionality you need by analyzing your datacenter processes. Identify information gaps and translate them to functional requirements.
Look at the current tools and applications: which applications to replace with DCIM and which applications to integrate with DCIM. What are the integration requirements and which information source is authoritative?
Create your own datacenter facility information model. Position all your current applications on the model. (If you have in-house IT (information) architects; have them assist you…)

Preparing your DCIM tool selection this way will save you from headaches and disappointment after the implementation.

In my next blog we will jump to the implementation phase of DCIM.

More resources:

BMS/SCADA vs DCIM. Can one replace another?
Mark Harris – DCIM Expert blog (warning: not vendor neutral, but a good resource anyway)
DCIM at the Forrester blogs
Many, many YouTube videos on DCIM.

Full credits for the DCIM model used in this blog go to the 451 Group. Taken from the excellent DCK Guide to Data Center Infrastructure Management (DCIM) at http://www.datacenterknowledge.com/archives/2012/05/22/guide-data-center-infrastructure-management-dcim/

Disclosure: between 2006 and 2012 I selected, bought and implemented three different DCIM solutions for the companies I worked for. At that time I was also part of either the beta-pilot group for those vendors or their Customer Advisory Board. That doesn’t make me a DCIM expert, but it generated some insight into what is sold and what actually works and gets used.

Radio 1 covers energy & datacenters

It is no secret that there is (government) attention for the amount of energy that datacenters consume. There are, however, also a lot of initiatives to curb this energy consumption. These mainly look at the ‘overhead’ energy: the energy that does not go to IT but is used for, for example, cooling.

The city of Amsterdam is a major driver of this efficiency push. Since 2011 a ‘Green Deal’ has been in place between the national government and the city for this driving role.

This Green Deal even states:

Based on the Amsterdam actions, the city aims to establish a national performance standard based on best available techniques; for new installations based on best available techniques, for existing datacenters on best achievable performance

The Amsterdam GroenLinks alderman Maarten van de Poelgeest is right on top of this, because as he writes on his blog:

It is the case, however, that the 36 Amsterdam datacenters account for no less than 11% of Amsterdam’s business energy consumption.

The Radio 1 programme Vara Vroege Vogels therefore visited the alderman and discussed the options for energy-efficient cooling with Jan Wiersma. The segment can be found on the Vara site: Datacenters:groener! and a local copy is available here (mp3).

Where is the open datacenter facility API ?

For some time the Datacenter Pulse top 10 has featured an item called ‘Converged Infrastructure Intelligence’. The 2012 presentation mentioned:

Treat the DC infrastructure as an IT system;
– Converge in the infrastructure instrumentation and control systems
– Connect it into the IT systems for ultimate control
Standardize connections and protocols to connect components

With datacenter infrastructure becoming a more complex system and the need for better efficiency within the whole datacenter stack, the need arises to integrate layers of the stack and make them ‘talk’ to each other.

This is shown in the DCP Stack framework with the need for ‘integrated control systems’; going up from the (facility) real-estate layer to the (IT) platform layer.

So if we have the ‘integrated control systems’, what would we be able to do?

We could:

Influence behavior (can’t control what you don’t know); application developers can, for example, be given insight into their power usage when they write code. This is one of the needed steps for more energy efficient application programming. It will also provide more insight into the complete energy flow and more detailed measurements.
Design for lower level TIER datacenters; when failure is imminent, IT systems can be triggered to move workloads to other datacenter locations. This can be triggered by signals from the facility equipment to the IT systems.
Design close control cooling systems that trigger on real CPU and memory temperature and not on room level temperature sensors. This could eliminate hot spots and focus the cooling energy consumption on the spots where it is really needed. It could even make the cooling system aware of oncoming throttle up from IT systems.
Optimize datacenters for smart grid. The increase of sustainable power sources like wind and solar energy increases the need for more flexibility in energy consumption. Some may think this is only the case when you introduce onsite sustainable power generation, but the energy market will be affected by the general availability of sustainable power sources as well. In the end the ability to be flexible will lead to lower energy prices. Real supply and demand management in the datacenter requires integrated information and control from the facility layers and IT layers of the stack.
…
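As an illustration of the close control cooling idea in the list above, here is a minimal sketch that derives a cooling demand per rack from the hottest CPU instead of a room level sensor. The function name, the target and maximum temperatures and the rack data are all assumptions picked for the example; real values depend on your hardware and cooling design.

```python
def cooling_demand(cpu_temps_c, target_c=65.0, max_c=85.0):
    """Return a cooling demand between 0.0 and 1.0 based on the hottest CPU.

    target_c and max_c are illustrative thresholds, not vendor values.
    """
    hottest = max(cpu_temps_c)
    fraction = (hottest - target_c) / (max_c - target_c)
    # Clamp to the 0..1 range so the output can drive a valve or fan setting.
    return min(1.0, max(0.0, fraction))


# Hypothetical CPU temperatures (in Celsius) reported by the IT systems per rack.
racks = {
    "rack-01": [58.0, 61.5, 60.2],
    "rack-02": [70.0, 75.0, 66.3],
}

for rack, temps in racks.items():
    print(rack, cooling_demand(temps))
```

In this sketch rack-01 stays below the target and asks for no extra cooling, while rack-02 asks for half of the available capacity. The point is that the trigger comes from the IT layer, which is only possible when the facility and IT systems can exchange this information.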

The gap between IT and facility does not only exist between IT and facility staff but also between their information systems. Closing the gap between people and systems will make the datacenter more efficient and more reliable, and will open up a whole new world of possibilities.

This all leads to something that has been on my wish list for a long, long time: the datacenter facility API (application programming interface).

I’m aware that we have BMS systems supporting open protocols like BACnet, LonWorks and Modbus, and that is great. But they are not ‘IT ready’. I know some BMS systems support integration using XML and SOAP but that is not based on a generic ‘open standard framework’ for datacenter facilities.

So what does this API need to be ?

First it needs to be an ‘open standard’ framework; publicly available and no rights restrictions for the usage of the API framework.

This will avoid vendor lock-in. History has shown us, especially in the area of SCADA and BMS systems, that our vendors come up with many great new proprietary technologies. While I understand that the development of new technology takes time and a great deal of money, locking me in to your specific system is not acceptable anymore.

A vendor proprietary system in the co-lo and wholesale facility will lead to the lock-in of co-lo customers. This is great for the co-lo datacenter owner, but not for its customer. Datacenter owners, operators and users need to be able to move between facilities and systems.

Every vendor that uses the API framework needs to use the same routines, data structures and object classes. Standardized. And yes, I used the word ‘Standardized’. So it’s a framework we all need to agree upon.

These two sentences are the big difference between what is already available and what we actually need. It should not matter if you place your IT systems in your own datacenter or with co-lo provider X, Y, Z. The API will provide the same information structure and layout anywhere…

(While it would be good to have the BMS market disrupted by open source development, having an open standard does not mean all the surrounding software needs to be open source. Open standard does not equal open source and vice versa.)

It needs to be IT ready. An IT application developer needs to be able to talk to the API just like he would to any other IT application API; so no strange facility protocols. Talk IP. Talk SOAP or better: REST. Talk something that is easy to understand and implement for the modern day application developer.

All this openness and ease of use may be scary for vendors and even end users because many SCADA and BMS systems are famous for relying on ‘security through obscurity’. All the facility specific protocols are notoriously hard to understand and program against. So if you don’t want to lose this false sense of security as a vendor; give us a ‘read only’ API. I would be very happy with only this first step…

So what information should this API be able to feed ?

Most information would be nice to have in near real time:

Temperature at rack level
Temperature outside of the building
kWh at rack level (other energy related measurements at rack level would be nice too)
warnings / alarms at rack and facility level
kWh price (can be pulled from the energy market, but that doesn’t include the full datacenter kWh price (like a PUE markup))

(all if and where applicable and available)
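To give an idea of what such a feed could look like for an IT developer, here is a minimal sketch of a read-only payload. No such standard exists as far as this post is concerned, so every name here (`RackReading`, the JSON keys, `effective_kwh_price`) is hypothetical. The PUE markup is also an assumption: the sketch simply multiplies the market price by the PUE to get a price per IT kWh.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RackReading:
    """Hypothetical rack level record; field names are illustrative only."""
    rack_id: str
    temperature_c: float
    energy_kwh: float
    alarms: tuple = ()


def effective_kwh_price(market_price, pue):
    """Price per IT kWh including facility overhead (assumed: price times PUE)."""
    return market_price * pue


reading = RackReading("rack-01", 24.5, 3.2, ("door_open",))

payload = {
    "rack": asdict(reading),
    "outside_temperature_c": 11.0,
    # Example values: 0.10 per kWh on the market, PUE of 1.5.
    "kwh_price": round(effective_kwh_price(0.10, 1.5), 4),
}

print(json.dumps(payload, indent=2))
```

A structure like this is trivial to consume from any modern language, which is the whole point of being ‘IT ready’. The hard part is not the code but agreeing on the field names and units across vendors and facilities.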

The information owner would need features like access control for rack level information exchange and be able to tweak the real time features; we don’t want to create unmanageable information streams; in security, volume and amount.
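A minimal sketch of those two controls, rack level access control and limiting the volume of a real time stream, could look like this. The access list, customer names and both helper functions are hypothetical and only meant to show the idea.

```python
# Hypothetical access list: which customer may read which racks.
RACK_ACCESS = {
    "customer-a": {"rack-01", "rack-02"},
    "customer-b": {"rack-07"},
}


def visible_readings(customer, readings, access=RACK_ACCESS):
    """Return only the readings for racks this customer is allowed to see."""
    allowed = access.get(customer, set())
    return {rack: value for rack, value in readings.items() if rack in allowed}


def downsample(samples, every_n):
    """Keep every n-th sample to limit the volume of a real time stream."""
    if every_n < 1:
        raise ValueError("every_n must be 1 or greater")
    return samples[::every_n]


readings = {"rack-01": 24.5, "rack-07": 26.0}
print(visible_readings("customer-a", readings))  # {'rack-01': 24.5}
print(downsample([1, 2, 3, 4, 5, 6], 3))         # [1, 4]
```

Unknown customers get an empty result by default, which is the safer choice for a co-lo provider exposing information to multiple tenants.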

So what do you think the API should look like? What information exchange should it provide? And more importantly; who should lead the effort to create the framework? Or… do you believe the Physical Datacenter API framework is already here?

Good API design by Google : http://www.youtube.com/watch?v=heh4OeB9A-c&feature=gv

Datacenter & SCADA security

Last year the Platform voor Informatie Beveiliging (PvIB) published the article on SCADA security written by Jeroen and me.

Here is a copy of the complete article.

Last week I received the news that it has been nominated for article of the year 2012:

It is always nice to get this kind of recognition, but it made me realize that there are still many myths around the (in)security of SCADA and BMS systems within datacenters.

I will pay some attention to this on my blog in the coming period.

Datacenters & SmartGrid

The October issue of Datacenterworks this year featured an article about an ongoing TNO study on ‘Datacenters & smartgrid’. The study focuses on making energy consumption more flexible for medium-sized consumers such as cold storage warehouses and datacenters, under the fitting name ‘Flexiquest’.

Here is the PDF, starting at page 7:

Labels, metrics and boxes for ‘green ICT’

OK, let’s be honest… most people love sticking labels on things and putting things in boxes. It keeps matters organized and makes sure we can compare things with each other. The same goes for the ICT sector and the datacenter industry.

In recent months I was confronted several times with ‘misuse’ of metrics (see my PUE blog) or the creation of new metrics or labels. Sometimes it was improper use of an existing method, but more and more often an attempt to introduce new labels that mainly focus on ‘green’ and the entire IT service delivery chain.

Within DatacenterPulse (DCP) we have captured the various layers and components of the datacenter in a discussion diagram called the DataCenterPulse Stack. Besides showing the structure and interdependencies of the layers, it also addresses metrics and labels.

The part “Input metrics –> Layer metrics per business sector” refers to that. It points to the different metrics and labels that are available at the various layers.

Examples are:

PUE, at the Physical & Real Estate layer, which shows energy efficiency in the facility part of the datacenter.
WUE, at the Real Estate layer, which shows efficient water usage.
SPECpower, at the Platform layer, which shows energy efficiency for servers.
Etc…
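As a reminder of how small the scope of such a layer metric is: PUE is the total facility energy divided by the energy delivered to the IT equipment. A minimal sketch, with example numbers that are purely illustrative:

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be greater than zero")
    return total_facility_kwh / it_equipment_kwh


# Illustrative numbers: 1500 kWh into the facility, 1000 kWh reaching IT.
total_kwh = 1500.0
it_kwh = 1000.0

print(pue(total_kwh, it_kwh))  # 1.5
print(total_kwh - it_kwh)      # 500.0 kWh of overhead (cooling, losses, etc.)
```

Note that the result says nothing about what the IT equipment does with its 1000 kWh. It only covers the facility layer of the Stack.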

Various organizations have also been trying for some time to publish a ‘useful work’ metric. It should show the overhead in energy usage versus the amount of energy used for the ‘work that really matters’.

This is hard to solve, however, because ‘useful work’ can mean something completely different for one company than for another. The output of IT cannot always be weighed in the same way.

It also brings the problem of capturing all the layers of the Stack in one label or calculation/metric. Given the complexity of all these layers and the variables (qualitative / quantitative), the question is whether that is feasible at all.

A recent discussion I attended was about a ‘green label for cloud computing’. It would make it easy to compare vendors, and companies could show that they are becoming ‘greener’ by moving to cloud computing. It would be an F to A++ label, the way it currently works for washing machines and refrigerators.

I understand the appeal, but let’s pull this wish apart:

We start with the definition: what is green? Often you see that what is actually meant is energy efficient. But with green you have to include all elements of consumption. That includes water usage and other resources. Emissions should really be included as well, from total CO2 emissions to waste from server systems.
How do I know whether I am becoming ‘greener’? It means that, within the whole context of the question, I have to know where I stand now and be able to compare that with someone else’s ICT ecosystem.
What is the definition of cloud computing? Whole books and blogs have been filled on this. The boundary is still very flexible. For the sake of argument, let’s say it is Software as a Service (SaaS). That means we are trying to summarize the entire DCP Stack in one label on the subject of ‘green’. We would have to weigh all types of cooling, power distribution, server types, operating systems, application frameworks and languages, and capture them in one label…

A label on these two big hypes (green & cloud), made up of such complex parts… is begging for abuse by its own industry. Just as we have done with PUE within the datacenter industry.

This brings us to the creation, acceptance and standardization of metrics and labels. My Green Grid colleague Andre Rouyer always uses a nice picture for this:

It starts with Industry Alliances such as the Green Grid, ICT~Office, DatacenterPulse, etc. These are often the breeding ground for new labels and metrics. Once there is sufficient market acceptance, they are worked out by standardization organizations such as NEN, CENELEC and ISO. This leads to a measurable and auditable standard for the label or metric. It provides clear definitions and assessment criteria. After that, these standards are incorporated into (local) regulation by the various governments and can be enforced. This whole process takes years.

With PUE we have seen how this process can go (wrong): conceived by the Green Grid and worked out in 2 to 3 years. It is currently at ISO level, where it will be developed into an international standard over the next 2 to 3 years. In the meantime, however, the government has already picked up PUE to enforce on it. An important step has therefore been skipped.

The use of metrics and labels for government regulation should also not lead to blocking innovation, as happened with the adoption of ASHRAE 90.1, where the published label excluded every other form of innovative cooling. If such a label then becomes a legal requirement through government adoption, it in fact defeats its own purpose.

So you have to think carefully about the consequences of introducing metrics and labels:

Is it feasible at all; am I not trying to capture a system that is too complex?
Are the definitions of the parts I am trying to capture clear?
What manipulation does the label allow? (gaming the system)
If the government adopts it, what effects will this have on your sector/industry?
…

The process after that is possibly even more important: trying out and testing the label/metric in the market –> collecting and processing a lot of feedback –> sharpening and working it out further. If it turns out not to be such a good idea after all, don’t be afraid to say goodbye to the idea. Only when the label has matured and been worked out properly is it ready for the step to standardization.

The call for a label is easily made, but as the Americans say: ‘Be Careful What You Wish For’.