Monday, October 27, 2014

3 risk factors and strategies when managing data center migrations

Your primary concern when migrating data centers is making sure services remain available. Learn how to approach this issue, as well as how to migrate hardware and data safely. 
Data center migrations are complex operations that can be difficult to explain to executives who write the check for the migration activities and need to understand and manage the associated business operations risks. We'll take a look at some of the complexities and risks associated with migrating a data center.

Service availability

The primary purpose of the data center is to host applications that serve the business. Whenever you consider migrating from one data center to another, you must first consider the availability of the underlying services. These services include infrastructure applications such as Active Directory and client-facing applications such as SAP.
As services shift from one data center to another, you must form a strategy that factors in when specific services get moved and the interdependency of applications on one another. A common approach to addressing service availability is to develop move groups and then place interdependent applications in common groupings.
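As a rough sketch of that grouping exercise (not from the original article; the application names and dependency list below are invented), move groups can be derived from a dependency inventory by taking connected components of the dependency graph:

```python
from collections import defaultdict

# Hypothetical application dependency pairs; the names are invented.
dependencies = [
    ("sap-erp", "oracle-db"),
    ("crm", "oracle-db"),
    ("oracle-db", "backup-agent"),
    ("intranet", "web-cache"),
]

# Treat dependencies as an undirected graph: applications that depend on
# one another (directly or transitively) belong in the same move group.
graph = defaultdict(set)
for a, b in dependencies:
    graph[a].add(b)
    graph[b].add(a)

def move_groups(graph):
    """Return connected components; each component is one move group."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], []
        while stack:
            app = stack.pop()
            if app in seen:
                continue
            seen.add(app)
            group.append(app)
            stack.extend(graph[app] - seen)
        groups.append(sorted(group))
    return groups

for number, group in enumerate(move_groups(graph), start=1):
    print(f"Move group {number}: {', '.join(group)}")
```

In practice the dependency inventory comes from discovery tooling and application owners, but the principle is the same: anything connected moves together, or the links between sites must carry the traffic in the interim.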
For the core services that support most enterprise applications, such as Active Directory and DNS, a common approach is to extend them across both data centers. The services remain in both data centers until the migration is complete.

Hardware migration

There are two strategies for migrating physical servers: one is known as "lift and shift" and the other is data replication. In a lift and shift strategy, the hardware is loaded onto a moving truck and reinstalled in the new data center. The system is backed up prior to relocation, but this strategy carries risks.
One of the largest risks is damage to the physical hardware during shipment; such damage can render backups useless. Another challenge is that the physical distance between data centers may not allow the move to be completed, and services restored, within an acceptable window.
The second strategy is to perform data migration over a leased circuit. A leased circuit brings two hardware options. One option is a physical to physical (P2P) migration, which involves acquiring like hardware to which the application and its data can be migrated while keeping downtime to a minimum.
The other hardware migration option is a physical to virtual (P2V) conversion. A P2V involves converting a physical machine to a virtual machine over the leased line. P2Vs serve two purposes: the first is to migrate a workload from one data center to another while keeping hardware costs to a minimum; the second is to undertake a data center transformation by moving to a virtual platform. P2V migrations are popular options, as many engineers are already accustomed to performing these conversions as part of previous data center projects.

Data migration

Getting application data from one site to another may be one of the most complex parts of a data center migration. A simple option would be to perform a tape- or hard drive-based backup and perform a restore; however, similar to a lift and shift migration, backup and restore provides limited capability for restoring services in a timely fashion. Also, backup and restore isn't an optimal method for data migration -- it's better suited for disaster recovery where data recovery options are limited.
The primary method chosen for most data migrations is the provisioning of a leased line. With a dedicated connection between data centers, a migration team can leverage hardware- or software-based synchronization to perform the data migration. Along with the ability to migrate data, this method can be leveraged to perform P2P, P2V, and virtual to virtual (V2V) migrations.
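To make the circuit-sizing question concrete, here is a back-of-the-envelope sketch with invented numbers of how long an initial bulk synchronization might take over a leased circuit; real planning would also account for change rate, protocol overhead, and any compression or deduplication.

```python
def initial_sync_hours(data_tb: float, circuit_mbps: float, efficiency: float = 0.8) -> float:
    """Rough time to push a full data set across a leased circuit.

    `efficiency` is an assumed factor for protocol overhead and contention.
    """
    data_bits = data_tb * 1e12 * 8              # decimal terabytes -> bits
    usable_bits_per_sec = circuit_mbps * 1e6 * efficiency
    return data_bits / usable_bits_per_sec / 3600

# Example: 50 TB of application data over a 1 Gbps leased circuit.
print(f"{initial_sync_hours(50, 1000):.0f} hours")   # ~139 hours, nearly six days
```

Numbers like these are why the synchronization circuit is usually sized well above the day-to-day inter-site link, and why the initial copy is often started long before the cutover weekend.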
Many organizations choose to have multiple connections between data centers. Connectivity involves a minimum of two circuits: one carries regular end-user and data-center-to-data-center traffic for applications such as Active Directory, along with application-to-application traffic, while a second, normally faster connection is used for data synchronization. The dual connections keep these two very different traffic types from interfering with one another.

Conclusion

A successful data center migration strategy begins with identifying the business requirements around availability prior to any migration activity. The technical solutions fall into simple categories once those requirements are gathered.

Monday, October 20, 2014

Checking Your Fire Extinguisher (APAR)

APAR Inspection

A portable fire extinguisher, often referred to by its Indonesian acronym APAR (alat pemadam api ringan), is a piece of equipment you buy while hoping it will never be used at all. Why? Because if an APAR is used, a fire has occurred, and the workplace fire-prevention program has failed.
Even though we hope an APAR will never be used, it must always be kept "ready" for use at any time. That requires routine checks (every 1, 3, or 6 months). So what should we check when inspecting an APAR to determine whether it is still serviceable and "ready" for use? The checklist below covers the basics, and a simple way of recording the results is sketched after the list.
1. Check the recharge label on the APAR: when was it last recharged?
2. Check the pressure gauge: does it still read in the green zone?
3. Check the safety pin: is it still properly in place?
4. Check the handle for damage that would prevent the unit from being used.
5. Check the hose and nozzle for leaks or kinks that would prevent the unit from being used.
6. For dry chemical APARs, lift and invert the unit and listen for the agent shifting (a sound like falling sand) as it is turned over.
7. Fill in the inspection card and hang it on the APAR.
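As a small illustration only (not part of the original checklist), the results of a routine check could be recorded in a simple structure like the one below; the field names and the unit ID are made up.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ExtinguisherCheck:
    """One routine APAR inspection record (illustrative field names)."""
    unit_id: str
    checked_on: date
    last_recharge: date
    gauge_in_green: bool
    safety_pin_ok: bool
    handle_ok: bool
    hose_nozzle_ok: bool
    agent_moves_when_inverted: bool   # dry chemical units only
    notes: str = ""

    @property
    def ready_for_use(self) -> bool:
        # "Ready" here simply means every checklist item above passed.
        return all([self.gauge_in_green, self.safety_pin_ok, self.handle_ok,
                    self.hose_nozzle_ok, self.agent_moves_when_inverted])

check = ExtinguisherCheck("APAR-07", date(2014, 10, 20), date(2013, 11, 5),
                          True, True, True, True, True)
print(check.unit_id, "ready" if check.ready_for_use else "needs service")
```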
Discharge testing of APARs can also be done on a random sample at least once a year, usually in conjunction with a fire drill. Ideally an APAR should be recharged every 1-2 years, even though field experience shows that dry chemical APARs that are 5-7 years old can still work well; there is no harm in preparing for the worst.

Sunday, October 19, 2014

HOW TO INSPECT A FIRE EXTINGUISHER

INSPECTING PORTABLE / WHEELED FIRE EXTINGUISHERS
Every APAR must be inspected at least every 6 (six) months.
Inspection points based on NFPA 10:
1. The fire extinguisher is in its designated location
2. The fire extinguisher is not obstructed or hidden
3. The fire extinguisher is mounted/positioned in accordance with NFPA standard no. 10 (portable fire extinguishers)
4. The pressure gauge shows adequate pressure
5. Weigh the extinguisher (especially CO2 units, to detect any leakage)
6. The pin and seal are in place and undamaged
7. The fire extinguisher shows no signs of damage or tampering
8. The nozzle is free of obstructions
9. Operating instructions and a label are present on the extinguisher
10. The wheels turn freely on wheeled units
Common problems found in portable fire extinguishers:
- Damage or wear to extinguisher parts
- Falling pressure shown on the pressure gauge
- Loss of weight
- Leakage at the cylinder or valve
- Damage to the pressure gauge caused by overpressure
- Corrosion of the cylinder
- Deformation of the cylinder

Saturday, October 18, 2014

Failures, Problems, and Transparency in the Data Center



The lack of transparency can be seen as a root cause of outages and incidents
By Jason Weckworth
I recently began a keynote speech at Uptime Institute Symposium 2013 by making a bold statement: as data center operators, we simply don't share enough of our critical facilities incidents with each other. Yet Uptime Institute maintains an entire membership organization, the Uptime Institute Network, that is dedicated to providing owners and operators the ability to share experiences and best practices by facilitating a rich information exchange between members and the Institute.
Why isn’t every major data center provider already a member of this Network? Why don’t we share more experiences with each other? Why are we reluctant to share details of our incidents? Of course, we love to talk about each other’s incidents as though our competitors were hit with a plague while we remain immune from any potential for disaster. Our industry remains very secretive about sharing incidents and details of incidents, yet we can all learn so much from each other!
I suppose we’re reluctant to share details with each other because of fear that the information could be used against us in future sales opportunities, but I’ve learned that customers and prospects alike expect data center incidents. Clients are far less concerned about the fact that we will have incidents than about how we actually manage them. So let’s raise the bar as operators. Let’s acknowledge the fact that we need to be better as an industry. We are an insurance policy for every type of critical IT business. None of us can afford to take our customers off-line… period. And when someone does, it hurts our entire industry.
Incidents Every Day
An incident does not mean a data center outage. These terms are often confused, misinterpreted or exaggerated. An outage represents a complete loss of power or cooling to a customer's rack or power supply. It means loss of both cords, if dual fed, or loss of a single cord, if utilizing a transfer switch either upstream or at the rack level.
Equipment and systems will break, period. Expect it. Some failures will be minor, like a faulty vibration switch. Others will be major, like an underground cable fault that takes out an entire megawatt of live UPS load (which happened to us). Neither of these types of incidents is an outage at the rack level, yet either of them could result in one if not mitigated properly.
Data Center Risk
Do not ask whether any particular data center has failures. Ask what they do when they have a failure. There is a lot of public data available on the causes of data center outages and incidents, but I particularly like a data set published by Emerson (see Figure 1) because it highlights the fact that most data center incidents are caused by human and mechanical failures, not weather or natural disasters. Uptime Institute Network data provide similar results. This means that, as operators, we play a major role with facility management.
Figure 1. Data provided by Emerson shows that human and mechanical failures cause the vast majority of unplanned outages in data centers.
I have been involved with every data center incident at RagingWire since its opening in 2001. By the end of this year, that will equate to almost 100 megawatts (MW) of generating capacity and 50 MW of critical IT UPS power. I must admit that data center incidents are not pleasant experiences. But learning from them makes us better as a company, and sharing the lessons makes us better as an industry.
In 2006, RagingWire experienced one particularly bad incident caused by a defective main breaker that resulted in a complete outage. I distinctly recall sitting in front of the Board of Directors and Executives at 2:00 AM trying to explain what we knew up to that point in time. But we didn't yet have a root-cause analysis. One of the chief executives looked at me across the table and said, "Jason, we love you and understand the incredible efforts that you and your teams have put forth. You haven't slept in two days. We know we are stable at the current time, but we don't yet have an answer for the root cause of the failure, and we have enterprise Fortune 500 companies that are relying on us to give them an answer immediately as our entire business is at risk. So make no mistake. There will be a fall guy, and it's going to be you if we don't have the answer to prove this will never happen again. You have four hours, or you and all your engineers are fired!"
Fortunately, I'm still at RagingWire, so the story ended well. We used the experience to completely modify our design from N+1 to 2N+2 infrastructure, so that we would never again experience this type of failure. But I never forgot this idea of our natural tendency to assign blame. It's hard to fight this cultural phenomenon because there is so much at stake for our operators, but I believe that it is far more important to look beyond blame. Frankly, it doesn't matter in the immediate aftermath of an incident. Priority #1 is to get operations back to 100%. How does a data center recover? How do you know when your business will be fully protected?
Data centers fail. It is important to understand the root cause and to make sure to fix the vulnerability.
Life Cycle of an Outage
Now I don't mean to offend anyone, but I do make fun of the communication life cycle that we all go through with every major incident. Of course, we know this is a very serious business, and we live and breathe data centers 24 hours per day. But sometimes we need to take a break from the insanity and realize that we're all in this together. So here is what I consider to be the communication life cycle of our customers. And often, it's the same response we see from the Executive Teams!
Stage 1: SURPRISE. Are you kidding? This just happened at your data center? I didn’t think this could happen here! Are my servers down?!?
Stage 2: FEAR. What is really happening? Is it worse than you are telling me? What else is at risk? What aren’t you telling me? Are you going to experience a cascading failure and take down my entire environment?
Stage 3: ANGER. How could this possibly happen? I pay you thousands (or millions) of dollars every month to ensure my uptime! You’re going to pay for this! This is a violation of my SLA.
Stage 4: INTERROGATION. Why did this happen? I want to know every detail. You need to come to my office to explain what happened, lessons learned and why it will never happen again! Where is your incident report? Who did what? And why did it take you so long to notify my staff? Why didn’t you call me directly, before I heard about it from someone else?
Of course, the “not so funny” part of this life cycle is that in reality, all of these reactions are valid. We need to be prepared to address every response, and we truly need to become better operators with every incident.
Operational Priorities
After 12 years of industry growth, many phased expansions into existing infrastructure, and major design modifications to increase reliability, I have found that the majority of my time and effort still goes to managing data center risk as it relates to uptime. Data centers are not created equal. There are many different design strategies and acceptable levels of risk, such as N+1, 2N, 2N+2, etc. However, our common goal as operators is to mitigate risk, address incidents quickly and thoroughly, and return the facility to its original, normal condition, with full redundancy.
The following eight areas represent what I consider to be the most important factors contributing to a data center’s ability to deliver a very high level of operational excellence:
• Staffing. Experience matters. I love to hire Navy nuclear technicians, because they are so disciplined. But my favorite interview question, more than any other, is to ask potential candidates about the incidents they have experienced in the data center. I want people who have lived through the fire. I want to know how they acted under pressure, what went wrong and what surprises they faced. The more difficult the incidents were, the more I appreciate them! There is no substitute for experience, and there is no way to gain experience other than one incident at a time. I also believe that it's important to have 24×7 technical staff at the data center. Even with sophisticated control systems, there are many, many instances that require human intervention and on-site analysis/decision making, usually within minutes.
• Training. Do you train by process or by trial? It’s important to do both. How do you practice casualty control drills? I like to involve the entire operations staff when we commission a new phase, because we can throw in unexpected failures and test the staff without live load. I also like to thoroughly review every incident with the entire staff during a weekly training meeting, so that they can learn from real-world experiences. Training assessment should be part of every technician’s annual review, with merit given for mastering various areas of the data center operation.
• Resources. I personally prefer in-house expertise and first-level response, but only because design and construction are core in-house disciplines with self-performed labor. During a recent underground cable fault that took out an entire one-megawatt UPS feeder, all the loads transferred to alternate UPS sources under a distributed redundancy topology, but the fault created a heat sink that put two additional megawatts at risk that we couldn't cool. With a literal time bomb on our hands and wire temperatures approaching 200°F, we engaged almost 40 in-house electricians to work 24 hours straight to run a new overhead feeder and commission it so that we could vacate the underground duct bank and cool the environment.
Of course, staffing doesn’t need to be in-house. But it’s important to have immediate access and key contacts at any moment’s notice. This can be suppliers, service contractors or engineers. I have particularly found that key equipment factory engineers have a wealth of knowledge–if you can gain access to them during an incident.
• Incident Reporting. Incident reporting should be the lifeblood of every data center operator. What kind of visibility do the reports have? Are they reviewed and approved by Operations, Engineering and Executive staff? I personally review, comment on and approve every incident within the data center. Do you share your incident reports with customers? Some operators may prefer to provide a Summary Report, but we should always be willing to share the entire report with any customer that requests one. Another important detail is follow-up. We tend to be very good at documenting initial incidents, but we have struggled with the follow-up as it relates to engineering, vendor support and further testing. If your technicians are always putting out fires, it's difficult to stay focused on the follow-up. For this reason, we implemented a separate SaaS application called FrontRange that allows us to assign tasks with timed escalation for every follow-up item.
• Escalation. Every incident management protocol needs clear escalation channels. What is critical vs. minor? How detailed is your notification and escalation policy? Is re-training frequent due to non-compliance with escalation procedures? Do you have an escalation process that automatically includes engineering staff? Do you include key vendors within your internal escalation process? Do you have automatic dialing so that you can reach multiple sources within minutes with bridge lines? Do you have an incident manager separate from a communication manager? How fast can your staff mobilize, and do they know when to escalate? Is everyone trained regularly? (A hypothetical sketch of what a written escalation policy might look like appears after this list.)
• Communication. What is your protocol for controlled dissemination of information? Do you have a dedicated communications manager? This may be part of your NOC staff or a dedicated operations staff member with technical knowledge. How will clients be notified during an incident? Do you allow clients direct visibility into equipment status? Do you set up bridge calls with automatic dialers to affected customers for direct communication during events? Do you have a timed protocol to deliver incident reports within 24 or 48 hours? Do you have a communication protocol for your executive team or account managers so they can also contact your customers or at least have knowledge of what is happening? Sometimes it is just as important to communicate with your internal staff as it is with your customers.
• Back-up Plans. Can you provide examples of when the unexpected happened? What are your contingency plans? The data center design has backup redundancy, but what about operational backup with staffing, suppliers, engineering resources and spare parts? You need basic support like food, clothes and sleep. We have needed to keep qualified supervisors or directors on-site just to help with MOP-writing or operational commissioning after a break-fix, yet often these staff members can be completely exhausted.
• Top 10 Incidents. One of the most challenging sales support meetings I ever attended was for a Fortune 100 company. The CIO explained that they had never used colocation outsourcing in the past, and they were particularly concerned with our ability to handle incidents efficiently and communicate clearly to their teams exactly what was happening on a real-time basis. Of course, I am proud of our process and procedures around incident management, and I quickly described many of the ideas that I have touched on within this article. Then he surprised me. He asked me if I could name the Top 10 incidents we've had in our data center, what the root causes were, and what engineering changes or process changes we made as a result. I quickly responded by saying "yes, I am pretty sure I know most of these." So he said he would like to know on the spot, because that kind of knowledge off the top of my head, from an executive staff member, would clearly demonstrate that these issues are important and top of mind for everyone in operations. We spent the next two hours talking through each incident as best as I could remember. I must admit that although I named ten incidents, they probably weren't the top ten over the past ten years. And it was an incredibly stressful meeting for me! But it was an awesome teaching moment for me and my staff.
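As promised above, here is a hypothetical sketch of how an escalation policy could be written down as data, so that notification targets and timers are explicit rather than tribal knowledge; the severities, roles, and timings are all invented.

```python
from dataclasses import dataclass

@dataclass
class EscalationStep:
    after_minutes: int     # elapsed time since the incident was declared
    notify: list           # roles or bridge lines to engage (invented names)

# Invented example policy: who gets pulled in, and when, by severity.
ESCALATION_POLICY = {
    "critical": [
        EscalationStep(0,  ["on-shift technician", "incident manager"]),
        EscalationStep(10, ["engineering on-call", "communications manager"]),
        EscalationStep(30, ["key vendor contacts", "executive bridge line"]),
    ],
    "minor": [
        EscalationStep(0,  ["on-shift technician"]),
        EscalationStep(60, ["facility engineering follow-up queue"]),
    ],
}

def should_be_engaged(severity: str, minutes_elapsed: int) -> list:
    """Everyone who should already have been notified at this point."""
    steps = ESCALATION_POLICY.get(severity, [])
    return [role
            for step in steps if step.after_minutes <= minutes_elapsed
            for role in step.notify]

print(should_be_engaged("critical", 15))
# ['on-shift technician', 'incident manager', 'engineering on-call', 'communications manager']
```

Whatever the format, the point is that the answers to the questions in the Escalation item above exist somewhere other than in a few operators' heads.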
Conclusion: Our Industry Can Be Better
We all need to know the incidents that shape our data centers. We need to qualify improvements to the data center through our incident management process, and we need to be able to share these experiences with our staff, our customers and other operators.
Placing blame takes a back seat to identifying root causes and preventing recurrence.
I encourage every operator or data center provider to join the Uptime Institute Network. This is not a sales or vendor organization. It’s an operator’s organization with a commitment from all members to protect information of other members. Here are just a few of the benefits of the Network:
• Learning experiences from over 5,000 collected incidents from members
• Abnormal incident trending that allows members to focus resources on areas most likely to cause downtime
• "Flash" reports or early warning bulletins on issues that could impact other members' facilities
• Cost savings resulting from validation of equipment purchases or alternative sourcing options
• Tours of Network member data centers with the opportunity to apply ideas on best practices or improvements
• Access to member presentations from technical conferences
Sharing information can help find new ways of preventing old problems from causing issues.
I am also hoping to challenge our industry. If we can become more transparent as operators within a common industry, we will all become better. Our designs and technology may be different, but we still share a very common thread. Our common goal remains the confidence of our industry through uptime of our facilities.

Jason Weckworth is senior vice president and COO, RagingWire Data Centers. He has executive responsibility for critical facilities design and development, critical facilities operations, construction, quality assurance, client services, infrastructure service delivery and physical security. Mr. Weckworth brings 25 years of data center operations and construction expertise to the data center industry. Prior to joining RagingWire, he was owner and CEO of Weckworth Construction Company, which focused on the design and construction of highly reliable data center infrastructure by self-performing all electrical work for operational best practices. Mr. Weckworth holds a bachelor's degree in Business Administration from California State University, Sacramento.

What is the Uptime Tier Classification System?





Explaining the Uptime Institute’s Tier Classification System

An abbreviated version of this column was written for Data Center Knowledge in response to an interview with AFCOM Denver Chapter President Hector Diaz, on September 11, 2014.
Uptime Institute’s Tier Classification System for data centers is approaching the two decade mark. Since its creation in the mid-1990s, the system has evolved from a shared industry terminology into the global standard for third-party validation of data center critical infrastructure.
Over the years, some industry pundits have expressed frustration with the Tier System for being confusing. In many cases these writers have misrepresented the purpose and purview of the program.
Invariably, these authors and interview subjects have never been involved with a Tier Certification project. Typically, the commentator’s understanding of the Tiers is entirely secondhand and ten years out of date.
Anyone in the industry who knew our late founder Ken Brill knows the Institute doesn’t shy away from rigorous debate. And we happily engage in substantive discussions about the Tiers program with clients and interested parties. Unfortunately, many of the public commentators vaguely naysaying about the Tiers are so grossly uninformed that debate isn’t possible.
I would like to take this opportunity to explain what the Tiers look like today, illustrate how Tier Certification works, list some companies that have invested in Tier Certification and offer Uptime Institute’s vision for the future.
What are the Tiers? 
Uptime Institute created the standard Tier Classification System to consistently evaluate various data center facilities in terms of potential site infrastructure performance, or uptime. What follows is a summary; for the full criteria, please see Tier Standard: Topology and the accompanying Accredited Tier Designer Technical Papers.
The Tiers (I-IV) are progressive; each Tier incorporates the requirements of all the lower Tiers.
Tier I: Basic Capacity
A Tier I data center provides dedicated site infrastructure to support information technology beyond an office setting. Tier I infrastructure includes a dedicated space for IT systems; an uninterruptible power supply (UPS) to filter power spikes, sags, and momentary outages; dedicated cooling equipment that won't get shut down at the end of normal office hours; and an engine generator to protect IT functions from extended power outages.
Tier II: Redundant Capacity Components
Tier II facilities include redundant critical power and cooling components to provide select maintenance opportunities and an increased margin of safety against IT process disruptions that would result from site infrastructure equipment failures. The redundant components include power and cooling equipment such as UPS modules, chillers or pumps, and engine generators.
Tier III: Concurrently Maintainable
A Tier III data center requires no shutdowns for equipment replacement and maintenance. A redundant delivery path for power and cooling is added to the redundant critical components of Tier II so that each and every component needed to support the IT processing environment can be shut down and maintained without impact on the IT operation.
Tier IV: Fault Tolerance
Tier IV site infrastructure builds on Tier III, adding the concept of Fault Tolerance to the site infrastructure topology. Fault Tolerance means that when individual equipment failures or distribution path interruptions occur, the effects of the events are stopped short of the IT operations.
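As an informal summary only (my own condensation of the descriptions above, not Uptime Institute's wording), the progressive nature of the Tiers can be expressed as a cumulative mapping:

```python
# Informal summary: each Tier adds requirements on top of the one below it.
TIER_ADDITIONS = {
    "Tier I":   ["dedicated IT space", "UPS", "dedicated cooling", "engine generator"],
    "Tier II":  ["redundant critical power and cooling components"],
    "Tier III": ["redundant delivery paths; concurrently maintainable"],
    "Tier IV":  ["fault tolerance to single equipment or path failures"],
}

def requirements(tier: str) -> list:
    """Tiers are progressive: each Tier includes all lower-Tier requirements."""
    levels = list(TIER_ADDITIONS)
    combined = []
    for level in levels[: levels.index(tier) + 1]:
        combined.extend(TIER_ADDITIONS[level])
    return combined

print(requirements("Tier III"))
```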
Data center infrastructure costs and operational complexities increase with Tier Level, and it is up to the data center owner to determine the Tier Level that fits his or her business's needs. A Tier IV solution is not "better" than a Tier II solution. The data center infrastructure needs to match the business application; otherwise, companies can overinvest or take on too much risk.
Uptime Institute recognizes that many data center designs are custom endeavors, with complex design elements and multiple technology choices. As such, the Tier Classification System does not prescribe specific technology or design criteria beyond those stated above. It is up to the data center owner to meet those criteria in a method that fits his or her infrastructure goals.
Uptime Institute removed reference to “expected downtime per year” from the Tier Standard in 2009. The current Tier Standard does not assign availability predictions to Tier Levels. This change was due to a maturation of the industry, and understanding that operations behaviors can have a larger impact on site availability than the physical infrastructure.
If the Tier Classification system still seems unclear at this point, please take a deep breath and re-read the section above. If you’re not feeling too confused, let’s move on…
Tier Certification
Now that we have a clear understanding of the Tier Standard, let’s discuss Certification.
The Tier Certification process typically starts with a company deploying new data center capacity. The data center owner defines a need to achieve a specific Tier Level to match a business demand.
Data center owners turn to Uptime Institute for an unbiased, vendor neutral benchmarking system, to ensure that data center designers, contractors and service providers are delivering against their requirements and expectations.
Tier Certification is a performance-based evaluation of a data center's specific infrastructure, not a checklist or cookbook. Uptime Institute is the only organization permitted to Certify data centers against the Tier Classification System. Uptime Institute does not design, build or operate data centers. Our only role is to evaluate site infrastructure, operations and strategy.
The first step in a Tier Certification process is a Tier Certification of Design Documents (TCDD). Uptime Institute Consultants review 100% of the design documents, ensuring that each subsystem (electrical, mechanical, monitoring, and automation) meets the fundamental concepts and that there are no weak links in the chain. Uptime Institute then provides a report to the owner listing any Tier deficiencies. Uptime Institute conducts a compliance review of the revised drawings, and then awards a TCDD letter and foil if the design meets the criteria.
Uptime Institute has conducted over 400 TCDDs, reviewing the most sophisticated data center designs from around the world. As you might imagine, we’ve learned a few things from that process. One of the lessons is that some companies would achieve a TCDD, and walk away from following through on Facility Certification for any number of reasons. Some organizations were willfully misrepresenting the Tier Certification, using a design foil to market a site that was not physically tested to that standard.
The TCDD was never supposed to be a final stage in a certification process, but rather a checkpoint for companies to demonstrate that the first portion of the capital project met requirements. Uptime Institute found that stranded Design Certifications were detrimental to the integrity of the Tier Certification program. In response, Uptime Institute has implemented an expiration date on TCDDs. All Tier Certification of Design Documents awards issued after 1 January 2014 will expire two years after the award date.
Data center owners use the Tier Certification process to hold the project teams accountable, and to ensure that the site performs as it was designed. Which brings us to the next phase in a Tier Certification process: Tier Certification of Constructed Facility (TCCF).
During a TCCF, a team of Uptime Institute consultants conducts a site visit, identifying discrepancies between the design drawings and installed equipment. Our consultants observe tests and demonstrations to prove Tier compliance. Fundamentally, this is the value of the Tier Certification, finding these blind spots and weak points in the chain. When the data center owner addresses the deficiencies, Uptime Institute awards the TCCF letter, foil and plaque.
Tier Certification Clients 
Does the industry find value in this process? The clearest proof is the list of companies investing in Tier Certification. It is easy to claim Tier compliance and a wholly different matter to lay your solution open to a rigorous review by Uptime Institute. There are more Certifications underway at this moment than at any other point in the 20-year history of the Tiers.
Look at adoption among the telecommunications companies, colocation providers and data center developers: Digital Realty, Compass Data Centers, CenturyLink, and Switch. We have been pleased to impress each and every one of those companies with our dedication to quality and thoroughness, because we understand all that is on the line for them and their clients.
As the IT industry moves further into the cloud and IaaS mode of IT service delivery, the end user has less control over the data center infrastructure than ever before. Tiers and Operational Sustainability provide third-party assurance, on a comprehensive level, that the underlying data center infrastructure is designed and operated to the customer’s performance requirements.
Increasingly, enterprise companies are stipulating Tier Certification in RFPs to data center service providers. If you want to be competitive, unsubstantiated marketing claims are not sufficient.
Beyond Tiers: Operations 
As mentioned previously, Uptime Institute recognizes the huge role operations plays in keeping data center services available. To that end, Uptime Institute developed a data center facilities management guideline in 2010 (Tier Standard: Operational Sustainability) and certifies data center operations. This is a site-specific scorecard and benchmarking of a facilities management team’s processes, with an on-site visit and detailed report.
For companies with existing sites, or those that for whatever reason have not chosen to certify their data center facilities against Tiers, the operations team can be certified under the Management & Operations (M&O) Stamp of Approval.
For the purposes of an M&O Stamp of Approval, the client and Uptime Institute work together to assess the selected site(s) against the M&O criteria. The criteria were drawn from Uptime Institute's Tier Standard: Operational Sustainability and then vetted through a Coalition composed of key stakeholders in the enterprise owner, outsourced operations, and multi-tenant industry segments. This was done to verify M&O's compatibility with a variety of management solutions and across multiple computing environments.
The key areas reviewed, observed, and validated include:
-Staffing and Organization (staffing levels, qualifications, and skill mix)
-Training and Professional Development Assessment
-Preventative Maintenance Program and Processes
-Operating Conditions and Housekeeping
-Planning, Management, and Coordination practices and resources.
Please refer to Tier Standard: Operational Sustainability for full criteria.
By covering these essential areas, a management team can operate a site to its full uptime potential, obtain maximum leverage from the installed infrastructure/design, and improve the efficacy of operations.
The Path Forward? 
In addition to the certifications listed above, Uptime Institute is delivering and developing further services for the IT industry around corporate governance and IT resource efficiency. As we bring those services to market, we will commit to being more present in the public forum.
With further education in the market, we hope to engage in substantive debates about our processes and approach, rather than defending claims from individuals with incorrect or incomplete knowledge of the Tiers program.
Fundamentally, it is our responsibility to better explain our approach and intellectual property. We owe it to our hundreds of clients who have invested in Tiers Certification.
Bio: Matt Stansberry has researched the convergence of technology, facility management and energy issues in the data center for over a decade. Since January 2011, he has been Director of Content and Publications at Uptime Institute.