Key DCIM Functionality Considerations
This is the second entry in a Data Center Frontier series that explores the ins and outs of data center infrastructure management (DCIM), and how to tell whether your company should adopt a DCIM system. This series, compiled in a complete Guide, also covers implementation and training, and moving beyond the physical aspects of a facility.
The following are key DCIM functionality considerations to take into account when choosing a system for your business or customers.
Energy Efficiency Monitoring
The ecosystem of the data center has many potential points to monitor. While it would be ideal to monitor everything, cost and value become part of the decision process. The focus of what will be monitored typically depends on which stakeholders or departments are driving the project, as well as on the age of the data center and how much monitoring is already in place. From the facility side, basic PUE information can be derived by instrumenting only two points in the power chain: the utility input energy and the output energy of the UPS (IT energy).
PUE was originally based on power (kW) draw, which is an instantaneous measurement. In 2011, PUE was updated to be calculated from annualized energy (kWh measured or averaged over 12 months of operation). This gives a more accurate picture of yearly performance than spot power measurements, which vary widely depending on when they are taken. As can be seen in the figure below, this requires energy metering at the utility input, as well as at one of the three possible points of IT energy measurement, beginning at the output of the UPS (PUE category 1).
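As a rough illustration of why annualized energy gives a steadier figure than a spot reading, the sketch below computes PUE from twelve monthly kWh totals at the two metering points described above (utility input and UPS output). The function name and all readings are hypothetical values chosen for the example, not data from any real facility.

```python
def annualized_pue(facility_kwh_by_month, it_kwh_by_month):
    """Return PUE as total facility energy / total IT energy over 12 months."""
    if len(facility_kwh_by_month) != 12 or len(it_kwh_by_month) != 12:
        raise ValueError("expected 12 monthly readings for each meter")
    total_it = sum(it_kwh_by_month)
    if total_it <= 0:
        raise ValueError("IT energy must be positive")
    return sum(facility_kwh_by_month) / total_it


# Hypothetical readings (kWh): six cooler months, then six warmer months
# in which cooling pushes the utility input energy up.
facility_kwh = [150_000] * 6 + [180_000] * 6   # utility input meter
it_kwh = [100_000] * 12                        # UPS output meter (category 1)

pue_annual = annualized_pue(facility_kwh, it_kwh)

# Single-month ratios for comparison with the annualized figure.
pue_cool_month = facility_kwh[0] / it_kwh[0]
pue_warm_month = facility_kwh[-1] / it_kwh[-1]

print(f"annualized PUE: {pue_annual:.2f}")
print(f"single cool month: {pue_cool_month:.2f}, single warm month: {pue_warm_month:.2f}")
```

With these made-up numbers, a reading taken in a cool month understates the yearly figure and one taken in a warm month overstates it, which is the effect the annualized calculation is meant to remove.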
From an IT perspective, there are also many advantages to monitoring power distribution downstream from the UPS, such as at the floor-level PDUs (PUE2) or at the rack (PUE3), including identifying exposure to cascading failures and PDU overloads. However, this is also where the age of the data center infrastructure becomes a factor.
In newer data centers, floor-level PDUs typically have branch circuit monitoring, which can be remotely polled by DCIM (or BMS). Many older data centers do not have this functionality in the floor-level PDU. This leaves two options: retrofit branch circuit monitoring, or use so-called “intelligent” rack-level PDUs (power strips). The first option, retrofitting the PDUs, falls under the jurisdiction of the facilities department. It can be difficult and disruptive and, in some cases, may require a power shutdown.
The second option has long been the more popular one, typically driven and deployed by the IT group. In many cases these rack PDUs have Ethernet connectivity and can be easily polled by DCIM systems. In other cases, there may only be lower-cost, locally metered rack PDUs or simple “power strips”, neither of which has any remote connectivity. This leaves the option of replacing them with intelligent PDUs or installing “in-line” power monitoring with Ethernet connectivity, which can be polled by the DCIM platform.
Power Distribution Monitoring
Hidden Exposure of Cascade Failure
Many of the more basic power monitoring functions may already be done to one degree or another by some BMS systems. However, in many older data centers, there is no branch circuit monitoring installed, resulting in the need for periodic manual branch circuit surveys (by electricians with clamp-on ammeters and a clipboard). This is typically done to try to avoid circuit breaker overloads or perhaps as a rudimentary form of power capacity planning to see if or how much IT equipment could be added to a cabinet. Even in a relatively well-organized and managed data center, this information may not be readily available, communicated or cohesively correlated within and between the facilities, operations and IT departments. The lack of real-time power and energy monitoring at the rack can delay or disrupt a technical refresh or, worse yet, expose the rack to failure if the branch circuit protection trips when more or new IT equipment is installed.
This hidden exposure can be seen in the figure below, which depicts a potential scenario wherein the typical manual “clamp-on” ammeter is used to measure (A-B) redundant branch circuits to a rack at one point in time, while the plot lines show continuous current measurements over time for the A and B circuits, as well as the sum (A+B) of both.
In the figure below, at the time the manual readings were taken, the total current drawn across the A and B circuits appears to be only 14 amps (7A+7A). However, the continuous current plot shows that at multiple times during the day, the sum of the A and B circuits actually exceeded the 16 amp (80%) threshold, which is the maximum current that should safely be drawn from a 20 amp branch circuit under the US National Electrical Code (NEC). Under normal circumstances, when both circuits are active, this causes no problem. However, should either branch circuit be lost (accidentally or during a maintenance procedure), the remaining active circuit could trip during the peak current excursions, since it would now be carrying the entire load (slightly above 18 amps). This represents a lurking exposure to cascade failure.
As the example above shows, these peaks would be very difficult to discover, even with regular manual survey snapshot readings. This exposure to cascade failure of redundant power paths can only be revealed by continuously monitoring and recording the current on each branch circuit (A and B) and then setting threshold alarms for when the sum exceeds the prescribed limits. DCIM can help monitor and manage these thresholds and alerts, minimizing the risk of these cascading failures.
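The threshold check described above can be sketched in a few lines. This is only an illustration of the idea, not how any particular DCIM product implements it: the function name, the sample format and the readings are all assumptions, with values chosen to echo the 7A+7A snapshot and the peak slightly above 18 amps from the example.

```python
BREAKER_RATING_A = 20.0
# 80% of the breaker rating, i.e. the 16 amp threshold cited in the text.
SAFE_LIMIT_A = BREAKER_RATING_A * 80 / 100


def failover_exposures(samples, limit_a=SAFE_LIMIT_A):
    """Return (timestamp, A+B amps) for samples where a single surviving
    circuit would have to carry more than limit_a."""
    return [(ts, a + b) for ts, a, b in samples if a + b > limit_a]


# Hypothetical continuous readings: (timestamp, circuit A amps, circuit B amps)
samples = [
    ("09:00", 7.0, 7.0),   # what the manual clamp-on survey happened to see
    ("11:30", 8.5, 8.0),
    ("14:00", 9.2, 9.0),   # peak excursion
    ("18:00", 6.5, 7.0),
]

alarms = failover_exposures(samples)
for ts, total in alarms:
    print(f"{ts}: A+B = {total:.1f} A exceeds {SAFE_LIMIT_A:.0f} A failover limit")
```

Note that neither circuit exceeds 16 amps on its own in any sample. The exposure only appears when the sum is checked, which is why the alarm is set on A+B rather than on each circuit individually.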
High-Power Rack PDU Overloads
While the figure above illustrates the hidden exposure of manual branch current surveys, there is also another concealed risk contained in high-power rack PDUs (which contain multiple circuit breakers to prevent grouped outlet banks from overloading).
While many data centers may take regular weekly or monthly readings of rack power draw, intermittent short-term peak current draws and potential exposure to branch circuit overloads will not be detected. Without knowing how much current is being drawn in real time and trended continuously, just adding a single server would be like playing “Russian Roulette,” since it could result in a tripped circuit breaker.
The ability of DCIM to provide continuous real-time power monitoring, and to detect and display these peak power conditions, can help mitigate the risk of an outage.
Source: https://datacenterfrontier.com/key-dcim-funtionality-considerations/
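To make the risk concrete, here is a minimal sketch of a headroom check for one outlet bank of a rack PDU, comparing a decision based on a single periodic reading with one based on the trended peak. The 16 amp limit assumes a 20 amp bank breaker held to the same 80% guideline as above; the function names, readings and server draw are hypothetical.

```python
def bank_headroom(trend_a, limit_a=16.0):
    """Remaining amps on one PDU outlet bank, judged against its observed peak."""
    if not trend_a:
        raise ValueError("no readings for this bank")
    return limit_a - max(trend_a)


def can_add(trend_a, new_load_a, limit_a=16.0):
    """True if the new load fits under the limit even at the observed peak."""
    return new_load_a <= bank_headroom(trend_a, limit_a)


# Hypothetical trended readings (amps) for one bank, including a short peak.
trend = [11.0, 11.5, 12.0, 14.5, 11.0]
weekly_snapshot = [trend[0]]   # the only value a periodic manual reading saw
new_server_a = 2.0             # assumed draw of the server being added

fits_by_snapshot = can_add(weekly_snapshot, new_server_a)
fits_by_trend = can_add(trend, new_server_a)

print(f"snapshot says it fits: {fits_by_snapshot}")
print(f"trended peak says it fits: {fits_by_trend}")
```

In this made-up case the snapshot suggests 5 amps of headroom, while the trended peak leaves only 1.5 amps, so the same server that looks safe on the clipboard would push the bank past its limit during a peak.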