Less talk, more action on CNI resilience: White House

The US needs less talk, more action on improving the security of critical national infrastructure (CNI), a new report for the President has warned.

It suggests that private sector CEOs should be put under growing pressure to resource cyber-resilience initiatives across national infrastructure.

A focus on cyber-physical resilience

The “inevitability” of cyber-attacks “makes it imperative to shift our focus towards building resilient systems,” the President’s Council of Advisors on Science and Technology (PCAST) said on February 27.

Google Cloud CISO Phil Venables and Microsoft Chief Scientific Officer Eric Horvitz led the working group that produced the paper.

Needed: Crisply defined, regularly updated performance goals

In it, they called for the creation of “a small set of crisply defined, regularly updated performance goals applicable across sectors that are biased toward leading versus lagging (retrospective) indicators of cyber-physical resilience. The aim should be to radically simplify and reduce the workload of reporting on hundreds of lagging indicators down to tens of goals that are more universally understandable and impactful.”

PCAST suggested that these goals (it proposed 10 "leading indicators", listed at the bottom of this article) be reported to Sector Risk Management Agencies, warning bluntly that "almost no information is currently available to indicate how an organization is preparing for future cyber-physical challenges.

"This has to change."

A "National Critical Infrastructure Observatory"

PCAST also called for a new “National Critical Infrastructure Observatory” to help “understand the weaknesses and strengths of our infrastructure… outmatch adversarial attacks and prepare for accidents and catastrophes.”

The report notes that “much of the technology that underpins cyber and cyber-physical systems was engineered without appropriate consideration of security needs” and that, currently, “responsibility for our Nation’s systems is fragmented… This manifests as challenges of prioritization, inconsistencies in authority to regulate, and often insufficient speed and breadth of response to discovered vulnerabilities or incidents.”

PCAST said the White House should direct the Cybersecurity and Infrastructure Security Agency (CISA) to put pressure on the private sector to invest more in cybersecurity.

Time to "increase expectations..."

As the report puts it, CISA should be tasked with increasing “the expectations that boards, CEOs, and other executives, as the owners and operators of our critical infrastructure, contribute more time and resources to ensure that infrastructure is reliable and resilient.”

“The private sector should further augment its ‘tone at the top’ with ‘resources in the ranks’ to increase operations and activities aimed at strengthening resilience,” they said. CISA should also work with local utility commissions and overseers (especially for water and electricity) to ensure that “necessary investments for cyber-physical resilience are made.”

The report is the latest indication that US policy makers are recognising that, untroubled by regulatory pressure, the private sector may not invest enough in resilience to cybersecurity challenges, nor in understanding how to respond effectively to incidents like ransomware attacks.

Notably, it highlights that “when Colonial Pipeline’s digital infrastructure was penetrated by a ransomware attack in 2021, protective responses operators performed based on fears and lack of understandings of the overall cyber-physical system halted flows on 5,500 miles of its pipelines, … disrupting fuel supplies across the East Coast.” [The Stack’s italics.]

The US must “learn aggressively from failures and close-calls,” PCAST said.

“We need to continue to support and enhance public/private sector information, threat, and vulnerability sharing through Information Sharing and Analysis Centers (ISACs), the DHS Joint Cyber Defense Collaborative (JCDC), and the National Security Agency (NSA) Cyber Collaboration Center. It is also important to have a well-functioning national incident review board to ensure that major incidents or close calls only happen once. DHS’s Cyber Safety Review Board is an excellent start but needs to be better resourced and empowered… to conduct investigations.”

More technically, they urged the creation of: (1) “sector minimum viable operating capabilities”; (2) “an expressive and informative set of leading indicators of cyber-physical resilience”; and (3) “transparency and sharing of information about minimum viable operating delivery objectives and status on achieving leading indicators, provided in a controlled context.”

The paper was published as federal agencies like CISA promise an increasingly muscular approach to highlighting poor security among software providers – and amid recognition that US agencies, left to their own devices, will also fail to properly ensure their own security. (CISA’s $7 billion Continuous Diagnostics and Mitigation (CDM) programme, a centralised asset discovery and vulnerability management effort that now touches close to three million endpoints across some 92 agencies, is one reflection of that recognition.)

The full paper, which The Stack believes deserves a close read, provides more detail on the potential agency responsibilities in building out greater oversight of cyber-physical resilience. We have pulled out its suggested indicators below for easy access.

Crisply defined performance goals: A starter for 10...

PCAST proposed 10 leading indicators as a starting point for discussion.

These are as follows:

1) Hard-Restart Recovery Time: the time to reconstitute/rebuild a system from scratch (as distinct from backup and recovery time objectives). This metric is intended to assess an organization's ability to remove circular dependencies during a restart, to assure backups can survive fully destructive shutdowns or attacks, and that software and data can be restored to service. The application of this concept will vary across different sectors.

2) Cyber-Physical Modularity: a system-wide measure, computed as the mean of the impact of single points of failure. The measure considers the additional failures and impacts that come as cascades via dependencies on each initial single-point failure. This can be captured by a summary measure of the impacts of each primary point of failure. Alternatively, the measure can be computed as the mean operational capability of the service, even if degraded, summed over all single points of failure. For example, given that each single point of failure may cause a temporary outage or reduction of quality of service, what is the consequent median tested recovery time for all single points of failure to be repaired/recovered? For how many single points of failure is the defined minimum viable operating delivery objective sustained?

3) Internet Denial / Communications Resilience: Internet denial testing: consider loss of Internet connectivity as a special notable point of failure. Explicitly test the impacts, nature of degraded service, and disruptions vs. operational continuity in the face of Internet disconnection. Some services are so critical that they should be operable safely, even in some degraded state, in the absence of network connectivity. Consider backup communication channels to diverse modes of communication in the event of Internet failure.

4) Fail-over to Manual Operations: for physically actuated systems typically controlled by cyber operational technology, what is the degree of local manual control that can sustain a minimum viable operational delivery objective when automation is lost? How frequently is manual control practiced to sustain organizational muscle memory of its use? Additionally, to what extent is there a broader primary or back-up analog “control plane” to the system or components? While digitization is inevitable and valuable, maintaining some degree of analog control may be necessary and warranted for certain highly critical systems.

5) Control Pressure Index: the extent to which defense-in-depth is applied, measured by how much of a critical security or resilience objective is carried by a single control (one that, if it failed, would put the whole system at risk).

6) Software Reproducibility: extent of software in a particular system that can be repeatedly and continuously built and distributed while maintaining conformance with the Office of Management and Budget’s (OMB) June 2023 Secure Software Development Framework (SSDF) requirements, including disclosing software bill of materials (SBOMs) and supply chain levels for software artifacts (SLSA) conformance levels. It is especially critical for vendors of software to critical infrastructure sectors to provide vital patches in a timely manner that will work with an infrastructure organization’s updated IT environment and for software providers to assure the continuity of the build environments throughout that software’s supported life. Critical infrastructure (e.g., hospitals, water) must be able to update legacy systems without losing additional software tools. A software reproducibility metric could be contextualized as “time to support” surrounding updates. Modern software lifecycle management practices in DevSecOps approaches are highly applicable.

7) Preventative Maintenance Levels: percentage of the overall cost of systems operations that is devoted to preventative maintenance (e.g., upgrades, security patching, reducing technical debt).

8) Inventory Completeness: extent of the universe of an organization's operations (including information technology (IT), operations technology (OT), and supply chain, to 4th party as well as 3rd party) that is encapsulated in a validated and managed inventory or asset register.

9) Stress-Testing Vibrancy (Red Teaming): extent of systems that have been subjected to an extreme offensive, adversarial security test (possibly AI augmented) to test defenses against reliable operation (this could be against a specially constructed “cyber range” and might be achieved with “chaos engineering” principles). This should include explicit testing against multi-point attacks, where an adversary is coming after multiple points in a system with multiple tactics, potentially both physical and cyber.

10) Common Mode Failures and Dependencies: Identify organizations (and others in their supply chain) that in the event of failure would represent significant harm to a whole sector because of the concentration they represent. As part of this, finding and eliminating circular dependencies is vital (i.e., where organization X depends on Y to cover and vice versa).
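To give a feel for how indicators 2 and 10 might be computed, here is a rough illustration. It is a sketch under stated assumptions, not PCAST's methodology: the system is modelled as a hypothetical map of components to the components they depend on; it crudely assumes that anything depending on a failed component also fails (no redundancy, no degraded modes); "modularity" is scored as the mean share of components still running across every single point of failure; and the same map is searched for a circular dependency. All names (`cascade`, `modularity`, `find_cycle`, the sample components) are invented for illustration.

```python
from collections import deque
from typing import Optional

# Hypothetical model: each component maps to the set of components it depends on.
Deps = dict[str, set[str]]


def all_components(deps: Deps) -> set[str]:
    """Every component named anywhere in the map, as a dependent or a dependency."""
    return set(deps) | {d for ds in deps.values() for d in ds}


def cascade(deps: Deps, failed: str) -> set[str]:
    """Components lost when `failed` goes down.

    Crude worst-case assumption: anything that depends on a failed component
    also fails (no redundancy, no degraded modes).
    """
    # Invert the map so we can walk from a failure to everything that relies on it.
    dependents: dict[str, set[str]] = {c: set() for c in all_components(deps)}
    for component, needs in deps.items():
        for dependency in needs:
            dependents[dependency].add(component)

    down = {failed}
    queue = deque([failed])
    while queue:  # breadth-first walk of the failure cascade
        current = queue.popleft()
        for dependent in dependents.get(current, set()):
            if dependent not in down:
                down.add(dependent)
                queue.append(dependent)
    return down


def modularity(deps: Deps) -> float:
    """Mean share of components still running, averaged over every single point of failure."""
    components = sorted(all_components(deps))
    n = len(components)
    surviving = [1 - len(cascade(deps, c)) / n for c in components]
    return sum(surviving) / n


def find_cycle(deps: Deps) -> Optional[list[str]]:
    """Return one circular dependency (e.g. ['x', 'y', 'x']) or None if there is none."""
    visiting: set[str] = set()  # nodes on the current depth-first path
    finished: set[str] = set()  # nodes fully explored and known to be cycle-free
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        visiting.add(node)
        path.append(node)
        for nxt in sorted(deps.get(node, set())):
            if nxt in visiting:  # back edge: the dependencies loop round
                return path[path.index(nxt):] + [nxt]
            if nxt not in finished:
                found = visit(nxt)
                if found:
                    return found
        visiting.discard(node)
        finished.add(node)
        path.pop()
        return None

    for start in sorted(deps):
        if start not in finished:
            found = visit(start)
            if found:
                return found
    return None


if __name__ == "__main__":
    # Hypothetical toy utility: SCADA needs the network and authentication, and so on.
    sample: Deps = {
        "scada": {"network", "auth"},
        "network": {"power"},
        "auth": {"network"},
        "power": set(),
        "billing": {"auth"},
    }
    print(round(modularity(sample), 2))
    print(find_cycle(sample))
    print(find_cycle({"x": {"y"}, "y": {"x"}}))
```

In the toy map, losing `power` takes down all five components, so the per-failure survival shares are 0.4 (auth), 0.8 (billing), 0.2 (network), 0 (power) and 0.8 (scada), averaging 0.44. The map contains no loop, while the two-organization example returns `['x', 'y', 'x']`, the "X depends on Y and vice versa" pattern PCAST warns about. A real assessment would need to model redundancy and degraded, rather than binary, operation.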