III. Statistical Prediction and Classification of Electronic Network-Disease
3.2 Background & Related Works
3.2.1 Multi-factor Authentication Framework Overview
3. Effective treatment responses are available for infectious RF carriers that test positive (e.g., evidence of infection is present in a specified CubeSat’s received authentication log files).
4. The diagnostic test neither harms an authentication receiver nor causes unnecessary modification of the incoming RF-Event’s physical RF characteristics.
5. The diagnostic test should accurately classify benign vs. infectious RF-Events according to some policy-based threshold(s).
Figure 9. Multi-Factor Authentication Framework
• 3.2.1.1 Network-disease Specification
A network abnormality may be attributed to some known or unknown cause. When the cause of a specified abnormality is suspected to originate from unauthorized or malicious activity such as a cyberattack, its occurrence can be classified as a symptom of realized network-disease. Several abnormalities may contribute to observable network-disease outcomes.
A specific statement of abnormal network behavior, such as the loss of resource availability caused by a successful DDoS cyberattack, clarifies the strategic targeting, planning, and mitigation of a specific network-disease outcome. Moreover, a prevention strategy may specify which electronic transmission states are authorized and which are unauthorized, to assist in network-disease defense and mitigation.
• 3.2.1.2 Policy Specification
After network-disease specification and vulnerability assessment, a user’s policy may dictate the flow of information between electronic transmission devices for increased security control. Policy specifies the desired communication paths, which originate from trusted electronic devices in authorized transmission states. In addition, naming conventions, targeted RF fingerprint ROIs, and RF-measurement criteria should be carefully considered. The policy should also indicate the type of electronic receiver that will be employed for demodulation and ultimate authentication of received RF transmission events, and should state requirements for interoperability, standardization, and invariant field selection. Each of these decisions guides the RF signature collection process. Finally, the acceptance levels for fingerprint similarity should state whether additional testing is required when a test result is uncertain.
• 3.2.1.3 RF Signature Benchmarking
RF benchmarking provides trusted RF signatures for diagnostic comparison of a new RF-Event claiming to originate from a known fixed transmission source. An authenticating device may possess local or reach-back RF diagnostic capability. When a local device is trained for self-evident authentication of a received RF-Event, the device holds a trusted RF-signature template in its local memory and can conduct the benchmark similarity test during normal communication operations. The memory location of the processor is assumed to be secured for normal operations using RF fingerprints [57].
Such a device trains for self-evident authentication using device-specific observations of an authorized RF-Event transmission from a specified source. As the main characteristics of the RF-Event are collected, additional statistics may be considered if useful.
During a diagnostic test, the authentication device applies policy acceptance or rejection thresholds to produce a final estimate of the RF-Event’s condition as either benign or infectious (capable of causing network-disease). RF signature collection is a first step toward developing a useful network diagnostic test benchmark. The aim is to collect a set of RF signatures usable as templates and to integrate them as a network treatment response in a comprehensive wellness plan.
• 3.2.1.4 RF-Biomarker Candidate Selection
Following the collection of RF signature benchmarks, the most useful RF-measurements are screened using objective statistical analysis. A composite feature-set contains all RF-measurements and statistics of characteristic distributions; however, these may not provide useful discrimination information for electronic devices that originate from the same manufacturer and differ only by serial number. Such devices carry digital minutiae such as MAC addresses and FCC-IDs; however, these identifiers may be mimicked using software-defined radios (SDRs) or, worse, may not be considered at all during network authentication.
The purpose of RF-screening is to discover, from the candidate feature-set, the set of RF-Biomarkers that provides the most useful electronic device verification accuracy; that is, the top verification feature-set for a claimed electronic device. The top-performing RF-Biomarkers are used to compare the logical contents of m to the physical attributes of the RF-Event’s benchmark to improve posterior classification estimates.
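One way to picture candidate screening is to score each RF-measurement by how well it separates the trusted device from a look-alike device, then keep the top scorers. The sketch below is illustrative only: the standardized mean-difference score, the function names `separation_score` and `screen_biomarkers`, and the toy feature names and values are all assumptions, not the screening statistic used in this work.

```python
from statistics import mean, stdev

def separation_score(trusted, other):
    """Assumed score: absolute mean difference over the pooled standard deviation."""
    pooled = ((stdev(trusted) ** 2 + stdev(other) ** 2) / 2) ** 0.5
    diff = abs(mean(trusted) - mean(other))
    if pooled == 0:
        # No spread in either sample: perfectly separated or identical.
        return float("inf") if diff > 0 else 0.0
    return diff / pooled

def screen_biomarkers(trusted_obs, other_obs, top_k):
    """Rank candidate features by separation score and return the top_k names."""
    scores = {
        name: separation_score(trusted_obs[name], other_obs[name])
        for name in trusted_obs
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical RF-measurements from a trusted device and a same-model device.
trusted = {
    "amp_var": [1.0, 1.1, 0.9, 1.0],
    "phase_skew": [0.50, 0.52, 0.49, 0.51],
    "freq_kurt": [3.0, 3.2, 2.9, 3.1],
}
other = {
    "amp_var": [1.0, 1.2, 0.8, 1.0],
    "phase_skew": [0.70, 0.72, 0.69, 0.71],
    "freq_kurt": [3.1, 3.3, 2.8, 3.0],
}
top = screen_biomarkers(trusted, other, top_k=1)
```

In this toy data only `phase_skew` differs in mean between the two devices, so it is the feature that survives screening.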
• 3.2.1.5 Gold Standard Validation
A diagnostic test is a formal classification method that partitions a condition into two generalized states (e.g., True or False) [39]. In practice, a diagnostic test requires a standard reference for comparison. A benchmark comparison test quantifies performance against a truth reference, commonly referred to in the medical community as a gold standard (GS) [42] [58] [39]. A device-specific GS is a source of information that tells us the true condition of a received RF transmission event (RF-Event) [42] as either benign or infectious. In this article, the validation test GS file consists of a set of repeatable RF-Events originating from a single trusted device and one or more logically equivalent RF-Event transmissions that originate from physically distinct (distrusted) devices.
Benchmark validation occurs when a GS truth reference is used to assess the diagnostic performance of a classifier; it provides insight into the robustness of the benchmark’s trained RF signature against new, unseen RF signatures. A new validation set of RF-Events is collected from the trusted transmission device, using the same configuration used for benchmarking, to make up the GS file dataset. In addition, RF-measurements are collected from $TX_B$ by $RX_C$.
The goal is to design a truth reference dataset such that the combination of RF-Event conditions (benign vs. infectious) is unknown to a designated authentication device $RX_C$. The GS dataset contains the true RF pathology of each RF-Event’s condition as benign [D = 1] or infectious [D = 0]. Upon receipt of a new RF-Event, $RX_C$ employs local diagnostic testing, compares the RF-Biomarker feature-set to its known RF signature benchmark template, and reports a diagnostic result. A benign test result [T = 1] occurs when the similarity of the RF-Event’s pathological RF origin meets acceptable tolerance levels.
An infectious test result [T = 0] occurs when the pathological origin of the RF-Event fails to meet the required origin similarity threshold levels. To conduct a sensitivity or specificity test using a GS, the RF-Event samples may consist entirely of benign events or entirely of infectious events.
This practice often provides insight into the system’s detection capability, but may not provide insight into future observations of RF-Events received under normal operating conditions. To gain insight into normal operational performance, the GS file should contain an operationally representative proportion of infectious to benign RF-Events. Such a GS file can then be used to assess the estimated system performance under various system modes. The sequence and selection of benign vs. infectious RF-Events should be randomized to avoid verification bias and to reduce experimental error. After all RF-Events contained in the GS file have been presented to the system for classification, the raw counts are tabulated for the True Positive, True Negative, False Positive, and False Negative rates [39], as described in Section II (Measuring Diagnostic Accuracy).
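As a minimal sketch of how such a GS sequence might be assembled, the snippet below builds a list of true conditions with a chosen infectious proportion and shuffles it so that presentation order is random. The function name `build_gs_labels`, the 20% proportion, and the seed are hypothetical values chosen for illustration.

```python
import random

def build_gs_labels(n_events, infectious_fraction, seed=None):
    """Return a shuffled list of true conditions D (1 = benign, 0 = infectious)."""
    n_infectious = round(n_events * infectious_fraction)
    labels = [0] * n_infectious + [1] * (n_events - n_infectious)
    rng = random.Random(seed)  # a seeded generator makes the order reproducible
    rng.shuffle(labels)
    return labels

# Hypothetical GS file: 100 RF-Events, 20% of them infectious.
gs = build_gs_labels(n_events=100, infectious_fraction=0.2, seed=7)
```

Fixing the seed keeps a validation run repeatable, while the shuffle ensures the classifier cannot exploit the ordering of benign and infectious events.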
A conventional 2x2 count table provides a preliminary diagnostic assessment of N RF-Events, using a GS file for validation. A true positive (TP) result occurs when a received carrier’s true signature condition is benign and the diagnostic test reports a benign condition [T=1, D=1]. A true negative (TN) result occurs when the carrier’s true status is infectious and the diagnostic result is infectious [T=0, D=0]. When the diagnostic test reports a benign carrier condition and the true condition indicated by the GS is infectious, the false positive (FP) count is increased [T=1, D=0]. Similarly, when the GS indicates a true benign condition and the test reports an infectious condition, a false negative (FN) result occurs [T=0, D=1].
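The tabulation can be sketched in a few lines. The example follows the bracketed [T, D] pairs in the paragraph above, with 1 meaning benign and 0 meaning infectious; the function name `tabulate_2x2` and the sample sequences are hypothetical.

```python
def tabulate_2x2(test_results, true_conditions):
    """Count TP, TN, FP, FN, where 1 = benign and 0 = infectious."""
    # Keys are (T, D) pairs: test result first, GS true condition second.
    cell_names = {(1, 1): "TP", (0, 0): "TN", (1, 0): "FP", (0, 1): "FN"}
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for t, d in zip(test_results, true_conditions):
        counts[cell_names[(t, d)]] += 1
    return counts

# Hypothetical diagnostic results T and GS true conditions D for N = 8 RF-Events.
T = [1, 1, 0, 0, 1, 0, 1, 1]
D = [1, 1, 0, 1, 0, 0, 1, 1]
table = tabulate_2x2(T, D)
```

For these eight events the table holds 4 TP, 2 TN, 1 FP, and 1 FN, which sum to N.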
At the conclusion of the GS validation test, the reported diagnostic results are compared to the truth reference dataset under various threshold and parameter settings. Depending on the operational ecosystem in which a user expects to employ diagnostic testing and on their threshold level specifications, a receiver operating characteristic (ROC) curve may be useful in deciding which system settings best support their policy goals and objectives.
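To illustrate how threshold settings trace out an ROC curve, the sketch below sweeps a similarity threshold over scored RF-Events. It assumes each event carries a scalar similarity score, that the test reports benign (T = 1) when the score is at or above the threshold, and that "positive" means benign; the function name `roc_points` and the sample scores are hypothetical.

```python
def roc_points(scores, true_conditions, thresholds):
    """Return (threshold, FPR, TPR) per threshold; T = 1 when score >= threshold.

    Assumes the GS data contain at least one benign (D = 1) and one
    infectious (D = 0) RF-Event, so neither rate divides by zero.
    """
    n_benign = sum(1 for d in true_conditions if d == 1)
    n_infectious = len(true_conditions) - n_benign
    points = []
    for th in thresholds:
        tp = sum(1 for s, d in zip(scores, true_conditions) if s >= th and d == 1)
        fp = sum(1 for s, d in zip(scores, true_conditions) if s >= th and d == 0)
        points.append((th, fp / n_infectious, tp / n_benign))
    return points

# Hypothetical similarity scores with their GS true conditions.
scores = [0.95, 0.90, 0.85, 0.60, 0.55, 0.40, 0.30, 0.20]
D = [1, 1, 1, 0, 1, 0, 0, 0]
points = roc_points(scores, D, thresholds=[0.25, 0.50, 0.80])
```

Raising the threshold here lowers the false positive rate from 0.75 to 0.0 at the price of a true positive rate that falls from 1.0 to 0.75, which is the trade-off a policy owner would weigh when choosing a setting.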
Moreover, a visualization of diagnostic results may be useful to Cyber defenders as a decision-support cue during network defense operations. The GS validation process concludes with a report of the intrinsic accuracy (ACC) of each diagnostic test, that is, the accuracy inherent to the test itself. The posterior classification accuracy provides insight into the cost and benefit trade-offs associated with selecting an appropriate treatment following a diagnostic test.
• 3.2.1.6 Treatment Response
This step provides diagnostic reasoning insight by weighing costs and benefits to the network itself, to Cyber defenders, and to key stakeholder interests. Some responses are automatic; under uncertainty, however, an automatic response may create high-risk situations. A benign RF-Event is highly likely to originate from an authorized source transmission state and is unlikely to cause network-disease in an authenticating device. An infectious RF-Event, however, carries suspicious origin-integrity evidence indicating abnormal RF-Event transmissions that may lead to network-disease if such events go undetected or untreated.
Treatment, in this context, refers to troubleshooting responses taken to mitigate or eliminate early warning signs of network-disease resulting from infectious credential acceptance.
3.2.1.6.1 Trade-Offs and Risk
There are trade-offs associated with each post-test treatment response to a network’s diagnostic result. A benefit occurs when infection is discovered [T = 1, D = 1] and attempts to gain access are blocked as a treatment response, which ultimately prevents network-disease. However, a cost occurs when network-disease occurs despite the use of treatment (e.g., blocking). If the cost of each diagnostic test is identical, then the total cost of reaching a treatment decision increases with each additional test. Decision-makers therefore aim to make the correct network treatment decision with as few diagnostic tests as necessary.
An arbitrary policy may specify a minimum pre-test classification accuracy of 90% before recommending treatment for a network. Policy determines the goals and objectives, and the RF-Event similarity thresholds of acceptance, for a given operational ecosystem with known threat prevalence. When a diagnostic result falls below such a treatment threshold, a “do nothing” (continue to monitor) treatment recommendation may be made to mitigate network-disease symptoms. When intrinsic diagnostic accuracy is undesirably low and errors are high, additional diagnostics may be necessary to provide useful decision-support for treatment. In Figure 10, a diagnostic test result that falls between $Th_1$ and $Th_2$ is inconclusive and suggests a need for additional diagnostic testing.
Network treatment options are recommendable for results greater than $Th_1$. Situation (b) may occur when pre-test diagnostic accuracy results contain high errors, resulting in less accurate posterior predictive estimates. The use of two thresholds may provide enhanced performance under uncertainty. Unfortunately, the pre-test classification accuracy is often uncertain in advance and lacks gold standard performance testing.
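The two-threshold rule can be sketched as a small decision function. This is one reading of the text, not a transcription of Figure 10: it assumes the upper threshold is Th1 and the lower is Th2, that results above Th1 trigger treatment, results between the thresholds trigger further testing, and results below Th2 fall back to monitoring. The function name, return labels, and threshold values are hypothetical.

```python
def treatment_decision(result, th1, th2):
    """Map a diagnostic result to a recommendation using two thresholds.

    Assumes th2 <= th1: above th1 -> treat, between th2 and th1 -> test
    again, below th2 -> do nothing and continue to monitor.
    """
    if th2 > th1:
        raise ValueError("expected th2 <= th1")
    if result > th1:
        return "treat"
    if result >= th2:
        return "additional testing"
    return "monitor"

# Hypothetical policy thresholds.
decisions = [treatment_decision(r, th1=0.9, th2=0.6) for r in (0.95, 0.75, 0.30)]
```

With these thresholds the three sample results map to treatment, additional testing, and monitoring respectively; keeping an explicit inconclusive band is what lets the policy defer a decision instead of forcing one under uncertainty.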
3.2.1.1.2 Risk
Consider a common network infrastructure consisting of n nodes. Each node’s original configuration, established through common network administration, has inherent trust. That is, the set of nodes that form the backbone of the network are the trusted devices. $TX$ collections from trusted devices form the RF-Biomarker baseline signatures. Signature development considers only authorized transmission carrier states. Policy specifies trusted device pairings for network communications according to transmission source and destination. RF signature comparisons occur as logical credential claims arrive at treatment $RX$ nodes.
If a physical and logical match is indicated, the bit-level credential is likely authentic and benign; however, when levels are significantly dissimilar, the origin integrity of the carrier is likely infectious and treatment recommendations to prevent network-disease may be necessary. When results indicate high risk, more information about the RF event may be necessary to validate the origin integrity of fixed transmission sources.
$$\mathrm{risk}(y) \equiv P[D = 1 \mid Y = y] \qquad (2)$$
In general, larger values of $Y$ indicate higher levels of risk. In binary marker evaluations, we consider the simple setting where RF-Events have either high or low symptomatic risk values. That is, the high value is $\mathrm{risk}(0) \equiv P[D = 0 \mid Y = 0] = NPV$, and the low value is $\mathrm{risk}(1) \equiv P[D = 1 \mid Y = 1] = PPV$.
Pepe recommends that the distribution of risk in the population indicated by the RF-biomarker should be reported (absolute risk and the frequencies of those risks in the population) [59]. The cumulative distribution function of the RF-biomarker under consideration is given by $F$. The risk level is
$$R(v) = P[D = 1 \mid Y = F^{-1}(v)]. \qquad (3)$$
Let $p$ denote the prevalence, which indicates how widespread the potential for network-disease (the threat) is throughout the entire population under consideration.
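Prevalence matters because the predictive values of a test depend on it. A minimal sketch using Bayes' rule follows, under conventions that are assumptions made for this example: sensitivity is P[T=1 | D=1], specificity is P[T=0 | D=0], and prevalence is the share of infectious RF-Events, P[D=0]. The function name and the numeric inputs are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) via Bayes' rule.

    Assumed conventions (1 = benign, 0 = infectious):
      sensitivity = P[T=1 | D=1], specificity = P[T=0 | D=0],
      prevalence  = P[D=0], the share of infectious RF-Events.
    """
    p_benign = 1.0 - prevalence
    # Total probability of each test outcome.
    p_test_benign = sensitivity * p_benign + (1.0 - specificity) * prevalence
    p_test_infectious = specificity * prevalence + (1.0 - sensitivity) * p_benign
    ppv = sensitivity * p_benign / p_test_benign        # P[D=1 | T=1]
    npv = specificity * prevalence / p_test_infectious  # P[D=0 | T=0]
    return ppv, npv

# Hypothetical operating point: 10% of RF-Events are infectious.
ppv, npv = predictive_values(sensitivity=0.9, specificity=0.8, prevalence=0.1)
```

At this operating point PPV is about 0.976 while NPV is only about 0.471: when infectious events are rare, most infectious test results are false alarms even for a reasonably accurate test.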
Figure 10. Post-Test Diagnostic Treatment Decision Rules in Uncertainty
• 3.2.1.7 Refine/Update
After final RF-Biomarker and threshold selection, a simulation assesses the posterior accuracy of a diagnostic test using a GS validation file. Updates to the framework proposal can occur at any step, without regard to order.