
The International Centre for Security Analysis
The Policy Institute at King's
King's College London

Workshop Series on Open Source Research Methodology in Support of Non-Proliferation

Workshop 1: Exploiting Big Data in Support of Non-Proliferation

Workshop Series Funded by the Carnegie Corporation of New York

Contents

Executive Summary
Introduction
1: Key Concepts
   Defining and Understanding the Concept of Big Data
   Big Data and Organisational Culture
2: Big Data Methodologies and Applications
   Quantitative Text Analysis
   Satellite Imagery
   Identifying Anomalies using Big Data
3: Legal and Ethical Considerations of Using Big Data
4: Breakout Group Discussions
   Breakout Session One: Big Data and Proliferation Networks
   Breakout Session Two: Big Data and Non-Proliferation Sentiment

Executive Summary

The workshop discussed a wide range of issues relating to the potential applications of big data analytics in non-proliferation. A summary of the key discussion points and issues that must be addressed in the future is given below:

1. Big data analytics may provide useful early indicators of anomalies and deviations from normal or expected behaviour with regards to non-proliferation.

• Detecting early indicators of irregularities or illicit activities can then provide the analyst with the opportunity to investigate these further.

• Assessing what is considered to be "normal" activity is a significant challenge.

2. Incorporating big data analytics for non-proliferation into existing analytical frameworks often results in organisational resistance.

• Organisational roadblocks remain a significant barrier to the implementation of big data methodologies.

• Demonstrating the value of big data analytics and open source research methodologies is important in overcoming organisational resistance.

3. The successful application of big data methodologies and approaches to non-proliferation issues rests on integrating a varied range of data sources.

• Collating data sources into a coherent and useful dataset for detailed analysis is vital for the effective use of big data and open source research methodologies.

• Understanding and effectively utilising existing datasets is as important as incorporating new sources of information.

4. New and underused sources of data may be of significant value for the non-proliferation community when combined with big data analytics.

• Satellite imagery is one such source of data that could be integrated into existing analytical frameworks.

• However, costs and processing capabilities remain substantial barriers to the effective adoption of new data sources and research methods.

5. Legal and ethical concerns must be addressed.

• Legality and ethics must be considered if the use of open source information and big data methodologies is to be fair and lawful in support of non-proliferation.

• The use of big data analytics and open source research methodologies in support of non-proliferation raises legal and ethical challenges, particularly regarding privacy and data protection, that must be weighed against the anticipated benefits.


Introduction

The International Centre for Security Analysis at King's College London is running a four-part workshop series on open source research methodologies in support of non-proliferation. The series is funded by the Carnegie Corporation of New York. It is designed to stimulate discussion on how open source information and research methodologies can help tackle key issues in non-proliferation, including verification and monitoring by the nuclear safeguards community. The workshop series is as follows:

1. Workshop One: the exploitation of big data research methodologies and applications in support of non-proliferation;

2. Workshop Two: the exploitation of financial open source information in support of non-proliferation;

3. Workshop Three: the challenges and solutions to the effective collection and analysis of open source information on states’ nuclear-relevant scientific and technical capabilities;

4. Workshop Four: the challenges and solutions to the effective collection and analysis of open source information on both state and non-state nuclear non-proliferation strategies.

The first workshop on the exploitation of big data in support of non-proliferation was held in Vienna on 26-27 January 2015. The workshop aimed to examine the role of big data analytical tools, research techniques and methodologies for non-proliferation efforts. The workshop approached this topic from a multi-disciplinary perspective, drawing upon expertise from academia, law enforcement, the legal profession and the private sector. A multi-disciplinary approach was deliberately chosen to reflect the diverse approaches to big data which may be of relevance to non-proliferation activities.

The workshop also aimed to facilitate the discussion of new open source research methodologies and approaches that may be of use for the non-proliferation community. To achieve this, the workshop was structured as a series of panel presentations and two breakout sessions. The panel presentations were designed to give insights into innovative and interesting applications of big data and open source research methodologies. The breakout sessions sought to stimulate discussion on the practical applications of big data methodologies in support of non-proliferation.

The International Centre for Security Analysis would like to thank all the workshop participants for their contributions.


1: Key Concepts

Defining and Understanding the Concept of Big Data

Big data is a contested concept with no single overarching definition. Often definitions will refer to the “3Vs”:

Volume: intuitively, big data as a concept implies that the size of datasets is a key defining characteristic;

Velocity: the speed of data generation and processing;

Variety: the context of data and the wide range of forms it can take.

Additional “Vs” have been added periodically in variations on the 3Vs definition, including: veracity (data quality) and variability (data inconsistencies). The workshop did not dwell on defining big data; instead, early discussions focussed on how to conceptualise big data in a practical and meaningful way. This enabled the workshop to focus on the more significant questions relating to how big data applications could support and enhance the activities of the non-proliferation community.

One intuitive way of thinking about big data and big data methodologies that was discussed centred on conceptualising it as a process to access, collect and analyse data that is out of reach of current tools and technologies. These tools and technologies have dramatically improved over the last 20 years, enabling more sophisticated processes to analyse ever larger and more complex datasets. At the same time, developments in tools, technologies and methodologies encourage us to attempt to collect, process and analyse ever larger datasets that remain out of reach. It is therefore important to recognise that big data methodologies must be dynamic and flexible as datasets increase in size, scope and complexity.

Furthermore, many of the discussions emphasised the importance of conceptualising big data not just from the perspective of the developer of capabilities but also from the perspective of end users. The aim should be to develop a clear and comprehensive understanding in the non-proliferation community about what we understand big data to mean and why it has become popular as a concept.

Developing this understanding is critical as we are now in a world of exabyte-scale data, a vast amount of information that is hard to comprehend. An intuitive way to grasp this volume of data is to consider the thickness of a banknote (0.0043 inches). An 'exa' quantity (10^18) of notes, stacked on top of each other, would stretch to the moon and back roughly 140,000 times. This illustrates that in big data disciplines we are increasingly moving far beyond the capacities of human analysis and testing the limits of computer processing.
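This back-of-the-envelope figure is easy to reproduce; a minimal Python check, using approximate values for note thickness and the Earth-moon distance, is given below:

    # Back-of-the-envelope check: height of a stack of 10**18 banknotes
    # versus the Earth-moon round-trip distance. All figures approximate.
    NOTE_THICKNESS_IN = 0.0043      # thickness of one note, inches
    NOTES = 10**18                  # an 'exa' quantity of notes
    INCHES_PER_MILE = 63_360
    MOON_DISTANCE_MI = 238_855      # average Earth-moon distance, miles

    stack_miles = NOTES * NOTE_THICKNESS_IN / INCHES_PER_MILE
    round_trips = stack_miles / (2 * MOON_DISTANCE_MI)
    print(f"Stack height: {stack_miles:.2e} miles")    # ~6.8e10 miles
    print(f"Moon round trips: {round_trips:,.0f}")     # ~142,000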

Big Data and Organisational Culture

The implementation of big data methodologies within organisations is often a significant challenge and one that a number of the workshop panellists addressed. Discussions throughout the workshop often returned to issues relating to organisational culture. Here a number of the most significant discussions are described, including issues relating to:

• Organisational control of information;

• Understanding existing data and integrating new sources of information;

• Development of big data and open source research methodologies.

Datasets often exist in silos, controlled by individuals or analytical teams who may be reluctant to share information with their colleagues. Integrating disparate datasets into a single, coherent and intuitive platform is therefore both a technical and an organisational challenge. It requires close collaboration with technical developers to design and implement a platform that analysts can use effectively. It also requires demonstrating to analysts the value of sharing information across departments and analytical teams.

A related issue is the procurement and development of big data capabilities; a common problem is an overemphasis on procurement rather than innovation and integration. This feeds into the problem of big data analytics and technologies that fail to satisfy the requirements of the end user. The challenge is further exacerbated by institutional resistance to, or a poor understanding of, the benefits that effectively developed big data solutions could offer to existing processes.

Access to the most interesting or potentially useful datasets is often difficult. Such datasets are often controlled by organisations and withheld for a number of reasons, including commercial sensitivity and privacy concerns. Furthermore, datasets are often not released because organisations find it extremely challenging to place a value on the data they control.

Related to this is the idea that organisations should retain effective control of their data while developing the necessary information technology infrastructure and analytical approaches to maximise the utility of big data methods. An understanding of which datasets are under an organisation's control is important in a rapidly evolving data environment. Developing this understanding will enable an organisation to apply open source research and big data methodologies effectively to unique research problems, for example specific issues in non-proliferation.


2: Big Data Methodologies and Applications

Quantitative Text Analysis

Overview

Quantitative text analysis is an expressly quantitative variant of content analysis that uses statistical measures to monitor word frequency and relevance in a piece of text. The development of quantitative text analysis as a methodology has been fundamentally driven by the large, and growing, volumes of easily accessible and non-copyrighted text available online. The applications of quantitative text analysis are varied, including:

• Fast annotation of large volumes of text;

• Discovering associations between text features to shape insight and prediction;

• Performing event detection or sentiment analysis;

• Tracking topics in a large corpus of text to identify changes in behaviour over time;

• Almost limitless industry-specific applications.

Applications for Non-Proliferation

Quantitative text analysis offers a number of other practical applications that may be relevant to non-proliferation. These include tracking public opinion on social media and measuring latent political positions or issue frames within political texts. In the second breakout session (see below), the groups discussed the use of sentiment analysis of textual data to predict behaviour and its possible application to provide indications of decision-maker intent and public opinion regarding nuclear security developments. Quantitative text analysis also facilitates more complex analyses of document features, such as the frequency with which a word appears as the object or subject of a sentence. Such analysis produces a document-feature matrix that can be weighted according to key input features and used as a basis for further analysis.
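To make the idea of a weighted document-feature matrix concrete, the following is a minimal sketch using scikit-learn's TfidfVectorizer; the three toy documents are invented for illustration and are not drawn from the workshop:

    # A minimal sketch of building a weighted document-feature matrix
    # with scikit-learn. The toy documents are illustrative placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "centrifuge components shipped via intermediary",
        "routine trade declaration for industrial pumps",
        "dual-use components declared as industrial pumps",
    ]

    # TF-IDF weighting down-ranks terms common to every document and
    # highlights terms that distinguish individual documents.
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(docs)    # shape: (3 docs, n terms)
    print(vectorizer.get_feature_names_out())
    print(matrix.toarray().round(2))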

Non-proliferation research and analysis often involves collecting, processing and analysing large volumes of unstructured text in a range of file formats. Quantitative text analysis methodologies could be used to create structure from unstructured text and to enable analyses of keywords and latent political positions. They may also provide a framework for further evaluation by acting as an aid for the analyst faced with processing and analysing large volumes of text.

Challenges

The workshop addressed the main barriers to implementing quantitative text analysis as one big data approach to analysing large volumes of unstructured text. One of these challenges is the unique and highly technical lexicon of the non-proliferation community; to overcome this, linguistics experts could work with non-proliferation analysts to develop a unique dictionary of words relating to non-proliferation.

Another set of challenges relates to the practical means of processing and analysing large volumes of text. Primarily, this is not a problem of data storage but rather one of developing tools and techniques for extracting and structuring typically unstructured texts. A number of techniques have been developed to address this challenge. One such development is the use of natural language processing to extract and structure useful data from unstructured texts.
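As a hedged illustration of this kind of extraction, the sketch below uses the spaCy library's named-entity recogniser to pull structured (entity, type) records out of free text; the library choice and the example sentence are assumptions for illustration, not tools discussed at the workshop:

    # A minimal sketch of structuring free text with natural language
    # processing, using spaCy (assumes: pip install spacy and
    # python -m spacy download en_core_web_sm). The sentence is invented.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    text = ("Acme Industrial shipped vacuum pumps from Hamburg "
            "to a trading company in March 2014.")

    doc = nlp(text)
    # Extract (entity text, entity type) pairs, e.g. ORG, GPE, DATE.
    records = [(ent.text, ent.label_) for ent in doc.ents]
    print(records)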

Furthermore, assessing proliferation risks requires the analysis of large quantities of text in multiple languages. This poses an additional challenge to the application of quantitative text analysis methodologies, which predominantly focus on languages written in the Roman script. Techniques often use the white spaces between words to parse sentences, but this method will face challenges in languages where sentences are constructed very differently in terms of grammar and where non-Roman scripts are used for written text.

Satellite Imagery

Overview

The non-proliferation community already uses satellite imagery in support of its analysis and evaluation of proliferation issues. There are likely to be growing opportunities to capitalise on the falling cost, increased resolution and intra-daily monitoring capabilities of commercial satellite imagery. However, it is important to recognise that the integration of satellite imagery into existing analyses may pose unique legal and ethical challenges (see discussion on legal issues below).

Applications for Non-Proliferation

Although the non-proliferation community already uses satellite imagery to complement other information sources, further integration may be beneficial. In particular, the ability to monitor a particular area of interest over an extended period of time using satellite imagery will produce a time-series library of data. This library of information can then be cross-referenced against other sources of information to develop a more comprehensive understanding of activities.

One particular area where satellite imagery may complement existing information sources is through the integration of time-series imagery with trade data collected from a range of sources. This could provide the analyst with not only a richer picture of activity in a given location but also a more detailed understanding of trade networks, supply chains and the movement of goods across the Earth’s surface. Discussions again touched on whether satellite imagery, especially time-series imagery, may be able to provide indicators of anomalies or deviations from typical behaviour and activity.
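A minimal sketch of what such cross-referencing might look like in practice is given below, using pandas to align an imagery-derived activity index with trade records by date; all values are synthetic placeholders:

    # A minimal sketch of cross-referencing a time series of imagery-
    # derived activity with trade records using pandas. Data is synthetic.
    import pandas as pd

    imagery = pd.DataFrame({
        "date": pd.to_datetime(["2014-01-15", "2014-02-15", "2014-03-15"]),
        "vehicle_count": [12, 11, 34],    # derived from time-series imagery
    })
    trade = pd.DataFrame({
        "date": pd.to_datetime(["2014-02-15", "2014-03-15"]),
        "declared_shipments": [3, 2],
    })

    # Align the two sources on date; periods with imagery activity but no
    # matching trade record may merit further investigation.
    combined = imagery.merge(trade, on="date", how="left")
    print(combined)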

Identifying Anomalies using Big Data

Overview

One of the most discussed concepts in the workshop was the potential application of big data in identifying anomalies or early indicators of illicit proliferation activity. By identifying and flagging anomalies to the analyst, big data methodologies may be most useful as a guide for further analytical or investigative work.

Methods to do this effectively rely on a thorough understanding of existing datasets within an organisation and integrating these across teams. Providing a single portal to search across all structured and unstructured documents will enable analysts to use existing data more effectively to identify anomalies, inconsistencies or suspicious activities.

Applications for Non-Proliferation

Using big data analytics to identify anomalies in activity or reporting may be a useful application for non-proliferation. In particular, an experienced analyst may benefit from a system that uses big data techniques to search across datasets to identify possible anomalies that the analyst can then investigate further. This may become especially important as analysts face ever greater volumes of information to process and analyse, most of which is likely to be "noise" rather than "signal".
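One hedged illustration of such anomaly flagging is sketched below using scikit-learn's IsolationForest on synthetic shipment records; the feature choices are assumptions, and the output is a shortlist for analyst review rather than a determination:

    # A minimal sketch of flagging anomalous records for analyst review,
    # using scikit-learn's IsolationForest. All data is synthetic.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(42)
    # Features per shipment record: [declared value, weight in kg].
    normal = rng.normal(loc=[1000, 50], scale=[100, 5], size=(200, 2))
    odd = np.array([[1000, 500], [9000, 50]])    # two unusual records
    records = np.vstack([normal, odd])

    model = IsolationForest(contamination=0.01, random_state=0)
    labels = model.fit_predict(records)          # -1 marks an anomaly

    # Indices flagged for further investigation, not as conclusions.
    print(np.where(labels == -1)[0])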

Practical applications of big data in identifying anomalies, red flags or proliferation risk factors were discussed in the first breakout session which centred on analysing state and non-state actor proliferation networks (see below). The use of big data for anomaly detection is one avenue the workshop identified as a potentially useful area of further research in the non-proliferation community.

Challenges

Discussions on the applications of big data in this regard also identified potential limitations. Using big data analytics to identify anomalies in behaviour or activities may only be useful in specific contexts. For example, discussions during the workshop questioned how reliable or useful this approach would be in identifying anomalous behaviour by individuals or small groups of people, for example small proliferation networks.

It was suggested that it may be more productive to integrate anomaly-seeking and behavioural approaches into the analysis of state activity and behaviour rather than that of non-state actors. However, in both cases, a key challenge will be developing a process to accurately understand and articulate what is meant by "normal" behaviour or activity, so that analysts may identify any deviations from this expected baseline.
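A simple way to make the notion of a "normal" baseline concrete is a rolling mean and standard deviation, as in the minimal sketch below; the monthly activity counts are synthetic, and real baselines would need far richer behavioural models:

    # A minimal sketch of defining "normal" as a rolling baseline and
    # flagging deviations from it. The monthly counts are synthetic.
    import pandas as pd

    counts = pd.Series(
        [20, 22, 19, 21, 23, 20, 22, 58, 21, 20],  # one obvious spike
        name="monthly_activity",
    )

    # Build the baseline from the preceding window only, so a spike does
    # not contaminate its own reference period.
    history = counts.shift(1)
    baseline = history.rolling(window=5, min_periods=3).mean()
    spread = history.rolling(window=5, min_periods=3).std()

    # Flag points more than 3 standard deviations above the local baseline.
    flags = counts > (baseline + 3 * spread)
    print(counts[flags])    # the spike at index 7 is flagged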


3: Legal and Ethical Considerations of Using Big Data

Introduction

The use of big data analytics raises a host of legal and ethical concerns that, in recent years, have centred on the issue of privacy. All of the panels touched on the issue of privacy, with a number of interesting points raised:

• Privacy is a critical issue but one which is difficult to assess. In particular, framing big data analytics in terms of privacy automatically primes privacy concerns in analysts, which may affect the nature and output of research and analysis;

• The most interesting, valuable or useful datasets are often controlled and not released, in part because of concerns over privacy;

• There is a global debate, particularly evident in the European Union, regarding an individual's right to control how their personally identifiable data is used;

• The process of collecting, transferring and analysing datasets from across the globe is complex and involves the need to comply with different laws regarding data usage across multiple jurisdictions;

• Using insights gleaned from personal data captured by big data analytics could be problematic due to an individual's right to remain anonymous under EU law.

Big Data and the Existing Non-Proliferation Legal Framework

The non-proliferation legal regime rests on three key areas:

1. Treaties and agreements;

2. Export control laws;

3. Interruption and interdiction.

These three areas are fundamental to the goal of preventing proliferation, but efforts in them tend to be disaggregated and further complicated by different international and domestic legal approaches. The problems posed by a disaggregated approach and differing legal regimes make integrating big data analytics and methodologies particularly challenging.

More practically, it may be very difficult to use big data techniques and methodologies to quantify or measure the intent of an individual to proliferate. Even if big data analytics can be used to identify proliferation activities, the path to prosecution is difficult. The reliance on international agreements and the difficulty of securing extradition are significant barriers to effective prosecution. This has led to alternative legal and political approaches, especially economic sanctions, which have had some success in preventing companies and individuals from accessing the US market.

Implications for Non-Proliferation

Effectively guiding the use of big data by organisations requires the implementation of "big ethics" or "big guidance" principles. These must relate to the collection, retention and use of data, as well as the protection of that data from external threats.

The collection, analysis and retention of any data that can be considered "personal" data raise a number of legal and ethical challenges for the non-proliferation community to address. The retention of any such data must be fair and lawful; this means it must be up-to-date, accurate and held for no longer than is necessary. This imposes additional costs on organisations, which must institute effective and comprehensive safeguards to mitigate the risk of a data breach. The need to comply with local, national and international laws on data protection through the "anonymisation" or "de-identification" of personal data may, in some cases, be prohibitively time-consuming and expensive. Such costs must be weighed against the anticipated benefits offered by big data methodologies for non-proliferation.
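As a hedged illustration of one common de-identification step, the sketch below pseudonymises a direct identifier with a keyed hash; the record and key are hypothetical, and genuine compliance requires key management and a broader re-identification risk assessment:

    # A minimal sketch of pseudonymising direct identifiers with a keyed
    # hash (HMAC-SHA256). Record contents and the secret key are
    # hypothetical; this is one step in de-identification, not the whole.
    import hmac
    import hashlib

    SECRET_KEY = b"replace-with-a-managed-secret"

    def pseudonymise(identifier: str) -> str:
        """Replace a direct identifier with a stable, irreversible token."""
        digest = hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256)
        return digest.hexdigest()[:16]

    record = {"name": "Jane Example", "country": "AT", "shipments": 4}
    record["name"] = pseudonymise(record["name"])
    print(record)    # analytic fields kept; the name is now a token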

New forms of data that may be of use in non-proliferation analysis also pose new legal and ethical challenges. For instance, satellite imagery offers non-proliferation analysts the opportunity to monitor sites and facilities of interest. However, as the capabilities of commercial satellite companies develop, particularly in higher resolution imagery, legal concerns regarding surveillance and privacy could arise.

Discussions during the workshop also focused on the dichotomy between technical and non-technical challenges. Significant time and research effort has centred on understanding and overcoming technological barriers to the application of big data methodologies. However, less attention has been paid to the non-material, non-technical challenges relating to the use of big data in support of non-proliferation, especially:

Governance: current approaches tend to be focused on existing developments rather than anticipating future innovations and developments;

Regulation: related to governance, regulation of 21st century innovations and technological developments is typically carried out by 20th century institutions, laws and norms;

Fragmentation: big data analytics are seen as a solution to challenges across a multitude of disciplines; if true, this requires big data analytics to be simultaneously highly flexible and very context-specific.

Addressing these issues involves not only developing new technologies and approaches but also instituting new legal and organisational structures.


4: Breakout Group Discussions

Breakout Session One: Big Data and Proliferation Networks

Outline

The first breakout session attempted to understand the applications of big data methodologies and analytics to the challenge of identifying and analysing proliferation networks. This was a broad starting point for a discussion on a range of topics, including:

• The differences between state and non-state proliferation networks and activities, and how these differences affect the implementation of big data analytics;

• Whether big data analytics may be able to identify anomalies or suspicious proliferation activity that uses the avenues of legitimate commerce;

• The important distinction between using big data to assess the capability to carry out proliferation activities and its use in assessing the intent to proliferate.

Discussion

A key focus of the three breakout groups was the potential applications of big data analytics in pattern recognition and anomaly detection in large, disparate datasets. This approach requires not only the implementation of effective analytical approaches but also a comprehensive assessment of existing datasets and the integration of new sources of data as they arise. Discussions emphasised the important differences between state and non-state proliferation networks which affect the analyst’s ability to identify anomalous behaviour or to even characterise what is considered to be “normal” behaviour.

The discussion also identified practical steps to enable a more effective integration of big data into non-proliferation activities. Discussions focused on integrating a range of big data analytics where appropriate to analyse various stages of proliferation pathways. It is important to ensure that the integration of big data into current analytical methods is both relevant and useful. For example, free text extraction techniques may be useful in achieving understanding and clarity from large sets of proliferation-relevant scientific and technical data.

There is already a wealth of information available in relation to non-proliferation issues that could provide richer insights through the application of big data analytics. In particular, big data analytics could be applied to trade data, especially the trade mechanisms for the transfer of sensitive or dual-use technologies. However, there are also areas where the non-proliferation community has little insight. Examples include the dark web, which is a largely untapped resource, and visual data, which needs to be geo-tagged or labelled for analysis.

Breakout Session Two: Big Data and Non-Proliferation Sentiment

Outline

The second breakout session attempted to understand how big data may be used to assess the attitudes of decision-makers and the wider public of a state towards proliferation. This issue formed the basis of a wide-ranging discussion on a number of topics, including:

• Developing "conversational maps" using big data analytics to assess changes in attitudes and sentiment over time, and possibly to identify divergences that may indicate covert proliferation-related activities;

• The ability of big data analytics to identify clusters of activity and sentiment towards specific topics within particular geographic locations, providing indicators of localised behaviour;

• Challenges to accurately assessing sentiment and attitudes, including: language barriers, especially where machine translation is relied upon; the questionable veracity of some open source information; and the problem of disinformation.

Discussion

Discussions within groups covered a range of opportunities and challenges in the application of big data methodologies to assessing decision-maker and public attitudes towards non-proliferation. A key focus of these discussions was the importance of assessing disparate pieces of information in conjunction with other sources rather than in isolation. In addition, the group discussions recognised that big data analytics are an aid for the analyst but are not yet sophisticated enough to replace human analysis. It is important to train analysts to make sense of, and draw meaningful conclusions from, results derived from big data analytics in the context of non-proliferation.
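As a hedged illustration of the kind of sentiment scoring discussed, the sketch below uses NLTK's VADER lexicon on two invented statements; VADER is tuned for English social-media text, so the language and veracity challenges noted in this session would apply with force in any real use:

    # A minimal sketch of lexicon-based sentiment scoring with NLTK's
    # VADER (assumes: pip install nltk, plus the one-off lexicon download
    # below). The statements are invented placeholders, not real sources.
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    sia = SentimentIntensityAnalyzer()

    statements = [
        "We welcome continued cooperation with international inspectors.",
        "Foreign interference in our energy programme is unacceptable.",
    ]

    for text in statements:
        scores = sia.polarity_scores(text)   # neg/neu/pos plus compound
        print(f"{scores['compound']:+.2f}  {text}")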

The discussion was conducted with reference to another challenge: assessing the reliability of open source information. A particular point of discussion was the development of an analytical approach to recognise and filter active disinformation from any analysis of sentiment and trends. This challenge is not unique to big data sentiment analysis, and there already exists a developed literature on assessing the reliability, veracity and usefulness of open source information.

It was recognised that big data analytics may not be useful in analysing decision-maker sentiment if the volume of non-proliferation-relevant data, such as speeches and official statements, is small or subject to government filtering. This raises the problem of inherent biases within the relevant dataset. Similar challenges may arise when attempting to assess public sentiment, especially in foreign languages, through the application of open source research methodologies and big data analytics.
