(1)

FishGraph: A Network-Driven Data Analysis

Patrícia Cavoto*, Victor Cardoso*, Régine Vignes Lebbe§, André Santanchè*

*UNICAMP – University of Campinas, São Paulo, Brazil

§ISYEB - UMR 7205 – CNRS, MNHN, UPMC, EPHE UPMC Univ. Paris 06, Sorbonne Universités, Paris, France

(2)

Outline

• Motivation

• Goal

• ReGraph: from FishBase to FishGraph

• Data Experiments

(3)

Motivation

• Collaborative research involving:

  • LIS - Laboratory of Information Systems – UNICAMP, Brazil

  • MNHN - National Museum of Natural History and Sorbonne Universités – Paris, France

  • FishBase Consortium

(4)

Motivation

• FishBase: a relational database and information system that stores biological data on fish species, with millions of records containing:

  • Species, taxonomic classification and predators

  • Locations (country and ecosystem)

(5)

Motivation

• Identification Key:

  • A mechanism used in biology to identify a specimen

  • Composed of a set of questions that guide scientists through the identification

  • Associated with one or more species

  • Similar to a decision tree
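The decision-tree analogy can be sketched in a few lines of Python. This is only an illustration: the names `Couplet` and `identify`, the nesting of the two questions, and the placeholder taxa are assumptions, not FishBase structures. The question texts are taken from key 6 (Freshwater fishes of Africa).

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Couplet:
    """One question of a key: two mutually exclusive statements."""
    first: str
    second: str
    if_first: Union["Couplet", str]   # next couplet, or a taxon label (leaf)
    if_second: Union["Couplet", str]


def identify(node, answers):
    """Follow the answers (True = first statement holds) down the tree."""
    for answer in answers:
        if isinstance(node, str):     # already reached a leaf
            break
        node = node.if_first if answer else node.if_second
    return node


# Hypothetical nesting and placeholder taxa, for illustration only.
key = Couplet(
    first="Five pairs of external gill slits",
    second="Single, or single pair of gill openings",
    if_first=Couplet(
        first="Head without extended rostrum, gill slits lateral",
        second="Head with extended rostrum, gill slits ventral",
        if_first="taxon A (placeholder)",
        if_second="taxon B (placeholder)",
    ),
    if_second="taxon C (placeholder)",
)

result = identify(key, [True, False])
```

Each answer selects one branch, so an identification is a single root-to-leaf path through the key.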

(6)

Identification Key Example

6 - Freshwater fishes of Africa

• Five pairs of external gill slits / Single, or single pair of gill openings

• Head without extended rostrum, gill slits lateral / Head with extended rostrum, gill slits ventral

• Body without scales, or scales small and not clearly visible / Body with clearly visible scales

• Body slender, elongate and eel-like / Body not eel-like

• …

(7)

Identification Key Problem

6 - Freshwater fishes of Africa

?

1419 - Species of Schilbe of Africa

Adapted from: http://fishbase.org/keys/description.php?keycode=6 and http://fishbase.org/keys/description.php?keycode=1419

(8)

Identification Key Problem

6 - Freshwater fishes of Africa: … Adipose fin present / Adipose fin absent …

(9)

Identification Key Problem

6 - Freshwater fishes of Africa: … Adipose fin present / Adipose fin absent …

1419 - Species of Schilbe of Africa: … Adipose fin present / Adipose fin absent …

Adapted from: http://fishbase.org/keys/description.php?keycode=6 and http://fishbase.org/keys/description.php?keycode=1419

(10)

Motivation

• Biological data (as in FishBase) form a large network

• Biologists need network analysis to:

  • Identify the most important species in a specific food chain;

  • Define areas (or species) for preservation;

  • Find relations in a network of identification keys.
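As a rough illustration of the first need, one simple proxy for "importance" in a food chain is the number of feeding relations a species takes part in (degree centrality). This is an assumption made for the sketch, not the measure used by the authors, and the species labels and edges are placeholders.

```python
from collections import Counter

# Hypothetical (predator, prey) pairs; labels are placeholders.
eats = [
    ("species_A", "species_B"),
    ("species_A", "species_C"),
    ("species_B", "species_C"),
    ("species_D", "species_C"),
]


def degree_ranking(edges):
    """Rank species by how many feeding relations they take part in."""
    degree = Counter()
    for predator, prey in edges:
        degree[predator] += 1
        degree[prey] += 1
    # Highest degree first; ties broken by name to keep the order stable.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


ranking = degree_ranking(eats)
```

Other measures (betweenness, for example) may suit a given ecological question better; degree is used here only because it is the simplest to show.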

(11)

Motivation

How to support biologists in network-driven analysis?

(12)

Goal

Build a network database for analysis from a relational database

(13)

ReGraph: from FishBase to FishGraph

Graph databases:

• Very effective for network analysis

• Flexible structure

• Easy to traverse transitive relationships

(14)

ReGraph: from FishBase to FishGraph

ReGraph: a framework that generates a graph database from a relational database.

(16)

ReGraph: from FishBase to FishGraph

ReGraph: maintain the graph database synchronized with the relational database (one-way synchronization).

(18)

ReGraph: from FishBase to FishGraph

ReGraph: Relational and Graph Databases keep their native form.

[Figure label: Current systems]

(19)

ReGraph: from FishBase to FishGraph

ReGraph: allows adding new data in the graph database.

(20)

ReGraph: from FishBase to FishGraph

ReGraph: mapped and annotated subgraphs are integrated and available for running analyses.

(21)

ReGraph: from FishBase to FishGraph

ReGraph: connects data in the local graph with global graphs on the web.

Semantic Web

(22)

ReGraph: from FishBase to FishGraph

STEPS:

1. Map data from the relational database to the graph

2. Run the ETL process to load the initial data

3. The synchronization process starts running after the initial load

4. Add new information as annotation (optional)
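The four steps can be sketched with an in-memory SQLite database standing in for the relational source and plain Python structures standing in for the graph. This is not ReGraph's actual implementation: the table names, column names, mapping format and the full-reload synchronization are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical relational source (table and column names are illustrative).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE genus   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT, genus_id INTEGER);
    INSERT INTO genus   VALUES (1, 'genus_1');
    INSERT INTO species VALUES (10, 'species_10', 1), (11, 'species_11', 1);
""")

# Step 1: mapping. Each table becomes a node label; a foreign key becomes an edge.
node_mapping = {"genus": "GENUS", "species": "SPECIES"}
edge_mapping = [("species", "genus_id", "genus", "belongs_to")]


def load_graph(conn):
    """Step 2: ETL load of all mapped rows into node and edge structures."""
    nodes, edges = {}, set()
    for table, label in node_mapping.items():
        for row_id, name in conn.execute(f"SELECT id, name FROM {table}"):
            nodes[(label, row_id)] = {"name": name}
    for src, fk, dst, edge_type in edge_mapping:
        for row_id, target in conn.execute(f"SELECT id, {fk} FROM {src}"):
            edges.add(((node_mapping[src], row_id), edge_type,
                       (node_mapping[dst], target)))
    return nodes, edges


nodes, edges = load_graph(db)

# Step 3: one-way synchronization, sketched here as a full reload after a change.
db.execute("INSERT INTO species VALUES (12, 'species_12', 1)")
nodes, edges = load_graph(db)

# Step 4: annotations are kept on the graph side only and never written back.
annotations = {("SPECIES", 10): {"note": "added in the graph only"}}
```

Keeping annotations in a separate structure means a reload from the relational source does not overwrite them, which is one simple way to respect the one-way direction of the synchronization.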

(23)

ReGraph: from FishBase to FishGraph

ReGraph: used to generate FishGraph (graph database) from FishBase (relational database)

(24)

ReGraph: from FishBase to FishGraph

[Figure labels: GENERA, FAMILIES, ORDERS, CLASSES]

(25)

ReGraph: from FishBase to FishGraph

[Figure labels: SPECIES, ECOSYSTEM, COUNTRY, KEY, GENUS, FAMILY, ORDER, CLASS; edge label: belongs_to]
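A minimal sketch of a transitive traversal, assuming that belongs_to links each taxonomic level to the next higher one (species to genus, genus to family, and so on). The node labels and the function name `lineage` are placeholders introduced for this example.

```python
# Hypothetical belongs_to edges: child node -> parent node.
belongs_to = {
    "species_1": "genus_1",
    "genus_1": "family_1",
    "family_1": "order_1",
    "order_1": "class_1",
}


def lineage(node, parent_of):
    """Follow belongs_to edges transitively and return the ancestors in order."""
    path = []
    seen = {node}
    while node in parent_of:
        node = parent_of[node]
        if node in seen:      # guard against accidental cycles in the data
            break
        seen.add(node)
        path.append(node)
    return path


ancestors = lineage("species_1", belongs_to)
```

In a relational model the same question typically needs one join per level, whereas a graph follows the edge repeatedly until no parent remains.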

(26)

Experiments: Identification Key Analysis

• Data used:

  • Identification keys

  • Species

  • Geographic locations (countries and ecosystems)

(27)

Experiments: Identification Key Analysis

• Long-term goal:

  • Start the identification process from any node across several identification keys.

• Goals in this analysis:

  • Find similarities between keys

  • Find differences between keys

  • Analyze groups of keys

(28)

Experiments: Identification Keys Analysis

The annotated “share” edge connects keys that share at least one species.

[Figure labels: SPECIES, COUNTRY, GENUS, FAMILY, CLASS; edge label: belongs_to]
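A “share” edge of this kind can be derived from the key-to-species associations. The sketch below assumes those associations are available as sets. The key and species labels are placeholders, and `share_edges` is a hypothetical helper, not part of ReGraph.

```python
from itertools import combinations

# Hypothetical key -> species associations (labels are placeholders).
key_species = {
    "key_1": {"species_a", "species_b"},
    "key_2": {"species_b", "species_c"},
    "key_3": {"species_d"},
}


def share_edges(associations):
    """Return {(key1, key2): shared species} for keys sharing at least one species."""
    edges = {}
    for k1, k2 in combinations(sorted(associations), 2):
        shared = associations[k1] & associations[k2]
        if shared:
            edges[(k1, k2)] = shared
    return edges


edges = share_edges(key_species)
```

Storing the shared species on the edge keeps the reason for the connection available to later analyses.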

(29)

Experiments: Identification Keys Analysis

Components are built from the “share” edge, which connects two or more distinct keys together with their associated species.

A component is a subgraph in which a path exists between every pair of nodes.
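The definition of a component can be made concrete with a breadth-first search over undirected edges. This is a generic sketch with placeholder key labels, not the tool or query the authors used.

```python
from collections import defaultdict, deque

# Hypothetical "share" edges between keys (labels are placeholders).
share = [("key_1", "key_2"), ("key_2", "key_3"), ("key_4", "key_5")]


def components(edges):
    """Group nodes so that every pair in a group is linked by some path."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        result.append(component)
    return result


groups = components(share)
```

Here `key_1` and `key_3` land in the same component even though they share no edge directly, because both are connected through `key_2`.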

(30)

Experiments: Identification Keys Analysis

[Figure: graph of identification keys 205 and 316 with their associated species. Legend: identification key; species colored by family (same order and class).]

(31)

Experiments: Identification Keys Analysis

205 - Key to the species of scorpionfishes occurring in the Western Central Pacific.

316 - Key to the species of Indo-Pacific Scorpionfish (Genus Scorpaenopsis).

(34)

Experiments: Taxonomic Classification Analysis

• Data used:

  • Class

  • Order

  • Family

  • Genus

  • Species

(35)

Experiments: Taxonomic Classification Analysis

• Goals:

  • Compare data in FishGraph with data in a global graph (DBpedia)

  • Find divergences

  • Propose reviews
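Finding divergences amounts to comparing, rank by rank, the classification stored locally with the one obtained from the external source. The sketch below uses two hard-coded dictionaries with placeholder labels. How the external data is actually fetched from DBpedia is outside its scope, and the function name `divergences` is an assumption.

```python
# Hypothetical classifications; all labels are placeholders.
local = {
    "species_1": {"genus": "genus_1", "family": "family_1"},
    "species_2": {"genus": "genus_2", "family": "family_2"},
}
external = {
    "species_1": {"genus": "genus_1", "family": "family_1"},
    "species_2": {"genus": "genus_2", "family": "family_9"},
}


def divergences(ours_by_species, theirs_by_species):
    """List (species, rank, local value, external value) where the two disagree."""
    found = []
    for species in sorted(ours_by_species.keys() & theirs_by_species.keys()):
        for rank in sorted(ours_by_species[species]):
            ours = ours_by_species[species][rank]
            theirs = theirs_by_species[species].get(rank)
            # Only report ranks present in both sources with different values.
            if theirs is not None and ours != theirs:
                found.append((species, rank, ours, theirs))
    return found


to_review = divergences(local, external)
```

Each reported tuple is a candidate for review rather than an error, since either source may hold the outdated value.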

(36)

Experiments: Taxonomic Classification

(37)

Experiments: Taxonomic Classification Analysis

(38)

Conclusions

• Graph databases to perform network analyses

• One-way synchronization

• Annotations and connection with other sources on the Web

• Network-driven data analysis for knowledge discovery:

  • Identification keys

(39)

Conclusions

• Future Work:

  • Record the provenance of data obtained from web sources

  • Organize “same as” nodes in the local graph

  • Enable distinct graph mappings from one relational model

(40)

FishGraph: A Network-Driven Data Analysis

Thank you!

Acknowledgments:

Unicamp, LIS members, FishBase Consortium, FAPESP, CNPq, CAPES
