
A Crowd Method for Internet-based Software with Big Data


Academic year: 2021


(1)

A Crowd Method for Internet-based Software with Big Data

Gang Yin

Software Collaboration and Data Mining Group

2014

Central South University (中南大学)

Intel (英特尔)

(2)

Contents

Motivation

Approach

Application

(3)

Internet-based Software

On the Internet

Various online user communities are reshaping the development of Internet-based software.

(4)

Attractive Solutions and Features

Rapid Experience and Response

Continuous Evolution and Improvement

Characteristics of Internet-based Software

Function

Construction

(5)
(6)

Open Source Miracles

Eric Raymond

Richard Stallman: launched the GNU Project and wrote the GPL

Linus Torvalds: leads the Linux kernel project

(7)

Open Source Miracles

SourceForge: 3.5 million users, 400,000 projects

Collaborative Development Communities

(8)

Open Source Miracles

2 million users

users

developers

IT practitioners

… …

14 million topics

Avg. response time: 11 minutes

Open source software has strongly demonstrated the

Knowledge Sharing Communities

(9)
(10)

Other Peer-based Practices

Sharing

Collaboration

Peering

(11)

Crowd-based Approach

[Figure: timeline across the 1960s, 1970s, and 1990s showing High-Level Language, Software Engineering, and Open Source, with the labels Automated Approach, Engineering Approach, and "Crowd-based Approach?"]

(12)

Crowd-based Approach: Step I

Crowd-based Approach

Traditional Approaches

Peer-based Approaches

(13)

“Big Data” in Software Development

Data dimensions: API, software, user, tag, time

Collaborative Development Communities: project profile, source code, issue tracker, mailing list, … …

Knowledge Sharing Communities: Q & A, tags / features, forum posts, blogs / news, … …
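The two kinds of community data above can be linked through the five dimensions listed on this slide (API, software, user, tag, time). The following is a minimal Python sketch of that idea, assuming a hypothetical `Artifact` record and a simple inverted index keyed by (dimension, value); the field names and sample records are illustrative and are not Trustie's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Artifact:
    """One crowd-generated item (hypothetical schema, not Trustie's actual one)."""
    source: str    # community type: "collaborative" or "knowledge-sharing"
    kind: str      # e.g. "issue", "source code", "Q & A"
    software: str  # project the item belongs to
    user: str      # author of the item
    created: date  # time dimension
    apis: set = field(default_factory=set)  # APIs mentioned or used
    tags: set = field(default_factory=set)  # tags / features


def build_index(artifacts):
    """Map (dimension, value) keys to artifacts so both community types can be joined."""
    index = defaultdict(list)
    for a in artifacts:
        index[("software", a.software)].append(a)
        index[("user", a.user)].append(a)
        index[("time", a.created.strftime("%Y-%m"))].append(a)  # monthly buckets
        for api in a.apis:
            index[("api", api)].append(a)
        for tag in a.tags:
            index[("tag", tag)].append(a)
    return index


items = [
    Artifact("collaborative", "issue", "android", "alice",
             date(2013, 5, 2), {"Activity.onCreate"}, {"ui"}),
    Artifact("knowledge-sharing", "Q & A", "android", "alice",
             date(2013, 5, 3), {"Activity.onCreate"}, {"android"}),
]
idx = build_index(items)
# Both the issue and the Q&A post are reachable through the shared API key.
print(len(idx[("api", "Activity.onCreate")]))  # 2
```

Keying on shared dimension values is what lets an issue from a development community and a post from a Q&A community surface together; a real system would likely use a database rather than an in-memory dictionary.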
(14)

The power of “Big Data”

Crowd-based Approach

Traditional Approaches

Peer-based Approaches

SourceForge, GitHub, Ohloh, Softpedia, StackOverflow

Scope

Quality

(15)

Crowd-based Approach: Step II

Fundamental Approaches

Peer-based Approaches

Human-Centric

Approaches for Mining Engineering Data

Approaches for Mining Community Data

Data-Centric

(16)

Trustie Project

National High-Tech Development Plan (863 Program)

National Trustworthy Software Resource Sharing and Cooperating Production Environment

(17)

Contents

Motivation

Approach

Application

(18)

Software Trustworthiness

The history of Linux suggested surprising theories about software engineering:

“Given enough eyeballs, all bugs are shallow”

(19)

Software Trustworthiness

Open source software gives us a new sense of trustworthiness:

“Trustworthiness of Internet-based software is hidden in the big data”

Engineering Data + Community Data

(20)

Data-centric Innovation Cycle

Software

Data

Crowd-based Creation, Crowd-based Evolution, Crowd-based Construction

(21)
(22)

Principles of the Crowd Method

(23)

Research Issues on Software “Big Data”

Internet Software Communities, Mass Collaboration, Open Resource

How to find software more accurately across the Internet?

How to locate trustworthy software artifacts on the Internet?

Data Analysis

How to support engineers and crowds in collaborating on large-scale development?

How to enable crowd development for industrial software production?

How to evaluate the contribution of developers in projects?

How to evaluate the trustworthiness of software artifacts?

(24)

Results on Data Analysis

Developers’ productivity plateaus within 6-7 months in small and medium projects; in large projects it can take up to 12 months.
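The source does not state how the plateau point was measured. One plausible way to operationalize it is sketched below, assuming monthly commit counts as the productivity measure, a 3-month trailing average, and a hypothetical 5% month-over-month growth threshold. The sample data are synthetic, not the study's.

```python
def rolling_mean(values, window=3):
    """Trailing moving average; early months average over fewer points."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def plateau_month(monthly_commits, window=3, min_growth=0.05):
    """Return the first (1-based) month from which smoothed productivity never
    again grows by more than min_growth month over month, or None if it keeps growing."""
    smooth = rolling_mean(monthly_commits, window)
    for m in range(1, len(smooth)):
        rest = smooth[m - 1:]
        still_growing = any(
            prev > 0 and (cur - prev) / prev > min_growth
            for prev, cur in zip(rest, rest[1:])
        )
        if not still_growing:
            return m
    return None


# Synthetic commit counts for one hypothetical developer, months 1 to 12.
commits = [2, 5, 9, 12, 14, 15, 15, 15, 16, 15, 15, 15]
print(plateau_month(commits))  # 7
```

Smoothing before testing growth keeps a single unusually busy month from being mistaken for continued learning. Both the window and the threshold are tuning choices.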

(25)

Results on Data Analysis

The crowds can find interesting projects.

The crowds can collaborate with

(26)

New Results on Mass Collaboration

• Text: similarity of the texts of bugs and posts

• Time: when the issues and Q & A posts are published

• Co-occurred users: users who appear in both communities

StackOverflow Q&A Community

Android
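The three signals above can be folded into one relatedness score for a pair made of an Android bug report and a StackOverflow post. This is a minimal sketch. It assumes bag-of-words cosine similarity for text, exponential decay by days apart for time, a shared-user indicator for co-occurred users, and hypothetical weights. The source does not say how the signals were actually combined.

```python
import math
import re
from collections import Counter
from datetime import date


def cosine(text_a, text_b):
    """Cosine similarity of two bag-of-words vectors (lowercased word tokens)."""
    a = Counter(re.findall(r"[a-z0-9]+", text_a.lower()))
    b = Counter(re.findall(r"[a-z0-9]+", text_b.lower()))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def time_closeness(d1, d2, half_life_days=30):
    """1.0 for the same day, halving for every half_life_days apart."""
    return 0.5 ** (abs((d1 - d2).days) / half_life_days)


def relatedness(bug, post, w_text=0.6, w_time=0.2, w_user=0.2):
    """Weighted sum of text, time, and co-occurred-user signals (weights are hypothetical)."""
    user_signal = 1.0 if bug["users"] & post["users"] else 0.0
    return (w_text * cosine(bug["text"], post["text"])
            + w_time * time_closeness(bug["date"], post["date"])
            + w_user * user_signal)


bug = {"text": "Crash when rotating screen in WebView",
       "date": date(2013, 6, 1), "users": {"alice", "bob"}}
post = {"text": "WebView crash on screen rotation",
        "date": date(2013, 6, 1), "users": {"bob"}}
print(round(relatedness(bug, post), 3))  # 0.729
```

A weighted sum is the simplest way to combine the signals. A trained classifier over the same three features is a natural alternative when labeled bug/post pairs are available.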

(27)

New Results on Mass Collaboration

[Figure: classifier-based Coder and Reviewer prediction; Top-N values shown: 0.42, 0.23, 0.17, 0.12]
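Top-N prediction of coders and reviewers is commonly scored with Top-N accuracy: the share of cases whose actual coder or reviewer appears among the N highest-ranked candidates. The sketch below shows that metric and assumes the classifier outputs one score per candidate. The source does not say what the values listed above measure, so this illustrates the metric only and does not reproduce those results. The candidate names are hypothetical.

```python
def top_n(scores, n):
    """Return the n candidate names with the highest scores (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


def top_n_accuracy(cases, n):
    """cases: list of (scores_by_candidate, actual_candidate) pairs."""
    if not cases:
        return 0.0
    hits = sum(1 for scores, actual in cases if actual in top_n(scores, n))
    return hits / len(cases)


cases = [
    ({"alice": 0.9, "bob": 0.4, "carol": 0.1}, "alice"),  # hit at N=1
    ({"alice": 0.2, "bob": 0.7, "carol": 0.6}, "carol"),  # hit from N=2
    ({"alice": 0.5, "bob": 0.3, "carol": 0.8}, "bob"),    # hit only at N=3
    ({"alice": 0.6, "bob": 0.1, "carol": 0.3}, "dave"),   # never predicted
]
for n in (1, 2, 3):
    print(n, top_n_accuracy(cases, n))  # 1 0.25, then 2 0.5, then 3 0.75
```

Top-N accuracy can only stay the same or rise as N grows, because every larger list contains the smaller one.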

(28)

New Results on Resource Sharing

Fine-grained, efficient classification of crowd-generated software resources

Software Communities: Ohloh, Freecode

Aggregation of online descriptions

Hierarchical Classifier
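The slide names two steps. First, a project's descriptions are aggregated from several communities. Then a hierarchical classifier assigns it a category. The sketch below assumes a hypothetical two-level taxonomy (`TAXONOMY`) and uses keyword-overlap scoring at each node in place of whatever trained classifier the authors used. The sample descriptions are invented for illustration.

```python
import re

# Hypothetical taxonomy: each node has a keyword set and (possibly empty) children.
TAXONOMY = {
    "Development": {
        "keywords": {"compiler", "ide", "library", "framework", "build"},
        "children": {
            "Build Tools": {"keywords": {"build", "make", "maven", "dependency"}, "children": {}},
            "Web Frameworks": {"keywords": {"web", "http", "framework", "mvc"}, "children": {}},
        },
    },
    "Multimedia": {
        "keywords": {"audio", "video", "image", "player"},
        "children": {
            "Audio": {"keywords": {"audio", "mp3", "sound"}, "children": {}},
            "Video": {"keywords": {"video", "codec", "stream"}, "children": {}},
        },
    },
}


def aggregate(descriptions):
    """Merge one project's descriptions from several communities into a single token set."""
    tokens = set()
    for text in descriptions:
        tokens |= set(re.findall(r"[a-z0-9]+", text.lower()))
    return tokens


def classify(tokens, nodes=TAXONOMY, path=()):
    """Descend the taxonomy, picking at each level the child with the largest keyword overlap."""
    best, best_score = None, 0
    for name, node in nodes.items():
        score = len(tokens & node["keywords"])
        if score > best_score:
            best, best_score = name, score
    if best is None:  # no child matches: stop at the current level
        return path
    return classify(tokens, nodes[best]["children"], path + (best,))


descs = [
    "A lightweight MVC framework for building web applications",  # e.g. an Ohloh-style entry
    "HTTP routing library and web framework",                     # e.g. a Freecode-style entry
]
print(classify(aggregate(descs)))  # ('Development', 'Web Frameworks')
```

Aggregating first gives the classifier more evidence than any single site's short description. Descending level by level keeps each decision among a few siblings, which is the usual motivation for hierarchical rather than flat classification.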
(29)

Platform and Practices

Development Environment

– Trustie Software Resource Sharing Toolset

– Trustie Software Data Storage and Analysis Toolset

– Trustie Collaborative Development Toolset

Application Practices

Application in Large Scale Software Industries: Neusoft, Careland, Wonders group, Digital China

Common Application Modes and Platforms: Enterprise Version, Community Version, Education Version

Application in Mission Critical Systems: Space flight, Electricity, Flight control, Defense

Software Communities

Component-based SPL, Service-oriented SPL, Heterogeneous SPL, Runtime-monitoring SPL, Third-party SPL

(30)

Contents

Motivation

Approach

Application

Is the Crowd Method

Critical information systems

Software engineering education

Software industries

(31)

Application in Internet Communities

Collaboration Community

– more than 32,000 users

– more than 1,500 projects

– users and projects can be analyzed comprehensively

Sharing Community

– various kinds of software resources

(32)

Application in Software Industries

Trustie has been adopted by more than 10 software companies in China.

Trustie supported the development of 8 health care information systems at Neusoft.

Software reusability increased by 75%; productivity increased by 65%.

Neusoft Corporation

Digital China set up an industrial software product line (SPL) for trustworthy taxation software development.

Software reusability increased by 60%; the number of bugs decreased by 20%.

(33)
(34)

Application in Universities

[Figure: several courses and their course projects, with the labels Interests and Collaboration]

MOOC 2.0

(35)

Application in Universities

• Project Hosting (http://forge.trustie.net): version control, issue tracking, project profile, forum/wiki

• Course Hosting (http://course.trustie.net): course management, member management, exercise monitoring, resource management

• Contest Hosting (http://contest.trustie.net): contest publishing, submission of works, discussion, ranking
(36)

Future Work

Application of Trustie Technologies

MOOP, MOOC 2.0

Software engineering education

Software garden and industries

Research on the Crowd Method

– Data-driven collaborative development

– Data-driven software resource sharing

– Data-driven trustworthiness analysis

Data Mining Software Engineering Network Analysis Critical System Industry Education

(37)

Thank You !

Questions ?

http://forge.trustie.net

