A Crowd Method for Internet-based Software with Big Data
Gang Yin
Software Collaboration and Data Mining Group
2014
Central South University / Intel
Contents
• Motivation
• Approach
• Application
Internet-based Software
On the Internet
Online user communities are reshaping the development of Internet-based software.
Attractive Solutions and Features
Rapid Experience and Response
Continuous Evolution and Improvement
Characteristics of Internet-based Software
Function
Construction
Open Source Miracles
• Eric Raymond
• Richard Stallman: launched the GNU Project, wrote the GPL
• Linus Torvalds: leads the Linux kernel project
Open Source Miracles
SourceForge: 3.5 million users, 400,000 projects
• Collaborative Development Communities
Open Source Miracles
• 2 million users
– users
– developers
– IT practitioners
– … …
• 14 million topics
– Avg. response time: 11 minutes
Open source software has strongly demonstrated the
• Knowledge Sharing Communities
Other Peer-based Practices
Sharing, Collaboration, Peering
Open Source
Crowd-based Approach?
Crowd-based Approach
[Figure: evolution of software development approaches over the 1960s, 1970s, and 1990s; labels: High-Level Language, Software Engineering, Automated Approach, Engineering Approach]
Crowd-based Approach: Step I
[Figure: Crowd-based Approach; labels: Traditional Approaches, Peer-based Approaches]
“Big Data” in Software Development
[Figure: data sources linked by API, software, user, tag, and time]
• Collaborative Development Communities: project profile, source code, issue tracker, mailing list, … …
• Knowledge Sharing Communities: Q & A, tags / features, forum posts, blogs / news, … …
The power of “Big Data”
Crowd-based Approach
[Figure: Traditional Approaches and Peer-based Approaches compared by Scope and Quality; examples: SourceForge, GitHub, Ohloh, Softpedia, StackOverflow]
Crowd-based Approach: Step II
[Figure: labels: Fundamental Approaches, Peer-based Approaches, Approaches for Mining Engineering Data, Approaches for Mining Community Data, Human-Centric, Data-Centric]
Trustie Project
National High-Tech Development Plan (863 Program)
National Trustworthy Software Resource Sharing and Cooperating Production Environment
Software Trustworthiness
The history of Linux suggested a surprising theory about software engineering:
“Given enough eyeballs, all bugs are shallow.”
Software Trustworthiness
Open source software gives us a new perspective:
“The trustworthiness of Internet-based software is hidden in the big data.”
Engineering Data + Community Data
Data-centric Innovation Cycle
[Figure: cycle linking Software and Data through Crowd-based Creation, Crowd-based Evolution, and Crowd-based Construction]
Principles of the Crowd Method
Research Issues on Software “Big Data”
[Figure: Internet Software Communities; themes: Mass Collaboration, Open Resource, Data Analysis]
• How to find software more accurately across the Internet?
• How to locate trustworthy software artifacts on the Internet?
• How to support engineers and crowds in collaborating on large-scale development?
• How to enable crowd development for industrial software production?
• How to evaluate developers' contributions to projects?
• How to evaluate the trustworthiness of software artifacts?
Results on Data Analysis
Developers’ productivity plateaus within 6-7 months in small and medium projects, but can take up to 12 months to plateau in large projects.
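The slide reports when productivity plateaus but not how a plateau is detected. As a minimal sketch, assuming a developer's output is measured as monthly commit counts (an assumption; the metric is not stated), one simple rule is to take the first month after which the rolling mean stays within a relative tolerance of its final level. The function name `plateau_month` and the `window` and `tolerance` defaults are hypothetical.

```python
from typing import Optional, Sequence


def plateau_month(monthly_output: Sequence[float],
                  window: int = 3,
                  tolerance: float = 0.1) -> Optional[int]:
    """Return the 1-based month at which output plateaus, or None.

    Hypothetical rule: the plateau starts at the first rolling window whose
    mean, and every later window's mean, stays within `tolerance` (relative)
    of the mean of the final window.
    """
    n = len(monthly_output)
    if n < window:
        return None  # not enough history to judge
    # Mean output over each run of `window` consecutive months.
    rolling = [sum(monthly_output[i:i + window]) / window
               for i in range(n - window + 1)]
    final = rolling[-1]
    if final == 0:
        return None  # no activity at the end; no meaningful plateau
    for start in range(len(rolling)):
        if all(abs(r - final) <= tolerance * final for r in rolling[start:]):
            # Window `start` covers months start+1 .. start+window (1-based);
            # report its last month.
            return start + window
    return None
```

For the made-up counts `[2, 4, 7, 10, 12, 13, 13, 14, 13, 13, 14, 13]`, `plateau_month` returns `7`. A rolling mean is used so that a single unusually busy or quiet month does not move the detected plateau.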
Results on Data Analysis
• The crowds can find interesting projects.
• The crowds can collaborate with
New Results on Mass Collaboration
Linking Android issues with the StackOverflow Q&A community:
• Text: similarity between the texts of bug reports and posts
• Time: when the issues and Q&A posts are published
• Co-occurring users: users who appear in both communities
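The three features above (text, time, co-occurring users) can be combined into one link score for a candidate (Android issue, StackOverflow post) pair. The sketch below is illustrative only. It assumes bag-of-words cosine similarity for text, exponential decay with a hypothetical 7-day half-life for time, Jaccard overlap for co-occurring users, and hand-picked weights. None of these choices comes from the slides, and `Item` and `link_score` are hypothetical names.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Item:
    """An issue or a Q&A post (hypothetical minimal record)."""
    text: str
    published: datetime
    users: set[str]


def tokens(text: str) -> Counter:
    # Lower-cased alphanumeric bag of words.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def time_score(a: datetime, b: datetime, half_life_days: float = 7.0) -> float:
    # 1.0 for simultaneous publication, halving every `half_life_days`.
    days = abs((a - b).total_seconds()) / 86400
    return 0.5 ** (days / half_life_days)


def user_score(a: set[str], b: set[str]) -> float:
    # Jaccard overlap of the users active on each item.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_score(issue: Item, post: Item, weights=(0.5, 0.3, 0.2)) -> float:
    # Assumed linear combination; real weights would be learned from data.
    w_text, w_time, w_user = weights
    return (w_text * cosine(tokens(issue.text), tokens(post.text))
            + w_time * time_score(issue.published, post.published)
            + w_user * user_score(issue.users, post.users))
```

Ranking all posts by `link_score` for a given issue and keeping the top-scoring ones yields candidate links. A linear score keeps each feature's contribution easy to inspect.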
New Results on Mass Collaboration
[Figure: Coder and Reviewer Prediction; labels: Classifier, Top-N; values shown: 0.42, 0.23, 0.17, 0.12]
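The figure reports Top-N results for coder and reviewer prediction. Top-N accuracy is commonly defined as the fraction of cases in which at least one actual developer appears among the first N recommendations. The slides do not say whether the values 0.42, 0.23, 0.17, and 0.12 are such accuracies, so the sketch below only illustrates the metric. The function name and the example item IDs and developer names are hypothetical.

```python
from typing import Hashable, Mapping, Sequence


def top_n_accuracy(ranked: Mapping[Hashable, Sequence[str]],
                   actual: Mapping[Hashable, set[str]],
                   n: int) -> float:
    """Fraction of items whose top-n recommended developers include
    at least one developer who actually did the work (coding or review)."""
    if not actual:
        return 0.0
    hits = sum(1 for item, truth in actual.items()
               if truth & set(ranked.get(item, [])[:n]))
    return hits / len(actual)
```

With `ranked = {"pr1": ["ann", "bo", "cy"], "pr2": ["bo", "dan", "ann"], "pr3": ["cy", "ed", "fay"], "pr4": ["dan", "ann", "bo"]}` and `actual = {"pr1": {"ann"}, "pr2": {"ann"}, "pr3": {"gus"}, "pr4": {"ann"}}`, Top-1, Top-2, and Top-3 accuracy are 0.25, 0.5, and 0.75. Accuracy can only grow with N, which is why Top-N results are usually reported for several N.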
New Results on Resource Sharing
Fine-grained, efficient software resource classification for crowd-generated artifacts
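Hierarchical classification assigns a software resource to a path in a category tree, deciding one level at a time. The slide pairs it with aggregating online descriptions from communities such as Ohloh and Freecode. As a minimal sketch, assuming each category is described by a hypothetical keyword prototype (a trained classifier would normally take its place), the code below merges a project's descriptions into one bag of words. It then descends the tree, picking the most similar child by cosine similarity at each level. This nearest-prototype rule is a stand-in, not the authors' classifier, and all names are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    keywords: Counter  # hypothetical prototype: weighted keywords for this node
    children: list["Category"] = field(default_factory=list)


def bag(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def aggregate(descriptions: list[str]) -> Counter:
    """Merge descriptions of one project from several sites into one bag of words."""
    total: Counter = Counter()
    for d in descriptions:
        total.update(bag(d))
    return total


def classify(profile: Counter, root: Category) -> list[str]:
    """Descend the tree top-down, choosing the most similar child at each level."""
    path: list[str] = []
    node = root
    while node.children:
        best = max(node.children, key=lambda c: cosine(profile, c.keywords))
        if cosine(profile, best.keywords) == 0.0:
            break  # no evidence to go deeper; stop at the current level
        path.append(best.name)
        node = best
    return path
```

With a toy tree whose "Development" node has a "Build Tools" child, `classify(aggregate(["A fast build tool with dependency tracking", "Build automation: make-style dependency graphs"]), root)` descends to `["Development", "Build Tools"]`. Deciding level by level keeps each decision among a handful of siblings, which is what makes fine-grained leaf categories tractable.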
[Figure: Software Communities (Ohloh, Freecode), Aggregation of online descriptions, Hierarchical Classifier]
Platform and Practices
• Trustie Software Resource Sharing Toolset
• Trustie Software Data Storage and Analysis Toolset
• Trustie Collaborative Development Toolset
[Figure: layers labeled Development Environment and Application Practices]
Application in Large-Scale Software Industries
Neusoft, Careland, Wonders Group, Digital China
Common Application Modes and Platforms
• Enterprise Version, Community Version, Education Version
Application in Mission-Critical Systems
• Space flight, electricity, flight control, defense
Software Communities
• Component-based SPL, Service-oriented SPL, Heterogeneous SPL, Runtime-monitoring SPL, Third-party SPL
Is the Crowd Method
• Critical information systems
• Software engineering education
• Software industries
Application in Internet Communities
• Collaboration Community
– more than 32,000 users
– more than 1,500 projects
– users and projects can be analyzed comprehensively
• Sharing Community
– various kinds of software resources
Application in Software Industries
Trustie has been adopted by more than 10 software companies in China.
Neusoft Corporation
• Trustie supported the development of 8 health care information systems at Neusoft.
• Software reusability increased by 75%; productivity increased by 65%.
Digital China
• Digital China set up an industrial SPL (software product line) for trustworthy taxation software development.
• Software reusability increased by 60%; the number of bugs decreased by 20%.
Application in Universities
[Figure: courses and course projects connected through interests and collaboration]
MOOC 2.0
Application in Universities
• Project Hosting (http://forge.trustie.net): version control, issue tracking, project profile, forum/wiki
• Course Hosting (http://course.trustie.net): course management, member management, exercise monitoring, resource management
• Contest Hosting (http://contest.trustie.net): contest publishing, submission of works, discussion, ranking
Future Work
• Application of Trustie Technologies
– MOOP, MOOC 2.0
– Software engineering education
– Software garden and industries
•
Research on the Crowd Method
– Data-driven collaborative development
– Data-driven software resource sharing
– Data-driven trustworthiness analysis
[Figure: Data Mining, Software Engineering, Network Analysis; Critical System, Industry, Education]
Thank You !
Questions ?
http://forge.trustie.net