Annual Report for Period: 10/2006 - 09/2007

Submitted on: 08/15/2007

Principal Investigator: Yang, Li

Award ID: 0414857

Organization: Western Michigan Univ

Title: Projection and Interactive Exploration of Large Relational Data

Project Participants

Senior Personnel

Name: Yang, Li

Worked for more than 160 Hours: Yes

Contribution to Project:

Post-doc

Graduate Student

Name: Sanver, Mustafa

Worked for more than 160 Hours: Yes

Contribution to Project: Mustafa Sanver is a Ph.D. student who has been working on both the data visualization and database components.

Name: Zhao, Dongfang

Worked for more than 160 Hours: Yes

Contribution to Project: Dongfang Zhao worked on the data embedding component and was supported in the 2005-06 academic year.

Name: Hua, Danyang

Worked for more than 160 Hours: Yes

Contribution to Project: Danyang Hua works on the database component and has been supported since 2007.

Undergraduate Student

Technician, Programmer

Other Participant

Research Experience for Undergraduates

Organizational Partners

Other Collaborators or Contacts

Activities and Findings

Research and Education Activities:

This research project consists of three technical components: data visualization, database support, and data embedding. Please refer to the Publications section for related references.

In the data visualization component, we have been developing a visualization tool [Yang & Sanver TVCG'07] that takes multi-resolution aggregated data as input. Two interactive visualization techniques, density-based parallel coordinates and footprint splatting with grand tour, have been extended to render data aggregated at multiple resolutions. The tool supports overview-and-drill-down exploration of large relational data and allows users to interactively specify subsets of data for further visualization, possibly at more detailed resolutions. With further development, the tool can be used by industry and other agencies for scalable interactive data visualization and exploration. We have also developed techniques for pruning and visualizing frequent itemsets and many-to-many association rules [Yang TKDE'03]. Future work in this component includes a usability study and better GUI design.
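As a rough illustration of the density-based idea for parallel coordinates, the sketch below bins records between each pair of adjacent axes and maps bin counts to line opacities, so that drawing cost depends on the number of occupied bins rather than on the number of records. The function names (`bin_index`, `segment_densities`, `opacities`), the equal-width binning, and the linear opacity scaling are assumptions made for this example and are not taken from the tool itself.

```python
from collections import Counter

def bin_index(value, lo, hi, bins):
    """Map a value in [lo, hi] to a bin index in 0..bins-1 (equal-width bins)."""
    if hi == lo:
        return 0
    i = int((value - lo) / (hi - lo) * bins)
    return min(i, bins - 1)  # the maximum value falls into the last bin

def segment_densities(records, bins=4):
    """For each adjacent axis pair, count records per (bin on axis d, bin on axis d+1)."""
    dims = len(records[0])
    lows = [min(r[d] for r in records) for d in range(dims)]
    highs = [max(r[d] for r in records) for d in range(dims)]
    densities = []
    for d in range(dims - 1):
        counts = Counter()
        for r in records:
            a = bin_index(r[d], lows[d], highs[d], bins)
            b = bin_index(r[d + 1], lows[d + 1], highs[d + 1], bins)
            counts[(a, b)] += 1
        densities.append(counts)
    return densities

def opacities(counts):
    """Scale counts to opacities in (0, 1]; the densest bundle is fully opaque."""
    peak = max(counts.values())
    return {key: n / peak for key, n in counts.items()}

# Hypothetical three-attribute records, used only to exercise the functions.
records = [
    (0.0, 10.0, 1.0),
    (0.1, 11.0, 1.0),
    (0.2, 19.5, 1.0),
    (0.9, 19.0, 5.0),
    (1.0, 20.0, 5.0),
]
densities = segment_densities(records, bins=2)
alpha = opacities(densities[0])
```

Each entry of `densities` describes the bundles of lines between two neighboring axes; a renderer would draw one band per occupied bin pair using the corresponding opacity instead of one polyline per record.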

In the database component, we have studied multi-resolution data aggregation as a common representation of data between database and visualization tools [Yang & Sanver TVCG'07]. Data aggregated at multiple resolutions are piggybacked onto internal nodes of a k-d-B tree. The k-d-B tree structure is extended to improve query performance and node fan-out while keeping data aggregation information. We have conducted experiments on both synthetic and real-world data sets, testing the performance of data access and index maintenance. Future work includes the study of better indexing mechanisms and of data mining techniques that use multi-resolution data aggregation as input.
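The piggybacking idea can be sketched with a much simpler structure than a k-d-B tree. The example below uses an in-memory binary k-d tree with median splits as a stand-in; every node stores the count, per-dimension sums, and bounding box of the records beneath it, so that cutting the tree at a given depth yields a summary at that resolution. The names `Node`, `build`, and `summarize`, and the choice of aggregates, are assumptions for illustration and do not reflect the paged, high-fan-out index described in this report.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    count: int                      # number of records under this node
    sums: Tuple[float, ...]         # per-dimension sums (means = sums / count)
    lo: Point                       # bounding box, lower corner
    hi: Point                       # bounding box, upper corner
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    points: List[Point] = field(default_factory=list)  # raw records, leaves only

def build(points, depth=0, leaf_size=2):
    """Build a median-split k-d tree that carries aggregates in every node."""
    dims = len(points[0])
    node = Node(
        count=len(points),
        sums=tuple(sum(p[d] for p in points) for d in range(dims)),
        lo=tuple(min(p[d] for p in points) for d in range(dims)),
        hi=tuple(max(p[d] for p in points) for d in range(dims)),
    )
    if len(points) <= leaf_size:
        node.points = list(points)
        return node
    axis = depth % dims
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    node.left = build(ordered[:mid], depth + 1, leaf_size)
    node.right = build(ordered[mid:], depth + 1, leaf_size)
    return node

def summarize(node, level):
    """Return (lo, hi, count) per region at the given depth; level 0 is the overview."""
    if level == 0 or node.left is None:
        return [(node.lo, node.hi, node.count)]
    return summarize(node.left, level - 1) + summarize(node.right, level - 1)

# Hypothetical two-dimensional records.
points = [(1.0, 1.0), (2.0, 1.5), (1.5, 4.0), (8.0, 8.0),
          (9.0, 7.0), (8.5, 9.5), (5.0, 5.0), (4.0, 6.0)]
root = build(points)
overview = summarize(root, 0)   # one region covering all records
detail = summarize(root, 2)     # four finer regions after drilling down
```

Raising `level` corresponds to drilling down: the same tree answers both the coarse overview and the finer views without rescanning the raw records.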

In the data embedding component, four methods (k-MST [Yang ICPR'04], min-k-ST [Yang TPAMI'05], k-EC [Yang PRL'05], and k-VC [Yang SIGKDD'05, Yang TPAMI'06]) were proposed to build connected neighborhood graphs for robust and reliable dimensionality reduction. A new locally isometric embedding method, LMDS [Yang ICPR'06, Yang TPAMI'07], has been developed. Incremental methods [Zhao & Yang ICPR'06, Zhao & Yang TPAMI'07] have been developed for neighborhood graph construction and for the projection of large data sets and data streams. Future work includes the systematization and evaluation of existing data embedding methods.
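Connectivity matters because a plain k-nearest-neighbor graph can fall apart into separate components when the data form well-separated clusters, which leaves graph distances between clusters undefined. The sketch below shows one simple way to guarantee a connected graph: repeatedly extract a minimum spanning tree from the edges not yet used and take the union. It is written in the spirit of the k-MST idea but is a simplified illustration under our own assumptions; the cited papers define the actual algorithms, and the function names here are hypothetical.

```python
import math
from itertools import combinations

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def spanning_forest(n, edges):
    """Kruskal's algorithm on edges given as (length, i, j) tuples."""
    parent = list(range(n))
    chosen = []
    for length, i, j in sorted(edges):
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            chosen.append((length, i, j))
    return chosen

def repeated_mst_graph(points, k):
    """Union of k spanning trees, each extracted from the edges left by earlier rounds."""
    n = len(points)
    remaining = [(math.dist(points[i], points[j]), i, j)
                 for i, j in combinations(range(n), 2)]
    graph = []
    for _ in range(k):
        tree = spanning_forest(n, remaining)
        graph.extend(tree)
        used = set(tree)
        remaining = [e for e in remaining if e not in used]
    return graph

def is_connected(n, edges):
    """Check that the edges join all n vertices into one component."""
    parent = list(range(n))
    for _, i, j in edges:
        parent[find(parent, i)] = find(parent, j)
    return len({find(parent, v) for v in range(n)}) == 1

# Two well-separated clusters (hypothetical data): a 2-nearest-neighbor graph
# would never link them, but the first spanning tree always does.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
graph = repeated_mst_graph(points, k=2)
```

The first round alone ensures connectivity; later rounds add redundancy so that graph distances depend less on any single edge. Note that a later round may produce only a spanning forest if removing earlier trees disconnects the remaining edges.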

Assessment of Project's Status:

Most activities of this project, as defined in the research and education plan in the original proposal in 01/2004 and revised in 07/2004, have been completed. Compared with the original research objectives, the following list highlights some of the major accomplishments:

1. To support interactive exploration of large relational data, we have studied multi-resolution data aggregation and used a high-dimensional partition-based tree index to piggyback the aggregated data as an intermediate representation of large relational data for interactive visualization.

2. Two visualization techniques, footprint splatting with grand tour and parallel coordinates, have been extended to visualize multidimensional aggregated data.

3. A client-server visualization tool has been developed to demonstrate the feasibility and effectiveness of this approach in multiresolution visualization of large relational data. Multiple visualization clients can get data from a data server using TCP/IP connections. The feature-rich visualization tool supports many graphical user interactions, including overview-and-drill-down by allowing users to interactively specify subsets of data for further visualization. Software design allows easy integration of new data visualization techniques into the tool.

4. Four methods were proposed to build connected neighborhood graphs for data embedding. A locally isometric embedding method LMDS is proposed. Incremental methods have been developed for projection of large data sets and data streams.

The following summarizes our ongoing work. These are what we expect to accomplish during the No-Cost Extension period:

1. Better GUI and query interface design; documentation for end users and developers;

2. Conducting user study;

3. Performance experiments on large data sets up to a few terabytes;

4. Data clustering and other data mining algorithms using multi-resolution data aggregation as data input;

5. Ongoing data embedding research.

Visual exploration of large relational data poses fundamental challenges to both data visualization and database management systems. A major finding of this project is the density-based methodology for interactively exploring large relational data sets. It uses multi-resolution data aggregation as a common representation of data between relational databases and visualization tools. Data aggregated at multiple resolutions are stored in internal nodes of a partition-based high-dimensional tree index. Such a piggyback ride of aggregated data supports the overview-and-drill-down data access pattern for interactive data exploration. It has built-in support for visual interaction and data scalability. Existing visualization techniques are extended to support this data representation.

In addition, the proposed multi-resolution data representation has potential applications to accelerate data aggregation queries and OLAP queries. It can be used as data input for efficient mining of large data sets. It also provides support for privacy preservation where permissions can be granted to users based on data resolutions. New techniques and algorithms in these areas are parts of our ongoing research.
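To make the point about aggregation queries concrete, the sketch below answers a range COUNT query from stored aggregates: when a node's bounding box lies entirely inside the query range, its stored count is used directly and its subtree is never visited. As before, a small in-memory binary k-d tree stands in for the actual index, and the names `build` and `range_count` and the `stats` counter are assumptions for this example.

```python
def build(points, depth=0, leaf_size=2):
    """Median-split k-d tree; each node stores its bounding box and record count."""
    dims = len(points[0])
    node = {
        "lo": tuple(min(p[d] for p in points) for d in range(dims)),
        "hi": tuple(max(p[d] for p in points) for d in range(dims)),
        "count": len(points),
        "children": [],
        "points": [],
    }
    if len(points) <= leaf_size:
        node["points"] = list(points)
        return node
    axis = depth % dims
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    node["children"] = [build(ordered[:mid], depth + 1, leaf_size),
                        build(ordered[mid:], depth + 1, leaf_size)]
    return node

def range_count(node, qlo, qhi, stats):
    """Count records inside the box [qlo, qhi], reusing stored counts where possible."""
    stats["visited"] += 1
    dims = range(len(qlo))
    if any(node["hi"][d] < qlo[d] or node["lo"][d] > qhi[d] for d in dims):
        return 0  # node is disjoint from the query range
    if all(qlo[d] <= node["lo"][d] and node["hi"][d] <= qhi[d] for d in dims):
        return node["count"]  # fully contained: answer from the stored aggregate
    if not node["children"]:
        # Partially overlapping leaf: fall back to scanning its records.
        return sum(all(qlo[d] <= p[d] <= qhi[d] for d in dims) for p in node["points"])
    return sum(range_count(c, qlo, qhi, stats) for c in node["children"])

# Hypothetical data: a 4 x 4 grid of records.
points = [(float(x), float(y)) for x in range(4) for y in range(4)]
root = build(points)
stats = {"visited": 0}
left_half = range_count(root, (0.0, 0.0), (1.5, 3.5), stats)
```

For this query only the root and its two children are inspected. The same pruning applies to SUM or AVG if the corresponding aggregates are stored alongside the counts.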

We have developed a set of algorithms and methods for nonlinear data embedding and dimensionality reduction. This research calls for new ideas from differential geometry and may have fundamental impacts on multivariate data mining and data processing. This is an important part of our ongoing work.

Training and Development:

Three Ph.D. students (Mustafa Sanver, Dongfang Zhao, and Danyang Hua) are supported by this grant. M.S. students doing thesis work have also greatly benefited from the research supported by this grant.

Parts of this research are used in two courses (CS6430 - Advanced DBMS and CS6030 - Knowledge Discovery and Data Mining) taught at the Department of Computer Science, Western Michigan University (2005 - 2007).

Outreach Activities:

We have established collaborations with the Department of Business Information Systems, College of Business, and the Department of Educational Leadership, Research and Technology, College of Education. Through these collaborations, we expect to gain access to real-world data sets and applications and to conduct user studies with students from diverse backgrounds.

Journal Publications

Li Yang, "Distance-preserving projection of high dimensional data for nonlinear dimensionality reduction", IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 1243, vol. 26, (2004). Published, 10.1109/TPAMI.2004.66

Li Yang, "Pruning and visualizing generalized association rules in parallel coordinates", IEEE Transactions on Knowledge and Data Engineering, p. 60, vol. 17, (2005). Published, 10.1109/TKDE.2005.14

Li Yang, "Building k-edge-connected neighborhood graphs for distance-based data projection", Pattern Recognition Letters, p. 2, vol. 26, (2005). Published, 10.1016/j.patrec.2005.03.021

Li Yang, "Building k edge-disjoint spanning trees of minimum total length for isometric data embedding", IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 16, vol. 27, (2005). Published, 10.1109/TPAMI.2005.192

Li Yang, "Data embedding techniques and applications", Proceedings of the 2nd International Workshop on Computer Vision meets Databases (CVDB'2005), Baltimore, MD, June 2005, p. 29, vol. , (2005). Published, 10.1145/1160939.1160948

Li Yang, "Building connected neighborhood graphs for isometric data embedding", Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'2005), Chicago, IL, August 2005, p. 722, vol. , (2005). Published, 10.1145/1081870.1081963

Li Yang, "Building k-connected neighborhood graphs for isometric data embedding", IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 827, vol. 28, (2006). Published, 10.1109/TPAMI.2006.89

Li Yang, "Alignment of overlapping locally scaled patches for multidimensional scaling and dimensionality reduction", IEEE Transactions on Pattern Analysis and Machine Intelligence, p. , vol. , (2007). Accepted, 10.1109/TPAMI.2007.70706

Li Yang, "k-edge connected neighborhood graph for geodesic distance estimation and nonlinear data projection", Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Cambridge, UK, August 2004, p. 196, vol. 1, (2004). Published, 10.1109/ICPR.2004.1334057

Li Yang, "Sammon's nonlinear mapping using geodesic distances", Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Cambridge, UK, August 2004, p. 303, vol. 2, (2004). Published, 10.1109/ICPR.2004.1334180

Dongfang Zhao, Li Yang, "Incremental construction of neighborhood graphs for nonlinear dimensionality reduction", Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, China, August 2006, p. 177, vol. 3, (2006). Published, 10.1109/ICPR.2006.707

Li Yang, "Building connected neighborhood graphs for locally linear embedding", Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, China, August 2006, p. 194, vol. 4, (2006). Published, 10.1109/ICPR.2006.345

Li Yang, "Locally multidimensional scaling for nonlinear dimensionality reduction", Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, China, August 2006, p. 202, vol. 4, (2006). Published, 10.1109/ICPR.2006.774

Dongfang Zhao, Li Yang, "Incremental isometric embedding of high dimensional data using connected neighborhood graphs", IEEE Transactions on Pattern Analysis and Machine Intelligence, p. , vol. , (2007). Submitted,

Li Yang, Mustafa Sanver, "Multiresolution data aggregation for visual exploration of large relational data", IEEE Transactions on Visualization and Computer Graphics, p. , vol. , (2007). Submitted,

Books or Other One-time Publications

Li Yang, "Data projection techniques and their application in sensor array data processing", (2005). Book chapter, Published

Editor(s): Mehmed Kantardzic, Jozef Zurada

Collection: Next Generation of Data Mining Applications

Bibliography: pages 57-77, Wiley-IEEE Press

Li Yang, Tosiyasu L. Kunii, "Visual database", (2007). Book chapter, Submitted

Editor(s): Benjamin Wah, Jeffrey Tsai

Collection: Wiley Encyclopedia of Computer Science and Engineering

Bibliography: John Wiley & Sons Inc

Web/Internet Site URL(s):

http://www.cs.wmich.edu/~yang

Description:

A dedicated web site will be set up once we finish developing the first release of the software tool.

Other Specific Products

Contributions

Contributions within Discipline:

We have devised multi-resolution data aggregation and have used a high-dimensional partition-based tree index to piggyback the data aggregated at multiple resolutions as an intermediate representation of large relational data for interactive visualization.

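Two visualization techniques, footprint splatting with grand tour and parallel coordinates, have been extended to visualize multidimensional aggregated data.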

A client/server visualization tool is developed to demonstrate the feasibility and effectiveness of this approach. We have developed an approach to visualize generalized association rules in parallel coordinates.

In data embedding, a set of algorithms and methods are developed for building connected neighborhood graphs and for locally isometric data embedding. Incremental methods are developed to project large data sets and data streams.

Contributions to Other Disciplines:

The proposed multi-resolution data representation has potential applications in: (1) optimization of traditional database queries such as data aggregation queries and OLAP queries; (2) efficient mining of large data sets; (3) privacy-preserving data mining.

Contributions to Human Resource Development:

Three Ph.D. students (Mustafa Sanver, Dongfang Zhao, and Danyang Hua) are supported by this grant. The PI and the students have gained great research experience in working on this project. M.S. students doing thesis work have also greatly benefited from the research supported by this grant.

Contributions to Resources for Research and Education:

Parts of this research are used in two courses (CS6430 - Advanced DBMS and CS6030 - Knowledge Discovery and Data Mining) taught at the Department of Computer Science, Western Michigan University (2005 - 2007). Students in these courses have benefited from the results of this research.

Contributions Beyond Science and Engineering:

Special Requirements

Special reporting requirements: None

Change in Objectives or Scope: None

Unobligated funds: less than 20 percent of current funds

Animal, Human Subjects, Biohazards: None

Categories for which nothing is reported: Organizational Partners

Any Product
