
We have presented a new subspace clustering algorithm that finds weighted dense maximal 1-complete regions in high-dimensional datasets. Our algorithm is very memory efficient, since it does not need to keep all the clusters found so far in memory. Unlike other density mining algorithms, which tend to find patterns only in dense subspaces while ignoring patterns in less dense subspaces, our algorithm finds clusters in subspaces of all densities. Our experiments showed that our algorithm is more efficient than CLOSET+ in terms of both time and memory consumption.


Mining Linguistic Trends from Time Series

Chun-Hao Chen1, Tzung-Pei Hong2, and Vincent S. Tseng3

1 Department of Computer Science and Information Engineering, National Cheng-Kung University, Tainan, Taiwan, ROC [email protected]

2 Department of Electrical Engineering, National University of Kaohsiung [email protected]

3 Department of Computer Science and Information Engineering, National Cheng-Kung University, Tainan, Taiwan, ROC [email protected]

Summary. In this chapter, we propose a mining algorithm based on the angles of adjacent points in a time series to find linguistic trends. The proposed approach first transforms data values into angles, and then uses a sliding window to generate continuous subsequences from the angular series. Several fuzzy sets for angles are predefined to represent semantic concepts understandable to humans. An Apriori-like fuzzy mining algorithm is then used to generate linguistic trends. Appropriate post-processing is also performed to remove redundant patterns. Finally, experiments are conducted for different parameter settings, and the results show that the proposed algorithm is workable.

1 Introduction

Time-series data are common in daily life. They consist of values recorded at successive time intervals. For example, stock prices evolving over a period of time form a time series. Many datasets in fields such as telecommunication, bioinformatics, and medical treatment are also time-series data.

C.-H. Chen et al.: Mining Linguistic Trends from Time Series, Studies in Computational Intelligence (SCI) 118, 49–60 (2008)

Finding useful patterns in time-series data has recently become an important issue for researchers in the data-mining field. Indyk et al. focused on the problem of identifying representative trends, such as relaxed periods and average trends, over a period of observations in time series [8]. They first generated a template set of sketches by using polynomial convolution, where each sketch is a low-dimensional vector. The sketches were then used to replace each interval to find representative trends. Patel et al. proposed a method based on Euclidean distance to find k-motifs, which are frequently occurring patterns in time series [11]. They first normalized the time-series data, and then used piecewise aggregate approximation to reduce the data dimension [9, 15]. After the data dimension was reduced, they further transformed the data into a discrete representation and mined k-motifs from the transformed time series. Agrawal et al. proposed an algorithm to capture shapes in a historical time-series database by using a simple translation [2]. They first transformed the difference between every two adjacent data points into a predefined category, such as increase, steep increase, steep decrease, decrease, no-change, and zero. The same difference may be labeled with more than one category; in other words, the intervals of these categories overlap slightly. The transformed symbolic series were then used for querying desired results.
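The normalization and piecewise aggregation steps mentioned above for k-motif mining can be illustrated with a minimal sketch. It is not the implementation used in the cited work: the function names `z_normalize` and `paa` are hypothetical, z-normalization is assumed as the normalization scheme, and the series length is assumed to be divisible by the number of segments.

```python
import math

def z_normalize(series):
    """Rescale a series to zero mean and unit standard deviation (assumed scheme)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # A constant series carries no shape information.
        return [0.0] * n
    return [(x - mean) / std for x in series]

def paa(series, segments):
    """Reduce dimension by replacing each equal-width segment with its mean."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("series length must be a positive multiple of segments")
    width = n // segments
    return [sum(series[i * width:(i + 1) * width]) / width
            for i in range(segments)]

reduced = paa(z_normalize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]), 3)
```

Each value in `reduced` summarizes two original points, so a series of length six is represented by three values before any discretization step is applied.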

Most of the above approaches, however, require predefined crisp intervals for each category. Defining these intervals requires domain knowledge and depends on the application. Udechukwu et al. thus proposed a domain-independent trend-encoding method to mine frequent trends [13]. They transformed the difference between two adjacent data points into an angle, instead of using the difference value itself. The angles lay within the range −90° to 90°, and were partitioned into 52 predefined angular categories, represented by letters. They then used suffix trees to find the maximally repeated patterns as frequent trends. In this way, the effect of domain knowledge could be reduced. Their approach, however, used so many angular categories that users might find it hard to understand the meaning of the patterns.
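The idea of encoding the difference between adjacent points as an angle can be sketched as follows. This is only an illustration under stated assumptions: the function name `to_angles` and the `time_step` parameter are hypothetical, the angle is taken as the arctangent of the slope between adjacent points, and any rescaling of the value axis used in the cited method is not reproduced here.

```python
import math

def to_angles(series, time_step=1.0):
    """Convert adjacent-point differences into angles in degrees.

    Assumption: the angle is atan(difference / time_step), which always
    lies strictly between -90 and 90 degrees.
    """
    return [math.degrees(math.atan((b - a) / time_step))
            for a, b in zip(series, series[1:])]

angles = to_angles([0.0, 1.0, 1.0, 0.0])
```

A series of n points yields n − 1 angles. A rise gives a positive angle, a fall gives a negative one, and no change gives zero.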

As to fuzzy data mining, Hong et al. proposed several fuzzy mining algorithms to mine linguistic association rules from quantitative data [6, 7, 10]. They transformed each quantitative item into a fuzzy set and used fuzzy operations to find fuzzy rules. Their approaches, however, focused on transaction data. For time-series data, Song et al. proposed a fuzzy stochastic time series and built a model by assuming the values are fuzzy sets [12]. Chen et al. proposed a two-factor time-variant fuzzy time-series model to deal with forecasting problems [4]. Au and Chan proposed a fuzzy mining approach to find fuzzy rules for classifying time series [1]. Watanabe exploited the Takagi–Sugeno model to build a time-series model [14].

In this chapter, we thus propose a mining algorithm based on the angles of adjacent points in a time series to find linguistic trends. Several fuzzy sets for angles are predefined to represent semantic concepts understandable to humans. An Apriori-like fuzzy mining algorithm is then used to generate linguistic trends. Appropriate post-processing is also performed to remove redundant patterns. Since the final results are represented by linguistic terms, they are friendlier to humans than quantitative representations.
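To make the notions of fuzzy sets for angles and fuzzy support concrete, the following sketch may help. Everything specific in it is an assumption rather than the chapter's definition: the three linguistic terms and their triangular membership functions in `ANGLE_TERMS` are hypothetical, the minimum operator is assumed for combining memberships within a window, and `fuzzy_support` is an illustrative helper, not the proposed algorithm.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 at or beyond left/right, 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical linguistic terms for angles in degrees (assumed, not from the chapter).
ANGLE_TERMS = {
    "down": (-180.0, -90.0, 0.0),
    "flat": (-90.0, 0.0, 90.0),
    "up": (0.0, 90.0, 180.0),
}

def membership(angle, term):
    """Degree to which an angle belongs to a linguistic term."""
    return triangular(angle, *ANGLE_TERMS[term])

def sliding_windows(values, size):
    """All contiguous subsequences of the given size."""
    return [values[i:i + size] for i in range(len(values) - size + 1)]

def fuzzy_support(angles, pattern):
    """Average, over all windows, of the minimum membership along the pattern."""
    windows = sliding_windows(angles, len(pattern))
    if not windows:
        return 0.0
    total = sum(min(membership(a, t) for a, t in zip(w, pattern))
                for w in windows)
    return total / len(windows)

support = fuzzy_support([45.0, 0.0, -45.0], ("flat", "flat"))
```

In an Apriori-like procedure, patterns whose fuzzy support falls below a user-specified threshold would be discarded before longer candidates are generated; the threshold itself is not specified here.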