Mining ontology and structural knowledge - Data Mining Patterns New Methods And Applications Pa

A growing number of studies use search engines as a data source, and many natural language processing systems now rely on them. For example, Keller, Lapata, and Ourioupina (2002) use the Web to obtain frequencies for bigrams that are unseen in a given corpus. They obtain counts for adjective-noun, noun-noun, and verb-object bigrams by querying a search engine, and demonstrate that Web frequencies (Web counts) correlate with frequencies from a carefully edited corpus such as the British National Corpus (BNC). Aside from counting bigrams, various tasks can be addressed with Web-based models: spelling correction, adjective ordering, compound noun bracketing, countability detection, and so on (Lapata & Keller, 2004). For some of these tasks, simple unsupervised models perform better when n-gram frequencies are obtained from the Web rather than from a standard large corpus; that is, the Web yields better counts than the BNC.
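The Web-count idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of Keller et al.: the function names are hypothetical, a small dictionary stands in for a real search engine, every count is invented, and correlating log-transformed counts is a choice made here for the sketch.

```python
import math

# Toy stand-in for a search engine: exact-phrase query -> hit count.
# All numbers are invented for illustration; they are NOT real Web counts.
TOY_WEB_INDEX = {
    '"strong tea"': 12000,
    '"powerful tea"': 150,
    '"strong computer"': 900,
    '"powerful computer"': 45000,
}

# Toy corpus frequencies for the same bigrams (also invented).
TOY_CORPUS_COUNTS = {
    ("strong", "tea"): 30,
    ("powerful", "tea"): 1,
    ("strong", "computer"): 3,
    ("powerful", "computer"): 80,
}


def phrase_query(bigram):
    """Build an exact-phrase query string from a two-word tuple."""
    return '"' + " ".join(bigram) + '"'


def web_count(bigram, index=TOY_WEB_INDEX):
    """Return the hit count for a bigram, or 0 if the phrase is unseen."""
    return index.get(phrase_query(bigram), 0)


def pearson(xs, ys):
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def log_count_correlation(bigrams, corpus_counts, index=TOY_WEB_INDEX):
    """Correlate log(1 + Web count) with log(1 + corpus count)."""
    web = [math.log1p(web_count(b, index)) for b in bigrams]
    corpus = [math.log1p(corpus_counts.get(b, 0)) for b in bigrams]
    return pearson(web, corpus)


if __name__ == "__main__":
    bigrams = list(TOY_CORPUS_COUNTS)
    print(log_count_correlation(bigrams, TOY_CORPUS_COUNTS))
```

In a real setting, `web_count` would call a search engine instead of reading a dictionary, and the hit counts it returns would be estimates rather than exact frequencies.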

Some studies have used a search engine to extract relational knowledge among entities, thereby capturing the ontology of a target domain. For example, the relation between a book and an author can be extracted by submitting a query that contains the names of the book and the (possible) author to a search engine, analyzing the returned text, and determining whether the relation is recognizable. In addition, patterns that describe an entity and its class can be identified through a search engine. The best-known patterns are the Hearst patterns, which include “A such as B” and “B is a (kind of) A”: we can infer that A is a class of B if many mentions matching these patterns exist. Although this approach is heuristic-based, an important line of study would be to obtain such patterns using supervised or unsupervised learning. Which patterns describe a specific kind of relation, and how to obtain such patterns, are important issues.
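The two Hearst patterns quoted above can be sketched as regular expressions applied to text snippets such as those a search engine might return. This is a minimal sketch under stated assumptions: the function names, the restriction to single-word terms, the absence of any plural normalization, and the `min_mentions` threshold are all hypothetical choices made here, and the snippets are invented.

```python
import re
from collections import Counter

# "A such as B": A is the class, B is the instance (single words only).
SUCH_AS = re.compile(r"\b(\w+) such as (\w+)\b", re.IGNORECASE)
# "B is a (kind of) A": B is the instance, A is the class.
IS_A = re.compile(r"\b(\w+) is an? (?:kind of )?(\w+)\b", re.IGNORECASE)


def extract_isa_pairs(snippet):
    """Return (instance, class) pairs matched in one text snippet."""
    pairs = []
    for cls, inst in SUCH_AS.findall(snippet):
        pairs.append((inst.lower(), cls.lower()))
    for inst, cls in IS_A.findall(snippet):
        pairs.append((inst.lower(), cls.lower()))
    return pairs


def infer_classes(snippets, min_mentions=2):
    """Keep pairs that occur in at least min_mentions pattern matches."""
    counts = Counter()
    for snippet in snippets:
        counts.update(extract_isa_pairs(snippet))
    return {pair: n for pair, n in counts.items() if n >= min_mentions}


if __name__ == "__main__":
    # Invented snippets standing in for search results.
    snippets = [
        "We visited a city such as Tokyo last year.",
        "Tokyo is a city in Japan.",
        "Kyoto is a kind of city.",
    ]
    print(infer_classes(snippets))  # {('tokyo', 'city'): 2}
```

Patterns this loose also match unrelated sentences (for example, “This is a test”), which is one reason the text treats the approach as heuristic and requires many mentions before inferring a class.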

Recognizing relations among entities is a necessary ingredient for advanced Web systems, including question answering, trend detection, and Web search. In the future, more studies will use search engines to obtain structural knowledge from the Web. A search engine can be regarded as a database interface through which a machine accesses the huge amount of global information on social and linguistic activities.

Conclusion

This chapter describes a social network mining approach using the Web. Several studies have addressed similar approaches, and we organize their methods into short pieces of pseudocode. POLYPHONET, which was implemented using several algorithms described in this chapter, was put into service at JSAI conferences over three years and at the UbiComp conference. We also discuss important issues including entity recognition, social network analysis, and applications. Lastly, future trends toward general-purpose social network extraction and structural knowledge extraction are described.

Merging the vast amount of information on the Web and producing higher-level information might foster many knowledge-based systems of the future. Acquiring knowledge through Googling (Cimiano, 2004) is an early work on this concept. Over the last few years, an increasing number of studies have used search engines for this purpose. More studies in the future will use search engines as database interfaces that give machines and humans access to the world’s information.

References

Adamic, L., Buyukkokten, O., & Adar, E. (2003). Social network caught in the web. First Monday, 8(6).

Aleman-Meza, B., Nagarajan, M., Ramakrishnan, C., Sheth, A., Arpinar, I., Ding, L., Kolari, P., Joshi, A., & Finin, T. (2006). Semantic analytics on social networks: Experiences in addressing the problem of conflict of interest detection. In Proceedings of the WWW2006.

Anagnostopoulos, A., Broder, A. Z., & Carmel, D. (2005). Sampling search-engine results. In Proceedings of WWW 2005.

Bekkerman, R., & McCallum, A. (2005). Disambiguating web appearances of people in a social network. In Proceedings of WWW 2005.

Bollegala, D., Matsuo, Y., & Ishizuka, M. (2006). Extracting key phrases to disambiguate personal names on the Web. In Proceedings of ECAI 2006.

Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. In Proceedings of 7th WWW Conf.

Cafarella, M., & Etzioni, O. (2005). A search engine for natural language applications. In Proc. WWW2005.

Calishain, T., & Dornfest, R. (2003). Google hacks: 100 industrial-strength tips & tools. O’Reilly.

Chakrabarti, S. (2002). Mining the web: Discovering knowledge from hypertext data. Morgan Kaufmann.

Cimiano, P., Handschuh, S., & Staab, S. (2004). Towards the self-annotating web. In Proceedings of WWW2004.

Cimiano, P., Ladwig, G., & Staab, S. (2005). Gimme’ the context: Context-driven automatic Semantic annotation with C-PANKOW. In Proceedings of WWW 2005.

Cimiano, P., & Staab, S. (2004). Learning by googling. SIGKDD Explorations, 6(2), 24–33.

Culotta, A., Bekkerman, R., & McCallum, A. (2004). Extracting social networks and contact information from email and the web. In CEAS-1.

Davis, I., & Vitiello, E., Jr. Relationship: A vocabulary for describing relationships between people. http://vocab.org/relationship/

Faloutsos, C., McCurley, K. S., & Tomkins, A. (2004). Fast discovery of connection subgraphs. In Proceedings of the ACM SIGKDD 2004.

Finin, T., Ding, L., & Zou, L. (2005). Social networking on the Semantic Web. The Learning Organization, 12(5), 418–435.

Freeman, L. C. (1979). Centrality in social networks: Conceptual clarification. Social Networks, 1, 215–239.

Gandon, F., Corby, O., Giboin, A., Gronnier, N., & Guigard, C. (2005). Graph-based inferences in a Semantic web server for the cartography of competencies in a telecom valley. In Proceedings of ISWC05.

Getoor, L., Friedman, N., Koller, D., & Taskar, B. (2003). Learning probabilistic models of link structure. Journal of Machine Learning Research, 3, 679–707.

Goecks, J., & Mynatt, E. D. (2004). Leveraging social networks for information sharing. In Proceedings of ACM CSCW 2004.

Golbeck, J., & Hendler, J. (2004a). Accuracy of metrics for inferring trust and reputation in Semantic web-based social networks. In Proceedings of EKAW 2004.

Golbeck, J., & Hendler, J. (2004b). Inferring reputation on the Semantic web. In Proceedings

Golbeck, J., & Hendler, J. (2005). Inferring trust relationships in web-based social networks. ACM Transactions on Internet Technology, 7(1).

Golbeck, J., & Parsia, B. (2006). Trust network-based filtering of aggregated claims. International Journal of Metadata, Semantics and Ontologies.

Guha, R., & Garg, A. (2004). Disambiguating entities in Web search. TAP project, http://tap.stanford.edu/PeopleSearch.pdf

Hanneman, R., & Riddle, M. (2005). Introduction to social network methods. University of California, Riverside.

Harada, M., Sato, S., & Kazama, K. (2004). Finding authoritative people from the Web. In Proceedings of the Joint Conference on Digital Libraries (JCDL2004).

Haveliwala, T. (2002). Topic-sensitive PageRank. In Proceedings of WWW2002.

Jin, Y., Matsuo, Y., & Ishizuka, M. (2006). Extracting a social network among entities by web mining. In Proceedings of the ISWC ’06 Workshop on Web Content Mining with Human Language Techniques.

Kamvar, S., Schlosser, M., & Garcia-Molina, H. (2003). The eigentrust algorithm for reputation management in P2P networks. In Proceedings of WWW2003.

Kautz, H., Selman, B., & Shah, M. (1997). The hidden web. AI Magazine, 18(2), 27–35.

Keller, F., Lapata, M., & Ourioupina, O. (2002). Using the web to overcome data sparseness. In Proceedings of the EMNLP 2002.

Knees, P., Pampalk, E., & Widmer, G. (2004). Artist classification with web-based data. In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR).

Kumar, R., Raghavan, P., Rajagopalan, S., & Tomkins, A. (2002). The web and social networks. IEEE Computer, 35(11), 32–36.

Lapata, M., & Keller, F. (2004). The web as a baseline: Evaluating the performance of unsupervised web-based models for a range of NLP tasks. In Proceedings of the HLT-NAACL 2004.

Leskovec, J., Adamic, L. A., & Huberman, B. A. (2005). The dynamics of viral marketing. http://www.hpl.hp.com/research/idl/papers/viral/viral.pdf

Li, X., Morie, P., & Roth, D. (2005). Semantic integration in text: From ambiguous names to identifiable entities. AI Magazine Spring, pp. 45–68.

Lloyd, L., Bhagwan, V., Gruhl, D., & Tomkins, A. (2005). Disambiguation of references to individuals (Tech. Rep. No. RJ10364(A0410-011)). IBM Research.

Malin, B. (2005). Unsupervised name disambiguation via social network similarity. Workshop Notes on Link Analysis, Counterterrorism, and Security.

Mann, G. S., & Yarowsky, D. (2003). Unsupervised personal name disambiguation. In Proceedings of the CoNLL.

Manning, C., & Schutze, H. (2002). Foundations of statistical natural language processing. London: The MIT Press.

Massa, P., & Avesani, P. (2005). Controversial users demand local trust metrics: An experimental study on epinions.com community. In Proceedings of the AAAI-05.

Matsuo, Y., Hamasaki, M., Takeda, H., Mori, J., Bollegala, D., Nakamura, Y., Nishimura, T., Hasida, K., & Ishizuka, M. (2006a). Spinning multiple social networks for Semantic web. In

Matsuo, Y., Mori, J., Hamasaki, M., Takeda, H., Nishimura, T., Hasida, K., & Ishizuka, M. (2006b). POLYPHONET: An advanced social network extraction system. In Proceedings of the WWW 2006.

Matsuo, Y., Tomobe, H., Hasida, K., & Ishizuka, M. (2003). Mining social network of conference participants from the web. In Proceedings of the Web Intelligence 2003.

Matsuo, Y., Tomobe, H., Hasida, K., & Ishizuka, M. (2004a). Finding social network for trust calculation. In Proc. ECAI2004, pages 510–514.

Matsuo, Y., Tomobe, H., Hasida, K., & Ishizuka, M. (2004b). Finding social network for trust calculation. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI2004).

Matsuo, Y., Tomobe, H., Hasida, K., & Ishizuka, M. (2005). Social network extraction from the web information. Journal of the Japanese Society for Artificial Intelligence, 20(1E), 46–56.

Mika, P. (2005a). Flink: Semantic web technology for the extraction and analysis of social networks. Journal of Web Semantics, 3(2).

Mika, P. (2005b). Ontologies are us: A unified model of social networks and Semantics. In Proceedings of the ISWC2005.

Miki, T., Nomura, S., & Ishida, T. (2005). Semantic web link analysis to discover social relationship in academic communities. In Proceedings of the SAINT 2005.

Milgram, S. (1967). The small-world problem. Psychology Today, 2, 60–67.

Mori, J., Ishizuka, M., Sugiyama, T., & Matsuo, Y. (2005a). Real-world oriented information sharing using social networks. In Proceedings of the ACM GROUP ’05.

Mori, J., Matsuo, Y., & Ishizuka, M. (2005b). Finding user Semantics on the web using word co-occurrence information. In Proceedings of the International Workshop on Personalization on the Semantic Web (PersWeb ’05).

Nakagawa, H., Maeda, A., & Kojima, H. Automatic term recognition system termextract. http://gensen.dl.itc.utokyo.ac.jp/gensenweb_eng.html

Nigam, K., McCallum, A., Thrun, S., & Mitchell, T. (2000). Text classification from labeled and unlabeled documents using EM. Machine Learning, 39, 103–134.

Palla, G., Derenyi, I., Farkas, I., & Vicsek, T. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435, 814.

Pentland, A. S. (2005). Socially aware computation and communication. IEEE Computer.

Quinlan, J. R. (1993). C4.5: Programs for machine learning. CA: Morgan Kaufmann.

Richardson, M., Agrawal, R., & Domingos, P. (2003). Trust management for the Semantic Web. In Proceedings of the ISWC2003.

Scott, J. (2000). Social network analysis: A handbook (2nd ed.). SAGE Publications.

Staab, S., Domingos, P., Mika, P., Golbeck, J., Ding, L., Finin, T., Joshi, A., Nowak, A., & Vallacher, R. (2005). Social networks applied. IEEE Intelligent Systems, 80–93.

Tyler, J., Wilkinson, D., & Huberman, B. (2003). Email as spectroscopy: Automated discovery of community structure within organizations. Kluwer, B.V.

Wacholder, N., Ravin, Y., & Choi, M. (1997). Disambiguation of proper names in text. In Proceedings of the 5th Applied Natural Language Processing Conference.

Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge: Cambridge University Press.

Watts, D., & Strogatz, S. (1998). Collective dynamics of small-world networks. Nature, 393, 440–442.

Endnotes

1 http://www.friendster.com/

2 http://www.orkut.com/

3 http://www.imeem.com/

4 http://360.yahoo.com/

5 http://flink.Semanticweb.org/. The system won the first prize at the Semantic Web Challenge in ISWC2004.

6 As of October, 2005 by Google search engine. The hit count is that obtained after omission of similar pages by Google.

7 Using the disaster mitigation research community in Japan.

8 We use an entity as a broader term of a person.

9 http://www.google.com/

10 As of 2004.


In document Data Mining Patterns New Methods And Applications Pascal Poncelet (2008) pdf (Page 188-193)