A growing number of studies use search engines. In natural language processing research, many systems have begun using search engines. For example, Keller, Lapata, and Ourioupina (2002) use the Web to obtain frequencies for bigrams that are unseen in a given corpus. They obtain counts for adjective-noun, noun-noun, and verb-object bigrams by querying a search engine, and demonstrate that Web frequencies (Web counts) correlate with frequencies from a carefully edited corpus such as the British National Corpus (BNC). Aside from counting bigrams, various tasks are attainable using Web-based models: spelling correction, adjective ordering, compound noun bracketing, countability detection, and so on (Lapata & Keller, 2004). For some tasks, simple unsupervised models perform better when n-gram frequencies are obtained from the Web rather than from a standard large corpus: the Web yields better counts than the BNC.
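As a rough illustration of the comparison described above, the following Python sketch collects Web counts for a set of bigrams and correlates them with corpus counts on a log scale. It is a minimal sketch under stated assumptions, not the procedure of the cited studies: `hit_count` is a hypothetical caller-supplied function that returns the number of hits for a query string, the exact-phrase quoting is an assumed query format, and Pearson correlation over log-transformed counts is one plausible choice of measure.

```python
import math
from typing import Callable, Dict, Iterable, List, Tuple

Bigram = Tuple[str, str]


def web_counts(bigrams: Iterable[Bigram],
               hit_count: Callable[[str], int]) -> Dict[Bigram, int]:
    """Ask a search engine for the hit count of each bigram.

    `hit_count` is a hypothetical function supplied by the caller; the
    exact-phrase quoting below is an assumed query format.
    """
    return {bg: hit_count(f'"{bg[0]} {bg[1]}"') for bg in bigrams}


def pearson(xs: List[float], ys: List[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant counts")
    return cov / math.sqrt(var_x * var_y)


def log_correlation(web: Dict[Bigram, int],
                    corpus: Dict[Bigram, int]) -> float:
    """Correlate log10 Web counts with log10 corpus counts.

    Bigrams with a zero count on either side are skipped, because the
    logarithm is undefined for them.
    """
    shared = [bg for bg in corpus if corpus[bg] > 0 and web.get(bg, 0) > 0]
    if len(shared) < 2:
        raise ValueError("need at least two bigrams with nonzero counts")
    xs = [math.log10(web[bg]) for bg in shared]
    ys = [math.log10(corpus[bg]) for bg in shared]
    return pearson(xs, ys)
```

In use, `web_counts` would be called once with the bigram list and a real hit-count function, and `log_correlation` would then compare the result with counts taken from the reference corpus. A value close to 1 would indicate that the two sources rank the bigrams similarly.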
Some studies have used a search engine to extract relational knowledge among entities, thereby harnessing the ontology of a target domain. For example, the relation between a book and an author can be extracted by putting a query to a search engine using the names of the book and the (possible) author, analyzing the returned text, and determining whether the relation is recognizable. In addition, a pattern that describes an entity and its class is identifiable through a search engine. The best-known patterns are Hearst patterns, which include “A such as B” and “B is a (kind of) A”: we can infer that A is a class of B if many mentions exist in these patterns. Although this approach is heuristic-based, an important line of study would be to obtain such patterns using supervised or unsupervised learning. Which patterns describe a specific kind of relation, and how to obtain such patterns, are important issues.
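The class inference described above can be sketched as follows. This is a minimal illustration, assuming that text snippets have already been retrieved (for instance from search results). The function names, the naive "-s" plural handling, and the `min_mentions` threshold are assumptions introduced for this example and are not part of any cited method.

```python
import re
from typing import Iterable, List


def hearst_patterns(cls: str, inst: str) -> List[re.Pattern]:
    """Build the two patterns named in the text for class A and instance B.

    Only a naive "-s" plural of the class name is handled (an assumption
    made to keep the sketch short).
    """
    a, b = re.escape(cls), re.escape(inst)
    return [
        # "A such as B", e.g. "languages such as Python"
        re.compile(rf"\b{a}s? such as {b}\b", re.IGNORECASE),
        # "B is a (kind of) A", e.g. "Python is a kind of language"
        re.compile(rf"\b{b} is an? (?:kind of )?{a}\b", re.IGNORECASE),
    ]


def count_mentions(cls: str, inst: str, snippets: Iterable[str]) -> int:
    """Count pattern matches over all snippets (one per snippet and pattern)."""
    patterns = hearst_patterns(cls, inst)
    return sum(1 for s in snippets for p in patterns if p.search(s))


def is_class_of(cls: str, inst: str, snippets: Iterable[str],
                min_mentions: int = 2) -> bool:
    """Infer that `cls` is a class of `inst` if enough mentions are found.

    `min_mentions` is a hypothetical threshold standing in for the
    "many mentions" criterion in the text.
    """
    return count_mentions(cls, inst, snippets) >= min_mentions
```

Because the patterns are directional, calling `is_class_of` with the two arguments swapped tests the opposite hypothesis, which gives a simple check that the evidence supports only one direction of the class relation.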
Recognizing relations among entities is a necessary ingredient for advanced Web systems, including question answering, trend detection, and Web search. In the future, more studies will use search engines to obtain structural knowledge from the Web. A search engine can be regarded as a database interface that gives a machine access to the huge amount of global information on social and linguistic activities.
Conclusion
This chapter describes a social network mining approach using the Web. Several studies have addressed similar approaches. We organize those methods into small pieces of pseudocode. POLYPHONET, which was implemented using several algorithms described in this chapter, was put into service at JSAI conferences over three years and at the UbiComp conference. We also discuss important issues including entity recognition, social network analysis, and applications. Lastly, future trends toward general-purpose social network extraction and structural knowledge extraction are described.
Merging the vast amount of information on the Web and producing higher-level information might foster many knowledge-based systems of the future. Acquiring knowledge through Googling (Cimiano & Staab, 2004) is an early work on this concept. In the last few years, an increasing number of studies have used search engines for these purposes. More studies in the future will use search engines as database interfaces for machines and humans to the world’s information.
References
Adamic, L., Buyukkokten, O., & Adar, E. (2003). Social network caught in the web. First Monday, 8(6).
Aleman-Meza, B., Nagarajan, M., Ramakrishnan, C., Sheth, A., Arpinar, I., Ding, L., Kolari, P., Joshi, A., & Finin, T. (2006). Semantic analytics on social networks: Experiences in addressing the problem of conflict of interest detection. In Proceedings of WWW 2006.
Anagnostopoulos, A., Broder, A. Z., & Carmel, D. (2005). Sampling search-engine results. In Proceedings of WWW 2005.
Bekkerman, R., & McCallum, A. (2005). Disambiguating web appearances of people in a social network. In Proceedings of WWW 2005.
Bollegala, D., Matsuo, Y., & Ishizuka, M. (2006). Extracting key phrases to disambiguate personal names on the Web. In Proceedings of ECAI 2006.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. In Proceedings of the 7th WWW Conference.
Cafarella, M., & Etzioni, O. (2005). A search engine for natural language applications. In Proceedings of WWW 2005.
Calishain, T., & Dornfest, R. (2003). Google hacks: 100 industrial-strength tips & tools. O’Reilly.
Chakrabarti, S. (2002). Mining the web: Discovering knowledge from hypertext data. Morgan Kaufmann.
Cimiano, P., Handschuh, S., & Staab, S. (2004). Towards the self-annotating web. In Proceedings of WWW 2004.
Cimiano, P., Ladwig, G., & Staab, S. (2005). Gimme’ the context: Context-driven automatic Semantic annotation with C-PANKOW. In Proceedings of WWW 2005.
Cimiano, P., & Staab, S. (2004). Learning by googling. SIGKDD Explorations, 6(2), 24–33.
Culotta, A., Bekkerman, R., & McCallum, A. (2004). Extracting social networks and contact information from email and the web. In CEAS-1.
Davis, I., & Vitiello, E., Jr. Relationship: A vocabulary for describing relationships between people. http://vocab.org/relationship/
Faloutsos, C., McCurley, K. S., & Tomkins, A. (2004). Fast discovery of connection subgraphs. In Proceedings of the ACM SIGKDD 2004.
Finin, T., Ding, L., & Zou, L. (2005). Social networking on the Semantic Web. The Learning Organization, 12(5), 418–435.
Freeman, L. C. (1979). Centrality in social networks: Conceptual clarification. Social Networks, 1, 215–239.
Gandon, F., Corby, O., Giboin, A., Gronnier, N., & Guigard, C. (2005). Graph-based inferences in a Semantic web server for the cartography of competencies in a telecom valley. In Proceedings of ISWC 2005.
Getoor, L., Friedman, N., Koller, D., & Taskar, B. (2003). Learning probabilistic models of link structure. Journal of Machine Learning Research, 3, 679–707.
Goecks, J., & Mynatt, E. D. (2004). Leveraging social networks for information sharing. In Proceedings of ACM CSCW 2004.
Golbeck, J., & Hendler, J. (2004a). Accuracy of metrics for inferring trust and reputation in Semantic web-based social networks. In Proceedings of EKAW 2004.
Golbeck, J., & Hendler, J. (2004b). Inferring reputation on the Semantic web. In Proceedings
Golbeck, J., & Hendler, J. (2005). Inferring trust relationships in web-based social networks. ACM Transactions on Internet Technology, 7(1).
Golbeck, J., & Parsia, B. (2006). Trust network-based filtering of aggregated claims. International Journal of Metadata, Semantics and Ontologies.
Guha, R., & Garg, A. (2004). Disambiguating entities in Web search. TAP project, http://tap.stanford.edu/PeopleSearch.pdf
Hanneman, R., & Riddle, M. (2005). Introduction to social network methods. University of California, Riverside.
Harada, M., Sato, S., & Kazama, K. (2004). Finding authoritative people from the Web. In Proceedings of the Joint Conference on Digital Libraries (JCDL 2004).
Haveliwala, T. (2002). Topic-sensitive PageRank. In Proceedings of WWW 2002.
Jin, Y., Matsuo, Y., & Ishizuka, M. (2006). Extracting a social network among entities by web mining. In Proceedings of the ISWC ’06 Workshop on Web Content Mining with Human Language Techniques.
Kamvar, S., Schlosser, M., & Garcia-Molina, H. (2003). The eigentrust algorithm for reputation management in P2P networks. In Proceedings of WWW 2003.
Kautz, H., Selman, B., & Shah, M. (1997). The hidden web. AI Magazine, 18(2), 27–35.
Keller, F., Lapata, M., & Ourioupina, O. (2002). Using the web to overcome data sparseness. In Proceedings of EMNLP 2002.
Knees, P., Pampalk, E., & Widmer, G. (2004). Artist classification with web-based data. In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR).
Kumar, R., Raghavan, P., Rajagopalan, S., & Tomkins, A. (2002). The web and social networks. IEEE Computer, 35(11), 32–36.
Lapata, M., & Keller, F. (2004). The web as a baseline: Evaluating the performance of unsupervised web-based models for a range of NLP tasks. In Proceedings of HLT-NAACL 2004.
Leskovec, J., Adamic, L. A., & Huberman, B. A. (2005). The dynamics of viral marketing. http://www.hpl.hp.com/research/idl/papers/viral/viral.pdf
Li, X., Morie, P., & Roth, D. (2005). Semantic integration in text: From ambiguous names to identifiable entities. AI Magazine, Spring, 45–68.
Lloyd, L., Bhagwan, V., Gruhl, D., & Tomkins, A. (2005). Disambiguation of references to individuals (Tech. Rep. No. RJ10364(A0410-011)). IBM Research.
Malin, B. (2005). Unsupervised name disambiguation via social network similarity. Workshop Notes on Link Analysis, Counterterrorism, and Security.
Mann, G. S., & Yarowsky, D. (2003). Unsupervised personal name disambiguation. In Proceedings of CoNLL.
Manning, C., & Schutze, H. (2002). Foundations of statistical natural language processing. London: The MIT Press.
Massa, P., & Avesani, P. (2005). Controversial users demand local trust metrics: An experimental study on epinions.com community. In Proceedings of AAAI-05.
Matsuo, Y., Hamasaki, M., Takeda, H., Mori, J., Bollegala, D., Nakamura, Y., Nishimura, T., Hasida, K., & Ishizuka, M. (2006a). Spinning multiple social networks for Semantic web. In
Matsuo, Y., Mori, J., Hamasaki, M., Takeda, H., Nishimura, T., Hasida, K., & Ishizuka, M. (2006b). POLYPHONET: An advanced social network extraction system. In Proceedings of WWW 2006.
Matsuo, Y., Tomobe, H., Hasida, K., & Ishizuka, M. (2003). Mining social network of conference participants from the web. In Proceedings of Web Intelligence 2003.
Matsuo, Y., Tomobe, H., Hasida, K., & Ishizuka, M. (2004a). Finding social network for trust calculation. In Proceedings of ECAI 2004, 510–514.
Matsuo, Y., Tomobe, H., Hasida, K., & Ishizuka, M. (2004b). Finding social network for trust calculation. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004).
Matsuo, Y., Tomobe, H., Hasida, K., & Ishizuka, M. (2005). Social network extraction from the web information. Journal of the Japanese Society for Artificial Intelligence, 20(1E), 46–56.
Mika, P. (2005a). Flink: Semantic web technology for the extraction and analysis of social networks. Journal of Web Semantics, 3(2).
Mika, P. (2005b). Ontologies are us: A unified model of social networks and Semantics. In Proceedings of ISWC 2005.
Miki, T., Nomura, S., & Ishida, T. (2005). Semantic web link analysis to discover social relationship in academic communities. In Proceedings of SAINT 2005.
Milgram, S. (1967). The small-world problem. Psychology Today, 2, 60–67.
Mori, J., Ishizuka, M., Sugiyama, T., & Matsuo, Y. (2005a). Real-world oriented information sharing using social networks. In Proceedings of ACM GROUP ’05.
Mori, J., Matsuo, Y., & Ishizuka, M. (2005b). Finding user Semantics on the web using word co-occurrence information. In Proceedings of the International Workshop on Personalization on the Semantic Web (PersWeb ’05).
Nakagawa, H., Maeda, A., & Kojima, H. Automatic term recognition system termextract. http://gensen.dl.itc.u-tokyo.ac.jp/gensenweb_eng.html
Nigam, K., McCallum, A., Thrun, S., & Mitchell, T. (2000). Text classification from labeled and unlabeled documents using EM. Machine Learning, 39, 103–134.
Palla, G., Derenyi, I., Farkas, I., & Vicsek, T. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435, 814.
Pentland, A. S. (2005). Socially aware computation and communication. IEEE Computer.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. CA: Morgan Kaufmann.
Richardson, M., Agrawal, R., & Domingos, P. (2003). Trust management for the Semantic Web. In Proceedings of ISWC 2003.
Scott, J. (2000). Social network analysis: A handbook (2nd ed.). SAGE publications.
Staab, S., Domingos, P., Mika, P., Golbeck, J., Ding, L., Finin, T., Joshi, A., Nowak, A., & Vallacher, R. (2005). Social networks applied. IEEE Intelligent Systems, 80–93.
Tyler, J., Wilkinson, D., & Huberman, B. (2003). Email as spectroscopy: Automated discovery of community structure within organizations. Kluwer, B.V.
Wacholder, N., Ravin, Y., & Choi, M. (1997). Disambiguation of proper names in text. In Proceedings of the 5th Applied Natural Language Processing Conference.
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge: Cambridge University Press.
Watts, D., & Strogatz, S. (1998). Collective dynamics of small-world networks. Nature, 393, 440–442.
Endnotes
1 http://www.friendster.com/
2 http://www.orkut.com/
3 http://www.imeem.com/
4 http://360.yahoo.com/
5 http://flink.Semanticweb.org/. The system won the first prize at the Semantic Web Challenge in ISWC2004.
6 As of October 2005, according to the Google search engine. The hit count is that obtained after omission of similar pages by Google.
7 Using the disaster mitigation research community in Japan.
8 We use “entity” as a broader term than “person.”
9 http://www.google.com/
10 As of 2004.