Future Work - Embedding Approaches for Relational Data

human generated big data in our relational learning models.

6.2 Future Work

There are many possible directions for future research. We briefly outline some interesting ones below.

Co-occurrence Data

The method proposed in Chapter 3 takes as input a set of association measurements between two groups of heterogeneous objects. It is shown to recover the original shape of synthetic data when the input association matrix faithfully reflects the similarities between the two sets of heterogeneous objects. In real-world applications, however, the available associations are usually very sparse, and their magnitudes say little about the relative strength of the associations. This issue is prevalent in many co-occurrence data sets. Taking a text corpus as an example, the most frequent words may carry little or no meaning on their own (e.g., "the", "great", "we"), yet they have high co-occurrence rates across the documents. Conversely, some words that are rare in a corpus may be strongly tied to the underlying semantics of the documents. Handling this mismatch between raw co-occurrence counts and semantic relevance is therefore an important research direction for modelling co-occurrence data, and resolving it within embedding-based approaches requires considerable thought and creativity.
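One widely used way to counter the dominance of frequent items, given here only as an illustration and not as the method of Chapter 3, is to reweight the raw counts with positive pointwise mutual information (PPMI) before they are fed to an embedding algorithm. PPMI compares each observed co-occurrence with what the row and column frequencies alone would predict, so a word such as "the" that appears everywhere receives a weight close to zero. The sketch below assumes a dense NumPy count matrix with documents as rows and words as columns; the function name `ppmi` and the toy counts are hypothetical.

```python
import numpy as np


def ppmi(counts: np.ndarray) -> np.ndarray:
    """Positive pointwise mutual information (PPMI) reweighting.

    counts[i, j] is the raw co-occurrence count of row object i (e.g. a
    document) with column object j (e.g. a word). Each cell becomes
    max(0, log(p(i, j) / (p(i) * p(j)))), so a pair that co-occurs only as
    often as its marginal frequencies predict gets a weight near zero.
    """
    counts = np.asarray(counts, dtype=float)
    p_ij = counts / counts.sum()               # joint probabilities
    p_i = p_ij.sum(axis=1, keepdims=True)      # row marginals, shape (R, 1)
    p_j = p_ij.sum(axis=0, keepdims=True)      # column marginals, shape (1, C)
    expected = p_i @ p_j                       # independence baseline, shape (R, C)

    pmi = np.zeros_like(p_ij)
    observed = counts > 0                      # log(0) is undefined; leave those cells at 0
    pmi[observed] = np.log(p_ij[observed] / expected[observed])
    return np.maximum(pmi, 0.0)                # keep only positive associations


# Toy document-word counts; the columns are hypothetically "the", "kernel", "topic".
counts = np.array([[10, 2, 0],
                   [10, 0, 3],
                   [10, 1, 1]])
weights = ppmi(counts)
print(weights.round(3))
```

In this toy example the "the" column, although it holds the largest raw counts, receives weights close to zero, whereas the rarer "kernel" in the first document receives a much larger weight. Whether such a fixed reweighting is adequate, or whether the correction should be learned inside the embedding model, is exactly the open question raised above.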

Co-Embeddings and Topic Modelling

Topic modelling [27] is an important research direction in machine learning, natural language processing and information retrieval. In developing co-embeddings, or joint embeddings, of documents and words, our hope is to identify different sets of neighbouring words as potentially overlapping "topic clusters" and, simultaneously, to place each document close to its topical keywords in the same embedding space. This idea is illustrated in Figure 3.5, and a formal and detailed application can be found in [63]. However, placing documents and words in the same space is often too restrictive and impractical for interpreting the documents' topical associations. In co-embedding models, it is hard to position a very long document that covers a large number of topics appropriately, as it needs to be placed simultaneously close to a disparate set of "topic clusters". In turn, the requirement that different topics be placed close to a common set of documents will inevitably position these topics poorly, so that they become intermingled rather than forming meaningful "topic clusters". One way to alleviate this issue is to partition each document into paragraphs, each of which embodies only one or a few topics; the input to the co-embedding algorithm would then be the associations between words and paragraphs. Alternatively, a number of different embedding spaces could be used to interpret the document-topic associations as well as the word-topic associations.
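The paragraph-splitting idea can be made concrete with a small preprocessing step that produces a paragraph-by-word association matrix in place of the usual document-by-word matrix. This is a minimal sketch under stated assumptions: paragraphs are separated by blank lines, tokens are lowercase alphabetic runs (a deliberately crude tokenizer), and `paragraph_word_matrix` is a hypothetical helper, not part of any co-embedding method described in this thesis.

```python
import re
from collections import Counter

import numpy as np


def paragraph_word_matrix(documents):
    """Build a paragraph-by-word count matrix from a list of raw documents.

    Paragraphs are assumed to be separated by blank lines. Returns
    (matrix, owner, vocab): matrix[p, w] counts word vocab[w] in paragraph p,
    and owner[p] is the index of the document that paragraph p came from.
    """
    paragraphs, owner = [], []
    for doc_id, text in enumerate(documents):
        for para in re.split(r"\n\s*\n", text):
            tokens = re.findall(r"[a-z]+", para.lower())  # crude tokenizer
            if tokens:                                    # skip empty paragraphs
                paragraphs.append(Counter(tokens))
                owner.append(doc_id)

    vocab = sorted(set().union(*paragraphs))
    column = {word: j for j, word in enumerate(vocab)}
    matrix = np.zeros((len(paragraphs), len(vocab)))
    for p, bag in enumerate(paragraphs):
        for word, n in bag.items():
            matrix[p, column[word]] = n
    return matrix, np.array(owner), vocab


# Usage: two hypothetical documents, the first with two paragraphs.
docs = ["Kernels and margins.\n\nTopics and words.", "Words words"]
matrix, owner, vocab = paragraph_word_matrix(docs)
# Summing the paragraph rows of each document recovers its document-word counts.
doc_counts = np.stack([matrix[owner == d].sum(axis=0) for d in range(len(docs))])
```

Keeping the `owner` index means no information is lost by the split: each paragraph can be co-embedded on its own, and a document can afterwards be associated with several topic clusters through its paragraphs rather than being forced into a single position.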

Graph Connectivity Patterns in Multi-dimensional Data Modelling

Having developed a novel embedding method for multi-dimensional data in Chapter 4, we note that it is also important to consider graph connectivity patterns for predicting links in multi-dimensional data. Such models have attracted less attention than embedding-based models because they require considerably more computation. Some theoretical work [160, 161] shows that connectivity-based approaches are often complementary to embedding approaches, as the two concentrate on different aspects of the dependency structure. Furthermore, connectivity-based models can be computationally efficient when the relevant patterns or rules can be explained by short paths in the graph alone. Combining the strengths of embedding-based and connectivity-based models is therefore a promising direction, and ongoing efforts [160, 162–164] continue to be devoted to it.
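To illustrate what combining the two families might look like, the sketch below adds a latent-feature score (a dot product of node embeddings) to a graph-feature score (damped counts of short walks, in the style of a truncated Katz index) and passes the sum through a logistic function. It assumes a single relation given as a dense adjacency matrix, which is simpler than the multi-dimensional setting of Chapter 4, and the weights `w_embed` and `w_path` are fixed by hand here although they would normally be learned. All function and parameter names are hypothetical, and the sketch is not the model of any of the cited works.

```python
import numpy as np


def truncated_katz(adj, max_len=3, beta=0.5):
    """Connectivity feature: sum over l = 2..max_len of beta**l * (A**l)[i, j].

    (A**l)[i, j] counts walks of length l from i to j. Length-1 walks are
    skipped so that an observed edge does not score itself.
    """
    adj = np.asarray(adj, dtype=float)
    power = adj.copy()
    score = np.zeros_like(adj)
    for length in range(2, max_len + 1):
        power = power @ adj            # power now holds A**length
        score += beta ** length * power
    return score


def hybrid_link_probability(emb, adj, w_embed=1.0, w_path=1.0, **katz_kwargs):
    """Hypothetical hybrid model: latent-feature term plus graph-feature term.

    emb: (n, k) node embeddings; adj: (n, n) observed adjacency matrix.
    The weights w_embed and w_path are fixed here but would normally be learned.
    """
    embed_term = emb @ emb.T                                  # dot-product score
    path_term = np.log1p(truncated_katz(adj, **katz_kwargs))  # damped path counts
    logits = w_embed * embed_term + w_path * path_term
    return 1.0 / (1.0 + np.exp(-logits))                      # link probability


# Usage: a path graph 0 - 1 - 2 with uninformative (zero) embeddings.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
P = hybrid_link_probability(np.zeros((3, 2)), A)
```

Each dense matrix product costs O(n^3) for n nodes, which reflects the computational burden of connectivity features mentioned in the text; keeping `max_len` small corresponds to explaining links by short paths only.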

Co-Embeddings for the Document Network

A document network comprises both a linkage network between documents and a term-document co-occurrence matrix. Current research [33, 137, 145] usually employs an LDA [27] model for word generation, with a regulariser based on the linkage structure. In a similar manner, we can handle the document network in a co-embedding setting, with the document-word Euclidean distances explaining the co-occurrence statistics and the document-document distances explaining the linkage structure. The two corresponding costs can simply be added, with a weighting parameter controlling their balance, to give the global cost function for parameter learning. An ideal mapping should be consistent with both the document-word associations and the document-document linkages in the data. Once computed, the co-embeddings could be used in various machine learning applications, e.g., clustering, classification and data visualisation.
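The global cost described above has the form J = J_dw + λ J_dd, where λ is the weight controlling parameter. The text does not fix the two terms, so the sketch below makes its own assumptions: p(w | d) is modelled as a softmax over negative squared Euclidean distances between document and word coordinates, scored by the count-weighted negative log-likelihood, and the probability of a link between documents i and j is a logistic function of a bias minus their squared distance, scored by binary cross-entropy. The names `document_network_cost`, `lam` and `bias` are hypothetical.

```python
import numpy as np


def sq_dists(a, b):
    """Pairwise squared Euclidean distances between rows of a and rows of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)


def log_softmax(z, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def document_network_cost(X, Y, counts, links, lam=1.0, bias=1.0):
    """Hypothetical global cost J = J_dw + lam * J_dd (needs at least 2 documents).

    X: (n_docs, k) document coordinates; Y: (n_words, k) word coordinates.
    counts: (n_docs, n_words) document-word co-occurrence counts.
    links: (n_docs, n_docs) symmetric 0/1 linkage matrix (diagonal ignored).
    """
    # Co-occurrence term: p(w | d) is a softmax over -||x_d - y_w||^2,
    # scored by the count-weighted negative log-likelihood.
    log_p_w_given_d = log_softmax(-sq_dists(X, Y), axis=1)
    j_dw = -(counts * log_p_w_given_d).sum() / counts.sum()

    # Linkage term: p(link i~j) = sigmoid(bias - ||x_i - x_j||^2),
    # scored by the mean binary cross-entropy over unordered pairs i < j.
    logits = bias - sq_dists(X, X)
    iu = np.triu_indices(len(X), k=1)
    z, t = logits[iu], links[iu]
    # -log(sigmoid(z)) = log(1 + exp(-z)), computed stably with np.logaddexp.
    bce = t * np.logaddexp(0.0, -z) + (1 - t) * np.logaddexp(0.0, z)
    j_dd = bce.mean()

    return j_dw + lam * j_dd
```

The function only evaluates the cost; in practice X and Y would be obtained by minimising it, for instance with gradient descent through an automatic-differentiation library. Setting `lam=0` recovers a pure document-word co-embedding, while larger values let the linkage structure pull linked documents together.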

References

[1] L. Getoor and B. Taskar, Introduction to statistical relational learning. MIT Press, 2007.

[2] P. C. W. Davies, The forces of nature. CUP Archive, 1979.

[3] R. Costanza, B. G. Norton, and B. D. Haskell, Ecosystem health: new goals for environmental management. Island Press, 1992.

[4] S. Muggleton, R. Otero, and A. Tamaddoni-Nezhad, Inductive logic programming. Springer, 1992, vol. 38.

[5] K. Kersting and L. De Raedt, “1 Bayesian logic programming: theory and tool,” Statistical Relational Learning, p. 291, 2007.

[6] S. Muggleton, “Learning stochastic logic programs,” Electron. Trans. Artif. Intell., vol. 4, no. B, pp. 141–153, 2000.

[7] M. Richardson and P. Domingos, “Markov logic networks,” Machine Learning, vol. 62, no. 1, pp. 107–136, 2006.

[8] N. Friedman, L. Getoor, D. Koller, and A. Pfeffer, “Learning probabilistic relational models,” in IJCAI, vol. 99, 1999, pp. 1300–1309.

[9] L. Getoor, N. Friedman, D. Koller, and A. Pfeffer, “Learning probabilistic relational models,” in Relational Data Mining. Springer, 2001, pp. 307–335.

[10] D. Heckerman, C. Meek, and D. Koller, “Probabilistic models for relational data,” Microsoft Research, Tech. Rep. MSR-TR-2004-30, 2004.

[11] P. P.-S. Chen, “The entity-relationship model - toward a unified view of data,” ACM Transactions on Database Systems (TODS), vol. 1, no. 1, pp. 9–36, 1976.

[12] B. Taskar, P. Abbeel, and D. Koller, “Discriminative probabilistic models for relational data,” in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 2002, pp. 485–492.

[13] J. Neville and D. Jensen, “Relational dependency networks,” Journal of Machine Learning Research, vol. 8, no. 3, pp. 653–692, 2007.

[14] M. I. Jordan, Learning in graphical models. Springer Science & Business Media, 1998, vol. 89.

[15] K. P. Murphy, Y. Weiss, and M. I. Jordan, “Loopy belief propagation for approximate inference: An empirical study,” in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 1999, pp. 467–475.

[16] M. J. Wainwright, M. I. Jordan et al., “Graphical models, exponential families, and variational inference,” Foundations and Trends in Machine Learning, vol. 1, no. 1–2, pp. 1–305, 2008.

[17] S. Kok and P. Domingos, “Learning Markov logic network structure via hypergraph lifting,” in Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009, pp. 505–512.

[18] J. Davis and P. Domingos, “Bottom-up learning of Markov network structure,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 271–278.

[19] J. Van Haaren and J. Davis, “Markov network structure learning: A randomized feature generation approach,” in AAAI, 2012, pp. 1148–1154.

[20] Z. Xu, V. Tresp, K. Yu, and H.-P. Kriegel, “Infinite hidden relational models,”

[21] C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda, “Learning systems of concepts with an infinite relational model,” in AAAI, vol. 3, 2006, p. 5.

[22] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, “Mixed membership stochastic blockmodels,” Journal of Machine Learning Research, vol. 9, no. Sep, pp. 1981–2014, 2008.

[23] A. Bordes, J. Weston, R. Collobert, and Y. Bengio, “Learning structured embeddings of knowledge bases,” in Conference on Artificial Intelligence, 2011. [Online]. Available: http://infoscience.epfl.ch/record/192344/files/Bordes_AAAI_2011.pdf

[24] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, “A semantic matching energy function for learning with multi-relational data,” Machine Learning, vol. 94, no. 2, pp. 233–259, Feb. 2014. [Online]. Available: http://link.springer.com/article/10.1007/s10994-013-5363-6

[25] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.

[26] I. Borg and P. J. Groenen, Modern multidimensional scaling: Theory and applications. Springer Science & Business Media, 2005.

[27] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, no. Jan, pp. 993–1022, 2003.

[28] T. Mu and J. Goulermas, “Automatic generation of co-embeddings from relational data with adaptive shaping,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 10, pp. 2340–2356, Oct. 2013.

[29] G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749,

[30] Y. Koren, “Factorization meets the neighborhood: a multifaceted collaborative filtering model,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2008, pp. 426–434.

[31] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich, “A review of relational machine learning for knowledge graphs,” Proceedings of the IEEE, vol. 104, no. 1, pp. 11–33, 2016.

[32] Q. Lu and L. Getoor, “Link-based classification,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, pp. 496–503.

[33] J. Chang and D. M. Blei, “Hierarchical relational models for document networks,” The Annals of Applied Statistics, pp. 124–150, 2010.

[34] A. McCallum, A. Corrada-Emmanuel, and X. Wang, “Topic and role discovery in social networks,” Computer Science Department Faculty Publication Series, p. 3, 2005.

[35] X. Wang, N. Mohanty, and A. McCallum, “Group and topic discovery from relations and text,” in Proceedings of the 3rd International Workshop on Link Discovery. ACM, 2005, pp. 28–35.

[36] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad, “Collective classification in network data,” AI Magazine, vol. 29, no. 3, p. 93, 2008.

[37] X. Wang and G. Sukthankar, “Multi-label relational neighbor classification using social context features,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2013, pp. 464–472.

[38] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth, “The author-topic model for authors and documents,” in Proceedings of the 20th conference on

[39] M. McPherson, L. Smith-Lovin, and J. M. Cook, “Birds of a feather: Homophily in social networks,” Annual Review of Sociology, vol. 27, no. 1, pp. 415–444, 2001.

[40] M. Yazdani, R. Collobert, and A. Popescu-Belis, “Learning to rank on network data,” in Mining and Learning with Graphs, no. EPFL-CONF-192709, 2013.

[41] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, “Translating embeddings for modeling multi-relational data,” in Advances in Neural Information Processing Systems (NIPS), 2013.

[42] S. Chakrabarti, B. Dom, and P. Indyk, “Enhanced hypertext categorization using hyperlinks,” in ACM SIGMOD Record, vol. 27, no. 2. ACM, 1998, pp. 307–318.

[43] J. Neville and D. Jensen, “Collective classification with relational dependency networks,” in Proceedings of the Second International Workshop on Multi-Relational Data Mining, 2003, pp. 77–91.

[44] D. Jensen, J. Neville, and B. Gallagher, “Why collective inference improves relational classification,” in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2004, pp. 593–598.

[45] C. Li, L. Jin, and S. Mehrotra, “Supporting efficient record linkage for large data sets using mapping techniques,” World Wide Web, vol. 9, no. 4, pp. 557–584, 2006.

[46] I. Bhattacharya and L. Getoor, “Entity resolution in graphs,” Mining Graph Data, p. 311, 2006.

[47] L. Otero-Cerdeira, F. J. Rodríguez-Martínez, and A. Gómez-Rodríguez, “Ontology matching: A literature review,” Expert Systems with Applications, vol. 42, no. 2, pp. 949–971, 2015.

[48] A. K. Elmagarmid, P. G. Ipeirotis, and V. S. Verykios, “Duplicate record detection: A survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 1, pp. 1–16, 2007.

[49] G. A. Miller, “WordNet: a lexical database for English,” Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995. [Online]. Available: http://dl.acm.org/citation.cfm?id=219748

[50] X. Dong, A. Halevy, and J. Madhavan, “Reference reconciliation in complex information spaces,” in Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. ACM, 2005, pp. 85–96.

[51] P. Singla and P. Domingos, “Entity resolution with Markov logic,” in Data Mining, 2006. ICDM’06. Sixth International Conference on. IEEE, 2006, pp. 572–582.

[52] I. Bhattacharya and L. Getoor, “Collective entity resolution in relational data,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 1, no. 1, p. 5, 2007.

[53] S. E. Whang and H. Garcia-Molina, “Joint entity resolution,” in Data Engineering (ICDE), 2012 IEEE 28th International Conference on. IEEE, 2012, pp. 294–305.

[54] I. Jolliffe, Principal Component Analysis, ser. Springer Series in Statistics. New York: Springer-Verlag, 2002.

[55] X. He and P. Niyogi, “Locality preserving projections,” in NIPS, vol. 16, 2003, pp. 234–241.

[56] M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Computation, vol. 15, no. 6, pp. 1373–1396, 2003.

[57] L. Maaten, “Learning a parametric embedding by preserving local structure,” in International Conference on Artificial Intelligence and Statistics, 2009, pp.

[58] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.

[59] R. Hettiarachchi and J. F. Peters, “Multi-manifold LLE learning in pattern recognition,” Pattern Recognition, vol. 48, no. 9, pp. 2947–2960, 2015.

[60] W. Zhang, X. Xue, H. Lu, and Y. Guo, “Discriminant neighborhood embedding for classification,” Pattern Recognition, vol. 39, no. 11, pp. 2240–2243, 2006.

[61] C. Ding and L. Zhang, “Double adjacency graphs-based discriminant neighborhood embedding,” Pattern Recognition, vol. 48, no. 5, pp. 1734–1742, 2015.

[62] Z. Zhang, M. Zhao, and T. W. S. Chow, “Constrained large margin local projection algorithms and extensions for multimodal dimensionality reduction,” Pattern Recognition, vol. 45, no. 12, pp. 4466–4493, 2012.

[63] A. Globerson, G. Chechik, F. Pereira, and N. Tishby, “Euclidean embedding of co-occurrence data,” Journal of Machine Learning Research, vol. 8, no. 10, pp. 2265–2295, 2007.

[64] J. Choo, S. Bohn, G. Nakamura, A. M. White, and H. Park, “Heterogeneous Data Fusion via Space Alignment Using Nonmetric Multidimensional Scaling,” in SDM. SIAM, 2012, pp. 177–188.

[65] A. P. Singh and G. J. Gordon, “Relational learning via collective matrix factorization,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2008, pp. 650–658. [Online]. Available: http://dl.acm.org/citation.cfm?id=1401969

[66] T. Franz, A. Schultz, S. Sizov, and S. Staab, “Triplerank: Ranking semantic web data by tensor decomposition,” in International Semantic Web Conference. Springer, 2009, pp. 213–228. [Online]. Available:

[67] M. Nickel, V. Tresp, and H.-P. Kriegel, “A three-way model for collective learning on multi-relational data,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 809–816. [Online]. Available: http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Nickel_438.pdf

[68] R. Jenatton, N. L. Roux, A. Bordes, and G. R. Obozinski, “A latent factor model for highly multi-relational data,” in Advances in Neural Information Processing Systems, 2012, pp. 3167–3175. [Online]. Available: http://papers.nips.cc/paper/4744-a-latent-factor-model-for-highly-multi-relational-data

[69] R. Socher, D. Chen, C. D. Manning, and A. Ng, “Reasoning With Neural Tensor Networks for Knowledge Base Completion,” in Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2013, pp. 926–934. [Online]. Available: http://papers.nips.cc/paper/5028-reasoning-with-neural-tensor-networks-for-knowledge-base-completion.pdf

[70] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich, “A review of relational machine learning for knowledge graphs,” Proceedings of the IEEE, 2016.

[71] V. Ng and C. Cardie, “Improving machine learning approaches to coreference resolution,” in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2002, pp. 104–111.

[72] R. S. S. Prakash, D. Jurafsky, and A. Y. Ng, “Learning to merge word senses,” EMNLP-CoNLL 2007, vol. 1005, 2007.

[73] A. Singhal, “Introducing the Knowledge Graph: things, not strings.” [Online]. Available: https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html

[74] R. Qian, “Understand Your World with Bing.” [Online]. Available: http://blogs.bing.com/search/2013/03/21/understand-your-world-with-bing/

[75] D. Ferrucci, E. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. A. Kalyanpur, A. Lally, J. W. Murdock, E. Nyberg, J. Prager et al., “Building Watson: An overview of the DeepQA project,” AI Magazine, vol. 31, no. 3, pp. 59–79, 2010.

[76] G. E. Hinton and S. T. Roweis, “Stochastic neighbor embedding,” in Advances in Neural Information Processing Systems, 2003, pp. 857–864.

[77] I. T. Jolliffe, “Principal component analysis and factor analysis,” in Principal Component Analysis. Springer, 1986, pp. 115–128.

[78] E. Kokiopoulou, J. Chen, and Y. Saad, “Trace optimization and eigenproblems in dimension reduction methods,” Numerical Linear Algebra with Applications, vol. 18, no. 3, pp. 565–602, 2011.

[79] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.

[80] H. Hotelling, “Relations between two sets of variates,” Biometrika, vol. 28, no. 3/4, pp. 321–377, 1936.

[81] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, “Canonical correlation analysis: An overview with application to learning methods,” Neural Computation, vol. 16, no. 12, pp. 2639–2664, 2004.

[82] R. Kindermann and J. L. Snell, Markov random fields and their applications. American Mathematical Society, 1980, vol. 1.

[83] Y. Yamanishi, “Supervised bipartite graph inference,” in Advances in Neural Information Processing Systems, 2009, pp. 1841–1848.

[84] M. Gönen, “Embedding heterogeneous data by preserving multiple kernels,” in Proceedings of the 21st European Conference on Artificial Intelligence,

[85] T. Mu, J. Y. Goulermas, I. Korkontzelos, and S. Ananiadou, “Descriptive document clustering via discriminant learning in a co-embedded space of multilevel similarities,” Journal of the Association for Information Science and Technology, vol. 67, no. 1, pp. 106–133, 2016.

[86] M. Khoshneshin, W. Street, and P. Srinivasan, “Bayesian Embedding of Co-occurrence Data for Query-Based Visualization,” in 2011 10th International Conference on Machine Learning and Applications and Workshops (ICMLA), vol. 1, Dec. 2011, pp. 74–79.

[87] Y. Maron, E. Bienenstock, and M. James, “Sphere Embedding: An Application to Part-of-Speech Induction,” in Advances in Neural Information Processing Systems 23, J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, Eds. Curran Associates, Inc., 2010, pp. 1567–1575.

[88] M. J. Greenacre, Theory and Applications of Correspondence Analysis. Academic Press, 1984.

[89] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman, “Indexing by latent semantic analysis,” JASIS, vol. 41, no. 6, pp. 391–407, 1990.

[90] J. R. Bellegarda, “Latent semantic mapping,” IEEE Signal Processing Magazine, vol. 22, no. 5, pp. 70–80, 2005.

[91] D. Hardoon, S. Szedmak, and J. Shawe-Taylor, “Canonical correlation analysis: An overview with application to learning methods,” Neural Computation, vol. 16, no. 12, pp. 2639–2664, 2004.

[92] C. Lee, A. Elgammal, and M. Torki, “Learning representations from multiple manifolds,” Pattern Recognition, vol. 50, pp. 74–87, 2016.

[93] T. Iwata, K. Saito, N. Ueda, S. Stromsten, T. L. Griffiths, and J. B. Tenenbaum, “Parametric embedding for class visualization,” Neural Computation, vol. 19,

[94] P. Sarkar, S. M. Siddiqi, and G. J. Gordon, “A latent space approach to dynamic embedding of co-occurrence data,” in Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 2007), 2007.

[95] I. S. Dhillon, “Co-clustering documents and words using bipartite spectral graph partitioning,” in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2001, pp. 269–274.

[96] M. Rege, M. Dong, and F. Fotouhi, “Bipartite isoperimetric graph partitioning for data co-clustering,” Data Mining and Knowledge Discovery, vol. 16, no. 3, pp. 276–312, 2008.

[97] N. Srebro, J. D. M. Rennie, and T. S. Jaakola, “Maximum-margin matrix
