Model Performance - Northeastern University

The tables below show how a neural network with one hidden layer (listed as "2 layer NN") performed with and without features generated from word vectors ("with wv"). Results for the CNN are included for comparison.

Model (IMDb)          Set Accuracy   Instance F1
2 layer NN            18.21          56.14
2 layer NN with wv    19.63          60.21
CNN                   17.43          56.92

Table 6.1: Set-Accuracy and Instance-F1 Results for IMDb dataset.
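The two metrics reported in Table 6.1 can be illustrated with a minimal sketch. It assumes the usual multi-label definitions: set accuracy is the fraction of instances whose predicted label set matches the true set exactly, and instance F1 is the F1 score between the predicted and true label sets, averaged over instances. The function names and the toy genre labels are hypothetical and are not taken from the original experiments.

```python
def set_accuracy(true_sets, pred_sets):
    """Fraction of instances whose predicted label set equals the true set."""
    matches = sum(1 for t, p in zip(true_sets, pred_sets) if set(t) == set(p))
    return matches / len(true_sets)


def instance_f1(true_sets, pred_sets):
    """Mean over instances of the F1 between predicted and true label sets."""
    scores = []
    for t, p in zip(true_sets, pred_sets):
        t, p = set(t), set(p)
        if not t and not p:
            # Assumed convention: two empty label sets count as a perfect match.
            scores.append(1.0)
            continue
        # F1 = 2 * |intersection| / (|true| + |predicted|)
        scores.append(2 * len(t & p) / (len(t) + len(p)))
    return sum(scores) / len(scores)


# Toy data (hypothetical labels, for illustration only).
y_true = [{"drama", "romance"}, {"comedy"}, {"action", "thriller"}]
y_pred = [{"drama"}, {"comedy"}, {"action", "thriller", "crime"}]

sa = set_accuracy(y_true, y_pred)
f1 = instance_f1(y_true, y_pred)
print(f"set accuracy = {sa:.4f}, instance F1 = {f1:.4f}")
```

Because set accuracy gives no credit for partially correct label sets, it is typically much lower than instance F1 on the same predictions, which matches the gap between the two columns in Table 6.1.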

Model (20 Newsgroup)  F1
2 layer NN            56.72
2 layer NN with wv    58.97
CNN                   57.59

Table 6.2: F1 Results for 20 Newsgroup dataset.

On both datasets, the neural network that uses features generated from word vectors is the best-performing of the three models. On IMDb it reaches a set accuracy of 19.63 and an instance F1 of 60.21, and on 20 Newsgroup it reaches an F1 of 58.97. This model also outperforms all of the previously tested models.
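This section does not say how the word-vector features are built. As a rough illustration only, the sketch below assumes one common approach: average the vectors of a document's words and append that average to a bag-of-words count vector. The helper names, the toy vocabulary, and the two-dimensional vectors are all hypothetical.

```python
def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    return [float(tokens.count(word)) for word in vocabulary]


def average_word_vector(tokens, vectors, dim):
    """Average the vectors of all tokens that have one; zeros if none do."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy word vectors and vocabulary (hypothetical values, for illustration only).
toy_vectors = {"good": [1.0, 0.0], "great": [0.8, 0.2], "bad": [-1.0, 0.0]}
vocabulary = ["good", "bad", "movie"]

tokens = ["great", "good", "movie"]
bow = bag_of_words(tokens, vocabulary)
wv = average_word_vector(tokens, toy_vectors, dim=2)

# Concatenate both feature groups into one input vector for the classifier.
features = bow + wv
print(features)
```

In this sketch the word "great" is not in the bag-of-words vocabulary, yet it still contributes through the averaged vector. That is one plausible reason word-vector features could help a model like the one in the tables.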

