Identifying schools from authors' texts using data mining

Document Type: Original Article

Authors

1. Department of Computer Engineering, Faculty of Engineering, Shahrood Branch, Islamic Azad University

2. Shahrood University, Iran

Abstract

In the study of philosophical schools, each person's thinking differs according to the attitude he or she holds toward the various schools. Recognizing an author's attitude and measuring its similarity to each philosophical school has long been an important problem in the humanities. In this article, a deep learning method is proposed for distinguishing philosophical schools from text. In the proposed method, the texts are first normalized, and redundant and meaningless words are removed. After normalization, the text is split into sentences and words, and each word is converted into a numerical vector using the fastText library. The features of the texts are then extracted with the designed network. Once the system has been trained, it is ready for inference: given a new sentence, it expresses the sentence's similarity to each school. Based on the evaluation, the accuracy of the proposed method is 94%.
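The pipeline described above (normalization and stop-word removal, word-level fastText embeddings, feature extraction with a neural network, and per-school similarity scores) could look roughly like the sketch below. This is not the authors' implementation: the fastText model file (cc.en.300.bin), the Keras 1-D convolutional network standing in for the "designed network", and the constants MAX_LEN, NUM_SCHOOLS, and the stop-word set are all illustrative assumptions.

```python
# Minimal sketch of the abstract's pipeline (illustrative, not the authors' code).
import numpy as np
import fasttext.util
from tensorflow import keras

# Load pre-trained fastText word vectors (assumed English model for illustration).
fasttext.util.download_model("en", if_exists="ignore")
ft = fasttext.load_model("cc.en.300.bin")

MAX_LEN = 50                  # assumed maximum sentence length in words
EMB_DIM = ft.get_dimension()  # 300 for the cc.*.300 models
NUM_SCHOOLS = 4               # assumed number of philosophical schools

def sentence_to_matrix(sentence, stopwords=frozenset()):
    """Normalize, drop stop words, and embed each remaining word with fastText."""
    words = [w for w in sentence.lower().split() if w not in stopwords]
    vecs = [ft.get_word_vector(w) for w in words[:MAX_LEN]]
    # Pad with zero vectors so every sentence has the same shape.
    vecs += [np.zeros(EMB_DIM)] * (MAX_LEN - len(vecs))
    return np.stack(vecs)

# A small 1-D CNN that extracts text features and scores similarity to each school.
model = keras.Sequential([
    keras.layers.Input(shape=(MAX_LEN, EMB_DIM)),
    keras.layers.Conv1D(128, kernel_size=3, activation="relu"),
    keras.layers.GlobalMaxPooling1D(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(NUM_SCHOOLS, activation="softmax"),  # per-school scores
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training on hypothetical labeled data:
# X = np.stack([sentence_to_matrix(s) for s in sentences])   # shape (n, MAX_LEN, EMB_DIM)
# model.fit(X, labels, epochs=10, validation_split=0.1)
# At inference time, the softmax output gives a new sentence's similarity to each school.
```

The softmax layer here is simply one plausible way to express "similarity to each school" as a probability distribution; the article's actual network architecture may differ.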


