

5.2 Future work

This section presents suggestions for future work, since the project has considerable potential for further investigation. Regarding optimization, the models could be tested on a larger dataset to explore how the Machine Learning methods behave in terms of classification performance and running time.

Unlike classification and regression, which analyze class-labeled (training) datasets, clustering analyzes data objects without consulting class labels and can be used to generate class labels for a group of data [6].

Another way to extend the project is to consider clustering methods such as k-means, hierarchical clustering, Gaussian mixture models, and hidden Markov models; we will employ this concept in the article related to this thesis. Clustering groups data according to their similarities: a distance measure quantifies the difference between instances, and a clustering algorithm uses those distances to group them.
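
As a minimal sketch of this idea, the following Python code groups instances by repeatedly merging the two closest clusters (single-linkage hierarchical clustering with Euclidean distance). The function name `single_linkage`, the choice of linkage and distance, and the feature values are illustrative assumptions, not part of the thesis experiments.

```python
from itertools import combinations
from math import dist


def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns clusters as sorted lists of indices into `points`.
    """
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > max(n_clusters, 1):
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        # a < b always holds here, so removing b leaves index a valid.
        merged = clusters.pop(b)
        clusters[a].extend(merged)
    return [sorted(c) for c in clusters]


# Hypothetical two-feature vectors for five citation instances.
features = [(0.10, 0.20), (0.15, 0.22), (0.90, 0.80), (0.88, 0.85), (0.12, 0.18)]
groups = single_linkage(features, n_clusters=2)
```

The pairwise search is quadratic in the number of clusters at every merge, so this form is only suitable for small samples; a library implementation would be the practical choice for a full dataset.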

Supervised and unsupervised learning methods have traditionally focused on data consisting of independent instances of a single type. However, many real-world domains are best described by relational models in which instances of multiple types are related to each other in complex ways. For instance, in an experimental dataset of scientific papers, the papers are related to each other via citations and are also related to their authors [12].

One interesting task for future work is discovering loops in a network of citations in order to explore the actual relevance between two papers that cite each other, as illustrated by the small citation network in Fig. 5.1. However, this method requires access to the actual context of each paper's citations.
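
The loop search described above could be sketched as a depth-first search over a directed citation graph. The function `find_citation_loop`, the dictionary representation of the network, and the paper identifiers are hypothetical; the sketch only detects a loop and does not use the citation context.

```python
def find_citation_loop(citations):
    """Return one loop as a list of paper ids, or None if there is no loop.

    `citations` maps a paper id to the ids of the papers it cites.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(paper):
        color[paper] = GRAY
        path.append(paper)
        for cited in citations.get(paper, ()):
            state = color.get(cited, WHITE)
            if state == GRAY:
                # Back edge: the loop is the path segment starting at `cited`.
                return path[path.index(cited):] + [cited]
            if state == WHITE:
                loop = visit(cited)
                if loop:
                    return loop
        path.pop()
        color[paper] = BLACK
        return None

    for paper in citations:
        if color.get(paper, WHITE) == WHITE:
            loop = visit(paper)
            if loop:
                return loop
    return None


# Hypothetical network: A cites B, B cites C, C cites A (a loop); D cites A.
network = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
loop = find_citation_loop(network)
```

Because the search is recursive, a very deep citation chain could exceed Python's recursion limit; an explicit stack would avoid that on large networks.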

Another extension of this work is to select classification models other than the methods already used, such as support vector machines, neural networks, and Gaussian mixture models, and to examine whether they improve performance.

Regarding the feature set, one can extend it with further features, such as "Author" or "Title", or consider different pairs of features instead of the aforementioned feature set.
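
One possible way to turn "Author" and "Title" into numeric features is to measure set overlap between the citing and the cited paper. The sketch below uses Jaccard similarity; the function names, the record layout, and the sample records are assumptions made for illustration only.

```python
import re


def tokens(text):
    """Lowercase alphanumeric word tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def extra_features(citing, cited):
    """Two candidate features for a (citing paper, cited paper) pair."""
    return {
        "title_similarity": jaccard(tokens(citing["title"]), tokens(cited["title"])),
        "author_overlap": jaccard(set(citing["authors"]), set(cited["authors"])),
    }


# Hypothetical records; real ones would come from the extracted paper metadata.
citing = {"title": "Citation relevance classification",
          "authors": ["A. Author", "B. Author"]}
cited = {"title": "Classification of citation networks",
         "authors": ["B. Author", "C. Author"]}
pair_features = extra_features(citing, cited)
```

Exact string matching on author names is fragile in practice, since the same author may appear under several spellings, so some name normalization would likely be needed first.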

The resulting analysis can be expanded as well. The Machine Learning algorithms were applied to classify the dataset into two classes, "Barely related" and "Related". For the citations classified as "Barely related", given access to the reference contexts and authors, one can apply text mining techniques such as the Term-Based Method, Phrase-Based Method, and Concept-Based Method [4] to identify citations that are not related at all to the paper citing them. Possible reasons are that the author intentionally selected those citations from his or her scientific network, or chose them through misinterpretation. We will investigate this concept further in the article related to the topic of this thesis.
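
One simple reading of a term-based method is to compare the term-frequency vectors of a citation context and of the cited paper's text. The sketch below computes their cosine similarity; this particular measure, the function names, and the sample texts are assumptions for illustration, and [4] describes the methods more generally.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase alphabetic terms in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two term-frequency vectors, in [0, 1]."""
    a, b = term_counts(text_a), term_counts(text_b)
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical citation context and cited-paper text.
context = "we use graph clustering for citation analysis"
cited_text = "graph clustering of citation networks"
score = cosine_similarity(context, cited_text)
```

A score near zero for a "Barely related" citation would flag it as a candidate for closer inspection; any threshold would have to be chosen empirically.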


Chapter 5. Conclusion

Figure 5.1: Sample of a small network of citations containing a loop

Bibliography

[1] Hanadi Alfraidi, Won-Sook Lee, and David Sankoff. Literature visualization and similarity measurement based on citation relations. In Information Visualisation (iV), 2015 19th International Conference on, pages 217–222. IEEE, 2015.

[2] Tim Bray, Jean Paoli, C Michael Sperberg-McQueen, Eve Maler, and François Yergeau. Extensible markup language (xml) 1.0, 2008.

[3] Alexandru Constantin, Steve Pettifer, and Andrei Voronkov. Pdfx: fully-automated pdf-to-xml conversion of scientific literature. In Proceedings of the 2013 ACM symposium on Document engineering, pages 177–180. ACM, 2013.

[4] Sonali Vijay Gaikwad, Archana Chaugule, and Pramod Patil. Text mining methods and techniques. International Journal of Computer Applications, 85(17), 2014.

[5] Bela Gipp, Jöran Beel, and Christian Hentschel. Scienstein: A research paper recommender system. In Proceedings of the international conference on emerging trends in computing (icetic'09), pages 309–315, 2009.

[6] Jiawei Han, Jian Pei, and Micheline Kamber. Data mining: concepts and techniques. Elsevier, 2011.

[7] Haifeng Liu, Zhuo Yang, Ivan Lee, Zhenzhen Xu, Shuo Yu, and Feng Xia. Car: Incorporating filtered citation relations for scientific article recommendation. In Smart City/SocialCom/SustainCom (SmartCity), 2015 IEEE International Conference on, pages 513–518. IEEE, 2015.

[8] Tetsuya Nakatoh, Hayato Nakanishi, Kensuke Baba, and Sachio Hirokawa. Focused citation count: a combined measure of relevancy and quality. In Advanced Applied Informatics (IIAI-AAI), 2015 IIAI 4th International Congress on, pages 166–170. IEEE, 2015.

[9] Yuliant Sibaroni, Dwi Hendratmo Widyantoro, and Masayu Leylia Khodra. Survey on research paper's relations. In Information Technology Systems and Innovation (ICITSI), 2015 International Conference on, pages 1–6. IEEE, 2015.

[10] Kritsada Sriphaew and Thanaruk Theeramunkong. Measuring the validity of document relations discovered from frequent itemset mining. In Computational Intelligence and Data Mining, 2007. CIDM 2007. IEEE Symposium on, pages 293–299. IEEE, 2007.


[11] Pang-Ning Tan, Michael Steinbach, Vipin Kumar, and ZhaoHui Tang. Introduction to data mining.

[12] Benjamin Taskar, Eran Segal, and Daphne Koller. Probabilistic classification and clustering in relational data. In International Joint Conference on Artificial Intelligence, volume 17, pages 870–878. Lawrence Erlbaum Associates LTD, 2001.

[13] Mark Ware and Michael Mabe. The STM report: An overview of scientific and scholarly journal publishing. 2015.

[14] Ian H Witten, Eibe Frank, Mark A Hall, and Christopher J Pal. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2016.

[15] Yan Yang and Long Yun. Literature recommendation based on reference graph. In Advanced Computer Theory and Engineering (ICACTE), 2010 3rd International Conference on, volume 3, pages V3–400. IEEE, 2010.

[16] Hou-kui Zhou, Hui-min Yu, and Roland Hu. Topic discovery and evolution in scientific literature based on content and citations. Frontiers of Information Technology & Electronic Engineering, 18(10):1511–1524, 2017.