Regarding the crowdsourcing process, it would be worthwhile to complement the incentives provided by gamification components, such as points and badges, with monetary incentives, in order to determine whether this would affect the number of users actively participating in the crowdsourcing process. Regarding classification models, and specifically those aimed at classifying sentences in abstracts, models with similar architectures could be developed so that they can be compared with one another.
In addition, several other areas could be explored, namely Text Mining models and techniques that support the stages of the systematic review, such as the data extraction stage.
On the platform, integrating the crowdsourcing and abstract classification components to create an active learning system is a possibility that could be explored.
While the writing of this dissertation was being finalized, in October 2018, the article by Jin and Szolovits (2018) was discovered. Following an independent line of research, its authors addressed the same dataset and proposed, as in this work, a neural network that uses a recurrent layer (though a different one, of the LSTM type) to model interdependencies between the sentences of abstracts. It would therefore be relevant in future work to compare results against this model.
References
Amado, A., Cortez, P., Rita, P., & Moro, S. (2018). Research trends on Big Data in Marketing: A text mining and topic modeling based literature analysis. European Research on Management and Business Economics, 24(1), 1–7. Retrieved from http://www.sciencedirect.com/science/article/pii/S2444883417300268 doi: https://doi.org/10.1016/j.iedeen.2017.06.002
Barbounis, T. G., Theocharis, J. B., Alexiadis, M. C., & Dokopoulos, P. S. (2006). Long-term wind speed and power forecasting using local recurrent neural network models. IEEE Transactions on Energy Conversion, 21(1), 273–284.
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE transactions on neural networks, 5(2), 157–166.
Biba, M., & Gjati, E. (2014). Boosting text classification through stemming of composite words. In Recent advances in intelligent informatics (pp. 185–194). Springer.
Biolchini, J., Mian, P. G., Natali, A. C. C., & Travassos, G. H. (2005). Systematic review in software engineering. System Engineering and Computer Science Department COPPE/UFRJ, Technical Report ES, 679(05), 45.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.
Boudin, F., Nie, J.-Y., Bartlett, J. C., Grad, R., Pluye, P., & Dawes, M. (2010). Combining classifiers for robust pico element detection. BMC medical informatics and decision making, 10(1), 29.
Brabham, D. C. (2008). Crowdsourcing as a model for problem solving: An introduction and cases. Convergence, 14(1), 75–90.
Brereton, P., Kitchenham, B. A., Budgen, D., Turner, M., & Khalil, M. (2007). Lessons from applying the systematic literature review process within the software engineering domain. Journal of systems and software, 80(4), 571–583.
Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., & Wirth, R. (2000). CRISP-DM 1.0 Step-by-step data mining guide.
Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using rnn encoder–decoder for statistical machine translation. In Proceedings of the 2014 conference on empirical methods in natural language processing (emnlp) (pp. 1724–1734).
networks on sequence modeling. In Nips 2014 workshop on deep learning, december 2014.
Cohen, A. M., & Hersh, W. R. (2005, mar). A survey of current work in biomedical text mining. Briefings in Bioinformatics, 6(1), 57–71. Retrieved from http://dx.doi.org/10.1093/bib/6.1.57
Cohen, S. (2016). Bayesian Analysis in Natural Language Processing. Morgan & Claypool Publishers.
Cook, D. J., Mulrow, C. D., & Haynes, R. B. (1997). Systematic reviews: synthesis of best evidence for clinical decisions. Annals of internal medicine, 126(5), 376–380.
Delen, D., & Crossland, M. D. (2008). Seeding the survey and analysis of research literature with text mining. Expert Systems with Applications, 34(3), 1707–1720.
Dernoncourt, F. (2017). Sequential short-text classification with neural networks. Unpublished doctoral dissertation, Massachusetts Institute of Technology.
Dernoncourt, F., & Lee, J. Y. (2017). Pubmed 200k rct: a dataset for sequential sentence classification in medical abstracts. In Proceedings of the eighth international joint conference on natural language processing (volume 2: Short papers) (Vol. 2, pp. 308–313).
Dernoncourt, F., Lee, J. Y., & Szolovits, P. (2017). Neural networks for joint sentence classification in medical paper abstracts. In Proceedings of the 15th conference of the european chapter of the association for computational linguistics: Volume 2, short papers (Vol. 2, pp. 694–700).
Deterding, S., Dixon, D., Khaled, R., & Nacke, L. (2011). From game design elements to gamefulness: defining gamification. In Proceedings of the 15th international academic mindtrek conference: Envisioning future media environments (pp. 9–15).
Dias, Á. M., Mansur, C. G., Myczkowski, M., & Marcolin, M. (2011). Whole field tendencies in transcranial magnetic stimulation: A systematic review with data and text mining. Asian Journal of Psychiatry, 4(2), 107–112. Retrieved from http://www.sciencedirect.com/science/article/pii/S1876201811000372 doi: https://doi.org/10.1016/j.ajp.2011.03.003
Dillon, T., Wu, C., & Chang, E. (2010). Cloud computing: issues and challenges. In Advanced information networking and applications (aina), 2010 24th ieee international conference on (pp. 27–33).
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul), 2121–2159.
Dumitrache, A., Aroyo, L., Welty, C., Sips, R., & Levas, A. (2013). Dr. detective: combining gamification techniques and crowdsourcing to create a gold standard for the medical domain. Crowdsourcing the Semantic Web.
Ekbal, A., & Bandyopadhyay, S. (2010). Named entity recognition using support vector machine: A language independent approach. International Journal of Electrical, Computer, and Systems Engineering, 4(2), 155–170.
Fawcett, T. (2006). An introduction to roc analysis. Pattern recognition letters, 27(8), 861–874.
Feinerer, I., Buchta, C., Geiger, W., Rauch, J., Mair, P., & Hornik, K. (2013). The textcat package for n-gram based text categorization in r. Journal of statistical software, 52(6), 1–17.
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In Proceedings of the fourteenth international conference on artificial intelligence and statistics (pp. 315–323).
Goldberg, Y. (2016). A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research, 57, 345–420.
Grishman, R., & Sundheim, B. (1996). Message understanding conference-6: A brief history. In Coling 1996 volume 1: The 16th international conference on computational linguistics (Vol. 1).
Guerreiro, J., Rita, P., & Trigueiros, D. (2016). A text mining-based review of cause-related marketing literature. Journal of Business Ethics, 139(1), 111–128.
Gupta, V., & Lehal, G. S. (2009). A survey of text mining techniques and applications. Journal of emerging technologies in web intelligence, 1(1), 60–76.
Hashimoto, K., Kontonatsios, G., Miwa, M., & Ananiadou, S. (2016). Topic detection using paragraph vectors to support active learning in systematic reviews. Journal of Biomedical Informatics, 62(Supplement C), 59–65. Retrieved from http://www.sciencedirect.com/science/article/pii/S1532046416300442 doi: https://doi.org/10.1016/j.jbi.2016.06.001
He, W., Zha, S., & Li, L. (2013). Social media competitive analysis and text mining: A case study in the pizza industry. International Journal of Information Management, 33(3), 464–472.
Hossain, M. (2012). Users’ motivation to participate in online crowdsourcing platforms. In Innovation management and technology research (icimtr), 2012 international conference on (pp. 310–315).
Hsieh, T.-J., Hsiao, H.-F., & Yeh, W.-C. (2011). Forecasting stock markets using wavelet transforms and recurrent neural networks: An integrated system based on artificial bee colony algorithm. Applied soft computing, 11(2), 2510–2525.
Huang, A. (2008). Similarity measures for text document clustering. In Proceedings of the sixth new zealand computer science research student conference (nzcsrsc2008), christchurch, new zealand (pp. 49–56).
Jin, D., & Szolovits, P. (2018). Hierarchical neural networks for sequential sentence classification in medical scientific abstracts. arXiv preprint arXiv:1808.06161.
Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014, June). A convolutional neural network for modelling sentences. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.
Kim, Y. (2014). Convolutional neural networks for sentence classification. In Proceedings of the 2014 conference on empirical methods in natural language processing (emnlp) (pp. 1746–1751).
Kiritchenko, S., de Bruijn, B., Carini, S., Martin, J., & Sim, I. (2010). ExaCT: automatic extraction of clinical trial characteristics from journal publications. BMC medical informatics and decision making, 10(1), 56.
Kitchenham, B. (2004). Procedures for performing systematic reviews. Keele, UK, Keele University, 33(2004), 1–26.
Korde, V., & Mahender, C. N. (2012). Text classification and classifiers: A survey. International Journal of Artificial Intelligence & Applications, 3(2), 85.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).
Kuhrmann, M., Fernández, D. M., & Daneva, M. (2017). On the pragmatic design of literature studies in software engineering: an experience-based guideline. Empirical software engineering, 22(6), 2852–2891.
Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C. (2016). Neural architectures for named entity recognition. In Proceedings of naacl-hlt (pp. 260–270).
Lan, M., Sung, S.-Y., Low, H.-B., & Tan, C.-L. (2005). A comparative study on term weighting schemes for text categorization. In Neural networks, 2005. ijcnn’05. proceedings. 2005 ieee international joint conference on (Vol. 1, pp. 546–551). IEEE.
LeCun, Y., Kavukcuoglu, K., & Farabet, C. (2010). Convolutional networks and applications in vision. In Circuits and systems (iscas), proceedings of 2010 ieee international symposium on (pp. 253–256).
Liao, W., & Veeramachaneni, S. (2009). A Simple Semi-supervised Algorithm For Named Entity Recognition.
Luhn, H. P. (1957). A statistical approach to mechanized encoding and searching of literary information. IBM Journal of research and development, 1(4), 309–317.
Maralte BV, & Publishing Research Consortium. (2016). Text Mining of Journal Literature 2016 (Tech. Rep.). Retrieved 06-01-2018, from http://publishingresearchconsortium.com/index.php/prc-documents/prc-research-projects/54-prc-text-mining-of-journal-literature-2016
Marshall, I. J., Kuiper, J., & Wallace, B. C. (2014). Automating risk of bias assessment for clinical trials. In Proceedings of the 5th acm conference on bioinformatics, computational biology, and health informatics (pp. 88–95). ACM.
Massung, E., Coyle, D., Cater, K. F., Jay, M., & Preist, C. (2013). Using crowdsourcing to support pro-environmental community activism. In Proceedings of the sigchi conference on human factors in computing systems (pp. 371–380).
McNamara, D. S., Kintsch, E., Songer, N. B., & Kintsch, W. (1996). Are good texts always better? interactions of text coherence, background knowledge, and levels of understanding in learning from text. Cognition and instruction, 14(1), 1–43.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111–3119).
Millard, L. A. C., Flach, P. A., & Higgins, J. P. T. (2015). Machine learning to assist risk-of-bias assessments in systematic reviews. International journal of epidemiology, 45(1), 266–277.
Miner, G., Elder, J., Fast, A., Hill, T., Nisbet, R., & Delen, D. (2012). Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications. Academic Press.
Miwa, M., Thomas, J., O’Mara-Eves, A., & Ananiadou, S. (2014). Reducing systematic review workload through certainty-based screening. Journal of biomedical informatics, 51, 242–253.
Moro, S., Cortez, P., & Rita, P. (2015). Business intelligence in banking: A literature analysis from 2002 to 2013 using text mining and latent Dirichlet allocation. Expert Systems with Applications, 42(3), 1314–1324.
Moro, S., Cortez, P., & Rita, P. (2016). An automated literature analysis on data mining applications to credit risk assessment. In Artificial intelligence in financial markets (pp. 161–177). London: Palgrave Macmillan UK. Retrieved from https://doi.org/10.1057/978-1-137-48880-0_6 doi: 10.1057/978-1-137-48880-0_6
Moro, S., & Rita, P. (2017). Brand strategies in social media in hospitality and tourism. International Journal of Contemporary Hospitality Management.
Moro, S., Rita, P., & Cortez, P. (2017). A text mining approach to analyzing Annals literature. Annals of Tourism Research, 66, 208–210. Retrieved from http://www.sciencedirect.com/science/article/pii/S0160738317300968 doi: https://doi.org/10.1016/j.annals.2017.07.011
Morschheuser, B., Hamari, J., & Koivisto, J. (2016). Gamification in crowdsourcing: a review. In System sciences (hicss), 2016 49th hawaii international conference on (pp. 4375–4384).
zation. In International joint conference on artificial intelligence (pp. 343–354). Springer.
Moura, B. C. (2014). Inteligência coletiva para análise de sentimento sobre mensagens da plataforma stocktwits.
Ng, A. (2017). Machine learning yearning.
Ng, L., Pitt, V., Huckvale, K., Clavisi, O., Turner, T., Gruen, R., & Elliott, J. H. (2014). Title and Abstract Screening and Evaluation in Systematic Reviews (TASER): a pilot randomised controlled trial of title and abstract screening by medical students. Systematic reviews, 3(1), 121.
Nielsen, J. (2006). The 90-9-1 rule for participation inequality in social media and online communities. Retrieved 18-06-2018, from https://www.nngroup.com/articles/participation-inequality
Nothman, J., Ringland, N., Radford, W., Murphy, T., & Curran, J. R. (2013). Learning multilingual named entity recognition from Wikipedia. Artificial Intelligence, 194, 151–175.
Oliveira, N., Cortez, P., & Areal, N. (2013). On the predictability of stock market behavior using StockTwits sentiment and posting volume. In Progress in artificial intelligence: 16th portuguese conference on artificial intelligence, epia 2013: proceedings (pp. 355–365). Springer.
Oliveira, N., Cortez, P., & Areal, N. (2017). The impact of microblogging data for stock market prediction: Using Twitter to predict returns, volatility, trading volume and survey sentiment indices. Expert Systems with Applications, 73, 125–144. Retrieved from http://www.sciencedirect.com/science/article/pii/S0957417416307187 doi: https://doi.org/10.1016/j.eswa.2016.12.036
Olorisade, B. K., de Quincey, E., Brereton, P., & Andras, P. (2016). A Critical Analysis of Studies That Address the Use of Text Mining for Citation Screening in Systematic Reviews. In Proceedings of the 20th international conference on evaluation and assessment in software engineering (pp. 14:1–14:11). New York, NY, USA: ACM.
Panettieri, J. (2017). Cloud Market Share 2017: Amazon AWS, Microsoft Azure, IBM, Google. Retrieved 11-06-2018, from https://www.channele2e.com/channel-partners/csps/cloud-market-share-2017-amazon-microsoft-ibm-google
Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (emnlp) (pp. 1532–1543).
Porter, M. (1980). An algorithm for suffix stripping. Program, 14(3), 130–137. Retrieved from http://www.emeraldinsight.com/doi/10.1108/eb046814 doi: 10.1108/eb046814
Porter, M. F. (2001). Snowball: A language for stemming algorithms. Retrieved from http://snowball.tartarus.org/texts/introduction.html
Ram, K., & Broman, K. (2017). arxiv: Interface to the arxiv api [Computer software manual]. Retrieved from https://github.com/ropensci/aRxiv (R package version 0.5.17)
Ramasubramanian, C., & Ramya, R. (2013). Effective pre-processing activities in text mining using improved porter’s stemming algorithm. International Journal of Advanced Research in Computer and Communication Engineering, 2(12).
Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). Cnn features off-the-shelf: an astounding baseline for recognition. In Computer vision and pattern recognition workshops (cvprw), 2014 ieee conference on (pp. 512–519).
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the ieee conference on computer vision and pattern recognition (pp. 779–788).
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, san francisco, ca, usa, august 13-17, 2016 (pp. 1135–1144).
Rußwurm, M., & Körner, M. (2018). Multi-temporal land cover classification with sequential recurrent encoders. ISPRS International Journal of Geo-Information, 7(4), 129.
Saha, T. K., Ouzzani, M., Hammady, H. M., & Elmagarmid, A. K. (2016). A large scale study of SVM based methods for abstract screening in systematic reviews. arXiv preprint arXiv:1610.00192.
Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information processing & management, 24(5), 513–523.
Salton, G., Wong, A., & Yang, C.-S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.
Seyler, D., Li, L., & Zhai, C. (2018). Identifying compromised accounts on social media using statistical text analysis. arXiv preprint arXiv:1804.07247.
Shemilt, I., Khan, N., Park, S., & Thomas, J. (2016). Use of cost-effectiveness analysis to compare the efficiency of study identification methods in systematic reviews. Systematic reviews, 5(1), 140.
Shi, S., Wang, Q., Xu, P., & Chu, X. (2016). Benchmarking state-of-the-art deep learning software tools. In Cloud computing and big data (ccbd), 2016 7th international conference on (pp. 99–104).
Singh, J., & Gupta, V. (2017). A systematic review of text stemming techniques. Artificial Intelligence Review, 48(2), 157–217. Retrieved from https://doi.org/10.1007/s10462-016-9498-2 doi: 10.1007/s10462-016-9498-2
Journal of documentation, 28(1), 11–21.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929–1958.
Suciu, G., Scheianu, A., & Vochin, M. (2017). Disaster early warning using time-critical iot on elastic cloud workbench. In Black sea conference on communications and networking (blackseacom), 2017 ieee international (pp. 1–5).
Szarvas, G., Farkas, R., & Kocsor, A. (2006). A multilingual named entity recognition system using boosting and C4.5 decision tree learning algorithms. In International conference on discovery science (pp. 267–278).
Tan, A.-H. (1999). Text mining: The state of the art and the challenges. In Proceedings of the pakdd 1999 workshop on knowledge discovery from advanced databases (Vol. 8, pp. 65–70). sn.
Tanabe, L., Xie, N., Thom, L. H., Matten, W., & Wilbur, W. J. (2005). GENETAG: a tagged corpus for gene/protein named entity recognition. BMC bioinformatics, 6(1), S3.
Toman, M., Tesar, R., & Jezek, K. (2006). Influence of word normalization on text classification. Proceedings of InSciT, 4, 354–358.
Turney, P. D., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of artificial intelligence research, 37, 141–188.
Wallace, B. C., Kuiper, J., Sharma, A., Zhu, M., & Marshall, I. J. (2016). Extracting pico sentences from clinical trial reports using supervised distant supervision. Journal of machine learning research: JMLR, 17.
Wallace, B. C., Small, K., Brodley, C. E., & Trikalinos, T. A. (2010). Active learning for biomedical citation screening. In Proceedings of the 16th acm sigkdd international conference on knowledge discovery and data mining (pp. 173–182). ACM.
Wang, Z., Asi, N., Elraiyah, T. A., Dabrh, A. M. A., Undavalli, C., Glasziou, P., … Murad, M. H. (2014). Dual computer monitors to increase efficiency of conducting systematic reviews. Journal of clinical epidemiology, 67(12), 1353–1357.
Webster, J., & Watson, R. T. (2002). Analyzing the past to prepare for the future: Writing a literature review. MIS quarterly, xiii–xxiii.
Wei, X., & Croft, W. B. (2006). Lda-based document models for ad-hoc retrieval. In Proceedings of the 29th annual international acm sigir conference on research and development in information retrieval (pp. 178–185).
Woolley, A. W., Chabris, C. F., Pentland, A., Hashmi, N., & Malone, T. W. (2010). Evidence for a collective intelligence factor in the performance of human groups. Science, 330(6004), 686–688.
Yang, X., Macdonald, C., & Ounis, I. (2017). Using word embeddings in twitter election classification. Information Retrieval Journal, 1–25.
Young, T., Hazarika, D., Poria, S., & Cambria, E. (2017). Recent trends in deep learning based natural language processing. arXiv preprint arXiv:1708.02709.
Zhang, S., & Elhadad, N. (2013). Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts. Journal of Biomedical Informatics, 46, 1088–1098.
Zhang, X., Zhao, J., & LeCun, Y. (2015). Character-level convolutional networks for text classification. In Advances in neural information processing systems (pp. 649–657).
Zheng, H., Li, D., & Hou, W. (2011). Task design, motivation, and participation in crowdsourcing contests.