Bilingual Summarization of English and Arabic Genetic Diseases Texts

Authors

Zainab Almugbel

DOI:

https://doi.org/10.31686/ijier.vol9.iss9.3349

Keywords:

NLP, medical texts, RNN

Abstract

Health literacy aims to empower patients to make better decisions about their health. For patients with genetic diseases, health literacy can be improved by facilitating bilingual retrieval of summaries of online texts about genetic diseases. This paper proposes a system that helps translators achieve this task by applying NLP and Recurrent Neural Network (RNN) techniques to two tasks: generating abstractive summaries and performing Arabic-English translation. Both tasks require training sets, which can be built from an English summaries corpus and Arabic-English parallel corpora. The English summaries corpus is built from Orphadata, while the parallel corpora are built from Wiki articles. The summaries corpus is used to generate English summaries of the Wiki articles, and the parallel corpora are used to translate these summaries into Arabic. The paper first defines the research problem and sets out objectives for solving it. It then reviews the literature on the tasks named in those objectives. Finally, it discusses the proposed solution from the following aspects: the required corpora, the system architecture, the architectures of the RNN memory cell components, the proposed software for the implementation, and the system evaluation.
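The two-stage flow described in the abstract (summarize in English, then translate the summary into Arabic) can be sketched as a simple composition of two functions. This is a minimal illustration only: the names `bilingual_summary`, `BilingualSummary`, `first_sentence`, and `tag_as_arabic` are hypothetical, and the two placeholder stages stand in for the trained RNN models the paper proposes.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage types: each stage maps a text to a text.
Summarizer = Callable[[str], str]
Translator = Callable[[str], str]


@dataclass
class BilingualSummary:
    english: str
    arabic: str


def bilingual_summary(article: str,
                      summarize: Summarizer,
                      translate: Translator) -> BilingualSummary:
    """Summarize an English article, then translate that summary."""
    english = summarize(article)
    arabic = translate(english)
    return BilingualSummary(english=english, arabic=arabic)


# Placeholder stages (assumptions for illustration, not real models):
# a real system would plug in a trained summarizer and translator here.
def first_sentence(text: str) -> str:
    """Stand-in summarizer: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def tag_as_arabic(text: str) -> str:
    """Stand-in translator: mark the text instead of translating it."""
    return "[ar] " + text


result = bilingual_summary(
    "Cystic fibrosis is an inherited disorder. It affects the lungs.",
    first_sentence,
    tag_as_arabic,
)
print(result.english)
print(result.arabic)
```

Keeping the two stages as interchangeable callables reflects the abstract's separation of the summarization and translation tasks, each trained on its own corpus.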

Author Biography

  • Zainab Almugbel, Imam Abdulrahman Bin Faisal University

     Lecturer, Computer Science Department, Community College

Published

2021-09-01

How to Cite

Almugbel, Z. (2021). Bilingual Summarization of English and Arabic Genetic Diseases Texts. International Journal for Innovation Education and Research, 9(9), 342–373. https://doi.org/10.31686/ijier.vol9.iss9.3349
Received 2021-07-28
Accepted 2021-08-13