Machine Learning and Finance: A Review using Latent Dirichlet Allocation Technique (LDA)


  • Ahmed Sameer El Khatib, Centro Universitário Fundação Assis Gurgacz



Machine learning, topic modelling, structuring finance research, Latent Dirichlet Allocation


The aim of this paper is to provide a first comprehensive structuring of the literature applying machine learning to finance. We use a probabilistic topic modelling approach to make sense of this diverse body of research, which spans the disciplines of finance, economics, computer science, and decision sciences. Using the Latent Dirichlet Allocation (LDA) technique, we extract 14 coherent research topics that are the focus of the 6,148 academic articles analysed, all published between 1990 and 2019. We first describe and structure these topics, and then show how the topic focus has evolved over the last two decades. Our study thus provides a structured topography for finance researchers seeking to integrate machine learning approaches into their exploration of finance phenomena. We also showcase the benefits to finance researchers of probabilistic topic modelling for deep comprehension of a body of literature, especially when that literature has diverse, multi-disciplinary contributors.
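To make the topic modelling step concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, one common inference scheme for the model. It is an illustration only and not the pipeline used in the paper: the function name `fit_lda`, the four-document toy corpus, the choice of two topics, and the hyperparameters `alpha` and `beta` are all assumptions made for this example. The study itself reports 14 topics over 6,148 articles.

```python
import random


def fit_lda(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Hypothetical helper for illustration; hyperparameters are assumed values.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Assign every token to a random topic to start.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional probability of each topic
                # given all other token assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions; each row sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Three most frequent words assigned to each topic.
    top_words = [
        [vocab[v] for v in sorted(range(n_words), key=lambda v: -topic_word[k][v])[:3]]
        for k in range(n_topics)
    ]
    return theta, top_words


# Toy corpus (assumed): two credit-risk texts and two stock-forecasting texts.
docs = [
    "credit scoring default risk bank".split(),
    "credit default risk loan bank".split(),
    "stock price forecast neural network".split(),
    "stock forecast price trading network".split(),
]
theta, top_words = fit_lda(docs, n_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

Each row of `theta` gives one document's estimated share of each topic. Averaging those shares over the articles published in a given year is one simple way to trace how topic focus shifts over time, although the sketch above does not claim to reproduce how the paper computed its trends.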



Author Biography

Ahmed Sameer El Khatib, Centro Universitário Fundação Assis Gurgacz

Professor, Department of Accounting






How to Cite

Sameer El Khatib, A. (2021). Machine Learning and Finance: A Review using Latent Dirichlet Allocation Technique (LDA). International Journal for Innovation Education and Research, 9(4), 29–55.
Received 2021-02-25
Accepted 2021-03-21
Published 2021-04-01