Machine Learning and Finance: A Review using Latent Dirichlet Allocation Technique (LDA)

Ahmed Sameer El Khatib

doi:10.31686/ijier.vol9.iss4.3016

Keywords:

Machine learning, topic modelling, structuring finance research, latent Dirichlet allocation

Abstract

The aim of this paper is to provide a first comprehensive structuring of the literature applying machine learning to finance. We use a probabilistic topic modelling approach to make sense of this diverse body of research, which spans the disciplines of finance, economics, computer science, and decision sciences. Using Latent Dirichlet Allocation (LDA), we extract 14 coherent research topics that are the focus of the 6,148 academic articles analysed, published during the years 1990-2019. We first describe and structure these topics, and then show how the topic focus has evolved over the last two decades. Our study thus provides a structured topography for finance researchers seeking to integrate machine learning approaches in their exploration of finance phenomena. We also showcase the benefits to finance researchers of probabilistic topic modelling for gaining a deep understanding of a body of literature, especially when that literature has diverse, multi-disciplinary contributors.

Author Biography

Ahmed Sameer El Khatib, Centro Universitário Fundação Assis Gurgacz

Professor, Department of Accounting
