Machine learning for financial transaction classification across companies using character-level word embeddings of text fields
Publication: Contribution to journal › Journal article › Research › Peer-reviewed
Machine learning for financial transaction classification across companies using character-level word embeddings of text fields. / Jørgensen, Rasmus Kær; Igel, Christian.
In: Intelligent Systems in Accounting, Finance and Management, Vol. 28, No. 3, 2021, pp. 159-172.
RIS
TY - JOUR
T1 - Machine learning for financial transaction classification across companies using character-level word embeddings of text fields
AU - Jørgensen, Rasmus Kær
AU - Igel, Christian
N1 - Publisher Copyright: © 2021 John Wiley & Sons, Ltd.
PY - 2021
Y1 - 2021
AB - An important initial step in accounting is mapping financial transfers to the corresponding accounts. We devised machine-learning-based systems that automate this process. They use word embeddings with character-level features to process transaction texts. When considering 473 companies independently, our approach achieved an average top-1 accuracy of 80.50%, outperforming baselines that exclude the transaction texts or rely on a lexical bag-of-words text representation. We extended the approach to generalize across companies and even across different corporate sectors. After standardization of the account structures and careful feature engineering, a single classifier trained on 44 companies from 28 sectors achieved a test accuracy of more than 80%. When trained on 43 companies and tested on the remaining one, the system achieved an average performance of 64.62%. This rate increased to nearly 70% when considering only the largest sector.
KW - accounting
KW - finance
KW - financial transactions
KW - multiclass classification
KW - random forest
KW - word embedding
U2 - 10.1002/isaf.1500
DO - 10.1002/isaf.1500
M3 - Journal article
AN - SCOPUS:85114320023
VL - 28
SP - 159
EP - 172
JO - Intelligent Systems in Accounting, Finance and Management
JF - Intelligent Systems in Accounting, Finance and Management
SN - 1550-1949
IS - 3
ER -
ID: 280029661
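The abstract reports that transaction texts are represented with word embeddings built from character-level features. The Python sketch below illustrates the general fastText-style idea under stated assumptions: each word is wrapped in boundary markers and split into character n-grams (n = 3 to 5), each n-gram is hashed to a bucket, and the word vector is the mean of the bucket vectors. The bucket vectors here are fixed pseudo-random stand-ins rather than trained embeddings, and the names DIM, BUCKETS, char_ngrams, word_vector and text_vector are hypothetical; this is not the authors' implementation.

```python
import hashlib
import math
import random

DIM = 256          # embedding size (assumed, illustrative only)
BUCKETS = 2 ** 20  # number of hash buckets for n-grams (assumed, fastText-style)


def char_ngrams(word, n_min=3, n_max=5):
    """Return the boundary-marked word itself plus its character n-grams."""
    token = f"<{word.lower()}>"
    grams = [token]
    for n in range(n_min, n_max + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams


def bucket(gram):
    """Hash an n-gram deterministically into one of BUCKETS buckets."""
    digest = hashlib.md5(gram.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % BUCKETS


def bucket_vector(b):
    # Stand-in for a learned embedding row: a fixed pseudo-random
    # Gaussian vector seeded by the bucket index (NOT trained).
    rng = random.Random(b)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def word_vector(word):
    """Average the vectors of all n-gram buckets of a word."""
    grams = char_ngrams(word)
    total = [0.0] * DIM
    for g in grams:
        for j, x in enumerate(bucket_vector(bucket(g))):
            total[j] += x
    return [x / len(grams) for x in total]


def text_vector(text):
    """Average word vectors over a whitespace-tokenized transaction text."""
    words = text.split()
    if not words:
        return [0.0] * DIM
    vecs = [word_vector(w) for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    # A misspelled token shares n-grams (e.g. "<pa", "pay", "paym") with the original.
    print(round(cosine(word_vector("payment"), word_vector("paymnt")), 2))
    print(round(cosine(word_vector("payment"), word_vector("salary")), 2))
```

Because a misspelled or abbreviated token such as "paymnt" shares several n-grams with "payment", their vectors tend to be closer than those of unrelated words, which is what makes character-level features attractive for noisy transaction texts. In a full system, the averaged text vector would be one input, alongside other transaction features, to a classifier such as the random forest named in the keywords.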