Moniepoint welcomed Dr Victor Odumuyiwa, Research Lead of the Machine Intelligence Research Group at UNILAG.
In his presentation, he explored how transformer models are applied to financial texts to uncover customer insights.
Objective and Introduction
The fundamental challenge in leveraging machine intelligence is semantics. How do we ensure machines understand not just the text, but the context surrounding that information? A vast array of possibilities opens up if a machine can truly understand human communication.
For organisations dealing with high volumes of customer interactions, this context is crucial. Natural Language Processing (NLP), a subfield of Artificial Intelligence (AI), focuses on programming computers to process and analyse human language data. This guide will detail how Transformer models, the current state-of-the-art in NLP, can be applied to financial texts to generate actionable insights automatically.
Prerequisites / What You'll Need
Familiarity with foundational AI/ML concepts (RNN, LSTM, Neural Networks).
Basic understanding of NLP tasks (Classification, Translation, Generation).
Knowledge of metrics used for model performance evaluation (Precision, Recall/Sensitivity, Accuracy).
The Foundation: The Rise of the Transformer Architecture
The journey toward contextual understanding evolved through several stages: starting with Recurrent Neural Networks (RNN), moving to Long Short-Term Memory (LSTM), and culminating in the Transformer architecture.
A key limitation of earlier models like LSTM was the difficulty in maintaining context over long sequences (the vanishing gradient problem), even with improvements like Bi-LSTM (processing in both directions).
The breakthrough came in 2017 when Google researchers published the paper "Attention Is All You Need". The attention mechanism transformed NLP by letting a model relate each word to every other word in a sequence, capturing relationships across the entire input, which is critical for human language comprehension.
Core Components of the Transformer
The Transformer relies on several mechanisms to capture semantic meaning and positional context:
Tokenisation and Embedding: Input text is broken down into numerical tokens (words or subwords), which are then converted into numerical vectors (embeddings) to capture their semantic meaning.
Positional Encoding: Since the model processes input in parallel, vectors are added to the input embeddings to provide information about each token's position, ensuring the model understands the order of the sequence.
Self-Attention Mechanism: This is the core component that determines the relationship between all tokens in a sequence simultaneously, using three vectors for each token:
Query (Q): Represents the current token being processed.
Key (K): Acts as a label for every other token in the sequence.
Value (V): Contains the actual content of the tokens. By computing attention scores between the Query and all Keys, the model learns how much attention to pay to each Value, forming a weighted, contextual representation.
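The components above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not a production implementation: the toy sizes, random weights, and single sequence are all assumptions made for demonstration.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings, as in "Attention Is All You Need"."""
    pos = np.arange(seq_len)[:, None]               # token positions
    i = np.arange(d_model)[None, :]                 # embedding dimensions
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])          # even dimensions: sine
    enc[:, 1::2] = np.cos(angles[:, 1::2])          # odd dimensions: cosine
    return enc

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over one sequence."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v             # project tokens into Query/Key/Value spaces
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # similarity of each Query to every Key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> attention weights
    return weights @ V, weights                     # weighted sum of Values per token

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                             # toy sizes: 4 tokens, 8-dim embeddings
X = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)
print(out.shape)             # one contextual vector per token
print(weights.sum(axis=-1))  # each token's attention weights sum to 1
```

Each row of `weights` shows how much one token attends to every other token; the output is the weighted, contextual representation described above.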
BERT (Bidirectional Encoder Representations from Transformers): Introduced in 2018, BERT refined this by reading text bidirectionally, producing richer contextual representations. It is pre-trained with masked language modelling: tokens are hidden at random and the model learns to predict them from the surrounding context, a technique that underpins its strong performance on downstream tasks.
Key NLP Tasks in Financial Services
These advanced models are applied across several critical NLP tasks, moving far beyond simple text processing to semantic understanding.
Sentiment Analysis (Customer Opinion)
Sentiment analysis determines the opinion expressed in a piece of text (e.g., positive, negative, neutral, angry, or hateful). This is essential for understanding customer intimacy and public perception, especially regarding new products or services.
In testing various models on financial datasets (including customer complaints and financial news), BERT consistently showed better Accuracy, Precision, and Recall results compared to traditional machine learning models like SVM (using TF-IDF) and older neural networks like LSTM. For example, on one financial sentiment analysis dataset (Dataset 2), BERT achieved 83% accuracy, significantly outperforming SVM (68%) and LSTM (70%).
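The metrics behind these comparisons are straightforward to compute. A minimal sketch with an invented binary sentiment classifier (1 = positive, 0 = negative); the labels and predictions below are purely illustrative:

```python
# Toy evaluation of a binary sentiment classifier; labels are illustrative only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision = tp / (tp + fp)  # of everything flagged positive, how much was right
recall = tp / (tp + fn)     # of all true positives, how many were found

print(accuracy, precision, recall)
```

Reporting all three matters because a classifier can score high accuracy on imbalanced complaint data while still missing most of the class you care about.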
Financial Named Entity Recognition (NER)
NER automatically identifies the named entities in a document (such as countries, people, or organisations). This capability is critical for relationship extraction and for synthesising millions of business news articles and documents into concise summaries and insights for financial professionals.
Trained with optimisers such as AdamW and a cross-entropy loss, NER models can accurately tag these entities in sentences, whether built on BERT or on Llama models.
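The cross-entropy loss mentioned here can be sketched at the token level. The BIO tag set and the example sentence are assumptions for illustration; a real financial NER model would use a learned tagger rather than random logits:

```python
import numpy as np

# Hypothetical BIO tag set for financial NER (illustrative, not a standard).
TAGS = ["O", "B-ORG", "I-ORG", "B-MONEY", "I-MONEY"]

def token_cross_entropy(logits, gold_tags):
    """Mean cross-entropy over tokens -- the loss an NER model minimises (e.g. with AdamW)."""
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)          # softmax over tag scores per token
    nll = -np.log(probs[np.arange(len(gold_tags)), gold_tags])
    return nll.mean()                                      # average negative log-likelihood

# "World Bank granted 800 million" -> B-ORG I-ORG O B-MONEY I-MONEY
gold = np.array([TAGS.index(t) for t in ["B-ORG", "I-ORG", "O", "B-MONEY", "I-MONEY"]])
rng = np.random.default_rng(1)
logits = rng.normal(size=(5, len(TAGS)))   # an untrained model: random tag scores
confident = logits.copy()
confident[np.arange(5), gold] += 5.0       # a trained model: gold tags score higher

print(token_cross_entropy(logits, gold), token_cross_entropy(confident, gold))
```

Training simply pushes the loss from the first value toward the second: the optimiser adjusts parameters until the gold tag dominates each token's score distribution.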
Machine Translation
Machine translation has become highly effective, a marked contrast to the difficulties of earlier years. Systems built on Transformer architectures (such as Google's and OpenAI's models) can now translate complex financial documents, for example news concerning an $800 million World Bank grant to the Nigerian government, moving beyond literal word-for-word translation to preserve the overall meaning.
Financial Question Answering (QA)
Financial QA involves training a model on specific data, such as company financial reports (10-K filings). The trained machine can then accurately respond to particular questions, reducing the need for humans to handle routine queries. Models like RoBERTa have been trained to answer questions regarding financial details, such as the value of a one-time transaction tax.
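Extractive QA models of this kind typically score every possible answer span in the source text and return the best one. A minimal sketch of that span-selection step; the context tokens, logit values, and dollar figure are invented for illustration:

```python
import numpy as np

def best_span(start_logits, end_logits, max_len=8):
    """Pick the (start, end) span maximising start + end scores, as extractive QA heads do."""
    best, best_score = (0, 0), -np.inf
    for s in range(len(start_logits)):
        for e in range(s, min(s + max_len, len(start_logits))):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

# Hypothetical snippet standing in for a 10-K filing; values are illustrative only.
context = "the one-time transaction tax amounted to $ 6.8 million in fiscal 2023".split()
start_logits = np.full(len(context), -2.0)
end_logits = np.full(len(context), -2.0)
start_logits[6] = 4.0   # model scores "$" as the likely answer start
end_logits[8] = 4.0     # and "million" as the likely answer end

s, e = best_span(start_logits, end_logits)
print(" ".join(context[s:e + 1]))  # the extracted answer span
```

A fine-tuned model such as RoBERTa produces these start/end logits from the question and context; the span search itself is just this loop plus a length cap.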
Generating Actionable Customer Insights
The true value of these NLP tasks lies in combining them to generate comprehensive customer insights, rather than treating them as scattered tasks.
A system can take unstructured data (like a customer complaint), perform several tasks, such as category detection (e.g., Debt collection), priority detection (e.g., Urgent), and sentiment detection (e.g., neutral), and then feed this classified information into an LLM or Transformer to generate detailed summaries.
Example Insight Generation from a Customer Complaint: Based on a complaint detailing an unrecognised debt, an unauthorised account levy, and communication failures due to an outdated address, the combined system generates multi-faceted insights:
Disputed Debt Handling Issues: The required process (affidavit of fraud, police report) was overly burdensome, leading to unresolved disputes.
Communication and Notification Failures: Legal correspondence was sent to an outdated address, despite the company having the correct, current address on file from a recent payment.
Data Management Concerns: This situation reveals a clear breakdown in maintaining and cross-referencing up-to-date customer information, suggesting poor data management.
Customer Impact: The resulting judgment caused a financial loss ($830 levied) and significant emotional distress for the customer.
Finally, the system suggests Process Improvement Opportunities and Actionable Insights, such as implementing robust address verification protocols before legal actions and simplifying the debt dispute process.
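The combined pipeline described above can be sketched as follows. The keyword-based classifier stubs are hypothetical stand-ins for the trained models discussed earlier; a real system would call BERT-class classifiers and then an LLM:

```python
# Stand-in classifiers -- in practice these would be fine-tuned Transformer models.
def detect_category(text):
    return "Debt collection" if "debt" in text.lower() else "General"

def detect_priority(text):
    return "Urgent" if any(w in text.lower() for w in ("levy", "court", "judgment")) else "Normal"

def detect_sentiment(text):
    return "neutral"  # placeholder for a BERT sentiment classifier

def build_insight_prompt(complaint):
    """Classify the complaint, then assemble a prompt for the generation model."""
    record = {
        "category": detect_category(complaint),
        "priority": detect_priority(complaint),
        "sentiment": detect_sentiment(complaint),
    }
    prompt = (
        f"Complaint (category: {record['category']}, priority: {record['priority']}, "
        f"sentiment: {record['sentiment']}):\n{complaint}\n\n"
        "Summarise the process failures and suggest actionable improvements."
    )
    return record, prompt

complaint = "I do not recognise this debt, yet a levy of $830 was taken from my account."
record, prompt = build_insight_prompt(complaint)
print(record)
```

The structured record makes the downstream generation controllable: the LLM receives not just raw text but the pipeline's classifications, which anchor the insights it produces.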
Troubleshooting & FAQ
Question: When is it most efficient to fine-tune an LLM versus prompt engineering?
Answer: Fine-tuning and prompt engineering serve different goals, but both depend heavily on the context.
Fine-tuning is primarily used to adapt a general model (pre-trained on broad language) to a specific domain or form (e.g., financial jargon, JSON output format). You are modifying the model's parameters to understand your particular data better. For fine-tuning to be effective, you need a substantial dataset (often hundreds of thousands of examples); smaller datasets may yield inaccurate results.
Prompt Engineering (or prompt tuning) helps provide specific facts or dictate behaviour, but the size of the context window limits it. If your context is extremely large (e.g., many data sources), prompt tuning may not be viable due to token limits.
Question: For Financial Question Answering, should we use Retrieval-Augmented Generation (RAG) or train from scratch?
Answer: While you can train a small language model entirely from scratch if you have sufficient data and compute power, RAG is generally recommended for integrating real-time or proprietary financial data.
RAG augments the knowledge of a pre-trained model (which is frozen at its training cutoff, possibly months in the past) by fetching current, external, or private information. This makes it possible to integrate new data into the model's knowledge base without retraining the expensive base model every day. RAG also provides a mechanism for an LLM to interact with sensitive resources, like your internal database, by creating a system that extracts context-specific information for generation while keeping the database itself secure.
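The retrieval step at the heart of RAG can be sketched very simply. This toy version ranks documents by cosine similarity of word counts; production systems use dense embeddings and a vector store, and the documents here are invented for illustration:

```python
import math
import re
from collections import Counter

# Toy document store; the texts are invented for illustration.
docs = [
    "The World Bank approved an 800 million dollar grant to the Nigerian government.",
    "Quarterly revenue grew by twelve percent year on year.",
    "The board announced a new chief risk officer.",
]

def bow(text):
    """Bag-of-words vector: a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def retrieve(query, docs):
    """Return the document most similar to the query."""
    q = bow(query)
    return max(docs, key=lambda d: cosine(q, bow(d)))

query = "How large was the World Bank grant?"
context = retrieve(query, docs)
# The retrieved context is injected into the prompt; the base model is never retrained.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(context)
```

Swapping the bag-of-words scorer for a learned embedding model and the list for a vector database gives the standard RAG architecture, with the same prompt-assembly step at the end.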
Conclusion: The Road Ahead for African AI
Transformer models provide unprecedented power for extracting insight from text. However, the future of AI in regions like Africa requires a focus on AI localism.
Since most African languages are spoken rather than written, focusing on multimodal data (spoken and acoustic data) is essential to drive financial inclusion. Currently, most African languages (like Yorùbá, Igbo, and Hausa) are poorly represented in global AI tools due to factors like the absence of standardised orthographies and the paucity or absence of digital corpora (web-scale data).
To maximise the benefits of AI for African populations, there must be intentionality and strong collaboration between industry, academia, and philanthropy. The strategic direction should focus on building and using Small Language Models (SLMs) tailored to local contexts, rather than trying to compete in the race for Artificial General Intelligence (AGI). This focus will ensure that ordinary people, such as small business owners, can benefit from AI by receiving truly contextual and relevant advice.