SCB DataX Co., Ltd (DataX), a business data analysis service provider under the SCBX Group, is dedicated to enhancing competitiveness through data science, artificial intelligence (AI), and Large Language Models (LLMs). By harnessing AI, Big Data, and LLMs, the company aims to create more efficient and accurate financial and banking products and services. Its research papers on AI, LLMs, and Fintech have drawn international attention and have been presented at international venues: the Data-centric Machine Learning Research (DMLR) Workshop at ICLR 2024 in Vienna, Austria; the International Workshop on Semantic Evaluation (SemEval-2024) in Mexico City, Mexico; and the Financial Technology and Natural Language Processing (FinNLP) event. This reflects the expertise of the company's personnel in AI and Machine Learning, and aligns with the SCBX Group's vision of pioneering new technologies to establish long-term competitiveness and position itself as a leading regional financial technology group.
Here are the details of the three research studies conducted by SCB DataX Co., Ltd:
"Birbal: An Efficient 7B Instruct-Model Fine-Tuned with Curated Datasets": SCB DataX presented its research on the Birbal model at the Data-centric Machine Learning Research (DMLR) Workshop held during ICLR 2024 in Vienna, Austria. Birbal is a winning model based on Mistral-7B and fine-tuned on a single RTX 4090 GPU for 16 hours. Its training data was curated to provide high-quality instructions across a wide range of tasks, and the model achieved a 35% performance improvement over the second-best submission, which was based on Qwen-14B. Mr. Pawan Rajpoot, a Senior AI Scientist at DataX, co-authored this study.
"Team NP_PROBLEM at SemEval-2024 Task 7: Numerical Reasoning in Headline Generation with Preference Optimization": Presented at the International Workshop on Semantic Evaluation (SemEval-2024) in Mexico City, Mexico, this research focuses on numerical reasoning in headline generation. The team's system achieved a numerical accuracy of 73.49%. Beyond the reported score, the study describes the system design and analyzes common error patterns. By building numerical reasoning into Large Language Models (LLMs), the researchers show both the potential and the challenges of generating informative, numerically correct headlines. The study was co-authored by Mr. Pawan Rajpoot and Mr. Nut Chukamphaeng, both Senior AI Scientists at DataX.
- https://sites.google.com/view/numeval/numeval
"Adapting LLM to Multi-lingual ESG Impact and Length Prediction using In-context Learning and Fine-Tuning with Rationale": Presented at the Financial Technology and Natural Language Processing (FinNLP) event during COLING 2024, this research addresses predicting the impact and the duration of environmental, social, and governance (ESG) events from news articles. SCB DataX made these predictions with LLMs, using GPT-4 with In-context Learning (ICL) and a fine-tuned Mistral (7B) model. The lead author of this study is Mr. Pawan Rajpoot, a Senior AI Scientist at DataX.
- https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp-kdf-2024/accepted-papers