Work & Project Experience
Xtracta, Auckland, New Zealand
07/22 – now
Investigated and implemented alternative attention mechanisms to extend the effective sequence length in multi-modal document processing models such as LayoutLMv3 and ERNIE-LayoutX.
Applied the sliding-window attention and global attention mask from Longformer to extend the maximum sequence length from 512 to 4096 tokens, enabling LayoutLMv3 and ERNIE-LayoutX to achieve higher F1 scores on XFUND, FUNSD and internal company datasets without significantly increasing GPU memory usage.
Replicated the multi-task, multimodal pre-training code for LayoutLMv3, which Microsoft did not open source, including masked language modeling, masked image modeling, and word-patch alignment.
Integrated DeepSpeed and adapters into ERNIE-LayoutX and LayoutLMv3, reducing training costs, shrinking the model size, and simplifying deployment to the production environment.
Successfully applied for the Research & Development Tax Incentive (RDTI) grants from Callaghan Innovation (New Zealand's Innovation Agency) for both 2022 and 2023, each offering a tax credit equal to 15% of eligible R&D expenditure. This credit can be utilised to reduce the income tax payable by the company.
Integrated FlashAttention-2 into the self-attention layers of ERNIE-LayoutX, reducing peak training GPU memory usage by up to 50% under FP16.
Applied affine transformations as data augmentation during training to improve the model's robustness to line-alignment issues in document extraction.
Used PEFT adapters to train Qwen2, the large language submodel of the multimodal model InternVL2, and combined this with continuous training, making it possible to train the 1-billion-parameter InternVL2 on a single A6000 GPU.
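As an illustration of the Longformer-style attention pattern mentioned above, the following is a minimal sketch, not the production implementation. The function names, the `window` parameter and the choice of token 0 as the global token are assumptions made for this example. It builds a dense boolean mask only to show which token pairs may attend to each other; Longformer itself computes the banded pattern without materialising the full matrix, which is where the memory savings come from.

```python
def sliding_window_global_mask(seq_len, window, global_positions=()):
    """Return a seq_len x seq_len boolean mask (hypothetical helper).

    mask[i][j] is True when token i may attend to token j.
    Each token sees neighbours within window // 2 positions on either side;
    tokens listed in global_positions attend to, and are attended by, all tokens.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    half = window // 2
    # Local band: |i - j| <= half
    mask = [[abs(i - j) <= half for j in range(seq_len)] for i in range(seq_len)]
    for g in set(global_positions):
        for k in range(seq_len):
            mask[g][k] = True  # the global token attends to every token
            mask[k][g] = True  # every token attends to the global token
    return mask


def count_allowed(mask):
    """Number of (query, key) pairs the mask permits."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    # Toy sizes for readability; the work described above used 4096 tokens.
    mask = sliding_window_global_mask(seq_len=8, window=2, global_positions=[0])
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
    print(count_allowed(mask), "of", 8 * 8, "pairs allowed")
```

With a fixed window, the number of permitted pairs grows roughly linearly with sequence length rather than quadratically, which is why the longer context does not require a proportional increase in GPU memory.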
UoA, Auckland, New Zealand
02/20 – 03/24
We developed an LLM-based iterative enhancement framework for generating explanations. The framework alternates between an explanation generation module and an explanation evaluation module to improve the quality of the generated explanations. Our paper was accepted at AGI@ICLR 2024. Paper and source code available.
Our method "AMR-LDA" (GPT-4 + AMR-LDA Prompt Augmentation) achieved #1 on the ReClor leaderboard, and we were the first group worldwide to score above 90% on the hidden test set. Our paper was accepted to the Findings of ACL-24 and LLM@IJCAI'23. Paper, source code and model weights available.
We evaluated generative and discriminative large language models on out-of-distribution logical reasoning tasks. While they excel at standard tasks, minor changes to the tasks lead to notable performance drops, indicating insufficient reasoning capabilities. Our paper was accepted at LLM@IJCAI'23. Paper and source code available.
To address depth imbalance in multi-step reasoning datasets and enhance model performance, we created the IMA-GloVe-GA model, combining DeepLogic with Gate Attention. Additionally, we developed a larger dataset, PARARULE-Plus, for deep multi-step reasoning over natural language. We published the paper, code, data and presentation recording at IJCLR-NeSy-22.
We built a dataset called AbductionRules to improve Transformer performance on tasks requiring abductive reasoning. We published the paper, code and data in the Findings of ACL-22.
The PARARULE-Plus (multi-step deductive reasoning) and AbductionRules (abductive reasoning) datasets have been collected and merged into LogiTorch.ai, ReasoningNLP, Prompt4ReasoningPapers and OpenAI/Evals.
Advanced Institute of Information Technology, Peking University, Hangzhou, China
11/19 – 02/20
We researched and developed a robot-based system featuring automatic abstract extraction, text segmentation, theme prediction, and multi-turn question answering.
We investigated robot-related technologies and prepared standard documentation for them.
We built a well-encapsulated API for processing meeting-record documents, based on the abstract extraction, text segmentation, and theme prediction components.
Precision Driven Health & Orion Health, Auckland, New Zealand
11/18 – 04/19
We developed a medical text similarity algorithm called HBAM using a pre-trained language model and a knowledge graph.
HBAM achieves higher test accuracy than both the BERT and MaLSTM deep learning models. Code (85+ stars), news, recording and published paper (60+ citations) at ACSW-20.