Qiming Bao

AI Research Scientist / Research Engineer

AI Research Scientist and Research Engineer specializing in LLMs, long-context VLMs, logical reasoning, multimodal document AI, and intelligent document processing. PhD in Computer Science, University of Auckland.

AI Researcher and Engineer @ Xtracta

LLMs + Long-Context VLMs + Logical Reasoning + Intelligent Document Processing

Ex-AIIT, Peking University, MSRA, Samsung AI UK

PhD, Strong AI Lab & NAOInstitute, University of Auckland

LinkedIn    GitHub    Gmail    Google Scholar    ResearchGate    DBLP    Twitter    CV    CV (Chinese)

Personal Details

Current Role

Qiming Bao is an AI Research Scientist and Engineer at Xtracta in Auckland, New Zealand, working on multimodal document AI and intelligent document processing. His work includes continual training of vision-language models such as Qwen3-VL with PEFT adapters, FlashAttention 2, and GPTQ int4 quantization, as well as long-context optimization for models such as LayoutLMv3 and ERNIE-LayoutX through efficient attention mechanisms, multimodal pre-training replication, and production-oriented deployment improvements.

PhD & Research

Qiming Bao received his PhD in Computer Science from the Strong AI Lab and NAOInstitute at the University of Auckland, supervised by Professor Michael Witbrock and Associate Professor Jiamou Liu. His research focuses on large language models, logical reasoning, neural-symbolic AI, multimodal document AI, and intelligent document processing. Before joining Xtracta, he worked at AIIT at Peking University on automatic abstract generation and GPT-2 based dialogue systems, and also contributed to early medical AI research through projects with Precision Driven Health & Orion Health.

Selected Achievements

Qiming has published in leading AI and NLP venues including ACL, AAAI, IJCAI, EACL, and IJCLR-NeSy. His AMR-LDA method achieved the #1 ranking on the ReClor leaderboard, and his datasets, including PARARULE-Plus and AbductionRules, have been adopted by multiple reasoning benchmark projects. Since 2025, he has also served as an Adjunct Associate Professor at Beijing International Studies University (BISU) and has delivered invited talks or academic visits at institutions including Microsoft Research Asia, Samsung AI Center Cambridge UK, Zhejiang University, the University of Melbourne, the Chinese Academy of Sciences, the University of Massachusetts Amherst, Penn State University, Tsinghua University, Max Planck Institute for Software Systems, and the Technical University of Munich.

Education

Research Interests

Large Language Models (LLMs), Vision-Language Models (VLMs), Long-Context Modeling, Logical Reasoning, Neural-Symbolic AI, Multimodal Document AI, Intelligent Document Processing, Document Understanding, OCR, and Information Extraction.

Publications

Work & Project Experience

Enhancing Max Sequence Length in Large Multimodal Models Xtracta, Auckland, New Zealand
Artificial Intelligence Researcher/Engineer 07/2022 – Present
  • Investigated and implemented alternative attention mechanisms to extend the effective sequence length in multi-modal document processing models such as LayoutLMv3 and ERNIE-LayoutX.
  • Applied the sliding window technique and a global attention mask from Longformer to extend the maximum sequence length from 512 to 4096, enabling LayoutLMv3 and ERNIE-LayoutX to achieve higher F1 scores on XFUND, FUNSD, and internal datasets without significantly increasing GPU memory usage.
  • Replicated the multi-task, multimodal pre-training code for LayoutLMv3, which Microsoft did not open source, including masked language modeling, masked image modeling, and word-patch alignment.
  • Integrated DeepSpeed and adapters into ERNIE-LayoutX and LayoutLMv3, reducing training costs, shrinking model size, and simplifying deployment to the production environment.
  • Successfully applied for Research & Development Tax Incentive (RDTI) grants from Callaghan Innovation (New Zealand's innovation agency) for both 2022 and 2023, each providing a tax credit equal to 15% of eligible R&D expenditure, which can be used to reduce the company's income tax payable.
  • Integrated FlashAttention 2 into the self-attention layers of ERNIE-LayoutX, reducing peak training GPU memory usage by up to 50% under FP16.
  • Applied affine transformations for data augmentation during training, improving robustness to line-alignment issues in document extraction.
  • Continually trained Qwen3-VL-8B using PEFT adapters, FlashAttention 2, and GPTQ int4 quantization, enabling training on 8×H200 GPUs.
  • Added page embeddings to vision-language models (Qwen3-VL-8B and ERNIE-LayoutX), improving performance by more than 15% on fields that recur on every page of a multi-page document, such as supplier names or bank names.
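The Longformer-style attention described above combines a sliding local window with a few globally attending tokens. A minimal sketch of the corresponding boolean attention mask, in pure Python (the function name and signature are illustrative, not the production code):

```python
def longformer_style_mask(seq_len, window, global_positions):
    """Build a boolean mask where mask[i][j] is True if token i may
    attend to token j. Each token sees a local window of neighbours;
    tokens in global_positions (e.g. [CLS]) attend to, and are
    attended by, every position."""
    mask = [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]
    for g in global_positions:
        for j in range(seq_len):
            mask[g][j] = True  # global token attends everywhere
            mask[j][g] = True  # every token attends to the global token
    return mask
```

Because each row has only O(window) local entries plus a constant number of global ones, memory grows linearly rather than quadratically with sequence length, which is what makes the 512 → 4096 extension affordable.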
Head of AI — Cognitive ChatGPT Product MVP (GenAI) Kerrio.ai (backed by the son of Otto Happel, a German billionaire and former CEO of GEA), Auckland, New Zealand
Head of AI / Lead Engineer (Hands-on MVP Builder) 11/2025 – Present
  • Owned end-to-end delivery of a cognitive ChatGPT-style AI product MVP, from use-case definition and rapid prototyping to implementation and iteration based on user feedback.
  • Built a coaching-oriented conversational AI pipeline leveraging LLM prompting and structured training data to improve response quality and user outcomes, focusing on maintainability and fast experimentation.
  • Established an evaluation and iteration loop (data collection → prompt/logic refinement → regression checks) to deliver predictable quality improvements across MVP releases.
  • Collaborated with product and stakeholders to translate requirements into technical milestones, delivering in agile sprint cycles.
  • Project reference: github.com/14H034160212/gptcoaching_mi_training
Adjunct Associate Professor Beijing International Studies University (BISU), Beijing, China
Adjunct Associate Professor 2025 – Present
  • Adjunct appointment since 2025; involved in academic collaboration and mentoring on AI/NLP/multimodal systems.
  • Research focus (BISU): Multimodal Experiments for Short Drama Translation (Subtitle Recognition + Translation + TTS). Built a multimodal subtitle translation system for short dramas, covering Subtitle Recognition (VLM/OCR), Subtitle Translation (LoRA fine-tuning), and Japanese TTS (zero-shot voice cloning), with scripted evaluation and reproducible environments. Project reference: github.com/14H034160212/translation
  • Subtitle recognition benchmarking across Qwen2-VL / Qwen3-VL / InternVL2 and traditional OCR baselines (EasyOCR, RapidOCR); ran FPS sensitivity ablation (1fps/2fps/5fps) and temporal deduplication to improve subtitle recall for fast-paced dialogue.
  • TTS comparison (GPT-SoVITS v3 / F5-TTS / EdgeTTS) using Whisper-based WER/CER; implemented Adaptive Fusion (ASR + OCR) to correct ASR hallucinations using visual context.
  • Status: Manuscript is under submission. In parallel, actively preparing proposals for the National Natural Science Foundation of China (NSFC).
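The temporal deduplication step mentioned above collapses per-frame OCR output into distinct subtitle lines. A minimal sketch of the idea, assuming near-identical consecutive frames belong to the same on-screen subtitle (the function name, threshold, and use of difflib are illustrative choices, not the project's implementation):

```python
from difflib import SequenceMatcher

def dedupe_subtitles(frames, threshold=0.9):
    """Collapse per-frame OCR text into distinct subtitle lines.
    Consecutive frames whose text is near-identical (similarity
    ratio >= threshold) are treated as the same subtitle."""
    out = []
    for text in frames:
        if out and SequenceMatcher(None, out[-1], text).ratio() >= threshold:
            continue  # same subtitle still on screen; skip the repeat
        out.append(text)
    return out
```

At higher sampling rates (e.g. 5 fps) most frames repeat the same subtitle, so a step like this keeps recall gains from dense sampling without inflating the transcript.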
Applied AI Systems & Decision Optimisation (Selected Projects) Auckland, New Zealand
Personal / Mentored Engineering Projects 2025 – Present
  • AlphaTrader — OpenClaw + DeepSeek stock prediction / signal + backtesting platform; includes Docker, GitHub Actions, unit tests, automated regression tests, and reproducible experiment pipelines. Project reference: github.com/14H034160212/AlphaTrader
  • AuroraBid — RL + Bandit ad bidding / auction demo (OAG Career mentor); includes reproducible simulation runs with CI/CD automation, tests, and experiment tracking patterns. Project reference: github.com/14H034160212/AuroraBid
Large Language Model and Logical Reasoning (Ph.D. Main Topic) UoA, Auckland, New Zealand
Research & Development Project Leader/Developer 02/2020 – 09/2025
  • Recipient of research funding for the Strong AI Lab project (Grant No. 5000675), awarded by the Tertiary Education Commission under the Entrepreneurial Research Funding program, with a total grant of NZD 9.6 million. Qiming Bao was primarily responsible for the logical reasoning research direction within this project.
  • We developed an LLM-based iterative enhancement framework for generating explanations, in which an explanation generation module and an explanation evaluation module interact iteratively to improve the quality of the generated explanations. Our paper has been accepted by AAAI Proceedings (2025) and AGI@ICLR (2024). paper and source code.
  • Our method "AMR-LDA" (GPT-4 + AMR-LDA prompt augmentation) achieved #1 on the ReClor leaderboard; we were the first group worldwide to score above 90% on the hidden test set. Our paper has been accepted by Findings of ACL-24 and LLM@IJCAI'23. paper, source code and model weights.
  • We evaluated generative and discriminative large language models on out-of-distribution logical reasoning tasks. While these models perform well on standard benchmarks, even minor changes to the input cause significant performance drops, highlighting their limited reasoning capabilities. Our papers were accepted at LLM@IJCAI'23 and IJCAI 2024 (cited over 100 times), and the corresponding source code is available on GitHub.
  • To address depth imbalance in multi-step reasoning datasets and enhance model performance, we created the IMA-GloVe-GA model, combining DeepLogic with Gated Attention. Additionally, we developed a larger dataset, PARARULE-Plus, for deep multi-step reasoning over natural language. We published the paper, code and data and presentation recording on IJCLR-NeSy-22.
  • We built a dataset called AbductionRules to improve Transformer performance on tasks requiring abductive reasoning. We published the paper, code and data in the Findings of ACL-22.
  • The PARARULE-Plus (multi-step deductive reasoning) and AbductionRules (abductive reasoning) datasets have been collected and merged into LogiTorch.ai, ReasoningNLP, Prompt4ReasoningPapers, OpenAI/Evals, A Survey on Evaluation of Large Language Models, and Reasoning Language Models: A Blueprint.
  • OpenAI Evals contributions: github.com/openai/evals#648, github.com/openai/evals#651.
  • Additional reinforcement learning projects: github.com/14H034160212/Explanation-Generation and github.com/14H034160212/lemo.
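The logic-driven augmentation behind AMR-LDA rewrites sentences into logically equivalent forms. As a toy string-level illustration of one such law, contraposition ("if A then B" ⇔ "if not B then not A") can be sketched as below; AMR-LDA actually performs these rewrites on AMR graphs rather than raw text, and this helper is purely hypothetical:

```python
def contrapositive(rule):
    """Rewrite 'If A then B.' as its logically equivalent
    contrapositive 'If it is not the case that B then it is not
    the case that A.' String-level toy sketch only."""
    body = rule.rstrip(".")
    if not body.lower().startswith("if ") or " then " not in body:
        raise ValueError("expected a rule of the form 'If A then B.'")
    antecedent, consequent = body[3:].split(" then ", 1)
    return (f"If it is not the case that {consequent} "
            f"then it is not the case that {antecedent}.")
```

Pairing each original rule with equivalent rewrites like this lets an augmentation pipeline test, and train for, invariance of a model's predictions under logical equivalence.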
Abstract Extraction and Multi-Turn Dialogue System Advanced Institute of Information Technology, Peking University, Hangzhou, China
Research and Development Engineer 11/2019 – 02/2020
  • Developed and researched a dialogue-robot system covering automatic abstract extraction, text segmentation, theme prediction, and multi-turn question answering.
  • Investigated robot-related technologies and produced standards documentation for them.
  • Built a well-encapsulated API for meeting-record document processing based on the abstract extraction, text segmentation, and theme prediction components.
HHH: An Online Medical Chatbot System Precision Driven Health & Orion Health, Auckland, New Zealand
Research Project Leader and Developer 11/2018 – 04/2019
  • We developed a medical text similarity algorithm called HBAM using a pre-trained language model and a knowledge graph.
  • HBAM achieves higher test accuracy than both the BERT and MaLSTM deep learning baselines. code (#star: 90+), news, recording and published paper (#citation: 80+) on ACSW-20.
  • NEW: Built a medical PII detection and redaction pipeline using spaCy, automatically identifying and replacing sensitive information (e.g., names, addresses, IDs) for privacy-preserving training and inference; project backed by AUT Venture investment (NZD 20,000).

Invited Speaker/Visiting Scholar

Conference Reviewer

Journal Reviewer

Magazine Guest Editor

Teaching/Grant Experience and Other Achievements

The University of Auckland

Monash University & Southeast University Joint Graduate School (Monash-SEU JGS)