Binance

LLM Applied Data Scientist (RAG/ NLP)

Taiwan, Taipei

About the Role

We are seeking a highly skilled Research Scientist/Engineer to advance the reasoning and planning capabilities of large foundation models. In this role, you will enhance model performance across the entire development lifecycle—including data acquisition, supervised fine-tuning (SFT), reward modelling, and reinforcement learning—while driving innovations in reasoning and decision-making. You will synthesise large-scale, high-quality datasets through rewriting, augmentation, and generation techniques to strengthen foundation models during pretraining, SFT, and RL stages. A key part of the role involves solving complex tasks using System 2 thinking and applying advanced decoding strategies such as MCTS and A*. You will design and implement robust evaluation methodologies, teach models to interact with external tools, APIs, and code interpreters, and build agents and multi-agent systems capable of addressing sophisticated real-world problems.

Responsibilities

  • Design, develop, and optimize data processing and retrieval pipelines for enterprise-level generative tasks and model training applications (Customer Service, Token Report, Web3 Domain Models). This includes embedding, reranking, context engineering, and query rewriting models.
  • Research and evaluate advanced AI-native retrieval algorithms (e.g., low-latency, multimodal retrieval, hierarchical retrieval, GraphRAG) to strengthen large-scale LLM/VLM/Agentic AI capabilities in Binance products.
  • Collaborate with infrastructure and application teams to integrate RAG pipelines into production systems, ensuring scalability, reliability, and measurable business impact.
  • Develop and optimize retrieval and ranking pipelines (indexing, vector search, retrieval scoring, reranking) to improve user experience.
  • Participate in LLM training and RAG system development, staying current with techniques such as pre-training, SFT, and reinforcement learning, and applying them to retrieval and generation tasks.
  • Apply NLP, CV, and multimodal methods to analyze user-generated content (classification, quality evaluation, trend detection, comment analysis).

Requirements

  • Master’s degree in Information Retrieval, NLP, Machine Learning, Computer Vision, Multimodal Learning, or a related field.
  • Proficient in PyTorch with strong coding skills in Python or C++.
  • Strong communication skills, intellectual curiosity, and passion for lifelong learning. Able to identify opportunities and drive cutting-edge retrieval & RAG technologies into real-world applications.
  • Solid theoretical foundation in information retrieval, NLP, and deep learning (experience with embeddings, reranking, query understanding preferred).
  • Hands-on experience with RAG, vector databases, multimodal/graph retrieval, or large-scale AI systems.
  • Strong engineering ability to translate research into scalable, production-level systems.
  • Self-driven, able to own projects end-to-end (design → implementation → deployment).
  • Publications in top-tier conferences/journals (NeurIPS, ICML, ACL, CVPR, SIGIR, KDD, WWW) are a plus; awards in ACM/ICPC or similar competitions preferred.
