About the Role
As a Senior Data Warehouse Engineer, you’ll architect and maintain a highly flexible, enterprise-scale data warehouse that powers both Web3 analytics and real-time online services. Leveraging deep expertise in data modeling, governance, and Big Data technologies (Hadoop, Spark, Hive, Flink, Kafka, etc.), you will design low-latency, high-throughput pipelines that ensure high quality, stability, and sub-second freshness for critical business applications. Your role spans building scalable ETL and streaming data pipelines, optimizing data serving for end users, and ensuring the platform meets the demands of petabyte-scale, Web3-native environments. You’ll collaborate across business and technical teams, mentor peers, and continuously evolve our data platform to support high-reliability services and innovative AI/Web3 applications.
Responsibilities
- Architect and implement a flexible, scalable data warehouse that supports Web3 analytics, on-chain/off-chain data integration, and real-time online services, accelerating delivery and reducing redundant development.
- Design, develop, test, deploy, and monitor both batch and streaming pipelines (ETL/ELT), ensuring latency ranging from sub-second to minutes, with strong SLAs for availability and stability.
- Build and optimize data models for transactions, assets, and social data; deliver pipelines with high concurrency and query performance to power Web3 insights.
- Lead governance initiatives by establishing metadata management and 100% data quality check (DQC) coverage, ensuring compliance, consistency, and trustworthiness of Web3 data assets.
- Partner with product and business teams to translate Web3 research, trading, and monitoring needs into robust data services, enabling AI-driven applications (e.g., LLM-based semantic query, intelligent monitoring).
- Mentor and guide peers in real-time data engineering best practices, fostering a culture of technical excellence and innovation.
Requirements
- 5+ years of hands-on experience designing and developing data lakes and data warehouse solutions.
- Deep expertise in data warehouse modeling and governance, including dimensional modeling, information factory and Data Vault methodologies, and “one data” principles.
- Proficiency in at least one of Java, Scala, or Python, plus strong Hive and Spark SQL programming skills.
- Practical experience with OLAP engines (e.g., Apache Kylin, Impala, Presto, Druid) and real-time serving systems.
- Proven track record of building both high-throughput batch pipelines and low-latency streaming pipelines (e.g., Flink, Kafka), with production SLAs for stability, availability, and sub-second freshness.
- Familiarity with core Big Data technologies (Hadoop, Hive, Spark, Flink, Delta Lake, Hudi, Presto, HBase, Kafka, Zookeeper, Airflow, Elasticsearch, Redis).
- Experience in Web3 data domains (on-chain/off-chain data, token/transaction/holder analytics) and ability to design data services powering online applications.
- AWS Big Data service experience is a plus.
- Strong analytical and system-design capabilities, with the ability to translate business requirements into scalable, high-quality data architectures.
- Collaborative mindset, skilled at building partnerships across teams and stakeholders.
- Preferred: Experience managing petabyte-scale data in Internet environments and resolving critical real-time production incidents.
- Bilingual proficiency in English and Mandarin is required to coordinate with overseas partners and stakeholders.