Focusing on data systems
and open-source tooling.

I specialize in engineering large-scale data infrastructure, backend services, and generative AI systems. My work includes authoring the first native C++ Spark Connect client for the Apache Spark ecosystem and deploying production AI agents that support thousands of enterprise operators.

I prioritize system correctness, performance, and developer experience, working across components like gRPC transport layers, custom RAG pipelines, and high-throughput data pipelines.

3k+ Operators using deployed AI infrastructure
1st Native C++ Spark Connect client implementation
bn+ Row datasets optimized for low-latency queries
4+ Open-source repositories authored or maintained

Work History

06/2024 — Present Open Source

Contributor

Apache Spark Ecosystem

  • Designed and open-sourced the first native C++ client for Spark Connect, implementing the gRPC transport layer and Apache Arrow-based serialization to achieve API parity with the Python implementation.
  • Contributed new APIs and stability updates to the Spark Connect Rust connector, expanding test coverage and cross-language interoperability.
  • Authored and maintained technical documentation for the .NET connector and Kafka Delta Ingest integrations.
C++ · Rust · gRPC · Apache Arrow · Spark Connect
11/2023 — Present Full-time

Software Engineer

Griffin Global Technologies · Nyeri, Kenya

  • Built and deployed production AI agent infrastructure for enterprise search, automated data lineage tracking, and support automation, serving 3,000+ operators on Databricks, Vertex AI, LangChain, and AWS.
  • Developed QA retrieval systems that reduced internal investigation times by automating report parsing and eliminating manual multi-hour video reviews.
  • Integrated MLflow lifecycle management into automated agent workflows to support experiment tracking and multi-environment testing.
  • Optimized data delivery architectures to serve low-latency dashboard experiences over multi-billion row datasets using Delta Lake and Tableau.
  • Designed high-throughput ingestion pipelines using ClickHouse and Apache Druid for WSPR telemetry analytics.
  • Automated pipelines and operational KPI alerting using SQL, Python, and Databricks Workflows for near real-time anomaly detection.
Databricks · LangChain · Vertex AI · AWS · ClickHouse · Delta Lake · MLflow
11/2023 — Present Open Source

Author

Render Cloud Development Kit · Rust & C++

  • Developed the open-source Render CDK for Rust and C++ to provide Infrastructure-as-Code automation for cloud-native applications before official provider support was available.
Rust · C++ · IaC · Cloud Infrastructure
03/2021 — 04/2023 Previous

Software & Audio Engineer

Rogue Radio · Nairobi, Kenya

  • Engineered a low-latency audio streaming system using JavaScript and open-source broadcast tooling, resolving critical stability bugs to ensure reliable 24/7 web streaming.
  • Improved browser-based tools for real-time monitoring and remote hardware control.
JavaScript · Audio Streaming · WebRTC

Open Source Repositories

001

Spark Connect C++

The native C++ client implementation for Apache Spark Connect. Built the core architecture, gRPC transport layer, Apache Arrow data serialization, and automated CI/CD validation to handle high-throughput workloads.

C++ · Apache Spark · gRPC · Apache Arrow
002

Spark Connect Rust Connector

Contributor to the official Apache Spark Rust connector library. Added new APIs, improved reliability by expanding integration and unit test coverage, and strengthened cross-language interoperability.

Rust · Apache Spark · Testing
003

Rocket CLI

A command-line tool built in Rust for scaffolding Rocket web applications. Provides predictable directory layouts and options for setting up database backends such as PostgreSQL or MongoDB.

Rust · CLI · Rocket · PostgreSQL · MongoDB
004

Render CDK for Rust & C++

An Infrastructure-as-Code development kit allowing teams to handle cloud deployments programmatically through native systems languages.

Rust · C++ · IaC · DevOps

Technical Strengths

Languages

Rust
C++
Python
SQL
JavaScript

Data & AI Systems

Apache Spark
Databricks
Delta Lake
ClickHouse
Apache Druid

Generative AI & MLOps

LangChain / RAG
Vertex AI
MLflow
AWS (SageMaker, S3, EC2)
LLM Agent Design

Infrastructure & Systems

gRPC / Protobuf
Apache Arrow
CI/CD Pipelines
Infrastructure-as-Code

Get in touch.

Open to technical collaborations, open-source development roles, or engineering discussions regarding distributed systems, AI infrastructure, and systems languages.