
Inference Engineer


About us

Mirai builds the fastest on-device inference engine for Apple Silicon. In under a year, a 14-person team built a full stack, from model optimization to a proprietary runtime, outperforming MLX and llama.cpp on supported models.

We’re making local inference practical, fast, and reliable for real products.

Why us?

Mirai is founded by proven entrepreneurs who built and scaled consumer AI leaders like Reface (200M+ users, backed by Andreessen Horowitz) and Prisma (100M+ users). Our team is small (14 people), senior, and deeply technical. We ship fast and own problems end-to-end.

We’re advised by a former Apple Distinguished Engineer who worked on MLX, and backed by leading AI-focused funds and individuals.


Responsibilities

You'll work across our inference engine and model conversion toolkit, implementing new model architectures, supporting new modalities, writing optimized kernels, and building a wide range of features such as function calling and batch decoding.

This role is ideal for someone who reads papers for fun, enjoys writing high-performance code, and gets excited about constant learning.

Requirements

  • JAX / Equinox / Pallas stack
  • Rust systems programming with a focus on developer experience
  • Writing Metal / Vulkan kernels
  • Neural codecs and voice model architectures
  • Trellis-based quantization approaches
  • Advanced speculative decoding methods, such as EAGLE
  • Deep understanding of Transformer / SSM / Diffusion / Vision-Language model architectures
  • Benchmarking inference performance and model quality
  • Strong linear algebra, optimization methods, and probability theory

Conditions

Remote / SF / Europe
