AI Researcher

Location: Seattle, WA (On-site)

Company Description

BluePill AI builds AI Consumers — digital twins of real audiences trained on social, survey, and research data that replicate how humans actually think, decide, and behave. Brands use BluePill to test product concepts, packaging, messaging, and strategy in minutes instead of months.

Our models are grounded in real human data and validated against live human panels, achieving up to 93% accuracy in replicating human responses. We obsess over model behavior, failure modes, and cultural drift — continuously refining systems so they stay aligned with how people actually think in the real world.

We’re not building demos. We’re building a new category of consumer intelligence.

Role Description

We’re looking for an AI Researcher who deeply understands LLMs from the inside — someone who doesn’t just call APIs, but tinkers, probes, breaks, fine-tunes, builds and rebuilds models to understand how and why they behave the way they do.

This role sits at the intersection of research and production. You’ll experiment aggressively, turn insights into shipped systems, and help define how AI Consumers are modeled, evaluated, and improved over time.

If you’ve spent nights testing prompt structures, tweaking sampling strategies, building eval harnesses, or chasing down weird model behaviors just because you were curious — you’ll feel at home here.

What You’ll Do

Design, experiment with, and optimize LLM-based systems for simulating human judgment, preference, and decision-making
Go beyond “prompting” — work with fine-tuning, embeddings, retrieval, memory, reasoning scaffolds, and evaluation frameworks
Design and run rigorous experiments to measure model improvements using sound experimental design and statistical analysis
Build and iterate on model evaluation pipelines to measure realism, consistency, bias, drift, calibration, and alignment with human data
Analyze LLM failure modes and edge cases, including issues related to uncertainty, truthfulness, and overconfidence, and design interventions to fix them
Translate research insights into production-ready systems used by real customers
Collaborate closely with product, behavioral science, and engineering to ship end-to-end features
Stay close to the frontier: experiment with new models, papers, and techniques — and decide what’s actually worth using

What We’re Looking For

Must-Have

Deep hands-on experience working with LLMs (OpenAI, Anthropic, open-source, or similar)
Strong intuition for how LLMs behave internally — not just how to use them
Experience building real products or systems with LLMs in production
Strong foundation in NLP, neural networks, and machine learning fundamentals
Proficiency with Python and modern ML tooling
Demonstrated proficiency in experimental design and statistical analysis for evaluating and improving models
Understanding of uncertainty estimation, calibration, and truthfulness in model outputs
Comfort moving between messy experiments and clean, scalable implementations

Bonus Points

Experience working in deep tech environments (hard problems, long feedback loops, non-obvious failure modes)
Experience training LLMs from scratch or at significant scale
Experience with fine-tuning, RLHF-style techniques, or large-scale evaluation systems
Familiarity with PyTorch, TensorFlow, JAX, or distributed training setups
Experience working with noisy, real-world human data

How We Think About This Role

This is not a purely academic research role
This is not an “AI engineer” position
This is a builder–researcher role for someone who loves understanding models by playing with them
Curiosity, taste, and judgment matter as much as credentials

Education

Bachelor’s or Master’s degree in Computer Science, AI, ML, or a related field preferred, but not required
Exceptional self-taught engineers with strong real-world experience are encouraged to apply

Salary: $160K – $200K, meaningful equity, 100% health benefits
