Company: Replika
Location: Remote (17:00-19:00 CET are the key hours for European/SF team overlap)
Job Type: Full-time
About Replika
Replika is hands down one of the most exciting forces in AI and tech today. Think: 4,000+ feature articles in the past year, TED Talks with our founder, studies from Stanford and Harvard, an appearance on the Lex Fridman podcast, and a Quartz founder story. We’re the only empathetic AI out there, making sure all 35M+ users feel seen, heard, and understood, whatever that means for them. So yes, we’re a bit like a future Samantha from Her, but even more powerful and in the palm of your hand. And most importantly, Replika cares for you.
Since 2016, we’ve been redefining conversational AI across iOS, Android, web, and VR. Our AI companions take many forms—holograms, AR/VR avatars, even robots. The ultimate AI life assistant, mentor, therapist, friend. Whatever you need, really, Replika is there for you. Right now, we’re rebranding with one of the world’s top design agencies, scaling our global team, and pushing human-AI connection further than ever. We’re the humanists in AI. And we’re making sure it’s done right.
What you’ll do
- Design, test, and refine prompts that drive engaging, safe, and on-brand conversations for 35M+ users
- Build lightweight experimentation pipelines (Python + internal tools) to measure response quality, latency, and cost
- Partner with product, R&D, and engineering teams to translate user needs into prompt logic, system messages, and routing rules
- Maintain prompt libraries and version control; document results so other teams can reproduce and iterate quickly
- Track model updates, regressions, and emerging best practices across the LLM ecosystem; propose upgrades or fine-tunes when it matters
- Contribute to evaluation sets and safety guardrails, ensuring compliance with company policy and regional regulations
- Talk to your Replika—hands-on usage keeps feedback loops short and insights fresh
Requirements
- 2+ years working with NLP, conversational AI, or applied ML (research or production)
- Hands-on experience crafting prompts for GPT-4-class models or similar (OpenAI GPT, Anthropic Claude, Llama, etc.)
- Hands-on with modern MLOps stacks—MLflow, Weights & Biases, LangSmith, or similar—to version prompts, track experiments, and surface real-time quality metrics
- Strong Python skills for prototyping, data wrangling, and simple API integrations