Machine Learning Operations Engineer (AI / LLM / MLOps)
Location: Remote
Employment Type: Full-time (Part-time possible with option to transition to full-time)
About Us
CoeFont builds high-quality AI voice synthesis and AI real-time translation products. One of our flagship products, Oshaberi Hiroyuki Maker, generated over 400 million characters of voice output and more than 10.9 million videos within just one week of its release.
AI sits at the core of both products, and we are seeking an MLOps Engineer to take a leading role in their development.
Employee Value Proposition
At CoeFont, we offer the following career opportunities for Product Engineers (MLOps):
- Deep involvement in product development
Work closely with product managers and UX designers, engaging consistently from planning and design through to release of our AI voice synthesis and AI real-time translation products. You will gain hands-on experience in directly contributing to product growth. In addition, you will collaborate with the research team to deliver AI models to users quickly and reliably.
- Acquisition of diverse technical skills
Beyond improving reliability and availability through site reliability engineering and infrastructure design, you will gain a wide range of technical expertise spanning backend development using Python and Golang, GPU inference platform design and operation, and batch workflow development.
- Exercising leadership across the engineering organization
Take responsibility for leading architectural design, providing technical guidance, and conducting code reviews for engineers both inside and outside your team, helping accelerate the growth of the entire engineering organization.
Performance Goals
After joining, MLOps Engineers are expected to:
- Maximize product value
Integrate features such as voice-quality enhancement and real-time translation models into the product roadmap. Be involved from planning through proof-of-concept, ensuring fast iteration, testing, and continuous delivery of user value.
- Lead development of a scalable MLOps platform
Own MLOps infrastructure, including GPU inference servers (ECS) and workflows (AWS Step Functions/Batch). Design monitoring, alerting, automation, and cost management from an SRE perspective, while also contributing to backend and infrastructure development.
- Provide technical leadership
Demonstrate leadership across teams, promote AI adoption, and support workshops and enablement efforts to raise AI literacy and overall organizational productivity.
Technical Stack
Languages
- Python (main)
- Golang (sub)
Infrastructure & Tools
- AWS (EC2, ECS, Lambda, Step Functions, AWS Batch, ECR, VPC, WAF, DynamoDB, Aurora)
- GCP (BigQuery)
- Datadog, Sentry
- GitHub Actions