AI Engineer, Multimodal (Vision + Speech)

Full-time · Frisco, TX (Hybrid)

Responsibilities

  • Build multimodal AI features involving vision, speech, and text understanding.
  • Fine-tune and evaluate vision and speech models for domain-specific tasks.
  • Optimize inference pipelines for real-time interaction.
  • Collaborate with product teams to design multimodal user experiences.
  • Contribute to internal tooling and evaluation frameworks.

Requirements

  • 3+ years of experience working with computer vision or speech models.
  • Hands-on experience with PyTorch, TensorFlow, or JAX.
  • Familiarity with multimodal model architectures (e.g., vision-language models).
  • Experience with real-time audio or video processing pipelines is a plus.
  • Strong math background (linear algebra, probability, optimization).

Technologies & Skills

Computer Vision, Speech AI, Multimodal Models, Python, PyTorch

Ready to apply?

Join the Aionyx team and help us build the future of intelligent engineering.