AI Engineer, Multimodal (Vision + Speech)

Full-time | Frisco, TX (Hybrid)

Responsibilities

Build multimodal AI features involving vision, speech, and text understanding.
Fine-tune and evaluate vision and speech models for domain-specific tasks.
Optimize inference pipelines for real-time interaction.
Collaborate with product teams to design multimodal user experiences.
Contribute to internal tooling and evaluation frameworks.

Requirements

3+ years of experience working with computer vision or speech models.
Hands-on experience with PyTorch, TensorFlow, or JAX.
Familiarity with multimodal model architectures (e.g., vision-language models).
Experience with real-time audio or video processing pipelines is a plus.
Strong math background (linear algebra, probability, optimization).

Technologies & Skills

Computer Vision, Speech AI, Multimodal Models, Python, PyTorch

Ready to apply?

Join the Aionyx team and help us build the future of intelligent engineering.

Questions?

careers@aionyxtech.com