AI Engineer - LLM Integration/Platform Observability
Job Description
We are looking for a hands-on AI Engineer with strong expertise in LLM integration, platform observability, performance optimization, and API development. The ideal candidate will work on critical platform enhancements, including LLM API integrations, observability pipelines, structured search algorithms, and performance scaling for the customer's AI platform and related components.
You will collaborate with cross-functional teams to develop robust, scalable solutions, modernize our logging and monitoring infrastructure, and integrate advanced AI capabilities into production workflows.
Key Responsibilities:
1. LLM Integration & API Development
- Develop and maintain LLM API integration test cases for core model availability.
- Refactor and reorganize LLM API code (e.g., __init__.py) for better maintainability.
- Add support for Vertex AI batch generation and batch transcription processing.
- Implement multi-step structured search algorithms and tie model IDs to relevant endpoints.
- Explore and integrate emerging technologies such as LightRAG, SurrealDB, Neo4j, and PuppyGraph for structured search.
2. Platform Observability & Performance
- Implement Splunk OpenTelemetry (OTel) integration for monitoring and metrics.
- Evaluate and integrate Arize AI for observability and model evaluation frameworks.
- Optimize logging decorators and memory profiling for unit tests, and enhance Application Performance Monitoring (APM) solutions.
- Drive scaling and performance optimization for the JedAI platform.
3. Platform Integration & Testing
- Implement platform integration and availability testing frameworks.
- Centralize Postman test cases for integration testing.
- Clean up outdated tests and modernize Docker Compose setups for KB API development.
- Develop harness configurations for automated testing pipelines.
4. Architecture & Research Spikes
- Support JedAI architecture consulting efforts.
- Conduct spike investigations on new technologies and frameworks for performance and scalability.
- Explore Model Context Protocol (MCP) design options for multi-agent orchestration and AI-enhanced workflows.
Required Skills & Experience:
- Programming: Python (must-have), Node.js/Java (good to have)
- AI/ML Integration: Hands-on experience with LLM APIs (OpenAI, Vertex AI, etc.)
- Observability & Logging: Experience with Splunk, OpenTelemetry (OTel), Arize AI
- Testing & CI/CD: Proficiency with Postman, Pytest, Docker Compose
- Data & Search: Exposure to structured search techniques and tools (Neo4j, LightRAG, graph databases)
- Performance Tuning: Familiarity with memory profiling, performance optimization, and scaling techniques
- Cloud Platforms: GCP (Vertex AI), Azure, or AWS experience preferred
- Collaboration Tools: GitHub, Jira, Confluence
Preferred Qualifications:
- Bachelor's or Master's degree in Computer Science, AI/ML, or a related field
- 3–6 years of experience in AI/ML engineering or platform development
- Prior experience in AI observability or model evaluation pipelines
- Knowledge of agentic AI frameworks and multi-step reasoning systems