Job description
AI Data & Knowledge Engineer (Vector + Semantic Intelligence)
Location: Hyderabad, India
Employment Type: Full-Time; Salaried
Compensation: Base Salary, Bonus, Stock Options, Medical
About Us
Innovapptive is a category-defining SaaS company building the world’s most advanced AI-powered Connected Worker Platform for the industrial sector. Headquartered in Houston, TX, and backed by Vista Equity Partners, Innovapptive is on a mission to reimagine how industrial work gets done — connecting frontline workers, back-office systems, and assets in real time to drive higher safety, reliability, and productivity.
Our platform is trusted by some of the world’s largest enterprises across the chemicals, energy, mining, and manufacturing industries, including Shell, Hess, Cenovus Energy, W. R. Grace, Westlake Chemicals, UNICEF, Kimberly-Clark, Scotts Miracle-Gro, and Newmont Mining. These customers have achieved tangible, hard-dollar results, such as:
- Over $40 million in annual EBITDA savings at a single enterprise.
- 10× frontline productivity improvements through mobile-first and AI-enabled workflows.
- 15–20% maintenance cost reductions and significant uptime gains across plants and sites.
Innovapptive is recognized by leading industry analysts for its differentiated value creation model:
- Named a Leader in the Frost & Sullivan Industrial Connected Worker Radar.
- Featured by Gartner in “Connected Factory Worker” research and the Hype Cycle for Manufacturing Operations Strategy.
- Cited by LNS Research for delivering 3–5× greater value than point solutions through unified OT–IT execution, AI/vision innovation, and field-proven ROI.
Our growth is backed by Tiger Global Management and by Vista Equity Partners, one of the world’s premier software investors, known for scaling high-performance SaaS companies. Together, we are building the next-generation AI-powered Connected Worker Platform, combining cutting-edge technologies (SLMs, Generative AI, Computer Vision, and Intelligent Orchestration) into a unified system that transforms plant operations globally.
Today, with more than 300 employees across the U.S., India, and ANZ, Innovapptive stands at a pivotal inflection point, with an ambition to scale to $100M ARR within the next 3–4 years by industrializing every aspect of our product, engineering, and customer delivery systems.
The Role
- The AI Data & Knowledge Engineer will architect and operationalize Innovapptive’s semantic data intelligence layer — building the pipelines, vector stores, and retrieval frameworks that supply contextual understanding to AI Agents and enterprise workflows.
- Reporting to the VP of Technology & Architecture, this role is responsible for designing the RAG (Retrieval-Augmented Generation) and Vector Embedding pipelines that connect industrial data (SOPs, manuals, logs, SCADA readings, SAP records) with Innovapptive’s AI runtime.
- This is a hands-on, cross-disciplinary engineering role, blending data architecture, ML engineering, and semantic search design to make Innovapptive’s AI Agents contextually aware, accurate, and reliable.
How You Will Make An Impact
1. Architect the AI Knowledge and Data Layer
- Design and implement data ingestion and embedding pipelines to convert structured and unstructured content into vectorized representations.
- Build a unified data schema connecting maintenance, production, and safety data across SAP, Maximo, OSI PI, and SCADA systems.
- Integrate vector databases (Pinecone, Weaviate, Qdrant, or Chroma) into the AI Platform (MCP) to enable context-aware retrieval.
- Optimize query efficiency and relevance through hybrid search (semantic + keyword) and metadata tagging.
2. Operationalize RAG (Retrieval-Augmented Generation)
- Implement document chunking, embedding, and retrieval pipelines for PDFs, work orders, shift logs, and incident reports.
- Develop automated retraining and re-indexing mechanisms to ensure freshness of data.
- Collaborate with the AI Platform Architect to link retrieval flows into agent orchestration layers.
- Validate precision, recall, and latency metrics for semantic retrieval using real production workloads.
3. Build AI Data Governance and Observability
- Define data lineage, quality metrics, and access control for AI knowledge repositories.
- Embed telemetry for data latency, embedding drift, and retrieval accuracy into Datadog/Sentry dashboards.
- Partner with the Chief AI Architect to enforce compliance, explainability, and prompt context versioning standards.
4. Collaborate Across Product and Engineering
- Work with Product Managers and Solution Architects to identify key use cases for AI-driven search and knowledge retrieval.
- Partner with QA to build automated test frameworks for semantic accuracy and retrieval reliability.
- Collaborate with industrial data teams to extract and normalize sensor, historian, and SAP data for RAG integration.
5. Drive Continuous Innovation
- Evaluate emerging frameworks for knowledge graphs, embeddings, and contextual caching (e.g., LlamaIndex, LangChain, FAISS).
- Tune embeddings and hybrid retrieval strategies for domain-specific industrial vocabulary.
- Mentor developers on data preparation and retrieval design for AI-integrated product features.
What You Bring to The Team
- 8–12+ years of data or ML engineering experience, including 3+ years in semantic search, RAG, or vector database architecture.
- Proficiency with Python, SQL, and frameworks such as LangChain, LlamaIndex, or Haystack.
- Hands-on experience with vector databases (Pinecone, Weaviate, Qdrant, Chroma) and cloud data stores (AWS S3, DynamoDB, Redshift).
- Deep understanding of embedding models (OpenAI, Cohere, Sentence Transformers) and performance tuning for large-scale retrieval.
- Strong data pipeline experience (Airflow, Kafka, Temporal) and understanding of MLOps fundamentals.
- Familiarity with industrial data (SAP, Maximo, OSI PI, SCADA, MES) preferred.
- Excellent communication and documentation skills — able to translate data architecture into business and engineering language.
Success Metrics (First 90–180 Days)
- Vector Data Layer deployed with initial knowledge embeddings across 2 core domains (Maintenance + Safety).
- RAG pipelines operational, delivering ≥ 90% retrieval precision on selected test datasets.
- Telemetry dashboards live, showing retrieval latency, accuracy, and data freshness.
- Data-to-Agent API integrated into MCP and adopted by 2+ AI Agent families.
- Knowledge Playbook published — reusable design patterns for data ingestion, embeddings, and retrieval governance.
Why Does This Role Matter?
- The AI Data & Knowledge Engineer is the intelligence enabler behind every AI Agent. Without a robust, governed, and high-precision knowledge layer, AI features remain shallow and disconnected.
- This role transforms Innovapptive’s platform into a contextually aware, continuously learning system — where every AI decision is grounded in trusted enterprise and field data.
What We Offer
- Competitive compensation and equity tied to measurable impact on AI accuracy and performance.
- A platform to shape the semantic intelligence layer of a category-defining industrial SaaS company.
- Hybrid work model — Hyderabad or remote with periodic travel to Houston HQ.
- Access to cutting-edge AI, data, and observability toolchains for continuous learning and innovation.
Innovapptive does not accept and will not review unsolicited resumes from search firms.
Innovapptive is an equal opportunity employer and is committed to a diverse and inclusive workplace. Qualified applicants will receive consideration for employment without regard to race, color, religion or creed, alienage or citizenship status, political affiliation, marital or partnership status, age, national origin, ancestry, physical or mental disability, medical condition, veteran status, gender, gender identity, pregnancy, childbirth (or related medical conditions), sex, sexual orientation, sexual and other reproductive health decisions, genetic disorder, genetic predisposition, carrier status, military status, familial status, or domestic violence victim status and any other basis protected under federal, state, or local laws.