AI Data Specialist

  • SGS Consulting
  • Menlo Park, California
  • Full Time
Location (mandatory): Menlo Park, CA

Despite rapid advancements in generative AI, achieving high-quality generation remains a challenge. This is primarily due to the scarcity of high-quality training data and the lack of reliable, robust evaluation metrics that capture the subtle details in model outputs that can significantly affect user experience.

Our team is dedicated to developing comprehensive data curation and evaluation solutions to improve our model across various quality dimensions. These include visual quality, prompt adherence, identity preservation, naturalness, and visual text generation, among others. We employ diverse approaches, such as sourcing billions of images and identifying suitable ones through a combination of manual annotations and signals from machine learning models. We also use both manual and automated evaluation methods to pinpoint quality gaps and data requirements.

Job Responsibilities:
  • Data Curation: Manage data labeling workflows, including enqueueing data for labeling, the labeling UI, and extracting labels into datasets for the modeling team.
  • Data Engineering (Pipelines): Maintain large-scale, efficient, and reliable data processing pipelines (billions of images). This includes data sourcing, running machine learning models to understand content, and using LLMs to clean data.
  • Data Engineering (Governance): Maintain our portfolio of datasets, ensuring governance of access, retention, and privacy compliance.
  • Annotations: Spend time manually annotating training data based on modeling team requirements. Use LLMs and other models to annotate training data or to evaluate generated content, then audit the results to understand how well these models perform.
  • Analysis: Collaborate with engineers to identify and summarize model gaps based on evaluations. Use these findings to identify the data needed, then mine and prepare that data for subsequent model training iterations.
Skills:
  • Verbal and written communication skills, problem-solving skills, and interpersonal skills.
  • Attention to detail and an aptitude for experimental investigation.
  • Ability to work independently and manage one's time.
  • Basic knowledge of Python, Unix, and SQL.
  • Basic knowledge of computer vision and generative models.
  • Basic knowledge of data ETL workflows and pipelines.
  • Use of LLMs for data labeling work.

Education/Experience:
  • Associate's degree or equivalent training required in Computer Science, Electronic Engineering, Physics, Bioinformatics, or another STEM subject.
  • Prior industrial experience in software development and testing and/or research experience in human-computer interaction is preferred.
  • Prior experience at a MAANG company is preferred.
  • 1-2 years of experience.
Job ID: 518607524
Originally Posted on: 4/24/2026
