Our research team publishes peer-reviewed work on annotation methodology, data quality metrics, and human-AI collaboration frameworks.
Peer-reviewed papers from top venues including NeurIPS, ICML, ACL, and CVPR.
We introduce a novel consensus mechanism that dynamically weights annotator contributions based on demonstrated expertise and agreement patterns, achieving a 12% improvement in label accuracy.
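As a rough illustration of the idea (not the paper's exact mechanism), the sketch below alternates between a weighted majority vote and re-estimating each annotator's weight from their agreement with the current consensus. All names, the update rule, and the number of rounds are illustrative assumptions.

```python
from collections import defaultdict

def weighted_consensus(labels, n_rounds=5):
    """Iteratively estimate consensus labels and annotator weights.

    labels: dict mapping item_id -> {annotator_id: label}.
    Illustrative sketch only; not the mechanism described in the paper.
    """
    annotators = {a for votes in labels.values() for a in votes}
    weights = {a: 1.0 for a in annotators}  # start with uniform trust
    consensus = {}

    for _ in range(n_rounds):
        # Weighted majority vote per item using current annotator weights.
        for item, votes in labels.items():
            scores = defaultdict(float)
            for annotator, label in votes.items():
                scores[label] += weights[annotator]
            consensus[item] = max(scores, key=scores.get)

        # Re-estimate each weight as agreement with the current consensus.
        for annotator in annotators:
            judged = [i for i, votes in labels.items() if annotator in votes]
            if judged:
                agree = sum(labels[i][annotator] == consensus[i] for i in judged)
                weights[annotator] = agree / len(judged)

    return consensus, weights

votes = {
    "ex1": {"a1": "cat", "a2": "cat", "a3": "dog"},
    "ex2": {"a1": "dog", "a2": "cat", "a3": "dog"},
}
print(weighted_consensus(votes))
```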
This paper presents a multi-axis preference framework that captures nuanced human judgments across helpfulness, accuracy, safety, and style dimensions simultaneously.
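A minimal sketch of what a multi-axis preference record could look like; the field names and rating scales below are assumptions for illustration, not the paper's actual schema.

```python
from dataclasses import dataclass

@dataclass
class PreferenceRating:
    """One annotator's judgment of a model response along several axes.

    Field names and scales are hypothetical, chosen only to illustrate
    capturing multiple dimensions in a single record.
    """
    response_id: str
    helpfulness: int        # e.g. 1-5 Likert
    accuracy: int
    safety: int
    style: int
    preferred_over_alt: bool  # preferred over the paired alternative?

rating = PreferenceRating("resp_42", helpfulness=4, accuracy=5,
                          safety=5, style=3, preferred_over_alt=True)
print(rating)
```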
We identify the conditions under which carefully validated synthetic data delivers better downstream performance than purely human-generated datasets across 8 NLP benchmarks.
We propose an active learning strategy specifically designed for LiDAR annotation that intelligently selects the most informative frames for human labeling.
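For intuition only: one simple acquisition heuristic is to rank frames by the mean entropy of the current detector's confidence scores and send the most uncertain frames to annotators. The sketch below uses that stand-in heuristic; the paper's actual selection strategy is not reproduced here.

```python
import numpy as np

def select_frames(frame_confidences, budget):
    """Pick the frames whose detections the model is least sure about.

    frame_confidences: list of arrays, one per frame, holding the model's
    per-object confidence scores. Mean binary entropy is used here as a
    simple stand-in acquisition function.
    """
    def mean_entropy(conf):
        conf = np.clip(np.asarray(conf, dtype=float), 1e-6, 1 - 1e-6)
        return float(np.mean(-conf * np.log(conf) - (1 - conf) * np.log(1 - conf)))

    scores = [mean_entropy(c) for c in frame_confidences]
    return np.argsort(scores)[::-1][:budget]  # most uncertain frames first

frames = [np.array([0.9, 0.95]), np.array([0.55, 0.6, 0.48]), np.array([0.99])]
print(select_frames(frames, budget=1))  # frame 1 is the least certain
```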
Rather than treating annotator disagreement as noise, we show how modeling disagreement distributions improves model calibration and uncertainty estimation.
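As a concrete example of one way to use disagreement rather than discard it: convert per-item annotator vote counts into soft label distributions and train against them with a soft cross-entropy loss. The sketch below assumes simple additive smoothing and is illustrative, not the paper's method.

```python
import numpy as np

def soft_targets(vote_counts, smoothing=1.0):
    """Turn per-item annotator vote counts into soft label distributions.

    vote_counts: (n_items, n_classes) array of raw counts. Additive
    smoothing keeps rare classes from collapsing to zero probability.
    """
    counts = np.asarray(vote_counts, dtype=float) + smoothing
    return counts / counts.sum(axis=1, keepdims=True)

def soft_cross_entropy(model_probs, targets):
    """Cross-entropy of model predictions against soft targets."""
    eps = 1e-12
    return float(-np.mean(np.sum(targets * np.log(model_probs + eps), axis=1)))

votes = [[8, 2, 0], [3, 3, 4]]           # 10 annotators per item, 3 classes
targets = soft_targets(votes)            # disagreement preserved as probabilities
preds = np.array([[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]])
print(targets)
print(soft_cross_entropy(preds, targets))
```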
We present a comprehensive framework for identifying demographic biases introduced during the annotation process and propose mitigation strategies that preserve data utility.
Developing better frameworks for task design, annotator training, quality measurement, and consensus mechanisms that scale to millions of labels.
Advancing preference learning, reward modeling, and human feedback collection methods for safer, more helpful AI systems.
Understanding when and how synthetic data can augment or replace human-generated training data while maintaining quality and diversity.
Detecting, measuring, and mitigating biases that arise during data collection and annotation processes across different demographic groups.
Designing optimal workflows where AI assists human annotators, and studying the effects of AI pre-labeling on human judgment and efficiency.
Developing new metrics and statistical methods for measuring data quality, annotator reliability, and label confidence at scale.
We believe in open science. Our tools and datasets are available to the research community.
Open-source library for computing annotation quality metrics including inter-annotator agreement, consensus scores, and label confidence estimation.
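For reference, the kind of metric such a library covers: Cohen's kappa, the chance-corrected agreement between two annotators. The snippet below is a self-contained illustration of the metric itself, not the library's API.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(labels_a) | set(labels_b))

    if p_chance == 1.0:   # both annotators used a single identical label
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)

print(cohens_kappa(["pos", "neg", "pos", "pos"],
                   ["pos", "neg", "neg", "pos"]))  # 0.5
```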
A standardized benchmark for evaluating RLHF preference data quality, including human agreement baselines and automated quality predictors.
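To illustrate what a human agreement baseline can look like on preference data, the sketch below computes how often annotators match the per-item majority choice between two responses. It is an illustrative calculation, not the benchmark's actual protocol.

```python
from collections import Counter

def human_agreement_baseline(preference_votes):
    """Average rate at which annotators match the per-item majority preference.

    preference_votes: list of per-item vote lists, each entry "A" or "B"
    (which response the annotator preferred). The result is the kind of
    ceiling a benchmark can report alongside automated quality predictors.
    """
    rates = []
    for votes in preference_votes:
        majority = Counter(votes).most_common(1)[0][1]
        rates.append(majority / len(votes))
    return sum(rates) / len(rates)

print(human_agreement_baseline([["A", "A", "B"], ["B", "B", "B"], ["A", "B", "A"]]))
```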
Toolkit for validating synthetic training data quality. Includes decontamination checks, diversity scoring, and automated quality filtering.
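As a sketch of two of the checks named above, here is a minimal n-gram overlap decontamination filter and a distinct-n diversity score. The 8-gram window, thresholds, and function names are illustrative assumptions, not the toolkit's API.

```python
def _ngrams(text, n):
    toks = text.lower().split()
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]

def is_contaminated(sample, benchmark_texts, n=8):
    """Flag a synthetic sample that shares any long n-gram with the eval set."""
    sample_grams = set(_ngrams(sample, n))
    return any(sample_grams & set(_ngrams(ref, n)) for ref in benchmark_texts)

def distinct_n(samples, n=2):
    """Distinct-n diversity: unique n-grams divided by total n-grams."""
    grams = [g for s in samples for g in _ngrams(s, n)]
    return len(set(grams)) / max(len(grams), 1)

benchmark = ["the quick brown fox jumps over the lazy dog near the river bank"]
sample = "a story: the quick brown fox jumps over the lazy dog near the river bank"
print(is_contaminated(sample, benchmark))            # True -> drop this sample
print(distinct_n(["red cat", "red cat", "blue dog"]))  # low score = low diversity
```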
Framework for detecting and mitigating demographic biases in annotation data. Includes bias metrics, visualization tools, and mitigation strategies.
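One simple bias indicator in the spirit of such metrics: the gap in how often each demographic group's content receives a given label. The sketch below is illustrative and does not reflect the framework's API.

```python
from collections import defaultdict

def label_rate_by_group(records, target_label):
    """Rate at which each demographic group receives `target_label`.

    records: iterable of (group, label) pairs. The gap between the highest
    and lowest group rates is one simple bias indicator; real analyses
    would also account for sample size and confounders.
    """
    totals, hits = defaultdict(int), defaultdict(int)
    for group, label in records:
        totals[group] += 1
        hits[group] += (label == target_label)
    rates = {g: hits[g] / totals[g] for g in totals}
    gap = max(rates.values()) - min(rates.values())
    return rates, gap

data = [("A", "toxic"), ("A", "ok"), ("B", "toxic"), ("B", "toxic"), ("B", "ok")]
print(label_rate_by_group(data, "toxic"))  # per-group rates and the gap
```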
PhD researchers from top institutions working on the hardest problems in data quality and human-AI collaboration.
We partner with academic institutions and research labs. Reach out to discuss collaboration opportunities.
Contact research team