The AI Interview: Alice Xiang, Sony
Summary
Alice Xiang, Global Head of AI Governance at Sony, advocates a fundamental shift in responsible AI development: attention to training data, not only to model outputs. She argues that biases and flaws in AI systems almost always originate in the underlying datasets, often because there are no clear standards for responsible data collection. Xiang highlights the ethical and technical risks of web-scraped data, which is frequently collected without consent and embeds cultural stereotypes, leading to discriminatory patterns and privacy violations. As an alternative, Sony AI developed FHIBE (Fair Human-Centric Image Benchmark), a dataset built on direct participant consent and broader demographic representation. Xiang acknowledges that ethical sourcing of this kind demands significant resources and involves perceived trade-offs in visual diversity. She nonetheless asserts that these efforts strengthen trust, reduce regulatory risk, and improve model performance, and she anticipates growing pressure for responsibly sourced datasets as regulation tightens.
Key takeaway
Directors of AI/ML evaluating data sourcing strategies should prioritize investing in consensually sourced datasets over traditional web scraping. This approach, though resource-intensive, mitigates ethical and technical risks such as embedded biases and privacy violations, while strengthening customer trust and improving model performance. Regulatory scrutiny is expected to increase, which makes ethical data collection a strategic imperative for long-term viability and responsible AI deployment.
Key insights
Responsible AI demands a deeper focus on ethically sourced training data, not just model outputs, to prevent inherent biases.
Principles
- Bias in AI systems originates in training data.
- Ethical data practices build trust and reduce risk.
- Technical innovation has outpaced ethical guidance in data collection.
Method
Develop consensually sourced datasets like FHIBE: collect images directly from participants with explicit consent, ensure broader demographic representation, and assess bias systematically.
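As a rough illustration of the method, the sketch below keeps only records with explicit consent and reports each demographic group's share of what remains, flagging groups that fall below a chosen threshold. It is not a description of how FHIBE was built, since the source does not detail that pipeline. The record fields (`consent`, `region`), the function name `representation_report`, and the 30% threshold are all hypothetical, chosen only for the example.

```python
from collections import Counter


def representation_report(records, group_key, min_share):
    """Summarize group representation among consented records.

    Hypothetical helper: returns (shares, underrepresented), where shares maps
    each group to its fraction of consented records and underrepresented lists
    the groups whose share is below min_share.
    """
    # Keep only records whose (assumed) consent flag is explicitly True.
    consented = [r for r in records if r.get("consent") is True]
    counts = Counter(r[group_key] for r in consented)
    total = sum(counts.values())
    shares = {g: n / total for g, n in counts.items()} if total else {}
    underrepresented = sorted(g for g, s in shares.items() if s < min_share)
    return shares, underrepresented


# Toy manifest with made-up fields; a real one would carry richer metadata.
records = [
    {"id": 1, "consent": True, "region": "A"},
    {"id": 2, "consent": True, "region": "A"},
    {"id": 3, "consent": True, "region": "A"},
    {"id": 4, "consent": True, "region": "B"},
    {"id": 5, "consent": False, "region": "B"},  # excluded: no consent
]

shares, underrepresented = representation_report(records, "region", 0.3)
print(shares)
print(underrepresented)
```

A check like this covers only representation counts. Assessing bias systematically would also involve evaluating model behavior across groups, which is beyond this sketch.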
In practice
- Prioritize data quality over quantity.
- Coordinate legal, privacy, and technical teams.
- Engage data subjects meaningfully.
Topics
- AI Governance
- Responsible AI
- Training Data Bias
- Data Ethics
- Consent-Based Data
- FHIBE Dataset
- Web Scraping Risks
Best for: CTO, VP of Engineering/Data, Executive, AI Ethicist, Director of AI/ML, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Magazine.