The AI Interview: Alice Xiang, Sony

2026-07-01 · Source: AI Magazine · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Cybersecurity & Data Privacy · Depth: Intermediate, medium

Summary

Alice Xiang, Global Head of AI Governance at Sony, advocates a fundamental shift in responsible AI development: scrutinizing training data rather than only model outputs. She argues that biases and flaws in AI systems almost always originate in the underlying datasets, often because clear standards for responsible data collection are lacking. Xiang highlights the ethical and technical risks of web-scraped data, which is frequently collected without consent and embeds cultural stereotypes, leading to discriminatory patterns and privacy violations. As an alternative, Sony AI developed FHIBE (Fair Human-Centric Image Benchmark), a dataset built on direct participant consent and broader demographic representation. While acknowledging that such ethical sourcing demands significant resources and involves perceived trade-offs in visual diversity, Xiang asserts that these efforts strengthen trust, reduce regulatory risk, and improve model performance. She anticipates growing pressure for responsibly sourced datasets as regulation tightens.

Key takeaway

Directors of AI/ML evaluating data sourcing strategies should prioritize investing in consensually sourced datasets over traditional web scraping. This approach, though resource-intensive, mitigates ethical and technical risks such as embedded biases and privacy violations, while strengthening customer trust and improving model performance. Regulatory scrutiny is expected to increase, which makes ethical data collection a strategic imperative for long-term viability and responsible AI deployment.

Key insights

Responsible AI demands a deeper focus on ethically sourced training data, not just model outputs, to prevent inherent biases.

Principles

Bias in AI systems originates in training data.
Ethical data practices build trust and reduce risk.
Technical innovation has outpaced ethical guidance in data collection.

Method

Develop consensually sourced datasets like FHIBE: collect images directly from participants with explicit consent, ensure broader demographic representation, and assess bias systematically.
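
As a rough illustration of what a systematic bias assessment might involve, the sketch below computes accuracy separately for each demographic group and reports the gap between the best- and worst-served groups. It is not FHIBE's evaluation procedure; the function names, group labels, and toy records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def per_group_accuracy(records):
    """Return accuracy per group.

    records: iterable of (group, predicted_label, true_label) tuples.
    The tuple layout is an assumption made for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(records):
    """Difference between the highest and lowest per-group accuracy."""
    accuracies = per_group_accuracy(records)
    return max(accuracies.values()) - min(accuracies.values())


# Hypothetical toy data: (self-reported group, model prediction, ground truth)
sample = [
    ("group_a", "smile", "smile"),
    ("group_a", "neutral", "neutral"),
    ("group_b", "smile", "neutral"),
    ("group_b", "neutral", "neutral"),
]

print(per_group_accuracy(sample))
print(accuracy_gap(sample))
```

A large gap would suggest the model serves some groups worse than others. A check like this is only meaningful when group labels are collected with consent and cover the population broadly, which is the point of the sourcing approach described above.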

In practice

Prioritize data quality over quantity.
Coordinate legal, privacy, and technical teams.
Engage data subjects meaningfully.

Topics

AI Governance
Responsible AI
Training Data Bias
Data Ethics
Consent-Based Data
FHIBE Dataset
Web Scraping Risks

Best for: CTO, VP of Engineering/Data, Executive, AI Ethicist, Director of AI/ML, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Magazine.