A Privacy-Preserving Framework Using Remote Data Science for Inter-Institutional Student Retention Prediction

Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Cybersecurity & Data Privacy · Depth: Advanced, quick

Summary

A new privacy-preserving machine learning (PPML) framework, built on Remote Data Science (RDS) and the PySyft platform, enables secure inter-institutional student retention prediction. The framework uses a semi-air-gapped architecture with high-side and low-side servers, which lets researchers from three universities build models on sensitive student data without accessing that data directly. Evaluated on historical data from a small private university (N=720), it achieved consistent classification performance (Macro F1: 0.690-0.695) while maintaining compliance with the Family Educational Rights and Privacy Act (FERPA). The study also introduced Data-Type-Aware Templates, a novel synthetic data generation method that prioritizes privacy. For small-scale inter-institutional collaborations, RDS-based PPML offers a practical alternative to federated learning.
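
Macro F1, the metric reported above, is the unweighted mean of the per-class F1 scores, so the minority outcome (for example, students who do not return) weighs as much as the majority outcome. The sketch below computes it from scratch in Python; the labels are made-up toy values, not data from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        # Count true positives, false positives, and false negatives for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    # Every class counts equally, regardless of how many samples it has.
    return sum(scores) / len(scores)


# Toy labels: 1 = retained, 0 = did not return (made-up values).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]
print(f"{macro_f1(y_true, y_pred):.3f}")  # 0.670
```

On these toy labels accuracy is 0.7, but macro F1 is about 0.670 because the weaker minority-class F1 pulls the average down. If scikit-learn is installed, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` returns the same value.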

Key takeaway

For data scientists and researchers collaborating on sensitive educational data, this RDS-based PPML framework offers a practical route to inter-institutional student retention prediction. In the reported evaluation it delivered consistent model performance (Macro F1: 0.690-0.695) while maintaining FERPA compliance. For small-scale, privacy-critical collaborations, it is worth considering as an alternative to federated learning.

Key insights

A remote data science framework enables secure, collaborative student retention prediction without direct data sharing.
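
The high-side/low-side pattern can be illustrated with a small in-memory simulation. Everything here is hypothetical: the class names (`LowSide`, `HighSide`, `CodeRequest`), the `data_owner_sync` helper, and the toy records are invented for illustration and are not PySyft's API. The assumption modeled is that researchers see only mock data on the low side, submit code, and receive only the outputs of code that a data owner has approved and run on the air-gapped high side.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CodeRequest:
    """A researcher's analysis function awaiting data-owner review (hypothetical)."""
    name: str
    func: Callable[[list], Any]
    status: str = "pending"  # pending -> approved or denied
    result: Any = None


@dataclass
class LowSide:
    """Researcher-facing server: holds only mock data and a queue of requests."""
    mock_data: list
    requests: list = field(default_factory=list)

    def submit(self, name: str, func: Callable[[list], Any]) -> CodeRequest:
        func(self.mock_data)  # dry run on mock data catches bugs before review
        req = CodeRequest(name, func)
        self.requests.append(req)
        return req


@dataclass
class HighSide:
    """Air-gapped server with private records. The underscore is only a naming
    convention here; a real deployment relies on network separation."""
    _private_data: list

    def run_approved(self, req: CodeRequest) -> Any:
        if req.status != "approved":
            raise PermissionError(f"{req.name} has not been approved")
        return req.func(self._private_data)


def data_owner_sync(low: LowSide, high: HighSide,
                    approve: Callable[[CodeRequest], bool]) -> None:
    """Data owner carries pending requests across the air gap, runs approved
    ones on private data, and copies back only their outputs."""
    for req in low.requests:
        if req.status != "pending":
            continue
        if approve(req):
            req.status = "approved"
            req.result = high.run_approved(req)
        else:
            req.status = "denied"


# Toy usage with made-up records (not study data).
private = [{"gpa": 3.4, "retained": 1}, {"gpa": 2.1, "retained": 0},
           {"gpa": 3.0, "retained": 1}]
mock = [{"gpa": 2.5, "retained": 0}, {"gpa": 3.9, "retained": 1}]

low, high = LowSide(mock), HighSide(private)
req = low.submit("retention_rate",
                 lambda rows: sum(r["retained"] for r in rows) / len(rows))
data_owner_sync(low, high, approve=lambda r: r.name == "retention_rate")
print(req.status, round(req.result, 3))  # approved 0.667
```

In this sketch the manual `data_owner_sync` step stands in for the human review that crosses the air gap. Nothing else enforces privacy, so the approval predicate should admit only code that returns aggregate outputs rather than raw rows.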

Method

The RDS framework uses PySyft with high-side and low-side servers, allowing researchers to build models on sensitive data without direct access to it. The study also evaluates methods for generating synthetic data.
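
The summary does not describe how Data-Type-Aware Templates work, so the following is only a generic illustration of schema-driven mock data, not the paper's method. It assumes a hypothetical schema (`SCHEMA`, with invented column names) that records each column's data type and valid domain, and samples every column independently and uniformly from that domain.

```python
import random

# Hypothetical schema: only column names, types, and coarse domains are shared,
# never individual student records.
SCHEMA = {
    "gpa": {"type": "float", "min": 0.0, "max": 4.0},
    "credits_attempted": {"type": "int", "min": 0, "max": 18},
    "first_generation": {"type": "bool"},
    "major": {"type": "category", "levels": ["STEM", "Business", "Humanities"]},
}


def sample_value(spec, rng):
    """Draw one value uniformly from the domain declared for its data type."""
    kind = spec["type"]
    if kind == "float":
        return round(rng.uniform(spec["min"], spec["max"]), 2)
    if kind == "int":
        return rng.randint(spec["min"], spec["max"])
    if kind == "bool":
        return rng.random() < 0.5
    if kind == "category":
        return rng.choice(spec["levels"])
    raise ValueError(f"unsupported type: {kind}")


def generate_mock(schema, n_rows, seed=0):
    """Sample each column independently. No statistics are fitted to real data,
    so the rows carry no record-level information beyond the declared schema."""
    rng = random.Random(seed)
    return [{col: sample_value(spec, rng) for col, spec in schema.items()}
            for _ in range(n_rows)]


rows = generate_mock(SCHEMA, n_rows=5)
for row in rows:
    print(row)
```

Independent uniform sampling is the privacy-maximizing extreme. The mock rows let researcher code run with the right types and ranges, but they preserve no correlations between columns, so a model trained on them is not expected to perform well. In an RDS setting that trade-off is acceptable if real model fitting happens only on the high side.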

Best for: AI Scientist, Research Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.