Building bilingual NER for cargo logistics with Amazon Bedrock
Summary
IBS Software developed a bilingual Named Entity Recognition (NER) system for its cargo logistics operations, processing thousands of daily email messages in both English and Japanese. The system extracts 23 critical entity types, including air waybill numbers, flight details, and weights. Faced with manual intervention, accuracy-cost trade-offs, and complex open-source implementations, IBS Software adopted Amazon Bedrock's managed distillation and distilled knowledge from Amazon Nova Pro (the teacher) into the more efficient Amazon Nova Lite (the student). The distilled model reached an F1 score of 95.085%, reduced operational costs by 14x, and enabled real-time processing of cargo emails. Development involved annotating 500 bilingual emails and training the student model for 4 epochs over 70 steps, during which the training loss fell from 0.05 to 0.008.
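The summary does not say how the F1 score was computed. As a point of reference, a common choice for NER is micro-averaged F1 over exact-match entities, sketched below. The entity type names and the `(type, text)` representation are illustrative assumptions, not details from the original article.

```python
from typing import Iterable, Set, Tuple

# An entity is a (type, text) pair, e.g. ("awb_number", "123-45678901").
# The type names used here are hypothetical.
Entity = Tuple[str, str]


def micro_f1(gold: Iterable[Set[Entity]], pred: Iterable[Set[Entity]]) -> float:
    """Micro-averaged F1 over per-email entity sets, using exact matching."""
    tp = fp = fn = 0
    for gold_entities, pred_entities in zip(gold, pred):
        tp += len(gold_entities & pred_entities)   # correctly extracted
        fp += len(pred_entities - gold_entities)   # extracted but wrong
        fn += len(gold_entities - pred_entities)   # missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two emails: one flight number is mispredicted in the first.
gold = [{("awb_number", "123-45678901"), ("flight", "NH110")}, {("weight", "250kg")}]
pred = [{("awb_number", "123-45678901"), ("flight", "NH111")}, {("weight", "250kg")}]
score = micro_f1(gold, pred)
```

Here two of three predictions are correct and two of three gold entities are found, so precision, recall, and F1 all equal 2/3.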
Key takeaway
For AI engineers building bilingual NER systems or optimizing inference costs, Amazon Bedrock's managed distillation is worth evaluating. In this case it delivered a 95.085% F1 score while reducing operational costs by 14x. Consider investing in high-quality bilingual data annotation and exploring Bedrock's distillation capabilities to deploy efficient, real-time models without managing complex infrastructure.
Key insights
Amazon Bedrock's managed distillation enables cost-effective, high-accuracy bilingual NER by transferring knowledge from large to smaller models.
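As a rough illustration of what configuring a managed distillation job can look like, the sketch below builds a request for the Bedrock `create_model_customization_job` API with `customizationType` set to `DISTILLATION`. The helper name, the placeholder identifiers, the S3 URIs, and the response length are assumptions for illustration; the original article's actual settings are not given in this summary.

```python
from typing import Any, Dict


def build_distillation_request(
    job_name: str,
    custom_model_name: str,
    role_arn: str,
    student_model_id: str,
    teacher_model_id: str,
    training_s3_uri: str,
    output_s3_uri: str,
    max_response_tokens: int = 1000,  # assumed value, tune for your outputs
) -> Dict[str, Any]:
    """Build keyword arguments for a Bedrock distillation job (hypothetical helper)."""
    return {
        "jobName": job_name,
        "customModelName": custom_model_name,
        "roleArn": role_arn,
        "baseModelIdentifier": student_model_id,  # the student model
        "customizationType": "DISTILLATION",
        "trainingDataConfig": {"s3Uri": training_s3_uri},
        "outputDataConfig": {"s3Uri": output_s3_uri},
        "customizationConfig": {
            "distillationConfig": {
                "teacherModelConfig": {
                    "teacherModelIdentifier": teacher_model_id,
                    "maxResponseLengthForInference": max_response_tokens,
                }
            }
        },
    }


# All values below are placeholders, not real resources.
request = build_distillation_request(
    job_name="cargo-ner-distillation",
    custom_model_name="cargo-ner-student",
    role_arn="<iam-role-arn>",
    student_model_id="<nova-lite-model-id>",
    teacher_model_id="<nova-pro-model-id>",
    training_s3_uri="s3://<bucket>/train.jsonl",
    output_s3_uri="s3://<bucket>/output/",
)
```

With AWS credentials configured, the request would be submitted as `boto3.client("bedrock").create_model_customization_job(**request)`. Check the current Bedrock documentation for the exact model identifiers that support distillation in your Region.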
Principles
- Distillation retains 98% of the teacher model's performance.
- Managed services simplify complex ML pipelines.
- High-quality bilingual data is crucial for accuracy.
Method
Annotate 500 bilingual emails covering 23 entity types. Configure Amazon Bedrock distillation with Nova Pro as the teacher and Nova Lite as the student, using a token-level KL divergence loss. Train the student model for 4 epochs (70 steps). Deploy with Amazon S3, AWS Lambda, and Amazon DynamoDB.
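To make the token-level KL divergence loss concrete, here is a minimal sketch in its textbook form: at each token position, compare the teacher's and student's distributions over the vocabulary, then average across positions. Bedrock runs this internally and its exact formulation is not described here, so the direction KL(teacher || student), the absence of a temperature, and the function names are assumptions.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_level_kl(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
) -> float:
    """Mean over token positions of KL(teacher || student)."""
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("need the same non-zero number of token positions")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        log_p = log_softmax(t_row)  # teacher log-probabilities
        log_q = log_softmax(s_row)  # student log-probabilities
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(teacher_logits)


# One token position, vocabulary of two: teacher is uniform,
# student puts probabilities 0.25 and 0.75 on the two tokens.
loss = token_level_kl([[0.0, 0.0]], [[0.0, math.log(3.0)]])
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is the quantity a distillation run drives down during training.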
In practice
- Use Bedrock distillation for cost-efficient NER.
- Prioritize high-quality bilingual data annotation.
- Apply post-processing for language-specific errors.
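One example of language-specific post-processing: Japanese emails often use full-width digits and punctuation, which can be folded to ASCII with Unicode NFKC normalization before validating an extracted value. The sketch below applies this to air waybill numbers, assuming the common 3-digit airline prefix plus 8-digit serial layout. The function name and regular expression are illustrative, and the original article's post-processing rules are not given in this summary.

```python
import re
import unicodedata
from typing import Optional

# Assumed AWB shape: 3-digit airline prefix, optional hyphen or space,
# then an 8-digit serial, not embedded in a longer digit run.
_AWB_PATTERN = re.compile(r"(?<![0-9])([0-9]{3})[- ]?([0-9]{8})(?![0-9])")


def normalize_awb(raw: str) -> Optional[str]:
    """Return the first AWB number in `raw` as 'NNN-NNNNNNNN', or None."""
    # NFKC maps full-width digits, hyphen-minus, and spaces to ASCII forms.
    text = unicodedata.normalize("NFKC", raw)
    match = _AWB_PATTERN.search(text)
    if match is None:
        return None
    return f"{match.group(1)}-{match.group(2)}"


full_width = normalize_awb("AWB：１２３－４５６７８９０１ です")
compact = normalize_awb("12345678901")
missing = normalize_awb("no number here")
```

NFKC does not cover every dash-like character that appears in Japanese text, so a production rule set would likely need additional mappings.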
Topics
- Named Entity Recognition
- Amazon Bedrock
- Model Distillation
- Bilingual NLP
- Cargo Logistics
- MLOps
Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.