McKenzie Marshall: NLP in Asset Management (Barings)
Summary
McKenzie Marshall of Barings discusses how Natural Language Processing (NLP) is applied in active asset management. Its role is to augment investment research and help generate "alpha" by efficiently processing large volumes of qualitative data such as news and regulatory documents. Barings' approach focuses on assisting analysts, not automating their roles. Productionizing the NLP solution involves three sequential subtasks: identifying companies in documents (a Named Entity Recognition problem), rectifying these entities to internal IDs using algorithmic solutions, and providing an additive ranking metric, initially a sentiment score. Key challenges in NER included training a custom "company" label that filters out irrelevant entities, such as regulatory bodies or platform product names used as verbs, and distinguishing company names from general acronyms or product modifiers. Sentiment analysis was difficult because most text is neutral, polarity conventions had to be defined, and journalistic language differs considerably from regulatory language. Focused annotation management was crucial for building specialized, high-value models.
Key takeaway
For Machine Learning Engineers building NLP solutions in finance, prioritize augmenting human analysts over full automation. Success hinges on training custom Named Entity Recognition models with focused annotation management, so that domain-specific entities are identified precisely. Combine NLP outputs with robust rules-based entity rectification, and simplify sentiment scores into clear buckets (good/bad/neutral) to deliver pragmatic, high-value tools that directly support investment research.
Key insights
Effective NLP in asset management augments human analysis by precisely identifying and contextualizing company-specific information from diverse text sources.
Principles
- Augment, don't automate, human analytical processes.
- Specialized annotation management drives business value.
- Algorithmic cleaning enhances NLP pipeline pragmatism.
Method
Productionizing NLP for text consumption involves identifying companies via custom NER, rectifying entities to internal IDs with rules-based fuzzy matching, and deriving relevance metrics like sentiment from polar sentences.
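The entity rectification step can be illustrated with a minimal sketch. It assumes company mentions have already been extracted by an NER model, and it uses Python's standard-library `difflib` as a stand-in for whatever fuzzy matcher is used in production. The ID table, the suffix list, the `resolve_entity` helper, and the 0.85 threshold are all hypothetical and chosen only for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical internal ID table; names and IDs are invented for illustration.
COMPANY_IDS = {
    "Acme Holdings": "C001",
    "Globex Corporation": "C002",
    "Initech": "C003",
}

# Legal suffixes stripped before comparison (an assumed, non-exhaustive list).
_SUFFIXES = re.compile(
    r"\b(inc|corp|corporation|ltd|plc|llc|holdings|group)\b\.?",
    re.IGNORECASE,
)


def normalize(name):
    """Lowercase, drop legal suffixes and punctuation, collapse whitespace."""
    name = _SUFFIXES.sub(" ", name.lower())
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def resolve_entity(mention, table=COMPANY_IDS, threshold=0.85):
    """Return the internal ID of the best fuzzy match, or None if too weak."""
    target = normalize(mention)
    if not target:
        return None
    best_id, best_score = None, 0.0
    for name, company_id in table.items():
        # Similarity ratio in [0, 1] between the two normalized names.
        score = SequenceMatcher(None, target, normalize(name)).ratio()
        if score > best_score:
            best_id, best_score = company_id, score
    return best_id if best_score >= threshold else None
```

Returning `None` below the threshold keeps weak matches out of the analyst's view, which fits the emphasis on precision over coverage; the right threshold would need tuning on real data.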
In practice
- Train custom NER labels for domain-specific entities.
- Combine NLP with rules-based algorithms for entity resolution.
- Bucket sentiment scores (good/bad/neutral) for clarity.
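The bucketing step in the last item can be sketched as follows. This assumes the sentiment model emits a score in [-1, 1] where positive means good news for the company (a polarity convention chosen here for illustration). The `neutral_band` width, the function names, and the sample scores are hypothetical.

```python
from collections import Counter


def bucket_sentiment(score, neutral_band=0.2):
    """Map a score in [-1, 1] to 'good', 'bad', or 'neutral'.

    Scores strictly inside (-neutral_band, neutral_band) count as neutral,
    reflecting that most financial text carries no clear polarity.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= neutral_band:
        return "good"
    if score <= -neutral_band:
        return "bad"
    return "neutral"


def summarize(scores, neutral_band=0.2):
    """Count how many sentence-level scores fall into each bucket."""
    return Counter(bucket_sentiment(s, neutral_band) for s in scores)


# Invented sentence-level scores for a single document.
counts = summarize([0.7, 0.05, -0.5, 0.1, -0.02])
```

A wide neutral band trades recall for clarity: fewer sentences are flagged, but the ones that are flagged are more likely to matter to an analyst.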
Topics
- Natural Language Processing
- Asset Management
- Named Entity Recognition
- Sentiment Analysis
- Data Annotation
- Entity Resolution
Best for: NLP Engineer, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion (Explosion.ai).