142 - Science Of Science, with Kyle Lo
Summary
Kyle Lo, a lead scientist on the Semantic Scholar team at the Allen Institute for AI (AI2), discussed the "Science of Science" (SciSci) on the NLP Highlights podcast. SciSci is a field of computational sociology that uses large-scale methods to study how science is conducted: how research communities interact, how ideas spread, and how publishing and funding shape research. Lo emphasized that SciSci helps inform the development of tools that genuinely benefit scientists and the broader scientific community, not just individual users. He highlighted his team's contributions, including SciBERT for scientific NLP, S2ORC for open data, and PaperMage, a toolkit for processing and manipulating scientific documents that won a Best Paper Award at EMNLP 2023. The discussion also covered challenges such as data access and the slow iterative loop of scientific data analysis, and suggested that NLP can help by producing intermediate artifacts and efficient tools.
Key takeaway
For AI scientists and researchers building tools for the scientific community, understanding SciSci principles is crucial. Focus on systems that address systemic barriers and promote equitable participation in science, rather than optimizing solely for individual scientists' convenience. Consider contributing to open data initiatives and developing efficient, scalable NLP tools that shorten the iterative loop of scientific inquiry, so that your work advances science as a whole.
Key insights
SciSci uses computational methods to study how science works, and its findings can guide tool development toward community-wide benefit.
Principles
- Understanding science's process informs effective tool building.
- Large teams develop science; small teams disrupt it.
- Field-specific SciSci studies offer more actionable insights.
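Team-size findings like the one above are typically quantified with a disruption (CD) index computed over a citation graph. The sketch below assumes the common formulation D = (n_i - n_j) / (n_i + n_j + n_k), where, among papers published after the focal paper, n_i cite the focal paper but none of its references, n_j cite both, and n_k cite only its references. D near +1 suggests disruption and near -1 suggests development. The function name, toy graphs, and the choice to ignore a fixed citation window are illustrative assumptions, not the podcast's method.

```python
from typing import Dict, Set


def disruption_index(focal: str, citations: Dict[str, Set[str]], year: Dict[str, int]) -> float:
    """Disruption index D = (n_i - n_j) / (n_i + n_j + n_k) for one focal paper.

    citations maps each paper ID to the set of paper IDs it cites; year maps
    each paper ID to its publication year. Only papers published after the
    focal paper are counted (simplification: no fixed citation window).
    """
    references = citations.get(focal, set())
    n_i = n_j = n_k = 0
    for paper, cited in citations.items():
        if paper == focal or year[paper] <= year[focal]:
            continue  # skip the focal paper and anything not published after it
        cites_focal = focal in cited
        cites_refs = bool(cited & references)
        if cites_focal and not cites_refs:
            n_i += 1  # cites the focal paper but none of its references
        elif cites_focal:
            n_j += 1  # cites the focal paper and at least one of its references
        elif cites_refs:
            n_k += 1  # cites the focal paper's references but not the paper itself
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0


# Hypothetical toy data: F cites R1 and R2; A, B, C are later papers.
YEAR = {"R1": 2000, "R2": 2001, "F": 2005, "A": 2008, "B": 2009, "C": 2010}
DISRUPTIVE = {"R1": set(), "R2": set(), "F": {"R1", "R2"},
              "A": {"F"}, "B": {"F"}, "C": {"R1"}}
DEVELOPING = {"R1": set(), "R2": set(), "F": {"R1", "R2"},
              "A": {"F", "R1"}, "B": {"F", "R2"}, "C": {"R1"}}

if __name__ == "__main__":
    print(round(disruption_index("F", DISRUPTIVE, YEAR), 3))  # 0.667
    print(round(disruption_index("F", DEVELOPING, YEAR), 3))  # -0.667
```

In the first toy graph, later work builds on F while ignoring its predecessors, so D is positive; in the second, later work cites F alongside its references, so D is negative.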
Method
SciSci research follows an iterative loop: posing a question, analyzing data, implementing solutions, and refining models. NLP is often used within this loop for information extraction and summarization at scale.
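One turn of that loop can be sketched as streaming a JSON Lines corpus, answering a question with a simple aggregate, then refining the question and re-running. This is a minimal sketch under stated assumptions: the record fields `year` and `fields_of_study`, the helper names, and the sample records are hypothetical, and a real corpus defines its own schema.

```python
import json
from collections import Counter
from typing import Callable, Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line (JSON Lines format)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def papers_per_year(records: Iterable[dict], keep: Callable[[dict], bool]) -> Counter:
    """Count records that pass `keep`, grouped by their (hypothetical) 'year' field."""
    counts: Counter = Counter()
    for rec in records:
        if rec.get("year") is not None and keep(rec):
            counts[rec["year"]] += 1
    return counts


# Hypothetical records; real corpora define their own schema.
SAMPLE = [
    '{"paper_id": "p1", "year": 2019, "fields_of_study": ["Computer Science"]}',
    '{"paper_id": "p2", "year": 2019, "fields_of_study": ["Biology"]}',
    '{"paper_id": "p3", "year": 2020, "fields_of_study": ["Computer Science", "Medicine"]}',
    "",
    '{"paper_id": "p4", "year": null, "fields_of_study": ["Computer Science"]}',
]

if __name__ == "__main__":
    # Each pass refines the question; swap SAMPLE for an open file handle on real data.
    questions = {
        "all papers": lambda rec: True,
        "computer science only": lambda rec: "Computer Science" in rec.get("fields_of_study", []),
    }
    for label, keep in questions.items():
        print(label, dict(sorted(papers_per_year(iter_records(SAMPLE), keep).items())))
```

Streaming line by line keeps memory flat on large corpora, and passing the filter as a function makes each refinement of the question a one-line change.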
In practice
- Use open-access scientific literature datasets such as S2ORC.
- Employ tools like PaperMage for structured PDF parsing.
- Use SPECTER embeddings for scientific document similarity.
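Document similarity with embeddings reduces to ranking papers by cosine similarity. The sketch below keeps the ranking in plain Python. `embed_with_specter` follows the usage shown on the `allenai/specter` Hugging Face model card (title and abstract joined by the SEP token, final-layer [CLS] vector as the embedding). It is only executed if you call it, and it requires `torch`, `transformers`, and network access. The helper names and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_similar(query_id: str, embeddings: Dict[str, Sequence[float]],
                 top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k papers most similar to query_id, highest cosine first."""
    query = embeddings[query_id]
    scored = [(pid, cosine(query, vec)) for pid, vec in embeddings.items() if pid != query_id]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def embed_with_specter(papers: List[dict]) -> List[List[float]]:
    """Embed papers with the allenai/specter checkpoint (needs torch, transformers, network).

    Title and abstract are joined with the tokenizer's SEP token, and the
    final-layer [CLS] vector is used as the document embedding.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("allenai/specter")
    model = AutoModel.from_pretrained("allenai/specter")
    texts = [p["title"] + tokenizer.sep_token + (p.get("abstract") or "") for p in papers]
    inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        output = model(**inputs)
    return output.last_hidden_state[:, 0, :].tolist()


if __name__ == "__main__":
    # Hypothetical 3-d toy vectors stand in for real 768-d SPECTER embeddings.
    toy = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [0.0, 1.0, 0.0], "d": [0.0, 0.0, 1.0]}
    print(rank_similar("a", toy, top_k=2))
```

For a large collection you would replace the linear scan with a vector index, but the scoring stays the same.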
Topics
- Science of Science
- Scientific NLP
- Open Scientific Data
- Large Language Models
- Scientific Document Processing
Best for: AI Scientist, AI Researcher, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by NLP Highlights.