Activated by the right technology, a corpus of STM research literature becomes a dynamic knowledge base, the collective brain of the scientific community – and a platform for providing high-value applications to end users.
The atoms of information – names of authors and institutions, references to articles, grants, and patents, instances of genes, proteins, circuits, and semiconductors – are the building blocks of discovery and knowledge.
Parity’s AI-driven content analytics platform identifies, extracts, and disambiguates these building-block entities. It then infers the network of relationships among the entities, automatically creates information-rich entity profiles, and enables a variety of semantic matching and recommendation applications.
Entity Extraction, Disambiguation, and Profiling
An entity can be anything relevant to a particular application: for example, disease names in a healthcare application, patents and grants in a funding recommendation application, or brand names in a marketing application. Our approach identifies entities in semi-structured data or extracts them from free text, and then performs fine-grained disambiguation to yield distinct, computable entities for use in downstream applications and workflows.
Parity’s text analytics pipeline begins with ingestion of semi-structured or free-form content, such as scientific metadata and articles...
Entities are then tagged, linked and disambiguated to yield distinct, computable objects for use in downstream applications and workflows.
Linked entities are the building blocks of profiles...
The Parity Profiler assembles full profiles of people, institutions, and other entities.
Author disambiguation is challenging even for humans because many authors share names, topics, geographies, and other properties. For example, there are more than 200,000 articles authored by “Y Wang”.
A key public deployment of Parity’s disambiguation and profiling technology is in Elsevier’s Scopus database, which is the world’s largest academic digital library. Scopus uses Parity technology to disambiguate authors, institutions, and citations. Parity author name disambiguation achieves 99% precision and 98% recall, which far exceeds the published results of any other commercial or academic system.
Parity’s linking and tagging technology, along with specialized machine learning, rules-based reasoning, and graph analytics, enables disambiguation accuracy that equals or surpasses that of even trained humans.
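The source does not say how the precision and recall figures above were measured. As a minimal sketch, assuming a pairwise convention (one common way to score author clustering, not necessarily the one used for the published numbers), the two metrics can be computed from predicted and true author assignments as follows. The function name and the sample record IDs are hypothetical.

```python
from itertools import combinations


def pairwise_precision_recall(predicted, truth):
    """Score a clustering of article records into authors.

    predicted, truth: dicts mapping record id -> author label.
    A "pair" is two records assigned to the same author.
    This pairwise convention is an assumption, not the documented method.
    """
    ids = sorted(predicted)
    pred_pairs = {(a, b) for a, b in combinations(ids, 2)
                  if predicted[a] == predicted[b]}
    true_pairs = {(a, b) for a, b in combinations(ids, 2)
                  if truth[a] == truth[b]}
    correct = len(pred_pairs & true_pairs)
    # Precision: share of predicted same-author pairs that are right.
    precision = correct / len(pred_pairs) if pred_pairs else 1.0
    # Recall: share of true same-author pairs that were found.
    recall = correct / len(true_pairs) if true_pairs else 1.0
    return precision, recall


# Hypothetical example: three articles by one "Y Wang", one by another.
truth = {"p1": "wang_1", "p2": "wang_1", "p3": "wang_1", "p4": "wang_2"}
predicted = {"p1": "x", "p2": "x", "p3": "y", "p4": "y"}
precision, recall = pairwise_precision_recall(predicted, truth)
```

In this toy case the system wrongly merges p3 with p4 (lowering precision) and fails to join p3 with p1 and p2 (lowering recall), which shows why both numbers are reported together.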
Parity’s Org Sense™ is an authoritative knowledge base of the world’s research-producing organizations, and a key enabler of institution and author profiling as well as third-party applications.
Relationships are inferred and a cross-entity semantic network is derived.
The network is then mined for semantic matching and recommendation applications.
Semantic Matching and Recommendation Applications
Parity’s semantic matching and recommendation engine operates on entity networks to make personalized recommendations – for example articles to readers, products to consumers, or people to people – with highly specific relevance to the users’ tasks.
Parity and third parties have developed various recommendation solutions on the platform...
- A library collection management application to align journal holdings with researcher needs
- A grant optimizer to recommend funding opportunities to researchers
- A sales intelligence application to recommend leads to vendors of laboratory products
Parity Journal Sense powers Edanz Journal Selector
Built on the Parity recommendation engine, Parity’s Journal Sense™ product is widely deployed to help researchers find appropriate journals for their new papers.
In a blinded, roll-back-the-clock test, Journal Sense, using only abstract and title as input, correctly predicted the journal in which manuscripts were actually published 40% of the time – far surpassing the accuracy of competitor systems.
Parity’s network analytics powers funding recommendations
For NSF, links to past winners in a program are predictive of winning future grants in the program.
For NIH, distance-2 connections to grant study section members are predictive of winning the grant.
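To illustrate what a distance-2 connection means, here is a minimal sketch that finds study section members exactly two hops from an applicant in an entity network. It assumes an undirected graph stored as an adjacency dictionary and uses a plain breadth-first search; the link type (for example co-authorship), the function names, and the sample data are all hypothetical and are not taken from Parity's implementation.

```python
from collections import deque


def hop_distances(graph, source, max_hops):
    """Breadth-first search returning {node: hops} up to max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def distance_two_links(graph, applicant, panel_members):
    """Panel members whose shortest path to the applicant is 2 hops."""
    dist = hop_distances(graph, applicant, 2)
    return sorted(m for m in panel_members if dist.get(m) == 2)


# Hypothetical network: the applicant shares a collaborator with member_1.
graph = {
    "applicant": {"coauthor_a", "coauthor_b"},
    "coauthor_a": {"applicant", "member_1"},
    "coauthor_b": {"applicant"},
    "member_1": {"coauthor_a", "member_2"},
    "member_2": {"member_1"},
}
links = distance_two_links(graph, "applicant", {"member_1", "member_2"})
```

Here only member_1 qualifies, since member_2 is three hops away. A count of such links could serve as one input feature to a funding recommendation model.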
Parity’s semantic matching enables quantitative library collection management
Parity’s semantic matching engine derives researcher topic term vectors from article data in author profiles, matches them to journal vectors, and computes a ranked list of the journals most appropriate for the needs of the faculty.
In a test of five categories of relevance with users from 62 institutions, the system achieved 91% precision, with 71% of the recommended journals in the top two rating groups.
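The vector matching described above can be sketched with simple term-count vectors and cosine similarity. This is an illustration under stated assumptions only: the source does not specify the term weighting or the similarity measure Parity uses, and the function names, journal names, and sample texts below are hypothetical.

```python
import math
from collections import Counter


def term_vector(texts):
    """Build a term-count vector from a list of titles or abstracts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_journals(researcher_vector, journal_vectors):
    """Return (journal, score) pairs, best match first."""
    scored = [(cosine(researcher_vector, vec), name)
              for name, vec in journal_vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored]


# Hypothetical researcher profile and journal descriptions.
researcher = term_vector(["protein folding dynamics",
                          "protein structure prediction"])
journals = {
    "Journal A": term_vector(["protein structure and folding"]),
    "Journal B": term_vector(["semiconductor circuit design"]),
}
ranking = rank_journals(researcher, journals)
```

In this toy example, Journal A ranks first because it shares terms with the researcher profile, while Journal B scores zero. Aggregating such rankings across a faculty would give a library a quantitative view of which holdings match researcher needs.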