Finding Similar Companies
Pre-Acquisition / Rapid Diagnostics
Goal
A multi-asset investor sought to build a company similarity engine as a foundational building block for origination, competitor analysis, and acquisition strategies.
Approach
• Leveraged WovenLight’s existing IP for handling multi-modal data to accelerate deployment, fine-tuning LLMs on human feedback tailored to investment activities.
• Linked datasets and data modalities, fusing textual, tabular, and graph data at the embedding and output levels.
• Combined multiple methods into an ensemble, matching each method to the data modality it handles best:
  • Text: Calculating company-level embeddings using LLMs.
  • Tabular data: Calculating similarities using k-NN.
  • Company-related topics: Clustering into representative groups using BERTopic.
  • Graph data: Mapping relationships using graph neural networks (GNNs).
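As an illustration of fusing modalities at the output level, the sketch below scores each modality separately with cosine similarity, combines the scores as a weighted mean, and returns the nearest neighbours of a target company. It is a minimal sketch, not the engine described here: the company names, vectors, modality weights, and function names are all hypothetical, and a production system would use learned embeddings rather than hand-written toy vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fused_similarity(a, b, weights):
    """Output-level (late) fusion: weighted mean of per-modality similarities."""
    total = sum(weights.values())
    return sum(w * cosine(a[m], b[m]) for m, w in weights.items()) / total


def top_k_similar(target, companies, weights, k=2):
    """Return the k companies most similar to `target`, best first."""
    scores = [
        (name, fused_similarity(companies[target], vectors, weights))
        for name, vectors in companies.items()
        if name != target
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


# Hypothetical toy data: one vector per modality for each company.
companies = {
    "AlphaPay": {"text": [0.9, 0.1, 0.0], "tabular": [1.0, 0.2]},
    "BetaPay": {"text": [0.8, 0.2, 0.1], "tabular": [0.9, 0.3]},
    "GammaFarm": {"text": [0.0, 0.1, 0.9], "tabular": [0.1, 1.0]},
}
weights = {"text": 0.6, "tabular": 0.4}  # assumed weights, not from the case study

neighbours = top_k_similar("AlphaPay", companies, weights)
for name, score in neighbours:
    print(f"{name}: {score:.3f}")
```

Fusing at the embedding level would instead concatenate or jointly project the per-modality vectors before computing a single similarity; the weighted mean above is only the simplest output-level variant.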
Interventions
The Investment Co-Pilot applied LLMs, TF-IDF, and GNNs to a multi-modal dataset to generate summaries of financial, legal, and competitor documents, enabling:
• Identification, explanation, and ranking of similar entities with 88% accuracy (versus ~5% in Capital IQ).
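To make the TF-IDF component concrete, the sketch below ranks companies by the cosine similarity of their TF-IDF vectors and reports the shared terms as a simple explanation of each match. It is a minimal sketch under stated assumptions: the company descriptions, the whitespace tokenizer, the smoothed IDF formula, and the function names are illustrative choices, not details taken from the Investment Co-Pilot.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document name to a {term: tf-idf weight} dictionary."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many descriptions each term appears.
    doc_freq = Counter(term for words in tokens.values() for term in set(words))
    vectors = {}
    for name, words in tokens.items():
        counts = Counter(words)
        vectors[name] = {
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(words))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def rank_similar(target, docs):
    """Rank other companies by similarity to `target`, with shared terms."""
    vectors = tfidf_vectors(docs)
    results = []
    for name, vec in vectors.items():
        if name == target:
            continue
        shared = sorted(set(vec) & set(vectors[target]))
        results.append((name, cosine(vectors[target], vec), shared))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical company descriptions, for illustration only.
docs = {
    "AlphaPay": "online payments platform for small merchants",
    "BetaPay": "mobile payments platform for merchants",
    "GammaFarm": "organic dairy farm and cheese producer",
}
ranking = rank_similar("AlphaPay", docs)
for name, score, shared in ranking:
    print(f"{name}: {score:.3f} (shared terms: {', '.join(shared) or 'none'})")
```

Listing shared terms is only the most basic form of explanation; it is shown here because it falls out of the TF-IDF representation directly.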
Benefits
• Increased speed of private markets deal origination through faster and improved identification of similar companies.
• Reduced time for company diligence with faster, more precise comparisons.
• Enhanced testing of active trading strategies in public markets.
• Improved understanding of correlations in public markets based on refined similarity definitions.
• Enabled ranking of potential deals by generating aggregated features, e.g., for ‘Start-up Success’ models in venture and growth markets.