Finding Similar Companies

Pre-Acquisition / Rapid Diagnostics

Goal

A multi-asset investor sought to build a company similarity engine as a foundational building block for origination, competitor analysis, and acquisition strategies.

Approach

• Leveraged WovenLight’s existing IP for handling multi-modal data to accelerate deployment, and fine-tuned LLMs on human feedback tailored to investment activities.

• Linked datasets and data modalities, fusing textual, tabular, and graph data at the embedding and output levels.

• Combined multiple methods to build an ensemble approach best suited to process different data modalities:

    • Text: Computed company-level embeddings using LLMs.

    • Tabular data: Calculated similarities using k-NN.

    • Company-related topics: Clustered into representative groups using BERTopic.

    • Relationships: Mapped using graph neural networks.
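
As a rough illustration of output-level fusion, the sketch below scores each candidate company per modality and then blends the scores into a single ranking. It is a minimal sketch under stated assumptions, not the engine described here: the function names, the toy embeddings and metrics, the distance-to-similarity mapping, and the 50/50 weighting are all hypothetical, and only the text and tabular modalities are included.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def zscore_columns(rows):
    """Standardize each tabular column so no single metric dominates distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def tabular_similarity(a, b):
    """Map Euclidean distance (as used in k-NN) into a (0, 1] similarity."""
    return 1.0 / (1.0 + math.dist(a, b))

def rank_similar(query, companies, text_weight=0.5, k=2):
    """Blend per-modality scores and return the top-k most similar companies.

    text_weight is a hypothetical tuning parameter, not a value from the source.
    """
    names = list(companies)
    scaled = dict(zip(names, zscore_columns([companies[n]["tabular"] for n in names])))
    scores = {}
    for name in names:
        if name == query:
            continue
        text_score = cosine(companies[query]["text"], companies[name]["text"])
        tab_score = tabular_similarity(scaled[query], scaled[name])
        scores[name] = text_weight * text_score + (1 - text_weight) * tab_score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Toy data (hypothetical): a text embedding plus [revenue, margin] per company.
companies = {
    "A": {"text": [1.0, 0.0, 0.0], "tabular": [100.0, 0.20]},
    "B": {"text": [0.9, 0.1, 0.0], "tabular": [110.0, 0.22]},
    "C": {"text": [0.0, 1.0, 0.0], "tabular": [900.0, 0.05]},
}
top = rank_similar("A", companies, k=2)
```

Here company B ranks above C for query A because it is closer on both modalities. Embedding-level fusion would instead combine the per-modality vectors (for example by concatenation) before computing a single similarity.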

Interventions

The Investment Co-Pilot applied LLMs, TF-IDF, and GNNs to a multi-modal dataset and generated summaries of financial, legal, and competitor documents, enabling:

• Identification, explanation, and ranking of similar entities with 88% accuracy (versus ~5% in Capital IQ).

Benefits

• Increased speed of private markets deal origination through faster and improved identification of similar companies.

• Reduced time for company diligence with faster, more precise comparisons.

• Enhanced testing of active trading strategies in public markets.

• Improved understanding of correlations in public markets based on refined similarity definitions.

• Improved ranking of potential deals using aggregated features, e.g., for ‘Start-up Success’ models in venture and growth markets.