Finding Similar Companies

November 15, 2023

A multi-asset investor wanted to build a company similarity engine as a foundational building block for origination, competitor analysis, and acquisition strategies.


To accelerate deployment, we leveraged WovenLight’s existing IP for handling multi-modal data and for deploying and fine-tuning LLMs on human feedback tailored to investment activities.

We linked a series of datasets and data modalities, fusing textual, tabular and graph data at the embedding and output levels.
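As a rough illustration of the difference between the two fusion levels, the sketch below compares two companies in two ways: by concatenating their per-modality vectors before scoring (embedding level), and by scoring each modality separately and then averaging (output level). It is a minimal sketch under stated assumptions, not the production approach: the function names, the toy vectors and the weights are all hypothetical, and cosine similarity is assumed as the comparison metric.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def embedding_level_similarity(company_a, company_b, modalities):
    """Embedding-level fusion: concatenate normalized modality vectors, then compare once."""
    fused_a = [x for m in modalities for x in l2_normalize(company_a[m])]
    fused_b = [x for m in modalities for x in l2_normalize(company_b[m])]
    return cosine(fused_a, fused_b)

def output_level_similarity(company_a, company_b, weights):
    """Output-level fusion: score each modality separately, then take a weighted average."""
    total = sum(weights.values())
    return sum(w * cosine(company_a[m], company_b[m]) for m, w in weights.items()) / total

# Hypothetical toy vectors; real embeddings would come from upstream models.
acme = {"text": [0.9, 0.1, 0.0], "tabular": [1.0, 2.0]}
birch = {"text": [0.8, 0.2, 0.1], "tabular": [2.0, 1.0]}

early = embedding_level_similarity(acme, birch, ["text", "tabular"])
late = output_level_similarity(acme, birch, {"text": 0.7, "tabular": 0.3})
```

With unit-normalized modality vectors and equal weights the two scores coincide. They diverge once modalities are weighted differently or a learned model is applied to the fused vector.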

We combined multiple methods into an ensemble, matching each method to the data modality it processes best:

  • Text: LLMs to calculate company-level embeddings
  • Tabular data: k-NN to calculate similarities
  • Company-related topics: BERTopic to cluster them into representative groups
  • Relationships: graph neural networks to map them
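To make the tabular component concrete, the following is a minimal k-NN sketch in plain Python. It assumes each company is described by a few numeric features, standardizes them so no single feature dominates, and returns the closest companies by Euclidean distance. The company names, the features (revenue, headcount, growth) and the choice of distance metric are hypothetical and used only for illustration.

```python
import math

def standardize(rows):
    """Z-score each column so features on different scales are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def nearest_companies(names, rows, query, k=2):
    """Return the k companies closest to `query` in standardized feature space."""
    scaled = dict(zip(names, standardize(rows)))
    q = scaled[query]
    distances = [(math.dist(q, v), n) for n, v in scaled.items() if n != query]
    return [n for _, n in sorted(distances)[:k]]

# Hypothetical features: revenue (millions), headcount, annual growth rate.
names = ["Alpha", "Beta", "Gamma", "Delta"]
rows = [
    [100, 500, 0.10],
    [110, 520, 0.12],
    [900, 4000, 0.03],
    [95, 480, 0.40],
]

neighbours = nearest_companies(names, rows, "Alpha", k=2)
```

In this toy data, Beta is the closest match to Alpha because it is similar on all three features, while Delta is similar in size but differs in growth.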


Our Investment Co-Pilot applied LLMs, tf-idf and GNNs to a multi-modal dataset. It generated summaries of financial and legal documents and competitor analyses, and it identified, explained and ranked similar entities at an accuracy of 88%, versus ~5% within Capital IQ.
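For readers unfamiliar with tf-idf, the sketch below shows one way it can rank companies by the similarity of their text descriptions. It is a simplified plain-Python illustration, not the Co-Pilot's implementation: the company names and descriptions are hypothetical, tokenization is a plain whitespace split, and a smoothed IDF variant is assumed.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a sparse tf-idf vector (term -> weight) for each document."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    # Smoothed IDF: terms present in every document keep a small positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return {
        name: {t: c / len(tokens) * idf[t] for t, c in Counter(tokens).items()}
        for name, tokens in tokenized.items()
    }

def cosine_sparse(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_similar(docs, query):
    """Rank all other companies by tf-idf cosine similarity to `query`."""
    vectors = tfidf_vectors(docs)
    scores = {
        n: cosine_sparse(vectors[query], v) for n, v in vectors.items() if n != query
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical company descriptions.
docs = {
    "PayFast": "online payments platform for small merchants",
    "CardFlow": "payments platform for online merchants and marketplaces",
    "AgriSense": "soil sensors for precision farming",
}

ranking = rank_similar(docs, "PayFast")
```

Here CardFlow ranks above AgriSense for PayFast, because the two payment companies share several distinctive terms while AgriSense shares only the common word "for".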

This delivered the following benefits:

  • Increased speed of private markets deal origination through faster and improved identification of similar companies
  • Reduced time for company diligence through faster and improved comparisons
  • Enhanced testing of active trading strategies in public markets
  • Improved understanding of correlations in public markets based on more ‘precise’ similarity definitions
  • Ranking potential deals through generation of aggregated features, e.g., in venture/growth for ‘Start-up Success’ models