What Is Your Dataset Worth?

Institutional investors want to use your data to train AI to beat the market. Naturally, you want to know: how much is your dataset worth?

To decide what your dataset is worth to a quantitative investor, let's consider what the dataset will be used for: investing. The inputs to a quantitative investment strategy boil down to:

  1. Human capital / talent (e.g. portfolio managers, data scientists, and quantitative engineers)
  2. Compute, and
  3. Data

For the above inputs, value is best measured by the expected increase in investment returns. That's why the best quantitative investors are so well paid!

As such, ignoring talent and compute requirements, a dataset's value add is equal to the incremental expected investment performance generated by the dataset. Improved investment performance is the result of improved forecasts using features / signals / factors engineered from your dataset.

Therefore, to value your dataset, you'd need to:

  1. Engineer features from your dataset that are likely to be directly or indirectly predictive of future security (e.g. stock) returns
  2. Add your features to a group of features commonly used by quantitative investors
  3. Create return (or other) forecasts with and without features from your dataset. Then simulate investment performance by backtesting portfolios over time based on your forecasts and portfolio construction methodology
  4. Given the AUM and general investment strategy of your prospective clients, calculate the value add of your dataset based on how much simulated investment performance improved with features from your dataset. Then share how you determined the value add of your dataset with your prospective clients (for them to audit / fact-check)

Most dataset vendors don't have this capability in-house, but we can help!

ForecastOS Prospector - Valuing Datasets


ForecastOS knows what your data is worth to institutional investors because it was built by them.

ForecastOS was founded by (and is advised by) institutional investors and capital markets professionals: the same people who ideated, built, and maintain InvestOS, open-source software for backtesting and portfolio optimization. Using our backtesting software, the 1,000+ predictive features in our FeatureHub, and our internal ML-powered quantitative investment strategy, we can calculate the value of your dataset!

We value your dataset by taking the following steps:

1. Engineering features from your dataset

Our feature engineers collaborate with you to create the most predictive features from your dataset. This process includes analyzing your dataset, holding brainstorming sessions, and reviewing relevant academic research.

2. Adding 100 pre-engineered features from our FeatureHub

Our team combines your features with 100 predictive, anonymized features from ForecastOS FeatureHub.

3. Creating forecasts and simulating investment returns

Using our step-forward logic for time-series return forecasting, we make a series of point-in-time forecasts of forward returns. We take steps to avoid lookahead bias, apply sensible hyperparameter tuning, and create forecasts with ML algorithms commonly used for financial forecasting (e.g. XGBoost, LightGBM, etc.).

This allows us to understand how incrementally predictive your dataset is alongside a typical suite of quantitative investment signals.
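
To make the step-forward idea concrete, here is a minimal sketch of an expanding-window splitter. It is an illustration only, not the ForecastOS implementation: the function name step_forward_splits and its parameters (min_train, horizon, step) are hypothetical. It assumes each row holds features known at time i and a label equal to the return from i to i + horizon, so a row can only be used for training once that label has actually been observed.

```python
from typing import Iterator, List, Tuple


def step_forward_splits(
    n_periods: int,
    min_train: int,
    horizon: int = 1,
    step: int = 1,
) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_indices, forecast_index) pairs for an expanding window.

    Assumption: the label of row i is the return from i to i + horizon,
    which is only observable at time i + horizon. When forecasting at
    time t we therefore train only on rows with i + horizon <= t.
    """
    if min_train < 1 or horizon < 1 or step < 1:
        raise ValueError("min_train, horizon, and step must be positive")

    # First forecast date at which min_train fully labelled rows exist.
    t = min_train + horizon - 1
    while t < n_periods:
        last_train = t - horizon  # latest row whose label is known at t
        yield list(range(last_train + 1)), t
        t += step


# Usage: 6 periods, at least 3 training rows, 2-period forward returns.
for train_idx, forecast_idx in step_forward_splits(6, min_train=3, horizon=2):
    print(train_idx, "->", forecast_idx)
```

At each forecast date, a model (for example a gradient-boosted tree) would be fit on the training rows only and then asked for a single out-of-sample prediction. Running the loop twice, once with and once without the dataset's features, gives two comparable forecast series.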

Next, we simulate (i.e. backtest) the performance of portfolios created with and without features and forecasts derived from your dataset.

We use many constraint, cost, and risk models to understand how different investment strategies (e.g. long-only, long-extension, market neutral, etc.) would have performed over time with and without your dataset.

Our goal during this step is to understand how much a prospective client's investment strategy could improve with your dataset.
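
The with/without comparison can be summarized in a single number. The sketch below assumes the two backtests have already produced per-period portfolio returns over the same dates; the function names and the monthly default are illustrative choices, and a real study would also compare risk, turnover, and costs.

```python
import math
from typing import Sequence


def annualized_return(period_returns: Sequence[float], periods_per_year: int) -> float:
    """Compound per-period returns and express the result as a yearly rate."""
    if not period_returns:
        raise ValueError("period_returns must not be empty")
    growth = math.prod(1.0 + r for r in period_returns)
    years = len(period_returns) / periods_per_year
    return growth ** (1.0 / years) - 1.0


def incremental_return(
    baseline: Sequence[float],
    with_dataset: Sequence[float],
    periods_per_year: int = 12,
) -> float:
    """Annualized return with the dataset minus annualized return without it."""
    if len(baseline) != len(with_dataset):
        raise ValueError("both backtests must cover the same periods")
    return (
        annualized_return(with_dataset, periods_per_year)
        - annualized_return(baseline, periods_per_year)
    )


# Usage with made-up monthly returns (illustrative numbers, not real results).
baseline = [0.010, -0.005, 0.012, 0.004]
enhanced = [0.012, -0.004, 0.013, 0.005]
uplift = incremental_return(baseline, enhanced, periods_per_year=12)
print(f"Incremental annualized return: {uplift:.2%}")
```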

4. Calculating your dataset's value and sharing our work

Taking into account your data pricing preferences, information about your prospective clients, and forecasted / simulated investment performance, we determine the incremental monetary value your dataset generates.
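
As a rough back-of-the-envelope version of this calculation, the sketch below multiplies a client's AUM by the simulated incremental annual return and then by the fraction of that gain the vendor hopes to capture in pricing. The vendor_share parameter is a hypothetical simplification introduced here for illustration; an actual study would account for many more factors.

```python
def dataset_value_add(
    aum: float,
    incremental_annual_return: float,
    vendor_share: float,
) -> float:
    """Estimate an annual price from simulated incremental performance.

    aum: client's assets under management, in currency units.
    incremental_annual_return: simulated uplift, e.g. 0.002 for 20 bps.
    vendor_share: hypothetical fraction of the client's gain that is
        captured by the dataset price (between 0 and 1).
    """
    if aum < 0:
        raise ValueError("aum must be non-negative")
    if not 0.0 <= vendor_share <= 1.0:
        raise ValueError("vendor_share must be between 0 and 1")
    gross_client_gain = aum * incremental_annual_return
    return gross_client_gain * vendor_share


# Usage: a 500M fund, 20 bps of simulated uplift, 10% captured by the vendor.
price = dataset_value_add(500_000_000, 0.002, 0.10)
print(f"Indicative annual price: {price:,.0f}")
```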

In our final step, we also share our work.

We send you:

  • A copy of our 100 anonymized features,
  • Samples of (and information about) ML models we used for forecasting returns,
  • The portfolio construction and backtesting code we used to simulate investment returns and value your dataset, and
  • Our detailed Dataset Value Add Study (DVAS), for sharing with your prospective clients

We invite you to audit our assumptions and check our results. You'll understand exactly how we calculated what your data is worth!

Get More Money For Your Data

Fill out our contact form here, or on our landing page at forecastos.com, to set up a call to learn how we can help you value (and get more money for) your data!

