A Novel Framework for Analyzing Economic News Narratives Using GPT-3.5: Data

12 Jun 2024


(1) Deborah Miori, Mathematical Institute, University of Oxford, Oxford, UK and Oxford-Man Institute of Quantitative Finance, Oxford, UK (Corresponding author: Deborah Miori, deborah.miori@maths.ox.ac.uk);

(2) Constantin Petrov, Fidelity Investments, London, UK.


2 Data

2.1 Corpus of news

We download a tractable corpus of news from the Factiva[1] data provider by selecting articles written in English that belong to the “Economics” section of the Wall Street Journal (WSJ). In this way, we aim to obtain a set of articles that carries a low-noise, focused view of the evolution of news themes relevant to understanding financial markets. We consider approximately four years of daily news, i.e. from January 2020 to October 2023, and aggregate them at a weekly level. After pre-processing them to a standard format, we obtain an overall dataset of 197 weeks with a total of 21,590 articles, with 110 ± 21 articles per week (i.e. the average number of articles and its standard deviation). Importantly, we considered the week ending on 14th March 2021 as an outlier and dropped it, since we could only download three associated articles from Factiva.
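The weekly aggregation described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `weekly_article_counts`, the `min_articles` threshold, and the toy dates are all assumptions introduced here (the paper drops the sparse week by inspection, not by a stated threshold). Weeks are taken to end on Sunday, which is consistent with the week ending on 14th March 2021.

```python
import pandas as pd


def weekly_article_counts(article_dates, min_articles=10):
    """Count articles per week (weeks ending on Sunday) and drop sparse weeks.

    `min_articles` is a hypothetical cut-off used only for this sketch.
    """
    df = pd.DataFrame({"date": pd.to_datetime(list(article_dates))})
    # Group publication dates into weekly bins labelled by the closing Sunday.
    counts = df.groupby(pd.Grouper(key="date", freq="W-SUN")).size()
    # Discard weeks with too few articles to be representative.
    return counts[counts >= min_articles]


# Toy publication dates (not real data): three consecutive weeks,
# the middle one containing only three articles.
toy_dates = ["2021-03-03"] * 12 + ["2021-03-10"] * 3 + ["2021-03-17"] * 15
weekly = weekly_article_counts(toy_dates)

# Mean and standard deviation of articles per retained week.
summary = (weekly.mean(), weekly.std())
```

On the real corpus, the same two summary statistics would correspond to the 110 ± 21 articles per week reported above.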

2.2 Market dislocations

Our aim is to identify and quantify the evolution of narratives within news, with the further end goal of unravelling their relationships to the evolution of financial markets. In particular, we are interested in financial market dislocations, which are often recognised as moments when “financial markets, operating under stressful conditions, experience large, widespread asset mispricings” [13]. However, we decide to adopt a more data-driven definition of market dislocations, and consider them as dates when combined shocks to equity, FX, bond, and macro factor risk premium indices occur, i.e. shocks to all the major asset classes.

We begin by downloading the following four indices from Bloomberg L.P. at a weekly frequency, with the proposed descriptions taken from its interface:

  1. VIX Index - “The VIX Index is a financial benchmark designed to be an up-to-the-minute market estimate of the expected volatility of the S&P 500 Index, and is calculated by using the midpoint of real-time S&P 500 Index option bid/ask quotes.”

  2. JPMVXYEM Index (VIX FX) - “J.P. Morgan Emerging Market Currency Implied Volatility Index.”

  3. MRI CITI Index - “The Citi Macro Risk Index measures risk aversion based on prices of assets that are typically sensitive to risk. A reading above (below) 0.5 means that risk aversion is above (below) average.”

  4. MOVE Index - “The MOVE Index measures U.S. bond market volatility by tracking a basket of Over-the-Counter options on U.S. interest rate swaps. The Index tracks implied normal yield volatility of a yield curve weighted basket of at-the-money one month options on the 2-year, 5-year, 10-year, and 30-year constant maturity interest rate swaps.”

Then, we compute the related z-scores for a rolling window ∆T of three months, i.e. 13 weeks if we assume one year to be made of 52 weeks. The z-score is defined as

$$ z = \frac{i - \mu}{\sigma}, $$

where in our case i is the current value of the index, µ is its mean over the previous time range ∆T, and σ is the related standard deviation. Intuitively, the z-score shows how many standard deviations above the mean the current outcome is. Figure 1 shows the weeks for which all four of our indices have a positive z-score. Depending on the strength of these z-scores, this can indicate that broad dislocations across asset classes were witnessed.

Figure 1: Weeks in which all four of our volatility indices have positive z-scores, with respect to a rolling window of length ∆T = 13 weeks. Depending on the strength of these z-scores, broad market dislocations can consequently be identified.
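The rolling z-score and the selection of weeks in which all four indices are jointly positive can be sketched as below. This is an illustrative reading of the text, not the authors' code. It assumes that the window of length ∆T covers the 13 weeks preceding the current one (so the current value is excluded from µ and σ) and that σ is the sample standard deviation. The function names, column labels, and toy values are hypothetical and do not come from Bloomberg data.

```python
import numpy as np
import pandas as pd


def rolling_zscore(values, window=13):
    """z = (i - mu) / sigma, with mu and sigma over the `window` preceding weeks."""
    previous = values.shift(1)  # assumption: exclude the current week from the window
    mu = previous.rolling(window).mean()
    sigma = previous.rolling(window).std()  # pandas default: sample std (ddof=1)
    return (values - mu) / sigma


def all_positive_weeks(indices, window=13):
    """Return the z-scores of weeks in which every index has a positive z-score."""
    z = rolling_zscore(indices, window)
    # Rows with NaN z-scores (the warm-up period) compare as False and are dropped.
    mask = (z > 0).all(axis=1)
    return z[mask]


# Toy weekly series (not real index levels): columns oscillate out of phase,
# then all four jump together in the final week.
weeks = pd.date_range("2020-01-05", periods=20, freq="W-SUN")
phase = np.arange(20) % 2
toy = pd.DataFrame(
    {
        "VIX": 1.0 + phase,
        "VIX_FX": 2.0 - phase,
        "MRI": 1.0 + phase,
        "MOVE": 2.0 - phase,
    },
    index=weeks,
)
toy.iloc[-1] = 10.0  # simultaneous shock across all four columns

dislocations = all_positive_weeks(toy)
```

In this toy example only the final week is flagged, since it is the only week in which all four series sit above their trailing means at once. Including the current week in the window instead would be an equally plausible reading of the text and would shrink the z-scores of isolated spikes.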

This paper is available on arXiv under the CC0 1.0 DEED license.

[1] https://www.dowjones.com/professional/factiva/