The Similarity Index quantifies the textual differences between a given company's annual or quarterly filings on an "as disclosed" basis. For example, a similarity score is calculated by comparing a company's 2017 10-K with the 2016 10-K.
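As a rough illustration of how one such score could be computed, the sketch below compares two filing texts using cosine similarity over word counts. The description above does not say which measure the index actually uses, so the choice of cosine similarity, the simple tokenizer, and the function names `tokenize` and `cosine_similarity` are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into alphanumeric word tokens.
    # (Hypothetical tokenizer; a real pipeline would likely be more careful.)
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    # Assumed measure: cosine similarity between raw word-count vectors.
    # Returns a value in [0, 1]; 1 means identical word distributions.
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no defined direction
    return dot / (norm_a * norm_b)


# Toy stand-ins for the text of a 2016 and a 2017 10-K.
filing_2016 = "We face risks related to competition and regulation."
filing_2017 = "We face risks related to competition, regulation and litigation."
score = cosine_similarity(filing_2016, filing_2017)
```

Under this assumed measure, a filing that reuses last year's language almost verbatim scores close to 1, while heavily rewritten text pulls the score toward 0.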
Intuitively, firms that break from routine phrasing and content in their mandatory disclosures give clues about their future performance, which eventually drives stock returns. This data set captures significant changes in disclosure text as low similarity scores.
Academic research has shown that a portfolio that shorts companies with low similarity scores and goes long companies with high similarity scores earns non-trivial and uncorrelated returns over a period of 12-18 months.
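The long-short construction can be sketched as follows. This is a minimal illustration and not the method used in the research referred to above: the equal weighting, the quantile cutoff, and the function name `long_short_weights` are assumptions, and the tickers and scores are made up.

```python
def long_short_weights(scores, quantile=0.2):
    # scores: dict mapping ticker -> similarity score (hypothetical input).
    # Shorts the lowest-scoring `quantile` of names and goes long the
    # highest-scoring `quantile`, equally weighted on each side (assumed).
    if not 0 < quantile <= 0.5:
        raise ValueError("quantile must be in (0, 0.5]")
    ranked = sorted(scores, key=scores.get)  # lowest similarity first
    n = int(len(ranked) * quantile)
    if n == 0:
        return {}  # too few names to form either leg
    weights = {ticker: -1.0 / n for ticker in ranked[:n]}        # short leg
    weights.update({ticker: 1.0 / n for ticker in ranked[-n:]})  # long leg
    return weights


# Made-up scores for four hypothetical tickers.
example_scores = {"AAA": 0.95, "BBB": 0.40, "CCC": 0.88, "DDD": 0.55}
example_weights = long_short_weights(example_scores, quantile=0.5)
```

The weights sum to zero, so the sketch describes a dollar-neutral portfolio; holding period and rebalancing are left out.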