The Similarity Index quantifies the textual differences between a company's successive annual or quarterly filings on an "as disclosed" basis. For example, a similarity score is calculated by comparing a company's 2017 10-K with its 2016 10-K, or its 2017 Q3 10-Q with its 2016 Q3 10-Q from a year earlier.
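The text above does not say how the similarity score itself is computed. The sketch below illustrates one common choice, assumed here for illustration: cosine similarity between bag-of-words term counts, applied to each filing and the same-period filing one year earlier. The helper names `term_counts`, `cosine_similarity` and `year_over_year_pairs`, and the `(form, year, period)` key layout, are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count alphabetic word tokens (a simple bag-of-words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of two texts' term-count vectors; 1.0 means identical word mix.

    Assumption: this is an illustrative measure, not necessarily the index's own method.
    """
    a, b = term_counts(text_a), term_counts(text_b)
    # Counter returns 0 for missing terms without inserting them.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def year_over_year_pairs(filings: dict):
    """Yield (current_key, prior_key) where the same form and period exist one year earlier.

    Hypothetical key layout: (form, fiscal_year, period), e.g. ("10-Q", 2017, "Q3").
    """
    for form, year, period in sorted(filings):
        prior = (form, year - 1, period)
        if prior in filings:
            yield (form, year, period), prior
```

With `filings = {("10-K", 2016, "FY"): text_2016, ("10-K", 2017, "FY"): text_2017}`, the 2017 10-K is paired with the 2016 10-K, and `cosine_similarity(text_2017, text_2016)` gives its score; a low value flags a filing whose wording changed substantially.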
Intuitively, when a firm breaks from the routine phrasing and content of its mandatory disclosures, the change can offer clues about its future performance, and those clues are reflected in its stock returns only gradually. This data set captures such significant changes in disclosure text in the form of low similarity scores.
Academic research has shown that a portfolio that goes long firms with high similarity scores and short firms with low similarity scores earns non-trivial, uncorrelated returns over the following 12 to 18 months.
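As a rough illustration of the long/short construction described above, the sketch below sorts firms by similarity score, assigns equal negative weights to the lowest-scoring quantile and equal positive weights to the highest-scoring quantile, and computes the resulting dollar-neutral return. The quantile cutoff, equal weighting, and the function names are assumptions made for illustration, not the specification used in the research.

```python
def long_short_weights(scores: dict, quantile: float = 0.2) -> dict:
    """Equal-weight long the top `quantile` of firms by similarity, short the bottom `quantile`.

    Assumptions (hypothetical): equal weights within each leg, a single cross-sectional sort.
    """
    if not 0 < quantile <= 0.5:
        raise ValueError("quantile must be in (0, 0.5]")
    if len(scores) < 2:
        raise ValueError("need at least two firms to form both legs")
    ranked = sorted(scores, key=lambda ticker: scores[ticker])  # ascending similarity
    n = max(1, int(len(ranked) * quantile))  # firms per leg; n <= len/2, so legs never overlap
    shorts, longs = ranked[:n], ranked[-n:]
    weights = {ticker: -1.0 / n for ticker in shorts}  # low similarity: short
    weights.update({ticker: 1.0 / n for ticker in longs})  # high similarity: long
    return weights


def portfolio_return(weights: dict, returns: dict) -> float:
    """Weighted sum of each holding's return over the chosen holding period."""
    return sum(w * returns[ticker] for ticker, w in weights.items())
```

Because the two legs carry equal and opposite total weight, the portfolio is dollar-neutral; in practice the `returns` passed in would cover whatever holding horizon is being studied, such as the 12 to 18 months mentioned above.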