The Sunlight Foundation's Churnalism is based on the UK site of the same name and is driven by an open-source search engine technology called SuperFastMatch. Both the original Churnalism site and SuperFastMatch were developed by the Media Standards Trust. We at the Sunlight Foundation have partnered with the Media Standards Trust to adapt the site for a U.S. audience and to build web browser extensions with similar functionality.
Churnalism searches the text you enter against a large corpus of press releases and determines the best matches. Sometimes exact fragments match that are clearly not plagiarism. These often include expanded organizational names (such as The United States House of Representatives) or boilerplate copy about a particular company that is usually appended to the end of a press release. To filter out uninteresting matches and provide the best user experience, we use a relevancy ranking that is derived from the total character overlap and the density of that overlap. And as always, you can view our source code for this project on the Sunlight Labs GitHub page.
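To make the idea of ranking by overlap and density more concrete, here is a minimal Python sketch. It is not the code Churnalism or SuperFastMatch actually runs: the `Fragment` type, the function names, the definition of density (matched characters divided by the span from the first matched character to the last) and the way the two numbers are combined are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A matched run of characters in the submitted text (hypothetical type)."""
    start: int   # offset of the first matched character
    length: int  # number of matched characters


def overlap_and_density(fragments):
    """Return (total matched characters, density of those matches).

    Assumption: density is matched characters divided by the span that
    runs from the first matched character to the last matched character.
    """
    if not fragments:
        return 0, 0.0
    # Merge overlapping fragments so that no character is counted twice.
    intervals = sorted((f.start, f.start + f.length) for f in fragments)
    merged = [list(intervals[0])]
    for start, end in intervals[1:]:
        if start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    overlap = sum(end - start for start, end in merged)
    span = merged[-1][1] - merged[0][0]
    if span == 0:
        return 0, 0.0
    return overlap, overlap / span


def relevance(fragments):
    """Illustrative score (an assumption): total overlap weighted by density."""
    overlap, density = overlap_and_density(fragments)
    return overlap * density


# One 200-character block versus four 50-character fragments spread far apart.
dense = [Fragment(0, 200)]
scattered = [Fragment(i * 500, 50) for i in range(4)]
print(relevance(dense), relevance(scattered))
```

Both inputs share 200 matched characters, but the single dense block scores higher than the scattered fragments. Under these assumptions, short matches spread thinly through a document, such as organizational names or boilerplate, rank below a long, contiguous copied passage.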
Our corpus of press releases is growing all the time and will soon be available as a standalone API. Our major sources include: