@CommonCrawl (commoncrawl.org): roughly 250 TB of crawled web pages per month for 7 years, available for free. Tons of documentation and examples. Even better, there is a separate archive of news text (1,000 sources from @DMOZ, en.wikipedia.org/wiki/DMOZ).
#babelnet (babelnet.org/about), a multilingual dictionary and semantic network. Rate-limited, but possibly willing to work with academics.
everypolitician (everypolitician.org), from @mySociety, provides data on national legislators from what appears to be literally every country. Fields include email address, social media handles, party, district ID, gender, and time in office.