Truenews – A.I Augmented fake news detection

Guide to Detecting and Fighting Fake News Using NLP and Machine Learning


Natural Language Processing (NLP) techniques are being used to generate fake articles, a concept called “Neural Fake News”, similar to the machine learning techniques that generate fake videos mimicking famous personalities. Neural fake news (fake news generated by AI) can be a huge problem for our society and has already done a lot of harm. Apart from that, factually false news is also so widespread that even the world’s leading governments and agencies are trying to combat it in their own ways.

Existing popular models for detecting fake news


1. GLTR

Giant Language Model Test Room, or GLTR, is a tool designed by the great folks at HarvardNLP and the MIT-IBM Watson AI Lab.

2. GPT-2 Detector Model

The GPT-2 detector model is a RoBERTa (a variant of BERT) model that has been fine-tuned to predict whether a given piece of text was generated by GPT-2 or not, treating detection as a simple classification problem.

RoBERTa is a large language model developed by Facebook AI Research as an improvement over Google’s BERT, which is why the two models are so similar.

3. Grover (AllenNLP)

Grover by AllenNLP is my favorite tool out of all the options discussed in this article. It can flag text generated by many different kinds of language models, unlike GLTR and the GPT-2 detector, which are limited to particular models.

4. Model Trained On NYT & The Guardian

This project uses news scraped from the NYT API and The Guardian API as its data set of real news, while the fake news data set was downloaded from kaggle.com. There are 12,000 fake news articles from kaggle.com and 43,000 real news articles. Both real and fake articles were restricted to certain topics. The creators chose “US News,” “Politics,” “Business,” and “World,” assuming that most fake news would fall under these topics.

Check the full code here.
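The counts quoted above (12,000 fake, 43,000 real) imply an imbalanced data set, which affects how accuracy should be read. The sketch below is not the project's code. It only works through the arithmetic: each class's share, the accuracy of always predicting the majority class, and the common "balanced" class weight `total / (n_classes * count)`. The function name `class_summary` is a hypothetical helper, and the source does not say whether the creators used class weights.

```python
def class_summary(counts):
    """Return the share and the 'balanced' weight for each class label."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {
        label: {
            "share": count / total,
            # Balanced weight: rarer classes receive a larger weight.
            "weight": total / (n_classes * count),
        }
        for label, count in counts.items()
    }


# Counts taken from the paragraph above.
summary = class_summary({"fake": 12_000, "real": 43_000})

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = max(stats["share"] for stats in summary.values())

for label, stats in summary.items():
    print(f"{label}: share={stats['share']:.3f}, weight={stats['weight']:.3f}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

With these counts, a model that always answers "real" is already right on roughly 78% of articles, so any reported accuracy should be compared against that baseline.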

5. Fake News Detection On Twitter Dataset

This project uses a multi-modal feature extractor that pulls textual and visual features from posts. It is built on adversarial neural networks: the feature extractor cooperates with the fake news detector to learn the key features of fake news, while a discriminator network removes event-specific features and keeps the features shared among events. The data comes from multimedia datasets collected from Weibo and Twitter.

Check the code here.

BeingDatum TrueNews

We at BeingDatum have built True News, a well-trained model of our own to counter fake news, using the following steps.

1. Data Gathering

We downloaded fake news data from various Kaggle competitions and GitHub repos, including a few of those mentioned above, and merged it with a dataset scraped with custom code from websites such as Google News, OpIndia, and Indiatimes.

Check the code here.

2. Model training

We prepared the data with standard NLP techniques: a regex cleaner, tokenizer, lemmatizer, TF-IDF, and count vectorizer. We then trained several models, including logistic regression and XGBoost, with a lot of hyperparameter tuning using GridSearchCV and some custom code.

Check the code here.
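To make the preparation steps above more concrete, here is a minimal plain-Python sketch of regex cleaning, tokenizing, and TF-IDF weighting. It is an illustration only, not the linked BeingDatum code: the function names `clean`, `tokenize`, and `tfidf` and the sample headlines are invented for the example, and lemmatization is left out because it needs an external library. The IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1, without any vector normalization.

```python
import math
import re
from collections import Counter


def clean(text):
    """Lowercase the text, drop URLs, and replace non-letters with spaces."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.sub(r"[^a-z\s]", " ", text)


def tokenize(text):
    """Split the cleaned text on whitespace."""
    return clean(text).split()


def tfidf(docs):
    """Return one {term: weight} dict per document (raw count times smoothed IDF)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Invented sample headlines, for illustration only.
docs = [
    "BREAKING: Aliens endorse candidate! http://example.com/x",
    "Parliament passes the budget bill after debate.",
    "Candidate responds to budget debate.",
]
vectors = tfidf(docs)
print(tokenize(docs[0]))
print(sorted(vectors[0].items(), key=lambda item: -item[1]))
```

Terms that appear in only one document, such as "aliens", receive a higher weight than terms shared across documents, such as "candidate". A classifier can use that signal.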

3. Pipeline Creation

The trained model from the previous step is saved as a scikit-learn pipeline.

Check the code here.
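As a rough idea of what saving a model as a scikit-learn pipeline can look like, the sketch below chains a vectorizer and a classifier, fits them, and writes the result to disk. It is written under several assumptions and is not the linked code: the choice of `TfidfVectorizer` with `LogisticRegression`, the use of `joblib` for persistence, the file name `truenews_pipeline.joblib`, and the four toy texts are all illustrative.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy training data, invented for illustration (1 = fake, 0 = real).
texts = [
    "shocking miracle cure doctors hate revealed",
    "aliens secretly control the election says insider",
    "parliament passes annual budget after debate",
    "central bank keeps interest rates unchanged",
]
labels = [1, 1, 0, 0]

# Bundling preprocessing and model means raw text goes in and a prediction comes out.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])
pipeline.fit(texts, labels)

# Persist the fitted pipeline, then reload it as a serving process would.
path = os.path.join(tempfile.mkdtemp(), "truenews_pipeline.joblib")
joblib.dump(pipeline, path)
restored = joblib.load(path)

# predict_proba returns [P(real), P(fake)] for each input text.
proba = restored.predict_proba(["miracle cure revealed by insider"])[0]
print(proba)
```

Saving the whole pipeline, rather than the classifier alone, keeps the vectorizer's vocabulary and the model together, so the deployed service applies the same preprocessing that was used in training.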

4. Model deployment

The model is deployed with Flask behind a basic UI.

Check the demo here.

Limitations of Current Fake News Detection Techniques

The main caveat is that the approach used by tools such as GLTR and Grover to detect neural fake news is incomplete. Finding out whether a piece of text is “machine-generated” is not enough, because a legitimate piece of news can also be machine-generated with the help of tools such as auto-completion and text summarization.

One step toward dealing with neural fake news came in 2018, when researchers from the University of Sheffield and Amazon Research Cambridge released FEVER, a large-scale fact-checking dataset that can be used to train neural networks to detect fake news.

But this approach also has limitations: its actual accuracy is only 56%.

True News Augmented A.I and next steps

True News will be an all-in-one news chatbot app that uses A.I to give users one-stop news with the following features:

  • Fact checking on any news and social media
  • Summarized news
  • Context-based relevant news

For fact checking, the True News chatbot will offer the following:

  1. Send news, social media text, or a URL to get a fake probability in real time
  2. Tight integration with Facebook, WhatsApp, Twitter, etc.
  3. Highly accurate news checking
  4. Submission of fake news data to the Government of India (GoI) for further action

For summarized and context-based news, the True News chatbot will offer the following:

  1. News article summary generated by A.I
  2. Relevant image extracted by A.I
  3. Keyword-based search
  4. News recommendations based on location, language, age, importance, and many other parameters

Because of the current limitations, BeingDatum True News has come up with a hybrid approach: users get a chatbot-like UI, and any text whose prediction confidence is below 90% goes to a team of experts for a manual fact check in real time. Some chatbot screen grabs can be seen below:

[Images: chatbot screen grabs]
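The hybrid routing rule described above could look like the following sketch. Only the 90% threshold comes from the text. The function name `route`, the 0.5 cut-off for the label, and the definition of confidence as the probability of the predicted class are assumptions made for illustration.

```python
REVIEW_THRESHOLD = 0.90  # From the text: below 90% confidence goes to manual review.


def route(fake_probability, threshold=REVIEW_THRESHOLD):
    """Return (label, needs_manual_review) for one model score in [0, 1]."""
    if not 0.0 <= fake_probability <= 1.0:
        raise ValueError("fake_probability must be between 0 and 1")
    # Assumption: scores of 0.5 or more are labeled fake.
    label = "fake" if fake_probability >= 0.5 else "real"
    # Assumption: confidence is the probability of the predicted class.
    confidence = max(fake_probability, 1.0 - fake_probability)
    return label, confidence < threshold


for score in (0.97, 0.60, 0.03):
    label, needs_review = route(score)
    print(f"score={score:.2f} -> {label}, manual review: {needs_review}")
```

Under these assumptions, a score of 0.97 or 0.03 is answered automatically, while a score of 0.60 is labeled fake but still sent to the expert team.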

July 23, 2020
