Where to Get Data, and What Are the Classic NLP Problems?

There are various types of text data:

  1. Structured
  2. Semi-Structured
  3. Unstructured
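To make the three types concrete, here is a minimal sketch that uses only the Python standard library. The sample strings are made up for illustration, and CSV and JSON are just one possible choice of formats for the structured and semi-structured cases.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
# (Hypothetical sample data, for illustration only.)
structured = io.StringIO("word,pos\nrun,verb\ncat,noun\n")
rows = list(csv.DictReader(structured))

# Semi-structured: keys give some shape, but fields can vary per record.
semi_structured = json.loads(
    '{"user": "a1", "text": "Great phone!", "tags": ["review"]}'
)

# Unstructured: free text with no schema at all.
unstructured = "I bought this phone last week and the battery is great."

print(rows[0]["pos"])             # column lookup by name
print(semi_structured["text"])    # field lookup by key
print(len(unstructured.split()))  # only whitespace splitting is available
```

Structured and semi-structured data can be queried by column or key, while unstructured text has to be processed before any field can be extracted from it.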


Common kinds of data include:

  1. Dictionaries
  2. Databases 
  3. User Data


Now, let’s move on to our topic “Where to get the data from?”

Below are a few sources of data:

  1. Linguistic Data Consortium (https://www.ldc.upenn.edu/)
  2. Web Crawling/Scraping
  3. WordNet
  4. Lionbridge AI's list of 25 NLP datasets (https://lionbridge.ai/datasets/the-best-25-datasets-for-natural-language-processing/)
  5. APIs: Twitter, Wordnik, etc.
  6. University sites & academic communities.
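As a rough illustration of the scraping source, the sketch below pulls the visible text out of an HTML page with Python's built-in `html.parser`. The `TextExtractor` class, the `extract_text` helper and the `SAMPLE_HTML` string are hypothetical names and data made up for this example. A real crawler would also need to download pages and respect each site's terms and robots.txt.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a skipped element.
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML string as one line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page content, inlined so the example needs no network access.
SAMPLE_HTML = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>NLP News</h1><p>Parsing is fun.</p></body></html>"
)

print(extract_text(SAMPLE_HTML))
```

The result is unstructured text, ready for the kind of processing described in the problems below.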


Below are a few classic NLP problems:

  1. Linguistically motivated: Segmentation, Tagging, Parsing, etc.
  2. Analytical: Classification, Sentiment Analysis
  3. Transformation: Translation, Correction, Generation
  4. Conversation: Question Answering, Dialog, etc.
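Two of these problems can be sketched in a few lines: segmentation (splitting text into tokens) and sentiment analysis (classifying text as positive, negative or neutral). This is a toy sketch, not how production systems work. The word lists and the function names `tokenize` and `sentiment` are assumptions made up for illustration, and real sentiment models are usually learned from labeled data.

```python
import re

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}


def tokenize(text):
    """Segmentation: split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text):
    """Sentiment analysis: count lexicon hits and compare the totals."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(tokenize("I love this phone, great battery!"))
print(sentiment("I love this phone, great battery!"))
print(sentiment("Poor screen and bad sound."))
```

The classifier depends on the segmentation step, which is typical: the linguistically motivated problems often serve as building blocks for the analytical ones.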
