NATURAL LANGUAGE PROCESSING (NLP)
Synopsis:
- Working with text data
- Operations on text data
- Text vectorization
- Tokenization
- Lemmatization and Stemming
- Working with text libraries such as spaCy and NLTK
- Named Entity Recognition
- Assertion
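
As a rough illustration of the tokenization and text vectorization topics above, here is a minimal bag-of-words sketch using only the Python standard library. The function names (tokenize, build_vocabulary, vectorize) and the sample sentences are hypothetical and made up for this example. In practice spaCy or NLTK would likely supply a more robust tokenizer, and stemming or lemmatization is not shown here.

```python
import re
from collections import Counter

# Hypothetical sample documents, made up for illustration only.
docs = ["The cat sat on the mat.", "The dog chased the cat!"]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(tokenized_docs):
    """Map every distinct token to a column index (alphabetical order)."""
    distinct = sorted({tok for doc in tokenized_docs for tok in doc})
    return {tok: idx for idx, tok in enumerate(distinct)}


def vectorize(tokens, vocab):
    """Turn one tokenized document into a bag-of-words count vector."""
    counts = Counter(tokens)
    vector = [0] * len(vocab)
    for tok, idx in vocab.items():
        vector[idx] = counts[tok]
    return vector


tokenized = [tokenize(d) for d in docs]
vocab = build_vocabulary(tokenized)
vectors = [vectorize(toks, vocab) for toks in tokenized]

print(vocab)
print(vectors)
```

Each position in a vector counts how often one vocabulary word occurs in the document, so word order is discarded. That is the main trade-off of a bag-of-words representation.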
Resources:
- Introduction to NLP using spaCy
BIG DATA
Synopsis:
- Working with PySpark
- Using join/filter/select/withColumn/groupBy and other Spark operations
- Loading and saving files in Parquet (or Delta) format
Resources:
- https://sparkbyexamples.com/pyspark/
- https://www.linkedin.com/learning/apache-spark-essential-training/welcome?u=96343874
- https://www.youtube.com/watch?v=_C8kWso4ne4&list=WL&index=3
- Natural Language Processing with Spark NLP by Alex Thomas