NLP in PySpark's MLlib - Fake Job Posting Predictions

Indeed.com has hired us to build a system that automatically flags suspicious job postings on its website. Because of the high volume of postings, Indeed's employees cannot check every one, so they want to prioritize which postings to review before any are deleted. Our task is to apply NLP to the attached dataset and build a model that automatically flags suspicious postings for review.

Dataset

This dataset contains about 18K job descriptions, of which about 800 are fake. The data consist of both textual information and meta-information about the jobs.
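
With roughly 800 fake postings out of 18K, the classes are heavily imbalanced, which affects both training and how accuracy should be read. The arithmetic can be sketched as follows. The counts are the approximate figures quoted above, and the "balanced" weighting formula is a common convention assumed here for illustration, not necessarily what the notebook uses.

```python
# Approximate counts taken from the dataset description above.
TOTAL_POSTINGS = 18_000  # "18K job descriptions"
FAKE_POSTINGS = 800      # "about 800 are fake"


def class_weights(n_total: int, n_positive: int) -> dict:
    """Return balanced class weights: n_total / (n_classes * n_in_class)."""
    n_negative = n_total - n_positive
    return {
        0: n_total / (2 * n_negative),  # real postings
        1: n_total / (2 * n_positive),  # fraudulent postings
    }


fraud_rate = FAKE_POSTINGS / TOTAL_POSTINGS
weights = class_weights(TOTAL_POSTINGS, FAKE_POSTINGS)

print(f"fraud rate: {fraud_rate:.1%}")
print(f"weight for real postings: {weights[0]:.3f}")
print(f"weight for fake postings: {weights[1]:.2f}")
```

Under these approximate counts, a model that labels every posting as real would already be correct on about 95% of rows, so precision and recall on the fraudulent class are more informative than raw accuracy.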

Data Source: https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction

The dataset has the following columns:

Column Name	Description
job_id	Unique identifier for each job posting
title	Job title
location	Location of the job
department	Department of the company
salary_range	Salary range of the job
company_profile	Description of the company
description	Description of the job
requirements	Requirements for the job
benefits	Benefits offered by the company
telecommuting	Whether the job allows telecommuting or not
has_company_logo	Whether the company has a logo or not
has_questions	Whether the job has questions for applicants or not
employment_type	Type of employment (full-time, part-time, etc.)
required_experience	Required experience for the job
required_education	Required education for the job
industry	Industry of the company
function	Function of the job
fraudulent	Whether the job posting is fraudulent or not

Prerequisites

Before running the code, you will need to have the following installed:

PySpark: the Python API for Apache Spark
Jupyter Notebook: an interactive development environment for Python
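
One quick way to confirm both prerequisites are present is to check whether their Python packages can be found. This is a small sketch: the import names `pyspark` and `notebook` are assumed to correspond to the two items above, and `missing_prerequisites` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Import names assumed for the two prerequisites listed above.
REQUIRED_MODULES = {"PySpark": "pyspark", "Jupyter Notebook": "notebook"}


def missing_prerequisites(modules: dict = REQUIRED_MODULES) -> list:
    """Return the display names of prerequisites that cannot be imported."""
    return [name for name, module in modules.items()
            if importlib.util.find_spec(module) is None]


missing = missing_prerequisites()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All prerequisites found.")
```

Note that PySpark also needs a Java runtime on the machine, which this check does not cover.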

Usage

To run the code, open the Fake_Job_Posting_Predictions.ipynb file in Jupyter Notebook and execute the cells in order. The notebook contains detailed explanations of each step in the code and the results obtained.

aehabV/Indeed-fake-job-posting-prediction
