The "A subset of about 1700 labelled email messages" dataset from https://bailando.berkeley.edu/enron_email.html was used for this work. The emails are labelled into 8 categories and stored in 8 folders, one folder per label. Inside each folder, every email is stored in a .txt file and has an accompanying .cats file. Each line of a .cats file holds three numbers: the first is the top-level category, the second is the second-level category, and the third is the frequency with which the category was assigned to the email.
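As an illustration of the .cats format described above, the following is a minimal sketch of a parser. It assumes the three numbers on each line are separated by commas (for example "1,2,1"); the function name parse_cats and the sample text are hypothetical and only serve the example.

```python
from typing import List, Tuple


def parse_cats(text: str) -> List[Tuple[int, int, int]]:
    """Parse .cats content into (top_level, second_level, frequency) triples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: the three numbers are comma-separated, e.g. "1,2,1".
        top, second, freq = (int(part) for part in line.split(","))
        triples.append((top, second, freq))
    return triples


# Hypothetical sample content, not taken from the dataset.
sample = "1,1,2\n2,6,1\n3,3,1\n"
for top, second, freq in parse_cats(sample):
    print(f"top-level={top} second-level={second} frequency={freq}")
```

In practice the text would be read from a .cats file on disk, for instance with pathlib.Path(path).read_text().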
The focus of this work is to classify emails into the following classes:
1 Company Business, Strategy, etc
2 Purely Personal
3 Personal but in professional context
4 Logistic Arrangements
5 Employment arrangements
6 Document editing/checking
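
The six classes listed above can be kept in a small lookup table so that emails from the remaining folders are skipped. This is only a sketch: it assumes each folder is named after its label number ("1" to "8"), which should be checked against the downloaded data, and the names TARGET_CLASSES and class_for_folder are hypothetical.

```python
from typing import Optional

# Class numbers and names as listed in this section.
TARGET_CLASSES = {
    1: "Company Business, Strategy, etc",
    2: "Purely Personal",
    3: "Personal but in professional context",
    4: "Logistic Arrangements",
    5: "Employment arrangements",
    6: "Document editing/checking",
}


def class_for_folder(folder_name: str) -> Optional[str]:
    """Return the class name for a folder, or None if it is not a target class."""
    # Assumption: folder names are the label numbers as strings.
    try:
        class_id = int(folder_name)
    except ValueError:
        return None
    return TARGET_CLASSES.get(class_id)


for folder in ["1", "4", "7"]:
    print(folder, "->", class_for_folder(folder))
```

Folders that do not map to one of the six classes return None and can be left out of the training set.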