Py DS_Engineer Lab Report #08

Amy Lin edited this page Jul 29, 2017 · 3 revisions

Python Programming for Data Scientists & Engineers Lab #08

Lab #08 Baby Name Generator using Markov Chains

The data source is the Social Security Administration's Popular Baby Names national data for 2016.

Girls' and boys' names are separated into two groups, and each name is then broken down into its characters.
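The grouping step can be sketched roughly as follows. This assumes the national data uses one "name,sex,count" row per line (as in files such as yob2016.txt). The sample rows and their counts are made up for illustration, and the helper names `split_by_sex` and `to_characters` are hypothetical, not taken from the lab code.

```python
from collections import defaultdict

# Made-up sample rows in the assumed "name,sex,count" layout of the SSA
# national files; the real file has thousands of rows.
SAMPLE_ROWS = """Emma,F,100
Olivia,F,90
Noah,M,110
Liam,M,95"""


def split_by_sex(rows):
    """Group names by the sex column ('F' or 'M'), lowercasing each name."""
    groups = defaultdict(list)
    for line in rows.strip().splitlines():
        name, sex, _count = line.split(",")
        groups[sex].append(name.lower())
    return dict(groups)


def to_characters(names):
    """Break every name into its individual characters."""
    return [ch for name in names for ch in name]


groups = split_by_sex(SAMPLE_ROWS)
girl_chars = to_characters(groups["F"])
boy_chars = to_characters(groups["M"])
```

Lowercasing is a design choice here so that "E" and "e" are counted as the same letter.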

The occurrences of each character are counted, and the characters and their counts are parsed into two lists. These two lists are the parameters used to generate a random series of characters weighted by their occurrences: the more often a letter appears in the names, the higher its chance of being selected.
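A minimal sketch of the counting and weighted draw, assuming Python's `collections.Counter` and `random.choices` (the lab may have used different calls). `letter_weights` and `random_letters` are hypothetical helper names. Note that weighting each letter by its overall count makes every draw independent of the previous one. A first-order Markov chain would instead weight the next letter by how often it follows the current letter.

```python
import random
from collections import Counter


def letter_weights(chars):
    """Count each character and return two parallel lists: letters and counts."""
    counts = Counter(chars)
    letters = list(counts.keys())
    weights = list(counts.values())
    return letters, weights


def random_letters(letters, weights, length, rng=random):
    """Draw `length` letters, each with probability proportional to its count."""
    return "".join(rng.choices(letters, weights=weights, k=length))


letters, weights = letter_weights(list("emmaolivia"))
sample = random_letters(letters, weights, 5, random.Random(0))
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible. Leaving the default uses the module-level generator.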

The nltk package is used to check whether the generated word exists. If it does not, the program picks another list of letters until it produces a real word.
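The retry loop might look like this sketch. It assumes the word check is a lookup in a set of known words. The commented lines show how such a set could be built from NLTK's "words" corpus, and a tiny stand-in set keeps the example runnable without NLTK. `generate_real_word` and its `max_tries` cap are hypothetical additions. Random letter strings rarely form real words, so an unbounded loop could run for a very long time.

```python
import random


def generate_real_word(letters, weights, length, vocabulary, max_tries=100_000, rng=random):
    """Redraw weighted random letter strings until one is in `vocabulary`.

    Returns the word, or None if no real word turns up within `max_tries`.
    """
    for _ in range(max_tries):
        candidate = "".join(rng.choices(letters, weights=weights, k=length))
        if candidate in vocabulary:
            return candidate
    return None


# With NLTK installed, the vocabulary could come from its "words" corpus:
#   import nltk
#   nltk.download("words")
#   from nltk.corpus import words
#   vocabulary = {w.lower() for w in words.words()}
# A tiny stand-in set keeps this sketch runnable without NLTK.
vocabulary = {"am", "ma", "me", "em"}
word = generate_real_word(["a", "e", "m"], [2, 1, 2], 2, vocabulary, rng=random.Random(1))
```

Storing the vocabulary as a set keeps each membership check fast, which matters when many candidates are rejected.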

The user is asked whether they want a boy or girl name and how many characters it should have.
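One way to collect and validate these two answers is sketched below. The accepted spellings ("boy", "b", "girl", "g") and the function names `parse_request` and `ask_user` are assumptions for illustration. The original report does not show its prompt code.

```python
def parse_request(sex_text, length_text):
    """Validate the user's answers; return ('F' or 'M', length) or raise ValueError."""
    sex_map = {"girl": "F", "g": "F", "boy": "M", "b": "M"}
    sex = sex_map.get(sex_text.strip().lower())
    if sex is None:
        raise ValueError("please answer 'boy' or 'girl'")
    length = int(length_text)  # raises ValueError for non-numeric text
    if length < 1:
        raise ValueError("length must be a positive integer")
    return sex, length


def ask_user():
    """Keep prompting on standard input until both answers are valid."""
    while True:
        try:
            return parse_request(
                input("Boy or girl name? "),
                input("How many characters? "),
            )
        except ValueError as err:
            print(f"Invalid input: {err}")
```

Keeping the validation in `parse_request`, separate from `input()`, lets it be tested without a keyboard. Calling `ask_user()` returns a pair such as `("F", 6)` that can select the girls' or boys' letter lists.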

In the last part of the code, which is commented out, I transform the data into JSON format so that I can see how often a name generated by the Markov chain shows up. Because of time constraints, this part is not finished yet and will be completed later.
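Since the commented-out code is not shown, the following is only one plausible reading: store the name counts as JSON keyed by sex, then look up how often a generated name appears in the data. The `{sex: {name: count}}` layout, the helper names `names_to_json` and `lookup`, and the sample counts are all hypothetical.

```python
import json
from collections import defaultdict


def names_to_json(rows):
    """Turn 'name,sex,count' rows into a JSON string shaped {sex: {name: count}}."""
    table = defaultdict(dict)
    for line in rows.strip().splitlines():
        name, sex, count = line.split(",")
        table[sex][name.lower()] = int(count)
    return json.dumps(table, indent=2)


def lookup(json_text, sex, name):
    """Return the count recorded for `name`, or 0 if it never appears."""
    return json.loads(json_text).get(sex, {}).get(name.lower(), 0)


# Made-up sample rows; the real data would come from the SSA file.
rows = """Emma,F,100
Noah,M,110"""
data_json = names_to_json(rows)
```

A generated name that returns 0 from `lookup` is one that never appears in the source data.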