DataZoom: Developed by the Department of Economics at the Pontifical Catholic University of Rio de Janeiro (PUC-Rio)

DataZoom Social Stata:

For the Portuguese Version, click on the Shield below:

DataZoom Social Stata datazoom_social is a package that enables compatibility with microdata from surveys conducted by the Brazilian Institute of Geography and Statistics (IBGE). With this package, it is possible to read all household surveys conducted by IBGE: Population Census, National Household Sample Survey (PNAD and Continuous PNAD), Monthly Employment Survey (PME), Consumer Expenditure Survey (POF), and Urban Informal Economy Survey (ECINF).

Installation

Enter the code below in Stata's command line to download and install the DataZoom Social package's files:

net install datazoom_social, from("https://raw.githubusercontent.com/datazoompuc/datazoom_social_stata/master/") force

Syntax

The package's syntax can be summarized as follows:

datazoom_social [, options]

Where the [options] can be:

Options	Description
`survey(...)`	Survey selection
`source(...)`	Folder where the raw data is saved
`save(...)`	Folder where you would like to save the dataset generated by Stata
`date(...)`	Reference date of compatibilized data
english	To obtain variables in English, add this option at the end of the command
other options	Specific options for each survey, as detailed below

Attention: Given the number of surveys and their specifically related options, we recommend using our Dialog Box for data harmonization.

Dialog Box

The package's Dialog Box can be accessed using the following command: db datazoom_social. The following window will open:

Simply navigate through the Dialog Box options to read IBGE microdata.

Specific Options for Each Survey

The options survey(...), source(...), save(...), and date(...) are mandatory. Depending on the selected survey, other options will be required.

PNS

datazoom_social, survey(pns) source(...) save(...) date(...)

The available years for PNS are 2013 and 2019. The PNS function can be used directly. See h datazoom_pns.

Demographic Census

datazoom_social, survey(census) source(...) save(...) date(...) state(...) [ops]

The available years for the Demographic Census are those of 1970, 1980, 1991, 2000, and 2010. In the state option, you can enter the initials of as many states as you want for data reading. For example, for the States of Amapá (AP), Mato Grosso do Sul (MS), and Amazonas (AM), type state(AP MS AM). There are also the following options:

Ops	Description
`pes`	Extract data for individuals
`fam`	Extract data for families (available only for the 2000 Census)
`dom`	Extract data for households
`both`	Extract data for individuals and households in the same file
`all`	Extract data for all types of records in the same file
`comp`	Request compatibility of variables to be executed

The Census function can be used directly. See h datazoom_censo.

PNAD Continuous (PNADC) - Annual Release

datazoom_social, survey(pnadcontinua_anual) source(...) save(...) date(...)

This function harmonizes microdata related to a specific year's household visit. The available years for the Annual PNADC are from 2012 to 2023. The function for the Annual PNADC can be used directly. See h datazoom_pnadcont_anual.

Important: Annual Supplementary Surveys are included in their respective visits or quarters. To locate them, visit: https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Anual/Microdados/Visita/PNADC_Pesquisas_Suplementares_Anuais_20230811.pdf

Attention: The source(...) option must contain the specific file path for the microdata.

PNAD Continuous (PNADC) - Quarterly Release

datazoom_social, survey(pnadcontinua) source(...) save(...) date(...) [ops]

The available years for the Quarterly PNADC are from 2012 to 2023. To create the panel, there are the following options:

Ops	Description
`nid`	No Identification
`idbas`	Basic identification
`idrs`	Advanced Identification (Ribas and Soares Methodology)

To create the panel, the microdata files for all quarters of the years of interest must be in the source(...) folder. The Quarterly PNADC function can be used directly. See h datazoom_pnadcontinua.

PNAD

datazoom_social, survey(pnad) source(...) save(...) date(...) [ops]

The available years for PNAD are from 1981 to 2015, when it was discontinued. There are also the following options:

Ops	Description
`pes`	Extract data for individuals
`dom`	Extract data for households
`both`	Extract data for individuals and households in the same file
`ncomp`	Request that variable compatibility is not executed
`comp81`	Request that variables be compatible with the 1980s
`comp92`	Request that variables be compatible with the 1990s (not valid for variables from the 1980s)

The PNAD function can be used directly. See h datazoom_pnad.

PNAD Covid

datazoom_social, survey(pnad_covid) source(...) save(...) date(...)

The available periods for PNAD Covid are from May 2020 to November 2020. The PNAD Covid function can be used directly. See h datazoom_pnad_covid.

New PME

datazoom_social, survey(pmenova) source(...) save(...) date(...) [ops]

The available years for the New PME are from 2002 to 2016. To create the panel, there are the following options:

Ops	Description
`nid`	Without identification
`idbas`	Basic identification
`idrs`	Advanced Identification (Ribas and Soares Methodology)

To create the panel, microdata files for all months of the years of interest must be in the source(...) folder. The PME function can be used directly. See h datazoom_pmenova.

Old PME

datazoom_social, survey(pmeantiga) source(...) save(...) date(...) [ops]

The available years for the Old PME are from 1991 to 2001. To create the panel, there are the following options:

Ops	Description
`nid`	Without identification
`idbas`	Basic identification
`idrs`	Advanced Identification (Ribas and Soares Methodology)

To create the panel, microdata files for all months of the years of interest must be in the source(...) folder. The Old PME function can be used directly. See h datazoom_pmeantiga.

ECINF

datazoom_social, survey(ecinf) source(...) save(...) date(...) record(...)

ECINF is available for the years 1997 and 2003. The record(...) input must be filled in with the Record Type:

Code	Record Type
`domicilios`	Households
`moradores`	Residents
`trabrend`	Work and Income
`uecon`	Economic Unit
`pesocup`	Occupied Persons
`indprop`	Owner
`sebrae`	The Brazilian Micro and Small Business Support Service (Sebrae) (available only for 2003)

The ECINF function can be used directly. See h datazoom_ecinf.

POF

datazoom_social, survey(pof) source(...) save(...) date(...) datatype(...) [ops]

The available years for POF are 1995, 2002, 2008, and 2017. The datatype(...) input must be filled in with the type of data the user wants to extract: Standardized Bases datatype(std), Selected Expenditures datatype(sel), and Record Types datatype(trs). For each chosen Data Type, there are specific options:

Standardized Bases:

datazoom_social, survey(pof) source(...) save(...) date(...) datatype(std) identification(...)

`identification`	Description
`dom`	Household
`uc`	Consumption Unit
`pess`	Individual

Selected Expenditures:

datazoom_social, survey(pof) source(...) save(...) date(...) datatype(sel) identification(...) list(...)

identification(...) has the same options as the previous topic. Use the Dialog Box to see options for identification(...).

Record Types

datazoom_social, survey(pof) source(...) save(...) date(...) datatype(std) registertype(...)

For registertype(...), there are different record types depending on the year, numbered according to the IBGE documentation. Use the Dialog Box to see options.

The POF functions can be used directly. See help datazoom_pof.

Attention: For POF 2017-2018, Standardized Bases and Selected Expenditures are not available.

Auxiliary Programs (Dictionaries)

Most of the package programs encounter original data stored in .txt format, which requires dictionaries – .dct format in Stata – to be read. The result is a volume of dictionaries that exceeds the 100-file limit allowed for a Stata package to be installed. Therefore, individual dictionaries are compressed into a single .dta file, read within each program. Both functions are defined in the file read_compdct.ado.

The first program defined in this file is write_compdct, which can be used as follows: after running the .ado file to define the function, simply use the code:

write_compdct, folder("/folder with dictionaries") saving("/path/dict.dta")

The function then reads all .dct files present in the folder and combines them into the dict.dta file, with each dictionary identified by a variable with its name.

To transform this compressed file back into the original dictionary, we reccomend using the read_compdct program:

read_compdct, compdct("dict.dta") dict_name("original_dict") out("extracted_dict.dct")

which extracts the original_dict from the dict.dta file and saves it as extracted_dict.dct. As an example, see the use of this function in the datazoom_pnadcontinua program:

tempfile dic // Temporary file where the extracted .dct will be saved

findfile dict.dta // Finds the dict.dta file saved by the package installation
                  // in the /ado/ folder and stores the path to it in the r(fn) 
                  //macro.

read_compdct, compdct("`r(fn)'") dict_name("pnadcontinua`lang'") out("`dic'")
  // Reads the compacted dict.dta dictionary, extracts the pnadcontinua 
  // dictionary (or pnadcontinua_en, `lang` is empty or "_en"), and saves the 
  // final file in the tempfile dic, which is used to read the data.

For our internal organization, each folder corresponding to a program stores the dictionaries in the /dct/ sub-folder. All these dictionaries are also stored together in the /dct/ folder directly, which is used to generate the dict.dta file using write_compdct. Note that no .dct files are actually listed in the datazoom_social.pkg file, and therefore, they are not installed on the user's computer. Only the dict.dta file is sent.

Program Structure

The functions of the package generally follow the structure below:

program datazoom_example
syntax, original(string) ... // Main function to be executed by the dialog boxes or datazoom_social

...

load_example, options // Data reading program defined below

...

(Data loading)

treat_example // Any data treatment

...  

save  

end

program load_example

...

end

program treat_example

...

end

Data Loading: As an example, let's look at a section of the PNS program:

program load_pns
syntax, original(str) year(integer) [english]

if "`english'" != "" local lang "_en"

tempfile dic (*)

findfile dict.dta (*)

read_compdct, compdct("`r(fn)'") dict_name("pns`year'`lang'") out("`dic'") (*)

qui infile using `dic', using(PNS_`year'.txt) clear

end

The lines marked with (*) just extract the required dictionary to read the data, as explained in the previous section. Observing the program options, load_pns receives the folder that stores the raw data, the chosen year, and the option for English labels, which causes the pns20xx_en dictionary to be read instead of pns20xx.

The first line inside this function is found in most programs in the package: when the user chooses the english option, a local variable with the same name is saved, storing the string "english". Since all English dictionaries are stored with the _en suffix, this line says that if this english macro is not empty – which only happens when the user chooses this option – the local variable lang is created with the string "_en," which serves as a suffix for the name of the dictionary to be read.

The heart of this load function is in the infile line, which uses the extracted dictionary to read the raw data file and load it into Stata memory.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README_en.md

README_en.md

DataZoom Social Stata:

Contents

Installation

Syntax

Dialog Box

Specific Options for Each Survey

PNS

Demographic Census

PNAD Continuous (PNADC) - Annual Release

PNAD Continuous (PNADC) - Quarterly Release

PNAD

PNAD Covid

New PME

Old PME

ECINF

POF

Auxiliary Programs (Dictionaries)

Program Structure

Files

README_en.md

Latest commit

History

README_en.md

File metadata and controls

DataZoom Social Stata:

Contents

Installation

Syntax

Dialog Box

Specific Options for Each Survey

PNS

Demographic Census

PNAD Continuous (PNADC) - Annual Release

PNAD Continuous (PNADC) - Quarterly Release

PNAD

PNAD Covid

New PME

Old PME

ECINF

POF

Auxiliary Programs (Dictionaries)

Program Structure