Skip to content

Latest commit

 

History

History
307 lines (194 loc) · 16.2 KB

README_en.md

File metadata and controls

307 lines (194 loc) · 16.2 KB

DataZoom: Developed by the Department of Economics at the Pontifical Catholic University of Rio de Janeiro (PUC-Rio)

DataZoom Social Stata:

For the Portuguese Version, click on the Shield below:

pt-br

DataZoom Social Stata datazoom_social is a package that enables compatibility with microdata from surveys conducted by the Brazilian Institute of Geography and Statistics (IBGE). With this package, it is possible to read all household surveys conducted by IBGE: Population Census, National Household Sample Survey (PNAD and Continuous PNAD), Monthly Employment Survey (PME), Consumer Expenditure Survey (POF), and Urban Informal Economy Survey (ECINF).

Contents

  1. Installation
  2. Syntax
  3. Dialog Box
  4. Specific Options for Each Survey
    1. PNS
    2. Demographic Census
    3. Continuous PNAD - Annual Release
    4. Continuous PNAD - Quarterly Release
    5. PNAD
    6. PNAD Covid
    7. New PME
    8. Old PME
    9. ECINF
    10. POF
  5. Auxiliary Programs (Dictionaries)
  6. Program Structure

Installation

Enter the code below in Stata's command line to download and install the DataZoom Social package's files:

net install datazoom_social, from("https://raw.githubusercontent.com/datazoompuc/datazoom_social_stata/master/") force

Syntax

The package's syntax can be summarized as follows:

datazoom_social [, options]

Where the [options] can be:

Options Description
survey(...) Survey selection
source(...) Folder where the raw data is saved
save(...) Folder where you would like to save the dataset generated by Stata
date(...) Reference date of compatibilized data
english To obtain variables in English, add this option at the end of the command
other options Specific options for each survey, as detailed below

Attention: Given the number of surveys and their specifically related options, we recommend using our Dialog Box for data harmonization.

Dialog Box

The package's Dialog Box can be accessed using the following command: db datazoom_social. The following window will open:

Simply navigate through the Dialog Box options to read IBGE microdata.

Specific Options for Each Survey

The options survey(...), source(...), save(...), and date(...) are mandatory. Depending on the selected survey, other options will be required.

PNS

datazoom_social, survey(pns) source(...) save(...) date(...)

The available years for PNS are 2013 and 2019. The PNS function can be used directly. See h datazoom_pns.

Demographic Census

datazoom_social, survey(census) source(...) save(...) date(...) state(...) [ops]

The available years for the Demographic Census are those of 1970, 1980, 1991, 2000, and 2010. In the state option, you can enter the initials of as many states as you want for data reading. For example, for the States of Amapá (AP), Mato Grosso do Sul (MS), and Amazonas (AM), type state(AP MS AM). There are also the following options:

Ops Description
pes Extract data for individuals
fam Extract data for families (available only for the 2000 Census)
dom Extract data for households
both Extract data for individuals and households in the same file
all Extract data for all types of records in the same file
comp Request compatibility of variables to be executed

The Census function can be used directly. See h datazoom_censo.

PNAD Continuous (PNADC) - Annual Release

datazoom_social, survey(pnadcontinua_anual) source(...) save(...) date(...)

This function harmonizes microdata related to a specific year's household visit. The available years for the Annual PNADC are from 2012 to 2023. The function for the Annual PNADC can be used directly. See h datazoom_pnadcont_anual.

Important: Annual Supplementary Surveys are included in their respective visits or quarters. To locate them, visit: https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Anual/Microdados/Visita/PNADC_Pesquisas_Suplementares_Anuais_20230811.pdf

Attention: The source(...) option must contain the specific file path for the microdata.

PNAD Continuous (PNADC) - Quarterly Release

datazoom_social, survey(pnadcontinua) source(...) save(...) date(...) [ops]

The available years for the Quarterly PNADC are from 2012 to 2023. To create the panel, there are the following options:

Ops Description
nid No Identification
idbas Basic identification
idrs Advanced Identification (Ribas and Soares Methodology)

To create the panel, the microdata files for all quarters of the years of interest must be in the source(...) folder. The Quarterly PNADC function can be used directly. See h datazoom_pnadcontinua.

PNAD

datazoom_social, survey(pnad) source(...) save(...) date(...) [ops]

The available years for PNAD are from 1981 to 2015, when it was discontinued. There are also the following options:

Ops Description
pes Extract data for individuals
dom Extract data for households
both Extract data for individuals and households in the same file
ncomp Request that variable compatibility is not executed
comp81 Request that variables be compatible with the 1980s
comp92 Request that variables be compatible with the 1990s (not valid for variables from the 1980s)

The PNAD function can be used directly. See h datazoom_pnad.

PNAD Covid

datazoom_social, survey(pnad_covid) source(...) save(...) date(...)

The available periods for PNAD Covid are from May 2020 to November 2020. The PNAD Covid function can be used directly. See h datazoom_pnad_covid.

New PME

datazoom_social, survey(pmenova) source(...) save(...) date(...) [ops]

The available years for the New PME are from 2002 to 2016. To create the panel, there are the following options:

Ops Description
nid Without identification
idbas Basic identification
idrs Advanced Identification (Ribas and Soares Methodology)

To create the panel, microdata files for all months of the years of interest must be in the source(...) folder. The PME function can be used directly. See h datazoom_pmenova.

Old PME

datazoom_social, survey(pmeantiga) source(...) save(...) date(...) [ops]

The available years for the Old PME are from 1991 to 2001. To create the panel, there are the following options:

Ops Description
nid Without identification
idbas Basic identification
idrs Advanced Identification (Ribas and Soares Methodology)

To create the panel, microdata files for all months of the years of interest must be in the source(...) folder. The Old PME function can be used directly. See h datazoom_pmeantiga.

ECINF

datazoom_social, survey(ecinf) source(...) save(...) date(...) record(...)

ECINF is available for the years 1997 and 2003. The record(...) input must be filled in with the Record Type:

Code Record Type
domicilios Households
moradores Residents
trabrend Work and Income
uecon Economic Unit
pesocup Occupied Persons
indprop Owner
sebrae The Brazilian Micro and Small Business Support Service (Sebrae) (available only for 2003)

The ECINF function can be used directly. See h datazoom_ecinf.

POF

datazoom_social, survey(pof) source(...) save(...) date(...) datatype(...) [ops]

The available years for POF are 1995, 2002, 2008, and 2017. The datatype(...) input must be filled in with the type of data the user wants to extract: Standardized Bases datatype(std), Selected Expenditures datatype(sel), and Record Types datatype(trs). For each chosen Data Type, there are specific options:

  • Standardized Bases:

datazoom_social, survey(pof) source(...) save(...) date(...) datatype(std) identification(...)

identification Description
dom Household
uc Consumption Unit
pess Individual
  • Selected Expenditures:

datazoom_social, survey(pof) source(...) save(...) date(...) datatype(sel) identification(...) list(...)

identification(...) has the same options as the previous topic. Use the Dialog Box to see options for identification(...).

  • Record Types

datazoom_social, survey(pof) source(...) save(...) date(...) datatype(std) registertype(...)

For registertype(...), there are different record types depending on the year, numbered according to the IBGE documentation. Use the Dialog Box to see options.

The POF functions can be used directly. See help datazoom_pof.

Attention: For POF 2017-2018, Standardized Bases and Selected Expenditures are not available.

Auxiliary Programs (Dictionaries)

Most of the package programs encounter original data stored in .txt format, which requires dictionaries – .dct format in Stata – to be read. The result is a volume of dictionaries that exceeds the 100-file limit allowed for a Stata package to be installed. Therefore, individual dictionaries are compressed into a single .dta file, read within each program. Both functions are defined in the file read_compdct.ado.

The first program defined in this file is write_compdct, which can be used as follows: after running the .ado file to define the function, simply use the code:

write_compdct, folder("/folder with dictionaries") saving("/path/dict.dta")

The function then reads all .dct files present in the folder and combines them into the dict.dta file, with each dictionary identified by a variable with its name.

To transform this compressed file back into the original dictionary, we reccomend using the read_compdct program:

read_compdct, compdct("dict.dta") dict_name("original_dict") out("extracted_dict.dct")

which extracts the original_dict from the dict.dta file and saves it as extracted_dict.dct. As an example, see the use of this function in the datazoom_pnadcontinua program:

tempfile dic // Temporary file where the extracted .dct will be saved

findfile dict.dta // Finds the dict.dta file saved by the package installation
                  // in the /ado/ folder and stores the path to it in the r(fn) 
                  //macro.

read_compdct, compdct("`r(fn)'") dict_name("pnadcontinua`lang'") out("`dic'")
  // Reads the compacted dict.dta dictionary, extracts the pnadcontinua 
  // dictionary (or pnadcontinua_en, `lang` is empty or "_en"), and saves the 
  // final file in the tempfile dic, which is used to read the data.

For our internal organization, each folder corresponding to a program stores the dictionaries in the /dct/ sub-folder. All these dictionaries are also stored together in the /dct/ folder directly, which is used to generate the dict.dta file using write_compdct. Note that no .dct files are actually listed in the datazoom_social.pkg file, and therefore, they are not installed on the user's computer. Only the dict.dta file is sent.

Program Structure

The functions of the package generally follow the structure below:

program datazoom_example
syntax, original(string) ... // Main function to be executed by the dialog boxes or datazoom_social

...

load_example, options // Data reading program defined below

...

(Data loading)

treat_example // Any data treatment

...  

save  

end

program load_example

...

end

program treat_example

...

end

Data Loading: As an example, let's look at a section of the PNS program:

program load_pns
syntax, original(str) year(integer) [english]

if "`english'" != "" local lang "_en"

tempfile dic (*)

findfile dict.dta (*)

read_compdct, compdct("`r(fn)'") dict_name("pns`year'`lang'") out("`dic'") (*)

qui infile using `dic', using(PNS_`year'.txt) clear

end

The lines marked with (*) just extract the required dictionary to read the data, as explained in the previous section. Observing the program options, load_pns receives the folder that stores the raw data, the chosen year, and the option for English labels, which causes the pns20xx_en dictionary to be read instead of pns20xx.

The first line inside this function is found in most programs in the package: when the user chooses the english option, a local variable with the same name is saved, storing the string "english". Since all English dictionaries are stored with the _en suffix, this line says that if this english macro is not empty – which only happens when the user chooses this option – the local variable lang is created with the string "_en," which serves as a suffix for the name of the dictionary to be read.

The heart of this load function is in the infile line, which uses the extracted dictionary to read the raw data file and load it into Stata memory.