Channel: Data Science, Analytics and Big Data discussions - Latest topics

R - How to calculate mean by decile in a Svydesign object


@marloz wrote:

So, I'm working with the ENIGH database, which stands for "National Survey of Household Income and Expenses" in Spanish. It is a survey conducted by the Mexican government and, like most surveys of its kind, it uses sampling weights.

What I'm trying to do is calculate the mean, maximum and minimum household income by decile. In other words, what is the income of each 10% of households when they are grouped by income?
To be honest, I haven't gotten very far, but this is what I have so far:

  1. Create my svydesign object
  2. Convert that into a table using svytable
  3. Sort in descending order using desc() on my income variable
library(survey)
ENIGH_design <- svydesign(id = ~upm, strata = ~est_dis, weights = ~factor_hog, data = ENIGH)
ENIGH_table <- svytable(~ing_cor, ENIGH_design)  # svytable expects a formula

Here is where it gets tricky: supposing I have 100 rows, I can't simply take the first 10 of them because, once the weights are taken into account, they might represent 9% or 20% (I'm just throwing out numbers) of the actual population.

I could use cut() on my income variable, but that would ignore the weights, and the results would only be representative of the sample, not of the total population.
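The difference between sample deciles and weighted deciles can be seen in a small base R sketch. The data below are made up for illustration (they are not ENIGH records); only the column names ing_cor and factor mirror the ones in this post. As a simplifying assumption, a row whose weight straddles a decile boundary is assigned entirely to the decile where its cumulative share ends.

```r
# Hypothetical toy data: 10 households with very unequal weights
toy <- data.frame(
  ing_cor = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
  factor  = c(30, 20, 10, 10, 10, 5, 5, 4, 3, 3)
)

# Sort by income, lowest first
toy <- toy[order(toy$ing_cor), ]

# Decile by row position: ignores the weights entirely
toy$decile_sample <- ceiling(10 * seq_len(nrow(toy)) / nrow(toy))

# Decile by cumulative share of the weighted population
toy$decile_weighted <- ceiling(10 * cumsum(toy$factor) / sum(toy$factor))

print(toy)
```

In this toy case the lowest-income row alone stands for 30% of the weighted population, so it cannot be treated as "the first decile" just because it is the first of ten rows.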

I think that the best approach would be to use a combination of:

  • mutate() to create a new variable
  • if() in conjunction with mutate() to define which decile each row falls into
  • group_by() and mean() to calculate what I'm aiming for

This way I will have an extra variable that I can use to calculate whatever I want for any other variable. But again, I haven't defined my groups yet, so for now it's pretty much useless.
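One possible way to define the groups while keeping the weights is sketched below with the survey package itself rather than dplyr. It is only a sketch under several assumptions: survey version 4.1 or later (where coef() works on the result of svyquantile()), the weight column named factor as in the data glimpse in this post (the design call above uses factor_hog), and a data frame containing only the 19 households from that glimpse with the thousands separators removed. The decile variable name is hypothetical.

```r
library(survey)

# The 19 households shown in the data glimpse in this post
ENIGH <- data.frame(
  est_dis = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1),
  upm     = c(610, 610, 610, 610, 620, 620, 620, 620, 620,
              630, 630, 630, 630, 640, 640, 640, 650, 770, 770),
  factor  = c(180, 180, 180, 180, 178, 178, 178, 178, 178,
              223, 223, 223, 223, 232, 232, 232, 171, 184, 184),
  ing_cor = c(22723, 17920, 27506, 56236, 41587, 135437, 62386, 103502,
              27323, 68042, 98537, 53237, 132861, 190033, 28654, 74408,
              80761, 38640, 81672)
)

ENIGH_design <- svydesign(id = ~upm, strata = ~est_dis,
                          weights = ~factor, data = ENIGH)

# 1. Weighted decile breakpoints (10th to 90th percentile)
q <- svyquantile(~ing_cor, ENIGH_design, quantiles = seq(0.1, 0.9, by = 0.1))
breaks <- as.numeric(coef(q))

# 2. Add a decile variable to the design, based on the weighted breakpoints
ENIGH_design <- update(
  ENIGH_design,
  decile = cut(ing_cor, breaks = c(-Inf, breaks, Inf), labels = 1:10)
)

# 3. Weighted mean income (with standard error) within each decile
means <- svyby(~ing_cor, ~decile, ENIGH_design, svymean)
print(means)

# 4. Minimum and maximum per decile: weights do not change which
#    observed value is the smallest or largest within a group
ranges <- aggregate(ing_cor ~ decile, data = ENIGH_design$variables,
                    FUN = function(x) c(min = min(x), max = max(x)))
print(ranges)
```

Because the decile variable lives inside the design object, the same svyby() call can be reused with other variables or other estimators such as svytotal. With the full ENIGH file, tied breakpoints would make cut() fail, so the breaks may need checking for duplicates first.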

Thank you for reading. Thank you for your help.

Database available: https://www.inegi.org.mx/programas/enigh/nc/2016/default.html#Datos_abiertos

Here is a glimpse of how my DB looks:

folioviv   foliohog  ubica_geo  est_dis  upm  factor  ing_cor
100587003  1         10010000   2        610  180     22,723
100587004  1         10010000   2        610  180     17,920
100587005  1         10010000   2        610  180     27,506
100587006  1         10010000   2        610  180     56,236
100605201  1         10010000   2        620  178     41,587
100605202  1         10010000   2        620  178     135,437
100605203  1         10010000   2        620  178     62,386
100605205  1         10010000   2        620  178     103,502
100605206  1         10010000   2        620  178     27,323
100606301  1         10010000   3        630  223     68,042
100606302  1         10010000   3        630  223     98,537
100606305  1         10010000   3        630  223     53,237
100606306  1         10010000   3        630  223     132,861
100609801  1         10010000   3        640  232     190,033
100609802  1         10010000   3        640  232     28,654
100609805  1         10010000   3        640  232     74,408
100631401  1         10010000   1        650  171     80,761
100711503  1         10010000   1        770  184     38,640
100711504  1         10010000   1        770  184     81,672

There are many more columns, but they aren't necessary for this exercise.

Posts: 1

Participants: 1
