#JSM2018 Standing room only for this session. With Jörg Drechsler, Raghu, Rod Little, and Don Rubin as presenters, it's easy to understand why.
#JSM2018 Jörg Drechsler up first. Classification systems for industries change frequently, so there is no consistent coding over time. Treat it as a missing data problem.
#JSM2018 Drechsler Has the Establishment History Panel, covering all establishments in Germany with 1+ employee covered by social security. Lots of data on each establishment. 4 different classification systems in use. Two systems are jointly observed in one year.
#JSM2018 Drechsler 95% of the work went into data preparation, but the audience doesn’t find that interesting in a talk.
#JSM2018 Drechsler Had imputation by extrapolation, then deterministic imputation, then probabilistic imputation. Imputed years sequentially, using adjacent coding systems and adjacent years.
#JSM2018 Probabilistic imputation uses CART b/c parametric imputation models explode for categorical variables.
#JSM2018 Drechsler The overview: grow large trees for the imputation; can’t fit one tree to the entire dataset. Run separate CART models by industry code so that each block has up to 20 industry categories.
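A rough sketch of what tree-based probabilistic imputation can look like, not the speaker's actual implementation. It grows a small Gini-based classification tree on donor records where the new code is observed, then imputes by drawing a random donor from the leaf a record falls into. Everything here is an assumption for illustration: the toy data, the two predictors (old code and employee count, both treated as numeric), the `min_leaf` setting, and the function names `grow` and `impute`. In the talk, a separate tree would be fit within each block of industry codes; this sketch fits one tree to one toy block.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow(rows, labels, min_leaf=5):
    """Grow a classification tree; each leaf keeps its donor labels."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, j, t)
    # Stop when no admissible split lowers the impurity.
    if best is None or best[0] >= gini(labels):
        return {"donors": labels}
    _, j, t = best
    lo = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([r for r, _ in lo], [y for _, y in lo], min_leaf),
        "right": grow([r for r, _ in hi], [y for _, y in hi], min_leaf),
    }

def impute(tree, row, rng):
    """Drop a record down the tree and draw one donor label from its leaf."""
    while "donors" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return rng.choice(tree["donors"])

# Hypothetical donors: (old_code, employees) -> new_code observed in the joint year.
donor_rows = (
    [(1, e) for e in (1, 2, 3, 50, 60, 70)]
    + [(2, e) for e in (1, 2, 3, 4, 5, 6)]
    + [(2, e) for e in (20, 30, 40, 50, 60, 70)]
)
donor_labels = ["A"] * 6 + ["B"] * 6 + ["C", "D", "C", "D", "C", "D"]

tree = grow(donor_rows, donor_labels, min_leaf=3)
rng = random.Random(2018)
imputed = [impute(tree, (2, 50), rng) for _ in range(10)]
```

Because the imputed value is a draw from the leaf rather than the leaf's most common label, repeated imputations for the same record can differ, which is what lets the imputed data reflect the distribution of codes within a leaf.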
#JSM2018 Drechsler Compares CART approach with 2 methods used previously at IAB and in the LEHD. Uses the difference in cell sizes between imputed and original data. Marginal distribution of overall industry category is similar.
#JSM2018 Drechsler Detailed industry codes are worst for the deterministic approach. Some industries are never selected for imputation, so imputing the mode doesn’t work. Probabilistic and CART imputation better reflect distributions.
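To see why imputing the mode can leave some industries with no imputed cases at all, here is a minimal sketch with made-up numbers. It assumes a single old code "10" that maps to new code "A" for 70% of establishments and "B" for 30% in the jointly observed year; the codes, counts, and function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical jointly observed year: (old_code, new_code) per establishment.
joint_year = [("10", "A")] * 70 + [("10", "B")] * 30

# Tabulate new codes within each old code.
by_old = defaultdict(Counter)
for old, new in joint_year:
    by_old[old][new] += 1

def impute_mode(old):
    """Deterministic: always assign the most frequent new code."""
    return by_old[old].most_common(1)[0][0]

def impute_draw(old, rng):
    """Probabilistic: draw a new code in proportion to its observed share."""
    codes, counts = zip(*by_old[old].items())
    return rng.choices(codes, weights=counts, k=1)[0]

rng = random.Random(2018)
to_impute = ["10"] * 1000  # establishments observed only under the old system

mode_counts = Counter(impute_mode(o) for o in to_impute)
draw_counts = Counter(impute_draw(o, rng) for o in to_impute)
```

Under the mode rule every one of the 1000 establishments lands in "A" and "B" receives none, while the random draws put roughly 30% of them in "B".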
• • •
#JSM2018 Tobias Schmidt Looking at interviewer experience and interview duration
#JSM2018 Schmidt In this survey, duration linked to interviewer salaries.
#JSM2018 Schmidt Looking at interviewer experience over the course of the survey, respondent experience within the survey, and experience over repeated surveys. Looking in particular at experience within a panel survey for both interviewers and respondents.
#JSM2018 Wuyts Interested in within-survey workload. Use call history data and interview time data. Some measure workload with fixed measures of experience and interview order cumulated over the field period. They use the actual number of cases assigned at time t in the field period.
#JSM2018 Wuyts Use paradata to create new measures of interviewer workload, based on sample units assigned on a given day.
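One way such a day-level workload measure could be computed from case records, as a sketch only. It assumes each case carries an interviewer ID, the day it was assigned, and the day it was closed out, and it counts a case toward workload from its assignment day up to (but not including) its closing day. The record layout, field names, and that counting rule are assumptions, not details from the talk.

```python
from collections import namedtuple

# Hypothetical paradata record: days are counted from the start of the field period.
Case = namedtuple("Case", "interviewer assigned_day closed_day")

cases = [
    Case("int01", 1, 4),
    Case("int01", 1, 10),
    Case("int01", 3, 6),
    Case("int02", 2, 3),
]

def workload(cases, interviewer, day):
    """Number of cases an interviewer holds on a given day."""
    return sum(
        1
        for c in cases
        if c.interviewer == interviewer and c.assigned_day <= day < c.closed_day
    )

# Workload for int01 rises and falls over the field period
# instead of only accumulating.
daily = {day: workload(cases, "int01", day) for day in range(1, 11)}
```

Unlike a cumulative interview count, this measure can go down as cases close, so it varies within the field period for the same interviewer.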
#JSM2018 Rebecca Powell from @RTI_Intl talking about an experiment on Add Health shifting from an interviewer-administered to a self-administered survey
#JSM2018 Powell Moved to a 55-minute self-administered survey from a 90-minute interviewer-administered one. Worried about response burden with this length of self-admin survey. Randomized n=7600 into either the full 55-minute survey or 2 modules: one of 35 minutes, then one of 20 minutes.
#JSM2018 Powell Could select to continue on the web. On paper, had to complete module A first, then were sent module B. Cover letters told about modules in the incentive part, but not up front. $55 incentive total in each condition
#JSM2018 The brilliant Susan Murphy is this year’s Fisher Lecture award recipient!
#JSM2018 Murphy Lab does sequential experimentation in improving health. Some for companies.
#JSM2018 Murphy Experimentation and continual optimization is key. How do we use what is learned while an experiment is in the field to improve outcomes for individuals? Mobile interventions are key here. Intervention may be either a push intervention or a pull intervention
#JSM2018 Next up Hubert Hamer from NASS talking about NASS Small Area Estimation
#JSM2018 Hamer NASS data feed the Agriculture Risk Coverage County Option (ARC-CO) program. Payments are triggered when county crop revenue falls below the program guarantee. NASS surveys are used to make this decision, along with other data
#JSM2018 Hamer Program paid out $3.7 billion in 2016. Small changes can affect payments
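A deliberately simplified illustration of why small changes in a county estimate matter for a revenue-triggered payment. This is not the actual program formula: it assumes revenue is county yield times price, and that the per-acre payment is simply the shortfall below a fixed guarantee. The function name and all numbers are hypothetical.

```python
def county_payment_per_acre(county_yield, price, guarantee):
    """Shortfall of county revenue per acre below the guarantee, floored at zero."""
    revenue = county_yield * price
    return max(0.0, guarantee - revenue)

# Hypothetical county: guarantee of $600 per acre, price of $3.50 per bushel.
guarantee = 600.0
price = 3.50

pay_at_172 = county_payment_per_acre(172, price, guarantee)  # revenue 602.0
pay_at_171 = county_payment_per_acre(171, price, guarantee)  # revenue 598.5

# Scale the per-acre payment over a hypothetical 100,000 enrolled acres.
total_at_171 = pay_at_171 * 100_000
```

In this toy setup a one-bushel difference in the estimated county yield is the difference between no payment and a payment on every enrolled acre, which is why the precision of small area estimates matters.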
#JSM2018 Peter Miller appearing as a Northwestern University emeritus professor, providing comments on the CNSTAT reports
#JSM2018 Miller Survey paradigm vs multiple data source paradigm. Surveys may become irrelevant b/c they are slow, not granular, not nimble, costly, not sustainable
#JSM2018 Miller Multiple data sources require new methods, computing resources, privacy protections, training, and data quality frameworks. Not cheap. What does this give us?