Join the staged COVIDcast data into the "covidcast_fb_survey_results" table
The section Inspect the COVIDcast .csv files concluded by showing that the three staging tables, one for each .csv
file, have the same primary key whose columns have the meaning "(survey_date, state)". This means that the data in all of the tables can be joined using this primary key into a single table. It showed, further, that the three "payload" columns in each of the staging tables, which there have the same names—"value", "stderr", and "sample_size"—can be carried over by the join into columns whose names reflect their origin and therefore their ultimate meaning, thus:
VALUE # Will go to MASK_WEARING_PCT, SYMPTOMS_PCT, or CMNTY_SYMPTOMS_PCT.
STDERR # Will go to MASK_WEARING_STDERR, SYMPTOMS_STDERR, or CMNTY_SYMPTOMS_STDERR.
SAMPLE_SIZE # Will go to MASK_WEARING_SAMPLE_SIZE, SYMPTOMS_SAMPLE_SIZE, or CMNTY_SYMPTOMS_SAMPLE_SIZE.
Here is the statement that creates the table:
drop table if exists covidcast_fb_survey_results cascade;
create table covidcast_fb_survey_results(
survey_date date not null,
state text not null,
mask_wearing_pct numeric not null,
mask_wearing_stderr numeric not null,
mask_wearing_sample_size int not null,
symptoms_pct numeric not null,
symptoms_stderr numeric not null,
symptoms_sample_size int not null,
cmnty_symptoms_pct numeric not null,
cmnty_symptoms_stderr numeric not null,
cmnty_symptoms_sample_size int not null,
constraint covidcast_fb_survey_results_pk primary key (state, survey_date),
constraint covidcast_fb_survey_results_chk_mask_wearing_pct check(mask_wearing_pct between 0 and 100),
constraint covidcast_fb_survey_results_chk_symptoms_pct check(symptoms_pct between 0 and 100),
constraint covidcast_fb_survey_results_chk_cmnty_symptoms_pct check(cmnty_symptoms_pct between 0 and 100),
constraint covidcast_fb_survey_results_chk_mask_wearing_stderr check(mask_wearing_stderr > 0),
constraint covidcast_fb_survey_results_chk_symptoms_stderr check(symptoms_stderr > 0),
constraint covidcast_fb_survey_results_chk_cmnty_symptoms_stderr check(cmnty_symptoms_stderr > 0),
constraint covidcast_fb_survey_results_chk_mask_wearing_sample_size check(mask_wearing_sample_size > 0),
constraint covidcast_fb_survey_results_chk_symptoms_sample_size check(symptoms_sample_size > 0),
constraint covidcast_fb_survey_results_chk_cmnty_symptoms_sample_size check(cmnty_symptoms_sample_size > 0)
);
The constraints honor the same principle that the use of the "assert_assumptions_ok()" procedure does:
- to check, explicitly, that all of the documented rules about the source data hold so that the results of applying the linear regression analysis can be trusted.
The procedure "xform_to_covidcast_fb_survey_results()" creates and populates the "covidcast_fb_survey_results" table. It uses dynamic SQL so that it can use the names for the staging tables that are defined in the "covidcast_names" table. This is the logic:
declare
mask_wearers_name text not null := (select staging_table from covidcast_names where staging_table = 'mask_wearers');
symptoms_name text not null := (select staging_table from covidcast_names where staging_table = 'symptoms');
cmnty_symptoms_name text not null := (select staging_table from covidcast_names where staging_table = 'cmnty_symptoms');
stmt text not null := '
insert into covidcast_fb_survey_results(
survey_date, state,
mask_wearing_pct, mask_wearing_stderr, mask_wearing_sample_size,
symptoms_pct, symptoms_stderr, symptoms_sample_size,
cmnty_symptoms_pct, cmnty_symptoms_stderr, cmnty_symptoms_sample_size)
select
time_value, geo_value,
m.value, m.stderr, round(m.sample_size),
s.value, s.stderr, round(s.sample_size),
c.value, c.stderr, round(c.sample_size)
from
?1 as m
inner join ?2 as s using (time_value, geo_value)
inner join ?3 as c using (time_value, geo_value)';
begin
execute replace(replace(replace(stmt,
'?1', mask_wearers_name),
'?2', symptoms_name),
'?3', cmnty_symptoms_name);
...
end;
Invoke the join into the final table manually, thus:
call xform_to_covidcast_fb_survey_results();
The ingest-the-data.sql
script creates the procedure "xform_to_covidcast_fb_survey_results()" and then invokes it like this:
do $body$
begin
call assert_assumptions_ok(
start_survey_date => to_date('2020-09-13', 'yyyy-mm-dd'),
end_survey_date => to_date('2020-11-01', 'yyyy-mm-dd'));
call xform_to_covidcast_fb_survey_results();
end;
$body$;
Notice that if "assert_assumptions_ok()" aborts with an assert failure, then "cr_covidcast_fb_survey_results()" will not be called.