rwebdb 03-September-2010
I added two new variables to the warehouse recently. Both refer to the geographic location of institutions of higher education in the IPEDS surveys.
You can get an overview of the new variables (rState and rStateInd) from the metadata in rVariables.xml. This XML also shows a new attribute called documentation that provides a detailed description of each variable, including some documentation on IPEDS data sources.
The emphasis on writing utility XQuery programs continues. Several examples are listed below. They include utilities to recode IPEDS variables into warehouse variables, to crosswalk two IPEDS variables before recoding them into warehouse variables, to create metadata for warehouse variables, and to identify additional IPEDS variables for possible inclusion in the warehouse.
I’m again reminded how critical the IPEDS metadata is. Many of the XQuery programs use this metadata (e.g., list_inconsistent_metadata_1var.xq, which helps identify changes in metadata across the 25 IPEDS survey years). Metadata makes writing XQuery easier. It also helps generalize the queries, so that specifics are passed at run time rather than hard-coded into the queries themselves.
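As a rough illustration of the idea (in Python rather than XQuery, and not the actual warehouse code), the sketch below checks one variable's code labels for changes across survey years. The nested-dictionary layout, the function name list_inconsistent_labels, and the sample labels are all assumptions made up for this example; the real IPEDS metadata is stored as XML whose structure is not shown here. The variable name is passed in as an argument rather than hard-coded, which is the kind of generalization described above.

```python
from collections import defaultdict


def list_inconsistent_labels(metadata, var_name):
    """Return {code: {label: [years]}} for codes whose label changes across years.

    `metadata` is a hypothetical structure: {year: {variable: {code: label}}}.
    """
    labels_by_code = defaultdict(lambda: defaultdict(list))
    for year in sorted(metadata):
        # A variable may be absent from some survey years.
        codes = metadata[year].get(var_name, {})
        for code, label in codes.items():
            labels_by_code[code][label].append(year)
    # Keep only codes that carried more than one distinct label.
    return {
        code: dict(labels)
        for code, labels in labels_by_code.items()
        if len(labels) > 1
    }


# Invented sample metadata, loosely modeled on the fips (state) variable.
# The misspelling and the abbreviations are illustrative, not real IPEDS labels.
sample_metadata = {
    1997: {"fips": {"04": "Arizonia", "06": "California"}},
    2002: {"fips": {"04": "AZ", "06": "CA"}},
    2008: {"fips": {"04": "Arizona", "06": "California"}},
}

report = list_inconsistent_labels(sample_metadata, "fips")
for code in sorted(report):
    print(code, report[code])
```

Each code in the report is a candidate for recoding, since the same code carried different labels in different years.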
The benefit from metadata and generalized utility programs can be immense. But first you need to hit a critical point where the number of utility tools is sufficient to do most of what needs to happen even in novel data situations. This week I had my first taste of that critical point for this project. It was a good feeling.
Next up are a couple more classificatory variables to add to the warehouse. Once that happens, it’s time to do the first of several external verifications of the warehouse data against published IPEDS data. It’s relatively easy to introduce gotchas into a data warehouse, so verification against independent sources is vital.
Lots yet to do. But very fun to see the warehouse come together.
New
1. Utility program to help recode IPEDS variables to warehouse variables
- list_inconsistent_metadata_1var.xq: Find inconsistencies in variable metadata that may require recoding.
- example_list_inconsistent_metadata_fips.xml: Sample inconsistencies in the IPEDS fips (state) variable, showing that the 2002 survey used state abbreviations instead of state names as labels. It also shows that Arizona was misspelled in the 1997 survey.
- list_recodes_rState.xml: Recodes used to translate between the IPEDS state variables (fips and geost) and the warehouse variable rState.
2. Utility program to help crosswalk codes in one IPEDS variable to the different codes in a second IPEDS variable
- create_initial_xwalk_var1_to_var2.xq: Creates an initial crosswalk between two IPEDS variables based on variable labels.
- crosswalks.xml: Crosswalks between the two IPEDS state variables (fips and geost). Needed to produce the warehouse state variable (rState).
3. Utility program to help create metadata for one warehouse variable
- create_initial_rVariables_1var.xq: Simplifies the creation of metadata for warehouse variables where there are many code values.
- rVariables.xml: Metadata for warehouse variables. The entries for rState were created initially by the XQuery program and then edited lightly.
4. Utility programs to help identify additional IPEDS variables to include in the warehouse
- list_metadata_1year.xq: Provides a nice overview of the variables in one survey year.
- example_list_metadata_2008.xml: An overview of the variables included in the 2008 IPEDS survey (institutional characteristics only).
- list_vars_by_numYears_1lastYear.xq: Lists variables by the number of survey years in which they appeared. Variables used in most survey years may be attractive candidates for the warehouse.
- example_list_vars_by_numYears_lastYear_2008.xml: Sample from the list of 2008 variables sorted by the number of surveys in which the variables appeared.
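The label-based crosswalk in item 2 can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the logic of create_initial_xwalk_var1_to_var2.xq: the function names, the normalization rule, and the sample codes and labels (in particular the geost-style codes) are invented for the example. Codes that find no label match are returned separately so they can be reviewed by hand, in the same spirit as an initial crosswalk that is edited afterwards.

```python
def normalize(label):
    """Lower-case and collapse whitespace so trivially different labels match."""
    return " ".join(label.lower().split())


def create_initial_xwalk(var1_codes, var2_codes):
    """Match the codes of two variables on their normalized labels.

    Returns (xwalk, unmatched): xwalk maps a var1 code to a var2 code, and
    unmatched lists the var1 codes that need manual review.
    """
    code_by_label = {}
    for code, label in var2_codes.items():
        # Keep the first code seen if two codes share a label.
        code_by_label.setdefault(normalize(label), code)

    xwalk, unmatched = {}, []
    for code, label in var1_codes.items():
        target = code_by_label.get(normalize(label))
        if target is None:
            unmatched.append(code)
        else:
            xwalk[code] = target
    return xwalk, unmatched


# Invented sample data: the second variable's codes and labels are hypothetical.
fips_codes = {"04": "Arizona", "06": "California", "11": "District of Columbia"}
geost_codes = {"AZ": "arizona", "CA": "California ", "DC": "Dist. of Columbia"}

xwalk, unmatched = create_initial_xwalk(fips_codes, geost_codes)
print(xwalk)
print(unmatched)
```

Exact matching on normalized labels is deliberately conservative: a label that differs in more than case or spacing lands in the unmatched list rather than being guessed at.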
Updated
1. Warehouse metadata
- rVariables.xml: Note the addition of a documentation attribute that describes the variable in some detail.
2. Data verification programs used whenever variables move from working areas to staging and finally into the warehouse.
