rwebdb 26-July-2010

Back now from a refreshingly restful vacation in northern Maine.

Today I added two more XQuery utility programs that make life easier when managing the IPEDS data files. One program (#1 below) lists all variables for a specified year, sorted by how prevalent each one is across all survey years.
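The prevalence ordering can be sketched roughly as follows, in Python rather than XQuery. This is not list_vars_1surveyYear.xq itself: the function name vars_by_prevalence and the sample variable lists are made up for illustration, and "prevalence" is assumed to mean the number of survey years in which a variable appears.

```python
def vars_by_prevalence(vars_by_year, year):
    """Return (variable, n_years) pairs for `year`, most prevalent first."""
    # Count the number of survey years in which each variable appears.
    counts = {}
    for names in vars_by_year.values():
        for name in set(names):
            counts[name] = counts.get(name, 0) + 1
    # Keep only the requested year's variables; break ties alphabetically.
    return sorted(
        ((name, counts[name]) for name in set(vars_by_year[year])),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Made-up variable lists for illustration only (not actual IPEDS contents).
sample = {
    1984: ["unitid", "stabbr", "instnm"],
    1985: ["unitid", "stabbr"],
    1986: ["unitid", "city"],
}

print(vars_by_prevalence(sample, 1984))
# → [('unitid', 3), ('stabbr', 2), ('instnm', 1)]
```

Variables that show up in many survey years float to the top, which is what makes a listing like this handy when picking candidates for the data mart.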

The second program (#3 below) is an example of a dynamic XQuery program (i.e., it produces a second XQuery program as output). In this case, running the second XQuery program calculates the frequency distribution for a specific variable in a specific year. The example output (#5 below) shows the distribution of colleges and universities in the 1984 survey by state (address) abbreviation.

Both programs help when trying to identify variables to include in the data mart. The dynamic XQuery in particular illustrates a powerful and generalizable technique for creating run-time queries, analogous to dynamic SQL in the relational database world. The initial program (in this case dyn_freq_1var_1year.xq) looks obscure because you’re up a level of abstraction (i.e., writing a program to write another program), but once you get started writing, it’s an easy enough transition. And the benefits can be substantial, in that one program covers a multitude of more specific use cases.
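To make the program-that-writes-a-program idea concrete, here is a minimal sketch in Python rather than XQuery. It is not dyn_freq_1var_1year.xq: the function name build_freq_query, the element name record, and the placement of the year attribute on that element are all assumptions made for illustration, since the layout of ipedsData.xml isn't described in this post.

```python
import re

# Template for the generated XQuery. The <record year="..."> layout is a
# hypothetical stand-in for whatever structure ipedsData.xml really has.
QUERY_TEMPLATE = '''\
let $rows := doc("{doc}")//record[@year = "{year}"]
for $value in distinct-values($rows/{var})
let $n := count($rows[{var} = $value])
order by $value
return concat($value, " ", $n)
'''


def build_freq_query(var, year, doc="ipedsData.xml"):
    """Return XQuery source text for a one-variable, one-year frequency count."""
    # Only plain names are spliced into the query text, so a stray value
    # cannot change the structure of the generated program.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", var):
        raise ValueError(f"not a simple variable name: {var!r}")
    return QUERY_TEMPLATE.format(doc=doc, year=int(year), var=var)


# Generate the specific query for stabbr in the 1984 survey.
print(build_freq_query("stabbr", 1984))
```

The printed text would then be saved and run by an XQuery processor as a second step. One generator covers every variable and year combination, which is the same payoff dynamic SQL offers on the relational side.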

As I said last time, we’re headed toward identifying a tentative population of colleges and universities to include in the data mart. Hopefully I’ll have more on that topic in the next post.

Nice to be back.
 
New
1. list_vars_1surveyYear.xq
Lists all IPEDS variables in directory files for a specified survey year. Another utility program helpful when selecting variables for inclusion in the data warehouse.

2. example_list_vars_1surveyYear.txt
A snippet of the output produced by list_vars_1surveyYear.xq when specifying the survey year as 1984.

3. dyn_freq_1var_1year.xq
Dynamically (i.e., at run-time) generates an XQuery program which, when run, produces a frequency distribution for one specified variable in one specified survey year.

4. freq_1var_1year.xq
This program is created by running dyn_freq_1var_1year.xq. In this example, the variable specified was stabbr (state abbreviations) and the survey year was 1984.

5. example_freq_stabbr_1984.txt
Example of a frequency distribution produced by running freq_1var_1year.xq.

Updated
1. ipedsData.pl
Added year attribute to the output file ipedsData.xml.