rwebdb 30-October-2010
On tearing apart programs, queasiness, and reassembling them, all the while testing and re-testing for query performance changes. Along with lots and lots of verification that no logic errors get introduced in the process.
A boatload of new enrollment variables, including the dreaded full-time equivalent enrollments. Gratifying results.
Thoughts on application development, the use of rwebdb warehouse data, and story-telling with data and data visualizations.
On extended data verification, resolution of data discrepancies, candidate variables for warehouse inclusion, and listening to gut instincts.
Enrollment data got added to the warehouse. Bulk loads, query performance issues, procedural changes for staging data, how to handle edits of source data, an impervious data anomaly, new documentation, new utility programs, an effort to improve eventual analysis with the time series, and on and on. It’s been an intense but productive two weeks.
NCES recently released some of the 2009 IPEDS survey files. I added the 2009 institutional directory file to the warehouse, but in the process I also dealt with documentation issues, internal organization of data files and program files, ways to synchronize the warehouse and the IPEDS release files, initial steps toward automation of some warehouse procedures, and even some way-too-early considerations of a public domain open data license and the benefits of using GitHub for parts of this project. Very fun.
Two warehouse design principles, three verifications of data against published external sources, one new warehouse variable, two design traps, an issue related to the timing for release of public IPEDS data sets, and source synchronization as the next step.
So far, so good. The initial build stage is complete. Six new variables were added to the warehouse. An initial verification of warehouse data was completed against sources published by the National Center for Education Statistics. Several new items to do as part of the second build phase in this project.
The benefit from metadata and generalized utility programs can be immense. But first you need to hit a critical point where the number of utility tools is sufficient to do most of what needs to happen even in novel data situations. This week I had my first taste of that critical point for this project. It was a good feeling.
Many new items: an operating system, an XQuery upgrade, greater flexibility in the warehouse population, more generalized move procedures, a more robust data verification process, improved program performance, and a nifty new way to translate IPEDS variables into warehouse variables.