(: file: create_stgPop_1year.xq date: 26-August-2010 author: Gary Lewis purpose: For a specified survey year, this XQuery combines the minimized IPEDS data files for that year. notes: 1. As each new variable gets moved to staging and the warehouse, this XQuery should be revised so that it is consistent with create_min_stgPop_dir_1year.xq, create_min_stgPop_enrl_1year.xq, etc. 2. The minimized IPEDS data files should be created before running this XQuery. To do this: Run dyn_create_min_stgPop_all_1year.xq to create the shell script called create_min_stgPop_all_1year.sh. Execute this shell script to create minimized IPEDS data files for each survey year. The files will reside in staging (~/rwebdb/projects/ipeds/xml/staging). The word "minimized" means that the files only include variables headed for the warehouse. As a result, the files are considerably smaller in size than the original IPEDS data files. This improves XQuery performance. usage: Program gets run in create_stgPop_all_1year.sh. example: zorba -e year:="2008" -f -q create_stgPop_1year.xq -o ~/rwebdb/projects/ipeds/xml/staging/stgPop_2008.xml -z indent=yes revision history: date: 21-October-2010 revised: 1. Added enrollment counts based on LINE variable in IPEDS enrollment data files. Line 8: FT undergraduates Line 9: FT first-professional Line 11: FT graduate Line 14: Total FT (sum of lines 8, 9, 11) Line 22: PT undergraduates Line 23: PT first-professional Line 25: PT graduate Line 28: Total PT (sum of lines 22, 23, 25) Line 29: Grand total (sum of lines 14 and 28) 2. Added function local:calcFTE to calculate FTE (full-time equivalent) of part-time enrollments. This function was created using an IPEDS algorithm for converting part-time enrollments to full-time equivalents. See http://nces.ed.gov/ipeds/glossary/index.asp?id=854. NOTE: The IPEDS algorithm does not cover all data situations that appear in the IPEDS enrollment data files. Where this occurred, I used 0.333333 to convert part-time enrollments to full-time equivalent enrollments. 3. Added several FTE variables: fteUG: FTE undergraduate enrollment fte1P: FTE first-professional enrollment fteGR: FTE graduate enrollment ftePT: FTE equivalent of part-time students fteTotal: Total FTE enrollment (sum of fteUG, fte1P, fteGR) date: 25-October-2010 revised: 1. Some survey years (eg, 1986, 1987, and 1989) include only LINE=29 for some institutions. More generally, if the sum of the enrollments in the relevant component LINEs do not equal the enrollment in LINE=29 (grand total), then the flag enrlConsistentInd was set to false (0). Specifically, enrlConsistentInd = 0 if enrl29 != (enrl8 + enrl9 + enrl11 + enrl22 + enrl23 + enrl25 + enrlFTun + enrlPTun). 2. Note: enrlFTun and enrlPTun refer to the unclassified enrollments that appear in surveys prior to 1986. date: 28-October-2010 revised: 1. Made consistent with the revisions in create_min_stgPop_dir_1year.xq that improve performance down the line in create_stg2whPop.xq. See revision history in create_min_stgPop_dir_1year.xq for details. date: 08-November-2010 revised: 1. Added third IPEDS data file (student charges). First file was dir (directory). Second file was enrl (enrollments). This file is chg (student charges). 2. Added function calcTFug() to calculate undergraduate full-time tuition and fees based on institutional control, level, and survey year. date: 10-November-2010 revised: 1. Added $xquery and $datetime. :) declare variable $xquery := "create_stgPop_1year.xq"; declare variable $datetime := format-dateTime(current-dateTime(), "[D]-[MNn]-[Y] [H01]:[m01][Pn] [ZN]"); declare variable $year external; declare function local:calcFTE( $control as xs:integer?, (: institution control: 1=public, 2=private not-for-profit, 3=private for-profit :) $iclevel as xs:integer?, (: institution level: 1=4-year, 2=2-year, 3=less-than 2-year :) $enrlLevel as xs:string?, (: enrollment level: ug=undergraduate, 1p=first-professional, gr=graduate, un=unclassified :) $enrlPT as xs:integer?) as xs:integer? (: part-time enrollment that will be converted to full-time equivalant enrollment :) (: Note: Each time 0.333333 appears in this function, it deals with data situations not covered by the IPEDS algorithm :) { let $fte := if ($enrlLevel eq "ug") then ( if ($control = 1 and $iclevel = 1) then ($enrlPT * 0.403543) else ( if ($control = 1 and $iclevel = (2,3)) then ($enrlPT * 0.335737) else ( if ($control = (2,3) and $iclevel = 1) then ($enrlPT * 0.392857) else ($enrlPT * 0.397058)))) else ( if ($enrlLevel eq "1p") then ( if ($control = 1 and $iclevel = 1) then ($enrlPT * 0.600) else ( if ($control = (2,3) and $iclevel = 1) then ($enrlPT * 0.545454) else ($enrlPT * 0.333333))) else ( if ($enrlLevel eq "gr") then ( if ($control = 1 and $iclevel = 1) then ($enrlPT * 0.361702) else ( if ($control = (2,3) and $iclevel = 1) then ($enrlPT * 0.382059) else ($enrlPT * 0.333333))) else ( if ($enrlLevel eq "un") then ($enrlPT * 0.333333) else 0))) return xs:integer(round($fte)) }; declare function local:calcTF( $control as xs:integer?, $level as xs:integer?, $tuitionInDistrict as xs:integer?, $feeInDistrict as xs:integer?, $tuitionInState as xs:integer?, $feeInState as xs:integer?, $publicTF as xs:integer?, $privateTF as xs:integer?, $year as xs:integer?) as xs:integer? { let $tuitFee := if ($control = (1,2,3)) then ( if ($level = (1,2,3)) then ( if ($year >= 2000) then ( if ($control = 1 and $level = (2,3)) (: public 2-year or less-than 2-year :) then (if ($tuitionInDistrict = -9 or $feeInDistrict = -9) then () else ($tuitionInDistrict + $feeInDistrict)) else (if ($tuitionInState = -9 or $feeInState= -9) then () else ($tuitionInState + $feeInState))) else ( if ($year >= 1987 and $year <= 1998) then ( (: in these years $tuition var contained tuition and fees :) if ($control = 1 and $level = (2,3)) then (if ($tuitionInDistrict = -9) then () else $tuitionInDistrict) else (if ($tuitionInState = -9) then () else $tuitionInState)) else ( if ($year = (1980, 1984, 1985, 1986)) then ( if ($control = 1) then (if ($publicTF = -9) then () else $publicTF) else (if ($privateTF = -9) then () else $privateTF)) else () )) ) else () ) else () return $tuitFee }; { let $xYear := xs:integer($year), $staging := "/home/gml/rwebdb/projects/ipeds/xml/staging/", $dirFile := concat("min_stgPop_dir_", $year, ".xml"), $dirFileLoc := concat($staging, $dirFile), $enrlFile := concat("min_stgPop_enrl_", $year, ".xml"), $enrlFileLoc := concat($staging, $enrlFile), $enrlFileExists := try {doc($enrlFileLoc)} catch err:FODC0002 {"File not found"}, $enrlFile := if ($enrlFileExists eq "File not found") then "" else $enrlFile, $chgFile := concat("min_stgPop_chg_", $year, ".xml"), $chgFileLoc := concat($staging, $chgFile), $chgFileExists := try {doc($chgFileLoc)} catch err:FODC0002 {"File not found"}, $chgFile := if ($chgFileExists eq "File not found") then "" else $chgFile return { let $enrlTotals := if ($enrlFileExists eq "File not found") then () else doc($enrlFileLoc)/ipeds/survey/inst, $charges := if ($chgFileExists eq "File not found") then () else doc($chgFileLoc)/ipeds/survey/inst for $i in doc($dirFileLoc)/ipeds/survey/inst let $unitid := $i/@unitid, $instnm := $i/@instnm, $rControl := $i/@rControl, $xControl := if (string($rControl) eq "") then -9 else xs:integer($rControl), $rLevel := $i/@rLevel, $xLevel := if (string($rLevel) eq "") then -9 else xs:integer($rLevel), $rState := $i/@rState, $rStateInd := $i/@rStateInd, $rAdminInd := $i/@rAdminInd, $rDegreeInd := $i/@rDegreeInd, $rTitle4Ind := $i/@rTitle4Ind, $rDESdgInd := $i/@rDESdgInd, $rDESheInd := $i/@rDESheInd, $rDESInd := $i/@rDESInd, $rRegion := $i/@rRegion, $j := $enrlTotals[@unitid = $unitid], $enrl8 := xs:integer($j/@enrl8), $enrl9 := xs:integer($j/@enrl9), $enrl11 := xs:integer($j/@enrl11), $enrl14 := xs:integer($j/@enrl14), $enrl22 := xs:integer($j/@enrl22), $enrl23 := xs:integer($j/@enrl23), $enrl25 := xs:integer($j/@enrl25), $enrl28 := xs:integer($j/@enrl28), $enrl29 := xs:integer($j/@enrl29), $enrlFTug := $enrl8, $enrlFT1p := $enrl9, $enrlFTgr := $enrl11, $enrlFTun := xs:integer($j/@enrlFTun), (: full-time unclassified :) $enrlPTug := $enrl22, $enrlPT1p := $enrl23, $enrlPTgr := $enrl25, $enrlPTun := xs:integer($j/@enrlPTun), (: part-time unclassified :) $enrlFT := ($enrl8 + $enrl9 + $enrl11 + $enrlFTun), $enrlPT := ($enrl22 + $enrl23 + $enrl25 + $enrlPTun), $enrlUG := ($enrl8 + $enrl22), $enrl1P := ($enrl9 + $enrl23), $enrlGR := ($enrl11 + $enrl25), $enrlUN := ($enrlFTun + $enrlPTun), (: total unclassified :) $enrlCheck := ($enrlUG + $enrl1P + $enrlGR + $enrlUN), $enrlConsistentInd := if ($enrlCheck = $enrl29) then 1 else 0, (: enrollment components sum to total ok :) $enrlTotal := $enrl29, $fteFTug := $enrlFTug, $fteFT1p := $enrlFT1p, $fteFTgr := $enrlFTgr, $fteFTun := $enrlFTun, $ftePTug := local:calcFTE($xControl, $xLevel, "ug", xs:integer($enrl22)), $ftePT1p := local:calcFTE($xControl, $xLevel, "1p", xs:integer($enrl23)), $ftePTgr := local:calcFTE($xControl, $xLevel, "gr", xs:integer($enrl25)), $ftePTun := local:calcFTE($xControl, $xLevel, "un", xs:integer($enrlPTun)), $fteFT := ($fteFTug + $fteFT1p + $fteFTgr + $fteFTun), $ftePT := ($ftePTug + $ftePT1p + $ftePTgr + $ftePTun), $fteUG := ($fteFTug + $ftePTug), $xFteUG := if (string($fteUG) eq "") then -9 else xs:integer($fteUG), $fte1P := ($fteFT1p + $ftePT1p), $fteGR := ($fteFTgr + $ftePTgr), $xFteGR := if (string($fteGR) eq "") then -9 else xs:integer($fteGR), $fteUN := ($fteFTun + $ftePTun), $fteTotal := ($fteUG + $fte1P + $fteGR + $fteUN), $k := $charges[@unitid = $unitid], $tuition1 := $k/@tuition1, $fee1 := $k/@fee1, $tuition2 := $k/@tuition2, $fee2 := $k/@fee2, $tuition5 := $k/@tuition5, $fee5 := $k/@fee5, $tuition6 := $k/@tuition6, $fee6 := $k/@fee6, $pbtfgi := $k/@pbtfgi, $pbtfui := $k/@pbtfui, $prtfg := $k/@prtfg, $prtfu := $k/@prtfu, $xTuition1 := if (string($tuition1) eq "") then -9 else xs:integer($tuition1), $xFee1 := if (string($fee1) eq "") then -9 else xs:integer($fee1), $xTuition2 := if (string($tuition2) eq "") then -9 else xs:integer($tuition2), $xFee2 := if (string($fee2) eq "") then -9 else xs:integer($fee2), $xTuition5 := if (string($tuition5) eq "") then -9 else xs:integer($tuition5), $xFee5 := if (string($fee5) eq "") then -9 else xs:integer($fee5), $xTuition6 := if (string($tuition6) eq "") then -9 else xs:integer($tuition6), $xFee6 := if (string($fee6) eq "") then -9 else xs:integer($fee6), $xPbtfgi := if (string($pbtfgi) eq "") then -9 else xs:integer($pbtfgi), $xPbtfui := if (string($pbtfui) eq "") then -9 else xs:integer($pbtfui), $xPrtfg := if (string($prtfg) eq "") then -9 else xs:integer($prtfg), $xPrtfu := if (string($prtfu) eq "") then -9 else xs:integer($prtfu), $ugTF := local:calcTF($xControl, $xLevel, $xTuition1, $xFee1, $xTuition2, $xFee2, $xPbtfui, $xPrtfu, $xYear), $xUGtf := if (string($ugTF) eq "") then -9 else xs:integer($ugTF), $grTF := local:calcTF($xControl, $xLevel, -9, -9, $xTuition6, $xFee6, $xPbtfgi, $xPrtfg, $xYear), (: in-state prices only :) $xGRtf := if (string($grTF) eq "") then -9 else xs:integer($grTF), $wgtUGtf := if ($xFteUG > 0 and $xUGtf > 0) then ($xFteUG * $xUGtf) else (), $wgtGRtf := if ($xFteGR > 0 and $xGRtf > 0) then ($xFteGR * $xGRtf) else () return } }