Thread: Analysis on the jump in quality of accepted applicants

  1. #1
    Within my grasp! untitled is a TestMagic guru. Show your respect!
    Join Date
    Apr 2008
    Posts
    447

    Analysis on the jump in quality of accepted applicants

    There was another thread about the reason for the jump in quality this year. This thread is about whether such a jump exists. Namely, I don't actually see a jump (I wish I did, and wishing makes me see suggestions of it where the data are probably fairly inconclusive). Sorry for the length; it would almost be better to make this a separate webpage, but if I did so, I'd allow myself to go even longer, and that doesn't seem like a great idea just yet.

    I'm not super good at statistics, so can I ask whether my analysis is right, and whether there is anything else I should be doing? Also, am I using and interpreting the dummies correctly?

    My data set is here; note that the data will change as it is updated to include more posted results and to fix my own errors. If you would like to use the data, feel free (I haven't formally done so, but I intend to put everything I have under some sort of CC license which would require that I be credited for my work...). I code acceptance as 1, rejection as 0, pending/waitlist as -1, and no data as -99. Everything else should be self-explanatory, but feel free to ask. If you do use it, I would appreciate being told how it is being used, and whether you learn anything interesting.

    So, I fit a logit model where I regress school X acceptance/rejection on GREQ (centered by subtracting the mean, 788.7, from each score), GPA (centered by subtracting the mean, 3.734, from each score), and two dummy variables, y2008d and y2009d, based on the year of the application. Am I correct that the intercept cannot exactly be thought of as the coefficient on y2007d (which isn't included due to perfect multicollinearity, which, if I hadn't known before, I would have figured out from the R output)? And if so, what can it be thought of as? Here is the output for Penn:

                 Estimate   Std. Error   z value   Pr(>|z|)
    (Intercept)  -0.47513      0.74847    -0.635     0.5256
    GREQ          0.01746      0.04917     0.355     0.7226
    GPA           5.34905      2.77505     1.928     0.0539 .
    y2008d        0.52080      0.82979     0.628     0.5302
    y2009d       -1.56755      0.87218    -1.797     0.0723 .
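
    For anyone who wants to reproduce the setup, here is a rough sketch of how the inputs to a model like this could be coded. It is in Python rather than R, and the function names are made up for illustration; only the outcome codes, the two means, and the Penn estimates come from this post. With centered covariates and 2007 as the omitted year, the linear predictor for an average 2007 applicant reduces to the intercept, which is one way to read it.

```python
# Sketch only: function names are hypothetical, not taken from the actual analysis code.
GREQ_MEAN = 788.7   # mean GREQ quoted in the post
GPA_MEAN = 3.734    # mean GPA quoted in the post


def usable(outcome):
    """Keep accept (1) and reject (0); drop pending/waitlist (-1) and no data (-99)."""
    return outcome in (0, 1)


def design_row(greq, gpa, year):
    """Return (greq_centered, gpa_centered, y2008d, y2009d); 2007 is the omitted year."""
    if year not in (2007, 2008, 2009):
        raise ValueError("year must be 2007, 2008 or 2009")
    return (
        greq - GREQ_MEAN,
        gpa - GPA_MEAN,
        1 if year == 2008 else 0,
        1 if year == 2009 else 0,
    )


def linear_predictor(intercept, coefs, row):
    """Log-odds of acceptance for one design row."""
    return intercept + sum(b * x for b, x in zip(coefs, row))


# Penn estimates copied from the table above: GREQ, GPA, y2008d, y2009d.
PENN_INTERCEPT = -0.47513
PENN_COEFS = (0.01746, 5.34905, 0.52080, -1.56755)

# An applicant with average GREQ and GPA applying in 2007: every term is zero,
# so the log-odds equal the intercept.
avg_2007 = design_row(GREQ_MEAN, GPA_MEAN, 2007)
print(avg_2007)
print(linear_predictor(PENN_INTERCEPT, PENN_COEFS, avg_2007))
```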

    The y2009d coefficient of -1.568 is significant only at the 10% level, which isn't bad for an n as small as ours (44).

    Anyway, can I interpret that -1.568 as an indicator of how much harder it was to get into Penn this year? Not quite, I think. If I use the "divide by 4 rule" (Gelman talks about this in ARM), applying in 2008 raises the average applicant's chance of a Penn acceptance by something less than 13 percentage points, while applying in 2009 lowers it by something less than 39 percentage points. Applying the inverse logit, we can calculate the probability that the average applicant gets in: Pr(acc to Penn in 2007) = invlogit(-.47513) = 38%, Pr(acc to Penn in 2008) = invlogit(-.47513 + .52080) = 51%, and Pr(acc to Penn in 2009) = invlogit(-.47513 + -1.56755) = 11%.
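
    The arithmetic above can be checked in a few lines. This is a minimal sketch in Python rather than the R used for the fit; `invlogit` is written by hand here and is assumed to behave like the function of the same name in arm. The coefficients are the Penn estimates from the table, and the applicant is assumed to have average GREQ and GPA, so both centered terms drop out.

```python
import math


def invlogit(z):
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Penn estimates from the table above.
intercept = -0.47513
y2008d = 0.52080
y2009d = -1.56755

# Probabilities for an applicant with average GREQ and GPA.
p2007 = invlogit(intercept)
p2008 = invlogit(intercept + y2008d)
p2009 = invlogit(intercept + y2009d)

# "Divide by 4" gives an upper bound on the size of the change in probability.
bound_2008 = y2008d / 4
bound_2009 = y2009d / 4

print(f"2007: {p2007:.2f}  2008: {p2008:.2f}  2009: {p2009:.2f}")
print(f"2008 vs 2007: {p2008 - p2007:+.3f} (bound {bound_2008:+.3f})")
print(f"2009 vs 2007: {p2009 - p2007:+.3f} (bound {bound_2009:+.3f})")
```

    This reproduces the 38%, 51% and 11% figures, and the actual changes relative to 2007 stay inside the divide-by-4 bounds, as they should.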

    Now, that last number looks about right, but the other two probabilities look way too high. The standard errors on both dummies and the intercept are around .8, so the actual probabilities are likely to be within the confidence intervals, but it's a bit depressing to have such large standard errors. Also, I chose to show Penn because, while nearly all schools had negative signs on y2009d, not many are significant at any level.

    In fact, Cornell gives a probability of 5% in 2007, 41% in 2008 and 47% in 2009 (that is, the sign on y2009d was positive; indeed the coefficient was 2.76 with standard error 1.13). I think Cornell was thrown off by two applied admits and wind up bird's admit, all with less than 800 GREQ, while all the rejections had 800 GREQs.

    Anyway, the takeaway is that I am not seeing a large effect of y2009d (although there does seem to be a small one). If instead of dummies we include a year trend, the coefficient is usually negative, but this doesn't make everything much clearer.

    We do have a few other pieces of data. One of note is jlist's statement about Chicago, basically that this year was harder than average. Indeed, it was harder than 2008, but only barely harder than 2007. To compare 2009 and 2007 for Chicago, we look at a model with y2007d and y2009d (so 2008 is the omitted year), and we get:

                 Estimate   Std. Error   z value   Pr(>|z|)
    (Intercept)   0.33146      0.53573     0.619     0.5361
    GREQ          0.06039      0.03977     1.519     0.1289
    GPA           3.03195      2.14808     1.411     0.1581
    y2007d       -1.35248      0.77709    -1.740     0.0818 .
    y2009d       -1.39423      0.75848    -1.838     0.0660 .

    Using invlogit we can calculate probabilities of acceptance for TMers applying to Chicago of 36% in 2009 versus 38% in 2007 (which seems a bit high).
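
    The same check can be run on the Chicago table, where 2008 is the omitted year. Again this is only a sketch in Python, assuming an applicant with average GREQ and GPA so the centered terms are zero. With the coefficients exactly as posted, it gives roughly 58% for 2008 and about 26% for both 2007 and 2009, so the 36% and 38% quoted above may reflect a different revision of the data than the table does.

```python
import math


def invlogit(z):
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Chicago estimates from the table above; 2008 is the baseline year.
intercept = 0.33146
y2007d = -1.35248
y2009d = -1.39423

# Probabilities for an applicant with average GREQ and GPA.
chicago_2007 = invlogit(intercept + y2007d)
chicago_2008 = invlogit(intercept)
chicago_2009 = invlogit(intercept + y2009d)

print(f"2007: {chicago_2007:.3f}")
print(f"2008: {chicago_2008:.3f}")
print(f"2009: {chicago_2009:.3f}")
```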

    The other piece of data we drop is applicants' acceptances to other schools. The holy grail for this data set is to somehow incorporate that data, but I have yet to figure out how...

    Other notes: there is some selectivity in reporting information. I don't think I am the only person who did not bother to put an MIT rejection in the admissions and rejections thread. I think people who don't get any top 10 admits generally don't bother to post all of their top 5 results; it's a bit embarrassing to say that I thought there was even a chance (my excuse was that everyone is supposed to apply for NSF, and if you get NSF, you might get into MIT).

    Further selectivity comes from the fact that the average TMer is possibly above average overall. In fact, I think seeing more than 50% of TMers get into Chicago last year makes that pretty clear. Also, because n is small, small fluctuations such as someone with low numbers getting into top 5 schools can mess things up (in a good way).

    Today it looks like Princeton is rejecting many of its waitlisted candidates. Since the 2009 numbers don't include waitlists, and most waitlists are eventually rejected (especially at top schools), this will change things a bit, too. I checked the numbers under the assumption that all waitlists in 2009 are rejected, and it didn't change much (although the changes were significant for those involved - I hope that trying to look at these statistics doesn't hurt anyone on a personal level).
    Last edited by untitled; 04-14-2009 at 09:43 PM. Reason: format tables

  2. #2
    Within my grasp! ICECOLDECON 's dreams are becoming reality. ICECOLDECON's Avatar
    Join Date
    Dec 2007
    Posts
    487
    I'm basically interested in a difference-in-averages approach to the number of below-top-10 schools applied to by students who applied to a decent number of top 10 schools.

    Trouble is, what counts as decent, and what is and is not top 10? Too much subjectivity for my taste.
    UT Austin Entering Class of 2009

    Do you like fishsticks???

  3. #3
    Just wanna get in one Jaktu just joined TestMagic.
    Join Date
    Jan 2009
    Posts
    83
    Commendable effort, but a few points:
    • I don't think there are enough data points. I didn't count, but estimated maybe 60 for 2009? For me, that seems pretty small to draw meaningful conclusions.
    • The data points are those people who post on TM. So you are looking at the people who do research online and ask questions (i.e. probably care more than average). Hence, their profiles will probably be better than average.
    • Concentrating strictly on GPA and GRE-Q is not a good basis for drawing conclusions. For GRE-Q especially, the fact that it's not very hard to reach the cap (800), combined with the previous point, means that most data points here will be high. When talking about the quality of applications this year, people at schools (secretaries and admissions people) mentioned the unprecedented depth of the applications. I'm assuming this means more than simply better GPA and GRE scores.


  4. #4
    UW bound! Econtastic is on the way! Econtastic's Avatar
    Join Date
    Jun 2008
    Posts
    226
    Quote Originally Posted by Jaktu View Post
    Commendable effort, but a few points:
    • I don't think there are enough data points. I didn't count, but estimated maybe 60 for 2009? For me, that seems pretty small to draw meaningful conclusions.
    • The data points are those people who post on TM. So you are looking at the people who do research online and ask questions (i.e. probably care more than average). Hence, their profiles will probably be better than average.
    • Concentrating strictly on GPA and GRE-Q is not a good basis for drawing conclusions. For GRE-Q especially, the fact that it's not very hard to reach the cap (800), combined with the previous point, means that most data points here will be high. When talking about the quality of applications this year, people at schools (secretaries and admissions people) mentioned the unprecedented depth of the applications. I'm assuming this means more than simply better GPA and GRE scores.

    I'm not sure the selection bias is that big an issue: there is a very wide range of profiles on TM, and I doubt there are many PhD applicants who don't really care. Maybe they just never thought to look for a site like TM (or are so good that they don't believe they need the advice).
    As far as the sample size goes, I agree that this might be an issue, but I have no idea how to address it
    Attending: University of Washington

  5. #5
    Apropos of the Wet Snow decision09 just joined TestMagic. decision09's Avatar
    Join Date
    Mar 2009
    Posts
    55
    Let's see if this displays correctly. I am not familiar with a divide-by-four rule, but one needs to be careful with logit and binary explanatory variables. The simplest and best way to do this is to use the average marginal effect (Cox, D. R., Snell, E. J. (1989). Analysis of binary data.):


    $\frac{1}{n}\sum_{i=1}^{n}\left[G\left(\hat{\beta}_0+\hat{\beta}_1 x_{i,1}+\ldots+\hat{\beta}_{k-1}x_{i,k-1}+\hat{\beta}_k(c_k+1)\right)-G\left(\hat{\beta}_0+\hat{\beta}_1 x_{i,1}+\ldots+\hat{\beta}_{k-1}x_{i,k-1}+\hat{\beta}_k c_k\right)\right]$

    where $G(z)=\frac{\exp(z)}{1+\exp(z)}$ is the CDF of the logistic distribution.

    OK, so it does not, but if you put it in a LaTeX editor it should display correctly.

    All it says is: calculate the predicted values with the binary variable in question set to 1, then calculate the predicted values with it set to 0, take the difference for each observation, and compute the mean of those differences. That gives you the average partial effect.
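
    As a rough illustration of that recipe, here is a minimal sketch in Python. The function name `average_partial_effect` is made up, the coefficients are the Penn estimates from the first post, and the applicant rows are invented purely for illustration (they are not from the data set). As a simplification, y2008d is held at 0 for every row, so the result compares 2009 with 2007.

```python
import math


def invlogit(z):
    """Logistic CDF G(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def average_partial_effect(intercept, other_coefs, dummy_coef, rows):
    """Mean over rows of G(base + dummy_coef) - G(base), i.e. dummy set to 1 vs 0."""
    if not rows:
        raise ValueError("need at least one row")
    total = 0.0
    for row in rows:
        # Linear predictor from the other covariates, with the dummy at 0.
        base = intercept + sum(b * x for b, x in zip(other_coefs, row))
        total += invlogit(base + dummy_coef) - invlogit(base)
    return total / len(rows)


# Penn estimates from the first post.
PENN_INTERCEPT = -0.47513
PENN_OTHER = (0.01746, 5.34905)   # GREQ, GPA (both centered)
PENN_Y2009D = -1.56755

# HYPOTHETICAL centered (GREQ, GPA) rows, invented purely for illustration.
made_up_rows = [(11.3, 0.20), (-8.7, -0.10), (1.3, 0.00), (11.3, -0.30)]

ape = average_partial_effect(PENN_INTERCEPT, PENN_OTHER, PENN_Y2009D, made_up_rows)
print(f"average partial effect of y2009d on the made-up rows: {ape:.3f}")
```

    With a single row at the means, this collapses to the difference between the 2009 and 2007 probabilities computed with invlogit in the first post.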

    I hope I did not misinterpret something you did 'untitled'. If so, all apologies.
    Accepted: Wisconsin(No 1st year $),Cornell(?)
    Rejected: UMich,Berkeley,Yale,NWestern,Penn,Brown,Duke,PSU,UCLA,Maryland,Mich State
    Waitlisted: Null; Silent: Caltech, UCSD, BU, Texas(Austin)

  6. #6
    Apropos of the Wet Snow decision09 just joined TestMagic. decision09's Avatar
    Join Date
    Mar 2009
    Posts
    55
    Holy crap man, that data file must have taken a few Snickers.
    Accepted: Wisconsin(No 1st year $),Cornell(?)
    Rejected: UMich,Berkeley,Yale,NWestern,Penn,Brown,Duke,PSU,UCLA,Maryland,Mich State
    Waitlisted: Null; Silent: Caltech, UCSD, BU, Texas(Austin)

  7. #7
    What's with the fish? Aumann just joined TestMagic.
    Join Date
    Mar 2009
    Location
    UK
    Posts
    149
    Quote Originally Posted by Econtastic View Post
    I'm not sure the selection bias is that big an issue: there is a very wide range of profiles on TM, and I doubt there are many PhD applicants who don't really care. Maybe they just never thought to look for a site like TM (or are so good that they don't believe they need the advice).
    As far as the sample size goes, I agree that this might be an issue, but I have no idea how to address it
    There is a wide range of profiles here, but even if the average TM profile isn't better than the average non-TM profile (and I think it is), your average TMer will certainly be more informed, will therefore apply to more appropriate schools, and will know how to squeeze a lot more out of their profile than others would.

  8. #8
    Within my grasp! untitled is a TestMagic guru. Show your respect!
    Join Date
    Apr 2008
    Posts
    447
    @ ICECOLDECON: No problem, I'll post again with something like that soon.

    @ Jaktu: I think I mentioned n of 44 for Penn; it was 49 for Chicago (dof is 5 less, remember, counting the intercept). There certainly aren't many data points, but there are enough to look at and think about what we see. I think one of the hardest parts of applied micro is that you often don't get to do large-N surveys. There are something like 54 countries in Africa, and only 12 in South America; country-level studies focusing on a particular continent are a big challenge, but worth doing. Here we have a couple hundred data points, but only 20-60 per school, which isn't terrible, I think. Your second point is fine: this isn't a random sample, and we can't be too confident about using our results to predict the future, but we can use our results to learn about ourselves as a class. The third point is a big deal. I could try to use GREV and AW, or an "attended grad school" dummy, but I haven't had enough time to work on the code in the last month to try to add these. Even then, the numbers are a bad way to do this. What I'd like to do is somehow use the information on the other schools that rejected and accepted each candidate to better predict a candidate's chances at a school. I have some rough ideas about this, but am a long way from implementation, and the small-n-ness is going to be a difficult issue to grapple with.

    @ Econtastic: Our earlier extra analysis of profiles shows the selection bias in TM profiles. As has been noted, all TMers who applied to UC-Davis in the past were admitted, but we know that in the past their admission rate was less than 50% (I think it is something like 35% or 40%, but I'm not sure). But as I said, I think this is OK as long as we keep it in mind.

    @ decision09: I used the divide by 4 rule to estimate things while I was first writing the post, but when I started talking about implied probabilities, I switched to the invlogit function, which I think is the implication of your first post (btw, if anyone wants to see the latex rendered without using a tex editor, paste the code into the text box at this link). As for the data, it comes out of the same code and analysis I've been using all winter, so it wasn't much to put together. I don't know if I mentioned it, but where I didn't code a GPA, I just use 3.73, the average of all GPAs, for that applicant (missing data imputation is not my strong point, but I figure, why not).
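
    For what it's worth, the missing-GPA fill-in described above amounts to something like the following. This is a sketch in Python, not the code actually used; the function name and the GPA values are hypothetical, and `None` is just an assumed marker for a GPA that was not reported.

```python
def impute_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values], mean


# HYPOTHETICAL GPAs; None marks an applicant who did not report one.
gpas = [3.9, None, 3.5, 3.8]
filled, mean_gpa = impute_with_mean(gpas)
print(filled, round(mean_gpa, 3))
```

    One side effect of combining this with the centering: an applicant filled in at 3.73 sits almost exactly at zero on the centered GPA term (3.73 versus a mean of 3.734), so that term adds almost nothing to their linear predictor.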

    @ Aumann: I completely agree.

  9. #9
    Within my grasp! untitled is a TestMagic guru. Show your respect!
    Join Date
    Apr 2008
    Posts
    447
    Here is a link with the average GPA and GRE-Q scores of applicants and accepted applicants. Sorry, the page is rough; I don't have as much time today as I thought, and I'll be out the rest of the week, so I wanted to get ICECOLDECON an answer, even if it isn't perfect.

  10. #10
    Within my grasp! ICECOLDECON 's dreams are becoming reality. ICECOLDECON's Avatar
    Join Date
    Dec 2007
    Posts
    487
    Quote Originally Posted by untitled View Post
    Here is a link with the average GPA and GRE-Q scores of applicants and accepted applicants. Sorry, the page is rough; I don't have as much time today as I thought, and I'll be out the rest of the week, so I wanted to get ICECOLDECON an answer, even if it isn't perfect.
    I appreciate the effort! I actually was trying to test the hypothesis that a large jump in the quality of applicants at schools might have something to do with applicants applying to more safety schools. So I was going to look at the average number of non top ten schools applied to by top ten applicants (i.e. those who apply to a number of top ten schools....not just one or two for the heck of it) and compare this number across years. This measure would be full of problems anyways....
    UT Austin Entering Class of 2009

    Do you like fishsticks???
