Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Thursday, June 6, 2013

Simulating ICSE marks (Hacking into the Indian Education System)


Ref: http://deedy.quora.com/Hacking-into-the-Indian-Education-System?srid=3THA&share=1

I read the article above and could not resist trying to simulate the behavior. The spike problem pointed out in the article is similar to one that happens with hash functions. If the grader grades in certain increments, e.g. 0, 2, 4, 8, 10, or 0, 5, 10, it is possible to replicate the plots.

The plots below are generated assuming there are internal grades for some project and a number of questions with some grades per question. I also assume that the grader grades in some increments, e.g. 0, 2, 4, 8, 10, or 0, 5, 10. I generated random samples for 100,000 students for both internal and external marks. The source code is at the end. The first set of plots is without any bias, i.e. the grader was not biased to give good or bad grades. The second set of plots assumes three different types of graders, each of which is either unbiased (0), biased to give good marks (positive bias) or biased to give bad marks (negative bias).
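The effect of grading in increments can also be checked without sampling. As a rough sketch, assuming every allowed mark is picked with equal probability (as in the unbiased simulation), the exact distribution of the total is the convolution of the internal-mark distribution with one per-question distribution per question. The name exact_total_pmf is hypothetical and is not part of the source code at the end.

```python
import numpy as np


def exact_total_pmf(internal_marks, per_question_marks, total=100):
    """Exact distribution of the total mark (hypothetical helper).

    Assumption: each allowed mark is chosen with equal probability.
    """
    def pmf(values):
        # p[v] = probability of awarding exactly v marks
        p = np.zeros(max(values) + 1)
        for v in values:
            p[v] += 1.0 / len(values)
        return p

    # Same question count as in the simulation: full marks add up to total.
    nquestions = (total - max(internal_marks)) // max(per_question_marks)
    dist = pmf(internal_marks)
    question = pmf(per_question_marks)
    for _ in range(nquestions):
        # Adding an independent mark convolves the two distributions.
        dist = np.convolve(dist, question)
    return dist  # dist[k] = probability that the total is k


pmf_even = exact_total_pmf([0, 10, 20], [0, 2, 4, 8, 10])
unreachable = [k for k, p in enumerate(pmf_even) if p == 0]
print(unreachable)
```

With these increments every allowed mark is even, so every odd total has probability exactly zero. In a histogram with one bin per mark, those empty bins are what make the reachable totals stand out.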

Without Bias: We can see evidence of spikes in almost all cases below, except the last one.







With Bias: With bias, we can still see spikes, and we also see skewed and bimodal distributions.
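To see where the skew comes from, it helps to look at how the pos helper in the source code at the end picks a mark: it draws a uniform index, adds the bias, and clamps the result to the valid range. The sketch below enumerates that rule exactly instead of sampling it. The name biased_index_pmf is hypothetical and is used here only for illustration.

```python
from collections import Counter
from fractions import Fraction


def biased_index_pmf(n, bias):
    """Exact distribution of max(0, min(i + bias, n - 1)) for uniform i in 0..n-1."""
    counts = Counter(max(0, min(i + bias, n - 1)) for i in range(n))
    return [Fraction(counts[k], n) for k in range(n)]


# Five allowed marks per question, e.g. [0, 2, 4, 8, 10].
for bias in (-1, 0, 1):
    print(bias, [str(p) for p in biased_index_pmf(5, bias)])
```

For five allowed marks and a bias of +1, the lowest mark can never be chosen and the highest mark is chosen with probability 2/5; a bias of -1 mirrors this toward the lowest mark. Mixing graders with opposite biases is one way the combined histogram can end up bimodal.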






Conclusion: It is possible that ICSE graders are grading questions in some common increments and are not necessarily rigging the grades.


Python Source Code:

With Bias:
from __future__ import division

import math
import numpy
import pylab


def marks2(n=100, internal_marks=[0, 10, 20],
           per_question_marks=[0, 2, 4, 8, 10], total=100, figname=None,
           bias1=0, bias2=0, bias3=0):
    # Simulate three groups of n students, one group per grader bias.
    m1 = marks(n, internal_marks, per_question_marks, total, figname, bias1)
    m2 = marks(n, internal_marks, per_question_marks, total, figname, bias2)
    m3 = marks(n, internal_marks, per_question_marks, total, figname, bias3)
    m = numpy.concatenate([m1, m2, m3])

    # One histogram bin per integer mark from 0 to total.
    x = numpy.arange(total+2)
    y, _ = numpy.histogram(m, x)

    print("Mean=%s, stderr=%s min=%s max=%s median=%s" %
          (m.mean(), m.std() / math.sqrt(len(m)), m.min(), m.max(), numpy.median(m)))

    pylab.plot(x[:-1], y, 'r-o')
    pylab.grid()
    pylab.title("Internal=%s, External=%s,\nn=%s, bias1=%d, bias2=%d bias3=%d" %
                (str(internal_marks), str(per_question_marks), n, bias1, bias2, bias3))
    pylab.xlabel("marks")
    pylab.draw()

    # Save the figure if a file name was given.
    if figname:
        pylab.savefig(figname)


def marks(n=100, internal_marks=[0, 10, 20],
          per_question_marks=[0, 2, 4, 8, 10], total=100, figname=None, bias=0):
    # Total marks for n simulated students graded with the given bias.
    m = numpy.zeros(n)
    for i in range(n):
        m[i] = marks_one_student(internal_marks, per_question_marks, total, bias)
    return m

def pos(n, bias):
    # Uniform random index in [0, n-1], shifted by bias and clamped to that range.
    return numpy.max([0, numpy.min([numpy.random.randint(n) + bias, n-1])])



def marks_one_student(internal_marks, per_question_marks, total, bias):
    # Number of questions such that full marks add up exactly to total.
    nquestions = (total - numpy.max(internal_marks)) // numpy.max(per_question_marks)
    assert numpy.max(internal_marks) + \
        nquestions * numpy.max(per_question_marks) == total

    # Internal mark first, then one mark per question.
    marks = internal_marks[pos(len(internal_marks), bias)]
    for i in range(nquestions):
        marks += per_question_marks[pos(len(per_question_marks), bias)]

    return marks

Without Bias:

from __future__ import division

import math
import numpy
import pylab



def marks(n=100, internal_marks=[0, 10, 20],
          per_question_marks=[0, 2, 4, 8, 10], total=100, figname=None):
    # Total marks for n simulated students, graded without bias.
    m = numpy.zeros(n)
    for i in range(n):
        m[i] = marks_one_student(internal_marks, per_question_marks, total)

    print("Mean=%s, stderr=%s min=%s max=%s median=%s" %
          (m.mean(), m.std() / math.sqrt(len(m)), m.min(), m.max(), numpy.median(m)))

    # One histogram bin per integer mark from 0 to total.
    x = numpy.arange(total+2)
    y, _ = numpy.histogram(m, x)

    print("%s %s" % (len(x), x))
    print("%s %s" % (len(y), y))

    pylab.plot(x[:-1], y, 'r-o')
    pylab.grid()
    pylab.title("Internal=%s, External=%s,\nn=%s, Mean=%.1f, Median=%.1f" %
                (str(internal_marks), str(per_question_marks), n, m.mean(), numpy.median(m)))
    pylab.xlabel("marks")
    pylab.draw()

    # Save the figure if a file name was given.
    if figname:
        pylab.savefig(figname)

    return m





def marks_one_student(internal_marks, per_question_marks, total):
    # Number of questions such that full marks add up exactly to total.
    nquestions = (total - numpy.max(internal_marks)) // numpy.max(per_question_marks)
    assert numpy.max(internal_marks) + \
        nquestions * numpy.max(per_question_marks) == total

    # Internal mark first, then one mark per question, all chosen uniformly.
    marks = internal_marks[numpy.random.randint(0, len(internal_marks))]
    for i in range(nquestions):
        marks += per_question_marks[numpy.random.randint(0, len(per_question_marks))]

    return marks
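
The per-student loop above is easy to follow but slow for 100,000 students. As a possible alternative, and only a sketch, the unbiased case can be drawn in one vectorized step. This assumes Python 3 and NumPy 1.17 or newer (for numpy.random.default_rng); the name simulate_totals and the seed parameter are hypothetical and not part of the listings above.

```python
import numpy as np


def simulate_totals(n, internal_marks, per_question_marks, total=100, seed=None):
    """Vectorized version of the unbiased simulation (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Same question count as in marks_one_student: full marks add up to total.
    nquestions = (total - max(internal_marks)) // max(per_question_marks)
    assert max(internal_marks) + nquestions * max(per_question_marks) == total

    # One internal mark per student, chosen uniformly from the allowed marks.
    internal = rng.choice(internal_marks, size=n)
    # One row per student, one column per question; sum across questions.
    external = rng.choice(per_question_marks, size=(n, nquestions)).sum(axis=1)
    return internal + external


totals = simulate_totals(100000, [0, 10, 20], [0, 5, 10], seed=0)
# counts[k] = number of simulated students whose total is exactly k
counts = np.bincount(totals, minlength=101)
```

The counts array can be plotted against numpy.arange(101) in the same way as the histograms above. With the increments 0, 5, 10, every total is a multiple of 5, so all other entries of counts are zero.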