Ref: http://deedy.quora.com/Hacking-into-the-Indian-Education-System?srid=3THA&share=1
I read the article above and could not resist trying to simulate the behavior. The spike problem pointed out in the article is similar to one that happens with hash functions. If graders grade in certain increments, e.g. 0, 2, 4, 8, 10 or 0, 5, 10, it is possible to replicate the plots.
The plots below are generated assuming there are internal grades for some project and a number of questions with some marks per question. I also assume that the grader grades in some increments, e.g. 0, 2, 4, 8, 10 or 0, 5, 10. I generated random samples for 100,000 students for both internal and external marks. The source code is at the end. The first set of plots is without any bias, i.e. the grader was not biased to give good or bad grades. The second set of plots assumes three different types of graders: each one is either unbiased (bias 0), biased to give good marks (positive bias), or biased to give bad marks (negative bias).
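To see why grading in increments alone can create spikes, here is a minimal sketch, separate from the full source at the end; the increment set, question count, and sample size are arbitrary choices for illustration, not the exact values behind the plots in this post:

import collections
import numpy

# Hypothetical example: 8 questions, each graded only in these increments.
per_question_marks = [0, 2, 4, 8, 10]
nquestions = 8
nstudents = 100000

numpy.random.seed(0)
totals = collections.Counter()
for _ in range(nstudents):
    picks = numpy.random.randint(0, len(per_question_marks), nquestions)
    totals[sum(per_question_marks[p] for p in picks)] += 1

# Odd totals are unreachable, and the reachable totals occur with very
# different frequencies, which is exactly the spiky histogram shape.
for t in sorted(totals):
    print("%3d: %6d" % (t, totals[t]))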
Without Bias: We can see evidence of spikes in almost all cases below, except the last one.
With Bias: With bias, we can still see spikes, and we also see skewed and bimodal distributions.
Conclusion: It is possible that ICSE graders are grading questions in some common increments and are not necessarily rigging the grades.
Python Source Code:
With Bias:
from __future__ import division
import math
import numpy
import pylab


def marks2(n=100, internal_marks=[0, 10, 20],
           per_question_marks=[0, 2, 4, 8, 10], total=100, figname=None,
           bias1=0, bias2=0, bias3=0):
    m1 = marks(n, internal_marks, per_question_marks, total, figname, bias1)
    m2 = marks(n, internal_marks, per_question_marks, total, figname, bias2)
    m3 = marks(n, internal_marks, per_question_marks, total, figname, bias3)
    m = numpy.concatenate([m1, m2, m3])
    x = numpy.arange(total + 2)
    y, _ = numpy.histogram(m, x)
    print "Mean=%s, stderr=%s min=%s max=%s median=%s" % \
        (m.mean(), m.std() / math.sqrt(len(m)), m.min(), m.max(),
         numpy.median(m))
    pylab.plot(x[:-1], y, 'r-o')
    pylab.grid()
    pylab.title("Internal=%s, External=%s,\nn=%s, bias1=%d, bias2=%d bias3=%d"
                % (str(internal_marks), str(per_question_marks), n,
                   bias1, bias2, bias3))
    pylab.xlabel("marks")
    pylab.draw()


def marks(n=100, internal_marks=[0, 10, 20],
          per_question_marks=[0, 2, 4, 8, 10], total=100, figname=None,
          bias=0):
    m = numpy.zeros(n)
    for i in range(n):
        m[i] = marks_one_student(internal_marks, per_question_marks, total,
                                 bias)
    return m


def pos(n, bias):
    return numpy.max([0, numpy.min([numpy.random.randint(n) + bias, n - 1])])


def marks_one_student(internal_marks, per_question_marks, total, bias):
    nquestions = (total - numpy.max(internal_marks)) // \
        numpy.max(per_question_marks)
    assert numpy.max(internal_marks) + \
        nquestions * numpy.max(per_question_marks) == total
    marks = internal_marks[pos(len(internal_marks), bias)]
    for i in range(nquestions):
        marks += per_question_marks[pos(len(per_question_marks), bias)]
    return marks
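Assuming the definitions above have been executed, a call along these lines produces a biased-grader plot; the particular n and bias values here are illustrative, not necessarily the ones used for the plots in this post:

# Illustrative only: three grader types with bias 0, +2 and -2.
marks2(n=100000, internal_marks=[0, 10, 20],
       per_question_marks=[0, 2, 4, 8, 10], total=100,
       bias1=0, bias2=2, bias3=-2)
pylab.show()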
Without Bias:
from __future__ import division
import math
import numpy
import pylab


def marks(n=100, internal_marks=[0, 10, 20],
          per_question_marks=[0, 2, 4, 8, 10], total=100, figname=None):
    m = numpy.zeros(n)
    for i in range(n):
        m[i] = marks_one_student(internal_marks, per_question_marks, total)
    print "Mean=%s, stderr=%s min=%s max=%s median=%s" % \
        (m.mean(), m.std() / math.sqrt(len(m)), m.min(), m.max(),
         numpy.median(m))
    x = numpy.arange(total + 2)
    y, _ = numpy.histogram(m, x)
    print len(x), x
    print len(y), y
    pylab.plot(x[:-1], y, 'r-o')
    pylab.grid()
    pylab.title("Internal=%s, External=%s,\nn=%s, Mean=%.1f, Median=%.1f"
                % (str(internal_marks), str(per_question_marks), n,
                   m.mean(), numpy.median(m)))
    pylab.xlabel("marks")
    pylab.draw()
    if figname:
        pylab.savefig(figname)
    return m


def marks_one_student(internal_marks, per_question_marks, total):
    nquestions = (total - numpy.max(internal_marks)) // \
        numpy.max(per_question_marks)
    assert numpy.max(internal_marks) + \
        nquestions * numpy.max(per_question_marks) == total
    marks = internal_marks[numpy.random.randint(0, len(internal_marks))]
    for i in range(nquestions):
        marks += per_question_marks[
            numpy.random.randint(0, len(per_question_marks))]
    return marks
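Likewise, assuming the unbiased definitions above have been executed, a call such as the following reproduces the style of the unbiased plots; the sample size matches the 100,000 students mentioned earlier, and the output file name is a placeholder:

# Illustrative only: "unbiased.png" is a placeholder file name.
marks(n=100000, internal_marks=[0, 10, 20],
      per_question_marks=[0, 2, 4, 8, 10], total=100,
      figname="unbiased.png")
pylab.show()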