Thursday, June 6, 2013

Simulating ICSE marks (Hacking into the Indian Education System)


I read the article above and could not resist trying to simulate the behavior. The spike problem pointed out in the article is similar to the one that occurs with hash functions. If graders award marks in fixed increments, e.g. 0, 2, 4, 8, 10 or 0, 5, 10, only certain totals are possible, and simulating this replicates the plots.
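A quick way to see where the spikes come from is to list every total a student can possibly receive. The sketch below uses a hypothetical helper (reachable_totals is not part of the source code at the end) and assumes the setup the simulation uses for a total of 100: an internal mark from {0, 10, 20} plus 8 questions worth up to 10 marks each.

```python
def reachable_totals(internal_marks, per_question_marks, nquestions):
    """Return the sorted list of totals a student can possibly receive.

    Assumes the internal mark is one value from internal_marks and each of
    the nquestions questions adds one value from per_question_marks.
    """
    totals = set(internal_marks)
    for _ in range(nquestions):
        # Every partial total can be extended by any per-question mark.
        totals = {t + q for t in totals for q in per_question_marks}
    return sorted(totals)


coarse = reachable_totals([0, 10, 20], [0, 5, 10], 8)
fine = reachable_totals([0, 10, 20], [0, 2, 4, 8, 10], 8)
print(len(coarse), coarse[:5])  # 21 [0, 5, 10, 15, 20]
print(len(fine), fine[:5])      # 51 [0, 2, 4, 6, 8]
```

With per-question marks in steps of 5, only 21 of the 101 possible totals can occur, so every student lands on one of them and the histogram shows spikes with empty values in between. With marks from {0, 2, 4, 8, 10}, every total is even.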

The plots below are generated assuming that each student receives an internal grade for a project plus marks on a number of questions, each with a fixed maximum. I also assume that graders grade in fixed increments, e.g. 0, 2, 4, 8, 10 or 0, 5, 10. I generated random samples for 100,000 students for both internal and external marks. The source code is at the end. The first set of plots is without any bias, i.e. the grader was not biased toward good or bad grades. The second set of plots assumes three different types of graders, each of which is either unbiased (0), biased toward good marks (positive bias), or biased toward bad marks (negative bias).
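The bias model in the source code (the pos() function) picks a random grade index, shifts it by bias and clamps it to the valid range. A minimal sketch that computes the exact probabilities this model produces, assuming the grades are listed in increasing order (biased_grade_pmf is a hypothetical name):

```python
from fractions import Fraction


def biased_grade_pmf(grades, bias):
    """Exact probability of each grade under a shift-and-clamp bias model.

    Assumed model (mirrors pos() in the source code): draw an index
    uniformly from 0..len(grades)-1, add bias, clamp to the valid range.
    """
    n = len(grades)
    pmf = {g: Fraction(0) for g in grades}
    for i in range(n):
        j = max(0, min(i + bias, n - 1))  # clamp the shifted index
        pmf[grades[j]] += Fraction(1, n)
    return pmf


print(biased_grade_pmf([0, 5, 10], 0))   # each grade has probability 1/3
print(biased_grade_pmf([0, 5, 10], 1))   # 0 is never given, 10 is given 2/3 of the time
print(biased_grade_pmf([0, 5, 10], -1))  # mirror image: 0 is given 2/3 of the time
```

Because bias is measured in grade steps rather than marks, a bias of 1 means "one increment higher", and the clamping piles the extra probability onto the top (or bottom) grade.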

Without Bias: We can see evidence of spikes in almost all cases below, except the last one.

With Bias: With bias, we still see spikes, and we also see skewed and bimodal distributions.
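The histograms come from random sampling, but for small setups the exact distribution of totals can be computed by convolving the per-question distributions. This Python 3 sketch assumes the same shift-and-clamp bias model and 8 questions, and treats the three graders as an equal-weight mixture, which matches concatenating three equally sized groups of students as marks2() does. All function names here are hypothetical.

```python
from fractions import Fraction


def grade_pmf(grades, bias):
    """Shift-and-clamp bias model (an assumption mirroring pos())."""
    n = len(grades)
    pmf = {}
    for i in range(n):
        g = grades[max(0, min(i + bias, n - 1))]
        pmf[g] = pmf.get(g, 0) + Fraction(1, n)
    return pmf


def convolve(p, q):
    """Distribution of the sum of two independent marks."""
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out


def total_pmf(internal_marks, per_question_marks, nquestions, bias):
    """Exact distribution of one student's total for one grader."""
    dist = grade_pmf(internal_marks, bias)
    question = grade_pmf(per_question_marks, bias)
    for _ in range(nquestions):
        dist = convolve(dist, question)
    return dist


def mixture(dists):
    """Equal-weight mix, like concatenating equal-sized groups of students."""
    out = {}
    for d in dists:
        for k, p in d.items():
            out[k] = out.get(k, 0) + p / len(dists)
    return out


unbiased = total_pmf([0, 10, 20], [0, 5, 10], 8, 0)
mixed = mixture([total_pmf([0, 10, 20], [0, 5, 10], 8, b) for b in (-1, 0, 1)])
for total in range(0, 101, 5):
    print(total, float(unbiased.get(total, 0)), float(mixed.get(total, 0)))
```

Every probability sits on a multiple of 5, which is the exact counterpart of the spikes, and the mixed column can be compared with the unbiased one to see how three differently biased graders reshape the distribution.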

Conclusion: It is possible that ICSE graders are grading questions in common increments, and that they are not necessarily rigging the grades.

Python Source Code:

With Bias:
from __future__ import division, print_function

import math
import numpy
import pylab

def marks2(n=100, internal_marks=[0, 10, 20],
           per_question_marks=[0, 2, 4, 8, 10], total=100, figname=None,
           bias1=0, bias2=0, bias3=0):
    """Simulate n students for each of three graders with different biases
    and plot the histogram of the combined totals."""
    m1 = marks(n, internal_marks, per_question_marks, total, figname, bias1)
    m2 = marks(n, internal_marks, per_question_marks, total, figname, bias2)
    m3 = marks(n, internal_marks, per_question_marks, total, figname, bias3)
    m = numpy.concatenate([m1, m2, m3])

    # Bin edges 0..total+1 give one histogram bin per integer total 0..total.
    x = numpy.arange(total+2)
    y, _ = numpy.histogram(m, x)

    print("Mean=%s, stderr=%s min=%s max=%s median=%s" %
          (m.mean(), m.std() / math.sqrt(len(m)), m.min(), m.max(), numpy.median(m)))

    pylab.plot(x[:-1], y, 'r-o')
    pylab.title("Internal=%s, External=%s,\nn=%s, bias1=%d, bias2=%d, bias3=%d" % (str(internal_marks), str(per_question_marks), n, bias1, bias2, bias3))

def marks(n=100, internal_marks=[0, 10, 20],
          per_question_marks=[0, 2, 4, 8, 10], total=100, figname=None, bias=0):
    """Return an array with the simulated totals of n students."""
    m = numpy.zeros(n)
    for i in range(n):
        m[i] = marks_one_student(internal_marks, per_question_marks, total, bias)
    return m

def pos(n, bias):
    """Pick a random index in 0..n-1, shift it by bias and clamp it to the
    valid range, so a positive bias favors higher grades."""
    return numpy.max([0, numpy.min([numpy.random.randint(n) + bias, n-1])])

def marks_one_student(internal_marks, per_question_marks, total, bias):
    # Number of questions such that the maximum internal mark plus the
    # maximum marks on all questions adds up to the total.
    nquestions = (total - numpy.max(internal_marks)) // numpy.max(per_question_marks)
    assert numpy.max(internal_marks) + \
        nquestions * numpy.max(per_question_marks) == total

    marks = internal_marks[pos(len(internal_marks), bias)]
    for i in range(nquestions):
        marks += per_question_marks[pos(len(per_question_marks), bias)]
    return marks

Without Bias:

from __future__ import division, print_function

import math
import numpy
import pylab

def marks(n=100, internal_marks=[0, 10, 20],
          per_question_marks=[0, 2, 4, 8, 10], total=100, figname=None):
    """Simulate n unbiased students, print summary statistics and plot the
    histogram of their totals (saved to figname if one is given)."""
    m = numpy.zeros(n)
    for i in range(n):
        m[i] = marks_one_student(internal_marks, per_question_marks, total)

    print("Mean=%s, stderr=%s min=%s max=%s median=%s" %
          (m.mean(), m.std() / math.sqrt(len(m)), m.min(), m.max(), numpy.median(m)))

    # Bin edges 0..total+1 give one histogram bin per integer total 0..total.
    x = numpy.arange(total+2)
    y, _ = numpy.histogram(m, x)
    print(len(x), x)
    print(len(y), y)

    pylab.plot(x[:-1], y, 'r-o')
    pylab.title("Internal=%s, External=%s,\nn=%s, Mean=%.1f, Median=%.1f" % (str(internal_marks), str(per_question_marks), n, m.mean(), numpy.median(m)))

    if figname:
        pylab.savefig(figname)

    return m

def marks_one_student(internal_marks, per_question_marks, total):
    # Number of questions such that the maximum internal mark plus the
    # maximum marks on all questions adds up to the total.
    nquestions = (total - numpy.max(internal_marks)) // numpy.max(per_question_marks)
    assert numpy.max(internal_marks) + \
        nquestions * numpy.max(per_question_marks) == total

    # Each mark is drawn uniformly from the allowed increments.
    marks = internal_marks[numpy.random.randint(0, len(internal_marks))]
    for i in range(nquestions):
        marks += per_question_marks[numpy.random.randint(0, len(per_question_marks))]
    return marks

Wednesday, May 8, 2013

Min and Max of 2 numbers

Problem: Find the minimum and maximum of two given numbers.

Solution in Python:

def max(n, m):
    """Author: Mayur P Srivastava"""
    # Note: this name shadows Python's built-in max().

    if n >= m:
        return n
    return m

def min(n, m):
    """Author: Mayur P Srivastava"""
    # Note: this name shadows Python's built-in min().

    if n < m:
        return n
    return m
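Since the problem asks for both values, a single comparison is enough to produce them together. A minimal sketch (min_max is a hypothetical name, not part of the post); in everyday code Python's built-in min() and max() do the same job.

```python
def min_max(a, b):
    """Return (smaller, larger) of a and b using a single comparison."""
    if a <= b:
        return a, b
    return b, a


print(min_max(3, 7))  # (3, 7)
print(min_max(7, 3))  # (3, 7)
print(min_max(4, 4))  # (4, 4)
```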

Concepts Learned: Logical conditions.

Cricket net run rate

Problem: Calculate the net run rate for a cricket match, given the runs scored by team A (which batted first), the overs played by team A, the runs scored by team B, the overs played by team B, the winning team, whether team A was bowled out, whether team B was bowled out, and the total number of overs in one innings.

Solution in Python:

from __future__ import division

import math

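def net_run_rate(runsA, oversA, runsB, oversB, winning_team,
                 bowled_outA, bowled_outB, total_overs, eps=1e-6):
    """
    Author: Mayur P Srivastava

    winning_team is 'A', 'B' or 'AB'; for 'AB' both net run rates are 0.
    eps is the tolerance used when comparing overs (default value assumed).
    """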

    assert winning_team in ['AB', 'A', 'B']

    if winning_team == 'AB':
        return 0.0, 0.0

    oversA = parse_overs(oversA)
    oversB = parse_overs(oversB)

    if abs(oversA - total_overs) > eps:
        bowled_outA = True

    if abs(oversB - total_overs) > eps and winning_team == 'A':
        bowled_outB = True

    if bowled_outA:
        oversA = total_overs
    if bowled_outB:
        oversB = total_overs

    rrA = runsA / oversA
    rrB = runsB / oversB
    if winning_team == 'A':
        nrrA = rrA - rrB
        nrrB = -nrrA
    else:
        nrrB = rrB - rrA
        nrrA = -nrrB

    return (nrrA, nrrB), (runsA, runsB), (oversA, oversB)

def parse_overs(o):
    completed_overs = math.floor(o)

    balls = math.floor(0.5 + 10 * (o - completed_overs))
    assert balls >= 0 and balls < 6
    return completed_overs + balls / 6.0
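A worked example helps check the bowled-out rule. The following Python 3 sketch restates the calculation with explicit bowled-out flags instead of inferring them; the names nrr_pair and to_decimal_overs are hypothetical, not the post's functions. As in the code above, a side that is bowled out is charged its full quota of overs.

```python
def to_decimal_overs(o):
    """Convert cricket notation (e.g. 40.2 = 40 overs, 2 balls) to decimal overs."""
    completed = int(o)
    balls = int(round(10 * (o - completed)))
    assert 0 <= balls < 6
    return completed + balls / 6


def nrr_pair(runs_a, overs_a, runs_b, overs_b, total_overs,
             bowled_out_a=False, bowled_out_b=False):
    """Net run rates (team A, team B) for a single match."""
    # A bowled-out side is charged the full total_overs.
    overs_a = total_overs if bowled_out_a else to_decimal_overs(overs_a)
    overs_b = total_overs if bowled_out_b else to_decimal_overs(overs_b)
    nrr_a = runs_a / overs_a - runs_b / overs_b
    return nrr_a, -nrr_a


# A makes 250 in 50 overs; B is bowled out for 200 in 40.2 overs.
print(nrr_pair(250, 50, 200, 40.2, 50, bowled_out_b=True))  # (1.0, -1.0)
# B chases 251 in 45.3 overs (45.5 decimal overs).
print(nrr_pair(250, 50, 251, 45.3, 50))
```

In the first match B's 40.2 overs count as 50, so B's rate is 200 / 50 = 4.0 against A's 5.0.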

Concepts Learned: Maths

Cricket run rate

Problem: Compute the current run rate and the required run rate for a cricket game, given the current score, the target score, the number of overs bowled, and the total number of overs.

Solution in Python:

from __future__ import division

import math

def calculate_run_rates(current_score, target_score, current_overs, total_overs):
    """
    Author: Mayur P Srivastava

    In overs, the fractional part represents the number of balls,
    e.g. 5.1, 5.2, 5.3, 5.4, 5.5, 6.0
    """

    current_overs = parse_overs(current_overs)
    total_overs   = parse_overs(total_overs)

    if current_overs > 0:
        current_rr = current_score / current_overs
    else:
        current_rr = 0

    # The chasing side needs one run more than target_score to win.
    remaining_overs = total_overs - current_overs
    runs_to_win     = target_score - current_score + 1

    required_rr = runs_to_win / remaining_overs
    return current_rr, required_rr

def parse_overs(o):
    completed_overs = math.floor(o)

    balls = math.floor(0.5 + 10 * (o - completed_overs))
    assert balls >= 0 and balls < 6
    return completed_overs + balls / 6.0
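Storing overs as a float and recovering the balls with floor(0.5 + 10 * ...) works, but counting balls as integers sidesteps the rounding entirely. A minimal Python 3 sketch, assuming overs arrive as strings such as "20.3"; overs_to_balls and run_rates are hypothetical names, and returning infinity when no balls remain is this sketch's own choice (the version above would divide by zero).

```python
def overs_to_balls(overs):
    """Convert an overs string such as "20.3" (20 overs, 3 balls) to balls."""
    completed, _, balls = overs.partition(".")
    balls = int(balls or 0)
    if not 0 <= balls < 6:
        raise ValueError("ball count must be 0-5: %r" % overs)
    return 6 * int(completed) + balls


def run_rates(current_score, target_score, current_overs, total_overs):
    """Return (current run rate, required run rate) in runs per over.

    Follows the post's convention that the chasing side needs
    target_score - current_score + 1 more runs.
    """
    bowled = overs_to_balls(current_overs)
    remaining = overs_to_balls(total_overs) - bowled
    current_rr = 6 * current_score / bowled if bowled else 0.0
    runs_needed = target_score - current_score + 1
    required_rr = 6 * runs_needed / remaining if remaining else float("inf")
    return current_rr, required_rr


print(overs_to_balls("20.3"))             # 123
print(run_rates(120, 249, "20.3", "50"))  # 720/123 and 780/177 runs per over
```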

Concepts Learned: Maths

Matrix Addition

Problem: Add two given matrices.

Solution in Python:

def add(A, B):
    """Author: Mayur P Srivastava"""

    m1, n1 = shape(A)
    m2, n2 = shape(B)

    if not can_add(m1, n1, m2, n2):
        return None

    C = create_matrix(m1, n2)

    for i in range(m1):
        for j in range(n2):
            C[i][j] = A[i][j] + B[i][j]

    return C

def shape(A):
    m = len(A)   
    n = 0
    for row in A:
        n2 = len(row)
        if n == 0:
            n = n2
        elif n != n2:
            assert False

    return m, n

def can_add(m1, n1, m2, n2):
    return m1 == m2 and n1 == n2

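def create_matrix(m, n, value=0):
    """Return an m x n matrix (a list of rows) filled with value."""
    matrix = []
    for i in range(m):
        row = []
        for j in range(n):
            row.append(value)
        matrix.append(row)
    return matrix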
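The same element-wise sum can be written more compactly with zip, which pairs up corresponding rows and then corresponding entries. A sketch (add_matrices is a hypothetical name) that, like add() above, returns None when the shapes differ; it only checks that the rows line up pairwise.

```python
def add_matrices(A, B):
    """Element-wise sum of two equally shaped matrices (lists of rows)."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        return None
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


print(add_matrices([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[6, 8], [10, 12]]
print(add_matrices([[1, 2]], [[1], [2]]))                # None
```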

Concepts Learned: Nested loops and Maths

Matrix Multiplication

Problem: Multiply two given matrices.

Solution in Python:

def multiply(A, B):
    """Author: Mayur P Srivastava"""

    m1, n1 = shape(A)
    m2, n2 = shape(B)

    if not can_multiply(m1, n1, m2, n2):
        return None

    C = create_matrix(m1, n2)

    for i in range(m1):
        for j in range(n2):
            c = 0
            for k in range(n1):
                c += A[i][k] * B[k][j]
            C[i][j] = c

    return C

def shape(A):
    m = len(A)   
    n = 0
    for row in A:
        n2 = len(row)
        if n == 0:
            n = n2
        elif n != n2:
            assert False

    return m, n

def can_multiply(m1, n1, m2, n2):
    return n1 == m2

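def create_matrix(m, n, value=0):
    """Return an m x n matrix (a list of rows) filled with value."""
    matrix = []
    for i in range(m):
        row = []
        for j in range(n):
            row.append(value)
        matrix.append(row)
    return matrix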
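Entry C[i][j] is the dot product of row i of A with column j of B, and zip(*B) transposes B so its columns can be iterated directly. A sketch (multiply_matrices is a hypothetical name) that assumes both inputs are rectangular and, like multiply() above, returns None when the inner dimensions differ.

```python
def multiply_matrices(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    if not A or not B or len(A[0]) != len(B):
        return None
    columns = list(zip(*B))  # transpose B: each column becomes a tuple
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in A]


print(multiply_matrices([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
print(multiply_matrices([[1, 2, 3]], [[1], [2], [3]]))        # [[14]]
```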

Concepts Learned: Maths


Problem: Check whether a given number n is divisible by another number m.

Solution in Python:

def is_divisible(n, m):
    """Author: Mayur P Srivastava"""

    if m == 0:
        return False

    return n % m == 0
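The check also holds for negative inputs, which is worth confirming because Python's % returns a result with the sign of the divisor. A short demonstration, restating is_divisible so the snippet runs on its own:

```python
def is_divisible(n, m):
    """True if m divides n exactly; a zero divisor is treated as False."""
    if m == 0:
        return False
    return n % m == 0


# Python's % takes the sign of the divisor, so remainders can be negative...
print(-7 % 3, 7 % -3)  # 2 -2
# ...but an exact multiple always leaves remainder 0, so the test still works.
print(is_divisible(-12, 4), is_divisible(12, -4), is_divisible(12, -5))  # True True False
```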

Concepts Learned: Maths