Tuesday, 21 July 2015

A Universal Date type in Nepali

During my corpus collection work in Nepali, I wanted to understand all the date formats used by the news sources. Many news sources keep their date information in different formats. For example, eKantipur uses the format २०७२ श्रावण ५ ०८:३१, whereas Nagarik News uses मङ्गलबार ५ श्रावण, २०७२, and so on. The formats are represented as a state machine in the diagram below.

Fig. State machine for different Nepalese date formats.

My intention is to make this corpus searchable too. So, I wrote a computer program that understands the different Nepalese date formats and indexes them in a sortable format. For more detail, click here.

Tuesday, 7 July 2015

Research – Morphological and Sentiment Analysis based on Nepali Corpus

My work: a morphological analyzer for the Nepali language and a sentiment analysis classifier are online, with a demo posted here.

Friday, 3 July 2015

Beginning Stuff: Lab problems on C - Programming for Undergraduate students

My undergraduate students asked me to post some material on C programming. This post helps you get started with programming through implementations of some first-year undergraduate problems.
   
This sheet introduces the IDE, the basics of writing, compiling, and running a program, and familiarizes students with the printf and scanf functions. It is designed for two lab days.

This sheet helps students convert a program specification into a C program, i.e. from flowchart to program. It is especially designed to make them familiar with the appropriate data types and arithmetic on them. It requires one lab day.

This sheet is designed to teach the use of decision statements in C programming and requires two lab days.

This sheet introduces loop statements in C programming, limited to simple loops without nesting. It requires three lab days.

This sheet introduces loops together with arrays in C programming. Students will see the importance of nested loops here. It requires two lab days.

This sheet is designed to make students familiar with strings and arrays. It requires two lab days to complete.

This sheet introduces writing and calling functions in C programming and requires two lab days.

This sheet introduces the definition and use of pointer types to solve problems in C programming. It requires two lab days to complete.

This sheet is designed to make students familiar with file input/output and the use of structures (user-defined types) in the C programming language. It requires four lab days.

I would like to post some sample solutions here:

1. Solution for dictionary-based (lexicographic) string comparison.

#include <stdio.h>
#include <stdlib.h> /* malloc, free */
#include <string.h> /* strlen */

int compareStrings(char *, char *);
int getStringLength(char *);
char* strToupper(char*);

int compareStrings(char *a, char *b){
    int result = 0;
    /* Walk both strings until either one ends; the checks after the loop handle unequal lengths. */
    while(*a != '\0' && *b != '\0'){
        printf("Iteration : compare %c and %c.\n", *a, *b);
        if(*a > *b){
            return 1;
        }else if (*a < *b){
            return -1;
        }
        a++;b++;
    }
    if(*a != '\0'){
        return 1;
    }else if(*b != '\0'){
        return -1;
    }
    return result;
}

char *strToupper(char *str){
    int i = 0;
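    /* Allocate room for the copy plus the terminating '\0'; the caller must free it. */
    char *rStr = (char*) malloc((strlen(str) + 1) * sizeof(char));
    if(rStr == NULL)
        return NULL;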
    for(; str[i]; ++i){
        if((str[i] >= 'a') && (str[i] <= 'z'))
            rStr[i] = str[i] + 'A' - 'a';
        else
            rStr[i] = str[i];
    }
    rStr[i] = '\0';
    return rStr;
}

int getStringLength(char *str){
    int result = 0;
    while(*str != '\0'){
        result ++;
        str++;
    }
    return result;
}

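int main(){
    char *first = "appze";
    char *second = "apple";
    int result = compareStrings(first, second);
    /* strToupper returns newly allocated copies, so keep them in separate
       variables and free them when done. */
    char *upperFirst = strToupper(first);
    char *upperSecond = strToupper(second);
    printf("Comparison result : %d\n", result);
    if(upperFirst != NULL && upperSecond != NULL){
        printf("Length of %s = %d\n", upperFirst, getStringLength(upperFirst));
        printf("Length of %s = %d\n", upperSecond, getStringLength(upperSecond));
    }
    free(upperFirst);
    free(upperSecond);
    return 0;
}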

2. Solution to calculate angle between two vectors.

//Vector Demo in C
#include <stdio.h>
#include <math.h>
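#define MAX 10 // denotes the maximum dimensions of a vector
#ifndef M_PI   // M_PI is not part of standard C, so define it if math.h does not
#define M_PI 3.14159265358979323846
#endif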

typedef struct {
    double coef[MAX];
}Vector;

double Magnitude(Vector);
double DotProduct(Vector, Vector);
double CosTheta(Vector, Vector);
double Theta(Vector, Vector);

double Magnitude(Vector v){
    double sum = 0.0;
    int index;
    for(index = 0; index < MAX; index++){
        sum += v.coef[index] * v.coef[index];
    }
    return sqrt(sum);
}

double DotProduct(Vector v, Vector w){
    double sum = 0.0;
    int index;
    for(index = 0; index < MAX; index++){
        sum += v.coef[index] * w.coef[index];
    }
    return sum;
}

double CosTheta(Vector x, Vector y){
    return DotProduct(x, y)/(Magnitude(x) * Magnitude(y));
}

double Theta (Vector v, Vector w){
    return acos(CosTheta(v, w)) * 180.0 / M_PI;
}

int main(){
    Vector v1 = {{40, 0, 45, 30, 20, 89, 90, 100, 5, 0}};
    Vector v2 = {{20, 100, 6, 89, 999, 9, 900, 89, 50, 21}};
    printf("Angle : %.2f degree.\n", Theta(v1, v2));
    return 0;
}
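
For reference, the program above appears to follow the standard dot-product identity for the angle between two vectors. Written out, with v and w standing for the two Vector arguments and n for MAX:

```latex
\cos\theta = \frac{\mathbf{v}\cdot\mathbf{w}}{\lVert\mathbf{v}\rVert\,\lVert\mathbf{w}\rVert}
           = \frac{\sum_{i=1}^{n} v_i w_i}{\sqrt{\sum_{i=1}^{n} v_i^2}\;\sqrt{\sum_{i=1}^{n} w_i^2}},
\qquad
\theta_{\text{degrees}} = \arccos(\cos\theta)\cdot\frac{180}{\pi}
```

A vector with fewer than MAX dimensions can be padded with zeros, since zero entries contribute nothing to either sum. The formula is undefined when either vector has zero magnitude, so such inputs should be avoided.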

I believe that these problem sheets will help improve the programming skills of undergraduate students in particular, and of anyone beginning to write programs in C. Please feel free to post your comments; you are most welcome. Thanks to Shailesh and Nischal Dai.