\documentstyle[12pt,fullpage]{article}

Assignment 3 (Neural Computation, Spring 2009)

Number of problems/points: Four problems for a total of 150 points

Out: February 27, 2009

Due: March 15, in class.

Problem 1: (25 points)

Prove that the Tiling algorithm presented in the last lecture converges.

Hint: Construct a weight assignment from layer k to layer k+1 such that the error of the master node in layer k+1 is smaller than the error of the master node in layer k.

Problem 2: (25 points)

In this problem, your job is to explore to what extent baseball salaries are based on performance. To do so, download the data description file http://www.amstat.org/publications/jse/datasets/baseball.txt and the salary and performance data on 337 Major League Baseball players who played at least one game in both the 1991 and 1992 seasons (contained in the file http://www.amstat.org/publications/jse/datasets/baseball.dat).

(a) Using 5-fold cross-validation, develop a regression neural network to predict the 1992 salary of baseball players (in thousands of dollars) based on the following performance measures from 1991: batting average, on-base percentage, number of runs scored, number of hits, number of doubles, number of triples, number of home runs, number of runs batted in, number of bases on balls or walks, number of strikeouts, number of stolen bases, and number of errors made. Report accuracy in terms of the coefficient of determination (R-squared) and also in terms of dollar error and error deviation.

(b) Remove outliers and repeat the analysis for the remaining players.
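The fold construction and the reporting in part (a) can be sketched as follows. This is a minimal illustration, not a required implementation: the helper names (kfold_indices, r_squared, error_summary) are hypothetical, and reading "dollar error" as the mean absolute error and "error deviation" as the standard deviation of the absolute errors is an assumption.

```python
import random
import statistics

def kfold_indices(n, k=5, seed=0):
    """Shuffle range(n) and split it into k nearly equal test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def error_summary(y_true, y_pred):
    """Assumed reading of the report: mean and std. dev. of absolute errors."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return statistics.fmean(abs_err), statistics.pstdev(abs_err)
```

Each fold returned by kfold_indices(337) serves once as the test set while the network is trained on the other four folds; the metrics are then computed on the held-out predictions.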

Problem 3: (50 points)

Download the files hw3data_train.txt and hw3data_test.txt from the CIS 525 homework site (this is a reduced version of the NIST database of handwritten digits as prepared by Dr. G. Grudic at Colorado State University).

Each of these files contains 2,500 matrices, where each matrix consists of 144 attributes representing a 14x14 discretized version of a handwritten digit (the text above each matrix tells which digit it represents; if you replace the zeros with blanks, you should be able to see which digit each matrix encodes).

(a) Use the examples from hw3data_train.txt to train a multi-layer feed-forward network to recognize each of the 10 digits, and evaluate the prediction accuracy on the examples from hw3data_test.txt. Make sure to normalize the attributes and shuffle the matrices of the training dataset before attempting to design classifiers. Compare computational efficiency and accuracy when training neural networks using backpropagation vs. a second-order optimization method of your choice (e.g. the Levenberg-Marquardt algorithm). For the accuracy comparison, treat each digit as a "class vs. not that class" problem and plot ROC curves for both predictors (false positive rate on the X-axis versus true positive rate on the Y-axis). Plot both curves on the same ROC graph for easier visual comparison. At http://gim.unmc.edu/dxtests/roc1.htm you can find a nice example of calculating sensitivity and specificity, and of drawing and interpreting an ROC curve. Report your methodology and findings concisely.
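The ROC construction for one "class vs. not that class" problem can be sketched as below. This is an illustrative helper under stated assumptions: the function name roc_points is hypothetical, the scores are taken to be the network output for the target digit (higher means more confident), and both classes are assumed to be present in the evaluated set.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels: 1 for the target class, 0 for every other class.
    scores: classifier output for the target class, higher = more confident.
    Assumes at least one positive and one negative example.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Sweep the decision threshold from the highest score downward.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example of a tied score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points
```

Calling the function once per predictor on the same test labels gives two point lists that can be drawn on a single graph, with the first coordinate on the X-axis and the second on the Y-axis.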

(b) Solve the same problem using a Radial Basis Function (RBF) neural network. Compare its computational efficiency and accuracy to those of the learning methods used in part (a).
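For reference, one common form of an RBF network output with Gaussian hidden units is given below. The notation is assumed here rather than taken from the lecture: $m$ is the number of hidden units, $c_j$ and $\sigma_j$ are the center and width of unit $j$, and $w_j$ are the output weights.

\[
f(x) = w_0 + \sum_{j=1}^{m} w_j \exp\left( -\frac{\| x - c_j \|^2}{2 \sigma_j^2} \right)
\]

If the centers and widths are fixed in advance, the weights $w_j$ enter linearly and can be fit by least squares, which is worth keeping in mind when comparing training cost.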

(c) Solve the same problem using Support Vector Machines with linear and polynomial kernels. We recommend the SVM-Light software, which you can download from http://svmlight.joachims.org/. Compare computational efficiency and accuracy to those of the learning methods used in parts (a) and (b).
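SVM-Light reads one example per line in the form "target index:value index:value ...", with feature indices starting at 1 and listed in increasing order. The conversion can be sketched as below; the helper name to_svmlight_line is hypothetical, and the +1/-1 targets assume one binary "class vs. not that class" problem per digit.

```python
def to_svmlight_line(target, features):
    """Format one example as '<target> <index>:<value> ...'.

    target: +1 for the digit being recognized, -1 for every other digit.
    features: attribute values in order; indices are assigned from 1.
    Zero-valued features are omitted, which the sparse format allows.
    """
    parts = [f"{target:+d}"]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)
```

Since svm_learn trains a binary classifier, one training file per digit is needed. The kernel is selected with the -t option (0 for linear, 1 for polynomial), and the polynomial degree with -d.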

Problem 4: (50 points)

Write a research proposal for the CIS 525 class project that you plan to perform. Use the following format:

· Title and author;

· Objective and Significance (outline learning problem);

· Background (summarize related work by others);

· Proposed Approach (describe available data, learning method you plan to use, how you plan to test and compare to alternative methods);

· Timeline (provide an approximate timeline of the proposed work);

· References.

The proposal may not exceed 3 pages in 12 pt font.

Write your e-mail address on page one to allow prompt feedback.