CIS 4524/5524: ANALYSIS AND MODELING OF SOCIAL AND INFORMATION NETWORKS
Spring 2025

Homework Policies (applicable for all assignments):


  1. You are required to do the homework problems in order to pass.
  2. Understandability of the solution is as desired as correctness.
  3. Penalty for late homework assignments submissions is 20% per day. So, do it on time.
  4. Solutions are expected to be your own work. Group work is not allowed unless explicitly approved for a particular problem. If you obtained a hint with help (e.g., through library work, discussion with another person, etc.) acknowledge your source, and write up the solution on your own. Plagiarism and other anti-intellectual behavior will be dealt with severely.

Assignment 0

Out: January 15
Due: Friday, January 17 by 11:59AM (Noon)

Background Knowledge Survey:

Please answer 3 survey questions by simply indicating "Yes" if you have some background knowledge of this topic or "No" if you do not. You need to answer this survey to do Homework1. This survey is due Friday, Jan 17, 2025 at noon. No Canvas submission is needed.

Link to survey: https://forms.office.com/r/kU5mjZvxqv

Assignment 1

Out: January 17
Due: January 23 by 5:00 pm on Canvas
*Please write your name and TUID at the top of your CANVAS submission.

Download the Homework 1 from the class homework folder. This table is generated according to the background survey for this course. In this table, each raw represents a response of one of you to a survey asked as Homework 0; the column values are 1 if you have some background experience in (1) Data Mining (CIS 4523/5523) or Machine Learning); (2) Python or R programming; (3) Graphs or Statistics.

Problem 1. [Visualizing a Multilayer Network]
In this exercise, your job is to visualize a 3-layer course network, where nodes represent students. Each layer corresponds to a single topic network representing one type of relationship among students taking this course (e.g., in layer 1, nodes representing two students should be connected if both students took a Data Mining/Machine Learning course). Note: you need to convert row data to graph format edge list, or matrix; .csv or .txt format should be fine. Make a screenshot of your visualized networks.

Problem 2. [Visualizing a Weighted Network]
Visualize a network obtained by projecting the 3-layer network from Problem 1 to a single-layer weighted network, where two students are linked by a weighted edge representing the number of topics both students took.

Problem 3. [Visualization of a Bipartite Network]
Visualize a bipartite students-topics network where an edge between a student node and a topic node exists if and only if this student has taken a course on that topic.

Problem 4. [Computing Global Network Properties]
For each layer of the 3-layer network constructed in Problem 1, compute the following global network properties:
  1. the size and diameter of the network’s largest connected component
  2. degree distribution (you can report average degree distribution or plot degree distribution histogram)
  3. average path length
  4. average clustering coefficient
Repeat this for the network constructed in Problem 2.

Impotent Notes:
  • Before importing the student network, you must transform the row data into a supported graph format. Check how to convert row data to graph input in the library/software you plan to use.
  • Visualizing a multilayer network is different from a single-layer network. To visualize a Multilayer network, make sure to use a supporting library for example, Pymnet and Multinetx (check Syllabus- Software section for more libraries)
  • It is highly recommended to use the Python library NetworkX. Check sections 'Creating a graph' and 'Drawing graphs' to see how to visualize a graph. You can also use Gephi, a platform-free software for graph visualization and analysis, which you can download from https://gephi.org/ (follow the quick start guide) to import and visualize the student network.
  • All metrics can be calculated using the Python library NetworkX, but you can use any freely available packages to compute these properties or develop your code.

Assignment 2

Out: January 22
Due: January 30 by 5:00pm on Canvas
*Please write your name and TUID at the top of your CANVAS submission

Problem 1.
Download Amazon product dataset from http://snap.stanford.edu/data/com-Amazon.html, and compute:
  1. the size of the network largest connected component,
  2. the number of connected components,
  3. degree distribution,
  4. path length and
  5. clustering coefficient.

All metrics can be calculated using Python library NetworkX, but you can use any specialized freely available packages to compute these properties, or you develop your own code. Use visual displays (graphs/plots generated in a software of your choice, e.g. gnuplot) for a clear presentation.

Assignment 3

Out: January 30
Due: February 6 by 5:00pm on Canvas
*Please write your name and TUID at the top of your CANVAS submission

Problem 1.
In an Erdős–Rényi graph with N=4000 nodes, the linking probability is p=0.001
  1. What is the average degree of a node in this graph?
  2. What is the variance in the degrees of the nodes?
  3. What is the expected number of nodes with a degree which is at least twice larger than the average degree?

Problem 2.
Consider Gn,p, an Erdös–Rényi random graph with n nodes, m edges, and mean degree c:
  1. Compute the probability p of creating an edge in Gn,p.
  2. Show that in the limit (large n) the expected number of triangles in Gn,p is 1/6 · c3

Assignment 4

Out: February 13
Due: February 20 by 5:00pm on Canvas
*Please write your name and TUID at the top of your CANVAS submission

Problem 1.
Compute the expected maximum degree for the Movie Actors and for Citation network listed in this table
Network Size (k) k out in
www 325 729 4.51 900 2.45 2.1
www 4×107 7 2.38 2.1
www 2×108 7.5 4000 2.72 2.1
www, site 260 000 1.94
Internet, domain* 3015–4389 3.42–3.76 30–40 2.1–2.2 2.1–2.2
Internet, Router* 3888 2.57 30 2.48 2.48
Internet, Router* 150 000 2.66 60 2.4 2.4
Movie actors* 212 250 28.78 900 2.3 2.3
Co-authors, SPIRES* 56 627 173 1100 1.2 1.2
Co-authors, neuro.* 209 293 11.54 400 2.1 2.1
Co-authors, math.* 70 975 3.9 120 2.5 2.5
Sexual contacts* 2810 3.4 3.4
Metabolic, E. coli 778 7.4 110 2.2 2.2
Protein, S. cerev.* 1870 2.39 2.4 2.4
Ythan estuary* 134 8.7 35 1.05 1.05
Silwood Park* 154 4.75 27 1.13 1.13
Citation 783 339 8.57 3 3
Phone call 53×106 3.16 2.1 2.1
Words, co-occurrence* 460 902 70.13 2.7 2.7
Words, synonyms* 22 311 13.48 2.8 2.8

Problem 2.
In the small-world network model, each of n nodes on a regular ring lattice is connected to c of its closest neighbors on the lattice (where c is even), and each of the edges of the lattice is rewired with probability p; by rewired, we mean that an edge on the ring is selected (with probability p), removed from the ring, and replaced with an edge that “crosses the ring” by joining two vertices chosen uniformly at random; such randomly placed edges are commonly referred to as shortcuts.
  • What is the number of shortcut edges in a small-world network with n nodes and average degree of c?
  • What is the number of shortcut ends?
  • In the model described above, it is possible for a vertex to become disconnected from the rest of the network by the rewiring process, e.g., if all of the edges indecent to the vertex are rewired and no shortcut ends fall at the vertex. Show that the probability of this happening to a given vertex is [pe-p]c.

    (Hint: Recall that \( \lim_{n \to \infty} \left( 1 - \frac{1}{n} \right)^n = \frac{1}{e} \))

Problem 3.
Solve exercise 3, part A at the end of chapter 5.6 in Easley and Kleinberg textbook (pages 133-134).
Hint: You can find the page here: https://www.cs.cornell.edu/home/kleinber/networks-book/networks-book-ch05.pdf

Assignment 5

Out: February 20
Due: February 27 by 5:00pm on Canvas
*Please write your name and TUID at the top of your CANVAS submission

Problem 1.
Generate a network of 10,000 nodes using the Preferential Attachment algorithm that adds nodes 1 at a time, each with edges to m of the previous existing nodes (find code at stack overflow or at GitHub or write you own). Then solve the following 3 tasks:
  1. Plot on log-log scale the degree distribution at intermediate steps for networks of 100, 1,000 and 10,000 nodes.
  2. Compute the average clustering coefficient as a function of the number of nodes on these networks.
  3. Measure the degree dynamics of one of the initial nodes and of the nodes added to the network at time t=100, t=1,000 and t=5,000.

Problem 2.
Solve exercise 2 at the end of chapter 3 in the Easley and Kleinberg textbook (page 83)
Hint: The paper number for this problem in a hard copy textbook is 74-75, but this is page 83 at the PDF (section 3.7 Exercises).

Assignment 6

Out: TBD
Due: TBD by 5:00pm on Canvas
*Please write your name and TUID at the top of your CANVAS submission