## ECE549 / CS543 Computer Vision, Spring 2023, Assignment 1
### Instructions
1. Assignment is due at **11:59:59 PM on Wednesday Feb 15, 2023**.
2. See [policies](http://saurabhg.web.illinois.edu/teaching/ece549/sp2023/policies.html)
on [class website](http://saurabhg.web.illinois.edu/teaching/ece549/sp2023/).
3. Submission instructions:
1. A single `.pdf` report that contains your work for Q1, Q2, and Q3. You
can type out your responses in LaTeX or any other word processing
software. You can also type your work in Markdown using the template file
in `README.md`; in that case, please enter your work under the `Answer:`
section of each problem. Alternatively, you can hand-write your answers on
a tablet, or scan in hand-written answers. If you hand-write, please make
sure your answers are neat and legible, and if you are scanning, make sure
the scans are legible. Lastly, convert your work into a `PDF`.
The PDF file should be submitted to [Gradescope](https://www.gradescope.com)
under `MP1`. The course code is **3J2KWY**. Please make sure you tag the
responses in your PDF with the Gradescope questions outline as described in
[Submitting an Assignment](https://youtu.be/u-pK4GzpId0).
2. For Q4 we are using the **Gradescope autograder** to test your code.
For this to work, you will need to submit the code according to the
following instructions:
- Code should be submitted to [Gradescope](https://www.gradescope.com)
under `MP1-code`; you will need to submit the *python* file `render_image.py`.
- Do not compress the files into a `.zip`, as this will not work.
- Do not change the provided file names or the names of the provided functions;
instead, change the code inside the provided functions and add new functions
as needed. Also, make sure that the inputs and outputs of the provided
functions are not changed.
- The autograder will give you feedback on how well your code did.
- The autograder is configured with the following Python libraries only:
- numpy
- matplotlib
3. You will also submit code for Q3. Include it as `.py` files in the code
you upload to `MP1-code`. This code won't be autograded but failure to
include this code will lead to zero credit on Q3.
4. We reserve the right to take off points for not following
submission instructions. In particular, please tag the responses in your
PDF with the Gradescope questions outline as described in
[Submitting an Assignment](https://youtu.be/u-pK4GzpId0).
4. Lastly, be careful not to work on a public fork of this repo. Make a
private clone to work on your assignment. You are responsible for
preventing other students from copying your work. Please also see point 2
above.
### Problems
1. **Comparing Heights [7 pts]**
Consider the image of person A and person B standing on the ground plane, as
captured by a perspective camera of focal length $f$. $H$ is the horizon
line ($y=0$). $a_1$ is the distance between A's toes and the horizon, and
$a_2$ is the distance between A's head and the horizon, both in units of
pixels; $b_1$ and $b_2$ are defined similarly for person B. Suppose A's
height is $h$ feet. (A reminder of the relevant projection relation is
included after the sub-questions.)
<div align="center">
<br/>
<br/>
<img src="./q1/height.png" width="30%">
<br/>
<br/>
</div>
1. **[1 pt]** From the picture, is it possible to determine who is taller in the real world? If so, who is taller, A or B? Show your work.
*Answer*:
2. **[6 pts]** Give expressions for the following quantities in terms of $f, a_1, a_2, b_1, b_2, h$. Show your work.
1. **[2 pts]** How many feet above the ground is the camera?
*Answer*:
2. **[2 pts]** What is the height of person B in feet?
*Answer*:
3. **[2 pts]** What is the distance (along the Z-axis) from the camera to person B, in feet?
*Answer*:
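As a reminder (this is just the perspective projection relation specialized to
this setup, not a full answer), with image coordinates centered so that the
horizon is at $y = 0$, a scene point at depth $Z$ whose height differs from
the camera's height by $\Delta Y$ projects at a distance
```math
\Delta y = f\,\frac{\Delta Y}{Z}
```
in pixels from the horizon; the quantities $a_1$, $a_2$, $b_1$, $b_2$ in the
figure are all of this form.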
2. **Rectangle and Cylinder under Perspective Projection [12 pts]**
1. **[4 pts]** Suppose a rectangle of width $L$ is moving along the X axis.
How does its width $l$ on the image plane change _w.r.t._ its distance $d$ to
the origin? Recall that, under perspective projection, a point $(X,Y,Z)$ in
3D space maps to $(f\frac{X}{Z}, f\frac{Y}{Z})$ in the image, where $f$ is
the distance of the image plane from the pinhole.
<div align="center">
<img src="./q2/rectangle.png" width="50%">
</div>
*Answer*:
2. **[8 pts]** What if we replace the rectangle with a cylinder of radius
$r$ on the X axis? How does its width $l$ on the image plane change _w.r.t._
its distance $d$ to the origin? Show your work, and try to simplify the
final result as much as possible. We won't take points off if your answer is
correct and complete but is only missing algebraic simplifications. (A small
sketch for numerically checking your expressions is included after this
problem.)
<div align="center">
<img src="./q2/cylinder.png" width="50%">
</div>
*Answer*:
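For both parts, one way to sanity-check a derived expression for $l(d)$ is to
project sample 3D points with the given pinhole model and measure the
image-plane width directly. Below is a minimal, hypothetical sketch (the
helper names and the placeholder points are ours, not part of the starter
code); it only implements the mapping $(X, Y, Z) \mapsto (f\frac{X}{Z}, f\frac{Y}{Z})$.
```python
import numpy as np

def project(P, f=1.0):
    """Pinhole projection: maps a 3D point (X, Y, Z) to (f*X/Z, f*Y/Z)."""
    X, Y, Z = np.asarray(P, dtype=float)
    return np.array([f * X / Z, f * Y / Z])

def image_width(P_a, P_b, f=1.0):
    """Distance on the image plane between the projections of two 3D points,
    e.g. the two visible edge points of the rectangle, or the two silhouette
    points of the cylinder, for a chosen distance d."""
    return float(np.linalg.norm(project(P_a, f) - project(P_b, f)))

# Example usage (placeholder endpoints; substitute the coordinates implied by
# the figure for your chosen d and L, or d and r):
# print(image_width([1.0, 0.0, 5.0], [2.0, 0.0, 5.0], f=1.0))
```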
3. **Dynamic Perspective [8 pts]**. In this question, we will simulate optical flow induced on the image of a static scene due to camera motion.
You can review these concepts from [lecture 5](http://saurabhg.web.illinois.edu/teaching/ece549/sp2023/slides/lec05_dynamic_perspective.pdf). For this problem (including both 3.1 and 3.2), you can assume that the X-axis points to the right, Y-axis points down and Z-axis points into the scene.
1. **[4 pts]** Assume a camera moving with translational velocity $t$ and angular velocity $\omega$. Derive the equation that governs the
optical flow at a pixel $(x, y)$ in terms of the focal length $f$ and the depth of the point $Z(x, y)$. Note that a point $(X, Y, Z)$
in the world projects to $(f\frac{X}{Z}, f\frac{Y}{Z})$ in the image.
*Answer*:
2. **[4 pts]** Next, we will try to visualize the optical flow induced on
the image of a static scene due to camera motion. We will build off
starter code in [dynamic_perspective_starter.py](./q3/dynamic_perspective_starter.py).
We have implemented two simple scenes, that of a vertical wall directly
in front of the camera (`get_wall_z_image`), and that of a camera overlooking
a horizontal plane (`get_road_z_image`). Your task is to use these scenes
to visualize the induced optical flow for the following situations:
1. Looking forward on a horizontal plane while driving on a flat road.
2. Sitting in a train and looking out over a flat field from a side window.
3. Flying into a wall head-on.
4. Flying into a wall but also translating horizontally, and vertically.
5. Rotating in front of a wall about the Y-axis.
You should pick the appropriate scene, $t$, and $\omega$, and visualize the
induced optical flow (you can use the `plot_optical_flow` function). Note
that the origin (for the perspective projection equations) is at the center
of the camera, and this is different from the origin for images in numpy.
As an example, we have provided the output for the first case; a minimal
wiring sketch is also included after the list of cases below.
1. Looking forward on a horizontal plane while driving on a flat road.
<div align="center">
<img src="./q3/q1_ans.png" width="50%">
</div>
Include the generated optical flow for the remaining cases below.
2. **[1 pt]** Sitting in a train and looking out over a flat field from a side window.
*Answer*:
3. **[1 pt]** Flying into a wall head-on.
*Answer*:
4. **[1 pt]** Flying into a wall but also translating horizontally, and vertically.
*Answer*:
5. **[1 pt]** Rotating in front of a wall about the Y-axis.
*Answer*:
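For each case the recipe is the same: pick one of the two provided depth
images, pick $t$ and $\omega$, compute the per-pixel flow $(u, v)$ from your
Q3.1 equations, and plot it. Below is a minimal wiring sketch, assuming you
package your Q3.1 result as a hypothetical helper
`compute_flow(Z, t, omega, f, cx, cy)` that returns per-pixel `(u, v)`; the
helper itself is left to you.
```python
import numpy as np
import matplotlib.pyplot as plt
from dynamic_perspective_starter import get_road_z_image, plot_optical_flow

def compute_flow(Z, t, omega, f, cx, cy):
    # Hypothetical helper: fill in with your optical flow equations from Q3.1.
    # It should return horizontal and vertical flow fields (u, v), each with
    # the same shape as Z.
    raise NotImplementedError

if __name__ == '__main__':
    fx = fy = f = 128.
    szy, szx = 256, 384
    cx, cy = 192, 128
    # Case 1: looking forward while driving on a flat road -> road scene,
    # camera translating along +Z (into the scene), no rotation.
    Z = get_road_z_image(3., fx, fy, cx, cy, szx, szy)
    t = np.array([0., 0., 1.])       # translational velocity
    omega = np.array([0., 0., 0.])   # angular velocity
    u, v = compute_flow(Z, t, omega, f, cx, cy)
    fig = plt.figure(figsize=(13.5, 9))
    plot_optical_flow(fig.gca(), Z, u, v, cx, cy, szx, szy, s=16)
    fig.savefig('case1_flow.png', bbox_inches='tight')
```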
4. **Phong Shading Model [16 pts]**. In this problem, we will take a closer
look at different types of surfaces and their appearance under varying
lighting and viewing conditions. We will work with the
ambient + lambertian + specular model for image formation (see Section 2.2,
Equation 2.94 in [Szeliski](https://szeliski.org/Book/)). In particular,
we will work with the following equation for the intensity at a given pixel $x$,
```math
I(x) = \text{Ambient Term} + \text{Diffuse Term} + \text{Specular Term} \\
I(x) = k_a L_a + k_d \sum_i L_i [\hat{v}_i \cdot \hat{n}]^{+} + k_s \sum_i L_i ([\hat{v}_r \cdot \hat{s}_i]^{+})^{k_e}
```
Here,
- The ambient term is simply the ambient reflection coefficient, $k_a$, times the ambient light, $L_a$.
- The diffuse term assumes that the surface is Lambertian, that is, it reflects incoming light $L_i$, multiplied by the diffuse reflection coefficient $k_d$, equally in all directions. However, we need to pay attention to the amount of light that is coming in, which depends on the angle at which light is incident onto the surface. It is given by the dot product $\hat{v}_i \cdot \hat{n}$ between the surface normal at the point, $\hat{n}$, and the direction from which light is incident, $\hat{v}_i$. $[\cdot]^{+}$ denotes the $\max$ with $0$.
- For the specular term, the light gets reflected preferentially in directions close to the actual direction of reflection. In particular, we will use a dependence of the form $([\hat{v}_r \cdot \hat{s}_i]^{+})^{k_e}$, where $\hat{s}_i$ is the direction of reflection, $\hat{v}_r$ is the viewing direction, and $k_e$ is the shininess coefficient.
- Vectors $\hat{n}$, $\hat{v}_i$, $\hat{v}_r$ and $\hat{s}_i$ are illustrated below for reference.
<div align="center">
<img src="./q4/vector.png" width="50%">
</div>
- We are going to ignore shadows and inter-reflections:
- As long as the surface is facing the light source, we will assume that the surface will receive light from the light source.
- A surface only receives light directly from the point / directional light sources, or the ambient light.
- Lastly, we are also going to ignore the $1/r^2$ attenuation for point light sources.
As part of this problem, we will simulate these three terms and use them to render a simple scene. We will provide the per-pixel scene depth, surface normal, and the different coefficients $k_a$, $k_d$ $(= k_a)$, and $k_s$, as well as the strengths and locations of the various lights in the scene. Your task is to compute the image based on this information using the Phong shading model described above.
We have provided a scene with a sphere in front of a wall. You can access this scene using the `get_ball` function from the file [generate_scene.py](./q4/generate_scene.py). It returns the per-pixel depth, surface normal and $k_a$, $k_d$ and $k_s$ for the scene, as visualized below (you can assume a reasonable value for $k_e$ (say 50)):
<div align="center">
<img src="./q4/input.png" width="90%">
</div>
We have also provided some starter code in [render_image.py](./q4/render_image.py) that sets up the different test cases (positions of lights). Your task is to fill in the `render` function that implements the Phong shading model as described above. An example rendering that your code will produce is shown below. A small sketch for recovering per-pixel 3D points from the depth image, which is useful inside `render`, follows the example output.
<div align="center">
<img src="./q4/output.png" width="25%">
</div>
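Inside `render` you will need per-pixel 3D points (in the camera frame) so
that you can form the unit direction vectors toward point light sources and
toward the camera. Below is a minimal sketch of just that back-projection
step, using the camera parameters assumed in the starter code (`f = 128`,
principal point at the image center); the shading terms themselves are left
to you.
```python
import numpy as np

def backproject(Z, f=128., cx=None, cy=None):
    """Recover per-pixel 3D points (X, Y, Z) in the camera frame from the
    depth image Z, by inverting x = f*X/Z + cx and y = f*Y/Z + cy."""
    h, w = Z.shape
    cx = w / 2 if cx is None else cx
    cy = h / 2 if cy is None else cy
    x, y = np.meshgrid(np.arange(w), np.arange(h))
    X = (x - cx) * Z / f
    Y = (y - cy) * Z / f
    return np.stack([X, Y, Z], axis=2)  # shape (h, w, 3)
```
From these points, $\hat{v}_i$ for a point light at `point_light_loc` and
$\hat{v}_r$ toward the camera (at the origin) can be obtained by normalizing
the corresponding difference vectors.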
MP1/q3/dynamic_perspective_starter.py

import numpy as np
import matplotlib.pyplot as plt

def get_wall_z_image(Z_val, fx, fy, cx, cy, szx, szy):
    Z = Z_val*np.ones((szy, szx), dtype=np.float32)
    return Z

def get_road_z_image(H_val, fx, fy, cx, cy, szx, szy):
    y = np.arange(szy).reshape(-1,1)*1.
    y = np.tile(y, (1, szx))
    Z = np.zeros((szy, szx), dtype=np.float32)
    Z[y > cy] = H_val*fy / (y[y>cy]-cy)
    Z[y <= cy] = np.NaN
    return Z

def plot_optical_flow(ax, Z, u, v, cx, cy, szx, szy, s=16):
    # Here is a function for plotting the optical flow. Feel free to modify this
    # function to work well with your inputs, for example if your predictions are
    # in a different coordinate frame, etc.
    x, y = np.meshgrid(np.arange(szx), np.arange(szy))
    ax.imshow(Z, alpha=0.5, origin='upper')
    # ax.quiver is used to plot a 2D field of arrows. See
    # https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.quiver.html
    # for the details of plotting with ax.quiver.
    q = ax.quiver(x[::s,::s], y[::s,::s], u[::s,::s], v[::s, ::s], angles='xy')
    ax.axvline(cx)
    ax.axhline(cy)
    ax.set_xlim([0, szx])
    ax.set_ylim([szy, 0])
    ax.axis('equal')

if __name__ == "__main__":
    # Focal length along X and Y axis. In class we assumed the same focal length
    # for the X and Y axis, but in general they could be different. We denote
    # these by fx and fy, and assume that they are the same for the purpose of
    # this MP.
    fx = fy = 128.
    # Size of the image.
    szy = 256
    szx = 384
    # Center of the image. We are going to assume that the principal point is at
    # the center of the image.
    cx = 192
    cy = 128
    # Gets the depth image of a wall 2m in front of the camera.
    Z1 = get_wall_z_image(2., fx, fy, cx, cy, szx, szy)
    # Gets the depth image of the ground plane that is 3m below the camera.
    Z2 = get_road_z_image(3., fx, fy, cx, cy, szx, szy)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 7))
    ax1.imshow(Z1)
    ax2.imshow(Z2)
    # Plotting function (here with random flow as a placeholder).
    f = plt.figure(figsize=(13.5, 9))
    u = np.random.rand(*Z1.shape)
    v = np.random.rand(*Z1.shape)
    plot_optical_flow(f.gca(), Z1, u, v, cx, cy, szx, szy, s=16)
    f.savefig('optical_flow_output.png', bbox_inches='tight')
MP1/q4/generate_scene.py

import numpy as np
import os
import matplotlib.pyplot as plt
import warnings

def z_wall(z=4, f=128, h=256, w=384, cx=192, cy=128):
    x, y = np.meshgrid(np.arange(w), np.arange(h))
    x = x - cx
    y = y - cy
    Z = depth = x - x + z
    N = np.zeros((h, w, 3), dtype=np.float64)
    N[:,:,2] = -1.
    A = np.mod(x // 64 + y // 64, 2) * 0.5 + 0.1
    A[:] = 0.4
    S = A - A
    return Z, N, A, S

def sphere(r=1.2, z=2, specular=False, f=128, h=256, w=384, cx=192, cy=128):
    # Let's assume that the camera is at the origin looking in the +Z direction,
    # with the X axis to the right and the Y axis downwards.
    # Let's assume that the sphere is at (0, 0, z) and of radius r.
    x, y = np.meshgrid(np.arange(w), np.arange(h))
    x = x - cx
    y = y - cy
    phi = np.arccos(f / np.sqrt(x**2 + y**2 + f**2))
    # equation is rho^2 (1+tan^2(phi)) - 2 rho z + z*z - r*r = 0
    a = np.tan(phi)**2 + 1
    b = -2 * z
    c = z**2 - r**2
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        rho = -b - np.sqrt(b**2 - 4*a*c)
        rho = rho / 2 / a
    depth = rho
    # Calculate the point on the sphere surface.
    X = x * depth / f
    Y = y * depth / f
    Z = depth
    N = np.array([X, Y, Z])
    N = np.transpose(N, [1,2,0])
    N = N - np.array([[[0, 0, z]]])
    N = N / np.linalg.norm(N, axis=2, keepdims=True)
    Z[np.isnan(Z)] = np.inf
    # Compute k_d, k_a.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        A = (np.sign(X*Y) > 0) * 0.5 + 0.1
    # Set specularity.
    if specular:
        S = np.invert(np.isinf(depth)) + 0
    else:
        S = A - A
    return Z, N, A, S

def compose(Z1, N1, A1, S1, Z2, N2, A2, S2):
    ind = np.argmin(np.array([Z1, Z2]), axis=0)
    Z = Z1 + 0
    Z[ind == 1] = Z2[ind == 1]
    A = A1 + 0
    A[ind == 1] = A2[ind == 1]
    S = S1 + 0
    S[ind == 1] = S2[ind == 1]
    N = N1 + 0
    ind = np.repeat(ind[:, :, np.newaxis], 3, axis=2)
    N[ind == 1] = N2[ind == 1]
    return Z, N, A, S

def get_ball(specular):
    Z1, N1, A1, S1 = sphere(specular=specular)
    Z2, N2, A2, S2 = z_wall()
    Z, N, A, S = compose(Z1, N1, A1, S1, Z2, N2, A2, S2)
    return Z, N, A, S

if __name__ == '__main__':
    Z1, N1, A1, S1 = sphere(specular=True)
    Z2, N2, A2, S2 = z_wall()
    Z, N, A, S = compose(Z1, N1, A1, S1, Z2, N2, A2, S2)
    fig = plt.figure(constrained_layout=False, figsize=(20, 5))
    gs1 = fig.add_gridspec(nrows=1, ncols=4, left=0.05, right=0.95, wspace=0.05, top=0.95, bottom=0.05)
    axes = []
    for i in range(4):
        axes.append(fig.add_subplot(gs1[0,i]))
        axes[-1].set_axis_off()
    axes[0].imshow(Z, cmap='gray', vmin=0, vmax=5)
    axes[0].set_title('Depth, Z')
    # Visualizing surface normals using a commonly used color map.
    N[:,:,0] = -N[:,:,0]
    axes[1].imshow((-N[:,:,[0,1,2]]+1)/2.)
    axes[1].set_title('Surface normal, N')
    axes[2].imshow(A, cmap='gray', vmin=0, vmax=1)
    axes[2].set_title('Ambient reflection coefficient, $k_a =$ Diffuse reflection coefficient, $k_d$')
    axes[3].imshow(S, cmap='gray', vmin=0, vmax=1)
    axes[3].set_title('Specular reflection coefficient, $k_s$')
    plt.savefig('input.png', bbox_inches='tight')
MP1/q4/render_image.py

import numpy as np
from generate_scene import get_ball
import matplotlib.pyplot as plt

# specular exponent
k_e = 50

def render(Z, N, A, S,
           point_light_loc, point_light_strength,
           directional_light_dirn, directional_light_strength,
           ambient_light, k_e):
    # To render the images you will need the camera parameters; you can assume
    # the following parameters. (cx, cy) denotes the center of the image (the
    # point where the optical axis intersects the image plane), and f is the
    # focal length. These parameters, along with the depth image, will be
    # useful for estimating the 3D points on the surface of the sphere for
    # computing the angles between the different directions.
    h, w = A.shape
    cx, cy = w / 2, h / 2
    f = 128.

    # Ambient Term
    I = A * ambient_light

    # Diffuse Term

    # Specular Term

    I = np.minimum(I, 1)*255
    I = I.astype(np.uint8)
    I = np.repeat(I[:,:,np.newaxis], 3, axis=2)
    return I

def main():
    for specular in [True, False]:
        # get_ball function returns:
        # - Z (depth image: distance to scene point from camera center, along
        #   the Z-axis),
        # - N, the per-pixel surface normals (N[:,:,0] component along X-axis
        #   (pointing right), N[:,:,1] component along Y-axis (pointing down),
        #   N[:,:,2] component along Z-axis (pointing into the scene)),
        # - A, the per-pixel ambient and diffuse reflection coefficient,
        # - S, the per-pixel specular reflection coefficient.
        Z, N, A, S = get_ball(specular=specular)

        # Strength of the ambient light.
        ambient_light = 0.5

        # For the following code, you can assume that the point sources are
        # located at point_light_loc and have a strength of
        # point_light_strength. For the directional light sources, you can
        # assume that the light is coming _from_ the direction indicated by
        # directional_light_dirn (i.e., \hat{v}_i = directional_light_dirn),
        # and with strength directional_light_strength. The coordinate frame is
        # centered at the camera, X axis points to the right, Y-axis points
        # down, and Z-axis points into the scene.

        # Case I: No directional light, only a point light source that moves
        # around the object.
        point_light_strength = [1.5]
        directional_light_dirn = [[1, 0, 0]]
        directional_light_strength = [0.0]

        fig, axes = plt.subplots(4, 4, figsize=(15,10))
        axes = axes.ravel()[::-1].tolist()
        for theta in np.linspace(0, np.pi*2, 16):
            point_light_loc = [[10*np.cos(theta), 10*np.sin(theta), -3]]
            I = render(Z, N, A, S, point_light_loc, point_light_strength,
                       directional_light_dirn, directional_light_strength,
                       ambient_light, k_e)
            ax = axes.pop()
            ax.imshow(I)
            ax.set_axis_off()
        plt.savefig(f'specular{specular:d}_move_point.png', bbox_inches='tight')
        plt.close()

        # Case II: No point source, just a directional light source that moves
        # around the object.
        point_light_loc = [[0, -10, 2]]
        point_light_strength = [0.0]
        directional_light_strength = [2.5]

        fig, axes = plt.subplots(4, 4, figsize=(15,10))
        axes = axes.ravel()[::-1].tolist()
        for theta in np.linspace(0, np.pi*2, 16):
            directional_light_dirn = [np.array([np.cos(theta), np.sin(theta), .1])]
            directional_light_dirn[0] = \
                directional_light_dirn[0] / np.linalg.norm(directional_light_dirn[0])
            I = render(Z, N, A, S, point_light_loc, point_light_strength,
                       directional_light_dirn, directional_light_strength,
                       ambient_light, k_e)
            ax = axes.pop()
            ax.imshow(I)
            ax.set_axis_off()
        plt.savefig(f'specular{specular:d}_move_direction.png', bbox_inches='tight')
        plt.close()

if __name__ == '__main__':
    main()
README.md

## ECE549/CS543 (Computer Vision)
1. [Programming Assignment 0](./MP0). Due at **11:59:59 PM on Wednesday Feb 1, 2023**.
1. [Programming Assignment 1](./MP1). Due at **11:59:59 PM on Wednesday Feb 15, 2023**.