CS180 Project 1: Colorizing the Prokudin-Gorskii Photo Collection
Name: Shuai Victor Zhou
SID: 3035912996
Overview
In this project, we use the pictures taken by Prokudin-Gorskii using red, green, and blue filters to produce colored images. The RGB-filtered images are provided to us in the form of a single image, where (roughly) the top third is the blue channel, the middle third is the green channel, and the bottom third is the red channel. Colored images are formed by stacking these three color channels on top of each other, bottom to top being in the order of R-G-B. We perform this stacking by conducting the following steps:
- Crop the original input image by cutting away edges on all 4 sides until the average pixel value of each edge is no greater than some threshold and no less than 1 - threshold. In this project, we opt for a threshold of 0.85.
- Cut the image into thirds and assign the top to be the blue channel, the middle to be the green channel, and the bottom to be the red channel.
- Crop each of the color channels using the same method as step 1.
- For each of the color channels, add random pixels on the edges until they all have the same width and height.
- Conduct the alignment of the color channels. For almost all pictures, we use the blue channel as our stationary channel and move the other two channels to align with it. Smaller pictures use a single-scale alignment method, while larger pictures use a pyramid multi-scale alignment method.
- Return the aligned color channels stacked on each other and convert that into an image.
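The splitting and stacking steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the project's actual code: the function names split_channels and stack_rgb are hypothetical, and the plate is assumed to be a 2-D float array with values in [0, 1].

```python
import numpy as np

def split_channels(plate):
    """Split a vertically stacked B-G-R plate into three equal-height channels."""
    height = plate.shape[0] // 3  # any leftover rows at the bottom are dropped
    b = plate[:height]                # top third: blue
    g = plate[height:2 * height]      # middle third: green
    r = plate[2 * height:3 * height]  # bottom third: red
    return r, g, b

def stack_rgb(r, g, b):
    """Stack three same-shaped channels into an (H, W, 3) image in R-G-B order."""
    return np.dstack([r, g, b])
```

In the full pipeline, the cropping, padding, and alignment steps would run between these two calls.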
Cropping and Adding Pixels (Bells & Whistles)
For cropping images, we run the following idea:
# cropping images
for side in [left, right, top, bottom]:
    while (edgePixelAverage(side) > 0.85 or edgePixelAverage(side) < 0.15): # too bright or too dark
        cutOff(side)
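A runnable version of this idea might look like the sketch below. It is written under assumptions that the write-up does not confirm: the name crop_borders is hypothetical, one row or column is removed per iteration, and a bounds check (not mentioned above) stops the loop from consuming the whole image.

```python
import numpy as np

def crop_borders(img, threshold=0.85):
    """Trim each side while that edge's mean is too bright or too dark."""
    def is_border(line):
        mean = line.mean()
        return mean > threshold or mean < 1 - threshold

    top, bottom = 0, img.shape[0]
    left, right = 0, img.shape[1]
    # Same order as the pseudocode: left, right, top, bottom.
    while right - left > 1 and is_border(img[top:bottom, left]):
        left += 1
    while right - left > 1 and is_border(img[top:bottom, right - 1]):
        right -= 1
    while bottom - top > 1 and is_border(img[top, left:right]):
        top += 1
    while bottom - top > 1 and is_border(img[bottom - 1, left:right]):
        bottom -= 1
    return img[top:bottom, left:right]
```

Passing threshold=0.9 would correspond to the looser setting described later for tobolsk.jpg.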
When we crop each of the color channels using the method described above, we risk the cropped color channels having different dimensions. To account for this, we add pixels back onto the channels using the following method:
# adding pixels back on
maxWidth = max(r.width, g.width, b.width)
maxHeight = max(r.height, g.height, b.height)
for channel in [r, g, b]:
    diffInWidth = maxWidth - channel.width
    add diffInWidth / 2 columns on the left, each element being rand(0, 1)
    add diffInWidth / 2 columns on the right, each element being rand(0, 1)
for channel in [r, g, b]:
    diffInHeight = maxHeight - channel.height
    add diffInHeight / 2 rows on the top, each element being rand(0, 1)
    add diffInHeight / 2 rows on the bottom, each element being rand(0, 1)
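One way to express this padding in NumPy is sketched below. The name pad_to_common_shape is hypothetical, and the handling of odd differences (the extra row or column goes to the bottom or right) is an assumption, since the pseudocode does not say how an odd difference is split.

```python
import numpy as np

def pad_to_common_shape(channels, seed=None):
    """Center each channel on a random-noise canvas of the largest height and width."""
    rng = np.random.default_rng(seed)
    max_h = max(c.shape[0] for c in channels)
    max_w = max(c.shape[1] for c in channels)
    padded = []
    for c in channels:
        canvas = rng.random((max_h, max_w))  # uniform noise in [0, 1)
        top = (max_h - c.shape[0]) // 2      # assumption: extra row goes to the bottom
        left = (max_w - c.shape[1]) // 2     # assumption: extra column goes to the right
        canvas[top:top + c.shape[0], left:left + c.shape[1]] = c
        padded.append(canvas)
    return padded
```

Filling a noise canvas and pasting the channel into its center gives the same result as adding random columns and rows on each side, in a single step.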
Single-Scale Aligning
In single-scale aligning, we align our R and G channels against the B channel by shifting each of R and G up to 15 pixels vertically and horizontally and checking how “similar” the shifted channel is to the B channel. This similarity is calculated using a scorer function, which calculates the Euclidean distance between the channels.
We neglect the square root in this scoring function: if some score x is greater than some score y, then sqrt(x) is still greater than sqrt(y) for all nonnegative x, y, so the shift with the smallest score is the same either way.
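As a minimal sketch of such a scorer (the name ssd_score is hypothetical and not taken from the project code), the squared Euclidean distance between two equally sized channels is the sum of squared per-pixel differences. The last lines check on toy data that dropping the square root does not change which candidate wins.

```python
import numpy as np

def ssd_score(a, b):
    """Squared Euclidean distance between two same-shaped channels (lower is better)."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff ** 2))

# Rank two candidates against a reference, with and without the square root.
rng = np.random.default_rng(0)
reference = rng.random((8, 8))
near = reference + 0.01  # small per-pixel offset
far = reference + 0.5    # large per-pixel offset
scores = [ssd_score(near, reference), ssd_score(far, reference)]
best_without_sqrt = int(np.argmin(scores))
best_with_sqrt = int(np.argmin(np.sqrt(scores)))
```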
It is worth noting that another possible scoring method is to maximize the correlation between the two channels.
Upon attempting all 961 possible shifts (31 vertical × 31 horizontal offsets in [-15, 15]) for each of the R and G channels and returning the best shift for each of them (the one that produces the smallest Euclidean distance from the B channel), we stack the channels and return the final product as an image.
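The exhaustive search can be sketched as below. The function name align_single_scale is hypothetical, the window is assumed to include both endpoints, and np.roll is used for shifting, which means pixels pushed off one edge wrap around to the other.

```python
import numpy as np

def align_single_scale(moved, fixed, max_shift=15):
    """Return the (a, b) shift of `moved` with the smallest squared distance to `fixed`."""
    best_shift, best_score = (0, 0), np.inf
    for a in range(-max_shift, max_shift + 1):      # vertical shift
        for b in range(-max_shift, max_shift + 1):  # horizontal shift
            shifted = np.roll(moved, (a, b), axis=(0, 1))
            score = np.sum((shifted - fixed) ** 2)
            if score < best_score:
                best_shift, best_score = (a, b), score
    return best_shift
```

The aligned channel is then np.roll(moved, best_shift, axis=(0, 1)), which can be stacked with the stationary channel.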
For tobolsk.jpg, we used 0.9 as the threshold for cropping (i.e. borders with average pixel value greater than 0.9 or less than 0.1 are cropped) since the sky of this picture is quite bright. Using the usual threshold of 0.85 would mess up our result.
Pyramid Multi-Scale Aligning
For multi-scale aligning, the underlying mechanism is the same as for single-scale, except that we apply it multiple times. To align some movedColor to some setColor, we create an array for each one in which the element at index i is rescaled down by a factor of 2^i (so the 0th element is just the original image). The array ends at the largest rescaled image whose dimensions are at most 256 pixels, the reasoning being that images of this size, like the smaller .jpg files, can be aligned by searching in a window of just [-15, 15] pixels.
Given the arrays, we then work from the smallest image back up to the original image. At each “layer” (as we work right to left in our array), we conduct the same aligning as the single-scale aligning. However, after finding the best values for a and b (a being the vertical shift and b being the horizontal shift), we multiply them by a factor of 2 and apply them to the next biggest layer. We continue this until we have the original image and conduct one more iteration of aligning.
For each layer’s aligning, we don’t always have to search in a window of [-15, 15]. If we enumerate our array 0, 1, etc. starting from the very right (smallest image), then for the ith layer, we search in a window of [np.ceil(-15 / 2^i), np.ceil(15 / 2^i)] pixels. This dynamically changing window allows us to cut down on runtime for larger images.
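The coarse-to-fine procedure described in this section can be sketched as follows. Several details are assumptions rather than the project's code: the names downscale2, best_shift, and align_pyramid are hypothetical, 2x2 block averaging stands in for whatever rescaling routine was actually used, and each layer's window is treated as inclusive of both endpoints.

```python
import numpy as np

def downscale2(img):
    """Halve each dimension by averaging 2x2 blocks (stand-in for a library rescale)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def best_shift(moved, fixed, lo, hi):
    """Exhaustive search over shifts in [lo, hi] on both axes using squared distance."""
    best, best_score = (0, 0), np.inf
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            score = np.sum((np.roll(moved, (a, b), axis=(0, 1)) - fixed) ** 2)
            if score < best_score:
                best, best_score = (a, b), score
    return best

def align_pyramid(moved, fixed, min_size=256, base_window=15):
    # Index i holds the image downscaled by 2**i; stop once dimensions are <= min_size.
    moved_pyr, fixed_pyr = [moved], [fixed]
    while max(fixed_pyr[-1].shape) > min_size:
        moved_pyr.append(downscale2(moved_pyr[-1]))
        fixed_pyr.append(downscale2(fixed_pyr[-1]))

    a, b = 0, 0
    # Walk from the smallest layer (i = 0 in the window formula) up to the original.
    for i, (m, f) in enumerate(zip(reversed(moved_pyr), reversed(fixed_pyr))):
        a, b = 2 * a, 2 * b  # carry the coarser estimate up to this finer layer
        lo = int(np.ceil(-base_window / 2 ** i))
        hi = int(np.ceil(base_window / 2 ** i))
        da, db = best_shift(np.roll(m, (a, b), axis=(0, 1)), f, lo, hi)
        a, b = a + da, b + db
    return a, b
```

Because the window shrinks as the layers grow, most of the search cost is spent on the smallest image, where each candidate shift is cheap to score.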
emir.tif produced a better result when we aligned the R and B channels onto the G channel.
(Additional) Bells & Whistles
*Not really meant for extra credit — just thought it was worth noting.
Another attempt at making images clearer was to add borders of 100 pixels to all 4 sides of our images (can be seen in the Appendix section of the code at the bottom). This was done so that when np.roll is called, image content pushed past one edge moves into the border instead of wrapping around to the opposite side. This didn’t help, but perhaps the method could be improved in some other ways so as to create better aligned images (such as generating random pixels instead of all black).
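The wrap-around behavior that motivated the border can be seen on a toy 1-D example. This is only an illustration with a border of width 2 rather than 100; the variable names are made up for the example.

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5])

# np.roll is circular: the value pushed off the right edge reappears on the left.
wrapped = np.roll(row, 1)

# With a black (zero) border, only border pixels cross the seam for small shifts.
padded = np.pad(row, 2, mode="constant", constant_values=0)
shifted = np.roll(padded, 1)
```

Here wrapped starts with the 5 that fell off the right edge, while in shifted the original values stay contiguous and only a zero from the border wraps around.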
Conclusion
Some images have a bit of a blur, but as with tobolsk.jpg and emir.tif, they would likely be corrected if we made some small alterations to the cropping threshold or to the channel onto which we’re aligning our images.