Here’s What No One Tells You About Computer Vision.

When we talk about Machine Learning, we have to talk about Computer Vision! It’s a major subfield of Machine Learning and is used in many different areas. When you use filters in Snapchat, that’s computer vision. When Facebook tells you that someone has posted a photo of you, that’s also computer vision. Robotics, image classification, and self-driving cars all rely on computer vision too, and there are many more examples. But how does the computer do it? How can it tell that this is your face and not your cat’s? You’re about to find out!

How does computer vision work?

It all starts with an image; specifically, a digital one. We transform it, remove noise, improve its quality, and so on. We call that preprocessing, and we need it to get the images ready for machine learning. But before we dig deeper, let’s first see how the computer handles images.
A digital image is a grid of pixels. Each pixel has a numerical value representing its color. A grayscale image is stored as a 2-D matrix with the same height and width as the image; a color image adds a third dimension that holds the color channels (for example red, green, and blue). We take this matrix and perform mathematical operations and transformations on it. This is the step we call image processing.
After that comes a step where we use the processed images to extract information and make decisions based on it: extracting objects, detecting faces, or classifying the image as a sea view, a countryside, or whatever it shows.
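
To make the pixel-matrix idea concrete, here is a minimal sketch that uses NumPy only. The tiny arrays are made up for illustration; a real image is just a much larger version of the same thing.

```python
import numpy as np

# A 2x3 grayscale image: one number per pixel (0 = black, 255 = white)
gray = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)
print(gray.shape)   # (2, 3) -> (height, width)

# A 2x3 color image: three numbers (R, G, B) per pixel
color = np.zeros((2, 3, 3), dtype=np.uint8)
color[0, 0] = [255, 0, 0]   # make the top-left pixel pure red
print(color.shape)  # (2, 3, 3) -> (height, width, channels)

# "Image processing" is arithmetic on these arrays,
# for example inverting the brightness of every pixel
inverted = 255 - gray
```

Everything we do later in this article, from thresholding to blurring, is a variation on this kind of array arithmetic.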

What is computer vision?

For this article, we will be using mahotas, an open-source library for computer vision and image processing in Python. It has many functions that vary in complexity and supports a wide range of image types. For more information about it, you can visit the library’s official website.

Before starting, you will need to install it using pip (or Anaconda, if you prefer). Also, make sure that Pillow and NumPy are both installed and up to date so the library works properly.

pip install mahotas
pip install pillow --upgrade
pip install numpy --upgrade

Loading images for our computer vision tutorial

# Load important libraries
import mahotas as mh
import numpy as np
import matplotlib.pyplot as plt
# For loading the image
# We'll be using mahotas.imread() which takes the path/name of the image
# and returns a numpy array representing this image
img = mh.imread('dog.jpg')
# The shape of the variable img is (height, width, channels)
# So, it's a 3D numpy array where the first level contains rows,
# the second contains the pixels of each row,
# and the third contains the RGB values of each pixel
# this is what the first row looks like (the output of img[0]):
# array([[ 94, 128,  52],
#        [ 94, 128,  52],
#        [ 95, 129,  53],
#        [ 34,  66,  43],
#        [ 34,  66,  43],
#        [ 33,  65,  42]], dtype=uint8)
# We can see that the first pixel contains the RGB values
#   (94 for red, 128 for green, 52 for blue)

# to show the image on the screen, we'll be using
#  pyplot.imshow()
# which takes the numpy array representing the image
# and renders it (draws it) on the screen

plt.imshow(img)  # draws into a figure
plt.show()       # shows the figure

Basic Image processing for computer vision

Now we can load and show an image from the disk. But we want to do some “processing” on it too.

Since we have the array representing the image, we can apply any modifications and filters we want to it.

Let’s now look at some examples.


Thresholding is one of the most basic operations. It binarizes the image, meaning it converts it into a pure black-and-white image (only 0s and 1s).

To do so, we first convert the image to grayscale (a single channel instead of RGB). Then we define a threshold: every pixel above it becomes white, and every other pixel becomes black.

Let’s start with a threshold of 128 and see how it looks.

# First, convert the image into grayscale

# To do so, we take a (weighted) average of all channels (RGB)
# and put it in just one channel

# But we won't bother ourselves with these computations,
# we'll use the function mahotas.colors.rgb2gray()
# It takes the image and the dtype you want the channel to be
# we want the values to be integers with just 8 bits,
# so we'll use np.uint8 (which stands for unsigned integer with 8 bits)
grayscale_img = mh.colors.rgb2gray(img, dtype=np.uint8)

# then we tell matplotlib that we want a grayscale image to be rendered
# otherwise it will treat the single-channel image as a false-color image
# and map the values through its default colormap
plt.gray()

# let's now see the resulting image
plt.imshow(grayscale_img)
plt.show()
# Pretty cool huh!
# We have got our grayscale image successfully
# now we need to transform it into black and white using the threshold we talked about
threshold = 128

# So, we want the pixel to be 1 if it's bigger than 128
# and 0 otherwise
# Well we can use a nice condition for this
binarized_img = grayscale_img > threshold

# Well, let's see how we have done
# Well.... 
# We got a black-and-white image indeed, but now the image is not very clear
# The dog body is not well defined
# So did we use the wrong threshold value?
# Let's increase it a little bit and try again
threshold = 196

binarized_img = grayscale_img > threshold

# Well... 
# I guess that's better and clearer, but the dog is so dark now!
# The question is: how do we identify the right threshold value to use?
# Okay, the mahotas library has an answer for us:
# it looks through the image and identifies the threshold value dynamically
# let's try it..

threshold = mh.thresholding.otsu(grayscale_img)
# So it thinks that 154 is the right threshold for this image
# let's see how it looks..

binarized_img = grayscale_img > threshold

# I guess that is the best we can get
# now the dog is clear and not too white or too dark
# great work so far!

# But you might be wondering, why use thresholding?
# Well, this technique is used in segmentation, 
# where you need to divide the image into different regions or objects
# more on that later.
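
For the curious, the idea behind Otsu's method can be sketched in a few lines of NumPy: try every candidate threshold and keep the one that best separates the dark pixels from the bright ones (it maximizes the so-called between-class variance). This is an illustrative sketch, not mahotas' actual implementation; the function name otsu_sketch and the synthetic demo image are our own assumptions.

```python
import numpy as np

def otsu_sketch(gray):
    """Return the threshold t that maximizes the between-class variance,
    where pixels <= t count as 'dark' and pixels > t as 'bright'."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)

    best_t, best_var = 0, -1.0
    for t in range(255):
        w0 = prob[:t + 1].sum()          # weight of the dark class
        w1 = 1.0 - w0                    # weight of the bright class
        if w0 == 0 or w1 == 0:
            continue                     # one class is empty, skip
        mu0 = (levels[:t + 1] * prob[:t + 1]).sum() / w0
        mu1 = (levels[t + 1:] * prob[t + 1:]).sum() / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

# Synthetic image (an assumption for the demo):
# half the pixels are dark (40-59), half are bright (190-209)
rng = np.random.default_rng(0)
dark = rng.integers(40, 60, size=500)
bright = rng.integers(190, 210, size=500)
demo = np.concatenate([dark, bright]).astype(np.uint8).reshape(20, 50)

t = otsu_sketch(demo)
binary = demo > t   # True exactly for the bright pixels
```

The chosen threshold falls in the gap between the two groups of pixel values, which is why the automatic value worked better on our dog than the two values we guessed by hand.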

Gaussian Blurring

You might be thinking, why on earth would I blur an image?!

But blurring is a very useful preprocessing step for reducing noise or smoothing the image. You can also blur parts of the image so that the other parts stand out.

Let’s see how that works using mahotas..

# We will use the function mahotas.gaussian_filter()
# which takes 2 arguments: the single-channel image
# and the size of the filter (the standard deviation of the filter)
# of course larger values will produce more blurring
# Let's see how that goes..

blurred_img_8 = mh.gaussian_filter(grayscale_img, 8)
blurred_img_16 = mh.gaussian_filter(grayscale_img, 16)
blurred_img_32 = mh.gaussian_filter(grayscale_img, 32)


That’s a pretty clear difference, don’t you think? So now, believe it or not, the image is smoother! Sharp edges and extreme pixel values have been reduced, and this will be very handy when you preprocess your images for Machine Learning.
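
If you wonder what happens under the hood, a Gaussian blur replaces each pixel with a weighted average of its neighbors, where the weights follow a bell curve. Below is a minimal NumPy sketch of that idea. The helper names gaussian_kernel_1d and gaussian_blur_sketch are our own, and mahotas may treat the image borders and the kernel size differently, so treat it as an illustration only.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Sampled, normalized 1-D Gaussian covering about +/- 3 sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur_sketch(gray, sigma):
    """Blur a 2-D image by filtering the rows, then the columns
    (a 2-D Gaussian can be split into two 1-D passes)."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    # Pad by repeating the edge pixels so the output keeps the input size
    padded = np.pad(gray.astype(float), radius, mode='edge')
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode='valid')
    both = np.apply_along_axis(np.convolve, 0, rows, kernel, mode='valid')
    return both

# Demo on random noise: blurring should reduce the pixel-to-pixel variation
rng = np.random.default_rng(0)
noisy = rng.normal(128, 30, size=(32, 32))
blurred = gaussian_blur_sketch(noisy, sigma=2)
print(noisy.std(), blurred.std())
```

A larger sigma widens the bell curve, so more distant neighbors contribute to each pixel and the result looks blurrier.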

Salt and Pepper Noise with Median Filter

Let’s now try to add salt (white pixels) and pepper (black pixels) to our image. I know you might be wondering why we would destroy the image quality! The goal here is not enhancement. We add them to simulate a kind of noise that occurs a lot in images, and then we’ll try to remove this noise.

Bear with me and you’ll get the idea..

# let's first build 2 boolean arrays with the same shape as the image, one for salt and one for pepper
# each has random True values in about 10% of its positions
salt = np.random.random(grayscale_img.shape) > 0.9
pepper = np.random.random(grayscale_img.shape) > 0.9

# Let's see the first row in salt and pepper
# Okay good!
# We now got what we needed 
# Let's now make a preprocessing operation 
# to make our image better before continuing

# We will stretch the image!
# What we mean by that is that we will stretch the contrast,
# that is, the range of pixel values inside our image
# this means that if, for example, the pixel values only range over 50-200
# we will stretch them to take the full range of 0-255
# which will make the image look better

# We will use mahotas.stretch() function for this purpose
stretched_gray_img = mh.stretch(grayscale_img)

# Let's see if that did us any good..
# If you look closely, you'll see that the white became lighter
# and the black is now darker
# If you have a very dark (or very bright) image,
# This filter (the contrast stretching) will be a great tool for you
# to have a clearer image.

# Now what were we doing? Oh yes! Adding the salt and pepper
# We've already chosen the random places to put our noise
# Well, let's now add it..

# Set the image to 255 wherever salt is True
salt_pepper_img = np.maximum(salt*255, stretched_gray_img)

# Set the image to 0 wherever pepper is True
# we keep the original values where pepper is False,
# otherwise the whole image would turn black since 0 is always the minimum
salt_pepper_img = np.minimum(pepper*0 + salt_pepper_img*(~pepper), salt_pepper_img)

# Let's see the output
# Wow! This is a very noisy image now!

# Okay now what! We have successfully noised our beloved dog...

# Let's try to fix it.
# one of the best filters for removing salt and pepper noise is the median filter
# which smooths the image by calculating the median over a square of pixels around each pixel
# and replacing the pixel value with this median
# now extreme pixels (0s or 255s) will be replaced with a more typical value

# Let's use the mahotas.median_filter() function
# which by default uses a 3x3 square for filtering (AKA the mask)

median_filtered_img = mh.median_filter(salt_pepper_img)

# Get ready to be amazed!

As you can see, we did a great job and the image is now almost restored to what it was.
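
To see why the median works so well here, below is a small NumPy sketch of a 3x3 median filter applied to a tiny made-up image. The function name median_filter_sketch is our own, and the way it handles the image border (repeating the edge pixels) is an assumption that may differ from what mahotas does.

```python
import numpy as np

def median_filter_sketch(gray):
    """3x3 median filter; the border is handled by repeating the edge pixels."""
    h, w = gray.shape
    padded = np.pad(gray, 1, mode='edge')
    # Stack 9 shifted copies of the image: one per position in the 3x3 window
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    return np.median(windows, axis=0).astype(gray.dtype)

# A flat gray image with one 'salt' pixel and one 'pepper' pixel
demo = np.full((5, 5), 100, dtype=np.uint8)
demo[1, 1] = 255   # salt
demo[3, 3] = 0     # pepper
cleaned = median_filter_sketch(demo)
```

A single extreme value never reaches the middle of the sorted 3x3 window, so the median ignores it. An average (like the Gaussian blur) would instead smear the noise into the neighboring pixels.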

Putting the center in focus

Remember the blurring filter?

We said we can blur some parts of the image to make the others stand out. We will blur the edges of the image so that the center is in focus.

Well, let’s do that!

# Let's try first blurring the whole image with gaussian filter
# but this time let's blur each channel individually from the original img
# This is not required but you'll learn some good tricks doing so

# Before anything, return plt to its defaults
plt.style.use('default')

# mahotas.imread() returns the image in RGB format
# So let's separate those channels using numpy
# transpose() moves the channel axis to the front
r, g, b = img.transpose([2, 0, 1])

# now let's blur each of them with a standard deviation of 8
r8 = mh.gaussian_filter(r, 8)
g8 = mh.gaussian_filter(g, 8)
b8 = mh.gaussian_filter(b, 8)

# finally let's add them back together once again
# with the use of mahotas.as_rgb() function
blurred_img_8 = mh.as_rgb(r8, g8, b8)

# let's see the outcome
# So far so good!

# now we need to join the original and blurred images together
# where it's original in the center and gradually turning to blur 
# as it reaches the edges.

# to do so, we need to put weights for each pixel 
# which represents the distance from the center

# first, let's get the width and height of the image
h, w, _ = img.shape

# let's now initialize the weights arrays for both width and height
# using numpy.mgrid object to initialize the values of the x and y coordinates
X, Y = np.mgrid[:h, :w]

# let's then update it to have the distance from center
X = X - h / 2.0
Y = Y - w / 2.0

# Now, normalize them to be ranging [-1 : 1]
X /= X.max()
Y /= Y.max()

# now let's build an array W which has its maximum value at the center
# and fades as it goes to the edges
W = np.exp(-2.0 * (X ** 2 + Y ** 2))

# and normalize it to be [0 : 1]
# (np.ptp is max - min; the ndarray.ptp() method was removed in NumPy 2.0)
W -= W.min()
W /= np.ptp(W)

# then adding a dummy third channel
W = W[:, :, None]

# and finally the moment of truth
# We now combine the blurred and original images 
# with the weights array we have built
center_focus_img = mh.stretch(img * W + (1-W) * blurred_img_8)

# Well, get ready to be amazed

Great work so far!

You now have some good basics in image processing. We recommend that you go play with some images yourself and see what you can make!

Computer Vision

Now, let’s level it up!

Yes, modifying and filtering images is useful, and we can use these operations in many applications. But we’re here for the data and information. Nowadays we have billions of images, and, as discussed, each one holds a lot of data.

We want to extract knowledge and information out of them. One of the important things we can do is classify these images with labels. Image classification is a form of pattern recognition.

To do so, we first need to extract features from the image so that we can use them as input to the classification process. Let’s try it..


First, let us unpack this vague word ‘features’. We have backgrounds, foregrounds, object edges, textures, locations, and brightness. All of this is information that we can get from the image. But we need it in a form that computers can work with, so we represent it as numbers (scalars, vectors, or matrices).
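
As a toy illustration of turning an image into numbers, the sketch below summarizes a grayscale image with three hand-picked values. The function simple_features and the choice of these three values are our own assumptions for the example; real feature extractors such as the ones we use later are far richer.

```python
import numpy as np

def simple_features(gray):
    """Summarize a grayscale image as a small vector of numbers."""
    gray = gray.astype(float)
    # Average absolute difference between horizontally adjacent pixels:
    # a crude measure of how much 'edge' content the image has
    edge_strength = np.abs(np.diff(gray, axis=1)).mean()
    return np.array([gray.mean(),      # overall brightness
                     gray.std(),       # contrast
                     edge_strength])

# A flat image: some brightness, no contrast, no edges
flat = np.full((4, 4), 50, dtype=np.uint8)
# Alternating black and white columns: high contrast, lots of edges
stripes = np.tile(np.array([0, 255], dtype=np.uint8), (4, 2))

print(simple_features(flat))
print(simple_features(stripes))
```

Two images that look very different to us end up as two clearly different vectors, and that is exactly what a classifier needs as input.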

Edge Detection

You now know what a feature is and how we represent it. But as you can imagine, different kinds of images call for different features. Don’t worry, there is also a variety of filters and methods to match the wide range of image features. One of these techniques is the Sobel filter, which is used to highlight the edges of objects. It works well for detecting features in images of text, since text has many edges. The filter approximates the gradient of the image brightness, that is, how strongly and in which direction the brightness changes at each pixel.

# Mahotas has this functionality in a function called..
# as you probably guessed, sobel
# The function takes the image as its first argument
# another argument is 'just_filter', which we set to True
# if we didn't, the function would apply a threshold to the result

sobel_img = mh.sobel(grayscale_img, just_filter=True)
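
To show what the Sobel filter computes, here is a minimal NumPy sketch that applies the two classic 3x3 Sobel kernels to a tiny made-up image and combines them into an edge strength. The function name sobel_sketch is our own; mahotas may scale its output and treat the borders differently, so the numbers are not meant to match mh.sobel().

```python
import numpy as np

def sobel_sketch(gray):
    """Gradient magnitude from the two 3x3 Sobel kernels (interior pixels only)."""
    g = gray.astype(float)
    # Horizontal gradient: right column minus left column, weighted 1-2-1
    gx = ((g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:])
          - (g[:-2, :-2] + 2 * g[1:-1, :-2] + g[2:, :-2]))
    # Vertical gradient: bottom row minus top row, weighted 1-2-1
    gy = ((g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:])
          - (g[:-2, :-2] + 2 * g[:-2, 1:-1] + g[:-2, 2:]))
    return np.hypot(gx, gy)

# Left half black, right half white: one vertical edge in the middle
demo = np.zeros((6, 6), dtype=np.uint8)
demo[:, 3:] = 255
edges = sobel_sketch(demo)
print(edges)
```

The output is zero in the flat regions and large only in the two columns next to the black-to-white transition, which is exactly the "highlight the edges" behavior described above.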


Techniques for feature extraction are practically limitless and depend on the use case. We have introduced one for edge detection. Another common set of features is the Haralick texture features, which capture patterns that repeat over the whole image.

Of course, we can simply call a function from mahotas that saves us the effort: mh.features.haralick(image)
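
Haralick features are statistics computed from a gray-level co-occurrence matrix, which counts how often each pair of gray levels appears next to each other. The sketch below builds such a matrix for a single direction (each pixel and its right-hand neighbor) and derives one statistic from it. The function name cooccurrence_sketch and the tiny image are our own assumptions; mahotas looks at several directions and computes many more statistics.

```python
import numpy as np

def cooccurrence_sketch(gray, levels):
    """Count how often gray level i has gray level j as its right-hand neighbor."""
    glcm = np.zeros((levels, levels), dtype=int)
    left = gray[:, :-1].ravel()
    right = gray[:, 1:].ravel()
    np.add.at(glcm, (left, right), 1)   # glcm[i, j] += 1 for every pair
    return glcm

# A tiny image with only 3 gray levels (0, 1, 2)
demo = np.array([[0, 0, 1],
                 [1, 2, 2],
                 [0, 1, 2]])
glcm = cooccurrence_sketch(demo, levels=3)
print(glcm)

# One Haralick-style statistic: contrast, large when neighbors differ a lot
i, j = np.indices(glcm.shape)
p = glcm / glcm.sum()
contrast = ((i - j) ** 2 * p).sum()
print(contrast)
```

Smooth textures pile their counts up near the diagonal of the matrix and get a low contrast, while rough textures spread the counts out and get a high one.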

The next important technique is one that works on local regions: Speeded-Up Robust Features, or SURF for short. Features here are computed on small regions instead of the whole image. This makes it more suitable for telling apart images that show different objects against a similar background. Let’s say we have a car against a desert background and a camel against the same background. This would be a better technique to distinguish the two.

As always, mahotas has this functionality. With surf imported from mahotas.features, we call it as: surf.surf(image, descriptor_only=True)

A hands on example on computer vision

Now you’re ready to see a real dataset and get your hands dirty. We will apply all these steps to a collection of images to classify them. We have a dataset that consists of images in 4 classes (animals, cars, transportation, and natural scenes). Each image has a natural background and the object itself in the middle. We will classify the images using a famous algorithm from classical machine learning called logistic regression.

pip install scikit-learn

from glob import glob
from mahotas.features import surf
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# let's get our dataset 
# First define the base directory where the dataset is present
basedir = 'AnimTransDistr'

# now, define a classes list and add all 4 classes to it
# (fill in the names of the class sub-folders inside basedir)
classes = []

# let's then create a function to read images one by one 
# and return the image along with the class index
def images():
    # Loop over classes and get images inside them
    for class_index, class_name in enumerate(classes):
        # get all image filenames inside that class (folder) with glob()
        # forward slashes work on Windows, Linux, and macOS alike
        images_paths = glob(f'{basedir}/{class_name}/*.jpg')
        # now, for each image, open it and yield it
        for image_path in sorted(images_paths):
            img = mh.imread(image_path, as_grey=True)
            yield img, class_index

# Okay let's do the actual feature gathering
# first, initialize arrays for features, descriptors and labels
final_features = []
all_descriptors = []
hara_features = []
labels = []

# for feature extraction, we will be using the surf function
# it's another technique used to extract features
# it works by extracting features from local areas of the image

# loop over the images
# using the images() function we created
for img, class_index in images():
    # get the features of this image as an array of descriptors
    descriptor = surf.surf(img, descriptor_only=True)
    all_descriptors.append(descriptor)

# gather all descriptors into a single array and keep only every 32nd vector
# using all of them may give better results, but we subsample for speed
# we will use K-Means clustering on these descriptors
concatenated = np.concatenate(all_descriptors) 
concatenated = concatenated[::32]

# choose the number of clusters (k) and fit the model
k = 256
km = KMeans(k)
km.fit(concatenated)

# create an empty list to hold the features we collect
surf_features = []
# for each image, predict the cluster of every one of its descriptors
for d in all_descriptors:
    c = km.predict(d)
    # now count how many descriptors fell into each cluster
    surf_features.append([np.sum(c == ci) for ci in range(k)]) 
# turn the per-image counts into a numpy array
surf_features = np.array(surf_features)

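for img, class_index in images():
    # compute the Haralick features, averaged over all directions
    # (haralick expects an integer-valued image, hence the cast)
    haralick_feature = mh.features.haralick(img.astype(np.uint8)).mean(0)
    # update the lists
    hara_features.append(haralick_feature)
    labels.append(class_index)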

# finally, turn the two lists into numpy arrays
hara_features = np.array(hara_features)
labels = np.array(labels)

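# now, we're ready to train and classify the images
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# let's set up the classifier: a grid search picks the regularization strength C
# of the logistic regression, after the features have been standardized
C_range = 10.0 ** np.arange(-4, 3)
grid = GridSearchCV(LogisticRegression(), param_grid={'C' : C_range})
clf = Pipeline([('preproc', StandardScaler()),('classifier', grid)])

# combine features
final_features = np.hstack([surf_features, hara_features])

# get the final accuracy of the classifier
# cv is the number of cross-validation folds; 5 is an assumed value, adjust as needed
cv = 5
score_SURF_global = cross_val_score(clf, final_features, labels, cv=cv).mean()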
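
The step that turns SURF descriptors into one fixed-length feature vector per image is often called a "bag of visual words". Here is a minimal NumPy-only sketch of it with made-up numbers: two cluster centers stand in for the 256 that K-Means finds above, and the function name visual_word_histogram is our own.

```python
import numpy as np

def visual_word_histogram(descriptors, centroids):
    """Assign each descriptor to its nearest centroid and count the assignments."""
    # Squared Euclidean distance from every descriptor to every centroid
    diffs = descriptors[:, None, :] - centroids[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)          # plays the role of km.predict(d)
    return np.bincount(nearest, minlength=len(centroids))

# Two made-up 'visual words' in a 2-D descriptor space
centroids = np.array([[0.0, 0.0],
                      [10.0, 10.0]])
# Three descriptors near the first word, one near the second
descriptors = np.array([[0.1, -0.2],
                        [0.5, 0.3],
                        [-0.4, 0.0],
                        [9.0, 11.0]])
hist = visual_word_histogram(descriptors, centroids)
print(hist)   # [3 1]
```

However many descriptors an image produces, its histogram always has one entry per cluster, which is what lets us stack the images into a single feature matrix for the classifier.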

Great work!

You are now able to work with image datasets, preprocess them, extract features from them, and train a classifier to do image classification afterward.

This was a basic introduction to the field of computer vision. We hope you can start where we have reached, build upon it, and achieve so much more.
