
Build A Search Engine With Python Programming & Computer Science

Lesson 1: Getting started with computer science using Python

Introduction to the computer science using Python course

This computer science and programming using Python course will introduce you to the fundamental ideas in computing and teach you to read and write your own computer programs.

We are going to do that in the context of building a web search engine. I'm guessing everyone here has used a search engine before, such as Google or DuckDuckGo.


You type in what you are looking for, and voila – in literally a blink of an eye, about a tenth of a second, back come the results. This might not be enough to make you wise, but it is pretty amazing.

A goal of this class is to turn some of the magic of the search engine into something a bit more understandable. Our biggest goal, though, is to learn about computer science and programming using Python.

Computer science is about how to solve problems, like building a search engine, by breaking them into smaller pieces and then precisely and mechanically describing a sequence of steps that you can use to solve each piece. And those steps can be executed by a computer.

For our search engine, the three main pieces are: finding data by crawling web pages, building an index to be able to respond quickly to search queries, and ranking pages so that we get the best result for a given query.

In this course, we will not get into everything that you need to build a search engine as powerful as Google, but we will cover the main ideas and learn a lot about computer science along the way.

The first three units will focus on building the web crawler; we will talk more about that soon. Units 4 and 5 will cover how to respond to queries quickly. Unit 6 will get into how to rank results and cover the method Google uses to rank pages, which made it so successful. But first, let's talk about how to build the web crawler that we are going to use to get data for our search engine.


Python Programming For Computer Science: Unit 1 Overview

The goal of the first three units in this course is to build a web crawler that will collect data from the Web for our search engine, and to learn about big ideas in computing by doing that.

In Unit 1, we’ll get started by extracting the first link on a web page. A Web crawler finds web pages for our search engine by starting from a “seed” page and following links on that page to find other pages.

Each of those links leads to some new web page, which itself could have links that lead to other pages. As we follow those links, we'll find more and more web pages, building a collection of data that we'll use for our search engine.

A web page is really just a chunk of text that comes from the Internet into your Web browser. We’ll talk more about how that works in Unit 4. But for now, the important thing to understand is that a link is really just a special kind of text on that web page.

When you click on a link in your browser, it takes you to a new page, and as a human you can keep following those links. What we'll do in this unit is write a program to extract the first link from a web page. In later units, we'll figure out how to extract all the links and build up a collection of pages for our search engine.
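Later lessons build this up step by step. As a preview, here is a minimal sketch that treats a web page as a plain Python string and uses the built-in `str.find` method to pull out the first link. The page text and the example.com URLs are made up. The sketch also assumes every link is written exactly as `<a href="...">`, and it does not handle a page with no links.

```python
# Hypothetical page text; a real crawler would download this from the Web.
page = ('<p>Welcome!</p> <a href="https://example.com/first">First</a> '
        'and <a href="https://example.com/second">Second</a>')

start_link = page.find('<a href=')              # where the first link tag begins
start_quote = page.find('"', start_link)        # the quote that opens the URL
end_quote = page.find('"', start_quote + 1)     # the quote that closes the URL
url = page[start_quote + 1:end_quote]           # the text between the two quotes

print(url)  # https://example.com/first
```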

About quizzes in this course

We’re going to have many quizzes throughout each unit. The point of a quiz is to check that you understand what we’ve covered. Some of the quizzes will be fairly straightforward just to see if you’ve followed what we said. Other quizzes will be more challenging and require you to put together several ideas that we’ve covered.

It certainly will be valuable to try to get the answer right the first time, but the quizzes shouldn't be stressful. They're meant to keep you engaged in the lecture and to make sure you're understanding things.

What is programming?

Programming is really the core of computer science. Most machines are designed to do just one thing. Take a toaster: we can do more than one thing with it, but it's pretty limited in what it can do. Everything it can do is a variation on the basic functionality it was designed for: putting bread in, heating it up, and getting the toast to pop out.

Without a program, a computer is even less useful than a toaster. You can’t do anything without a program. The program is what tells the computer what to do. And the power of the computer is that, unlike a toaster, which is only designed to do a few things, a computer can do anything.

A computer is a universal machine. We can program it to do essentially any computation. So anything that we can imagine, anything that we can figure out how to write a program for, we can make the computer do. And what the program needs to be is a very precise sequence of steps.

The computer by itself doesn't know how to do anything. It has a few simple instructions that it can execute, and to make it do something useful, we need to put those instructions together in a way that does what we want.

So we can turn the computer into a web browser, into a server, into a game-playing machine, into a toaster, without anywhere to put the bread. But it can do anything we can imagine: at least any computation we want to do.

The power of the computer is that it can execute those steps super fast: billions of instructions in one second.

The program gives us a way to tell the computer what steps to take, and there are many different languages for programming computers. The language we are going to learn in this course is called Python. It shares its name with the snake, but it is actually named after Monty Python.


The important thing about Python is that it gives us a nice high-level language that we can use to write programs. That means that instead of our programs running directly on the computer, the programs we write are the input to the Python program, which runs on the computer. Python is what's called an interpreter. It runs our programs by interpreting them: it executes the programs we wrote in the Python language using a program written in a language the computer can understand directly.


What is a program?

Now it's time for a quiz to see if you understand what a computer program is. Which of the following are computer programs? Check all that apply.


First Programming Quiz

Now we're just going to print the number 3. When we click Run, it will run this code and show us the result down here: printing 3 gives the output 3.

We can do more interesting things. We can print an arithmetic expression, such as 1 plus 1. When we run this, we see both outputs: first the 3 we printed, then the result of 1 plus 1, which is 2. We can write this more clearly by putting spaces between the parts of the expression. 1 + 1 with spaces is a little easier to read, and when we run it, we see the same result: 2.

We can make more and more complex expressions. Let's print the result of an expression that adds two multiplications, the first of which is 52 times 3. You can check the result yourself to see that Python got the right answer.

We can use parentheses to group expressions. If we put parentheses around each of the two multiplications and run it, we get the same result as without the parentheses, because Python does multiplication before addition anyway. If we put the parentheses in different places, the expression means something different: now it means 52 times the result of adding 3 plus 12.

As another example, to compute the number of seconds in a year, we can compose many multiplications: 365 days times 24 hours in a day times 60 minutes in an hour times 60 seconds in a minute. Doing all those multiplications together gives 31,536,000, which is about 31 and a half million seconds in a year.
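Here are the same ideas as a short Python 3 script. The small numbers 2, 3, 4 and 5 are arbitrary stand-ins chosen for illustration. The expected outputs in the comments follow Python's usual rule that multiplication is done before addition.

```python
print(3)                   # 3
print(1+1)                 # 2
print(1 + 1)               # 2: spaces make it easier to read but don't change the meaning

# Multiplication happens before addition, so these two lines mean the same thing.
print(2 * 3 + 4 * 5)       # 26
print((2 * 3) + (4 * 5))   # 26
# Moving the parentheses changes the meaning: 2 times (3 + 4) times 5.
print(2 * (3 + 4) * 5)     # 70

# Seconds in a (non-leap) year: days * hours * minutes * seconds.
seconds_per_year = 365 * 24 * 60 * 60
print(seconds_per_year)    # 31536000
```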

So now it’s time for your first programming quiz. You’ve seen enough to be able to write a Python program. And your goal is to write a Python program that prints out the number of minutes there are in seven weeks, which is the amount of time we have for this course. You’ll do that by entering your code in here. And then you can try different things. You can try running the code. See the result. And then when you’ve got an answer, you can click to submit that answer and see if it’s correct.


First Programming Quiz Solution

There are lots of different ways you could have solved this. You need to use the print command to print out the result. And then we want an expression that calculates the number of minutes in seven weeks.

print(7 * 7 * 24 * 60)

There are seven weeks, and each week has seven days, so seven times seven gives us the number of days. To get the number of hours, we multiply that by 24, and to get the number of minutes, we multiply again by 60. So let's see that in the Python interpreter.


We see that we have 70,560 minutes. Seems like a lot of time. It’s going to go pretty quickly, and we hope by the end of the seven weeks, all of you will be accomplished Python programmers.

Grammar and Python Rules

In order to learn about programming, we need to learn a new language. This will be a way to describe what we want the computer to do in a much more precise way than we could in a natural language like English.

And it’s a way to describe programs that the Python interpreter can run. One of the best ways to learn a programming language is to just try things.
You can try that in the Python interpreter that’s running in your browser.

Let's, for example, try running print(2 + 2 +). In English, someone could probably guess what we meant, but when we run this, we get an error. The reason we get an error is that this is not actually part of the Python language.


The Python interpreter only knows how to evaluate code that’s part of the Python language. If you try to evaluate something that’s not part of the Python language, it will give you an error.

Errors look a bit scary, the way they print out. But there’s nothing bad that can happen. It’s perfectly okay to try running code. If it produces an error, that’s one of the ways to learn about programming. The error we got here is what’s called a syntax error. That means that what we tried to evaluate is not actually part of the Python language.
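To check whether some text is part of Python's grammar without running it, you can use Python's built-in `compile` function. It parses source code and raises `SyntaxError` when the text is not valid Python. The helper name `is_valid_python` is hypothetical and was made up for this sketch.

```python
def is_valid_python(source):
    """Return True if source parses as Python code, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")   # parse only; nothing is executed
        return True
    except SyntaxError:
        return False

print(is_valid_python("print(2 + 2)"))     # True
print(is_valid_python("print(2 + 2 +)"))   # False: not part of Python's grammar
```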

Like English, Python has a grammar that defines what strings are in the language. In English, we can make lots of sentences that are not completely grammatical, and people still understand them, but there's some underlying grammar behind the language.

Those of you who are native English speakers might have learned rules like this in what was once called grammar school. Those of you who learned English as a second language probably learned rules like this when you were learning English.


So, English has a rule that says you can make a sentence by combining a subject with a verb, followed by an object.
Almost every language has a rule sort of like this. The order of the subject and the verb and the object might be different, but there’s a way to combine those three things to form a sentence. The subject could be a noun. The object could also be a noun.


Each of these parts of speech can in turn be lots of different words. A verb could be the word eat or the word like, and there are lots of other words the verb could be. A noun could be the word I, the word Python, or the word cookies.

The actual English grammar is, of course, much larger and more complex than this. But we can still think of it as having rules like these, which allow us to form sentences from the parts of speech we know and from the words that make up those parts of speech.
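To make these rules concrete, here is a small sketch that writes the toy grammar from this section as a Python dictionary and uses it to build sentences. The dictionary representation and the helper name `derive` are assumptions for illustration and are not part of the course. The comments show the same rules in the arrow notation used for grammars.

```python
import random

# Sentence -> Subject Verb Object
# Subject  -> Noun
# Object   -> Noun
# Verb     -> eat | like
# Noun     -> I | Python | cookies
grammar = {
    "Sentence": [["Subject", "Verb", "Object"]],
    "Subject": [["Noun"]],
    "Object": [["Noun"]],
    "Verb": [["eat"], ["like"]],
    "Noun": [["I"], ["Python"], ["cookies"]],
}

def derive(symbol, choose):
    """Replace symbol using the grammar rules until only words remain.
    choose(options) picks which replacement to use at each step."""
    if symbol not in grammar:          # not a rule name, so it is an actual word
        return [symbol]
    words = []
    for part in choose(grammar[symbol]):
        words.extend(derive(part, choose))
    return words

# Prints a random grammatical sentence, such as: I like cookies
print(" ".join(derive("Sentence", random.choice)))
```

A sentence like "cookies eat Python" is also grammatical under these rules. Grammar only checks the form of a sentence, not whether it makes sense.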

John Backus (1924-2007)

The way we're writing grammar here is a notation called Backus-Naur Form, which was invented by John Backus.
So John Backus was the lead designer of the Fortran programming language back in the 1950s at IBM. This was one of the first widely-used programming languages. And the way they described the Fortran language was with lots of examples and text explaining what they meant. And this is a shot from the actual manual for the first version of Fortran.



This worked okay: many programmers were able to understand it and guess correctly what it meant, but it was not precise. When it came time to design a later language, ALGOL, it became clear that this informal way of describing languages wasn't precise enough, and John Backus invented the notation we're using here to describe languages.

Python Programming – Grammar Quiz


Python Programming – Grammar Quiz 2


Python Programming – The Speed Of Light

Before we go on to the next major computer science topic, I want to give you one more quiz to see if you can write a Python expression that gives you some idea of how fast a computer executes. Your goal for this quiz is to write some Python code that prints out how far light travels in one nanosecond.

Here is some information that will help with this. The speed of light is 299,792,458 meters per second, so almost 300 million. One meter is 100 centimeters. One nanosecond is one billionth of a second, which is 1 divided by 1,000,000,000.



So, your goal is to compute how far light travels in one nanosecond and to get that answer in centimeters. And I don’t want this to be an algebra quiz. So, all you need to do is multiply these three values together, and you’ll get the answer we want.

Python Programming – The Speed Of Light Quiz Solution

So here's the Python expression to compute that: we're multiplying the speed of light, times 100 centimeters in a meter, times one nanosecond, which is 1 divided by 1,000,000,000 seconds.

You'll note that I can't have spaces in the numbers. When I write out numbers, it's convenient to put spaces in them so we can see how big they are, but Python doesn't allow that: with spaces, they would look like separate numbers, which isn't valid in the Python grammar. So we leave the spaces out, and when we run this, we get the result 29. That says light travels about 29 centimeters in one nanosecond.


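This is a little surprising: the result is an exact number, an integer. The reason it's an integer is the way Python 2 does arithmetic. If all the numbers in a division are integers, the result is truncated down to an integer. If we want a more accurate result, we should turn one of these numbers into a decimal number, for example by writing 1.0 instead of 1. In Python 3, the / operator always produces a decimal result, so the same expression gives 29.9792458, and // is the operator that rounds down to a whole number.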
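Here is a Python 3 sketch that compares the two kinds of division on the speed-of-light calculation. The last line uses underscores as digit separators. Python 3.6 and later allow this in number literals; spaces are still not allowed.

```python
# Python 3: / always produces a decimal (float) result.
exact_cm = 299792458 * 100 / 1000000000
print(exact_cm)              # 29.9792458

# // rounds down to a whole number, like Python 2's / on integers.
truncated_cm = 299792458 * 100 // 1000000000
print(truncated_cm)          # 29

# Underscores group digits for readability (Python 3.6+); the value is unchanged.
same_cm = 299_792_458 * 100 / 1_000_000_000
print(same_cm == exact_cm)   # True
```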

Python Programming – A Word About Processors

So why do we care about how far light can travel in one nanosecond? It gives us a sense of how fast a processor works. First, find out what kind of processor you have. On a Mac, select About This Mac from the Apple menu. On a Windows PC, right-click This PC (or Computer) in File Explorer and choose Properties.


You'll see a window appear that tells you what kind of processor you have. If you zoom in a little, you can see that this machine has a 2.7 GHz Intel Core processor.

GHz stands for gigahertz, which means the processor does 2.7 billion cycles each second. You can think of a cycle as the time the computer has to do one step. Since it does one step 2.7 billion times a second, each cycle takes less than a nanosecond: about 0.37 nanoseconds. In the time the computer has for one cycle, light travels about 11.1 centimeters.
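Here is a quick sketch of the cycle-time arithmetic in Python 3. It assumes the 2.7 GHz example processor from the text.

```python
cycles_per_second = 2_700_000_000           # 2.7 GHz, the example processor above
seconds_per_cycle = 1 / cycles_per_second   # time available for one cycle
nanoseconds_per_cycle = seconds_per_cycle * 1_000_000_000
print(round(nanoseconds_per_cycle, 2))      # 0.37
```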

So how far is that? To give a sense of scale, a dollar bill is quite a bit longer than 11.1 centimeters: 11.1 centimeters is about three-quarters of the way across the bill.
So in the time it takes light to travel that distance, the computer has to finish processing one cycle, which is at least part of an instruction. This should give you some idea of how fast the computer is operating, and it is part of the reason the processor has to be so small: if the processor were much bigger than that, a signal could not get across it in the time for one cycle.

Python Programming For Computer Science: Introducing Variables

So our answer to the last quiz would’ve been a lot easier to read and a lot more useful if we used names to keep track of values, instead of writing out those big numbers, especially numbers as big as the speed of light.

Python provides a way to do this, called a variable. A variable gives us a name that we can use to refer to a value.

The way to introduce a variable is with an assignment statement, which looks like this:

speed_of_light = 299792458

We have a name, followed by an equal symbol, followed by an expression. After the assignment statement, the name that was on the left side refers to the value that the expression has.

The name can be any sequence of letters and numbers, as well as underscores, as long as it starts with a letter or an underscore. So here’s an example:


We could create the name, speed_of_light, and we can assign to it the value of the speed of light in meters per second. So after that assignment, the name speed_of_light refers to that value. One way to think of that is to have an arrow, so we can have the name speed_of_light, and that’s a name that refers to a value. And the value it refers to is this long value, which is the speed of light in meters per second.

Once we've done the assignment, we can use the name, and the value of the name is the value it refers to: in this case, the speed of light in meters per second. So here we've introduced the variable speed_of_light and assigned to it the value 299,792,458. Now that the value is stored in a variable, we don't have to type out that whole number; we can use the name directly. When we print speed_of_light, we see the value that the name refers to.

So instead of seeing the name speed_of_light, we'll see the 299 million value. We can use the name in expressions as well: to convert the speed into centimeters instead of meters, we multiply by 100, and the result is the speed of light in centimeters per second.
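In code, the assignment and its use look like this (a short sketch):

```python
speed_of_light = 299792458      # meters per second
print(speed_of_light)           # 299792458
print(speed_of_light * 100)     # 29979245800, i.e. centimeters per second
```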

Python Programming For Computer Science: Variables Quiz

For this quiz, given the variables defined here, your goal is to write Python code that prints out the distance, in meters, that light travels in one processor cycle.

The first variable is speed_of_light, and we've assigned to it the number of meters that light travels in a second, which might otherwise be hard to remember. The second variable we'll call cycles_per_second, and we'll give it the value 2,700,000,000 (2.7 billion).

Given those two variable definitions, your goal is to write some Python code that prints out the distance, in meters, that light travels in one processor cycle. And we can compute that by dividing the speed of light by the number of cycles per second.

Python Programming For Computer Science: Variables Quiz Solution

So here's one way to answer this question. We have two variable definitions. We can print out the distance light travels in one cycle by dividing speed_of_light by cycles_per_second, and when we run that, we see the result is about 0.11. That works, but it might be better to introduce another variable.


So, instead of just printing the result, we can store it in a variable. We’ll call it cycle_distance. Now, when we run it, there’s no result. We haven’t printed that out yet. But we’ve stored it in a variable.
Now we can print the value of cycle_distance, which gives us about 0.11 meters. If we want the result in centimeters, then since we've already stored the result in meters in a variable, we can just multiply it by 100, and now we get the result in centimeters.
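Here is the quiz solution as a Python 3 sketch. In Python 3, / between two integers gives a decimal result, so no extra conversion is needed. In Python 2, one of the numbers would need to be written as a decimal, for example 2700000000.0.

```python
speed_of_light = 299792458        # meters per second
cycles_per_second = 2700000000    # 2.7 GHz

cycle_distance = speed_of_light / cycles_per_second   # meters per cycle
print(cycle_distance)             # about 0.111 meters
print(cycle_distance * 100)       # about 11.1 centimeters
```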

Python Programming For Computer Science: Variables Can Vary

The important thing about variables in Python is that they can vary. That’s why they’re called variables. Once we define the variable, we can change the value.
And then when we use that name again it refers to the new value.

Suppose we have a variable, a. And we’ll initialize it to the value 7 times 7. So what that does is introduce a name a.
And it refers to a value, which is the result of that expression. So it refers to the value 49, and that means when we look at the name a, we see what it refers to and we get the result, 49.



Now suppose we do another assignment, this time assigning 48 to a. We already have a name a, which used to refer to 49, but after the new assignment it refers to the new value, 48. The number 49 still exists, but a no longer refers to it.

Now a refers to 48. Where things get more interesting is that we can use a variable in its own assignment statement. For example, consider the assignment statement a = a - 1. What happens with that assignment? We evaluate the right side first: we look up the value of a and see that it refers to 48. We compute a minus 1 and get the value 47. Then we do the assignment to the variable a, so now a refers to the value 47; it no longer refers to 48.

We could keep doing that. If we ran the exact same statement again, it would change the value again. This time, the value of a is 47; we subtract 1 and get 46. Then we do the assignment, so now a refers to the value 46.
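The whole sequence of assignments from this section fits in a few lines of Python. The comments trace what a refers to after each statement.

```python
a = 7 * 7    # a refers to 49
a = 48       # a now refers to 48; 49 still exists, but a no longer refers to it
a = a - 1    # right side first: 48 - 1 is 47, then a refers to 47
a = a - 1    # again: 47 - 1 is 46, then a refers to 46
print(a)     # 46
```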


The important thing to notice is that this is not an equal sign, even though it looks like one. If you've studied algebra, a = a - 1 looks like an equation stating equality, and there is no way to solve an equation like that.
In Python, and in most programming languages, the equal sign does not mean equal; it means assignment. You should really think of it as an arrow saying: take whatever value the right side evaluates to and make the name on the left side refer to it.

Most programming languages don't write assignment as an arrow (though some do), because an arrow is harder to type and programs have lots of assignments. But you should think of the equal sign as meaning assignment, not equality.
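Python does have a separate operator for asking whether two values are equal: the double equal sign, ==. This short sketch contrasts it with assignment.

```python
a = 5           # assignment: a now refers to 5
print(a == 5)   # True: == compares values and does not change a
print(a == 6)   # False
a = a + 1       # assignment again: a now refers to 6
print(a == 6)   # True
```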

Python Programming For Computer Science: Quiz On Variables

So now, we’re ready for a quiz to see that you understand the meaning of the assignment.


So the question is: what value does the variable x refer to after running this code?

x = 11
x = x + 1
x = x * 2

First, we have an assignment statement assigning the value 11 to the variable x. Then, we have another assignment statement where the right side is x plus 1 and the left side is x. And then we have another assignment statement where the left side is x, and the right side is x times 2.

Try to figure out the answer yourself without evaluating this code in the interpreter. Once you have an answer, though, it's certainly a good idea to check it by running the code in the interpreter.
