Before building a database for yourself or a corporation, you need to understand a bit about database structure.
We are going to go in some depth on how to design a database, which deals with structuring your database the best way possible. However, I thought it would be appropriate to give an overview of the Db2 structure and start using the database right away.
Once we have the fundamentals down, we are going to then go back to database design to learn the details. This will allow us to more concretely discuss database design because we’ve already had some experience with the fundamentals.
The very first thing that you need to know is that data is organized into different tables. We first discussed tables in the Introduction to Db2. Each table describes one type of thing (for example, users). You should see the table as the blueprint describing how the data should be structured. Each row is an instance of whatever the table describes.
When we discuss database design, we will fully understand how to break up our data into different tables. For now, the important thing to understand is that a database consists of multiple tables related to one another.
In a Db2 database, tables are organized into schemas. A schema is just a fancy word for group. This allows us to organize our database in a more logical way when working with a large database. Not only are tables organized in different schemas, but all objects are.
The things we create in our database are called database objects and they are all going to have a name. These objects include tables, indexes, functions, stored procedures, triggers, etc.
When we create an object, we are going to give it a custom name to identify it. This name is called the identifier.
The schema name in combination with custom identifier is the fully qualified name of an object.
As an example, if we have a table named users, and this table is inside the db2inst1 (the default) schema, the fully qualified name of our object would be db2inst1.users.
For a minute I want you to try to visualize data. If you are looking at the structure of a company’s data from a thousand miles away, you will see a database or multiple databases. The database is the biggest piece containing our data. If we zoom into one database, we see schemas. These are the next biggest piece, second to the database. Zooming into one schema, we begin to see our database objects, such as tables. Zooming into one table we begin to see our columns and rows. Well, if you zoom in on one column, you will find even more detail.
If that made no sense, I’m sorry. Google Maps doesn’t offer the data-architecture zoom feature any more… Essentially, what I am trying to say is that we not only have to think about the structure from the big perspective, but also the very small details all the way down to the column level.
Every column is going to have a data type, which determines the type of data (was that too obvious?) each row can store in that column.
So for example, is the column going to store numbers, strings, a character, or binary? The column’s data type determines what is allowed.
Data types are a huge topic, so we cannot cover it all in this introduction. Let’s talk about the basic data types we’ll be using in the early blogs. First, stare at this image from the Db2 docs diagramming the types of data types:
As you can see from this chart, there are a lot of data types grouped in different categories.
Later on we will go in depth on the different data types, but for now we are going to learn just a few that we will be using immediately in the beginning of this series.
The integer data type (INT) is used for whole numbers and can be used to give each record an ID. Later we will learn about primary keys (unique identifiers), but for now know that the int data type is very important.
CHAR and VARCHAR
CHAR and VARCHAR are used for storing string data. A string is a sequence of characters, where a character is any acceptable letter/number/symbol. Strings are often signified by opening and closing quotes: “This is a string” is a perfect example.
The CHAR data type is used for strings that are supposed to be the same length for every single row. The VARCHAR data type is used for variable length strings. Think VARCHAR = VARiable CHARacter.
The BOOLEAN data type is used to store data that is in two states, usually true or false.
DATE, TIME, and TIMESTAMP
The DATE, TIME, and TIMESTAMP data types are all various data types used to stored dates and time. Later on we will get into the specifics of how this data is formatted, but for now just know that they exist and should be used when working with data relating to time and events.
The important thing to take away from this blog is a general understanding of how data is structured in a database and that we have to be explicit when it comes to the type of data. We control the type of data by giving each column a data type.
IMPORTANT: In the next blog we are going to begin using Db2. This means that you should have an edition of Db2 installed and ready to go. To get the Community Edition, use this link. The installation is pretty straightforward and will hopefully not give you any problems!
Here it is, your guide on getting started with Db2! Working with new software can be stressful, so my goal is to make this as easy as possible for you. Fortunately, Db2 setup is not too challenging.
Before we dive in and start downloading stuff, I want to take a moment to explain a few of the different editions of Db2. First off, we will be using Db2 for Linux, UNIX, and Windows. The edition we will be using is calledDeveloper Community Edition (You may hear it as Developer-C) which is a full-fledged version of Db2 with the only limitation that it is only to be used for development.
When you build an application with Db2, you can deploy a free version of Db2 which will work great but has limited capabilities. This is called the Express-C edition. This edition is great because we are able to use Db2 in a production environment, but this edition may not meet the needs for your application.
If you are releasing an application, there are multiple editions that can meet your needs exactly.
The important thing to know is that the various editions of Db2 share the same code. Thus, switching between versions is not an obstacle when it comes to scalability.
First, download Db2 Developer Community Edition here.
Go through the simple installation steps.
You are going to want to make sure that you check all boxes (Docker, Data Studio, and Db2) so we download everything we are going to need for these blogs.
It will then ask for some credentials.
The first set of credentials is for Data Server Manager (DSM), which is a tool used to work with the database. The next set of credentials is for the database itself (user…? learn more about authentication). You will also get an option of which sample database to install. I recommend you install the database for transactional workloads if you are unsure:
Once you click continue, Docker will be installed. Docker is a tool that is used by Db2.
Does Db2 Developer edition come with DSM Enterprise or base edition? https://www.ibm.com/developerworks/downloads/im/dsm/index.html and what changes are there in this newest version? Are parts of data studio moved to DSM or does data studio do everything DSM does plus more? Is DSM enough to do everything with the database or do you need data studio? If you need it why is it optional in the installation?
The primary tool we are going to be using is DSM, which allows us to easily work with our Db2 Database. Within DSM we can execute SQL against our database.
First off, what is a database? A database stores data, which is any information that might be of use to a company.
Db2 is a database created by IBM that is best known as an enterprise database, so we are talking about a database used by huge companies who have more data than they know what to do with. The great part for us is that that there are multiple editions of Db2 going all of the way up to the enterprise editions, but actually starting with a completely free version. That’s what I’ll be using.
So back on the topic of data. A database will often be used behind the scenes when you are using some software. Anytime you put information in software that you can access at a later time, that information will be stored in a database.
What’s the Big Deal with Data?
Databases are often used by businesses and organizations. The data that they store in a database is anything that they need to maintain their business, but also grow their business.
As an example, a company will likely keep track of all their sales and store that information in a database. They can then look back at their data and use that to figure out what products people like, which ones are earning the most money, and which ones they should discontinue. You may have heard of analytics, which is used to analyze data.
Structured and Unstructured Data
The data inside of a database is often represented in tables. A table will have columns and rows.
At the top we will have column headers that describe the structure of our data. Let’s say we have a table, users:
Each row is an individual user. This is how we store structured data, which fits nicely in a table. Db2 also supports unstructured data, which doesn’t necessarily fit the structure of a table.
There are endless ways you could represent this as unstructured data. One example would be like so:
Notice that we have the email now included for Amy. This syntax is called JSON which is a standard for representing unstructured data and is supported in Db2. Right now, the exact way we represent our data doesn’t matter. What matters is that we know Db2 is versatile when it comes to storing data.
Why a Database?
Why would you use a database when there are so many other options out there for data storage. As an example, why not use spreadsheets to store our data?
The first reason is data security. If we were trying to use something such as a spreadsheet to store our data, we would be in serious trouble because spreadsheets are all-or-nothing. If you want to share a spreadsheet with coworkers, you have to send them the whole document. This could work okay for small companies, but once we get a few employees this can introduce some serious security issues and data integrity issues when you make all of your data accessible to the organization. Even if you restrict access to certain people, those select people will still have access to the entire spreadsheet.
A database allows you to be extremely selective on who can read certain data and who can add data to the database. Imagine you had a spreadsheet with a social security number column. This is extremely sensitive data that you could not easily hide. With a database, you can restrict the reading of data at the column level through views showing only the data you want readable. There are immense capabilities when it comes to securing our data with a database.
The second reason is scalability. Scalability is the ability for you to easily grow your business without outgrowing your technology infrastructure. A simple example of this is how a database supports a very high number of concurrent users. Databases are built to allow multiple connections at one time without the risk of data conflict. This means as your data needs grow, you already have the infrastructure to keep up. With Db2, there is virtually unlimited scalability thanks to technologies such as pureScale, Db2’s clustering technology, and BlueMix, a cloud platform where you can easily provision 32 x 2.7 GHz Xeon processors and a terabyte of RAM for your Db2 database.
A database offers disaster recovery, data compression, and is designed to work well with programming languages. Db2 can be coded against using all of the major programming languages and frameworks–Java, C/C++, .Net, Ruby on Rails, PHP, Perl, and Python.
If you are building an application that is going to have any complexity or more than a few users, a database is essential. Of all the database options out there, Db2 makes a pretty good choice.
First, the price of starting off with Db2 is free. That’s usually enough to get me interested, but as I mentioned before, Db2 has multiple editions allowing you to scale from being a sole developer in your mom’s basement, to running a fortune 100 company across the world. Db2 offers more than the competitors at a cheaper cost. Many companies have experienced cost savings of over 50-60% compared to Oracle Database, a competitor of Db2.
The lower price tag is not at the expense of performance. Users of Db2 have also experienced extreme performance through built in technologies such as BLU acceleration, which allows for extremely fast analytical processing.
Throughout these blogs we are going to explore the basics of Db2 and also some of the cool capabilities.
Db2 is a database that you need to add to your resume. Check out the next blog to learn more!
Check out the first video in my Db2 Video Series (coming soon)!
So far we’ve learned about the very basic concepts of Db2, but now I want to dive in a bit and work with tables. Some of the material in blog may take a few more blogs to fully understand, but that is okay. The goal is to get you started. Later we will come back and learn all the details.
We will be using SQL (Structured Query Language) to communicate to our database. DSM (Data Server Manager) is the UI (user interface) that we can use to work with our database, but we should do as much as we can with SQL. Why use code when there is a user interface? I’m glad you asked.
The user interface is a great way to work with our database, don’t get me wrong. The problem with using the UI exclusevely is that you are limiting your usefulness. When writing an application that interfaces with the database, you are going to have to use code.
Additionally, SQL is a standard language that is used widely by other databases. As a developer, it is important to learn the information that is going to transfer between technologies. If we learn the SQL standard, we can easily work with Db2, Oracle, SQL Server, MySQL, etc, all of which are based on the SQL standard.
How to Use SQL in DSM
There will be a panel on the left containing a button Run SQL that will open a window similar to this one:
Upon opening this Run SQL tab, you will be prompted to give your credentials. These were set during the installation process (db2inst1 was the default and you chose the password):
You can optionally check Save credentials to repository to save you some effort in the future.
Creating a Table
Once you have the SQL panel open, type out this code:
This is the SQL needed to create a table with 6 columns. Each column is put on a new line with the data type immediately following the column identifier (column name). Putting each column on a new line is simply a convention. SQL is whitespace insensitive, meaning you can put spaces, tabs, and newlines to space out your code as you like. This is equally valid:
You may think this is harder or easier to read. It is ultimately up to you how you space out your code.
Once you have your SQL ready, you can execute your code by using the Run dropdown or issuing the shortcuts (such as CTRL S):
The Run Selected command will execute the SQL command that is currently clicked, up until the next semicolon (;). Alternatively you can highlight over multiple statements and then click Run Selected to execute them all.
The Run All button will execute everything in the document.
Once the script executes, there will be a new entry in the result panel below:
If for any reason there is an issue, the results will look like this:
Clicking the Log button will give you more information on the error and a link to the documentation for that specific error:
Once you have the table created, you can find in using DSM by going to Administer -> Tables.
From this page you can easily see all of the database tables. You can also see that the users table we just created is in the default db2inst1 schema. As we talked about in the data structure blog, a schema is a logical grouping of database objects. We will discuss schemas in more detail in the future.
Dropping a Table
We can drop this table that we just created very easily by issuing this SQL:
Be careful executing this command because it will delete everything inside of the table.
Looking to the Future – Primary Keys
We will discuss primary keys in detail, but for now it is important to understand that it is wise to label a particular column as a table’s primary key. Thus, if we recreate the table we just dropped, we might do it like so:
This addition is essential when it comes to data integrity, which we will discuss in more detail soon. Essentially, though, adding the PRIMARY KEY keyword forces each row to be unique and will be used as the basis for relationships between tables.
We’ve learned the very basics of DDL (Data Definition Language). This is the section of SQL that involves defining the structure of our database. Two of the important commands in DDL are CREATE TABLE and DELETE TABLE. Once we have the structure, we can begin working with data. That’s what we will be doing in the upcoming blog.
What is Mongo DB? MongoDB is an unstructured database that is becoming extremely popular for speed, ease-of-use, and scalability.
The Problem with Structured Data
Until now, the majority of the my content has covered relational databases. You communicate with a relational database using SQL (Structered Query Language). A relational database follows a structure defined by columns in a table.
The problem with a structured database is that it can become too structured and require a lot of resources to create and design a good database. For instance, I have a fifty part YouTube series just on how to design the structure of a relational database, and that doesn’t even talk about how to use the database.
Just to show one example of this, what if we have a table that defines some structure and for a specific row we want to have some extra data?
This is not easily done with a relational database.
A solution is a new type of database…NoSQL (Not only Structured Query Language). A NoSQL database is also referred to as an unstructured database. Like I mentioned, there are tons of different types of databases, but MongoDB is a specific kind of unstructured database called a document database.
Now don’t think that unstructured data should completely replace structured data. Structured data is still a great thing. Instead, think of unstructured data as being an addition to your structured data – as something that augments your development experience.
A document database has a list of key-value pairs. We can literally redraw our data using key-value pairs rather than having our data be rows inside of a structured table (You’ll see what I mean in a moment). In database terms, a key is known as as a field or attribute. In a moment we will draw out a database structure using MongoDB syntax.
Defining a simple structure
The term document is going to come up a lot when working with MongoDB. A document is a record (sort of like a row in a structured table). For example, if we have a database for users, a document would be one individual user.
Now, a document has a very specific structure. We will start off with a very basic structure.
Here we have fields and values.
As you use NoSQL more and more, it becomes a way of thinking and you are able to think outside of typical structure.
Now, a document can actually get pretty complex, and there are a lot of things you can do with one. This is just the bare minimum.
By the end of this article, I want you to understand what a document is and know how to work with one. Our goal is to take a document, throw it into a MongoDB database, and get that data back out.
3. When you extract the download, there is going to be a bin folder. Inside of this bin folder there are two important files: mongo, and mongod. Mongod is the server, mongo is how we communicate to this server.
4. We are going to need to create a new folder: \data\db. You can do this in a terminal using:
Alternatively, you can create this new folder using the Finder/Explorer.
5. Once we have this folder created, we can open mongod to start the server.
6. Next, open mongo and this is where we will run our commands against our MongoDB database.
If you receive any pop-ups about security, you’ll probably want to click private network only and not public network.
MongoDB is now installed, and we are connected to the database. You may want to make MongoDB a service (so that it runs behind the scenes). Refer to the installation guide linked earlier for more detailed instructions on how to do this. Setting MongoDB up as a service will ensure that the database starts right away when your computer turns on. If you don’t do this and you want to use MongoDB in the future, you’ll have to run mongod again to get it started.
Creating a Database
To view available databases, run this command:
Run this command to create a database:
This will create the database and set it to be the active database.
Collections and Inserting Data
A collection is like a table. It is literally just a grouping–or collection–of documents. This allows us to organize and group our data.
You have the option of explicitly creating a collection. Or what you can do is just insert a document into a collection and if it doesn’t exist, a collection will automatically be created.
You can see that when you do this, it returns this insertedID back. This is a unique identifier for this document.
The cool part is that we are not restricted to this structure (only these keys). We can actually insert another document but with a different structure:
This is not possible with structured data. If you tried to add a field to only one row with a relational database, you would get an error. The problem is a result of not having the appropriate column defined in the table structure. You would have to update the table structure in order to add a field.
Querying Our Data
Now that we’ve inserted some data, let’s get that data back out.
In SQL, this is equivalent to:
If you want to query the data but you don’t necessarily want to select all of the data, there is a way you can set some rules:
To make the results a little easier to read, you can also use:
In SQL, this is the equivalent to:
Before you go, I wanted to show you a tool that will allow you to easily manage a MongoDB Database. That tool is called ClusterControl. Severalnines, the creators of ClusterControl, breaks database management into four parts:
ClusterControl is designed to help you do these four things seamlessly. The cool part is that this can integrate with other database systems, MySQL and Postgres. Give it a try and let me know how you like it!
The goal of this post was not to make you a MongoDB master. Rather, my goal was to introduce you to the key features of MongoDB and help you see it as a tool you can use in certain circumstances.
Yes, I think it is important to learn the in-depth details, but that is something more suited for a long, in-depth video series. Let me know if that is something you guys would like to see!
My closing thoughts are that you should at least Give NoSQL a try. The way the world works with data is changing and if you are not ready for those changes, you are going to be behind.
If you are a developer, I encourage you to try to build a small app with MongoDB.
If you are running a technology company, know that the benefits of MongoDB and NoSQL in general shine when you realize that the structure of data can change over time and that using a NoSQL System allows you to scale seamlessly. It is important to wrap your head around it in order to future-proof your career and/or your business.
Like I mentioned earlier, NoSQL is not replacing SQL. The reality is that there are things that relational databases do better than non relational databases.
What exactly do I mean by future software engineer? No, I’m not from the future. I am currently working towards becoming a software engineer. I originally developed software as a contractor before I took a role in marketing, and now I’m trying to get back to my roots and get a software engineering position. Not just any development position, though. I want one that is challenging and one that will give me reputable experience. Because of this, I am preparing for the types of interviews given at the big name software companies (Google, Microsoft, Facebook, Amazon, IBM, Netflix, etc). I documented and recorded my experience for an entire day. The video is at the end.
Because I work full time, I only get so much time to work on my development skills. This is challenging because I feel it takes me a very long time to get anywhere, but it’s good because it’s teaching me not to waste time.
This blog showcases about what you would expect on a typical day for me as a social strategist in a tech company looking to switch roles to a development position.
First, I Do My Daily Work.
First thing in the morning I get to work.
I work 7AM to 4PM in the office about four days a week, and the office is quite the drive. Friday I am fortunate enough to work from home which always helps my creative juices to flow.
On a typical week day, I get home pretty late and don’t have much energy left, but I’ll occasionally be able to give an hour or two to coding. Friday and the weekend is where the magic happens, though.
For the sake of this blog, we’re going to look at a typical Friday.
Fridays are special right now because we have something called summer Fridays. This is a season of time where we can use Friday to focus on catching up, finishing certain projects, brain storming, career development training, or on occasion leaving early for a longer Friday evening away from work. When studying development, I do my best to take advantage of this time. This means that I try to finish my required Friday work within the first half of the day to designate the second to these other tasks.
I start work around 7, and try to finish by 11 or 12.
Then, I Focus Solely on Development.
Once 12PM hit, I joined my scheduled interview on Pramp.
For those of you who don’t know, Pramp is an extremely helpful website that puts you through mock interviews with your peers. These interviews are awesome because they allow you to work on problems given in real software engineering or software development interviews at the big name companies. The first half of the Pramp session I interview a peer (which also helps me grow my skills), and then in the second half the peer interviews me. We both work on distinct problems.
I love Pramp because It’s free and allows you to do unlimited interviews with peers in a manner that fits your schedule. Once you get enough experience you can interview with real companies through Pramp. Sweet!
This specific experience on Pramp was a rough one. The interviewer was great but I was given a very challenging question (a regex parser) that suggests a recursive solution. This question required me to spend time after my session to study, but questions can range from beginner to expert and you are able to tell Pramp how difficult you would like the questions to be.
Directly after my Pramp session I went out to lunch to Chipotle.
My spouse and I love chipotle because it’s reasonably priced, yet you get a large amount of quality food. We tend to do this on the weekends as we are usually both pretty worn out from the week and enjoy the opportunity to skip cooking and go have a date with one another. This is the perfect break from coding. I even treated myself to some frozen custard.
Back to Work…
Upon returning from lunch, I spent a bit more time working on my Pramp solution. After enough time I decided to switch gears to a different medium of practice, reading. I started reading Cracking the Coding Interview (CtCi).
There is a lot of controversy with this book because a lot of people believe that the manner in which software interviews are done is unreasonable and doesn’t directly translate to a candidate being efficient at the job and a good employee. That being said, a lot of companies are using resources like CtCi to define their interviewing processes. Because of this, I have decided to use this book because I am confident it is another resource that will help me train to be ahead of the competition in interviews and potentially even in the job itself.
The Structure of Cracking the Coding Interview
This book essentially has a topic and then a bunch of practice built around that topic. This is what you experience when you get into some of the more meatier chapters past chapter five. The first 5 chapters are more general and talk about general advice to doing well on an interview. These are the chapters that I read as this was my first time using this book.
I think I’ll go through most of the beginner exercises in this book, but I’m not going to have time to go through it all. My technical interview is in a few weeks and I only get so much time to work on code (I have other responsibilities like my job, YouTube, family, dog, etc).
My recommendation would be to read the first five chapters in one sitting. The content is fairly non-technical. That being said, the action plans may take longer than one sitting. These chapters force you to think about some of the most common questions and requests given in interviews. The reason I recommend doing this all in one sitting is because it enables you to see interviewing from the big picture. I believe sometimes I get caught up in the detail. When I read through all of these chapters in one sitting relatively quickly, I felt like I understood the technical interview and would be able to go into one with the same thoughts as the interviewer.
I put each question I found in the first chapters into a document and wrote down my answers (like my biggest weakness, etc). This allowed me to think through my responses prior to jumping into an interview and embarrassing myself. I was able to go through this material fairly quickly because I have done similar exercises for previous positions I interviewed for.
Upon finishing this section, I was at the chapter on Big O. At this point, my brain was totally fried. I put in over a days work and it was now 7PM. In addition to it already being 7PM, I still had a video I had to record for work. I had to stop studying.
Dealing with Anxiety
Part of my challenge is that it is very hard for me to let things go. I will obsess over something, get overwhelmed, crash, and then avoid it like the plague. This happens in cycles. Because of this, I often find myself studying for an entire day, quitting, and then going a few days without studying at all. When I am obsessing, I find myself thinking about coding even when I’m not coding or when my brain should be focused elsewhere (like on sleeping or family time). This is obviously not a sustainable approach to learning and living.
To address this problem, I try to pair productive times with stress reducing and mind relieving activities. After my crazy Friday, I took a long walk outside and then I went to the movies to see Mission: Impossible at 10:30PM.
I didn’t make it to bed until after 2AM that night, but it was well worth it as the time at the theaters allowed me to get my mind off of coding. After the movie, my mind was brought back to reality and I was able to go into studying again in a balanced manner the following week.
What Does a Typical Day Hold?
Although this blog is set on the particular Friday I decided to record my activities on, there is another activity that would usually be part of my day. Source control. Huh? Yeah, you heard me. I try to take extra time to take my code and commit it into GitHub or BitBucket. I see this as the journal of success. Even if I fail my interview, I’ll have all the effort recorded in an organized manner that I can refer back to when I need to refresh my brain or when I need to see how I solved a particular problem.
In addition to being a record of practice, it can show your experience. How could someone say I don’t have experience in C++ when I have an entire repository of numerous solutions to various computer science problems (well, I’m working on the numerous part)?
I basically assume that my brain will forget everything within a year, so I make sure to use source control to my advantage.
I believe this day was a massive success. It may have been that part of this success was fueled by the fact that I was sharing my day with the entire world on YouTube. In fact, that was probably a huge motivator. But even when not broadcasting my life, I am able to make progress towards my goals when I know there will be a reward for my effort.
Even on the weekends I find myself able to get up early (like, no earlier than 8AM, of course) because I know that I have a goal I’m working towards and I don’t want to be defeated. I’m even writing this blog right now on a Saturday afternoon after editing my day in the life video. It sucks, but the result is worth it.
I feel very similar about my software engineering prep. Although it is hard to see the reward when I give up an entire day to reading or watching tutorials, I can persevere because I know that each day is contributing a piece to the larger puzzle. If I skip out or fail to put the effort forth, I’m going to end up with an incomplete puzzle.
This specific position may or may not be the final picture displayed when I put together my puzzle. And that’s OK. If I do my best and act myself, then I’ll leave the rest to God. What I can control is putting the pieces where they need to go and putting forth the work needed to better my future for me and my family. If the reward’s not at this job interview, it may be at the next.
Enjoy this content? I’d encourage you to follow my same pattern. Give Pramp a try and start reading Cracking the Coding Interview! You can also check out my blog on getting your first coding job (coming soon).
At this point, you’ve probably already read part 1. This means that you know the basics of what coding is. A better question is: Do you know how to get a job? Knowledge is only part of the battle in the software development industry. I’ve been using this great website called Pramp. Pramp is a website where you get paired with other people and do mock interviews. Technical interviews are very challenging. Pramp gives you what you need to succeed. On top of that, companies are using Pramp to find competent individuals for real jobs. People have used Pramp to get positions at Facebook, Microsoft, Twitter, and more! I promise you won’t be disappointed, so check Pramp out!
Can We Automate That?
One of the big reasons companies develop software is for automation. Automation is enabling something to happen automatically without you having to dork with it. With software, we can write an application that has code that will execute the same thing over and over again. This means that something that used to be a complex process can be made very simple using software.
The whole idea behind self driving cars is automation. I would argue that the reason self driving cars are most likely to sell is because of automation. Why would any modern, busy individual want to drive when they could automate the process?
Even the buzzwords like AI and machine learning are rooted in automation. For the uninformed, AI is artificial intelligence and allows computers to make decisions. Machine learning is a method of AI that let’s computers learn from historical data.
If I wanted to do something like fraud detection, no-one has to look at my transactions to see if something is phishy. Machine learning uses data to automate this process and say “Yes, this is fraud,” or “No, this is not fraud.” This means I can swipe a credit card and get approved or denied immediately depending on whether or not it’s likely to be fraud. Obviously, this is not perfect. Many of us will go on a trip, swipe our card, and get denied. That being said, I believe the process is still improving and is much better than other options. Automation is still more effective than manually checking for fraud or waiting till fraud is reported and having to deal with it then.
Automation software often has a very specific goal. This means that the software can often be refined to a point where the automation software is actually better than a human at a particular job. Not only because it’s faster, but because software is much less likely to make mistakes. This means that the software can do more of a job than a human could, allowing the company to scale. By automating processes, a company can focus on things of higher value (such as growing their company).
Another example of automation is when you sign up for a website you automcatically get a verification email for you to verify your account. This is another example of automation.
Software comes down to one question: What can I do to make the user’s life easier? The user in this case would be the customer, or the person using my software. 99.9% of the time, the answer is automation. As a software developer, I want to create software that does something that typically takes the user a long time and is cumbersome, but my software does it quickly with only a click of a button. If the value my software adds to the customer’s life outweighs the cost of the software, people will buy it! As a result, a piece of the software development life cycle is finding out a problem people have that software could easily fix. If I automate the process in code, I make a ton of money. That’s a very money-centric view of how typical software development works. I’m by no means trying to say that this is the only reason you should develop software, all I’m saying is you should expect to find that kind of attitude in the industry.
Software Engineer VS. Software Developer
As an aside, I wanted to briefly talk about job titles. Is there a difference between software development and software engineering? Short answer: no, not really. Long answer: kind of (read on).
There are a ton of job openings for both software developers and software engineers. At first glance, it seems they are asking for the same requirements: “Entry level software engineer, 5 years of experience required.” Or, you may find a position for a developer: “Entry level software developer, 5 years of experience required.” Either position you take, you’re more than likely going to be doing about the same thing. From what I’ve seen, a software engineering role is possibly harder in that you may be expected to have some knowledge of traditional engineering, you may also take a small step back and focus a bit more on software architecture, design patterns, “engineering” the solution, etc. That being said, many software developers do all of this too.
Why am I even bringing this up? I kind of forget why, honestly. But it’s good to know. The main thing to takeaway from this is that many use the titles software engineer and software developer interchangeably, but you may tend to hear software engineer when talking more about designing solutions than building something with a particular programming language.
The challenges of Engineering Software
We’ve talked about the why of software (💰💰💰), but now let’s talk about the how.
A software engineer is forced to generalize his or her solutions. I’ll do my best not to overcomplicate this.
Generalizing is when you try to think of every possibly input and output, rather than some specific example.
One big reason I am glad that I went through a traditional bachelor’s of science is that I received training in math that I would likely have never have pursued on my own. I went into the program thinking I would dread the math, but ended up benefiting a lot from the material and actually had really good grades in Math.
One takeaway I got from the math is that it helped teach me, practically, how to generalize.
If you think of something in math, such as f(x) = 2x, we’re saying that for any input, we get twice the output.
This is a general solution, rather than saying we input 2 and get 4, we say we input x, and get 2x.
This skill was practiced even more when I got to proofs. To prove some general rule, you have to think of how to prove that in the general case (meaning without the use of an example to prove your point).
I’m by no means saying that going through a bunch of math courses is the only way to succeed in software engineering. All I’m saying is it is a good way to practice techniques that directly translate to coding software.
Let’s look at generalization when it comes to input for our code.
Dealing with Input and Output
If you consider our code to be a black box (meaning all you need to worry about is the input and output), you’ll quickly see the vast number of possible inputs. A developer must not only consider what happens when the user puts in something expected, but also when the user puts something totally meaningless. This is especially true for applications that take user input in the form of commands or text.
There are multiple ways to deal with the possible inputs. The first way is to restrict input. The most common way of restricting input is by going from a CLI (command line interface) to a GUI (graphical user interface). A GUI is able to restrict possible inputs by giving the user a limited number of options to select from. This will work in many cases, but not all (specifically, don’t expect this to work with web development).
The second way to deal with the possible inputs is to validate every input. This is something not often required in school projects, so it is often not well taught. This means that if you expect a particular type input, you must check if the input is valid. If it’s not, the input is rejected and the user must be asked again for another input. Expecting a number? Reject strings. Expecting a a letter? Reject numbers. Etc.
Hard Coded Data
One thing that kills generalization is hard coded data or values. Something is hard coded when it is typed out in the source code. Hard coded values destroy generalization because once the software is delivered to the customer, that value is permanent. The only way around this would be to change the value in the code and deliver an update to the user.
Whenever possible, data should be placed outside of the source code. This can be done in a text file, some sort of configuration file, an excel file, a database, or asking the user a question. The hard coded value is then replaced with code that reads the value from the file or asks for user input.
In this situation, the configuration file is also delivered to the customer with the final software. We can change the way software works by changing the values in the configuration file instead of having to give them a whole new version of a software (imagine having to get a new version of Microsoft Word every time you wanted to change a setting).
Here is an example of hard coded data:
This is a better:
You can clearly see the potential for garbage input here. What if I put “pizza” as my name? It’s vital that we validate all input prior to processing it to do something useful.
Once we are able to take input, we can begin branching the possible paths in our program. For example, if we have a game that you must be 13 or older, we can ask the person for their age. If they say something less than 13, they are rejected, if they say greater than 13, they are given access. Branching is most easily done with an if statement.
As you begin to understand more on what coding is, how we write software, and how to design and engineer solutions, you’ll begin to see that the vast amount of knowledge required can be overwhelming. Because of this, I thought it would be useful for me to recommend some resources that could help guide you through the process.
Because this blog series is looking at general software development / engineering, I am listing some resources that will help you think about how software is built sucesfully, what things to understand when going into software engineering, and some of the top books for different types of projects.
For everyone – Cracking the Coding Interview. This single resource has helped me tremendously. Not only does this book go over technical questions and answers, but it gives a lot of context as to why technical interviews work the way they do. This summarizes the most important stuff covered in a 4 year computer science program. In my opinion, this content of this book is useful beyond the interview. I’d highly recommend it as the one resource to guide you through your computer science education. Although this book can be more challenging at times, it will at least be a good guide on what you should understand, which will ultimately help you know what you need to research.
For everyone – The Agile Samurai – This book was a tremendous help to me when I wanted to understand more about the software development lifecycle. It guided me on how to get actual software out the door. This is a great book if you need something to casually read as it is not very technical, but tremendously helpful! I actually listed to the audiobook.
For those interested in C – Programming in C – This book is a basic introduction that will guide you through coding in C. This is a great resource if you are interested in doing more lower level programming. By that I mean you are closer to the operating system and have a lot more to worry about. That being said, lower level programming often helps you understand things at a much deeper level. I’d highly recommend reading this for even only an hour a day. You will make progress on your understanding of C.
For those interested in C++ – The C++ Programming Language – Highly rated, highly known in the industry, I’d recommend this book to anyone who wants to really understand C++. It was authored by the creator of C++ (Bjarne Stroustrup). This would be a great read if you are interested in lower level programming with more universal capabilities (like object oriented programming).
For those interested in Python – Python Crash Course – This is a book that will give you what you need to know on Python. This would be a good place to start if you want to build applications but not get so tied up in all the syntax. Python is a perfect programming language for beginners as it is very intuitive and easy to read. Not to mention that Python is very popular and is becoming even more so with the rise of machine learning, deep learning, and other varieties of artificial intelligence.
For those interested in C# or Java – Murach C# or Murach Java – I own and have read numerous Murach books. One unique thing about these is that they break each section up into one page of text and one page of code examples. This makes going through the content very simple. I read through Murach C# from cover to cover and I loved it. It took me a really long time, but was totally worth it. Why would you want to go for C# or Java? Primarily because the language is C++ like in syntax and object oriented programming, but it is managed. A managed language deals with a lot of the memory with features like garbage collection. Not to mention C# and Java can be used for web development and various other solutions (such as Xamarin for cross platform mobile app development).
And here is my video What is Coding Part 2:
I hope these resources are helpful. The journey here is not yet done though, so be on the lookout for part 3!
So you want to learn about coding, huh? Maybe you’re looking at becoming a software developer or software engineer. This is the blog for you! I’m hoping to give you the 10,000 foot view of coding.
Be sure to check out the sponsor of this post, Pramp, where you can get the best preparation for technical interviews.
Code is the basis for every software application. An application is anything that runs on a computer. This includes any websites you visit, games you play, apps you use, etc. Software is now embedded in almost everything, such as all modern day vehicles.
Coding is done with programming languages. There are a ton of computer programming languages: C, C++, C#, Java, Python, Go, Scala, F#, Ruby, etc. Each computer programming language is slightly different and each one has their own keywords.
Coding can be described as a black box. This means that the user does not have to worry about what is going on behind the scenes of an app. If a user opens an iPhone app, they do not care about the code. All they care about is that when they click a button (input), something happens (output).
When something is described as a black box, it is an example of an abstraction. An abstraction hides the inner detail and the user only has to worry about the interface, or how to interact with the program.
There are two primary types of interfaces: a CLI (command line interface) and a GUI (graphical user interface).
A CLI allows the user to interact with your application through commands. This is often done in either a unix terminal or a windows command prompt:Your code will output different things depending on the input given through the CLI.
One of the primary benefits of a CLI is that you can write scripts to interact with your program.
The alternative to a CLI is a GUI, which graphically allows the user to interact with your application through buttons. A perfect example of this would be essentially any app you use on a mobile device, game, etc.
Coders tend to like programs that have a CLI because they can execute commands against the application themselves or create a script to do it for them. Non-coders prefer a GUI as it makes working with the application much easier.
The application that has an interface that can be programmed against is said to have an API. An API (application programming interface). An API will have various commands that can cause the application to do something (this is very similar if not the same thing to a CLI, but you may not interact with it through a terminal). A simple example of an API would be the Twitter API. The Twitter API opens up commands that can be executed programmatically (through another application).
Each command available through the API is known as an API endpoint. Applications will be written to take advantage of these API endpoints. A common example of this would be an application that monitors Twitter accounted through the Twitter API and ranks them socially. Another example would be an application to schedule tweets.
This is a good intro, but now I suggest you Jump into part 2.
This will be a blog dedicated to designing relational database indexes. By this I mean we will be discussing the important topics of indexing a database but we will not be programming any SQL. This is all part of database design.
What is an index?
Wow, such as awesome question! An index allows us to select data from our table much quicker. Think of an index in a book…If you want to know something about a specific topic, you check the index.
In databasing, we can give a specific column or group of columns an index allowing us to search through our data much quicker.
Indexes are used for select statements. A select statement is part of SQL that allows us to select certain data. When we use a select statement we first put the columns that we are selecting. After we finish the statement our RDBMS will go through our data and select all of the data that meets our requirements given in the select statement.
The downside of indexes is that they will slow down our database when we have to update existing data. This is because not only does the data have to be updated but the indexes have to updated as well.
The benefit that indexes have is much more compelling than the downsides they bring. The tip for indexes is: Use them, but don’t overuse them. Only put indexes on columns that you are going to frequently use. Read on for an in depth explanation of indexes and when to use them.
When to use an index
An index should be used whenever you frequently use a column in a WHERE clause within a SELECT statement. An example of a WHERE clause (in English) would be selecting every user WHERE they have the last name Curry, for example. In this case, we are using a SELECT statement on the column that is probably called last_name. If we do this constantly we may want to consider creating an index on the column last_name. If we do not create an index, a table scan will be performed where the database goes through all of the data. This can take a lot of time! With an index, our RDBMS will be able to directly find the last names of Curry and find the appropriate rows using a row address (sort of how an index in a book tells what page to go to). This process of using the index to find data is known as an index seek.
The other common use of indexes is when we join multiple tables. In order to join multiple tables, we tell our RDBMS which rows are talking about the same person. We will often use a foreign key and a primary key of another table. An example would be joining the columns of the user table with the columns of the comments table where user_id of the user table = user_id of the comments table. This user_id is the connection between tables so the RDMBS knows which rows go with each other. Joins are used to create new views. Here is an example of a join:
Here you can see that the generated view attaches the comment to the user because it knows that the user_ids are both connected.
If we did this on a regular basis, we would want an index on this view.
If we continually do select statements where we select multiple columns, we can create a composite index, an index with multiple columns. This can be used to quickly search using the same columns in the future.
Clustered vs Non-clustered Index
A clustered index is just a category of indexes. Clustered indexes are the fastest indexes. They work sort of like a telephone book/directory. In a telephone book you can sort through data in an organized fashion and the phone number is stored right beside it. A non-clustered index is more like an index of a book, where the index and the data are stored separately.
A clustered index is usually the primary key. It reorders the way rows are actually stored. Because this type of index affects the actual storage structure of our table, we can only have one clustered index. With this type of index, the index architecture and the row data is in the same file and it does not require a second look up.
A non-clustered index stores the row data separately and using a another file it points to the row data. This requires an extra step because the actual data is not stored by the non-clustered index. Therefore, clustered indexes are faster. A non-clustered index is required to have the index seek because our clustered index decides how our data is stored. Therefore, a non-clustered index is unable to manipulate the storage of our data.
Clustered indexes are almost always used on the primary key while non-clustered indexes are used for other columns.
Indexes improve SELECT speeds but slow down UPDATE and similar statements.
Clustered indexes deal with the actual storage of the data.
non-clustered is pretty much just a file with a bunch of easy references to data. This does not deal with the actual storage structure.
Primary keys are usually the clustered index.
non-clustered indexes can be whatever columns you frequently use for searching your database.
composite indexes are indexes that have a reference list for multiple columns.
Oracle Indexes. A follower of mine wrote this article. This is a little more advanced if you are completely new to indexes, but this is a great way to learn how indexes are implemented in a particular database system.