A Day in the Life of a (Future) Software Engineer

What exactly do I mean by future software engineer? No, I’m not from the future. I am currently working towards becoming a software engineer. I originally developed software as a contractor before I took a role in marketing, and now I’m trying to get back to my roots and get a software engineering position. Not just any development position, though. I want one that is challenging and one that will give me reputable experience. Because of this, I am preparing for the types of interviews given at the big name software companies (Google, Microsoft, Facebook, Amazon, IBM, Netflix, etc). I documented and recorded my experience for an entire day. The video is at the end. 

Because I work full time, I only get so much time to work on my development skills. This is challenging because I feel it takes me a very long time to get anywhere, but it’s good because it’s teaching me not to waste time. 

This blog showcases about what you would expect on a typical day for me as a social strategist in a tech company looking to switch roles to a development position.

First, I Do My Daily Work.

First thing in the morning I get to work. 

I work 7AM to 4PM in the office about four days a week, and the office is quite the drive. Friday I am fortunate enough to work from home which always helps my creative juices to flow. 

On a typical week day, I get home pretty late and don’t have much energy left, but I’ll occasionally be able to give an hour or two to coding. Friday and the weekend is where the magic happens, though. 

For the sake of this blog, we’re going to look at a typical Friday.

Fridays are special right now because we have something called summer Fridays. This is a season of time where we can use Friday to focus on catching up, finishing certain projects, brain storming, career development training, or on occasion leaving early for a longer Friday evening away from work. When studying development, I do my best to take advantage of this time. This means that I try to finish my required Friday work within the first half of the day to designate the second to these other tasks.

I start work around 7, and try to finish by 11 or 12. 

Then, I Focus Solely on Development.

Once 12PM hit, I joined my scheduled interview on Pramp.

For those of you who don’t know, Pramp is an extremely helpful website that puts you through mock interviews with your peers. These interviews are awesome because they allow you to work on problems given in real software engineering or software development interviews at the big name companies. The first half of the Pramp session I interview a peer (which also helps me grow my skills), and then in the second half the peer interviews me. We both work on distinct problems.

I love Pramp because It’s free and allows you to do unlimited interviews with peers in a manner that fits your schedule. Once you get enough experience you can interview with real companies through Pramp. Sweet!

This specific experience on Pramp was a rough one. The interviewer was great but I was given a very challenging question (a regex parser) that suggests a recursive solution. This question required me to spend time after my session to study, but questions can range from beginner to expert and you are able to tell Pramp how difficult you would like the questions to be. 

Lunch Break!

Directly after my Pramp session I went out to lunch to Chipotle.

My spouse and I love chipotle because it’s reasonably priced, yet you get a large amount of quality food. We tend to do this on the weekends as we are usually both pretty worn out from the week and enjoy the opportunity to skip cooking and go have a date with one another. This is the perfect break from coding. I even treated myself to some frozen custard.

Back to Work…

Upon returning from lunch, I spent a bit more time working on my Pramp solution. After enough time I decided to switch gears to a different medium of practice, reading. I started reading Cracking the Coding Interview (CtCi).

There is a lot of controversy with this book because a lot of people believe that the manner in which software interviews are done is unreasonable and doesn’t directly translate to a candidate being efficient at the job and a good employee. That being said, a lot of companies are using resources like CtCi to define their interviewing processes. Because of this, I have decided to use this book because I am confident it is another resource that will help me train to be ahead of the competition in interviews and potentially even in the job itself. 

The Structure of Cracking the Coding Interview

This book essentially has a topic and then a bunch of practice built around that topic. This is what you experience when you get into some of the more meatier chapters past chapter five. The first 5 chapters are more general and talk about general advice to doing well on an interview. These are the chapters that I read as this was my first time using this book. 

I think I’ll go through most of the beginner exercises in this book, but I’m not going to have time to go through it all. My technical interview is in a few weeks and I only get so much time to work on code (I have other responsibilities like my job, YouTube, family, dog, etc).

My recommendation would be to read the first five chapters in one sitting. The content is fairly non-technical. That being said, the action plans may take longer than one sitting. These chapters force you to think about some of the most common questions and requests given in interviews. The reason I recommend doing this all in one sitting is because it enables you to see interviewing from the big picture. I believe sometimes I get caught up in the detail. When I read through all of these chapters in one sitting relatively quickly, I felt like I understood the technical interview and would be able to go into one with the same thoughts as the interviewer.

I put each question I found in the first chapters into a document and wrote down my answers (like my biggest weakness, etc). This allowed me to think through my responses prior to jumping into an interview and embarrassing myself. I was able to go through this material fairly quickly because I have done similar exercises for previous positions I interviewed for. 

Upon finishing this section, I was at the chapter on Big O. At this point, my brain was totally fried. I put in over a days work and it was now 7PM. In addition to it already being 7PM, I still had a video I had to record for work. I had to stop studying.

Dealing with Anxiety

Part of my challenge is that it is very hard for me to let things go. I will obsess over something, get overwhelmed, crash, and then avoid it like the plague. This happens in cycles. Because of this, I often find myself studying for an entire day, quitting, and then going a few days without studying at all. When I am obsessing, I find myself thinking about coding even when I’m not coding or when my brain should be focused elsewhere (like on sleeping or family time). This is obviously not a sustainable approach to learning and living.

To address this problem, I try to pair productive times with stress reducing and mind relieving activities. After my crazy Friday, I took a long walk outside and then I went to the movies to see Mission: Impossible at 10:30PM. 

I didn’t make it to bed until after 2AM that night, but it was well worth it as the time at the theaters allowed me to get my mind off of coding. After the movie, my mind was brought back to reality and I was able to go into studying again in a balanced manner the following week. 

What Does a Typical Day Hold? 

Although this blog is set on the particular Friday I decided to record my activities on, there is another activity that would usually be part of my day. Source control. Huh? Yeah, you heard me. I try to take extra time to take my code and commit it into GitHub or BitBucket. I see this as the journal of success. Even if I fail my interview, I’ll have all the effort recorded in an organized manner that I can refer back to when I need to refresh my brain or when I need to see how I solved a particular problem.

In addition to being a record of practice, it can show your experience. How could someone say I don’t have experience in C++ when I have an entire repository of numerous solutions to various computer science problems (well, I’m working on the numerous part)?

I basically assume that my brain will forget everything within a year, so I make sure to use source control to my advantage.

The result

I believe this day was a massive success. It may have been that part of this success was fueled by the fact that I was sharing my day with the entire world on YouTube. In fact, that was probably a huge motivator. But even when not broadcasting my life, I am able to make progress towards my goals when I know there will be a reward for my effort.

Even on the weekends I find myself able to get up early (like, no earlier than 8AM, of course) because I know that I have a goal I’m working towards and I don’t want to be defeated. I’m even writing this blog right now on a Saturday afternoon after editing my day in the life video. It sucks, but the result is worth it. 

I feel very similar about my software engineering prep. Although it is hard to see the reward when I give up an entire day to reading or watching tutorials, I can persevere because I know that each day is contributing a piece to the larger puzzle. If I skip out or fail to put the effort forth, I’m going to end up with an incomplete puzzle.

This specific position may or may not be the final picture displayed when I put together my puzzle. And that’s OK. If I do my best and act myself, then I’ll leave the rest to God. What I can control is putting the pieces where they need to go and putting forth the work needed to better my future for me and my family. If the reward’s not at this job interview, it may be at the next. 

Enjoy this content? I’d encourage you to follow my same pattern.  Give Pramp a try and start reading Cracking the Coding Interview! You can also check out my blog on getting your first coding job (coming soon). 

What is Coding? 💻 Part 2

At this point, you’ve probably already read part 1.  This means that you know the basics of what coding is. A better question is: Do you know how to get a job? Knowledge is only part of the battle in the software development industry. I’ve been using this great website called Pramp. Pramp is a website where you get paired with other people and do mock interviews. Technical interviews are very challenging. Pramp gives you what you need to succeed. On top of that, companies are using Pramp to find competent individuals for real jobs.  People have used Pramp to get positions at Facebook, Microsoft, Twitter, and more! I promise you won’t be disappointed, so check Pramp out!

Can We Automate That?

One of the big reasons companies develop software is for automation. Automation is enabling something to happen automatically without you having to dork with it. With software, we can write an application that has code that will execute the same thing over and over again.  This means that something that used to be a complex process can be made very simple using software.  

The whole idea behind self driving cars is automation. I would argue that the reason self driving cars are most likely to sell is because of automation. Why would any modern, busy individual want to drive when they could automate the process? 

Even the buzzwords like AI and machine learning are rooted in automation. For the uninformed, AI is artificial intelligence and allows computers to make decisions. Machine learning is a method of AI that let’s computers learn from historical data. 

If I wanted to do something like fraud detection, no-one has to look at my transactions to see if something is phishy.  Machine learning uses data to automate this process and say “Yes, this is fraud,” or “No, this is not fraud.”  This means I can swipe a credit card and get approved or denied immediately depending on whether or not it’s likely to be fraud. Obviously, this is not perfect. Many of us will go on a trip, swipe our card, and get denied. That being said, I believe the process is still improving and is much better than other options. Automation is still more effective than manually checking for fraud or waiting till fraud is reported and having to deal with it then.

Automation software often has a very specific goal. This means that the software can often be refined to a point where the automation software is actually better than a human at a particular job. Not only because it’s faster, but because software is much less likely to make mistakes. This means that the software can do more of a job than a human could, allowing the company to scale. By automating processes, a company can focus on things of higher value (such as growing their company). 

Another example of automation is when you sign up for a website you automcatically get a verification email for you to verify your account. This is another example of automation.

Software comes down to one question: What can I do to make the user’s life easier?  The user in this case would be the customer, or the person using my software. 99.9% of the time, the answer is automation. As a software developer, I want to create software that does something that typically takes the user a long time and is cumbersome, but my software does it quickly with only a click of a button. If the value my software adds to the customer’s life outweighs the cost of the software, people will buy it!  As a result, a piece of the software development life cycle is finding out a problem people have that software could easily fix. If I automate the process in code, I make a ton of money. That’s a very money-centric view of how typical software development works. I’m by no means trying to say that this is the only reason you should develop software, all I’m saying is you should expect to find that kind of attitude in the industry. 

Software Engineer VS. Software Developer

As an aside, I wanted to briefly talk about job titles. Is there a difference between software development and software engineering?  Short answer: no, not really. Long answer: kind of (read on). 

There are a ton of job openings for both software developers and software engineers. At first glance, it seems they are asking for the same requirements: “Entry level software engineer, 5 years of experience required.” Or, you may find a position for a developer: “Entry level software developer, 5 years of experience required.” Either position you take, you’re more than likely going to be doing about the same thing.  From what I’ve seen, a software engineering role is possibly harder in that you may be expected to have some knowledge of traditional engineering, you may also take a small step back and focus a bit more on software architecture, design patterns, “engineering” the solution, etc. That being said, many software developers do all of this too.

Why am I even bringing this up? I kind of forget why, honestly. But it’s good to know. The main thing to takeaway from this is that many use the titles software engineer and software developer interchangeably, but you may tend to hear software engineer when talking more about designing solutions than building something with a particular programming language. 

The challenges of Engineering Software

We’ve talked about the why of software (💰💰💰), but now let’s talk about the how

A software engineer is forced to generalize his or her solutions. I’ll do my best not to overcomplicate this.

Generalizing is when you try to think of every possibly input and output, rather than some specific example.

One big reason I am glad that I went through a traditional bachelor’s of science is that I received training in math that I would likely have never have pursued on my own.  I went into the program thinking I would dread the math, but ended up benefiting a lot from the material and actually had really good grades in Math. 

One takeaway I got from the math is that it helped teach me, practically, how to generalize. 

If you think of something in math, such as f(x) = 2x, we’re saying that for any input, we get twice the output. 

This is a general solution, rather than saying we input 2 and get 4, we say we input x, and get 2x. 

This skill was practiced even more when I got to proofs. To prove some general rule, you have to think of how to prove that in the general case (meaning without the use of an example to prove your point). 

I’m by no means saying that going through a bunch of math courses is the only way to succeed in software engineering. All I’m saying is it is a good way to practice techniques that directly translate to coding software. 

Let’s look at generalization when it comes to input for our code. 

Dealing with Input and Output

If you consider our code to be a black box (meaning all you need to worry about is the input and output), you’ll quickly see the vast number of possible inputs. A developer must not only consider what happens when the user puts in something expected, but also when the user puts something totally meaningless.  This is especially true for applications that take user input in the form of commands or text.

There are multiple ways to deal with the possible inputs. The first way is to restrict input. The most common way of restricting input is by going from a CLI (command line interface) to a GUI (graphical user interface).  A GUI is able to restrict possible inputs by giving the user a limited number of options to select from. This will work in many cases, but not all (specifically, don’t expect this to work with web development).

The second way to deal with the possible inputs is to validate every input. This is something not often required in school projects, so it is often not well taught. This means that if you expect a particular type input, you must check if the input is valid. If it’s not, the input is rejected and the user must be asked again for another input. Expecting a number?  Reject strings.  Expecting a a letter?  Reject numbers. Etc.

Hard Coded Data

One thing that kills generalization is hard coded data or values. Something is hard coded when it is typed out in the source code. Hard coded values destroy generalization because once the software is delivered to the customer, that value is permanent.  The only way around this would be to change the value in the code and deliver an update to the user.  

Whenever possible, data should be placed outside of the source code.  This can be done in a text file, some sort of configuration file, an excel file, a database, or asking the user a question. The hard coded value is then replaced with code that reads the value from the file or asks for user input.

In this situation, the configuration file is also delivered to the customer with the final software. We can change the way software works by changing the values in the configuration file instead of having to give them a whole new version of a software (imagine having to get a new version of Microsoft Word every time you wanted to change a setting).

Here is an example of hard coded data:

 This is a better:

 You can clearly see the potential for garbage input here.  What if I put “pizza” as my name?  It’s vital that we validate all input prior to processing it to do something useful. 

Branching

Once we are able to take input, we can begin branching the possible paths in our program. For example, if we have a game that you must be 13 or older, we can ask the person for their age. If they say something less than 13, they are rejected, if they say greater than 13, they are given access. Branching is most easily done with an if statement.

Conclusion

As you begin to understand more on what coding is, how we write software, and how to design and engineer solutions, you’ll begin to see that the vast amount of knowledge required can be overwhelming. Because of this, I thought it would be useful for me to recommend some resources that could help guide you through the process.

Because this blog series is looking at general software development / engineering, I am listing some resources that will help you think about how software is built sucesfully, what things to understand when going into software engineering, and some of the top books for different types of projects. 

For everyone – Cracking the Coding Interview. This single resource has helped me tremendously. Not only does this book go over technical questions and answers, but it gives a lot of context as to why technical interviews work the way they do. This summarizes the most important stuff covered in a 4 year computer science program. In my opinion, this content of this book is useful beyond the interview. I’d highly recommend it as the one resource to guide you through your computer science education. Although this book can be more challenging at times, it will at least be a good guide on what you should understand, which will ultimately help you know what you need to research.

For everyone – The Agile Samurai – This book was a tremendous help to me when I wanted to understand more about the software development lifecycle. It guided me on how to get actual software out the door.  This is a great book if you need something to casually read as it is not very technical, but tremendously helpful! I actually listed to the audiobook

For those interested in C – Programming in C – This book is a basic introduction that will guide you through coding in C.  This is a great resource if you are interested in doing more lower level programming. By that I mean you are closer to the operating system and have a lot more to worry about.  That being said, lower level programming often helps you understand things at a much deeper level. I’d highly recommend reading this for even only an hour a day. You will make progress on your understanding of C.

For those interested in C++ – The C++ Programming Language – Highly rated, highly known in the industry, I’d recommend this book to anyone who wants to really understand C++. It was authored by the creator of C++ (Bjarne Stroustrup). This would be a great read if you are interested in lower level programming with more universal capabilities (like object oriented programming). 

For those interested in Python – Python Crash Course – This is a book that will give you what you need to know on Python.  This would be a good place to start if you want to build applications but not get so tied up in all the syntax.  Python is a perfect programming language for beginners as it is very intuitive and easy to read. Not to mention that Python is very popular and is becoming even more so with the rise of machine learning, deep learning, and other varieties of artificial intelligence. 

For those interested in JavaScript – A Smarter Way to Learn JavaScript – This is a very popular book with nearly 1,500 ratings on Amazon. Designed with hands-on in mind, this book has repeatedly taught people how to effectively code in JavaScript. JavaScript is the choice for you if you’re interested in building web applications. It is currently the most popular language.  JavaScript frameworks are hot in the market (Angular, React, etc). These may be important for you to learn, but I highly recommend you invest your time in learning the core JavaScript language as this is going to help you learn frameworks at a much higher speed (which is important because there is a new JavaScript framework every .3 seconds).

For those interested in C# or Java – Murach C# or Murach Java – I own and have read numerous Murach books. One unique thing about these is that they break each section up into one page of text and one page of code examples. This makes going through the content very simple. I read through Murach C# from cover to cover and I loved it. It took me a really long time, but was totally worth it.  Why would you want to go for C# or Java?  Primarily because the language is C++ like in syntax and object oriented programming, but it is managed.  A managed language deals with a lot of the memory with features like garbage collection. Not to mention C# and Java can be used for web development and various other solutions (such as Xamarin for cross platform mobile app development). 

Lastly on the list we have Swift!  Swift is the language for iPhone development. So if you’re looking to make the next big app, consider looking into learning Swift. The book listed is a great option because (at the time of this writing) you can get the kindle version for like $3. 

And here is my video What is Coding Part 2:

I hope these resources are helpful. The journey here is not yet done though, so be on the lookout for part 3!

What is Coding? 💻

So you want to learn about coding, huh? Maybe you’re looking at becoming a software developer or software engineer. This is the blog for you! I’m hoping to give you the 10,000 foot view of coding.

Be sure to check out the sponsor of this post, Pramp, where you can get the best preparation for technical interviews.

Code is the basis for every software application. An application is anything that runs on a computer. This includes any websites you visit, games you play, apps you use, etc. Software is now embedded in almost everything, such as all modern day vehicles.

Coding is done with programming languages. There are a ton of computer programming languages: C, C++, C#, Java, Python, Go, Scala, F#, Ruby, etc. Each computer programming language is slightly different and each one has their own keywords.

Coding can be described as a black box. This means that the user does not have to worry about what is going on behind the scenes of an app. If a user opens an iPhone app, they do not care about the code. All they care about is that when they click a button (input), something happens (output).

When something is described as a black box, it is an example of an abstraction. An abstraction hides the inner detail and the user only has to worry about the interface, or how to interact with the program. I like to think of a car.  In order to drive a car, you only need to know to use the interface (steering wheel, pedals, etc). The inner details (engine…other car stuff) are hid away.

There are two primary types of interfaces: a CLI (command line interface) and a GUI (graphical user interface).

A CLI allows the user to interact with your application through commands. This is often done in either a unix terminal or a windows command prompt: Your code will output different things depending on the input given through the CLI.

One of the primary benefits of a CLI is that you can write scripts to interact with your program.

The alternative to a CLI is a GUI, which graphically allows the user to interact with your application through buttons. A perfect example of this would be essentially any app you use on a mobile device, game, etc.

Coders tend to like programs that have a CLI because they can execute commands against the application themselves or create a script to do it for them. Non-coders prefer a GUI as it makes working with the application much easier.

The application that has an interface that can be programmed against is said to have an API. An API (application programming interface). An API will have various commands that can cause the application to do something (this is very similar if not the same thing to a CLI, but you may not interact with it through a terminal). A simple example of an API would be the Twitter API. The Twitter API opens up commands that can be executed programmatically (through another application).

Each command available through the API is known as an API endpoint. Applications will be written to take advantage of these API endpoints. A common example of this would be an application that monitors Twitter accounted through the Twitter API and ranks them socially. Another example would be an application to schedule tweets. Prefer video format? Check out my video: 

Now that you’ve got the absolute basics figured out, I’d recommend you pick up a book covering an area of interest for you.  In the next blog I go over some of the most popular books used to teach programming for various topics (Cracking the Coding Interview for interviews, algorithms, and data structures.  The The Agile Samurai for the software development process) and languages (C, C++, Python, JavaScript, C#, Java, Swift).

So get to it! Here’s part 2.

Db2 SQL Tutorial 4 – Create and Drop Table

So far we’ve learned about the very basic concepts of Db2, but now I want to dive in a bit and work with tables. Some of the material in blog may take a few more blogs to fully understand, but that is okay. The goal is to get you started. Later we will come back and learn all the details.

SQL

We will be using SQL (Structured Query Language) to communicate to our database. DSM (Data Server Manager) is the UI (user interface) that we can use to work with our database, but we should do as much as we can with SQL. Why use code when there is a user interface? I’m glad you asked.

The user interface is a great way to work with our database, don’t get me wrong. The problem with using the UI exclusevely is that you are limiting your usefulness. When writing an application that interfaces with the database, you are going to have to use code.

Additionally, SQL is a standard language that is used widely by other databases. As a developer, it is important to learn the information that is going to transfer between technologies. If we learn the SQL standard, we can easily work with Db2, Oracle, SQL Server, MySQL, etc, all of which are based on the SQL standard. 

How to Use SQL in DSM

There will be a panel on the left containing a button Run SQL that will open a window similar to this one:


Upon opening this Run SQL tab, you will be prompted to give your credentials. These were set during the installation process (db2inst1 was the default and you chose the password):

You can optionally check Save credentials to repository to save you some effort in the future.  

Creating a Table

Once you have the SQL panel open, type out this code:

This is the SQL needed to create a table with 6 columns. Each column is put on a new line with the data type immediately following the column identifier (column name). Putting each column on a new line is simply a convention. SQL is whitespace insensitive, meaning you can put spaces, tabs, and newlines to space out your code as you like. This is equally valid:

You may think this is harder or easier to read. It is ultimately up to you how you space out your code.

Once you have your SQL ready, you can execute your code by using the Run dropdown or issuing the shortcuts (such as CTRL S):

The Run Selected command will execute the SQL command that is currently clicked, up until the next semicolon (;). Alternatively you can highlight over multiple statements and then click Run Selected to execute them all. 

The Run All button will execute everything in the document.

Once the script executes, there will be a new entry in the result panel below:

 If for any reason there is an issue, the results will look like this: 

Clicking the Log button will give you more information on the error and a link to the documentation for that specific error:

Once you have the table created, you can find in using DSM by going to Administer -> Tables.

From this page you can easily see all of the database tables. You can also see that the users table we just created is in the default db2inst1 schema. As we talked about in the data structure blog, a schema is a logical grouping of database objects.  We will discuss schemas in more detail in the future.

Dropping a Table

We can drop this table that we just created very easily by issuing this SQL:

Be careful executing this command because it will delete everything inside of the table.

Looking to the Future – Primary Keys

We will discuss primary keys in detail, but for now it is important to understand that it is wise to label a particular column as a table’s primary key.  Thus, if we recreate the table we just dropped, we might do it like so:

This addition is essential when it comes to data integrity, which we will discuss in more detail soon.  Essentially, though, adding the PRIMARY KEY keyword forces each row to be unique and will be used as the basis for relationships between tables.  

Conclusion

We’ve learned the very basics of DDL (Data Definition Language). This is the section of SQL that involves defining the structure of our database. Two of the important commands in DDL are CREATE TABLE and DELETE TABLE. Once we have the structure, we can begin working with data. That’s what we will be doing in the upcoming blog. 

Db2 SQL Tutorial 3 – Data Structure and Data Types

Before building a database for yourself or a corporation, you need to understand a bit about database structure.

We are going to go in some depth on how to design a database, which deals with structuring your database the best way possible. However, I thought it would be appropriate to give an overview of the Db2 structure and start using the database right away.

Once we have the fundamentals down, we are going to then go back to database design to learn the details. This will allow us to more concretely discuss database design because we’ve already had some experience with the fundamentals.

The very first thing that you need to know is that data is organized into different tables. We first discussed tables in the Introduction to Db2. Each table describes one type of thing (for example, users). You should see the table as the blueprint describing how the data should be structured. Each row is an instance of whatever the table describes.

When we discuss database design, we will fully understand how to break up our data into different tables. For now, the important thing to understand is that a database consists of multiple tables related to one another.


Db2 Schema

In a Db2 database, tables are organized into schemas. A schema is just a fancy word for group. This allows us to organize our database in a more logical way when working with a large database. Not only are tables organized in different schemas, but all objects are.

The things we create in our database are called database objects and they are all going to have a name. These objects include tables, indexes, functions, stored procedures, triggers, etc.

When we create an object, we are going to give it a custom name to identify it. This name is called the identifier.

The schema name in combination with custom identifier is the fully qualified name of an object.

As an example, if we have a table named users, and this table is inside the db2inst1 (the default) schema, the fully qualified name of our object would be db2inst1.users.

Data Types

For a minute I want you to try to visualize data. If you are looking at the structure of a company’s data from a thousand miles away, you will see a database or multiple databases. The database is the biggest piece containing our data. If we zoom into one database, we see schemas. These are the next biggest piece, second to the database. Zooming into one schema, we begin to see our database objects, such as tables. Zooming into one table we begin to see our columns and rows. Well, if you zoom in on one column, you will find even more detail.

If that made no sense, I’m sorry. Google Maps doesn’t offer the data-architecture zoom feature any more… Essentially, what I am trying to say is that we not only have to think about the structure from the big perspective, but also the very small details all the way down to the column level.

Every column is going to have a data type, which determines the type of data (was that too obvious?) each row can store in that column.

So for example, is the column going to store numbers, strings, a character, or binary? The column’s data type determines what is allowed.

Data types are a huge topic, so we cannot cover it all in this introduction. Let’s talk about the basic data types we’ll be using in the early blogs. First, stare at this image from the Db2 docs diagramming the types of data types:

Source

As you can see from this chart, there are a lot of data types grouped in different categories.

Later on we will go in depth on the different data types, but for now we are going to learn just a few that we will be using immediately in the beginning of this series.

INT

The integer data type (INT) is used for whole numbers and can be used to give each record an ID. Later we will learn about primary keys (unique identifiers), but for now know that the int data type is very important.

CHAR and VARCHAR

CHAR and VARCHAR are used for storing string data. A string is a sequence of characters, where a character is any acceptable letter/number/symbol. Strings are often signified by opening and closing quotes: “This is a string” is a perfect example.

The CHAR data type is used for strings that are supposed to be the same length for every single row. The VARCHAR data type is used for variable length strings. Think VARCHAR = VARiable CHARacter.

BOOLEAN

The BOOLEAN data type is used to store data that is in two states, usually true or false.

DATE, TIME, and TIMESTAMP

The DATE, TIME, and TIMESTAMP data types are all various data types used to stored dates and time. Later on we will get into the specifics of how this data is formatted, but for now just know that they exist and should be used when working with data relating to time and events.

Conclusion

The important thing to take away from this blog is a general understanding of how data is structured in a database and that we have to be explicit when it comes to the type of data. We control the type of data by giving each column a data type.

IMPORTANT: In the next blog we are going to begin using Db2.  This means that you should have an edition of Db2 installed and ready to go.  To get the Community Edition, use this link.  The installation is pretty straightforward and will hopefully not give you any problems!

Db2 SQL Tutorial 2 – Getting Started with Db2 and DSM

Here it is, your guide on getting started with Db2! Working with new software can be stressful, so my goal is to make this as easy as possible for you. Fortunately, Db2 setup is not too challenging.

Db2 Editions

Before we dive in and start downloading stuff, I want to take a moment to explain a few of the different editions of Db2. First off, we will be using Db2 for Linux, UNIX, and Windows. The edition we will be using is called Developer Community Edition (You may hear it as Developer-C) which is a full-fledged version of Db2 with the only limitation that it is only to be used for development.

When you build an application with Db2, you can deploy a free version of Db2 which will work great but has limited capabilities. This is called the  Express-C edition. This edition is great because we are able to use Db2 in a production environment, but this edition may not meet the needs for your application.

If you are releasing an application, there are multiple editions that can meet your needs exactly.


The important thing to know is that the various editions of Db2 share the same code. Thus, switching between versions is not an obstacle when it comes to scalability.

Installation

First, download Db2 Developer Community Edition here.

Go through the simple installation steps.

You are going to want to make sure that you check all boxes (Docker, Data Studio, and Db2) so we download everything we are going to need for these blogs.

It will then ask for some credentials.

The first set of credentials is for Data Server Manager (DSM), which is a tool used to work with the database. The next set of credentials is for the database itself (user…? learn more about authentication). You will also get an option of which sample database to install. I recommend you install the database for transactional workloads if you are unsure:

Once you click continue, Docker will be installed. Docker is a tool that is used by Db2.

Does Db2 Developer edition come with DSM Enterprise or base edition?
https://www.ibm.com/developerworks/downloads/im/dsm/index.html and what changes are there in this newest version? Are parts of data studio moved to DSM or does data studio do everything DSM does plus more? Is DSM enough to do everything with the database or do you need data studio? If you need it why is it optional in the installation?

The primary tool we are going to be using is DSM, which allows us to easily work with our Db2 Database. Within DSM we can execute SQL against our database.

Let’s get into the next part to learn more!

Db2 SQL Tutorial 1 – Intro to Db2

First off, what is a database? A database stores data, which is any information that might be of use to a company.

Db2 is a database created by IBM that is best known as an enterprise database, so we are talking about a database used by huge companies who have more data than they know what to do with. The great part for us is that that there are multiple editions of Db2 going all of the way up to the enterprise editions, but actually starting with a completely free version. That’s what I’ll be using.

So back on the topic of data. A database will often be used behind the scenes when you are using some software. Anytime you put information in software that you can access at a later time, that information will be stored in a database.


What’s the Big Deal with Data?

Databases are often used by businesses and organizations. The data that they store in a database is anything that they need to maintain their business, but also grow their business.

As an example, a company will likely keep track of all their sales and store that information in a database. They can then look back at their data and use that to figure out what products people like, which ones are earning the most money, and which ones they should discontinue. You may have heard of analytics, which is used to analyze data.

Structured and Unstructured Data

The data inside of a database is often represented in tables. A table will have columns and rows.

At the top we will have column headers that describe the structure of our data. Let’s say we have a table, users:

idfirst_namelast_name
1CalebCurry
2AmyWhite
3LeahSmith

Each row is an individual user. This is how we store structured data, which fits nicely in a table. Db2 also supports unstructured data, which doesn’t necessarily fit the structure of a table.

There are endless ways you could represent this as unstructured data. One example would be like so:

Notice that we have the email now included for Amy. This syntax is called JSON which is a standard for representing unstructured data and is supported in Db2. Right now, the exact way we represent our data doesn’t matter. What matters is that we know Db2 is versatile when it comes to storing data.

Why a Database?

Why would you use a database when there are so many other options out there for data storage. As an example, why not use spreadsheets to store our data?

The first reason is data security. If we were trying to use something such as a spreadsheet to store our data, we would be in serious trouble because spreadsheets are all-or-nothing. If you want to share a spreadsheet with coworkers, you have to send them the whole document. This could work okay for small companies, but once we get a few employees this can introduce some serious security issues and data integrity issues when you make all of your data accessible to the organization. Even if you restrict access to certain people, those select people will still have access to the entire spreadsheet.

A database allows you to be extremely selective on who can read certain data and who can add data to the database. Imagine you had a spreadsheet with a social security number column. This is extremely sensitive data that you could not easily hide. With a database, you can restrict the reading of data at the column level through views showing only the data you want readable. There are immense capabilities when it comes to securing our data with a database.

The second reason is scalability. Scalability is the ability for you to easily grow your business without outgrowing your technology infrastructure. A simple example of this is how a database supports a very high number of concurrent users. Databases are built to allow multiple connections at one time without the risk of data conflict. This means as your data needs grow, you already have the infrastructure to keep up. With Db2, there is virtually unlimited scalability thanks to technologies such as pureScale, Db2’s clustering technology, and BlueMix, a cloud platform where you can easily provision 32 x 2.7 GHz Xeon processors and a terabyte of RAM for your Db2 database.

A database offers disaster recovery, data compression, and is designed to work well with programming languages. Db2 can be coded against using all of the major programming languages and frameworks–Java, C/C++, .Net, Ruby on Rails, PHP, Perl, and Python.

If you are building an application that is going to have any complexity or more than a few users, a database is essential. Of all the database options out there, Db2 makes a pretty good choice.

Why Db2?

First, the price of starting off with Db2 is free. That’s usually enough to get me interested, but as I mentioned before, Db2 has multiple editions allowing you to scale from being a sole developer in your mom’s basement, to running a fortune 100 company across the world. Db2 offers more than the competitors at a cheaper cost. Many companies have experienced cost savings of over 50-60% compared to Oracle Database, a competitor of Db2.

The lower price tag is not at the expense of performance. Users of Db2 have also experienced extreme performance through built in technologies such as BLU acceleration, which allows for extremely fast analytical processing.

Throughout these blogs we are going to explore the basics of Db2 and also some of the cool capabilities.

Db2 is a database that you need to add to your resume. Check out the next blog to learn more!

Further Study

Check out the first video in my Db2 Video Series (coming soon)!

Download Db2 and give it a go!

Database Clustering Tutorial 11 – ClusterControl Scripting With Developer Studio

We are finally to the cool part of the series!

ClusterControl allows us to do custom monitoring and automation through scripts that can be run regularly. These scripts are run through a tool called Developer Studio.

To begin looking at this, open your cluster and when you are in the cluster overview, hover over Manage and click Developer Studio.

From here you will see a ton of scripts that were already built into ClusterControl.

For example, s9s -> mysql -> schema -> schema_check_nopk.js.

In schema_check_nopk.js, you will find a comment describing what it does.


Now let’s look at the language used.

DSL

The language used here is an example of a domain specific language (DSL), which is basically a language that has absolutely no value outside of the application that it is in. Fortunately, this one is very similar to JavaScript, so it’s nothing too crazy. It even ends with .JS.

Script Categories

There are basically two categories of scripts:

  1. Those that get ran on a schedule (called advisors).
  2. Those you can run as needed. (This could simplify a process that woud take a while to do by hand).

Advisors

Now let’s take a look at the advisors.

Click the little Advisors button.

This will bring up a menu which shows that these advisors are run on a regular basis. You can scroll down to see the one I just showed you and see that it is being ran every 20 minutes. You can see that these scripts are used to monitor the health of your database and it is all automated because they can be scheduled.

Script Breakdown

I am going to break down one of these scripts in order to go through the process of writing a script. This is going to be from a super high level but hopefully it will get you started and help you understand how this works.

Let’s take a look at s9s -> mysql -> programs -> change_user_password.js

The first thing you should notice is this:

Including this file allows us to connect to and query the database nodes. This is a file that you can view if you just follow the path.

Variables

The next section are these variables in all caps. For example:

I am guessing they are in all caps to treat them as if they were constants that are going to be referenced throughout the script. To appropriately use this script, you plug in a username and password, and then click compile and run.

Function Main

The function main is the part of the script that gets run line by line.

The first line in this script has the hosts variable cluster::mySqlNodes.

This will get all of the nodes in our cluster that are running mysql. We can then loop over them with this hosts variable to change the password for every node.

You can also use cluster::galeraNodes() here instead.

This is a little different but in our situation it will work the same. If you remember, a while ago I mentioned that Galera does synchronous replication. Well, this function will get all nodes that are part of Galera, whereas the mySqlNodes will get all nodes including any additional ones we might have added that are replicating asynchronously.

For Loop

The next section is a for loop which is used to iterate over all of the nodes.

The code inside of the for loop is going to be ran for each node. The very first thing inside of the for loop is getting a reference to that individual host rather than all of them. The first iteration idx is going to be zero, and the second time around idx will be 1. Thus, we can access different nodes each iteration.

Although I don’t understand them in their entirety, the next two lines are important because they allow us to later run a conditional on whether we can successfully connect to the database:

So if this connected variable evaluates as true, we are good. If not, we’re doomed.

Once we are connected to that node, we update the password. If there are any errors, we print those and stop running code. If we are error free, the success message is displayed and we move on to the next database node.

Conclusion

Now that you have a better understanding of writing scripts, maybe you can try writing one to be ran on a regular basis. There are two example articles on the Severalnines website that will guide you through a more advanced script that can be used to automatically monitor your database for you:

 Further Study

  • Read the two Severalnines articles mentioned above to get a more in depth look at examples of working with Developer Studio.

Database Clustering Tutorial 10 – ProxySQL Load Balancer

In this article, I’m going to talk about load balancers and guide you through installing ProxySQL. Now, this is only a light introduction with just enough information to get you started.

The Purpose of a Load Balancer

A load balancer is like another addition to your database that adds a layer of magic.  The load balancer is a piece of software that will automatically balance how much each node is being hit to protect from overworking a particular node and slowing down the entire database cluster.

If you do not have a load balancer, and you were writing a software application, you would manually decide what nodes to read and write from.  This is not always good because you might not know ahead of time how well a particular node is going to keep up with your demands.

With a load balancer, you make every piece of code talk to the load balancer, and then the load balancer will distribute the demands across the nodes.

The load balancer of choice is ProxySQL.


This article is really going to show how great ClusterControl is for automating things, because we can set up ProxySQL in a few clicks.

Installing ProxySQL

On your cluster overview, click the little menu button on the far right.  Then click add load balancer.

It will come up with five options of load balancers to choose from, but it will go to ProxySQL by default. We’re going to stick with ProxySQL.

Now, we are going to want to install ProxySQL on the Controller node (the node ClusterControl is installed on). Why? Well, imagine what would happen if a node happened to go down and it was the same node that your load balancer is on. Since you communicate with the load balancer to talk to your entire cluster, you would now have big a problem.

This is why it is best to install it on the controller node.

Now, under ProxySQL User Credentials give an administration password and monitor password. 

For the Add Database User part, we are just going to ignore that part for now. We can do this very easily later on, so don’t worry about that right now. I will be instructing you through configuring this once we have the load balancer set up. 

For Implicit Transactions, I am going to click No.  If you Google implicit transactions and Severalnines, you’ll probably find an article discussing ProxySQL and how this option affects things, but I’m not going to delve into that right now.

Now click Deploy ProxySQL, and then go to the activity viewer to view the job’s progress. 

Once the job is done, you can click the Full Job Details to get connection details and ports. When you communicate to this proxy, you are going to use the IP address and port that it will show you.  You will use this within the connection string for your application.

Now, go to the cluster overview and click the Nodes tab. In the left panel you will see ProxySQL and under it will be the node that is part of it. Click that node and then you will see some fancy graphs and stuff.

Before we move on, I want to discuss the ProxySQL Host Groups.

There are two hostgroups to choose from: Hostgroup 20 and Hostgroup 10.

The difference between the two is that hostgroup 10 are readers and writers, which means that the databases in it can be read from or written to, while hostgroup 20 are only readers, which means you may only read from them. This information will be helpful when we go to create a new user.

Creating a New User

To create a new user, first click the Users tab.

Then click the Add New User button on the far right.

Now under the Create a new user tab, give it a username and password.

For the Default Hostgroup, we will want to choose hostgroup 10. Remember that hostgroup 10 are readers and writers.

Leave the Default Schema set at Null.

Give a database name.

Now, for the Grants, it is asking what privileges you want this user to have. Set it to All Priviledges (for learning sake).

The way ProxySQL works is that when you try to connect to the database using the proxy, ProxySQL is going to authenticate the user. Once the user is authenticated, then ProxySQL is going to go to the database. Because ProxySQL does its own authentication, the user must exist in ProxySQL and the database.

Now, to finish, click Add User.

Conclusion

Now obviously, I did not explain everything. I could make an entire series just on load balancers, so I understand that my explanations were a bit thin. That’s OK. You now have the basic understanding upon which you can learn more on your own.

Database Clustering Tutorial 9 – Setting up a Cluster in Clustercontrol

We’ve got most of the theory figured out, we managed to get a bunch of virtual machines up and running, and we even installed ClusterControl, but now it’s time for the fun part. We are going to create our first cluster!

Getting Started

If not open already, travel to the webpage you made your admin email and user name on in the last previous blog. The website will be:

localip/clustercontrol/

For example, mine is: 10.71.34.101/clustercontrol/

Once you get there, the “Welcome” page should show up. If not, open it by either clicking the “Get Started” button or the “Deploy” icon.


Once it comes up, click “Deploy Database Cluster.”

Now, this will come up with some forms that we can fill out.

But hold up Bro! Don’t start filling those forms out yet.

If you look at all of the different tabs, you will see several different ways we can make a cluster depending on what database and clustering technology we are using. So, we need to make sure we’re in the correct tab!

As we mentioned before, we are going to be using Galera, which is for the MySQL database. So, select the tab MySQL Galera from the menu.

Now, we can start filling out the forms.

SSH & Cluster Name

The software is going to use SSH to connect to the other machines, and once it does, it will need to use sudo privileges. In order to do that, we need to give it the appropriate user and password that has those permissions. This is the account we’ve been using throughout this blog series.  IMPORTANT: We want this user to exist on all of the machines with the same password.

Now, give a cluster name and check each of the three boxes underneath. Then click Continue.

Choosing A Vendor Database

Now, in the second page, under Vendor, it asks us what database we are going to be using. This is totally up to you. These databases will actually be created on the other machines!

Now this can seem a little confusing because we selected the MySQL Galera tab at the top. So why is it asking about different databases? Well these are different types of Mysql and Galera. There are actually a few other database systems out there that are very similar to MySQL. Percona, for example, is like an extended MySQL. MariaDB is a database that was created by the original creators of MySQL after Oracle acquired MySQL.

The third option is this Codership thing. What in the world is Codership? Codership is the company behind Galera, and they actually built Galera for the default MySQL. Percona and MariaDB provided their own versions.

I chose to go with the MariaDB, but, again, it’s totally up to you which one you choose. If you are working in production be sure that the version you choose is supported on your operating system.  

Next, decide what you want the MySQL root password to be and type it in.

Adding Nodes

Now, under Add Node, put in the IP address of each of your two master nodes, one at a time.

In the box that appears for each node, make sure that each is powered on. The green check marks tell you this.  As a reminder, this requires password SSH, which we set up earlier in this blog.  If you are jumping in out of the blue, you may want to start from the beginning

Creating a Cluster

Now, click Deploy, and then it will begin creating a cluster!!!

This process may take a while. You can watch the progress of this by going to the Activity Viewer.

If, for some reason, you can’t seem to get it working but you desperately want to continue, go back to the installation page and try one of the other vendors: Percona, MariaDB, or Codership.

Once it’s finished, click the little logo to go to an overview page… And there is your cluster!

Conclusion

You have now set up your first cluster! In the next video I am going to be talking about how to start working with this cluster.

Further Study

  • Research the differences between Percona XtraDB, MariaDB, and Codership.