Why Cloud Scalability Matters

When it comes to the nuts and bolts of running a business, a lot has changed over the past 40+ years. It's hard to believe we used to physically write everything down: appointment times, receipts, data, calculations; the list goes on. And then along came the personal computer.

Now we had a way to drop the pencil and put our fingers to work. Now we could locate what we were working on instantly rather than flipping through hundreds of pages. Once computers went mainstream, businesses could find what they needed right away, with far less wasted time.

Of course, there was eventually going to be something that would go beyond what a personal computer could do. That's when cloud computing made its debut in the business market.

But what makes cloud computing better than what businesses have been using for 40 years? The answer: scalability.

Here, we will discuss why scalability in cloud computing matters to your business, and why turning to the cloud may be one of the best decisions you make for it.

Maintaining Business Growth

By definition, scalability is the capacity to be changed in size or scale. In cloud computing, scalability means having a system in place that can easily scale up or down, depending on your business's needs. And you can do all of it almost instantly.

Before the cloud, businesses relied wholly on on-premises servers to keep up with their growth. While that approach gets the job done, cloud computing does it faster and more efficiently.

As your business grows, there will come a point when the infrastructure holding all your data reaches capacity. For years, the answer was to add another server to the on-premises environment. But this takes time: a budget has to be approved, the equipment has to be bought, and time has to be set aside to set up the new server and connect it to the ones you already have.

And once the setup is done, the new server (along with all the others) has to be maintained by your IT staff. Murphy's Law says that if something can go wrong, it will, so troubleshooting becomes a major part of maintaining your servers. This eats into the time people have to get their work done, and it pulls your IT staff away from other important projects.

Scalability in cloud computing keeps up with the size of your business, and it can do it now, not in weeks. As your business grows, so too will the storage capacity available to it. Should you choose to downscale, cloud scalability gives you the freedom to reduce your level of storage, so you are not wasting any space, time, or money.

Reducing Cost Without Sacrificing Progress

That brings us to the next reason cloud scalability matters for your business. With servers in an on-premises environment, if your business is growing, the cost of running it is ever-expanding.

As you grow, additional servers need to be added to your office. For most businesses, an entire room is dedicated to the servers that run the business. Not only does this hardware take up physical space, it also takes a good chunk of change to maintain.

Let's say you have 7 servers. At any given time, two or three could be down. That takes at least two IT employees away from their everyday responsibilities to fix the servers and get them back up and running so everyone can keep working. Not only does the repair work cost you money; the downtime usually means you are losing money too.

With cloud computing services, there is no need for all that hardware or the extra room, and no need to pull your IT staff away from what they should be doing.

Cloud computing can be as big as you need it to be without taking much time and without taking up any physical space. On-premises environments can also be as big as you want them to be. The difference is that with on-premises, you will keep buying hardware to hold all of your data, and to keep your business running, you will have to keep buying better hardware as the machines you already have become outdated or break.

On-premises is never a one-and-done kind of thing; there will always be upgrades and updates. In cloud computing, an update can happen automatically without interfering with work. No hardware required, and no added cost.

It can be difficult to change what you are used to. You have your system and it works. But what if it could work better and grow faster, all while costing you less? That is what scalability in cloud computing can do for you.

Ready to see how cloud computing can make your business more efficient? Click here to learn more about Code Willing’s cloud-based file system and resource manager, CASFS+.

Increasing Upload Speed With CASFS

100x Faster Database Inserts by Code Willing

Code Willing has determined the most efficient way to upload large amounts of data into a database hosted on their custom-built file system.

Following extensive research, Code Willing greatly decreased the time it takes to insert massive amounts of data into a TimescaleDB SQL database hosted on their file system, CASFS.

During the experiment, Code Willing compared two different uploading methods: upserting the data using Pangres, and copying the data using `timescaledb-parallel-copy`.

The data used in this research consisted of daily files of time-sequential financial data, each with 2,229,120 rows and 19 columns.

Two variables were held constant throughout the investigation: the size of the financial data files was identical for both uploading tests, and Pandas was used to process the data before inserting it into the database.
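For context, bulk loads like these target a TimescaleDB table that typically already exists as a hypertable. The article does not show the schema, so the sketch below is purely illustrative: the connection string, table name, and columns are invented stand-ins for the real 19-column financial data.

```python
# Hypothetical TimescaleDB schema setup; the article does not specify the
# real table, so all names and columns here are illustrative only.
import psycopg2

conn = psycopg2.connect("host=localhost dbname=marketdata user=postgres")
conn.autocommit = True
with conn.cursor() as cur:
    # Abbreviated stand-in for the 19-column daily financial data files.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS ticks (
            ts     TIMESTAMPTZ      NOT NULL,
            symbol TEXT             NOT NULL,
            price  DOUBLE PRECISION
        );
    """)
    # Turn the plain table into a hypertable partitioned on the time column.
    cur.execute("SELECT create_hypertable('ticks', 'ts', if_not_exists => TRUE);")
conn.close()
```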

The Pangres method allows easy insertion from Pandas DataFrames into PostgreSQL databases using its upsert method. The upsert method takes advantage of Postgres' insert functionality and handles issues like duplicates, missing columns, and datatypes.
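As a rough sketch of what this approach looks like in code (the connection string, file, table name, and index columns are assumptions, and keyword names can vary slightly between Pangres versions):

```python
# Minimal Pangres upsert sketch; connection details, table name, and
# columns are hypothetical, not taken from the article.
import pandas as pd
from sqlalchemy import create_engine
from pangres import upsert

engine = create_engine("postgresql://postgres:password@localhost:5432/marketdata")

# Load one daily file and process it with Pandas before insertion.
df = pd.read_csv("daily_file.csv", parse_dates=["ts"])

# Pangres uses the DataFrame index as the primary key when resolving
# conflicts, so the index must be set and unique.
df = df.set_index(["ts", "symbol"])

# if_row_exists="update" upserts: duplicate rows become updates, not errors.
upsert(con=engine, df=df, table_name="ticks", if_row_exists="update")
```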

By using the Pangres method, the upload time for inserting the data was 14.23 minutes (853.8 seconds).

| Upload Origin | Cores | RAM (GB) | Workers | Upload Method | Parallel Processing | Server-data Relationship | Time (minutes) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Python Notebook | 8 | 64 | 1 | Pangres | No | none | 14.23 |

Following this result, Code Willing wanted to find a quicker and more efficient way to upload their data. That is where the second uploading method came in.

According to its documentation, `timescaledb-parallel-copy` is "a command-line program for parallelizing PostgreSQL's built-in `copy` functionality for bulk inserting data into TimescaleDB."
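As an illustration, a single-worker run shaped like the first trial below could be launched from Python roughly as follows. The connection settings, database, table, and file names are assumptions, and the flags should be checked against the installed version of the tool:

```python
# Hypothetical single-worker timescaledb-parallel-copy invocation; all
# names and settings here are illustrative, not from the article.
import subprocess

subprocess.run(
    [
        "timescaledb-parallel-copy",
        "--connection", "host=localhost user=postgres sslmode=disable",
        "--db-name", "marketdata",
        "--table", "ticks",
        "--file", "daily_file.csv",
        "--workers", "1",          # one worker: no parallel copy yet
        "--copy-options", "CSV",   # parse the input file as CSV
    ],
    check=True,  # raise if the copy fails
)
```

Raising `--workers` is what the later trials do to parallelize the copy.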

Code Willing tested three different ways of inserting their data using `timescaledb-parallel-copy`.

In the first test, the server-data relationship was local: the database ran on the same server as the data. The result was an upload time of 1 minute (60 seconds).

| Upload Origin | Cores | RAM (GB) | Workers | Upload Method | Parallel Processing | Server-data Relationship | Time (minutes) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Command Line | 8 | 64 | 1 | timescaledb-parallel-copy | No | local | 1.00 |

For their second trial, they enabled parallel processing and increased the number of workers from one to two to decrease the upload time further.

This resulted in an upload time of 0.35 minutes (21 seconds).

| Upload Origin | Cores | RAM (GB) | Workers | Upload Method | Parallel Processing | Server-data Relationship | Time (minutes) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Command Line | 8 | 64 | 2 | timescaledb-parallel-copy | Yes | local | 0.35 |

During the third trial, Code Willing set up TimescaleDB on a separate server and ran it continuously. This is described in the "Server-data Relationship" variable as "not local."

With this change in place, everything else remained the same, including parallel processing, except for the number of workers. After testing 1, 2, 4, 6, and 8 workers, they determined that 8 workers were the most efficient. The upload time with 8 workers was 0.13 minutes (8 seconds).

| Upload Origin | Cores | RAM (GB) | Workers | Upload Method | Parallel Processing | Server-data Relationship | Time (minutes) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Command Line | 8 | 64 | 8 | timescaledb-parallel-copy | Yes | not local | 0.13 |


Through this research, Code Willing improved on Pangres' upsert time by 99.1%, going from 14.23 minutes to 0.13 minutes, by using TimescaleDB's parallel copy functionality and implementing parallel processing with Ray.
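The article does not detail how Ray was wired in; one plausible shape, sketched here purely as an assumption, is to fan daily-file uploads out across Ray tasks, each driving an 8-worker copy against the remote database. The host, database, table, and file names are all invented for illustration:

```python
# Hypothetical Ray fan-out over daily files; the article confirms that Ray
# and 8 copy workers were used, but everything else here is illustrative.
import subprocess
import ray

ray.init()

@ray.remote
def upload_file(path: str) -> str:
    """Upload one daily CSV with an 8-worker parallel copy."""
    subprocess.run(
        [
            "timescaledb-parallel-copy",
            "--connection", "host=db.example.com user=postgres sslmode=disable",
            "--db-name", "marketdata",
            "--table", "ticks",
            "--file", path,
            "--workers", "8",        # the most efficient worker count found
            "--copy-options", "CSV",
        ],
        check=True,
    )
    return path

# Launch several daily files at once and block until all have loaded.
done = ray.get([upload_file.remote(p) for p in ["day1.csv", "day2.csv"]])
print(done)
```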

To break it down even further, this new method allowed them to copy 278,640 rows per second (2,229,120 rows in 8 seconds).

In the end, Code Willing's experiment determined that when uploading copious amounts of time-sequential data into a Postgres database (TimescaleDB), the quickest and most efficient approach is to use PostgreSQL's built-in `copy` functionality (via a tool like `timescaledb-parallel-copy`) together with parallel processing and a large number of workers.