Why Cloud Scalability Matters

When it comes to the nuts and bolts of running a business, a lot has changed over the past 40+ years. It’s hard to believe we used to physically write everything down: appointment times, receipts, data, calculations; the list goes on. Then the personal computer walked in.

Now we had a way to drop the pencil and put our fingers to work. Now we could locate what we were working on instead of flipping through hundreds of pages. Once computers made it into the mainstream, businesses could instantly find what they needed, cutting out enormous amounts of wasted time.

Of course, there was eventually going to be something that would go beyond what a personal computer could do. That’s when cloud computing began to make its debut in the business market.

But what makes cloud computing better than what businesses have been using for 40 years? The answer: scalability.

Here, we will discuss why scalability in cloud computing matters to your business and how turning to cloud computing will be the best decision you ever made for your business.

Keeps Pace with Business Growth

By definition, scalability is the capacity to be changed in size or scale. In cloud computing, scalability means having a system in place that can easily scale up or down, depending on your business, and can do it instantaneously.

Before the cloud, businesses relied wholly on physical servers to keep up with their growth. While that method can get the job done, cloud computing does it faster and more efficiently.

As your business grows, there will come a point when the infrastructure holding all your data reaches capacity. For years, the answer in an on-premise environment was to add another server to the system. But this takes time: a budget has to be approved, the equipment has to be bought, and time has to be set aside to set up the new server and connect it to the ones you already have.

And once the setup is done, the server (along with all the others) has to be maintained by your IT staff. Murphy’s Law states that if something can go wrong, it will, so troubleshooting becomes a major part of maintaining your servers. This eats into the time people have to get their work done, and it keeps your IT staff from working on other important projects.

Scalability in cloud computing keeps up with the size of your business, and it can do it now, not in weeks. As your business grows, so too will the storage capacity available to it. Should you choose to downscale, cloud scalability gives you the freedom to reduce your level of storage, so you are not wasting space, time, or money.

Reduce Cost Without Sacrificing Progress

That brings us to the next reason cloud scalability matters for your business. With servers in an on-premise environment, as your business grows, the cost of running it grows right along with it.

As you grow, additional servers need to be moved into your office. Most businesses end up dedicating an entire room to the servers that run the company. Not only does this hardware take up physical space, it also costs a good chunk of change to maintain.

Let’s say you have seven servers. At any given time, two or three could be down. That pulls at least two IT employees away from their everyday responsibilities to fix the servers and get them back up so everyone can continue working. Not only does this cost you money; the downtime usually means you are losing money too.

By using cloud computing services, there is no need for all the hardware or the extra room. There is no need to take your IT staff away from what they need to be doing.

Cloud computing can be as big as you need it to be without taking much time and with no physical space at all. On-premise environments can also be as big as you want them to be. The difference is that with on-premise, you keep buying hardware to hold all your data, and to keep your business running you have to keep buying better hardware as the machines you already own become outdated or break.

On-premise is never a one-and-done kind of thing; there will always be upgrades and updates. In cloud computing, an update can be applied automatically without interfering with work. No hardware required, and no extra cost.

It can be difficult to change what you are used to. You have your system and it works. But what if it could work better and grow faster, all while costing you less? That is what scalability in cloud computing can do for you.

Ready to see how cloud computing can make your business more efficient? Click here to learn more about Code Willing’s cloud-based file system and resource manager, CASFS+.

Code Willing’s Client Featured in Online Article about Reducing Compute Costs by 90%

Risk.net, a website dedicated to risk management in complex markets, recently published an article highlighting Quantbot Technologies’ innovative strides in Big Data when utilizing Code Willing’s CASFS+ product.

Quantbot presently uses this product to effectively and efficiently run a version of its quantitative investing platform on the cloud. By implementing CASFS+ into their strategy, they achieved a 90% reduction in the cost of running their jobs. The article discusses the algorithm used to reduce their spending, along with some of the other effective tools available in CASFS+.

And it’s not “Quants only.” Having built something that can provide multiple businesses with a faster and more cost-efficient way to run jobs, Code Willing is currently working to offer CASFS+ to other businesses through the AWS Marketplace. You can even contact them here to set up a demo and learn more about the product.

Link to the news article here:

https://www.risk.net/investing/7856556/this-quant-firm-cut-90-of-its-compute-costs-heres-how

Quantbot CEO & Co-Founder Talks Evolution of Data Science

Speaking at the Neudata Summer Online Summit 2021, Quantbot Technologies CEO and co-founder, Paul White, presented “When Data is Everything,” where he talked about how far we have come in the world of data science.

During the presentation, White discussed how Quantbot manages its ever-growing data science demands utilizing tools from Code Willing and Options IT for smart task routing in the private and public clouds.

One of the tools they are currently using is CASFS+, a cloud-based file system and cloud resource manager built by Code Willing. 

In particular, White dove into costs and how working with CASFS+ has given them the ability to control a user’s budget. 

When working with data science, the amount of data stored is continuously growing. While having as much information as possible is desirable, it can also present a problem when researching the data and attempting to keep costs to a minimum.

When using the CASFS+ program, Quantbot can set budget controls for every user, ensuring no one exceeds their budget. Ultimately, this allows the company to rest easy knowing that the cost it forecasted will indeed be the actual cost in the end.

Currently, CASFS+ is available as a fully managed option directly through Code Willing. For those working with a tighter budget, Code Willing will soon provide CASFS+ as a self-service option through the AWS Marketplace.

 

Click here to review the entire presentation of “When Data is Everything.”

 

Increasing Upload Speed With CASFS

100x Faster Database Inserts by Code Willing

Code Willing has determined the most efficient way to upload large amounts of data into a database hosted on their custom-built file system.

Following extensive research, Code Willing greatly decreased the time it takes to insert massive amounts of data into a TimescaleDB SQL database hosted on their file system, CASFS.

During their experiment, Code Willing used two different types of uploading methods: upserting data using Pangres and copying the data by using Timescale-parallel-copy.

The data used in this research consisted of daily files of time-sequential financial data, each with 2,229,120 rows and 19 columns.

Two variables were held constant throughout the investigation: the financial data files were identical across all uploading tests, and Pandas was used to process the data before inserting it into the database.

The Pangres method allows for easy insertion from Pandas dataframes into PostgreSQL databases using its upsert method. The upsert takes advantage of Postgres’ insert functionality and handles issues like duplicates, missing columns, and datatypes.

By using the Pangres method, the upload time for inserting the data was 14.23 minutes (853.8 seconds).
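The experiment’s actual code isn’t published, but the upsert pattern Pangres relies on is Postgres’ `INSERT ... ON CONFLICT DO UPDATE`. A minimal stdlib sketch of that pattern, using SQLite purely for illustration (its upsert syntax mirrors Postgres’); the table and column names here are hypothetical:

```python
import sqlite3

# In-memory table keyed on timestamp, standing in for a TimescaleDB table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (ts TEXT PRIMARY KEY, price REAL)")

rows = [("2021-06-01T09:30:00", 101.5),
        ("2021-06-01T09:30:00", 102.0)]  # duplicate key on purpose

for ts, price in rows:
    # Upsert: the duplicate key updates the row in place instead of erroring
    conn.execute(
        "INSERT INTO ticks (ts, price) VALUES (?, ?) "
        "ON CONFLICT(ts) DO UPDATE SET price = excluded.price",
        (ts, price),
    )

print(conn.execute("SELECT ts, price FROM ticks").fetchall())
# -> [('2021-06-01T09:30:00', 102.0)]
```

This conflict handling is convenient, but it is also part of why row-by-row upserts are slower than a raw bulk copy.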

 

| Upload Origin   | Cores | RAM (GB) | Workers | Upload Method | Parallel Processing | Server-data relationship | Time (minutes) |
|-----------------|-------|----------|---------|---------------|---------------------|--------------------------|----------------|
| Python Notebook | 8     | 64       | 1       | Pangres       | No                  | none                     | 14.23          |

Following this result, Code Willing wanted to find a quicker and more efficient way to upload their data. That is where the second uploading method came in.

Timescale-parallel-copy is “a command-line program for parallelizing PostgreSQL’s built-in `copy` functionality for bulk inserting data into TimescaleDB.”

Code Willing tested three different configurations for inserting the data with Timescale-parallel-copy.

In the first test, the server-data relationship was local: the database ran on the same machine performing the upload. This resulted in an upload time of 1 minute (60 seconds).

 

| Upload Origin | Cores | RAM (GB) | Workers | Upload Method           | Parallel Processing | Server-data relationship | Time (minutes) |
|---------------|-------|----------|---------|-------------------------|---------------------|--------------------------|----------------|
| Command Line  | 8     | 64       | 1       | Timescale-parallel-copy | No                  | local                    | 1.00           |

 

For their second trial, they enabled parallel processing and increased the worker count by 1 to decrease the upload time further.

This resulted in an upload time of 0.35 minutes (21 seconds).

 

| Upload Origin | Cores | RAM (GB) | Workers | Upload Method           | Parallel Processing | Server-data relationship | Time (minutes) |
|---------------|-------|----------|---------|-------------------------|---------------------|--------------------------|----------------|
| Command Line  | 8     | 64       | 2       | Timescale-parallel-copy | Yes                 | local                    | 0.35           |

 

During the third trial, Code Willing set up a TimescaleDB on a separate server and ran it continuously. This is described in the variable “Server_data relationship” as “not local.”

With this change in place, everything else remained the same, including parallel processing, except for the number of workers. After testing 1, 2, 4, 6, and 8 workers, they determined that 8 workers were the most efficient, with an upload time of 0.13 minutes (8 seconds).

 

| Upload Origin | Cores | RAM (GB) | Workers | Upload Method           | Parallel Processing | Server-data relationship | Time (minutes) |
|---------------|-------|----------|---------|-------------------------|---------------------|--------------------------|----------------|
| Command Line  | 8     | 64       | 8       | Timescale-parallel-copy | Yes                 | not local                | 0.13           |
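The exact harness used in these trials isn’t published, but the effect of the Workers column can be sketched in stdlib Python: split one daily file’s rows into 8 equal batches and hand each to its own worker. The `copy_batch` stub below is hypothetical; the real runs delegated this to Timescale-parallel-copy’s worker setting.

```python
from concurrent.futures import ThreadPoolExecutor

def copy_batch(batch):
    # Hypothetical stand-in for one worker bulk-copying its batch of rows
    return len(batch)

rows = range(2_229_120)          # one daily file's worth of rows
workers = 8
size = len(rows) // workers      # 2,229,120 divides evenly into 8 batches
batches = [rows[i:i + size] for i in range(0, len(rows), size)]

with ThreadPoolExecutor(max_workers=workers) as pool:
    inserted = sum(pool.map(copy_batch, batches))

print(len(batches), inserted)  # 8 2229120
```

Each worker handles 278,640 rows, which is why adding workers keeps cutting the wall-clock time until the database or network becomes the bottleneck.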

 

Through this research, Code Willing improved on the Pangres upsert time by 99.1%, going from 14.23 minutes to 0.13 minutes, by using TimescaleDB’s parallel-copy functionality and implementing Ray’s parallel processing.

To break it down even further, this new method allowed them to copy 278,640 rows per second.
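Both headline numbers follow directly from the figures above, and are easy to verify:

```python
rows_per_file = 2_229_120   # rows in each daily file
pangres_secs = 853.8        # 14.23 minutes (Pangres upsert)
parallel_secs = 8           # 0.13 minutes (8 workers, parallel copy)

throughput = rows_per_file / parallel_secs
improvement = (pangres_secs - parallel_secs) / pangres_secs

print(round(throughput))            # 278640 rows per second
print(round(improvement * 100, 1))  # 99.1 (percent improvement)
```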

In the end, Code Willing’s experiment determined that when uploading copious amounts of data, the quickest and most efficient approach is to use the built-in “copy” function (as Timescale-parallel-copy does) together with parallel processing across a large number of workers, to load time-sequential data into a Postgres (TimescaleDB) database.

 

A Workaround for Workarounds


Solving Network File System Scaling Issues

It’s no secret the data science community has been addressing the reproducibility problem for some time now.

Data scientists are searching for better tools to manage their work environments, as extensive machine learning models begin to be deployed in the real world. 

Ultimately, academia followed what commercial players were doing: building cloud platforms.

Many suspected these platforms would not solve the problem. Rather, they would only introduce bigger problems and add complexity to existing legacy data science code.

Consequently, several startups have emerged to try to solve the problem of cloud-based file systems and platforms.

Some orchestration platforms are pitched as a way to make data science reproducible by containerizing the environment so it can run anywhere quickly, without worrying about how the operating system or environment is managed.

Although this type of platform is beneficial in some areas, it makes workflows inflexible in the context of data science. It doesn’t give users a convenient, easy-to-understand way to work.

Painful tasks like understanding how a new platform works, debugging running code, interacting with GPUs, transferring files, configuring environments, and handling deployments can take as much as 60 to 80 percent of a developer’s time. Even if a platform is elastic, it needs to be easier for a data scientist or developer to use.

But now there’s an answer.

A new platform currently available on the market, CASFS+, has emerged and successfully solves these data science workflow issues with a file sharing service and secure cloud storage.

The CASFS+ Platform and Cloud Storage Solution

CASFS+ focuses on existing architecture, solving the problems associated with containerized environments and the large/elastic file issues they inherit.

Even more promising, it does this without changing workflows, from writing to running, or disrupting the existing processes many companies already have in place.

Rather than forcing you to work with a solution, the CASFS+ platform gives data scientists the freedom to focus on writing code.

CASFS+ Approach

The CASFS+ approach can be summed up in its mantra: “Making the full power of the cloud as easy to use as your private data center.” 

Most competitors have a similar approach of setting up a substantial data storage service. They typically use an SQL-like interface to a distributed model.

But with CASFS+, data scientists work on their servers with their tool of choice. 

This includes Git, Python’s Pandas, NumPy, Matplotlib, and JupyterLab. It also includes other tools like VS Code for writing and Git/SVN for versioning, all running on the platform’s network file system.

From there, you can run your code in a cluster environment using SGE or Ray. 

And because the entire process uses live servers instead of containers, the running process is accessible for debugging via SSH.

CASFS+ is a secure cloud storage platform that centralizes workflows and simplifies interactions.

You can sync your work and datasets directly from your local PC to CASFS+. Think of it as a “lift and shift” of all your data. 

You run your application in the cloud and can sync your work back to your local machine once you’re finished. All the while, CASFS+ lets you keep working without any change in workflow.

Why Choose CASFS+?

The problem with any platform, including the newer platforms built for scaling, is pretty much the same.

They want to compel people to use their platform.

That is precisely why most users are currently looking for a system that won’t force them onto a new platform. And now users can have it.

CASFS+ stands by that promise and is available on the market now along with a number of incredible features.


Innovation in the Cloud

CASFS+ promises several innovative features to the cloud file system platform. 

Those features include deduplication of files, per-user file usage stats, Prometheus stats, near-instant availability of files once uploaded to the object-storage back end, and cost control.

The deduplication feature is highly valued and it’s not difficult to appreciate why. 

Redundant data consumes a great deal of space, and immense datasets often contain a considerable amount of duplication.

The CASFS+ platform’s deduplication feature helps storage administrators reduce the costs associated with duplicated data.

There are also issues with scaling a network file system in the cloud; these scalability problems are significant and remain unsolved on many platforms.

Many competitors struggle because their back end cannot scale as fast as users pour into their applications.

CASFS+ has solved these network file system scaling issues, delivering both performance and scale.

Main Features

“The cloud tools available from CASFS+ make the full power of the cloud as easy to use as your private data center.” 

Using the idea of “Lift and Shift,” CASFS+ cloud tools are quick to set up and you can begin running your jobs without wasting days or weeks setting up the cloud.

The plug-and-play feature lets you simply run your existing on-premise code in the cloud. No changes are required, and you get all the power and benefits of the cloud.

The high-performance POSIX file system of CASFS+ is capable of spanning hundreds of machines.

It can manage billions of files and thousands of petabytes of data while delivering the full aggregate bandwidth of managed object stores.

API access lets you program against the CASFS+ cloud tools and leverage their full power, giving you control over your cloud computing environment.

Your environment provides security features in addition to those built into AWS.

The team management tools offered by CASFS+ help you control access and budgets for your project.

One security feature allows each budget to be managed by the team leader or an administrator. With this feature, each user is limited in the amount of money they can spend each month on their cluster; once a user exceeds 100% of their budget, they can no longer create new servers.
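CASFS+’s internal API isn’t public, but the policy just described amounts to a simple gate. A hypothetical sketch (the function and parameter names are illustrative, not CASFS+’s actual interface):

```python
def can_create_server(spent: float, monthly_budget: float) -> bool:
    """Hypothetical check mirroring the described policy: once a user
    has spent 100% or more of their monthly budget, new server
    creation is denied."""
    return spent < monthly_budget

print(can_create_server(spent=750.0, monthly_budget=1000.0))   # True
print(can_create_server(spent=1000.0, monthly_budget=1000.0))  # False
```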

In addition to budget control, users are restricted from root-level access to the underlying S3 bucket or file system.

All of these capabilities empower you to manage the way your team leverages the public cloud.

Getting started is easy, whether you use VS Code, Jupyter, or anything else. All you need is a browser.

Pricing

All the CASFS+ features come at a competitive rate. 

To host CASFS+, you can go to the AWS Marketplace to start the setup process. It should be noted there are no additional charges from CASFS+ to run the main server; however, customers will need an AWS server, and AWS fees will apply.

The CASFS+ fee is set up per connection at a rate of $0.30 per hour. Standard email support is included with this rate.

They also offer premium support options. For additional information on these options or to try out CASFS+, you can contact their team directly.