Code Willing’s Client Featured in Online Article about Reducing Compute Costs by 90%

Risk.net, a website dedicated to risk management in complex markets, recently published an article highlighting Quantbot Technologies’ innovative use of Big Data with Code Willing’s CASFS+ product.

 

Quantbot presently uses this product to run a version of its quantitative investing platform efficiently in the cloud. After implementing CASFS+ into their strategy, they saw a 90% reduction in the cost of running their jobs. The article discusses the algorithm used to reduce their spending, along with some additional tools available in CASFS+.

 

And it’s not “Quants only.” Having built something that can give multiple businesses a faster and more cost-efficient way to run jobs, Code Willing is currently working to offer CASFS+ to other businesses in the AWS Marketplace. You can even contact them here to set up a demo and learn more about the product.

 

Link to the news article here:

https://www.risk.net/investing/7856556/this-quant-firm-cut-90-of-its-compute-costs-heres-how

Quantbot CEO & Co-Founder Talks Evolution of Data Science

Speaking at the Neudata Summer Online Summit 2021, Quantbot Technologies CEO and co-founder, Paul White, presented “When Data is Everything,” where he talked about how far we have come in the world of data science.

During the presentation, White discussed how Quantbot manages its ever-growing data science demands utilizing tools from Code Willing and Options IT for smart task routing in the private and public clouds.

One of the tools they are currently using is CASFS+, a cloud-based file system and cloud resource manager built by Code Willing. 

In particular, White dove into costs and how working with CASFS+ has given them the ability to control a user’s budget. 

When working with data science, the amount of data stored is continuously growing. While having as much information as possible is desirable, it can also present a problem when researching the data and attempting to keep costs to a minimum.

When using the CASFS+ program, Quantbot can set budget controls for every user, ensuring no one exceeds their budget. This lets the company rest easy knowing that the cost it forecast will indeed be the actual cost in the end.

Currently, CASFS+ is available as a fully managed option directly through Code Willing. For those working with a tighter budget, Code Willing will soon provide CASFS+ as a self-service option through the AWS Marketplace.

 

Click here to review the entire presentation of “When Data is Everything.”

 

Increasing Upload Speed With CASFS

100x Faster Database Inserts by Code Willing

Code Willing has determined the most efficient way to upload large amounts of data into a database hosted on their custom-built file system.

Following extensive research, Code Willing greatly decreased the time it takes to insert massive amounts of data into a TimescaleDB SQL database hosted on their file system, CASFS.

During their experiment, Code Willing tested two different upload methods: upserting data with Pangres and copying data with Timescale-parallel-copy.

The data used in this research consisted of daily files of time-sequential financial data, each with 2,229,120 rows and 19 columns.

Two variables were held constant throughout the investigation: the size of the financial data files, which remained identical across both upload tests, and the use of Pandas to process the data before inserting it into the database.

The Pangres method allows easy insertion from Pandas DataFrames into PostgreSQL databases via its upsert function, which takes advantage of Postgres’ insert functionality and handles issues like duplicates, missing columns, and datatypes.

By using the Pangres method, the upload time for inserting the data was 14.23 minutes (853.8 seconds).

 

| Upload Origin | Cores | RAM | Workers | Upload Method | Parallel Processing | Server_data relationship | Time (minutes) |
|---|---|---|---|---|---|---|---|
| Python Notebook | 8 | 64 | 1 | Pangres | No | none | 14.23 |

Following this result, Code Willing wanted to find a quicker and more efficient way to upload their data. That is where the second uploading method came in.

Timescale-parallel-copy is “a command-line program for parallelizing PostgreSQL’s built-in `copy` functionality for bulk inserting data into TimescaleDB.”
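As an illustration, a typical invocation might look like the one below. The connection string, database, table, and file names here are hypothetical, and exact flag names can vary between versions of the tool:

```shell
# Bulk-load a CSV into a TimescaleDB hypertable using 8 parallel workers.
# All connection details, names, and paths below are placeholders.
timescaledb-parallel-copy \
  --connection "host=localhost user=postgres sslmode=disable" \
  --db-name ticks_db \
  --table daily_ticks \
  --file data/2021-06-01.csv \
  --workers 8 \
  --skip-header
```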

Code Willing ran three different tests inserting their data with Timescale-parallel-copy.

In the first test, the server-data relationship was local. The upload took 1 minute (60 seconds).

 

| Upload Origin | Cores | RAM | Workers | Upload Method | Parallel Processing | Server_data relationship | Time (minutes) |
|---|---|---|---|---|---|---|---|
| Command Line | 8 | 64 | 1 | Timescale-parallel-copy | No | local | 1.00 |

 

For their second trial, they enabled parallel processing and increased the worker count by 1 to decrease the upload time further.

This resulted in an upload time of 0.35 minutes (21 seconds).

 

| Upload Origin | Cores | RAM | Workers | Upload Method | Parallel Processing | Server_data relationship | Time (minutes) |
|---|---|---|---|---|---|---|---|
| Command Line | 8 | 64 | 2 | Timescale-parallel-copy | Yes | local | 0.35 |

 

During the third trial, Code Willing set up a TimescaleDB on a separate server and ran it continuously. This is described in the variable “Server_data relationship” as “not local.”

With this change in place, everything else remained the same, including parallel processing; only the number of workers varied. After testing 1, 2, 4, 6, and 8 workers, they determined that 8 workers were the most efficient, with an upload time of 0.13 minutes (8 seconds).

 

| Upload Origin | Cores | RAM | Workers | Upload Method | Parallel Processing | Server_data relationship | Time (minutes) |
|---|---|---|---|---|---|---|---|
| Command Line | 8 | 64 | 8 | Timescale-parallel-copy | Yes | Not local | 0.13 |

 

Through this research, Code Willing improved on the Pangres upsert method by 99.1%, going from 14.23 minutes to 0.13 minutes, by using TimescaleDB’s parallel copy functionality and Ray’s parallel processing.

To break it down even further, this new method allowed them to copy 278,640 rows per second.
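These headline numbers can be checked with a little arithmetic, using only the figures reported above:

```python
# Verify the reported throughput and speedup from the benchmark figures.
rows_per_file = 2_229_120   # rows in each daily file
pangres_seconds = 853.8     # Pangres upsert: 14.23 minutes
parallel_seconds = 8.0      # Timescale-parallel-copy with 8 workers

throughput = rows_per_file / parallel_seconds
speedup = pangres_seconds / parallel_seconds
reduction_pct = (pangres_seconds - parallel_seconds) / pangres_seconds * 100

print(f"{throughput:,.0f} rows/sec")      # 278,640 rows/sec
print(f"{speedup:.0f}x faster")           # ~107x faster
print(f"{reduction_pct:.1f}% less time")  # 99.1% less time
```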

In the end, Code Willing’s experiment determined that when uploading copious amounts of time-sequential data into a Postgres database (TimescaleDB), the quickest and most efficient approach is to use the built-in “copy” function (via Timescale-parallel-copy) together with parallel processing and a large number of workers.

 

A Workaround for Workarounds


Solving Network File System Scaling Issues

It’s no secret the data science community has been addressing the reproducibility problem for some time now.

Data scientists are searching for better tools to manage their work environments, as extensive machine learning models begin to be deployed in the real world. 

Ultimately, academia followed what industry was already doing: building cloud platforms.

Many among us suspected these platforms would not solve the problem. Rather, they would only introduce bigger problems and add complexity to existing legacy data science code.

Consequently, several startups have emerged to try to solve the problem of cloud-based file systems and platforms.

Some orchestration platforms approach reproducibility by containerizing the environment so it can run anywhere quickly, without the user worrying about how the operating system or environment is managed.

Although this type of platform is beneficial in some areas, it makes workflows inflexible in the context of data science and denies users a convenient, easy-to-understand way of working.

Painful things like understanding how new platforms work, debugging running code, interacting with GPUs, file transfers, environment configuration, and deployments take as much as 60 to 80 percent of a developer’s time. Even if it’s elastic, it needs to be easier to use for a data scientist or developer.  

But now there’s an answer.

A new platform currently available on the market, CASFS+, has emerged and successfully solves the problem of data science workflow issues with a file sharing service and secure cloud storage.

The CASFS+ Platform and Cloud Storage Solution

CASFS+ focuses on existing architecture, solving the problems associated with containerized environments and with large, elastic files.

Even more promising, it does this without changing the workflow from writing to running code, or altering the existing processes many companies already have in place.

Rather than forcing you to work with a solution, the CASFS+ platform gives data scientists the freedom to focus on writing code.

CASFS+ Approach

The CASFS+ approach can be summed up in its mantra: “Making the full power of the cloud as easy to use as your private data center.” 

Most competitors have a similar approach of setting up a substantial data storage service. They typically use an SQL-like interface to a distributed model.

But with CASFS+, data scientists work on their servers with their tool of choice. 

This includes Git, Python’s Pandas, NumPy, Matplotlib, and JupyterLab, along with other tools like VS Code for writing and Git/SVN for versioning, all running on the platform’s network file system.

From there, you can run your code in a cluster environment using SGE or Ray. 

And because the entire process uses live servers instead of containers, access to the running process for debugging is accessible via SSH. 

CASFS+ is a secure cloud storage platform that centralizes workflows and simplifies interactions.

You can sync your work and datasets directly from your local PC to CASFS+. Think of it as a “lift and shift” of all your data. 

You run your application in the cloud, then sync your work back to your local machine once you’re finished. Throughout all of this, CASFS+ lets you keep working without any change in workflow.

Why Choose CASFS+?

The problem with any platform, including the newer platforms that need scaling, is pretty much the same. 

They want to compel people to use their platform. 

That is precisely the reason most users are currently looking for a system that won’t force them to change their platform. And now users can have it.

CASFS+ stands by that promise and is available on the market now along with a number of incredible features.


Innovation in the Cloud

CASFS+ promises several innovative features to the cloud file system platform. 

Some features include deduplication of files, file usage per user stats, Prometheus stats availability, near instant availability of files once uploaded into the file object storage back end, and cost control.

The deduplication feature is highly valued and it’s not difficult to appreciate why. 

Redundant data consumes a great deal of space, and immense datasets often contain a considerable amount of duplication.

The CASFS+ platform supports storage space administrators with the deduplication feature to help them reduce costs associated with duplicated data.

There are also issues with scaling network file systems in the cloud; scalability remains a significant problem.

Many competitors struggle because their back ends cannot scale as quickly as users pour into their applications.

CASFS+ has solved these network file system issues with strong performance and scaling.

Main Features

“The cloud tools available from CASFS+ make the full power of the cloud as easy to use as your private data center.” 

Using the idea of “Lift and Shift,” CASFS+ cloud tools are quick to set up and you can begin running your jobs without wasting days or weeks setting up the cloud.

The plug-and-play feature lets you simply run your existing on-premise code in the cloud. No changes are required to access all the power and benefits of the cloud.

The high-performance POSIX file system of CASFS+ is capable of spanning hundreds of machines.

It can manage billions of files and thousands of petabytes of data while delivering the full aggregate bandwidth of managed object stores.

API access lets you program and leverage the full power of CASFS+ cloud tools, giving you control over your cloud computing environment.

In addition to AWS’s safeguards, your environment provides its own security features.

The team management tools offered by CASFS+ help you control access and budgets for your project.

One security feature allows each budget to be managed by the team leader or administrator. With this feature, each user is limited in the amount of money they can spend on their cluster each month. A user at or over 100% of their budget cannot create new servers.
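As a sketch of how such a rule might behave (this is purely illustrative, not Code Willing’s implementation; the function name and parameters are made up for the example):

```python
def can_create_server(spent: float, monthly_budget: float) -> bool:
    """Illustrative budget gate: a user at or over 100% of their
    monthly budget may not create new servers."""
    return spent < monthly_budget

# A user under budget may create servers; one at 100% may not.
print(can_create_server(80.0, 100.0))   # True
print(can_create_server(100.0, 100.0))  # False
```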

In addition to budget control, users will be restricted from root-level access to the underlying S3 bucket or file system.

All of these capabilities empower you to manage the way your team leverages the public cloud.

Getting started is easy using VS Code, Jupyter, or anything else. All you need is a browser.

Pricing

All the CASFS+ features come at a competitive rate. 

To host CASFS+, go to the AWS Marketplace to start the setup process. Note that there are no additional charges from CASFS+ to run the main server; however, customers will require an AWS server, and AWS fees will apply.

The CASFS+ fee is set up per connection at a rate of $0.30 per hour. Standard email support is included with this rate.

They also offer premium support options. For additional information on these options or to try out CASFS+, you can contact their team directly.

Luke Davis Becomes Chief Project Manager at Code Willing


Code Willing, a leading independent provider of trading and data management technology for quantitative analytics and global multi-asset electronic trading, is pleased to announce that Luke Davis, who has served as a Director of Business Operations for the Company since December 2015, has assumed the role of Chief Project Manager. 

Based out of Code Willing’s new office in New Orleans and reporting to CEO Baron Davis, Luke’s key responsibilities include working closely with the development teams to help them achieve their goals, meeting clients’ needs in a timely and efficient manner, identifying opportunities to improve service delivery, and collaborating with the wider executive team on strategy and product innovation.

Baron commented, 


“We are delighted that Luke has accepted this opportunity to be a bigger part of Code Willing. Luke brings valuable insights as a problem solver, which will be pivotal as we continue to build momentum on our strategic growth objectives. We are excited to have Luke as an official permanent member of the executive team.”


Luke said, 


“Having seen over the past 4 years how forward-thinking, innovative, and client-focused the Code Willing team is, I am delighted to be formalizing my position as Chief Project Manager. I look forward to continuing to lead our development team toward its goals and contributing to the company’s strategic plans.”


Prior to Code Willing, Luke spent much of his career in the legal field and property management.


About Code Willing

Code Willing is a leader in data management solutions for the financial industry. Built on 20+ years of experience in fintech and trading, Code Willing offers data management services, cloud analysis tools, low latency market data feed handlers and scalable high-performance file storage. For more information, please visit www.codewilling.com. Follow on Twitter @codewilling.