A Workaround for Workarounds

Solving Network File System Scaling Issues

It’s no secret the data science community has been addressing the reproducibility problem for some time now.

Data scientists are searching for better tools to manage their work environments as ever-larger machine learning models are deployed in the real world.

Ultimately, academia followed industry’s lead and built cloud platforms.

Many of us suspected these platforms would not solve the problem. Rather, they would only introduce bigger problems and add complexity to existing legacy data science code.

Consequently, several startups have emerged to try to solve the problems of cloud-based file systems and platforms.

Some orchestration platforms approach reproducibility by containerizing the environment so it can run anywhere quickly, without the user having to worry about how the operating system or environment is managed.

Although this type of platform is beneficial in some areas, it makes workflows inflexible in the context of data science. It doesn’t let users work in a convenient, easy-to-understand way.

Painful chores like learning how a new platform works, debugging running code, interacting with GPUs, transferring files, configuring environments, and deploying can consume as much as 60 to 80 percent of a developer’s time. Even if a platform is elastic, it needs to be easier for a data scientist or developer to use.

But now there’s an answer.

A new platform currently on the market, CASFS+, has emerged and successfully addresses these data science workflow issues with a file-sharing service and secure cloud storage.

The CASFS+ Platform and Cloud Storage Solution

CASFS+ works with existing architecture, solving the problems associated with containerized environments and the large, elastic file workloads they inherit.

Even more promising, it does this without changing how code is written or run, and without disrupting the workflow processes already in place at many companies.

Rather than forcing you to adapt to a solution, the CASFS+ platform gives data scientists the freedom to focus on writing code.

CASFS+ Approach

The CASFS+ approach can be summed up in its mantra: “Making the full power of the cloud as easy to use as your private data center.” 

Most competitors take a similar approach: they set up a substantial data storage service, typically exposing an SQL-like interface to a distributed back end.

But with CASFS+, data scientists work on their servers with their tool of choice. 

This includes Python’s Pandas, NumPy, Matplotlib, and JupyterLab, along with Git/SVN for versioning and editors like VS Code for writing, all running on the platform’s network file system.

From there, you can run your code in a cluster environment using SGE or Ray. 
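
As an illustration, a minimal Ray job looks like the sketch below. This is a hedged example assuming Ray is installed on the cluster; the function and data are placeholders, not CASFS+ code:

    import ray

    # Connect to the running cluster, or start a local one if none exists.
    ray.init()

    @ray.remote
    def process_chunk(chunk):
        # Placeholder for real per-chunk work, e.g. feature computation.
        return sum(chunk)

    # Fan the chunks out across the cluster and gather the results.
    chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    futures = [process_chunk.remote(c) for c in chunks]
    print(ray.get(futures))  # -> [6, 15, 24]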

And because the entire process uses live servers instead of containers, you can SSH directly into a running process to debug it.

CASFS+ is a secure cloud storage platform that centralizes workflows and simplifies interactions.

You can sync your work and datasets directly from your local PC to CASFS+. Think of it as a “lift and shift” of all your data. 

You run your application in the cloud and can sync your work back to your local machine once you’re finished. All the while, CASFS+ lets you keep working without any change in workflow.
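
As a rough illustration of the lift-and-shift idea, the following hedged Python sketch copies a local working directory up to S3-style object storage with boto3. The directory and bucket names are hypothetical, and CASFS+ may provide its own sync tooling:

    import os
    import boto3

    s3 = boto3.client("s3")
    LOCAL_DIR = "project"          # hypothetical local working directory
    BUCKET = "my-casfs-workspace"  # hypothetical bucket name

    # Walk the local tree and upload every file, preserving relative paths.
    for root, _dirs, files in os.walk(LOCAL_DIR):
        for name in files:
            local_path = os.path.join(root, name)
            key = os.path.relpath(local_path, LOCAL_DIR)
            s3.upload_file(local_path, BUCKET, key)
            print(f"uploaded {local_path} -> s3://{BUCKET}/{key}")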

Why Choose CASFS+?

The problem with most platforms, including the newer platforms built for scale, is much the same.

They want to compel people to use their platform. 

That is precisely why most users are looking for a system that won’t force them to change how they already work. And now they can have it.

CASFS+ stands by that promise and is available on the market now along with a number of incredible features.

Innovation in the Cloud

CASFS+ brings several innovative features to the cloud file system space.

These include file deduplication, per-user file usage statistics, Prometheus metrics, near-instant availability of files once they are uploaded to the object storage back end, and cost controls.

The deduplication feature is highly valued and it’s not difficult to appreciate why. 

Redundant data consumes enormous amounts of space, and immense datasets often contain a considerable amount of duplication.

The deduplication feature helps storage administrators reduce the costs associated with duplicated data.
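
The core idea behind content-based deduplication can be sketched in a few lines of Python: hash each file’s contents and treat files with matching digests as duplicates. This is a simplified illustration, not the CASFS+ implementation:

    import hashlib
    import os

    def find_duplicates(top_dir):
        """Yield (duplicate, original) pairs of files with identical contents."""
        seen = {}  # digest -> first path seen with that content
        for root, _dirs, files in os.walk(top_dir):
            for name in files:
                path = os.path.join(root, name)
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                if digest in seen:
                    yield path, seen[digest]
                else:
                    seen[digest] = path

    for dup, original in find_duplicates("datasets"):
        print(f"{dup} duplicates {original}")

Production systems typically deduplicate at the block level rather than on whole files, but the principle is the same.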

There are also long-standing issues with scaling network file systems in the cloud; significant scalability problems remain.

Many competitors struggle because their back ends cannot scale as quickly as users pour into their applications.

CASFS+ has solved these network file system scaling issues without sacrificing performance.

Main Features

“The cloud tools available from CASFS+ make the full power of the cloud as easy to use as your private data center.” 

Using the idea of “Lift and Shift,” CASFS+ cloud tools are quick to set up and you can begin running your jobs without wasting days or weeks setting up the cloud.

The plug-and-play feature lets you run your existing on-premises code in the cloud. No changes are required, and you get all the power and benefits of the cloud.

The high-performance POSIX file system of CASFS+ is capable of spanning hundreds of machines.

It can manage billions of files and thousands of petabytes of data while delivering the full aggregate bandwidth of managed object stores.

API access lets you program and leverage all of the CASFS+ cloud tools, giving you power and control over your cloud computing environment.

Your environment provides security features in addition to those built into AWS.

The team management tools offered by CASFS+ help you control access and budgets for your projects.

One security feature allows each budget to be managed by a team leader or administrator. Each user is limited in how much they can spend on their cluster each month; once a user exceeds 100% of their budget, they can no longer create new servers.
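
The enforcement rule is easy to picture. Here is a minimal, hypothetical sketch of the check described above, not the CASFS+ code:

    def can_create_server(monthly_spend: float, monthly_budget: float) -> bool:
        """Users over 100% of their monthly budget may not create new servers."""
        return monthly_spend <= monthly_budget

    print(can_create_server(510.0, 500.0))  # False: over budget, blocked
    print(can_create_server(120.0, 500.0))  # True: within budget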

In addition to budget control, users will be restricted from root-level access to the underlying S3 bucket or file system.

Together, these capabilities empower you to manage the way your team leverages the public cloud.

Getting started is easy using VS Code, Jupyter, or anything else; all you need is a browser.

Pricing

All the CASFS+ features come at a competitive rate. 

To host CASFS+, go to the AWS Marketplace to start the setup process. Note that there are no additional charges from CASFS+ to run the main server; however, customers will need an AWS server, and AWS fees will apply.

The CASFS+ fee is charged per connection at a rate of $0.30 per hour. Standard email support is included at this rate.
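
As a rough worked example, a team that keeps ten connections open eight hours a day for 22 working days would pay 10 × 8 × 22 × $0.30 = $528 per month in CASFS+ fees, plus the underlying AWS charges.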

They also offer premium support options. For additional information on these options or to try out CASFS+, you can contact their team directly.

Code Willing Enters into Strategic Partnership with Phitopolis

Code Willing and Phitopolis partner to bring the best financial data management technology and market data feed handlers to clients all over the world.

Code Willing, a leading financial data management service in the fintech industry, announced today that it has formed a partnership with Phitopolis, a high-end technology company located in the Philippines, to assist in the software development process and to extend the global reach of Code Willing’s data services.

According to Code Willing, the duo has already successfully completed several proofs-of-concept that will enable Code Willing to utilize the latest technology to deliver better results to clients from their data services. The company further explained:


“This allows Code Willing’s existing and future clients to leverage the improved performance of our services. They will get reliable results so that they can continue to focus on building strategies and trading.”


Code Willing also noted that the partnership aims to deliver the best-of-breed hardware and software solutions to clients that need market-leading latency and performance as well as global scale and coverage from their latency-sensitive trading applications. The two companies will work together to identify opportunities to bring the technology to market through Code Willing’s range of data management services, market data feed handlers and analysis tools. While sharing more details about the partnership, Mark Walbaum, Chief Technology Officer and Co-Founder of Phitopolis, stated: 


“This partnership with Code Willing aligns with our strategy to provide the latest and greatest technology to users all over the world. We are focused on providing developmental and operational support to Code Willing. We have an experienced and talented team so that Code Willing can reach new heights.”


Baron Davis, CEO of Code Willing, added:


“We are delighted to be working with Phitopolis as Code Willing continues to focus on providing clients with flexible and transparent high-performance solutions for their latency-sensitive trading strategies. Leveraging Phitopolis’ multi-talented team to accelerate and enhance Code Willing’s data management offering is only the first step. We expect that this partnership will be the first of many opportunities to further enhance the data services being delivered to Code Willing clients.”


About Code Willing

Code Willing is a leader in data management solutions for the financial industry. Built on 20+ years of experience in fintech and trading, Code Willing offers data management services, cloud analysis tools, low latency market data feed handlers and scalable high-performance file storage. For more information, please visit www.codewilling.com. Follow on Twitter @codewilling.


About Phitopolis

Phitopolis enables financial companies to evaluate and run multiple Big Data solutions quickly, simply, reliably, securely and cost-effectively. Phitopolis is committed to delivering purpose-built Big Data solutions and services for the management and integration of commercial and proprietary technologies across multiple platforms. For more information, please visit www.phitopolis.com.

The 3 Most Common Data Management Challenges in the Financial Industry

Financial institutions look to grow their revenue by reducing risk, cutting costs, and making wise business decisions. Those decisions increasingly rely on large volumes of data, which can pose serious production challenges in areas including data ingestion, data quality, and data production.

Production Challenges

If these problems are not overcome, they will become detrimental to your business, needlessly wasting time, manpower, and money.

1. Manual Data Ingestion

It can be difficult and time-consuming to track and ingest so much data from so many vendors. If much of your organization’s time and resources are committed to ingestion, an automated approach can help refocus your efforts on strategy and trading. Thoroughly tested extract-transform-load (ETL) pipelines, managed by an experienced operations team and paired with meaningful, well-laid-out dashboards, make data ingestion easy, giving your team the confidence and strong foundation needed to build a performant data analytics system.
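
As a minimal illustration of the ETL pattern (the file names and columns below are hypothetical, and Parquet output assumes pyarrow is installed):

    import pandas as pd

    # Extract: read a raw vendor file (hypothetical path and schema).
    raw = pd.read_csv("vendor_prices_raw.csv")

    # Transform: normalize column names, parse timestamps, drop bad rows.
    raw.columns = [c.strip().lower() for c in raw.columns]
    raw["timestamp"] = pd.to_datetime(raw["timestamp"], errors="coerce")
    clean = raw.dropna(subset=["timestamp", "price"])

    # Load: write the cleaned table to a columnar format for fast queries.
    clean.to_parquet("vendor_prices_clean.parquet", index=False)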

2. Poor Data Quality

Not having the proper analytics is like steering a ship blind, and poor data quality is just as bad, if not worse. If you cannot rely on your data for accuracy, then you cannot rely on the forecasts drawn from it. Data quality should be built into the data production pipeline early rather than late, so issues can be found, marked, and fixed before production datasets are built.

Cleaning data draws focus away from your prime goals. Using machine learning and other statistical approaches can put your team back in the business of trading, confident that your data is of the highest quality.
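
A small sketch of the kind of rules-based check that catches bad rows before they reach production (the columns and rules here are hypothetical):

    import pandas as pd

    def flag_bad_rows(df: pd.DataFrame) -> pd.Series:
        """Return a boolean mask of rows violating basic sanity rules."""
        bad = pd.Series(False, index=df.index)
        bad |= df["price"] <= 0        # prices must be positive
        bad |= df["volume"] < 0        # volume cannot be negative
        bad |= df["timestamp"].isna()  # every row needs a timestamp
        return bad

    df = pd.DataFrame({
        "timestamp": pd.to_datetime(["2023-01-02", None, "2023-01-04"]),
        "price": [101.5, 99.8, -3.0],
        "volume": [1000, 2000, 1500],
    })
    print(df[flag_bad_rows(df)])  # rows with a missing timestamp or bad price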

3. Slow Data Production

Many financial firms are still working with legacy software that is not geared for today’s data volumes. Processing large volumes of data quickly will require new techniques such as the following (a short vectorization sketch appears after the list):

  • Parallelization across many nodes
  • Vectorized processing
  • Distributed file systems
  • Efficient file formats such as Parquet and HDF
  • Pattern-based (machine learning) and statistical algorithmic approaches
  • GPU and other SIMD techniques
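
To make the vectorization point concrete, here is a small sketch comparing a Python loop to the equivalent vectorized NumPy expression:

    import numpy as np

    prices = np.random.default_rng(0).uniform(10, 100, size=1_000_000)

    # Loop-based simple returns: slow, one element at a time.
    returns_loop = [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]

    # Vectorized equivalent: one array expression, dramatically faster.
    returns_vec = prices[1:] / prices[:-1] - 1

    assert np.allclose(returns_loop, returns_vec)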

It will be difficult to keep up with increased volume, variability and breadth of data in the future if these techniques are not implemented.

The Outcome for Financial Firms

If these challenges are not met, financial firms will experience inefficiencies during every phase of their ETL. Problems that arise from manual data ingestion are only exacerbated by a slow production pipeline. Without advanced, customized software, your data will be of lower quality, harder to maintain, and contain fewer actionable insights. Without reliable data quality processing, you won’t be able to detect anomalies. All of this together produces bad data and leads to higher data management costs, increased risk, and ultimately lost revenue.

This is definitely not the desired outcome, but it is not easy to adapt to an ever-changing technology landscape. Most financial firms do not have the manpower, time, or resources to address these issues easily. That is why third-party companies exist that have already solved these problems.

Third-Party Data Management Solution

Technology and the financial industry landscape are evolving quickly, making them difficult to keep up with while maintaining your core business. With help from a financial data management company, one with a proven track record and decades of experience such as Code Willing, you can stop fighting with your data and start leveraging it.

Only a few fintech firms provide end-to-end management of data production resources at this time, though more are on the rise. They have staffs of data scientists, data experts, and DevOps engineers with experience ingesting, cleaning, organizing, building, and cross-referencing financial datasets from many vendors. Some have already developed complete solutions to these common data management problems and can jump-start your technology and workflows right now, allowing your team to spend less time on administrative tasks and more time closing deals.

End-to-end financial data management firms can ensure a high-quality data product, complete with analytical dashboards that provide insight into data content and the tools your team needs to extract targeted data, enabling decisions that reduce risk, lower costs, and increase revenue.

The data revolution is here and is firmly rooted in the financial industry. Data will increase in volume, variability, and complexity, stressing ETL pipelines. Both data quality and processing speed will become more important and more difficult to handle, requiring an experienced team of specialized data, coding, and operations engineers.

With the right fintech team working for you, your firm can overcome common data management challenges, become more efficient and gain an edge over the competition.

Take Control Of Your Data With The Code Willing Platform

Code Willing Alternative Data Management Solutions

Researchers should be able to focus on research, not on managing data.

Utilizing cloud services to manage and process time series data is rapidly becoming the go-to data solution. This is not surprising given the cost savings and processing power available in cloud-based platforms. 

Code Willing is an independent global provider of quantitative research and trading software specializing in cloud-based technology. We provide data management services to handle ingesting, cross-referencing, cleaning, and storing data. We pride ourselves on building efficient, clean, and complete data solutions that allow our clients to focus on research.

Code Willing makes data easy and usable. 

The Code Willing platform makes it easy to process raw vendor data into a clean, readily accessible, and usable format, allowing research staff to focus on the core of their business. These robust services include cross-referencing, data quality, and scalable storage.

Aligning time-series data from different sources is a significant challenge. Cross-referencing independent vendor datasets and keeping them all in time-synchronous order can be even more of one. We accomplish this by assigning a unique identifier to all listings. This identifier, known as the “Code Willing Stable ID” or SID, is used to track specific assets through time and across multiple datasets. Additionally, we manage daily processing jobs in a controlled, batch-oriented manner using Code Willing’s proprietary job scheduler, HAL. HAL handles job dependencies, market schedules, and time zone conversions, and provides a highly configurable alerting system to notify operations teams about all ongoing processes.
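
To picture what cross-referencing on a stable identifier looks like in practice, here is a hedged pandas sketch; the SID values and vendor tables are invented for illustration:

    import pandas as pd

    # Two vendor datasets that disagree on symbols but share the stable ID.
    vendor_a = pd.DataFrame({
        "sid": [1001, 1002],
        "ticker": ["ACME", "GLOBEX"],
        "close": [101.25, 55.10],
    })
    vendor_b = pd.DataFrame({
        "sid": [1001, 1002],
        "symbol": ["ACME.N", "GBX"],
        "volume": [1_200_000, 480_000],
    })

    # Joining on the stable ID keeps assets aligned even when names differ.
    merged = vendor_a.merge(vendor_b, on="sid")
    print(merged)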

Data quality is an integral part of the Code Willing suite. Anomalies in data behavior can be difficult to detect and very frustrating throughout the research process. We take a two-phase approach to data quality. Our rules-based algorithms check against vendor-supplied documentation to ensure correct formatting and overall content, while our machine-learning/pattern-recognition processes find deeper, more embedded anomalies. We are making great strides towards even more robust machine learning techniques. In general, our data quality processing detects outliers regardless of formatting changes, as it automatically retrains over time. By applying both a rules-based and an algorithmic approach to spotting anomalies, the data quality program ensures that the data is clean and lowers the error rate as part of our daily data pipeline.
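
As a rough sketch of the second, pattern-recognition phase, using scikit-learn’s IsolationForest as a stand-in for the proprietary models:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(42)

    # Mostly well-behaved daily returns, with a few injected outliers.
    returns = rng.normal(0.0, 0.01, size=(500, 1))
    returns[::100] = 0.25  # anomalously large moves

    model = IsolationForest(contamination=0.01, random_state=42)
    labels = model.fit_predict(returns)  # -1 marks an anomaly

    print(f"flagged {np.sum(labels == -1)} anomalous rows")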

Content Addressable Storage (CAS) is a flexible platform for optimized content storage with customizable, role-based permissions. CAS stores the data securely in your data center or in the cloud so that it is readily accessible and it features a single, consolidated interface to all files, regardless of their physical location. The CAS platform works for anyone who requires local or cloud-based storage of large numbers of files and the ability to configure fine-grained permissions for access to those files. 
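
The content-addressable principle itself is straightforward to sketch: address every blob by the hash of its contents, so identical data maps to the same key. This toy in-memory version is for illustration only, not the CAS implementation:

    import hashlib

    class ContentStore:
        """Toy content-addressable store: the key is the hash of the bytes."""

        def __init__(self):
            self._blobs = {}

        def put(self, data: bytes) -> str:
            key = hashlib.sha256(data).hexdigest()
            self._blobs[key] = data  # identical content is stored only once
            return key

        def get(self, key: str) -> bytes:
            return self._blobs[key]

    store = ContentStore()
    key = store.put(b"quarterly results")
    assert store.get(key) == b"quarterly results"
    # Re-storing the same bytes yields the same address: free deduplication.
    assert store.put(b"quarterly results") == key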

High-performance data access and data processing are core aspects of the Code Willing platform. Through these services, research staff can focus on research and investments without having to worry about routine data management. 

Learn more about our Data Management Services.

MIT Students Leverage Code Willing Technology for Quantitative Finance Research Project

MIT utilizes Code Willing for Research

Code Willing is working with the academic sector to advance research in the field of Quantitative Finance.

Quantbot Technologies LP, a global quantitative hedge fund and Code Willing client, is constantly looking for new research opportunities and partnerships but is often restricted due to the highly proprietary nature of its work. With Code Willing managing Quantbot’s data and cloud infrastructure, Quantbot can engage external research teams via consulting arrangements. Through the Code Willing technology suite, Quantbot can provide tightly controlled access to clean data and the virtually unlimited resources of cloud computing infrastructure.

Recently, Quantbot worked with several student groups participating in a month-long MIT Sloan 2019 Finance Research Practicum course. The students performed real research for Quantbot in a secure environment running on Amazon AWS with strictly controlled access to the clean datasets they needed.


“Without the Code Willing platform there’s really no way we would have been able to host these projects. Now we have the ability to run an almost limitless number of additional research projects with no risk of impact to our internal systems.”

– Paul White, President and CTO of Quantbot