
Why Patterns?

Patterns is an accessible, scalable, and highly expressive framework for building unified analytical and operational data systems.

We created Patterns to solve problems we encountered while running cross-functional data and business operations teams obsessed with creating business value from data. We identified and built data products for the most important business objectives in scaled, high-growth, and operationally complex situations.

We stumbled across these problems throughout our careers and carefully took note; those notes have directly informed the functional properties of Patterns.


Design Problems + Solutions

Below are six of the most painful problems we've run into when developing data stacks and running data teams, along with how Patterns is designed to solve them.

Problem → Orchestration when you’re an analyst

Solution → Functional reactive graph execution framework and node abstractions

All orchestration tools on the market today are onerous, overly complex, and inaccessible (Airflow, Prefect, Dagster, etc.). By definition, they serve as middle-layer glue for disconnected systems with different operating protocols. Because of their complexity and role as a middleman, orchestration tools create bottlenecks between data engineering teams and the rest of the organization: literal data, functional, and capability silos. For an experienced analyst who knows SQL and a little Python and understands how data pipelines work, adding a new SQL query to a pipeline is not possible without an intimate understanding of the low-level orchestration abstractions, infrastructure, CI/CD, and git.

In Patterns we modeled our graph execution engine after functional reactive programming -- that is, execution dependencies are declared by the inputs and outputs used in a node. This framework doesn't require another layer for orchestration; it's implied by the data tables and streams you read from and write to in your code. So if you can write SQL and know a little Python, you can build data pipelines without being an engineer or pulling one in to help you.
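To make this concrete, here is a minimal sketch of a reactive Python node. The exact Patterns API may differ; the `Table` import and its `read_dataframe`/`write` helpers are illustrative assumptions, not the definitive interface.

```python
# Illustrative sketch of a reactive node -- names and helpers are assumptions,
# not the exact Patterns API. There is no separate orchestration layer: this
# node's dependencies are implied by the tables it reads and writes.
from patterns import Table  # assumed import

orders = Table("orders")                   # input: the node re-runs when "orders" updates
clean_orders = Table("clean_orders", "w")  # output: downstream nodes react to new writes

df = orders.read_dataframe()               # assumed helper returning a pandas DataFrame
df = df[df["status"] == "completed"]       # the actual business logic
clean_orders.write(df)                     # writing here triggers dependent nodes
```

Because the engine derives the execution graph from these reads and writes, adding a new step is just adding a new node that reads `clean_orders`.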

Problem → Planning, designing, and working with new or less technical team members on shared projects that involve technical systems

Solution → Low-code interactive graph with documentation, catalog, comments and audit logs

Most data projects are doomed at inception because requirements aren't well understood or participants have differing mental models of how things work. This is especially true when building data solutions for operations teams who don't have the technical background to understand how data systems work.

Patterns provides a low-code canvas where the requirements of a project can be diagrammed upfront, without writing code! This lets discussion of the business and technical requirements happen within the same interface: non-technical team members can outline a project before handing it over to a developer, and the same accessible interface helps them understand how data flows through the system.

Problem → Moving your analysis to production, or just sharing with a colleague

Solution → Development infra == Production infra

Every data scientist has been there: you prototyped a model in a Python notebook, held a meeting to share it with colleagues, everyone loved the analysis and agreed that the model should be productionized. Next step, get the model implemented... not so fast. First, your colleague can't reproduce your model because of subtle differences in development environments. Next, either no infrastructure exists that can serve this model, or the infra that exists in production has subtle differences in database drivers, Python packages, CPU/memory, etc. Real engineering work is required to get this working, and that is not the skillset of a data scientist.

The source of this problem is that the tooling required to develop a model is quite different from the tooling required to serve one. Patterns provides a solution through its cloud-based, unified framework, which combines feature sets for both development (querying, modeling, visualization) and production (webhooks, data streams, APIs).
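As a hypothetical sketch of what "development infra == production infra" buys you: the same node abstraction used while prototyping can serve the model against live webhook events, with no re-platforming. The `Stream`/`Table` names, the consumption API, and the model-storage layout below are all assumptions for illustration.

```python
# Hypothetical sketch: the model prototyped in development is served in
# production on the same platform. All names and APIs here are assumptions.
import pickle

from patterns import Stream, Table  # assumed imports

events = Stream("webhook_events")        # events landing via a webhook node
scores = Table("scores", "w")            # output table for downstream nodes
models = Table("model_artifacts")        # model pickled during development

model = pickle.loads(models.read()[-1]["model_bytes"])  # assumed storage layout

for event in events.consume():           # assumed stream-consumption API
    prediction = model.predict([event["features"]])[0]
    scores.append({"event_id": event["id"], "score": float(prediction)})
```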

Problem → Where do I start?

Solution → Practice plan and open app marketplace

Breaking into data science is hard. What tools should I learn? What’s a good project for my portfolio? I know statistics, but how can I apply it to business data? Even 10 years after having this experience, there still isn’t a platform that provides an analytics product with a database, data sets, and a community of content around it for getting started with data science.

Humans are really good at pattern recognition and we learn by doing. The combination of a feature complete data science platform and open app marketplace with an integrated community is fertile ground for distributing knowledge and content to educate the next decade of data scientists.

Problem → Meet me where I am, don’t lock me in

Solution → Code first, Python and SQL

Every product requires a little upfront learning, but many data products require learning proprietary abstractions or workflows that depend on a UI -- this adds friction to user onboarding and hurts code portability.

We're designing for anyone who can write SQL and/or Python to be effective in Patterns within minutes of signing up. We have some specific syntax for table and object references, but it follows already well-known patterns, such as Python dataframes and Jinja table references, and all your business logic is preserved.
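For flavor, here is roughly what a SQL node looks like; the Jinja-style `Table(...)` call below is an assumption standing in for Patterns' actual reference syntax, and everything else is ordinary SQL that would port anywhere.

```sql
-- Illustrative SQL node: only the Jinja-style table reference is
-- Patterns-specific (and its exact syntax here is an assumption);
-- the business logic is plain, portable SQL.
select
    customer_id,
    count(*) as order_count,
    sum(amount) as lifetime_value
from {{ Table("clean_orders") }}
group by customer_id
```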

Problem → Needing a capability your current stack doesn’t support, and having to buy yet another data tool

Solution → Highly expressive low-level building blocks and cloud interoperability

If you treat your data stack as a product, you will start small and iterate, understanding the needs of the business and scaling complexity as needed. Current tooling in the market makes this hard to do: starting small means picking a minimal toolset, and expanding into more complex use cases requires adding more tools. Before you know it, you have 5-10 tools with overlapping functionality that don't cleanly integrate with each other.

Patterns solves for this with low-level building blocks -- Python, SQL, table, stream -- that can be configured in any imaginable way to construct the business logic that solves a business problem.
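As an illustrative sketch (with assumed names and APIs) of how those blocks compose: a stream block feeds a Python block, which writes a table block that any SQL block can then query. The wiring is just the references themselves.

```python
# Illustrative composition of the building blocks -- stream in, table out --
# with assumed names and APIs. A SQL node could then read "sessions" directly.
from patterns import Stream, Table  # assumed imports

raw_events = Stream("raw_events")   # building block: stream
sessions = Table("sessions", "w")   # building block: table

for event in raw_events.consume():  # assumed stream-consumption API
    if event.get("type") == "session_start":
        sessions.append({"user_id": event["user_id"], "started_at": event["ts"]})
```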

Who is Patterns for?

  • Data engineers
  • Data scientists
  • Data analysts
  • Non-technical stakeholders

What needs does Patterns solve for?

Infrastructure

  • I need a database to store my data
  • I need to run functions in the cloud
  • I need to store files in the cloud

Basic analytics

  • I need to ingest data from databases and APIs
  • I need to query my data
  • I need to model and clean my data by building SQL pipelines
  • I need to visualize my data

Operational analytics

  • I need to ingest events via webhooks
  • I need to script python
  • I need to ingest events via event streams
  • I need to process data streams
  • I need to interact with external systems
  • I need to take manual action on my data

Data science and ML

  • I need to do statistical analysis
  • I need to train and prototype ML models
  • I need to pre-process data, store features, and create data artifacts
  • I need to run batch inference pipelines
  • I need to serve decisions from my models in real-time

Systems organization

  • I need to orchestrate all my work by connecting with external systems
  • I need to manage execution complexity by abstracting away details
  • I need to manage knowledge complexity by documenting nodes
  • I need to collaborate with non-technical team members

Developer features

  • I need to share/clone apps of other developers
  • I need an integrated IDE accessible to all team members
  • I need a local devkit so I can use my favorite IDE
  • I need an app marketplace so I can discover pre-built solutions
  • I need integrated CI/CD so I can deploy apps safely

How does Patterns compare to other tools?

Product comparisons are extremely difficult in the data systems and tooling market. In our case, there are hundreds of companies with which we have overlapping functionality, because Patterns solves customer data problems by providing building blocks that are otherwise served by individual point solutions.

The two companies with the most similar capabilities, architecture, and value proposition are Palantir’s Foundry and Dataiku. There are others, like Alteryx and Knime, but Foundry and Dataiku are the most cloud-adopted products in this category.

From the perspective of the modern data stack, Patterns provides the most value through its integrated environment for data pipelines, modeling, analysis, and operationalizing data science. It is designed as an improvement on the concatenation of cloud notebooks, orchestration, transformation, and MLOps products required to do data science, such as:

  • Cloud notebooks - Hex, Streamlit, Deepnote, Google Colab
  • Orchestration tools - Airflow, Dagster, Prefect, Astronomer
  • MLOps - Sagemaker, Dremio, Tecton
  • Transform - dbt

What tools can I use with Patterns?

Patterns complements and has integrations for a number of tools, including:

  • ETL/ELT - Fivetran, Airbyte, Stitch, and Matillion. Where Patterns does not have native support for a source, you can use these providers to import your data.
  • Transform - dbt. While Patterns has support for SQL transformations, some users have existing projects on dbt. You can either import your dbt project into Patterns or use Patterns to orchestrate your dbt pipelines (see the sketch after this list).
  • Reverse ETL - Census, Hightouch. Where Patterns does not have support for reverse ETL, you can use these providers to get data out of Patterns and into your business systems.
  • BI tools - Looker, Tableau, etc. While Patterns provides basic support for charts and dashboards, self-serve analytics and quick drag-and-drop no-code charting for less technical team members is one use case that's not well supported.
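For the dbt case, one hedged sketch of what "use Patterns to orchestrate your dbt pipelines" could look like is a Python node that shells out to the dbt CLI. The project directory is an assumption, and Patterns may offer a more direct dbt integration.

```python
# Hypothetical sketch: orchestrating an existing dbt project from a Python
# node by invoking the dbt CLI. The project directory is an assumption.
import subprocess

result = subprocess.run(
    ["dbt", "run", "--project-dir", "analytics"],  # assumed project location
    capture_output=True,
    text=True,
)
print(result.stdout)        # surface dbt's log output in the node's logs
result.check_returncode()   # fail the node if any dbt model fails
```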