Scaling data-driven decision making

We built Atoma in response to our experience scaling self-serve data platforms at Square, Google, and various startups over the past 10 years. At its core, a data team’s responsibility is to nudge the business towards better outcomes through data-informed decision making. Today, teams do this in two ways:

  1. providing direct support to decision makers with ad-hoc manual analysis
  2. providing self-serve data tools for decision makers to derive their own insights

While this approach is effective for a few important decisions and accessible to experienced practitioners, business intelligence (BI) has a lot of room for improvement.

BI pain points

  1. Dashboards as hammers: dashboards are the data interface for non-technical people at a company. This is great when the information people are interested in remains stable over time, but it falls apart in a dynamic business environment. This is a big problem for the analyst-stakeholder relationship because even simple iterations on a dashboard need to be handcrafted. The outcome is dashboard sprawl, a growing maintenance burden, and ultimately data trust issues.
  2. BI’s metadata experience: data literacy is perhaps the strongest correlate of an effective self-serve data platform. To be capable, non-technical decision makers need to know what data is available, how to ask good questions in the context of their business, how to calculate metrics, and how to interpret the results. Today this is the job of data documentation tools and co-workers. BI today is overfit on the dashboard product experience and forgets about what else can be done to empower self-serve data.
  3. Economics of BI: analytics is a scarce resource at every company. There are simply more questions that can be asked than can be answered with our current toolset. While not every question deserves an answer, picking which ones get attention is muddled in resourcing politics and Jira tickets. The ideal would be a perfect prioritization of questions with known upfront ROI. That would be nice, but it is not possible, and even then it would still leave many questions unanswered.

The elephant in the room, GPT

It’s impossible to ignore what’s going on with AI currently. Less than a year into the boom, there are a few data points that inform our vision for where the industry could be going:

  1. GPT is a good data analyst: a head-to-head comparison between humans and GPT-4 on end-to-end database analytics indicates that GPT-4 can achieve performance comparable to humans at a fraction of the time and cost.
  2. Code interpreter: OpenAI’s code interpreter performs data analytics and visualization with astonishing speed and accuracy across a wide range of domains.
  3. Autonomous systems: AutoGPT chains together LLM "thoughts" to create and execute tasks autonomously in pursuit of whatever goal you set.
  4. GPT productivity research: a study of an AI-based conversational assistant, using data from 5,179 support agents, found a 14% average productivity increase, with novice workers benefiting the most. The AI tool also promoted knowledge sharing among employees, improved customer sentiment, reduced the need for managerial intervention, and improved employee retention.

Introducing Atoma

Atoma is our response to the problems in BI in light of the capabilities of LLMs. We aim to flip the economics of BI on its head and unlock the latent value in unanswered questions.

  • Atoma (LLM agent): a conversational end-to-end analytics answering engine. Ask a question, and Atoma will search its knowledge base for relevant data and docs, then create and execute a plan of analysis that answers your question.
  • Knowledge graph: give Atoma access to all your data, both structured data and unstructured docs, to improve its performance. Atoma will automatically infer semantic information about your database and build vector embeddings of your structured data (tables) and unstructured data (documents). You can manually add docs that Atoma will use to auto-correct itself in the future.
  • Dev Mode: a Python/SQL streaming and batch data workflow engine for building automations and data pipelines to ingest data, orchestrate pipelines, and sync data with external systems. Develop in a web IDE and/or CLI for interacting with the knowledge graph.

Features and capabilities

  1. Instant, iterative answers: while dashboards provide answers to questions already asked, Atoma can generate insights just as you would while messaging back and forth with an analyst on your team. Iterate in real time, ask clarifying questions, and get explanations of calculation logic and results. BI tools solve parts of this in a piecemeal way, but no tool on the market can simulate the personalized and on-demand capabilities unlocked by LLMs.
  2. Hybrid query engine: semantic search and SQL queries are powerful yet distant cousins that together cover the entire scope of enterprise data search. Combining both in a familiar conversational interface simplifies the way people answer questions on both structured and unstructured data.
  3. Human-in-the-loop: embrace the creativity of LLMs and handle potential hallucinations just like you would regular data questions. Atoma acts as a central question routing system that pings the right person on the data team to answer your question and teach Atoma.
  4. Personalized learning and development: while a company wiki is a great data store, its information needs to be searched, found, interpreted, and applied to a user’s question. LLMs customize answers to the user’s context and can be interrogated for additional supporting information.
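
To make the hybrid query idea from item 2 a little more concrete, here is a minimal sketch in Python. It is not Atoma's implementation: the function names, the demo table, and the sample docs are all hypothetical, the "embedding" is a toy bag-of-words vector standing in for a real embedding model, and the SQL is hard-coded where a real system would have an LLM generate it from the question and the retrieved context.

```python
import math
import sqlite3
from collections import Counter

# Hypothetical unstructured docs (stand-ins for wiki pages or metric definitions).
DOCS = [
    "revenue is the sum of order totals after refunds",
    "churn is the share of customers who cancel in a month",
]

def embed(text):
    """Toy embedding: bag-of-words counts (assumption: a real system uses a learned model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def semantic_search(question, docs, top_k=1):
    """Return the top_k docs most similar to the question."""
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_demo_db():
    """Create a small in-memory table standing in for a warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 120.0), ("west", 80.0), ("east", 30.0)],
    )
    return conn

def answer(question, conn, docs):
    """Combine both halves: retrieve context from docs, then query structured data."""
    context = semantic_search(question, docs)
    # Hard-coded for illustration; an LLM would write this from question + context.
    sql = "SELECT region, SUM(revenue) FROM orders GROUP BY region ORDER BY region"
    rows = conn.execute(sql).fetchall()
    return {"context": context, "rows": rows}

if __name__ == "__main__":
    result = answer("how is revenue defined by region", build_demo_db(), DOCS)
    print(result["context"])
    print(result["rows"])
```

The point of the sketch is only the shape of the flow: one question fans out to a similarity search over documents and a SQL query over tables, and the two results are returned together so the answer can cite both the numbers and the definition behind them.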