GenAI in Production

Mark Edmondson

Chapters

  1. The post-GenAI Era
  2. What is needed for production ready GenAI
  3. Case Studies of GenAI in Production

1. The post-GenAI Era

GenAI is transformational

A new era is starting for human creativity, communication and understanding.

Its impact should be comparable to the printing press, telephone or the internet

Even if GenAI's capabilities were frozen at what they can do today, we would still see transformational change.

But GenAI still has much more potential

Gemini Ultra is trained on 30+ trillion tokens.

A human would take 100,000 years to read the same amount of material.
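
As a rough sanity check on that order of magnitude, the reading time can be estimated in a few lines. This is only a sketch: the words-per-token ratio and the reading speed are assumptions, not figures from the talk.

```python
# Back-of-envelope check; every constant except TOKENS is an assumption.
TOKENS = 30e12            # training tokens quoted on the slide
WORDS_PER_TOKEN = 0.75    # assumed average for English text
WORDS_PER_MINUTE = 250    # assumed adult reading speed

def years_to_read(tokens, hours_per_day=24.0):
    """Years needed to read `tokens` tokens at the assumed speed."""
    words = tokens * WORDS_PER_TOKEN
    minutes = words / WORDS_PER_MINUTE
    hours = minutes / 60
    return hours / hours_per_day / 365

nonstop = years_to_read(TOKENS)
print(f"Reading non-stop: about {nonstop:,.0f} years")
```

Under these assumptions the result is on the order of 10^5 years of non-stop reading, the same order of magnitude as the figure quoted above.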

Humans perform better than the most advanced GenAI models after reading only a fraction of that material.

Vision is much higher bandwidth [than text]: about 20MB/s… In a mere 4 years, a child has seen 50 times more data than the biggest LLMs trained on all the text publicly available on the internet.

Yann LeCun, Chief AI Scientist at Meta
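
The "50 times" in the quote can be roughly reproduced with simple arithmetic. Only the 20MB/s bandwidth and the 4 years come from the quote; the waking hours, the training set size and the bytes per token below are assumptions chosen for illustration.

```python
BYTES_PER_SECOND = 20e6   # visual bandwidth from the quote (20MB/s)
WAKING_HOURS = 16_000     # assumed waking hours in a child's first 4 years
LLM_TOKENS = 1e13         # assumed size of a large text training set
BYTES_PER_TOKEN = 2       # assumed average bytes per token

child_bytes = BYTES_PER_SECOND * WAKING_HOURS * 3600
llm_bytes = LLM_TOKENS * BYTES_PER_TOKEN
ratio = child_bytes / llm_bytes
print(f"Child: {child_bytes:.2e} bytes, LLM: {llm_bytes:.2e} bytes, ratio: {ratio:.0f}x")
```

The ratio is sensitive to the assumed training set size: with a 30 trillion token training set it would drop to roughly 19.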

The GenAI we have today is the worst it will ever be

And today it can do this

We now have a new meta layer for interacting with information.

Language → GenAI → Python/SQL → C++ → Binary → …

Poets → Programmers → Engineers → Physicists → …

To take advantage of the emergent properties of information, we will need poets and philosophers, rather than programmers or engineers, to interact with data

How we interact with data will be transformational

But improving communication between people will have the biggest impact

Biggest Impact for GenAI in Production

  1. Enable poets to talk to databases
  2. Enable people to communicate better with each other

Let's try it

People remember only about 20% of the content of most presentations.

Can we improve this using GenAI?

multivac.sunholo.com

Log in to multivac.sunholo.com and ask a question! This presentation and its background material are uploaded to it.

2. What is needed for production ready GenAI

Components for production GenAI

Table 1: Considerations for GenAI production
Feature   Notes
Data      Your unique data is the difference
Model     Capabilities of the model
Prompt    Instructions for the model

More components for production GenAI

Table 2: Secondary considerations for GenAI production
Feature          Notes
UI               How people interact with your output
Monitoring       Checking performance
Authentication   Protecting your data
Scaling          Keeping up with demand
Speed            New data updates, time to first token, etc.

Data

Key differentiator

  • Unstructured data like PDFs, images, docs can be extracted and embedded to be useful for GenAI techniques such as RAG
  • Structured data within databases can be queried by GenAI agents via generated code
  • Domain knowledge locked up within your colleagues

Treasure your data - it is the unique offering your company can bring to GenAI

Personal data is much more valuable now

Past emails, tweets, photos, messages, SMS take on a new power to communicate who you are.

The creation of digital philosophical zombies is possible

We can already transfer your likeness

Will we be able to transfer your behaviour too?

Embeddings are a new data type

What do vector embeddings, which measure semantic meaning, actually mean?

  • [0.4344, 1.232323, 0.232323, -2.1, ... ]

  • ["Redness", "Fear", "Loving", "Cats", ...] ??

  • Convincing representations of humanity can be contained within ~1000 dimensions
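
To make the idea concrete, semantic closeness between two embeddings is commonly measured with cosine similarity. The sketch below uses hypothetical 4-dimensional vectors invented for illustration; real embedding models produce hundreds or thousands of dimensions, and the individual dimensions do not usually map to named concepts.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, made up for this example
cat = [0.9, 0.1, 0.8, 0.0]
kitten = [0.8, 0.2, 0.9, 0.1]
invoice = [0.0, 0.9, 0.1, 0.8]

print("cat vs kitten :", round(cosine_similarity(cat, kitten), 3))
print("cat vs invoice:", round(cosine_similarity(cat, invoice), 3))
```

Vectors pointing in a similar direction score close to 1, and unrelated ones score close to 0, which is what lets a vector store rank items by meaning rather than by exact keywords.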

Use cases for embedding

  • Find similar movies/pictures/people - recommendation engines
  • Match user question to a document chunk (RAG)
  • Match user question history to another user’s data (Profile matching)
  • Find anti-match between uploaded picture and uploaded description (Clean up dirty data)
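
The matching and anti-matching use cases above share one mechanism: embed both sides, then rank by similarity. The following is a minimal sketch, assuming a toy bag-of-words function (`toy_embed`, a hypothetical stand-in) in place of a real embedding model, so the scores reflect shared words rather than true semantic meaning.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_chunks(question, chunks):
    """Return (score, chunk) pairs, most similar first."""
    q = toy_embed(question)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

chunks = [
    "Embeddings are stored in a vector store for retrieval.",
    "The office is closed on public holidays.",
    "Database tables are queried with generated SQL.",
]
ranked = rank_chunks("Where are embeddings stored?", chunks)
best_match = ranked[0][1]    # RAG: the chunk passed to the model as context
anti_match = ranked[-1][1]   # data clean-up: the least similar item
print("best:", best_match)
print("anti:", anti_match)
```

In a RAG setup the top-ranked chunks are added to the prompt; for data clean-up, a very low score between a picture's embedding and its description's embedding flags a likely mismatch.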

Vector stores (Google)

Vector stores on GCP
Vector Store                         Notes
AlloyDB                              PostgreSQL, pgvector and in-database embeddings
BigQuery                             New! Your data is probably in there already
Vertex AI Search and Conversation    Abstracts away embedding, scales up
Vertex Vector Search                 Enterprise usage, top performance but costs $$$

Vector stores (Non-Google)

Table 3: Non-Google vector stores
Vector Store   Notes
Supabase       Cheap and open-source but tricky to host and slows down
LanceDB        Cheap and quick, backed up by Cloud Storage, immature
Pinecone       Popular 3rd party hosted service
Qdrant         Rust-based fast enterprise service
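
Whichever product is chosen, the core contract of a vector store is small: add vectors with an identifier, then search for the nearest ones. The class below is a hypothetical in-memory stand-in that illustrates that contract with brute-force search; it is not the API of any store listed above, and hosted services typically use approximate nearest-neighbour indexes to stay fast at scale.

```python
import math

class InMemoryVectorStore:
    """Toy stand-in for a hosted vector store: brute-force cosine search."""

    def __init__(self):
        self._rows = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self._rows.append((doc_id, list(vector)))

    def search(self, query, k=2):
        """Return the k (doc_id, score) pairs most similar to `query`."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norms = math.hypot(*a) * math.hypot(*b)
            return dot / norms if norms else 0.0
        scored = [(doc_id, cosine(query, vec)) for doc_id, vec in self._rows]
        return sorted(scored, key=lambda row: row[1], reverse=True)[:k]

# Hypothetical document names and 3-dimensional embeddings
store = InMemoryVectorStore()
store.add("pricing.pdf", [0.9, 0.1, 0.0])
store.add("holiday-policy.docx", [0.0, 0.2, 0.9])
store.add("invoice-2023.pdf", [0.8, 0.3, 0.1])
results = store.search([1.0, 0.0, 0.0], k=2)
print(results)
```

Comparing candidate stores against a tiny baseline like this can help separate what the application needs (filtering, scale, latency, cost) from what every store provides.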

How to get GenAI data for a company?

  • Look at all your current methods of communicating ideas - workshops, emails, meetings. Can they be improved?
  • Unlock your data - use embedding techniques to extract information from your unstructured data (PDFs, videos, etc.)
  • Use language-to-SQL/code generation to extract information from your structured data (databases)
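
The language-to-SQL route can be sketched as follows. The `generate_sql` function is a hypothetical stand-in for the model call and simply returns a canned query; the `traffic` table and its rows are invented for the example. The guard is deliberately naive, and a production system would more likely rely on read-only database credentials.

```python
import sqlite3

def generate_sql(question):
    """Hypothetical stand-in for an LLM call that turns a question into SQL."""
    canned = {
        "Which country has the most sessions?":
            "SELECT country, SUM(sessions) AS total FROM traffic "
            "GROUP BY country ORDER BY total DESC LIMIT 1",
    }
    return canned[question]

def run_read_only(conn, sql):
    """Run generated SQL only if it is a single SELECT statement."""
    statement = sql.strip().rstrip(";")
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(statement).fetchall()

# Invented example data in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (country TEXT, sessions INTEGER)")
conn.executemany(
    "INSERT INTO traffic VALUES (?, ?)",
    [("DK", 120), ("UK", 300), ("DK", 250), ("SE", 90)],
)
answer = run_read_only(conn, generate_sql("Which country has the most sessions?"))
print(answer)
```

The person asking never sees the SQL: the question goes in as language and the rows come back as the answer, which is the sense in which poets can talk to databases.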

Components for production GenAI

Table 4: Considerations for GenAI production
Feature   Notes
Data      Your unique data is the difference
Model     Capabilities of the model
Prompt    Instructions for the model

There is no time to cover the others today, but you can ask about them within multivac.sunholo.com

Summary - what is needed for production ready GenAI data?

  • Your data is key to your production GenAI application
  • Reconsider all of your existing data streams, both structured and unstructured
  • Embeddings are a new data type that enables use cases beyond traditional search
  • Store those embeddings in a vector store that suits your application

3. Case Studies of GenAI in Production

Multivac

Let's look at how Multivac works with prompts using Langchain’s Langsmith

amass.tech

Helping advance knowledge within life sciences

  • Unique parsing of life science formats
  • Helping life science professionals find research related to their own technology
  • Use public and private data to enable new discovery

Our New Energy

Summary

  1. Enabling GenAI offers new communication possibilities
  2. Your data is the key to unlocking its potential within your own business
  3. Embeddings unlock your existing data
  4. Allow your users to experiment with these new tools

Become part of the Multivac

  • Free, open-source code for Virtual Agent Computers (VACs) is available in this GitHub repository:
  • github.com/sunholo-data/vacs-public
  • We will host your own VACs

Thanks

  • Questions?
  • Ask here! https://multivac.sunholo.com
  • multivac@sunholo.com
  • linkedin.com/company/sunholo/
  • github.com/sunholo-data

Appendix - about Sunholo

Mark Edmondson - Founder

  • Founder of Holosun ApS from Nov 1st, 2023
  • Google Developer Expert - Google Cloud since 2015
  • MSci Physics, King's College London
  • Wrote an O’Reilly book on Google Analytics 4 and Google Cloud integrations

code.markedmondson.me

Electric Sheep - Company Brain

  • An LLM bot, prototyped in the blog post.
  • Evolved into main executor agent
  • Infinite memory
  • Langchain Retrieval Augmented Generation (RAG) bot

Conversations with a bot

Voight-Kampff - Junior Developer

  • Writes and executes code based on prompts
  • Uses same GCP infrastructure as Electric Sheep
  • Interacts with other bots
  • openinterpreter.com bot

Watching a bot code

An army of bots

Sunholo aims to be a post-LLM company

  • Custom bots for each business function
  • Agents running in private secure environments
  • Private data mainly interacted with via LLMs

  1. Parsing input
    • LLM rephrasing
    • Image/Text/Audio
    • Prompt engineering

  2. The model
    • Cognition
    • Tailoring size of model to task
    • Finetuning (MLOps)

  3. Document store
    • Source of truth
    • Data pipelines (DataOps)
    • Structured data (LLMs writing SQL?)
    • Unstructured data

  4. Vectorstore embeddings
    • A new datatype for most companies
    • New uses beyond LLMs
    • Embedding type
    • Chunking
    • Parsing of documents
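
Chunking, mentioned under vectorstore embeddings, means splitting a parsed document into pieces small enough to embed. Below is a minimal sketch of fixed-size character chunking with overlap; the sizes are arbitrary assumptions, and real pipelines often split on sentences, headings or tokens instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in at least one chunk (if it is short enough).
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

document = "Embeddings are a new data type. " * 20
chunks = chunk_text(document, chunk_size=100, overlap=20)
print(len(chunks), "chunks, longest is", max(len(c) for c in chunks), "characters")
```

Each chunk is then embedded and stored with a pointer back to its source document, so an answer can cite where it came from.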

Enabling LLMOps on GCP

Open source LLM Agents

  • Langchain - modular LLM flows
  • LlamaIndex - advanced RAG
  • LiteLLM - proxy to standardise interacting with all LLMs, local and API based
  • Unstructured - easy parsing of documents to chunks
  • Autogen - Multiple agents talking to one another
  • OpenInterpreter - Agent executing its own code

LLMOps for Electric Sheep

  • Retrieval augmented generation (RAG)
  • Documentation is the new oil
  • All Sunholo documents, git repos, emails, notes, conversations, R&D etc.

Langchain ConversationalRetrievalChain
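
Conceptually, a conversational retrieval flow has three steps: rewrite the follow-up question into a standalone one using the chat history, retrieve documents for it, then answer from those documents. The sketch below illustrates that flow in plain Python; it is not the Langchain API, and `fake_rephrase`, `fake_retrieve` and `fake_answer` are hypothetical stand-ins for the LLM and vector store calls.

```python
def conversational_retrieval(question, chat_history, rephrase, retrieve, answer):
    """Condense the follow-up, retrieve context, then answer.

    Returns the reply and a new history list with this turn appended.
    """
    standalone = rephrase(question, chat_history) if chat_history else question
    documents = retrieve(standalone)
    reply = answer(standalone, documents)
    return reply, chat_history + [(question, reply)]

def fake_rephrase(question, history):
    """Stand-in for an LLM rewriting a follow-up into a standalone question."""
    last_question = history[-1][0]
    return f"{question} (follow-up to: {last_question})"

def fake_retrieve(query):
    """Stand-in for a vector store lookup: naive keyword match on toy data."""
    corpus = {"multivac": "Virtual Agent Computers (VACs) are hosted on Multivac."}
    return [text for key, text in corpus.items() if key in query.lower()]

def fake_answer(query, documents):
    """Stand-in for an LLM answering from the retrieved documents."""
    return documents[0] if documents else "I don't know."

history = []
reply, history = conversational_retrieval(
    "What is Multivac?", history, fake_rephrase, fake_retrieve, fake_answer)
reply2, history = conversational_retrieval(
    "Who hosts them?", history, fake_rephrase, fake_retrieve, fake_answer)
print(reply2)
```

The second question contains no searchable keyword on its own; it only retrieves the right document because the rephrasing step carries context over from the history.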

LLMOps for Voight-Kampff

  • Uses LLMs to create code and scripts, which it then executes in a virtual environment
  • Non-interactive mode
  • Pick LLM to run locally or via API
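
The non-interactive execution step can be sketched as: take generated code as a string, run it in a separate process, and capture the result for the agent to inspect. In the example below the code string is hard-coded rather than produced by a model, and a plain subprocess stands in for the isolated environment; a subprocess is not a sandbox, so this is an illustration of the loop, not a safe way to run untrusted code.

```python
import subprocess
import sys

def run_generated_code(code, timeout_seconds=10):
    """Run a code string in a separate Python process and capture its output.

    Assumption: a real agent would run this inside an isolated environment
    such as a container; a bare subprocess offers no isolation.
    """
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )
    return completed.returncode, completed.stdout, completed.stderr

generated = "print(sum(range(10)))"  # pretend an LLM wrote this
returncode, stdout, stderr = run_generated_code(generated)
print("exit code:", returncode, "output:", stdout.strip())
```

The return code and captured stderr are what make the loop useful: a non-zero exit can be fed back to the model as the next prompt so it can attempt a fix.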

Voight-Kampff and post-LLM software engineering

  • Executing Code within Docker containers
  • Terraform IaC gives agents superpowers
  • Best practices of GitOps/CI/CD/Testing/Documentation all enable agents

Voight-Kampff Triggers

  • Triggers:
    • CI/CD alerts to prompt agent fixing code
    • Scheduled Code development and refactoring
    • GitHub issue triage
  • The more systems are defined in code, the more it can build itself
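
One way to picture these triggers is as events that are turned into prompts for the agent. The sketch below is hypothetical throughout: the event fields, the event type names and the prompt wording are invented for illustration and do not describe how Voight-Kampff is actually wired.

```python
# Hypothetical prompt templates, one per trigger type
PROMPT_TEMPLATES = {
    "ci_failure": "The CI job '{name}' failed with:\n{detail}\nPropose a fix.",
    "schedule": "Refactor the module '{name}'. Focus: {detail}",
    "github_issue": "Triage the issue '{name}':\n{detail}\nSuggest labels and next steps.",
}

def prompt_for_event(event):
    """Turn a trigger event (a dict with type, name, detail) into an agent prompt."""
    kind = event["type"]
    if kind not in PROMPT_TEMPLATES:
        raise ValueError(f"no trigger configured for event type {kind!r}")
    return PROMPT_TEMPLATES[kind].format(name=event["name"], detail=event["detail"])

event = {
    "type": "ci_failure",
    "name": "unit-tests",
    "detail": "AssertionError in test_chunking",
}
prompt = prompt_for_event(event)
print(prompt)
```

Keeping the mapping in code means adding a new trigger is itself a code change the agent could propose.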

Summary

  • This is just the beginning of an LLM revolution
  • post-LLM companies will use multiple agents
  • LLMOps builds on top of DevOps and MLOps
  • Sunholo offers LLMOps on GCP to accelerate your own use cases