GenAI in Production

Mark Edmondson

Chapters

The post-GenAI Era
What is needed for production ready GenAI
Case Studies of GenAI in Production

1. The post-GenAI Era

GenAI is transformational

A new era is starting for human creativity, communication and understanding.

Its impact should be comparable to that of the printing press, the telephone or the internet

Even if we froze GenAI's capabilities at what it can do today, we would still see transformational change.

But GenAI still has much more potential

Gemini Ultra is trained on 30+ trillion tokens.

A human would take 100,000 years to read the same amount of material.
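The order of magnitude of the reading-time figure can be checked with a back-of-the-envelope calculation. This sketch assumes about 0.75 words per token and a reading speed of 250 words per minute. Both rates are assumptions, not figures from the slide, and the result is sensitive to them.

```python
# Back-of-the-envelope estimate of how long a person would need to read an
# LLM training corpus. The conversion rates are assumptions, not measured values.

TOKENS = 30e12          # "30+ trillion tokens", the figure quoted above
WORDS_PER_TOKEN = 0.75  # assumed average for English text
WORDS_PER_MINUTE = 250  # assumed adult reading speed


def years_to_read(tokens: float, hours_per_day: float = 24.0) -> float:
    """Years needed to read `tokens` tokens, reading `hours_per_day` hours every day."""
    minutes_of_reading = tokens * WORDS_PER_TOKEN / WORDS_PER_MINUTE
    minutes_per_year = 60 * hours_per_day * 365
    return minutes_of_reading / minutes_per_year


nonstop = years_to_read(TOKENS)
office_hours = years_to_read(TOKENS, hours_per_day=8)
print(f"Reading nonstop:       ~{nonstop:,.0f} years")
print(f"Reading 8 hours a day: ~{office_hours:,.0f} years")
```

With these assumptions, nonstop reading comes out at roughly 170,000 years. That is the same order of magnitude as the 100,000 years quoted above. A faster reader or fewer words per token brings the figure down.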

Yet we still perform better than the most advanced GenAI models, having read only a fraction of that material.

Vision is much higher bandwidth [than text]: about 20MB/s… In a mere 4 years, a child has seen 50 times more data than the biggest LLMs trained on all the text publicly available on the internet.

Yann LeCun, Chief AI Scientist at Meta

The GenAI we have today is the worst it will ever be

And today it can do this

We now have a new meta layer for interacting with information.

Language → GenAI → Python/SQL → C++ → Binary → …

Poets → Programmers → Engineers → Physicists → …

To take advantage of the emergent properties of information, we will need poets and philosophers, rather than programmers or engineers, to interact with data

How we interact with data will be transformational

But improving communication between people will have the biggest impact

Biggest Impact for GenAI in Production

Enable poets to talk to databases
Enable people to communicate better with each other

Let's try it

For most presentations, people only remember 20% of the content.

Can we improve this using GenAI?

multivac.sunholo.com

2. What is needed for production ready GenAI

Components for production GenAI

Table 1: *Considerations for GenAI production*
Feature	Notes
Data	Your unique data is the difference
Model	Capabilities of the model
Prompt	Instructions for the model

More components for production GenAI

Table 2: *Secondary Considerations for GenAI production*
Feature	Notes
UI	How people interact with your output
Monitoring	Checking performance
Authentication	Protecting your data
Scaling	Keeping up with demand
Speed	How quickly new data is available, time to first token, etc.

Data

Key differentiator

Unstructured data such as PDFs, images and documents can be extracted and embedded for use in GenAI techniques such as retrieval augmented generation (RAG)
Structured data within databases can be queried by GenAI agents via generated code
Domain knowledge locked up within your colleagues
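Before extracted unstructured text can be embedded, it is usually split into chunks. Below is a minimal sketch of fixed-size chunking with overlap, measured in words. The chunk size and overlap values are illustrative assumptions. Production pipelines often split on document structure (headings, paragraphs, pages) instead.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of `chunk_size` words. Each chunk shares `overlap`
    words with the previous one, so context is not lost at chunk boundaries."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


doc = " ".join(f"w{i}" for i in range(10))
print(chunk_words(doc, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Each chunk would then be passed to an embedding model and stored alongside its text.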

Treasure your data - it is the unique offering your company can bring to GenAI

Personal data is much more valuable now

Past emails, tweets, photos, messages, SMS take on a new power to communicate who you are.

The creation of digital philosophical zombies is possible

We can already transfer your likeness

Will we be able to transfer your behaviour too?

Embeddings are a new data type

What do vector embeddings, which measure semantic meaning, actually mean?

[0.4344, 1.232323, 0.232323, -2.1, ... ]
["Redness", "Fear", "Loving", "Cats", ...] ??
Convincing representations of humanity can be contained within ~1000 dimensions
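Embeddings are compared by direction, most commonly with cosine similarity. The sketch below uses made-up 4-dimensional vectors. Real embedding models output hundreds to thousands of dimensions. Their learned dimensions usually do not correspond to single human-readable concepts, which is why the labels above carry question marks.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = unrelated, -1 = opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings invented for illustration, not produced by a real model
cat = [0.9, 0.1, 0.3, 0.0]
kitten = [0.85, 0.15, 0.35, 0.05]
invoice = [0.0, 0.8, -0.2, 0.6]

print(round(cosine_similarity(cat, kitten), 3))   # close to 1: similar meaning
print(round(cosine_similarity(cat, invoice), 3))  # close to 0: unrelated
```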

Use cases for embeddings

Find similar movies/pictures/people - recommendation engines
Match user question to a document chunk (RAG)
Match user question history to another user’s data (Profile matching)
Find anti-match between uploaded picture and uploaded description (Clean up dirty data)
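Most of these use cases reduce to ranking by similarity. The minimal sketch below shows two of them with hypothetical toy vectors:

- nearest neighbours, for recommendations;
- flagging image/description pairs whose similarity falls below a threshold, as likely dirty data.

The 0.5 threshold is an assumption and would need tuning on real examples. Comparing an image embedding with a text embedding is only meaningful if both come from the same multimodal embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recommend(query: list[float], items: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k items whose embeddings are most similar to the query."""
    ranked = sorted(items, key=lambda name: cosine(query, items[name]), reverse=True)
    return ranked[:k]


def flag_mismatches(pairs: dict, threshold: float = 0.5) -> list[str]:
    """Return IDs whose image and description embeddings disagree (an "anti-match")."""
    return [pid for pid, (img, desc) in pairs.items() if cosine(img, desc) < threshold]


movies = {  # hypothetical movie embeddings
    "space_opera": [0.9, 0.1, 0.0],
    "alien_invasion": [0.8, 0.3, 0.1],
    "romcom": [0.1, 0.9, 0.2],
}
print(recommend([0.85, 0.2, 0.05], movies))

listings = {  # hypothetical (image embedding, description embedding) pairs
    "sku-1": ([0.9, 0.1, 0.1], [0.8, 0.2, 0.1]),
    "sku-2": ([0.9, 0.1, 0.1], [0.0, 0.1, 0.9]),
}
print(flag_mismatches(listings))  # ['sku-2']
```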

Vector stores (Google)

*Vectorstores on GCP*
Vector Store	Notes
AlloyDB	PostgreSQL, pgvector and in-database embeddings
BigQuery	New! Your data is probably in there already
Vertex AI Search and Conversation	Abstracts away embedding, scales up
Vertex Vector Search	Enterprise usage, top performance but costs $$$

Vector stores (Non-Google)

Table 3: *non-Google Vectorstores*
Vector Store	Notes
Supabase	Cheap and open-source but tricky to host and slows down
LanceDB	Cheap and quick, backed by Cloud Storage, immature
Pinecone	Popular 3rd party hosted service
Qdrant	Fast, Rust-based enterprise service
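Whichever product is chosen, a vector store does the same core job. It keeps vectors alongside their source text and returns the entries nearest to a query vector. `TinyVectorStore` below is a hypothetical in-memory sketch using brute-force cosine search. Real vector stores add approximate nearest-neighbour indexes, persistence and metadata filtering so they can scale. The texts and vectors are toy data.

```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass
class TinyVectorStore:
    """Hypothetical in-memory vector store: brute-force cosine search over stored chunks."""
    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    @staticmethod
    def _normalise(vector: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    def add(self, text: str, vector: list[float]) -> None:
        # Store unit-length vectors so a dot product equals cosine similarity
        self.texts.append(text)
        self.vectors.append(self._normalise(vector))

    def search(self, query: list[float], k: int = 3) -> list[tuple[float, str]]:
        """Return the k (score, text) pairs most similar to the query, best first."""
        q = self._normalise(query)
        scored = ((sum(a * b for a, b in zip(q, v)), t) for v, t in zip(self.vectors, self.texts))
        return heapq.nlargest(k, scored)


store = TinyVectorStore()
store.add("Refunds are processed within 14 days.", [0.9, 0.1, 0.0])
store.add("Office opening hours are 9 to 5.", [0.0, 0.2, 0.9])
store.add("Returns need a receipt.", [0.7, 0.3, 0.1])

# In RAG, this query vector would be the embedding of the user's question
for score, text in store.search([0.8, 0.2, 0.0], k=2):
    print(f"{score:.2f}  {text}")
```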

How to get GenAI data for a company?

Look at all your current methods of communicating ideas - workshops, emails, meetings. Can they be improved?
Unlock your data - use embedding techniques to extract information out of your unstructured data (PDFs, videos, etc.)
Use language-to-SQL/code to extract information from your structured data (databases)
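The language-to-SQL pattern can be sketched with Python's built-in `sqlite3`. `generate_sql` is a hypothetical stand-in for an LLM call. It returns a canned query so the example runs offline. The guard that only allows a single `SELECT` statement is one simple safety measure. A production system would also connect with read-only credentials, for example SQLite's `mode=ro` URI or a read-only database role.

```python
import sqlite3


def generate_sql(question: str, schema: str) -> str:
    """Hypothetical stand-in for an LLM that turns a question plus schema into SQL.
    A real implementation would send both to a model; this returns a canned answer."""
    return "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC"


def run_readonly(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute model-generated SQL only if it is a single SELECT statement."""
    statement = sql.strip().rstrip(";")
    if ";" in statement or not statement.lower().startswith("select"):
        raise ValueError(f"refusing to run non-SELECT SQL: {sql!r}")
    return conn.execute(statement).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 120.0), ("US", 200.0), ("EU", 100.0)])

schema = "sales(region TEXT, amount REAL)"
sql = generate_sql("Which region sells the most?", schema)
print(run_readonly(conn, sql))  # [('EU', 220.0), ('US', 200.0)]
```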

Components for production GenAI

Table 4: *Considerations for GenAI production*
Feature	Notes
~~Data~~	~~Your unique data is the difference~~
Model	Capabilities of the model
Prompt	Instructions for the model

No time for the others today, but ask about them at multivac.sunholo.com

Summary - what is needed for production ready GenAI data?

Your data is key to your production GenAI application
Reconsider all of your existing data streams, both structured and unstructured
Embeddings are a new data type that enables more use cases beyond traditional search
Store those embeddings in a vector store that suits your application

3. Case Studies of GenAI in Production

Multivac

Let's look at how Multivac works with prompts using Langchain’s Langsmith

amass.tech

Helping advance knowledge within life sciences

Unique parsing of life science formats
Helping life science professionals find research related to their own technology
Use public and private data to enable new discovery

Our New Energy

Summary

Enabling GenAI offers new communication possibilities
Your data is the key to unlocking its potential within your own business
Embeddings unlock your existing data
Allow your users to experiment with these new tools

Become part of the Multivac

Free, open-source code for Virtual Agent Computers (VACs) is available in this GitHub repository:
github.com/sunholo-data/vacs-public
We will host your own VACs

Thanks

Questions?
Ask here! https://multivac.sunholo.com
multivac@sunholo.com
linkedin.com/company/sunholo/
github.com/sunholo-data

Appendix - about Sunholo

Mark Edmondson - Founder

Founder of Holosun ApS from Nov 1st, 2023
Google Developer Expert - Google Cloud since 2015
MSci Physics, King's College London
Wrote an O’Reilly book on Google Analytics 4 and Google Cloud integrations

code.markedmondson.me

Electric Sheep - Company Brain

An LLM bot, prototyped in the blog post.
Evolved into main executor agent
Infinite memory
Langchain Retrieval Augmented Generation (RAG) bot

Conversations with a bot

Voight-Kampff - Junior Developer

Writes and executes code based on prompts
Uses same GCP infrastructure as Electric Sheep
Interacts with other bots
openinterpreter.com bot

Watching a bot code

An army of bots

Sunholo aims to be a post-LLM company

Custom bots for each business function
Agents running in private secure environments
Private data mainly interacted with via LLMs

1. Parsing input
- LLM rephrasing
- Image/Text/Audio
- Prompt engineering

2. The model: cognition
- Tailoring size of model to task
- Finetuning (MLOps)

3. Document store
- Source of truth
- Data pipelines (DataOps)
- Structured data (LLMs writing SQL?)
- Unstructured data

4. Vectorstore embeddings
- A new datatype for most companies
- New uses beyond LLMs
- Embedding type
- Chunking
- Parsing of documents

Enabling LLMOps on GCP

Open source LLM Agents

Langchain - modular LLM flows
LlamaIndex - advanced RAG
LiteLLM - proxy to standardise interacting with all LLMs, local and API based
Unstructured - easy parsing of documents into chunks
Autogen - Multiple agents talking to one another
OpenInterpreter - Agent executing its own code

LLMOps for Electric Sheep

Retrieval augmented generation (RAG)
Documentation is the new oil
All Sunholo documents, git repos, emails, notes, conversations, R&D etc.

Langchain ConversationalRetrievalChain
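ConversationalRetrievalChain combines chat history with retrieval in three steps:

1. It condenses the follow-up question and the chat history into a standalone question.
2. It retrieves documents for that standalone question.
3. It asks the LLM to answer from those documents.

Below is a framework-free sketch of that flow. It is not LangChain's API. `llm` and `retriever` are hypothetical callables, the prompt wording is an assumption, and toy stand-ins let it run without a model or vector store.

```python
from typing import Callable

LLM = Callable[[str], str]              # hypothetical: prompt in, text out
Retriever = Callable[[str], list[str]]  # hypothetical: query in, relevant chunks out


def conversational_rag(question: str, history: list[tuple[str, str]],
                       llm: LLM, retriever: Retriever) -> str:
    # 1. Condense the follow-up question plus chat history into a standalone question
    if history:
        transcript = "\n".join(f"Human: {q}\nAI: {a}" for q, a in history)
        standalone = llm(
            "Given the conversation below, rephrase the follow-up question as a "
            f"standalone question.\n\n{transcript}\n\nFollow-up: {question}\nStandalone question:"
        )
    else:
        standalone = question
    # 2. Retrieve the chunks most relevant to the standalone question
    context = "\n\n".join(retriever(standalone))
    # 3. Answer using only the retrieved context
    return llm(
        "Answer the question using only this context. If the answer is not in it, "
        f"say you don't know.\n\nContext:\n{context}\n\nQuestion: {standalone}\nAnswer:"
    )


# Toy stand-ins: the "LLM" echoes back the context it was given
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Given the conversation"):
        return "How long do refunds take?"
    return "Context used: " + prompt.split("Context:\n")[1].split("\n\nQuestion")[0]


def fake_retriever(query: str) -> list[str]:
    return ["Refunds are processed within 14 days."] if "refund" in query.lower() else []


print(conversational_rag("And how long do they take?",
                         [("Do you offer refunds?", "Yes.")], fake_llm, fake_retriever))
```

The condensing step is what lets a vague follow-up such as "how long do they take?" still retrieve the right documents.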

LLMOps for Voight-Kampff

Using LLMs to create code and scripts that the agent then executes in a virtual environment
Non-interactive mode
Pick LLM to run locally or via API
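The generate-then-execute loop can be sketched with the standard library. This minimal sketch runs model-generated Python in a separate interpreter process, inside a throwaway directory, with a timeout. It is a stand-in for containerised execution, not a replacement for it: a plain subprocess gives no real isolation from the host. The `generated` string stands in for hypothetical LLM output.

```python
import subprocess
import sys
import tempfile


def run_generated_code(code: str, timeout_s: float = 10.0) -> tuple[int, str, str]:
    """Run model-generated Python in a separate interpreter, in a temporary working
    directory, with a time limit. Returns (exit code, stdout, stderr).
    NOTE: this is NOT a security sandbox; untrusted code belongs in a container or VM."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            result = subprocess.run(
                # -I: isolated mode, ignores PYTHON* env vars and user site-packages
                [sys.executable, "-I", "-c", code],
                cwd=workdir, capture_output=True, text=True, timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return -1, "", f"timed out after {timeout_s}s"
        return result.returncode, result.stdout, result.stderr


# Pretend this came back from an LLM asked to "sum the squares of 1 to 10"
generated = "print(sum(i * i for i in range(1, 11)))"
print(run_generated_code(generated))  # (0, '385\n', '')
```

Returning stderr alongside the exit code lets the agent feed errors back to the model for another attempt.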

Voight-Kampff and post-LLM software engineering

Executing Code within Docker containers
Terraform IaC gives agents superpowers
Best practices of GitOps/CI/CD/Testing/Documentation all enable agents

Voight-Kampff Triggers

Triggers:
- CI/CD alerts that prompt the agent to fix code
- Scheduled code development and refactoring
- GitHub issue triage
The more systems are defined in code, the more it will be able to build itself
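Wiring up these triggers largely means mapping each incoming event to a prompt for the agent. Below is a minimal sketch. The event names, payload fields and prompt templates are hypothetical. Delivering the events is left out; that could be webhooks from the CI system or GitHub, or a scheduler such as Cloud Scheduler.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str      # hypothetical event names: "ci_failure", "schedule", "issue_opened"
    payload: dict


# Hypothetical prompt templates, one per trigger type
PROMPTS = {
    "ci_failure": "The CI pipeline failed with this log:\n{log}\nPropose a code fix.",
    "schedule": "Review the module {module} and suggest refactoring opportunities.",
    "issue_opened": ("Triage this GitHub issue. Title: {title}\nBody: {body}\n"
                     "Reply with a label (bug, feature, question) and a one-line summary."),
}


def build_agent_prompt(event: Event) -> str:
    """Turn an incoming trigger event into the prompt handed to the coding agent."""
    try:
        template = PROMPTS[event.kind]
    except KeyError:
        raise ValueError(f"no agent configured for event type {event.kind!r}") from None
    return template.format(**event.payload)


print(build_agent_prompt(Event("issue_opened", {"title": "Login fails", "body": "500 error on /login"})))
```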

Summary

This is just the beginning of an LLM revolution
Post-LLM companies will use multiple agents
LLMOps builds on top of DevOps and MLOps
Sunholo offers LLMOps on GCP to accelerate your own use cases