2 posts tagged with "Ai_agent"

What I learned from the Mastering LLMs course

July 13, 2024 · 3 min read

AI engineer

I recently signed up for and completed the Mastering LLMs course on Maven, and this post shares what I learned from it.

(My certificate can be viewed online at https://maven.com/certificate/BBBPHevx)

This course was initially focused on LLM fine-tuning, but as more instructors joined, it somehow evolved into an LLM conference. Being based in Singapore, I could usually only watch the recordings later, as the live sessions were often at inconvenient hours. That said, the advantage of watching the recordings is that I could speed up playback to 1.5x or 2x (depending on the speaker), pause when needed, rewind, and rewatch parts for better understanding.

Do not finetune until you prove that Prompt Engineering and RAG don't work for you

A common approach I was already familiar with is to start with Prompt Engineering, followed by Retrieval Augmented Generation (RAG), and only then consider fine-tuning. The reason is obvious -- fine-tuning costs more, takes more work and has a slower feedback loop than the other two approaches. What the course added for me is a further benefit: the Prompt Engineering and RAG results serve as baselines that any fine-tuned model has to beat.

Finetuning may yield worse results

Beware, fine-tuning isn’t a guaranteed improvement! For instance, one of the sessions shared an example where a model fine-tuned on Slack chat data actually performed worse, because the model kept trying to emulate the messaging style of the Slack users rather than answer questions based on the past chat conversations. (I tried to track down the specific example after the course but couldn't find it.)

Axolotl + Cloud infrastructure make finetuning very approachable

The course participants received a total of $3,500 worth of compute credits across various providers (JarvisLabs, Modal, OpenAI, HuggingFace, Weights and Biases, etc.).

Thanks to the compute credits, I had the chance to experiment with fine-tuning a model using Axolotl on JarvisLabs and Modal. Modal is great for running code remotely with minimal setup. However, for fine-tuning, I preferred JarvisLabs as it offered me the control I needed without unnecessary complexity. I really don't need the bells and whistles that Modal provides.

Evals

This topic was interesting because the instructors held different opinions on it.

One basic method involves writing pytest unit tests to assert expected behaviors. This is unconventional for a software engineer like myself because unit tests usually avoid external calls, but here they recommend running such tests in production to act as guardrails.
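As a rough illustration of that idea, here is a minimal sketch of assertion-style checks on model output that can run both as a pytest test and as a runtime guardrail. The names `generate_reply`, `check_reply` and `guarded_reply` are hypothetical, the specific checks are made up for illustration, and the LLM call is replaced by a stub, so this is not the exact setup shown in the course.

```python
import re


def generate_reply(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; swap in your own client."""
    return "Thanks for reaching out! Your order #1234 ships tomorrow."


def check_reply(reply: str) -> list[str]:
    """Return the list of violated expectations (empty means the reply passes)."""
    problems = []
    if not reply.strip():
        problems.append("empty reply")
    if len(reply) > 500:
        problems.append("reply too long")
    if re.search(r"as an ai language model", reply, re.IGNORECASE):
        problems.append("boilerplate disclaimer leaked")
    return problems


def test_reply_has_no_violations():
    # pytest collects any function named test_*; a plain assert is enough.
    reply = generate_reply("Where is my order?")
    assert check_reply(reply) == []


def guarded_reply(prompt: str, fallback: str = "Sorry, let me hand you to a human.") -> str:
    """Reuse the same checks at request time as a guardrail."""
    reply = generate_reply(prompt)
    return reply if not check_reply(reply) else fallback
```

Keeping the checks in one plain function is the design choice that matters here: the test suite and the production path then assert exactly the same behaviors.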

There were also insights on using LLMs for evaluations, showcasing different perspectives on best practices.

A great insight I learned from one of the instructors is to always try to turn what you're evaluating into a binary classification. This makes it a lot easier to implement, reason about and evaluate in isolation.
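To make the binary framing concrete, here is a small sketch under my own assumptions: each eval case is reduced to a single pass/fail question, and the overall metric is just the fraction of passes. The `EvalCase` fields and the "must mention" criterion are hypothetical examples, not something prescribed by the instructors.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    question: str
    answer: str        # the model output being judged
    must_mention: str  # hypothetical criterion: a fact the answer has to contain


def passes(case: EvalCase) -> bool:
    """One yes/no judgment per case instead of a fuzzy 1-10 score."""
    return case.must_mention.lower() in case.answer.lower()


def pass_rate(cases: list[EvalCase]) -> float:
    """Fraction of cases that pass; 0.0 for an empty list."""
    if not cases:
        return 0.0
    return sum(passes(c) for c in cases) / len(cases)


cases = [
    EvalCase("What is the refund window?", "You can get a refund within 30 days.", "30 days"),
    EvalCase("What is the refund window?", "Please contact support.", "30 days"),
]
print(f"pass rate: {pass_rate(cases):.2f}")
```

Because every case yields a boolean, failures can be inspected one at a time, and the same pass rate can be compared across prompt, RAG and fine-tuned variants.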

Just use APIs (or No GPU before PMF)

While it’s intellectually stimulating to learn about setting up infrastructure for training and inference, many instructors emphasized the practicality of using cloud providers like Replicate and OpenAI for customer projects. Using APIs is great for prototyping and trying things out. Setting up your own infrastructure makes sense only if you have strict data privacy and security requirements.

Conclusion

This course was incredibly insightful and practical, providing a broad range of perspectives and hands-on experiences in the world of LLMs. The private Discord channel is a goldmine. That said, the course lacks structured, guided homework assignments that we could use to practice and assess our learning.

Controlled Agentic Workflows Are All You Need

July 3, 2024 · 3 min read

Sheng-Loong Su (SSL)

AI engineer

Understanding AI Agents and Agentic Workflows

My introduction to the concept of "AI agents" began with Reinforcement Learning (RL), a field where agents learn by interacting with and observing their environment to maximize a reward function. Some of the most prominent examples of RL-based AI agents include AlphaGo and self-driving cars.

Despite the success of these applications, developing a reward function that effectively guides RL agents towards their objectives is a significant challenge. Recent advancements in Large Language Models (LLMs) and Large Multi-modal Models (LMMs) have shifted the focus towards LLM/LMM-powered agents. Lilian Weng from OpenAI provides an excellent overview of LLM-powered agents in her blog post. For further motivation, Andrej Karpathy explained why you should work on AI agents in his talk.

Andrew Ng offers a compelling analogy: asking an LLM to produce its answer in a single pass is like writing an essay from start to finish without revising, whereas AI agents iteratively refine the output in a loop. He elaborates on this in his talk. Ng is optimistic about AI agents, as highlighted in his recent LinkedIn post, where he notes a preference for talking about agentic workflows rather than AI agents, since the former term is less likely to turn into marketing jargon.
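The one-pass versus iterative contrast can be sketched as a simple draft, critique, revise loop. This is only an illustration under my own assumptions: `refine` and the three `toy_*` functions are hypothetical, and in a real system each of them would be an LLM call rather than a string check.

```python
from typing import Callable, Optional


def refine(
    task: str,
    draft_fn: Callable[[str], str],
    critique_fn: Callable[[str, str], Optional[str]],
    revise_fn: Callable[[str, str, str], str],
    max_rounds: int = 3,
) -> str:
    """Draft once, then critique and revise until no feedback or the round budget runs out."""
    draft = draft_fn(task)  # this alone is the "one pass" answer
    for _ in range(max_rounds):
        feedback = critique_fn(task, draft)
        if feedback is None:  # the critic is satisfied
            break
        draft = revise_fn(task, draft, feedback)
    return draft


# Toy stand-ins so the loop can run without any model.
def toy_draft(task: str) -> str:
    return "agents refine their output"


def toy_critique(task: str, draft: str) -> Optional[str]:
    if not draft[0].isupper():
        return "capitalize the first letter"
    if not draft.endswith("."):
        return "end with a period"
    return None


def toy_revise(task: str, draft: str, feedback: str) -> str:
    if feedback.startswith("capitalize"):
        return draft[0].upper() + draft[1:]
    return draft + "."


print(refine("Describe agents", toy_draft, toy_critique, toy_revise))
```

The `max_rounds` cap keeps the loop bounded, so a critic that is never satisfied cannot run forever.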

From Unbounded AI Agents to Bounded Agentic Workflows

The initial surge in popularity for AI agent frameworks such as AutoGPT and BabyAGI, driven by the rise of LLMs, was short-lived. These frameworks struggled with overly general and open-ended tasks, leading to a decline in interest.

In a latent.space podcast, Mike Conover of Brightwave articulated the limitations of unbounded agentic behaviors:

"I don't think that unbounded agentic behaviors are useful. Instead, a useful LLM system is more like a finite state machine where the behavior of the system occupies one of many different behavioral regimes, making decisions about which state to occupy next to achieve the goal."

This perspective underscores the non-deterministic nature of LLM systems, which contrasts with the predictability of traditional coded systems. In real-world applications, reliability is paramount, necessitating controllable AI agents.
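One way to picture the finite state machine framing is the sketch below. It is my own minimal interpretation, not Brightwave's design: the states, the `ALLOWED` transition table and `run_workflow` are hypothetical, and `choose_next` stands in for the LLM deciding which state to occupy next.

```python
from enum import Enum, auto
from typing import Callable


class State(Enum):
    TRIAGE = auto()
    RETRIEVE = auto()
    ANSWER = auto()
    ESCALATE = auto()
    DONE = auto()


# The workflow author, not the model, decides which transitions are legal.
ALLOWED = {
    State.TRIAGE: {State.RETRIEVE, State.ESCALATE},
    State.RETRIEVE: {State.ANSWER, State.ESCALATE},
    State.ANSWER: {State.DONE},
    State.ESCALATE: {State.DONE},
}


def run_workflow(choose_next: Callable[[State], State], max_steps: int = 10) -> list[State]:
    """Run until DONE or the step budget is spent; return the visited states."""
    state = State.TRIAGE
    trace = [state]
    for _ in range(max_steps):
        if state is State.DONE:
            break
        proposed = choose_next(state)  # in practice, an LLM call
        if proposed not in ALLOWED[state]:
            # Illegal proposal: escalate if possible, otherwise stop.
            proposed = State.ESCALATE if State.ESCALATE in ALLOWED[state] else State.DONE
        state = proposed
        trace.append(state)
    return trace


happy_path = {State.TRIAGE: State.RETRIEVE, State.RETRIEVE: State.ANSWER, State.ANSWER: State.DONE}
print([s.name for s in run_workflow(lambda s: happy_path[s])])
```

The model still makes the decision at each step, but the transition table and the step budget bound what it can do, which is where the controllability comes from.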

The Spectrum of Autonomy

LangChain illustrates the varying levels of autonomy for AI agents and agentic workflows, emphasizing the importance of controllability and reliability in practical applications.

LangChain's Levels of Autonomy (source: https://blog.langchain.dev/what-is-an-agent/)

2024 is the Year of AI Agents as Finite State Machines and Workflows

As a professional AI engineer, I've been closely following various startups in the AI agent space. A common trend I've noticed is the modeling of AI agents as Finite State Machines or through more deterministic workflows. Here are a few notable examples:

AlphaCodium's From Prompt Engineering to Flow Engineering
Gradient Labs's Building agentic workflows
Parcha's Agents aren't all you need
Peter Richens (Cleric)'s "How to Evaluate Your AI Agent"

LangChain is also heading in this direction, with the launch of LangGraph Cloud. They are actively educating the industry on creating reliable agents. Lance Martin's presentation on Building and Testing Reliable Agents provides excellent insights into these developments -- I wish this had been available when I first started out.
