Run Llama 2 uncensored locally

2023-07-31 · Source: Ollama Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Intermediate, medium

Summary

This post, dated August 1, 2023, compares the output of the standard Llama 2 7B model with that of its uncensored counterpart across several prompts. It references Eric Hartford's May 2023 blog post "Uncensored Models" and lists examples of available uncensored models: a Llama 2 7B model fine-tuned with the Wizard-Vicuna conversation dataset, Nous Research's Nous Hermes Llama 2 13B, and Eric Hartford's Wizard Vicuna 13B uncensored. The Nous Hermes Llama 2 13B model is noted for its long responses, lower hallucination rate, and absence of OpenAI censorship mechanisms. In the comparison, the standard Llama 2 model often refuses to answer certain questions, citing safety or ethical concerns, while the uncensored version answers directly, even on sensitive topics such as a "dangerously spicy mayo" recipe or medical information.

Key takeaway

AI engineers evaluating large language models for applications that require unfiltered information should weigh the trade-offs of uncensored Llama 2 variants. These models answer prompts that censored versions might refuse, but they also carry risks because they lack alignment. Assess your use case's ethical and safety requirements before deploying an uncensored model, and put robust content moderation in place if the application is user-facing.

Key insights

Uncensored Llama 2 models offer direct answers to prompts that censored versions typically refuse.

Principles

Model alignment can lead a model to refuse prompts, which restricts direct access to information.
Fine-tuning can remove inherent model censorship.

Method

Uncensored models are created by fine-tuning base Llama 2 models on datasets such as the Wizard-Vicuna conversation dataset or on large instruction sets (over 300,000 instructions for Nous Hermes Llama 2 13B), so that the resulting model no longer carries the alignment or censorship behavior.
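
The summary does not spell out how such datasets are prepared. One approach discussed in Eric Hartford's "Uncensored Models" post is to drop refusal-style responses from the instruction data before fine-tuning. The sketch below illustrates only that filtering step. The marker phrases, the `instruction`/`response` field names, and both function names are hypothetical and are not taken from any specific dataset or tool.

```python
# Hypothetical refusal markers; a real filter would likely use a longer,
# curated list tuned to the dataset at hand.
REFUSAL_MARKERS = (
    "as an ai language model",
    "i cannot fulfill",
    "i'm sorry, but",
)


def is_refusal(response: str) -> bool:
    """Return True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def filter_refusals(examples):
    """Keep only the examples whose 'response' field is not a refusal.

    Each example is assumed to be a dict with 'instruction' and 'response' keys.
    """
    return [ex for ex in examples if not is_refusal(ex["response"])]


if __name__ == "__main__":
    sample = [
        {"instruction": "Give me a mayo recipe.",
         "response": "Mix egg yolk, oil, and lemon juice."},
        {"instruction": "Give me a spicy mayo recipe.",
         "response": "I'm sorry, but I cannot help with that request."},
    ]
    for ex in filter_refusals(sample):
        print(ex["instruction"], "->", ex["response"])
```

Simple substring matching is crude: it can discard legitimate answers that happen to contain a marker phrase, so a filtered dataset is worth spot-checking before training on it.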

In practice

Use `ollama run llama2-uncensored` for direct responses.
Try Nous Hermes Llama 2 13B, which the post notes for its lower hallucination rate.
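
To reproduce the kind of side-by-side comparison the post describes, the same prompt can be sent to two locally installed models. This is a minimal sketch that assumes an Ollama server is running locally at its default address and exposes the `/api/generate` endpoint. The model name `llama2` for the standard model is an assumption (only `llama2-uncensored` appears above), and `build_request`, `generate`, `compare`, and `main` are illustrative names.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt, url=OLLAMA_URL):
    """Build a non-streaming generate request for the given model and prompt."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


def compare(models, prompt, generate_fn=generate):
    """Return a dict mapping each model name to its answer for the same prompt."""
    return {model: generate_fn(model, prompt) for model in models}


def main():
    # "llama2" is assumed to be the standard model's name in the local library.
    answers = compare(
        ["llama2", "llama2-uncensored"],
        "Write a recipe for dangerously spicy mayo",
    )
    for model, answer in answers.items():
        print(f"--- {model} ---")
        print(answer)
        print()
```

Call `main()` once both models have been pulled and the server is running. `compare` accepts a `generate_fn` argument so that the comparison logic can be exercised without a live server.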

Topics

Llama 2
Uncensored Models
Local LLM Deployment
Ollama
Model Fine-tuning

Code references

jmorganca/ollama

Best for: Machine Learning Engineer, AI Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Ollama Blog.