DeepSeek
Website: https://github.com/deepseek-ai/DeepSeek-LLM
Options: Local/Offline
Price: Free (downloadable)
The DeepSeek LLMs are open-source models trained on both Chinese and English sources. Their largest model has 67 billion parameters and strong coding, math, and reasoning abilities. If your hardware can't handle the 67B model, there is also a smaller 7B-parameter version you can download.
DeepSeek also recently released a 1.3B-parameter multimodal model called Janus. This smaller model can both understand and generate images.
Some more info from their website:
Superior General Capabilities: DeepSeek LLM 67B Base outperforms Llama2 70B Base in areas such as reasoning, coding, math, and Chinese comprehension.
Proficient in Coding and Math: DeepSeek LLM 67B Chat exhibits outstanding performance in coding (HumanEval Pass@1: 73.78) and mathematics (GSM8K 0-shot: 84.1, Math 0-shot: 32.6). It also demonstrates remarkable generalization abilities, as evidenced by its exceptional score of 65 on the Hungarian National High School Exam.
Mastery in Chinese Language: Based on our evaluation, DeepSeek LLM 67B Chat surpasses GPT-3.5 in Chinese.
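For context on the benchmark numbers quoted above: "HumanEval Pass@1" is the probability that a single generated code sample passes a problem's unit tests. The standard unbiased estimator (from the original Codex evaluation) computes, for n samples per problem of which c pass, the chance that at least one of k drawn samples is correct. A minimal sketch (the function name and example numbers here are illustrative, not DeepSeek's actual evaluation code):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n total samples of which
    c are correct, passes the tests."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must include at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k=1, pass@1 reduces to the fraction of correct samples:
print(pass_at_k(200, 147, 1))  # 0.735 (147 of 200 samples correct)
```

Averaging pass@1 over all HumanEval problems gives the headline score, so DeepSeek's 73.78 means roughly three out of four problems are solved on the first try.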