That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF, released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

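Something like this should work, assuming Ollama's direct-from-Hugging-Face GGUF support and the Q8_0 quantization of the Unsloth build (the exact tag may differ):

```bash
# Pull the Unsloth GGUF build straight from Hugging Face and start a chat
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
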
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:

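Roughly this, assuming the model keeps the same name Ollama assigned it above:

```bash
# Install the Ollama plugin for LLM, then prompt the local model
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
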
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:

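Presumably the same pattern as the 8B model; a quantization around 34GB for 70B weights suggests Q3_K_M, though that tag is an assumption:

```bash
# Same Hugging Face pull as before, with a lower-bit quant to fit in 64GB of RAM
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
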
Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

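Through LLM that would look something like this (the model reference assumes the 70B quantization from the update above):

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```
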
It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was much more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that through LLM by dropping this into your extra-openai-models.yaml configuration file:

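A sketch of that entry, assuming DeepSeek's documented OpenAI-compatible base URL:

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
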
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.

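Put together, that flow looks like this (the prompt is just an example):

```bash
llm keys set deepseek
# paste in your DeepSeek API key when prompted
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```
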
This won't show you the reasoning tokens, unfortunately. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.