Let's not confuse the company with the country by over-fitting a narrative. Popular media reinforces hatred, or anything else that pays its sponsors, especially toward weaker groups. Fewer repercussions and more clicks and money to be made, I guess.
While politicians may hate each other, scientists love to work with other aspiring scientists who share their ambitions; the only competition is in achieving measurable success and the benefit it brings to the wider public.
I say this without bias: it's genuinely admirable when companies release their source to enable faster cycles of scientific progress. It's ironic that this company is dedicated to finance yet shares its progress, while non-profits and companies dedicated purely to AI are locking away all knowledge of their findings.
Are there other companies like DeepSeek that you know of that regularly release great papers? I am following Mistral already, but I'd love to broaden the set of publications I read. Highly appreciated!
Is DeepSeek's openness in part meant to undercut the big American tech companies?
Why do you imply malice in OSS companies? Or in for-profit companies open-sourcing their models and source code?
I tend to believe this is a "commoditize your complement" strategy on Meta's part, myself. No idea what DeepSeek's motivation is, but it wouldn't surprise me if it was a similar strategy.
What's wrong with China? They're wonderful in the OSS ecosystem.
There's also significant alpha in releasing open-weights models. You get to slow down the market leaders to make sure they don't have runaway success. It reduces moats, slows funding, creates a wealth of competition, and reduces margins. It's a really smart move if you want to make sure there's a future where you can compete with Google, OpenAI, etc. There's even a chance it makes those companies bleed a little. The value chain moves to differently shaped companies (tools, infra), leaving space for consumer and product to not necessarily be won by the "labs" companies.
Quickest way to show this:
- Table 2, top of page 7
- Gemma 2 27B, 0 interventions, has 94.1/56.6/60.2
- Gemma 2 27B, with all their interventions, has 86/64/69.
- Gemma 2 27B, with all their interventions, sampled 32 times, is at 90.4/67.2/70.3.
- Gemma 2 27B came out in...June 2024. :/
Quick heuristics employed here:
- What models did they compare against? (This isn't strictly an issue; the big screaming tell is "What models did they compare against, compared to their last N papers?")
- How quickly does the paper have to move towards N samples, and how big does N get before they're happy enough to conclude? (32). How much does that improve performance on their chosen metric? (1.8%)
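For anyone who wants to check the arithmetic, here is a rough sketch in Python that computes the per-column changes between the three rows quoted above. The scores are copied from the comment; the column names (`metric_1` etc.) and the `deltas` helper are placeholders of mine, since the comment doesn't name the three benchmarks.

```python
# Scores copied from the comment above (Table 2, page 7).
# The column names are placeholders: the comment does not name the benchmarks.
METRICS = ("metric_1", "metric_2", "metric_3")

baseline = (94.1, 56.6, 60.2)        # Gemma 2 27B, 0 interventions
interventions = (86.0, 64.0, 69.0)   # Gemma 2 27B, all interventions
sampled_32 = (90.4, 67.2, 70.3)      # all interventions, sampled 32 times


def deltas(before, after):
    """Return (absolute point change, relative % change) for each metric."""
    return [
        (round(b - a, 1), round(100 * (b - a) / a, 1))
        for a, b in zip(before, after)
    ]


comparisons = (
    ("interventions vs. baseline", baseline, interventions),
    ("32 samples vs. interventions", interventions, sampled_32),
    ("32 samples vs. baseline", baseline, sampled_32),
)

for label, before, after in comparisons:
    print(label)
    for name, (points, percent) in zip(METRICS, deltas(before, after)):
        print(f"  {name}: {points:+.1f} points ({percent:+.1f}% relative)")
```

Going from one sample to 32 adds 4.4, 3.2, and 1.3 points on the three columns, and even with 32 samples the first column is still 3.7 points below the untouched model.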
Paste in a snippet from a book and ask the model to continue the story in the style of the snippet. It's surprising how bad most of the models are.
Grok-3 comes in a close second, likely because it is actually DeepSeek R1 with a few mods behind the scenes.
2) Grok-3 comes out a month after DeepSeek R1 was open sourced. I think Grok-3 is DeepSeek R1 with some added params and about a month of training on the giant cluster, possibly a bit of in-house secret sauce added to the model or training methodology.
What are the chances that xAI just happened to have a thinking model nearly as good as the revolutionary DeepSeek R1, but happened to launch it 30 days later?
It was both smart and pragmatic for xAI to simply use the best available open source stuff and layer their own stuff on top of it. Imagine they doubled the parameter count and trained it for 30 days: that would not even use half of their GPU power!
It is not. I remember Karpathy being really excited about the "1 million gpt personas" dataset and highlighting it as a way to avoid reward hacking in RLAIF. That was 3-6 months ago, I believe.
Of course paper / code / weights beats idea, and it's exciting to see how far this can go.