2 Comments

Overall, Moshi's advantages are clear: compared to other voice dialogue bots, Moshi is more human-like, with great immediacy, quick responses, and rich expressiveness.

However, unlike GPT-4o, Moshi lacks the ability to handle multiple languages.

Currently, Moshi's core generation capabilities are not as strong as those of Llama 3 8B, but it could potentially be paired with RAG or fine-tuned for specific tasks.

In summary, Moshi has shown me the true potential of natural communication between AI and humans.

Support for more voices and languages may just be a matter of time, and its potential as a coach, as a companion, or in various role-playing applications is very exciting.


I agree. Moshi is off to a great start, especially with its real-time capabilities and emotional range. It may not yet match the linguistic versatility of larger models, but specialized startups with targeted innovations can sometimes really challenge industry leaders, which is what makes this so exciting.
