• benbayard 2 days ago

    I worked on a project at Patreon to do something similar many years ago. We used a babel plugin to do the translation with as few changes to the code base as possible.

    This application does not handle many important considerations for translation, such as pluralization. Many languages have more than two plural forms: Russian, for example, has several. More problems occur when you have words within words.

    There is no way to do this without changing your codebase. I think what would work better is if you could create ICU compliant JSON.
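
    For reference, here's what an ICU plural message looks like in practice. A minimal sketch using FormatJS's intl-messageformat (just an illustration, not something this project ships):

    ```ts
    import { IntlMessageFormat } from "intl-messageformat";

    // Russian needs one/few/many/other plural categories, not just singular/plural.
    const files = new IntlMessageFormat(
      "{count, plural, one {# файл} few {# файла} many {# файлов} other {# файла}}",
      "ru"
    );

    console.log(files.format({ count: 1 })); // "1 файл"
    console.log(files.format({ count: 3 })); // "3 файла"
    console.log(files.format({ count: 5 })); // "5 файлов"
    ```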

    How are you supposed to make this work in Japanese when it's RTL instead of LTR? That will pose UI and localization challenges.

    I think using AI to do translation will be fine for startups, but I'm not sure how well this will work on real production apps. I think significant work will be required to actually get this working:

    https://stelejs.com

    • maxpr 2 days ago

      I think modern Japanese is LTR, but besides that, I believe the project you worked on in the past solves an important problem.

      Besides pluralization (e.g. Arabic has six forms: zero/one/two/few/many/other), it turned out that number internationalization and currency conversion are the next big challenges the community wants to address.
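
      To illustrate the numbers side, a minimal sketch using the built-in Intl.NumberFormat (formatting only; actual currency conversion, i.e. exchange rates, is a separate problem Intl doesn't solve):

      ```ts
      const price = 1234567.89;

      // Same value, three locales: separators, symbol placement and rounding all differ.
      new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" }).format(price);
      // "$1,234,567.89"
      new Intl.NumberFormat("de-DE", { style: "currency", currency: "EUR" }).format(price);
      // "1.234.567,89 €"
      new Intl.NumberFormat("ja-JP", { style: "currency", currency: "JPY" }).format(price);
      // "￥1,234,568" (yen has no minor unit, so the value is rounded)
      ```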

      > create ICU compliant JSON.

      I think this is an excellent idea. I have a feeling that in the future we will need an ICU v2.0 of sorts, but unfortunately it's an extremely hard problem and the probability of failure is pretty high (it looks like Project Fluent is not actively maintained anymore: https://github.com/projectfluent/fluent)

      • koito17 2 days ago

        > I think modern Japanese is LTR

        Depends on the medium. EPUB 2.0 (and later revisions) specifically supports vertical RTL text for use-cases like Japanese novels. Additionally, many novel reading websites support toggling between vertical and horizontal text. Vertical text implicitly switches to RTL text direction.
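
        On the web this takes a single CSS property. A minimal sketch in React (hypothetical component; writing-mode itself is standard CSS):

        ```tsx
        function VerticalText() {
          // vertical-rl: characters run top-to-bottom, lines flow right-to-left.
          return (
            <div style={{ writingMode: "vertical-rl", height: "20em" }}>
              縦書きのテキストは右から左へ流れます。
            </div>
          );
        }
        ```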

        Of course, this is not a general use case. But saying "modern Japanese is LTR" is not quite accurate. Computer/digital media is commonly LTR and horizontal, but a single step outside exposes one to vertical, RTL text in physical newspapers, literature, comics, a subset of textbooks, and handwritten signs that try to look "traditional" in style.

        • rafram a day ago

          Yeah, vertical RTL is very common in Japanese books.

        • benbayard 2 days ago

          All good points, thank you for the information.

      • thrance 2 days ago

        Please, please, please, do not use auto-translators to localize your pages. There's nothing worse than a half-assed translation that was obviously made by a machine.

        Auto-translated sentences are awkward and I feel extremely insulted every time someone chooses to impose this garbage watered-down version of their products on me.

        Hire a translator or don't localize your site.

        • maxpr 2 days ago

          > half-assed translation that was obviously made by a machine

          That's exactly what we want to solve.

          Here's the thing:

          It turns out that AI translates better than humans when provided with enough correct context: both macro context, like what the product does, and micro context, like what the component represents on screen and how it relates to other components.

          As a result, algorithms extract the needed contextual hints, and a correctly configured LLM does the rest.

          • rafram 2 days ago

            > AI translates better than humans when provided with enough correct context

            This is definitionally untrue. Humans define human language; a "correct" translation is one that an experienced translator would write.

            • marc_abonce 2 days ago

              I assume that they mean that an LLM is better at translating than a high-rotation, contracted (i.e. not employed, no benefits, no stability) team of MTurk-like translators who are paid cents per translated token, are given little to no context of what they're translating beyond the individual sentence, and are dealing with 10 projects at once as quickly as possible because otherwise they wouldn't be able to make a decent wage.

              But that doesn't mean LLMs have become as good as human translators; rather, corporations have set up a system that treats translators as if they were machines, and then we act surprised when machines are better at acting machine-like than humans.

            • makeitdouble 2 days ago

              What I always wondered: why is your automatic translation better than the browser's or the user's own auto-translation?

              In particular, having it user-side makes it fully opt-in, and the user has full control and will accept the quality as it is, whereas your service-side auto-translation is your responsibility when the shit hits the fan.

              • maxpr 2 days ago

                Historically, there are a couple of reasons why developers prefer to i18n their app instead of letting users do that.

                1. PostHog has a great tool that lets developers "watch the video" of how users interact with their app's UI. Turns out, automated chrome plugins/built-in features often mess up the HTML so much that apps simply crash. I've seen devs adding translate="no" [0] in bulk to their apps because of this. Therefore, Chrome's built-in auto translation isn't the best solution (yet). 2. Product/marketing folks want users to see content in their language immediately after landing on the website 3. App developers often want to control what users see, update it, rephrase it

                If I had to guess, I'd say the approach the Lingo.dev Compiler package is using today should end up being a natural part of frameworks like Remix, Next.js and Vue.

                [0] https://www.w3schools.com/tags/att_translate.asp

                • makeitdouble 17 hours ago

                  PostHog didn't cross my radar before, so it was an interesting discovery.

                  I am quite surprised the apps crash on translation, but then there is a whole user-action analytics engine running in parallel, so it sounds like a problem of having too many things running at the same time?

                  Companies that want tight control over their translations already have the option to translate their i18n strings directly, AI or not. That sounds to me like a better choice, and not much more onerous than the half-baked post-filtering we're seeing in this article.

                  I'd argue that if we're going the AI route, having it extract the user-facing text and push it into i18n resources could be a better approach?

              • thrance 2 days ago

                Do you speak more than one language? Because claiming "AI translates better than humans" is ludicrous. Anyone with a modicum of experience browsing the internet can immediately tell when a page was auto-translated, based on how awkward or outright nonsensical some of the text can be.

                Also, I doubt other translators work by localizing <p> elements one by one, without context. The entire HTML is localized, semantic and all. I fail to see how translating JSX instead of HTML can improve the situation much.

                • maxpr 2 days ago

                  1. I do speak more than one language. I agree with your point that perfect localization requires seeing a <p> element in the broader context of the parent component, parent page, the product, the industry, the audience and their expected level of tech savviness, the culture, and eventually preferences regarding tone of voice.

                  Typically, a human would need to be educated about these aspects to translate perfectly. In the future, in my opinion, humans will be educating—or configuring—the AI to do that.

                  The "localization compiler", which we've built to solve our own problem in the first place, is just a handy bunch of scripts aimed to help extract needed contextual hints that would then be passed on to the [preconfigured] LLM for translation, and it should go beyond just the names of the tags.

                  FWIW, by saying AI translations I don't mean Google Translate or machine translation tech that browsers come with. I mean actual foundational AI models that OpenAI, Anthropic, Google, Meta, Mistral and others are developing.

                  The difference is significant, and there's nothing worse than a half-assed robotic translation produced by an MT.

                  2. Regarding "AI translates better than humans." I think some commenters have already mentioned this, but the point is that outsourced translations can be worse than what LLMs can produce today, because when translations are outsourced, nobody seems to care about educating the native speaker about the product and the UI. And localizing the UI, which consists of thousands of chunks of text, is nontrivial for a human. On the flip side, a correctly configured LLM, when provided with enough relevant contextual tips, shows outstanding results.

              • maelito 2 days ago

                And just say "sorry" to all the people asking you for translation of your great product ?

                • amake 2 days ago

                  If it's worth doing, then it's worth doing correctly.

                  If not, then don't.

                  • makeitdouble 2 days ago

                    They're asking for a reliable translation, otherwise they'd just let their browser auto-translate the page.

                    • StefanBatory a day ago

                      Whenever I see automatic translation into my language, I leave the page as most of the time it's unreadable. Microsoft docs is the worst offender.

                      Yeah, I'd prefer no translation over bad translation.

                      • thrance 2 days ago

                        Hire a translator then, don't give us a garbage localization and call it a day.

                        It's as if someone requested a feature and you gave them the first thing an LLM spewed out when asked to code it, without review.

                        You should at least have someone on your team be able to understand the program's output and correct it when things inevitably sound off.

                      • thierrydamiba 2 days ago

                        Would you literally rather have nothing than a poor translation?

                        • autumnstwilight a day ago

                          If I want a machine translation of something, I can throw the text into DeepL myself. Getting text that was machine translated Japanese<->English with no access to the original is pretty much never what I want, and yet sites insist on doing it based on my IP address or system language.

                          Also, if a website offers a language I take that as an indication that the organization is prepared to deal with speakers of that language/people from the country in question (customer support, shipping, regional/legal concerns). Whether the site offers a certain language is a useful signal to figure this out quickly, and if poking around reveals machine translation into dozens of languages, it's a signal that they're probably not prepared to provide reliable services/support.

                          • rafram 2 days ago

                            In some cases, yes. A non-native but passable speaker/reader of English might prefer to struggle through the English UI themselves than deal with your bad AI-generated translation. If they do it themselves, at least they can skip the parts they know, see multiple possible translations, and take advantage of their partial knowledge of the UI language. If you dump everything into an LLM with no knowledge of your target languages at all, you’re setting yourself up for disaster when a critical string is mistranslated.

                            • thomasfromcdnjs 2 days ago

                              There are some passionate naysayers in here.

                              I love "Translate this page" in Chrome, better than nothing.

                              • amake 2 days ago

                                Yes.

                                • thrance 2 days ago

                                  Yes, very much so.

                                  If you're bilingual you must know this feeling of reading an awful translation; of knowing someone wanted to offer their product to people speaking your language but couldn't be bothered to do it well, and so used Google Translate and called it a day, thinking those dumb users won't notice the slop they're feeding them. Fuck that.

                              • cluckindan 2 days ago

                                This needs to integrate with translation management formats/services instead of an LLM. It might work for some cases, but it will absolutely butcher jargon translations! In its current state it is worse than useless for sites managing content geared toward technical/professional audiences.

                                • maxpr 2 days ago

                                  Perhaps I could've communicated this better, but we've built Lingo.dev Compiler for web apps and user interfaces, not for technical/professional content.

                                  And since we had to exclude certain terms like "Lingo.dev Compiler" itself from i18n, we've shipped support for data-lingo-skip and data-lingo-override-<locale-code> as well.

                                  Regarding using LLMs for production content localization, I recommend checking out how Reddit translates their entire user-generated content base in 35 languages using AI:

                                  https://techcrunch.com/2024/09/25/reddit-is-bringing-ai-powe...

                                  If it already works great for Reddit today, I believe it's safe to assume it will become accessible to the wider audience soon as well.

                                  • cluckindan 2 days ago

                                    But what if the web app has a user interface for technical/professional content?

                                • exhaze 2 days ago

                                  Cool project! I built a similar tool [0] last year, but:

                                  1. Targeting fbt (Meta's internal i18n tool)

                                  2. Used CST (<3 ast-grep) instead of AST - really useful here IMO esp. for any heuristic-based checks.

                                  3. Fun fact: this was made entirely on my phone (~2.5h) while I was walking around Tokyo. Voice prompting + o1-pro. Why? My friend was working on porting fbt to TS and said he was planning to build this. I wanted to one-up him + convince him to start using LLMs =)

                                  One thing you should be aware of is that, for Japanese at least, localization is far from just translating the text. There are lots of Japan-specific cultural nuances you have to take into account for web users, often down to having an entirely different design for your landing page, because you'll find those just convert better when certain things are done that are typically not done for non-Japan websites.

                                  Notta (multi-lingual meeting transcriptions + reports) is a great example if you compare their Japanese [1] and English [2] landing pages.

                                  Note how drastically different the landing pages are. Furthermore, even linguistically, Japanese remains a challenge for proper context-dependent interpretation. Gemini 2.5 actually likely performs best for this thanks to Shane Gu [3], who's put tons of work into having it perform well for Japanese (as well as other "tough" languages)

                                  [0] https://github.com/f8n-ai/fbtee-migrate

                                  [1] https://www.notta.ai (Japanese version)

                                  [2] https://www.notta.ai/en (English version)

                                  [3] https://x.com/shaneguML

                                  • maxpr 2 days ago

                                    Thanks! =)

                                    > localization is far from just translating the text

                                    For sure, that's spot on.

                                    What I'm excited about most is that the linguistic/cultural aspects are close to being solved by LLMs, including Gemini 2.5, which got a huge performance boost over the previous iteration. So the automated approaches make more sense now and have a chance of becoming the default, reducing i18n maintenance down to zero. As a dev, I can't help but be excited about that.

                                    P.S. fbt is great by the way, as is the team behind it. It's a shame it's archived on GitHub and isn't actively maintained anymore.

                                  • halflife 2 days ago

                                    I thought this was awesome until you included an LLM into the mix.

                                    I hate the current React i18n solutions, and the fact that they only work at runtime, as opposed to Angular's build-time i18n solution.

                                    If your compiler could plug into existing localization workflows in large organizations, that would be great (i.e. extraction, loading from configuration).

                                    • maxpr 2 days ago

                                      Thanks for the perspective!

                                      We support larger org workflows with the Lingo.dev Engine product, but that's not the point: Lingo.dev Compiler is unrelated to that, 100% free and open source.

                                      We started with a thought: what if i18n is actually meant to be build-time and LLM-powered, and that's enough for it to be precise? Not today, but in the future, it feels like this type of solution could elegantly solve i18n at scale, in software, as opposed to the existing sophisticated workflows.

                                      WDYT?

                                      • halflife a day ago

                                        That’s fine for small projects or startups. Once you go into large organizations, where you have an ever changing glossary, and product wants to be in control of the texts in the application, doing it all in prompts and dev changes completely breaks translation workflows.

                                    • lukol 2 days ago

                                      How do you deal with specific wording that needs to be used in certain languages (often required for legal topics) or specific brand-related messages that need to be in place without any modifications? Does a developer still have the ability to manually translate certain strings?

                                      • maxpr 2 days ago

                                        Hey lukol, that's an exciting problem to solve.

                                        The best solution right now is prompt engineering: it turns out AI can be tuned to provide top-quality results with the correct system prompt/few-shot setup, and custom prompts can be provided in the compiler config.
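
                                        Purely illustrative (not Lingo.dev's actual config format), but a context-tuned request generally takes this shape, with a few-shot pair pinning the required wording:

                                        ```ts
                                        const messages = [
                                          {
                                            role: "system",
                                            content:
                                              "You translate UI strings for a payments dashboard. " +
                                              "Keep brand names untranslated. Glossary (must match exactly): " +
                                              'es: "chargeback" -> "contracargo".',
                                          },
                                          // Few-shot example locking in the approved terminology:
                                          { role: "user", content: 'es: "Dispute this chargeback"' },
                                          { role: "assistant", content: '"Disputar este contracargo"' },
                                          // The string actually being translated:
                                          { role: "user", content: 'es: "Chargeback received"' },
                                        ];
                                        ```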

                                        Longer term, I want this to never be an issue, and I feel we'll get there together with the help from the open source community!

                                        • grncdr 2 days ago

                                          One simple idea* would be an option to exclude certain React elements from auto translation. That would allow users to handle these specific cases “manually” while still reaping the benefits of the automated system the other 99% of the time.

                                          * worth exactly what you paid for it ;)

                                          • MangoToupe 2 days ago

                                            ...or having an "override" file that allows manual specification regardless about what the LLM spits out.

                                            • maxpr 2 days ago

                                              We've added support for both of these cases, actually! :)

                                              1. `data-lingo-skip` - excludes a JSX node from i18n

                                              2. `data-lingo-override-<locale code>` - overrides the version in the <locale code> language with a custom value

                                              3. also `data-lingo-context`

                                              (docs, perhaps, aren't yet the best, but here they are: https://lingo.dev/compiler/configuration/advanced)
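
                                              A rough sketch of how these read in JSX, going only by the attribute names above (see the docs for the exact semantics):

                                              ```tsx
                                              function Pricing() {
                                                return (
                                                  <div>
                                                    {/* Never translated: the product name stays as-is */}
                                                    <h1 data-lingo-skip>Lingo.dev Compiler</h1>
                                                    {/* Hand-written German; other locales stay AI-translated */}
                                                    <p data-lingo-override-de="Kostenlos und quelloffen.">
                                                      Free and open source.
                                                    </p>
                                                    {/* Extra hint for a short, ambiguous label */}
                                                    <button data-lingo-context="verb: submit the payment form">
                                                      Charge
                                                    </button>
                                                  </div>
                                                );
                                              }
                                              ```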

                                              • grncdr 2 days ago

                                                Ah, that's great. I simply hadn't read the docs that far.

                                      • jjani 2 days ago

                                        Hi, early user here :) Liking the product so far. We just started using the CLI parts for our RN app, with your cloud translations.

                                        A few things to put on your roadmap if they aren't on it yet:

                                        - Would like it if we could set the model per language. I'm sure you do your best trying to find the best one for each language, but in our experience some of them aren't optimal yet.

                                        - Multiple source languages would be cool. Example: It can make sense to have JA as source for KO but EN as source for FR. Or probably better, sending both (e.g. EN + JA when doing KO).

                                        - MCP doesn't seem to be working (we posted a log on the discord)

                                        - We seem to have spotted cases where key names were taken into account a little too much when translating, but understand this is super hard to tune and can be fixed by improving our keys.

                                        • maxpr 2 days ago

                                          Sure Jjani, perhaps our docs could be slightly better! :)

                                          Typically, quality changes significantly with the right translation fine-tuning settings, so send me a DM with your current setup and we'll help you out in a couple of minutes.

                                          Alternatively, Lingo.dev CLI is open source and you can give it a try with your own API key/model, and if your preferred provider ID isn't yet supported - pull requests are welcome, let's add it! (adding new providers is pretty simple).

                                          Checking your MCP scenario right now, but meanwhile regarding the keys: they're indeed important and are great ways to give the LLMs another tip regarding the meaning of the label and its intent.

                                        • stared 2 days ago

                                          > essentially rewriting your entire codebase before you can even start translating

                                          I’d say it just takes a few prompts in Cursor or a similar tool.

                                          Then, you simply ask it to translate into other languages. Here’s how I did it for one of my projects - a quantum optics simulator: https://p.migdal.pl/blog/2025/04/vibe-translating-quantum-fl...

                                          Doing it at runtime might make sense for a typical translation. But for scientific (or engineering) content, we often want to verify the output. Translating in production can be wonderful, hilarious, or just inconsistent.

                                          • maxpr 2 days ago

                                            Yep, Cursor indeed helps!

                                            (Here's a battle-tested prompt example we found works pretty nicely with o3 + Claude 3.7: https://lingo.dev/cli/extract-keys)

                                            > Then, you simply ask it to translate into other languages.

                                            Yep! With Lingo.dev Compiler, though, we were scratching our own itch, specifically the maintenance of the localized code. It turned out that extracting is fine, but further down the road we found ourselves digging through the code and jumping back and forth between the code and the i18n files.

                                            I think it won't be a problem anymore after "Just In Time software" becomes a thing, and vibe coding tools seem to be getting us closer to that point.

                                            Great example!

                                            • stared a day ago

                                              It is nice to hear that!

                                              Thank you for making it easier to localize content. Wishing you well on the path.

                                          • D_R_Farrell a day ago

                                            This is an ultra cool product - stoked someone is building this. We've internationalized our product by running ChatGPT o1 over the JSON files, and it's been a real pain trying to decode the strings, and worse for the text that lives in the code.

                                            Definitely going to ask my cofounder to look into implementing this.

                                            - Are there other use cases for this tech besides language translation? Wondering if there are other eng problems that something like this could solve.

                                            - Where are you guys thinking of taking this next? Seems like you're able to handle a lot of languages, so are you just continuing to refine the product or what's the plan for the next couple months and years?

                                            Thanks for tackling this!

                                            • monssoen 2 days ago

                                              I just remembered I built a tool to list all hardcoded strings from a React project in a similar way by parsing the AST.

                                              https://github.com/benmerckx/find-jsx-strings
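
                                              The core of such a tool is small. A minimal sketch of the idea (not the linked project's actual code) using @babel/parser and @babel/traverse:

                                              ```ts
                                              import { parse } from "@babel/parser";
                                              import traverse from "@babel/traverse";

                                              const source = `
                                                function Welcome() {
                                                  return <div>Welcome to <i>our platform</i>!</div>;
                                                }
                                              `;

                                              // Parse with JSX enabled, then visit every JSXText node.
                                              const ast = parse(source, { sourceType: "module", plugins: ["jsx"] });

                                              traverse(ast, {
                                                JSXText(path) {
                                                  const text = path.node.value.trim();
                                                  if (text) console.log(text); // "Welcome to", "our platform", "!"
                                                },
                                              });
                                              ```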

                                              • maxpr 2 days ago

                                                Exactly, this is a great direction!

                                                We believe automatic discovery + i18n processing is the most natural next step for i18n on the web, now that LLMs exist.

                                                And we feel that not only will industry-standard i18n libraries like i18next or react-intl adopt it soon, but frameworks like Next.js or Remix themselves will make it one of their core features.

                                                We originally built Lingo.dev Compiler to scratch our own itch, but we're really excited to see how the industry will evolve from here!

                                              • darepublic 2 days ago

                                                This is great. I have worked at many companies and dealt with i18n many times. It was often half-baked and frustrating, not to mention the annoyance of needing to introduce localization to a project that had none before. I often pondered creating a util to improve this space, but it looks like you've done the work already. And using LLMs and injecting at build time - I think this is a great choice for a problem that can be made more efficient by LLMs in their current state.

                                                • maxpr 2 days ago

                                                  Genuinely excited to read comments like yours. We started the project scratching our own itch, and are touched it resonated!

                                                  It will remain 100% free and open-source. We're already dogfooding it on our website and app, so if you'd like to join and contribute at some point, we'd be very happy!

                                                • splix 2 days ago

                                                  I'm trying to understand whether it works with an Electron (or Tauri) app on desktop? I cannot find any mention on the website. And how does it work with apps that are not based on React Router or anything similar, where it cannot learn all the possible screens?

                                                  • maxpr 2 days ago

                                                    With community support we hope to cover more platforms soon, but I can confidently say that adding support for tech stacks that use one of the following:

                                                    Vite, Rollup, webpack, esbuild, Rspack, Rolldown, Farm

                                                    should be reasonably straightforward, and we expect pull requests adding other setups soon.

                                                    That's a great question!

                                                  • maelito 2 days ago

                                                    This is exactly what I was looking for to translate cartes.app, an open-source alternative to Google maps. Thank you, I'll try.

                                                    • maxpr 2 days ago

                                                      Woah, I like this domain name! I bet it cost you a fortune :)

                                                      • maelito 4 hours ago

                                                        No, almost nothing. Map.app is 9 million € though.

                                                    • jackconsidine 2 days ago

                                                      > We wanted to find a way to deterministically group elements that should be translated together, so, for example, a phrase wrapped in the `<a>` link tag wouldn't get mistranslated because it was processed in isolation. We also wanted to detect inline function calls and handle them gracefully during compile-time code generation.

                                                      Very cool

                                                      • maxpr 2 days ago

                                                        Thanks! To make it work predictably, we actually tested quite a few different algorithms before landing on one that produces outputs LLMs can reliably understand.

                                                        Conceptually, we're relying on common-sense assumptions about how developers structure JSX. We assume you write reasonably semantic markup where the visual hierarchy matches the code structure - no CSS tricks that make the UI render completely differently from what the JSX suggests.

                                                        This let us create translation boundaries that make intuitive sense to both developers and AI models.
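
                                                        To make the boundary idea concrete, here's a hypothetical before/after (illustrative placeholder notation, not the compiler's actual output format):

                                                        ```tsx
                                                        // Input JSX:
                                                        //   <div>Welcome to <i>our platform</i>! <a href="/start">Get started</a> today.</div>
                                                        //
                                                        // Naive per-element extraction yields untranslatable fragments:
                                                        //   "Welcome to", "our platform", "!", "Get started", "today."
                                                        //
                                                        // One grouped unit with placeholders keeps the sentence intact, so an
                                                        // LLM can reorder words as the target language requires:
                                                        //   "Welcome to <element:i>our platform</element:i>! <element:a>Get started</element:a> today."
                                                        ```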

                                                      • techn00 2 days ago

                                                        Does the compiler work with React Native (Expo)?

                                                        • jerrygoyal 2 days ago

                                                          Does it work with Next.js Pages Router apps?

                                                          • cyberax 2 days ago

                                                            Many moons ago, a small project called GNU Gettext did this. It worked by annotating the translatable string literals like this _("SomeLine"), then a separate compiler extracted all the strings into a `.po` file.

                                                            And then you just translated them. The English text, essentially, becomes your string ID.
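
                                                            For readers who haven't seen it, a minimal sketch of the flow (`_` is stubbed here so the example runs; in practice it's whatever gettext binding your stack provides):

                                                            ```ts
                                                            // The English text is the ID: no invented string keys.
                                                            const catalog: Record<string, string> = {
                                                              "Welcome to our platform!": "Bienvenue sur notre plateforme !",
                                                            };
                                                            const _ = (msgid: string) => catalog[msgid] ?? msgid;

                                                            console.log(_("Welcome to our platform!")); // prints the French entry

                                                            // A separate extraction pass (e.g. xgettext) scans for _() calls and
                                                            // writes a .po file for translators:
                                                            //
                                                            //   msgid "Welcome to our platform!"
                                                            //   msgstr "Bienvenue sur notre plateforme !"
                                                            ```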

                                                            It worked super well, with very low friction for programmers. You didn't have to invent an identifier when writing code, and then switch to a horrible JSON with indefinitely long lists of strings.

                                                            But somehow, the world chose the horribly bad "string IDs" method of translation. Sigh.

                                                            • maxpr 2 days ago

                                                              Good point. There are JavaScript tools that do that for js devs, but since oftentimes you end up having <a>links</a> and <b>nested <i>elements</i></b> in the code, wrapping becomes problematic and hard to maintain at scale.

                                                              I think there's a chance compile-time, AST/CST solutions might be the ultimate, O(1) i18n approach that doesn't distract. Ideally it should come out of the box with the framework, but perhaps this future is a little bit too far away just yet.

                                                              • cyberax 2 days ago

                                                                For HTML, it probably needs to be extended to the HTML fragments themselves. And with React, it's pretty easy to actually extract the text fragments in {} segments.

                                                            • pimlottc 2 days ago

                                                              “Translate” has a lot of meanings in CS; I thought this was going to be about porting to a different framework or something. “Localize” would be clearer.

                                                              • maxpr 2 days ago

                                                                Yep, localization is the best term here, but sometimes folks confuse it with other things, for example geo localization or localization of errors.

                                                                So we usually try to use both terms at the same time, often interchangeably, though translation is ultimately a subset of localization.

                                                                • dang 2 days ago

                                                                  Ah yes good point - I think that was my fault. Localized now :)

                                                                  • devmor 2 days ago

                                                                    I would rethink this decision. Localization requires understanding intent and adjusting terminology to apply non-literal interpretations to convey meaning; this project uses an auto-translator.

                                                                    • maxpr 2 days ago

                                                                      Interesting perspective.

                                                                      Unsure if I communicated it well, but unlike auto-translators such as Google Translate, this project leverages a context-aware LLM to recreate the meaning and intent of the original text in another language.

                                                                      • devmor 2 days ago

                                                                        I have never seen an LLM with that function - if you have created one, you're likely a shoo-in for a Nobel Prize for breaking down language barriers across humankind.

                                                                        I think you are just misrepresenting the capabilities of an auto-translator LLM.

                                                                • runako 2 days ago

                                                                  > ```
                                                                  > function WelcomeMessage() {
                                                                  >   return (
                                                                  >     <div>
                                                                  >       Welcome to <i>our platform</i>! <a href="/start">Get started</a> today.
                                                                  >     </div>
                                                                  >   );
                                                                  > }
                                                                  > ```

                                                                  Not the point here, but is there any move yet in React to separate the presentation from the logic [1]?

                                                                  1 - https://martinfowler.com/eaaDev/SeparatedPresentation.html

                                                                  • jfengel 2 days ago

                                                                    I wouldn't expect there to be any.

                                                                    Sometimes your presentation varies depending on the data, in ways that are ultimately Turing-complete. Any domain-specific-language is eventually going to grow to incorporate some kind of logic.

                                                                    React seems to have found a sweet spot for that with JSX, which presents as if it's mostly HTML with some Javascript mixed in. (In reality, it's actually Javascript with HTML-esque syntactic sugar, but it works very hard to present the illusion.) That means that it's working in two well-understood and widely-supported languages, rather than creating yet another presentation language.

                                                                    HTML+CSS has deep flaws as a presentation language, but it's also universal. I don't expect React to reinvent that particular wheel.

                                                                    • maxpr 2 days ago

                                                                      I believe tRPC could be a great solution for separating logic, generally speaking. However, it also depends on the type of logic: some logic, like the state/behaviour of sidebars/modals, will always remain client-side.