siyes96787 posted an update in the group
What are the biggest hurdles in extracting structured data from non-Latin scripts like Arabic, Hindi
2 weeks ago

Funny how these script-specific headaches just keep hanging around year after year. I remember seeing a presentation ages ago where someone showed how much extra compute gets burned just because the models have to guess character boundaries in real time instead of working from clean static images. Nowadays you notice it even more when people casually point their phone at multilingual street signs: the difference in quality between Latin and everything else is still pretty obvious to the naked eye. Makes you wonder how long it’ll take before the gap finally closes.