Why Structured Data Is Hard for LLMs
LLMs are built to predict the next token (a word or word fragment) in a sequence of text. They excel at language, reasoning, and summarization. But structured data (such as numbers, tables, and spreadsheets) is fundamentally different:
- Tokenization Bias
  Numbers are split into text tokens rather than treated as numeric values.
  - Example: the number 12,345 might be broken into tokens like “12” and “,345”.
  - The model doesn’t “know” that these tokens represent twelve thousand three hundred forty-five.
- Lack of Mathematical Precision
  LLMs do not calculate; they approximate.
  - They generate answers based on patterns in their training data.
  - This is why they sometimes produce “hallucinated” or flat-out wrong sums, averages, or percentages.
- No Native Understanding of Schema
  An Excel sheet or database has rows, columns, and relationships.
  - LLMs don’t inherently understand schema, keys, or data constraints.
  - Without explicit instructions, they treat tables as chunks of text, not as structured entities.
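For contrast, here is what a conventional program does with the same kind of data. This minimal Python sketch parses “12,345” into an exact numeric value rather than text, checks every row against an explicit schema, and computes the sum, average, and percentages deterministically with the standard `decimal` module. The sample rows, column names, and constraints are hypothetical, chosen only for illustration.

```python
import csv
import io
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical sample export; in practice this would be a file or database query.
RAW_CSV = """region,revenue,target
North,"12,345",10000
South,"9,870",12000
West,"15,002",15000
"""


def parse_amount(text: str) -> Decimal:
    """Turn a thousands-separated string such as "12,345" into an exact Decimal."""
    return Decimal(text.replace(",", ""))


# Explicit schema: every expected column and the parser that gives it a real type.
SCHEMA = {
    "region": str,
    "revenue": parse_amount,
    "target": parse_amount,
}


def load_rows(raw: str) -> list[dict]:
    """Parse CSV text, reject unexpected columns, and enforce simple constraints."""
    reader = csv.DictReader(io.StringIO(raw))
    if reader.fieldnames != list(SCHEMA):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for line_no, record in enumerate(reader, start=2):  # line 1 is the header
        row = {name: parse(record[name]) for name, parse in SCHEMA.items()}
        # Hypothetical constraints: revenue non-negative, target positive
        # (so the percentage below is always defined).
        if row["revenue"] < 0 or row["target"] <= 0:
            raise ValueError(f"invalid amount on line {line_no}")
        rows.append(row)
    return rows


rows = load_rows(RAW_CSV)
total = sum(r["revenue"] for r in rows)
average = (total / len(rows)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
# Percentage of target reached per region, rounded to one decimal place.
attainment = {
    r["region"]: (100 * r["revenue"] / r["target"]).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    for r in rows
}

print(total)    # 37217
print(average)  # 12405.67
print(attainment)
```

Every figure here is computed, not predicted: running the script twice always yields the same numbers, and a malformed row raises an error instead of silently producing a plausible-looking answer.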
How LLMs “Fake It” with Structured Data
When GPT or Gemini appear to handle structured data, they’re often relying on clever workarounds:
- Pattern Matching Instead of Math
  If asked to sum 2 + 2, the model predicts “4” because it has seen this pattern thousands of times in its training data. For less common calculations, accuracy drops sharply.
- Internal Tools & Plugins
  Advanced setups route the question to an actual calculator, database, or Python interpreter. The LLM acts as a “front end” that interprets natural language, sends a query to the right tool, and then reformulates the result in text.
- Simulated Table Reasoning
  For tasks like “read a CSV,” the model uses heuristics: it scans the text layout, guesses the relationships, and then outputs answers in natural language. This works for small, simple tables but becomes unreliable as tables grow larger or their relationships more complex.
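The tool-routing pattern described above can be sketched in a few lines of Python. This is a generic illustration, not how any particular product implements it: `fake_llm_plan` is a hypothetical placeholder for a model call that returns a JSON tool request, and the tool name `calculator` is invented. What it shows is the division of labor: the model only chooses a tool and its input, while a small deterministic evaluator produces the number.

```python
import ast
import json
import operator


def fake_llm_plan(question: str) -> str:
    """Hypothetical placeholder for a model call.

    A real setup would prompt an LLM to answer with a structured tool request;
    here the request it might produce is hard-coded.
    """
    return json.dumps({"tool": "calculator", "expression": "12 * 1250 + 480"})


# Arithmetic operators the calculator tool accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculator(expression: str) -> float:
    """Evaluate plain arithmetic by walking the syntax tree (no eval())."""

    def walk(node: ast.AST):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


# Registry mapping (hypothetical) tool names to the functions that run them.
TOOLS = {"calculator": lambda call: calculator(call["expression"])}


def answer(question: str) -> str:
    call = json.loads(fake_llm_plan(question))  # 1. the model picks a tool and its input
    result = TOOLS[call["tool"]](call)          # 2. the tool computes the number
    # 3. a template stands in for the model rephrasing the result in text
    return f"The total is {result:,.2f}."


print(answer("What do 12 monthly payments of 1,250 plus a 480 setup fee add up to?"))
# The total is 15,480.00.
```

Parsing the expression with Python's `ast` module and allowing only a whitelist of arithmetic nodes avoids calling `eval()` on model output, which would let the model run arbitrary code.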
Why This Matters for Enterprises
When clients want to start with reporting automation, we explain:
- Reporting involves precision, schema logic, and regulatory reliability.
- An LLM alone cannot guarantee correctness.
- Errors in compliance or finance reports can be costly.
That’s why starting with structured data automation is usually the wrong entry point.
Where to Start Instead: Unstructured Data
Unstructured data – documents, emails, manuals, contracts – is where LLMs shine:
- Summarization
- Contextual search (RAG: Retrieval-Augmented Generation)
- Q&A over large knowledge bases
- Drafting reports from messy inputs
Here, LLMs provide real value in days, not months – without requiring perfect numerical precision.
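The contextual-search (RAG) workflow listed above boils down to two steps: find the passages most relevant to a question, then hand only those passages to the model together with the question. The sketch below uses a deliberately simple bag-of-words cosine similarity as a stand-in for the embedding models and vector databases that production RAG systems typically use; the documents and IDs are invented, and the final model call is left out.

```python
import math
import re
from collections import Counter

# Hypothetical snippets standing in for manuals, contracts, and emails.
DOCUMENTS = {
    "manual-07": "To reset the pump controller, hold the reset button for ten seconds.",
    "contract-12": "The supplier must deliver replacement parts within five business days.",
    "email-311": "The pump controller in hall B failed again after the firmware update.",
}


def tokenize(text: str) -> Counter:
    """Lower-case word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 2) -> list[tuple[str, float]]:
    """Return the k document IDs most similar to the question, best first."""
    q = tokenize(question)
    scored = [(doc_id, cosine(q, tokenize(text))) for doc_id, text in DOCUMENTS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, k: int = 2) -> str:
    """Assemble the grounded prompt that would be sent to the LLM."""
    context = "\n".join(f"[{doc_id}] {DOCUMENTS[doc_id]}" for doc_id, _ in retrieve(question, k))
    return (
        "Answer using only the sources below and cite their IDs.\n"
        f"{context}\n\nQuestion: {question}"
    )


print(build_prompt("How do I reset the pump controller?"))
```

Swapping `retrieve` for an embedding-based search changes the ranking quality, not the overall flow: retrieve, assemble context, then ask the model.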
Hainzelman’s Approach
At Hainzelman, we recommend:
- Start with unstructured data workflows (Knowledge Explorer, Expert Companion).
- Introduce connectors for structured data sources, but keep precision-critical tasks in traditional systems.
- Combine strengths: let LLMs interpret natural language, while reliable engines handle the math and schema operations.
This hybrid approach ensures trustworthy results, measurable ROI, and compliance – without overloading LLMs with tasks they’re not built to handle.
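The hybrid split can be sketched with Python's built-in `sqlite3` module. The language model's only job is to turn a question into a query; the database engine enforces the schema and constraints and does the arithmetic. This is a generic sketch of the pattern, not a description of Hainzelman's products: `fake_llm_to_sql` is a hypothetical placeholder for the model call, the `invoices` table is invented, and the read-only check is intentionally coarse, so a production system would need stricter query validation and database permissions.

```python
import sqlite3


def fake_llm_to_sql(question: str) -> str:
    """Hypothetical placeholder: a real system would prompt an LLM with the
    table schema and the question, and receive SQL back."""
    return "SELECT region, SUM(amount) AS total FROM invoices GROUP BY region ORDER BY region;"


def is_single_select(sql: str) -> bool:
    """Coarse guard: accept exactly one SELECT statement, nothing else."""
    statement = sql.strip().rstrip(";").strip()
    return statement.lower().startswith("select") and ";" not in statement


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The engine, not the model, enforces types and constraints.
    conn.execute(
        "CREATE TABLE invoices ("
        " region TEXT NOT NULL,"
        " amount INTEGER NOT NULL CHECK (amount >= 0))"  # whole currency units (hypothetical)
    )
    conn.executemany(
        "INSERT INTO invoices (region, amount) VALUES (?, ?)",
        [("North", 1200), ("South", 800), ("North", 450)],
    )
    conn.commit()
    conn.execute("PRAGMA query_only = ON")  # reject writes from here on
    return conn


def ask(conn: sqlite3.Connection, question: str) -> list[tuple]:
    sql = fake_llm_to_sql(question)  # LLM: natural language -> query
    if not is_single_select(sql):
        raise ValueError(f"refusing to run non-SELECT SQL: {sql!r}")
    return conn.execute(sql).fetchall()  # database: exact aggregation


conn = build_demo_db()
rows = ask(conn, "What is the invoiced total per region?")
print(rows)  # [('North', 1650), ('South', 800)]
```

If the model generates a wrong query, the failure shows up as a rejected or inspectable SQL statement rather than as a confidently stated but miscalculated number.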