LLMs love to add unnecessary complexity
11/09/2025
TLDR: Don't trust LLMs blindly. They love creating unnecessary complexity (new paths, useEffect everywhere). Set temperature=0 for consistent parsing, normalize your date formats early, and always verify production implementations match your tests.
- LLMs love to create new code paths without considering the paths that already exist.
- Solution: Babysit the suggestions and recommendations. Do not blindly accept changes until you've fully understood the next steps. Question the model's thought process and challenge it to simplify. Changes that aren't easy to restore (modifying files, deleting, updating auth paths, etc.) are much harder to undo later.
- Tests & experiments can end up implemented differently in production & produce different (incorrect) results.
- When integrating an experiment, instruct the LLM to follow the exact same methods, with the same files, permissions, models, and weights.
- When using LLMs, you want to eliminate any noise going into parsing, validation, etc., so calls are cheaper, more efficient, and more accurate.
- LLMs love useEffect in React as a first recommendation. Fight back by questioning whether a new effect or piece of state is actually necessary. Push for smaller, one-line code changes when possible.
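A minimal sketch of the kind of one-line change to push for. LLMs often suggest mirroring props into state with a `useState` + `useEffect` pair; usually the value can just be derived during render instead. The `Item`/`cartTotal` names here are illustrative, not from any real codebase:

```typescript
interface Item {
  price: number;
  qty: number;
}

// Effect-free alternative: derive the value instead of syncing state.
// In a React component this is a single line in render (or wrapped in
// useMemo), replacing a useState + useEffect pair that an LLM might suggest.
function cartTotal(items: Item[]): number {
  return items.reduce((sum, item) => sum + item.price * item.qty, 0);
}
```

Because it's a pure derivation, there's no extra state to keep in sync and no effect dependency array to get wrong.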
- Adding parsing logic without a learning system will produce increasingly random generations from the same sample dataset.
- The problem? No temperature control in the AI call
- Uses the default temperature setting, which introduces randomness
- Each parse = a slightly different interpretation
- If the cache is empty → every parse hits the AI fresh
- Solution(s)?
- Deterministic parsing → set temperature to 0 in the AI call for consistency
- Few-shot learning with examples → build a database of “gold standard” examples and inject them into the system prompt; store user-corrected parsing results in a parsing_examples table; when parsing, fetch 2-3 similar examples and add them to the prompt. The AI learns from these examples without formal training (you can’t fine-tune Gemini).
- Feedback loop → create a system where users can mark parsing results as “good” or “bad” and correct parsed data; store corrections in a parsing_feedback table; use the best examples as few-shot prompts for future parses
- Prompt evolution → periodically analyze failed/corrected parses to refine the system prompt itself
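The first two solutions can be sketched in a few lines. This assumes a Gemini-style API that accepts a generation config and a text prompt; the `GoldExample` type, `buildParsePrompt` function, and `parsing_examples` table are illustrative names, not a real schema:

```typescript
interface GoldExample {
  input: string;  // raw text that was parsed
  parsed: string; // user-corrected JSON output, stored as a string
}

// Deterministic parsing: pin temperature to 0 so the same input yields
// the same parse instead of a fresh random interpretation each call.
const generationConfig = { temperature: 0 };

// Few-shot learning: inject 2-3 stored "gold standard" examples
// (e.g. user-corrected rows from a parsing_examples table) into the prompt.
function buildParsePrompt(input: string, examples: GoldExample[]): string {
  const shots = examples
    .slice(0, 3)
    .map((ex) => `Input: ${ex.input}\nOutput: ${ex.parsed}`)
    .join("\n\n");
  return `Parse the following record into JSON.\n\n${shots}\n\nInput: ${input}\nOutput:`;
}
```

The config object and the prompt are then passed to whatever client call your AI provider exposes; the point is that determinism and example injection live outside the call itself, so they're easy to test.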
- Date display can be easy or complicated; it’s your choice. Make sure the display logic and what your database schema accepts stay consistent across your experience. Do not assume they align just because you added support for one of them. Be specific about date formatting (display & inputs).
- EX: Start/end date formats vary:
- YYYY
- YYYY-MM
- YYYY-MM-DD
- Present
- Solution? Add a single, robust date normalizer to standardize dates:
- YYYY-MM-DD (keep as is)
- YYYY-MM → append “-01”
- YYYY → append “-01-01”
- Present or empty → NULL
- Anything else → NULL
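The normalization rules above fit in one small function. A minimal sketch (the `normalizeDate` name is illustrative) that turns every input into `YYYY-MM-DD` or `null` before it reaches the database:

```typescript
// Normalize mixed date inputs to YYYY-MM-DD, or null when unusable.
function normalizeDate(raw: string | null | undefined): string | null {
  const value = (raw ?? "").trim();
  if (value === "" || /^present$/i.test(value)) return null; // "Present" or empty → NULL
  if (/^\d{4}-\d{2}-\d{2}$/.test(value)) return value;       // YYYY-MM-DD → keep as is
  if (/^\d{4}-\d{2}$/.test(value)) return `${value}-01`;     // YYYY-MM → append "-01"
  if (/^\d{4}$/.test(value)) return `${value}-01-01`;        // YYYY → append "-01-01"
  return null;                                               // anything else → NULL
}
```

Run every start/end date through this one function at the ingestion boundary, so the rest of the app only ever sees full dates or `null`.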