𝗣𝗿𝗼𝗺𝗽𝘁 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 𝗳𝗼𝗿 𝗦𝘆𝗻𝘁𝗵𝗲𝘁𝗶𝗰 𝗗𝗮𝘁𝗮

📅3 hours ago⏱2 min read

Using LLMs to create synthetic data is a popular strategy for QA teams. You can generate hundreds of complex records in seconds.

But generic prompts lead to a trap. If you ask an LLM to "generate 50 test users," it gives you predictable, repetitive data. This creates a false sense of coverage. You get many records that only test the "happy path" while missing critical edge cases and business logic.

To fix this, you must move from being a requester to an orchestrator. You need to apply testing principles directly to your prompt engineering.

Use these three patterns to improve your data quality:

Equivalence Partitioning and Boundary Value Analysis

Instead of asking for data, force the LLM to map out test classes first. Use Chain-of-Thought prompting.

Define your role as a Senior QA Engineer.
Provide specific business rules (e.g., coupon limits or minimum spend).
Instruct the LLM to list all valid and invalid equivalence classes in a table.
Demand exactly one JSON payload per identified scenario.
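The four steps above can be assembled into a single prompt. The sketch below is one possible wording, not the author's template. The helper name `build_partition_prompt` and the coupon rules are hypothetical, and the resulting string can be sent to whatever LLM client you already use.

```python
def build_partition_prompt(rules: list[str]) -> str:
    """Assemble a Chain-of-Thought prompt for equivalence partitioning and BVA.

    Hypothetical helper: the wording is an illustrative assumption.
    """
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return (
        # Step 0: set the role.
        "You are a Senior QA Engineer.\n\n"
        # Step 1: give the business rules explicitly.
        f"Business rules:\n{rule_lines}\n\n"
        # Step 2: force the model to reason about classes before producing data.
        "Step 1: Before generating any data, list every valid and invalid "
        "equivalence class for these rules in a table with the columns "
        "Scenario ID | Class | Valid/Invalid | Boundary value.\n"
        # Step 3: exactly one payload per scenario, nothing extra.
        "Step 2: For each row in the table, output exactly one JSON payload, "
        "tagged with its Scenario ID. Do not output any other records."
    )


# Example rules (assumed for illustration).
prompt = build_partition_prompt([
    "A coupon applies only when the cart total is at least $100.00.",
    "Each customer may redeem at most 1 coupon per order.",
])
print(prompt)
```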

Mapping the classes first lets you test exact boundary values, such as $99.99 vs. $100.00 for a $100 minimum spend, without wasting records on redundant cases.
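To confirm that the generated payloads actually include these boundary values, a small check can compute the values around a threshold and compare them with what the LLM returned. This sketch assumes a $100.00 minimum spend, cent precision, and a hypothetical `cart_total` field in each payload; the helper names are illustrative.

```python
import json
from decimal import Decimal

CENT = Decimal("0.01")  # assumed currency precision


def boundary_values(threshold: Decimal) -> list[Decimal]:
    """Values just below, at, and just above a threshold."""
    return [threshold - CENT, threshold, threshold + CENT]


def check_payloads(raw_payloads: list[str], threshold: Decimal) -> dict:
    """Parse LLM JSON payloads; report missing boundary values and exact duplicates."""
    seen = set()
    duplicates = []
    totals = set()
    for raw in raw_payloads:
        payload = json.loads(raw)
        key = json.dumps(payload, sort_keys=True)  # canonical form for duplicate detection
        if key in seen:
            duplicates.append(payload)
        seen.add(key)
        # str() avoids binary-float artifacts when converting to Decimal
        totals.add(Decimal(str(payload["cart_total"])))
    missing = [v for v in boundary_values(threshold) if v not in totals]
    return {"missing_boundaries": missing, "duplicates": duplicates}


# Example LLM output (assumed): one duplicate, and the $100.01 case is missing.
payloads = [
    '{"cart_total": 99.99, "coupon": "SAVE10"}',
    '{"cart_total": 100.00, "coupon": "SAVE10"}',
    '{"cart_total": 100.00, "coupon": "SAVE10"}',
]
report = check_payloads(payloads, Decimal("100.00"))
print(report["missing_boundaries"])  # [Decimal('100.01')]
print(len(report["duplicates"]))  # 1
```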

State Transition Testing

For systems like payment flows or order management, data must reflect different stages of a lifecycle.

Provide a list of all possible states (e.g., Created, Paid, Shipped, Delivered).
Ask the LLM to generate a CSV covering a State Transition Matrix.
Demand three types of flows: Linear (valid), Exception (deviations), and Violation (invalid transitions).
Set a rule to generate only one row per unique state transition (each from-state/to-state pair).
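The matrix you ask for can also be enumerated in code, which gives you an expected answer to paste into the prompt or compare against the model's CSV. This is a sketch: the extra `Cancelled` state and the `LINEAR` and `EXCEPTION` transition sets are assumptions for illustration, so substitute your system's real rules.

```python
import csv
import io
from itertools import product

# Assumed lifecycle: the four states from the text plus "Cancelled" for deviations.
STATES = ["Created", "Paid", "Shipped", "Delivered", "Cancelled"]
LINEAR = {("Created", "Paid"), ("Paid", "Shipped"), ("Shipped", "Delivered")}
EXCEPTION = {("Created", "Cancelled"), ("Paid", "Cancelled")}


def transition_matrix(states: list[str]) -> list[dict]:
    """One row per unique (from_state, to_state) pair, labelled by flow type."""
    rows = []
    for src, dst in product(states, repeat=2):
        if src == dst:
            continue  # self-transitions are skipped in this sketch
        if (src, dst) in LINEAR:
            flow = "Linear"
        elif (src, dst) in EXCEPTION:
            flow = "Exception"
        else:
            flow = "Violation"  # any transition the assumed rules do not allow
        rows.append({"from_state": src, "to_state": dst, "flow": flow})
    return rows


rows = transition_matrix(STATES)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["from_state", "to_state", "flow"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

With five states this yields 20 rows: 3 Linear, 2 Exception, and 15 Violation, which shows how quickly invalid transitions outnumber the happy path.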

Building the matrix this way prevents duplicate records and forces the model to produce negative test cases.
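The one-row-per-transition rule is easy to audit after generation. This sketch, built around a hypothetical `audit_transitions` helper, parses the CSV returned by the model and reports repeated and missing (from, to) pairs. It assumes you asked for `from_state` and `to_state` columns.

```python
import csv
import io
from collections import Counter
from itertools import product


def audit_transitions(csv_text: str, states: list[str]) -> dict:
    """Report duplicate and missing (from_state, to_state) pairs in an LLM-generated CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter((row["from_state"], row["to_state"]) for row in reader)
    # Every ordered pair of distinct states should appear exactly once.
    expected = {(a, b) for a, b in product(states, repeat=2) if a != b}
    return {
        "duplicates": sorted(pair for pair, n in counts.items() if n > 1),
        "missing": sorted(expected - set(counts)),
    }


# Example LLM output (assumed) that repeats one transition and skips three.
llm_csv = """from_state,to_state,flow
Created,Paid,Linear
Created,Paid,Linear
Paid,Shipped,Linear
Shipped,Created,Violation
"""
report = audit_transitions(llm_csv, ["Created", "Paid", "Shipped"])
print(report["duplicates"])  # [('Created', 'Paid')]
print(report["missing"])  # [('Created', 'Shipped'), ('Paid', 'Created'), ('Shipped', 'Paid')]
```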

Variance Control and Negative Prompting

LLMs often produce homogeneous data, such as using the same regions or age groups. Use Negative Prompting to stop this.

Set explicit requirements for distribution (e.g., specific age ranges or geographical regions).
Add a "PROHIBITIONS" section.
Explicitly forbid generic names like "John Doe."
Forbid repeating the same combinations of variables.
Forbid sequential or identical ID numbers.
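Put together, a prompt with a distribution section and a PROHIBITIONS section might look like the following. The helper `build_variance_prompt`, the age ranges, and the regions are illustrative assumptions, not a prescribed format.

```python
def build_variance_prompt(count: int, age_ranges: list[str], regions: list[str]) -> str:
    """Hypothetical helper: a user-data prompt with distribution rules and PROHIBITIONS."""
    return (
        f"Generate {count} test users as a JSON array.\n\n"
        # Explicit distribution requirements counter homogeneous output.
        "DISTRIBUTION:\n"
        f"- Spread users evenly across these age ranges: {', '.join(age_ranges)}.\n"
        f"- Spread users evenly across these regions: {', '.join(regions)}.\n\n"
        # Negative prompting: state what the model must not do.
        "PROHIBITIONS:\n"
        '- Do NOT use generic placeholder names such as "John Doe".\n'
        "- Do NOT repeat the same combination of age range and region.\n"
        "- Do NOT use sequential or identical ID numbers.\n"
    )


prompt = build_variance_prompt(
    12, ["18-24", "25-44", "45-64", "65+"], ["North", "South", "East"]
)
print(prompt)
```

Four age ranges times three regions gives exactly 12 combinations, so requesting 12 users keeps the no-repeat rule satisfiable; asking for more would force the model to break it.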

Negative prompting reduces the model's tendency toward homogeneous output and helps ensure your backend is exercised with diverse, realistic data.
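Prohibitions in a prompt are requests, not guarantees, so it can help to verify them on the output. This sketch checks generated records against the three prohibitions listed above; the field names `id`, `name`, `age_range`, and `region`, and the use of integer IDs, are assumptions about your schema.

```python
from collections import Counter

# Assumed blocklist; extend with any other placeholder names you want banned.
FORBIDDEN_NAMES = {"john doe"}


def check_records(records: list[dict]) -> list[str]:
    """Return human-readable violations of the PROHIBITIONS rules."""
    problems = []
    for r in records:
        if r["name"].strip().lower() in FORBIDDEN_NAMES:
            problems.append(f"generic name: {r['name']}")
    combos = Counter((r["age_range"], r["region"]) for r in records)
    for combo, n in combos.items():
        if n > 1:
            problems.append(f"repeated combination {combo} x{n}")
    ids = sorted(r["id"] for r in records)
    if len(set(ids)) < len(ids):
        problems.append("identical IDs")
    # Flags only a fully consecutive run of integer IDs (a simple heuristic).
    if len(ids) > 1 and all(b - a == 1 for a, b in zip(ids, ids[1:])):
        problems.append("sequential IDs")
    return problems


# Example LLM output (assumed) that breaks all three kinds of rule.
records = [
    {"id": 1001, "name": "John Doe", "age_range": "18-24", "region": "North"},
    {"id": 1002, "name": "Aiko Tanaka", "age_range": "18-24", "region": "North"},
    {"id": 1003, "name": "Lucas Ferreira", "age_range": "65+", "region": "South"},
]
problems = check_records(records)
for p in problems:
    print(p)
```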

AI's speed only delivers value if your data is intentional. Your role as a QA professional is to encode the constraints that govern these generative models.

Source: https://dev.to/lopesdoamaral/engenharia-de-prompts-para-massa-de-dados-escalando-testes-com-cobertura-e-sem-duplicidade-oba

Optional learning community: https://t.me/GyaanSetuAi
