How are software and SaaS teams using AI?
SaaS teams are putting AI inside the product loop, not just the paperwork around it: generating unit tests, suggesting refactors, watching CloudWatch logs for unusual errors, summarizing recurring Zendesk themes, and forecasting which Salesforce deals will close. The pattern is that AI drafts and surfaces; engineers and leaders still decide what merges, ships, and gets committed to the board.
Most industries use AI for the paperwork around the real work. Software and SaaS teams are doing something different: they’re putting AI inside the product itself. Look at what teams actually reach for and a clear shape appears. Three of the most common jobs sit inside the engineering loop (generating unit tests for new functions, suggesting how to break apart a sprawling React component, watching CloudWatch logs for the API gateway error that’s about to become an outage). The other two sit on the business edges of a SaaS company: pulling recurring themes out of a week of Zendesk tickets, and reading a Salesforce pipeline to predict which deals close this month. That spread is the honest answer to “how are SaaS teams using AI”: it’s in building, running, supporting, and selling the product, not just the admin.
It’s hard work for an AI because the context is deep and spread out. A generated test isn’t useful unless it covers the function’s real edge cases. A refactor suggestion is noise unless it accounts for the surrounding code, not just the one file. A log anomaly is only an anomaly relative to what normal traffic looks like for your service. The judgment depends on context the AI has to actually see.
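To make "relative to what normal traffic looks like" concrete, here is a minimal Python sketch of one common approach: compare each minute against a trailing-window baseline, in the style of a z-score. It assumes you already have per-minute error counts for the service (for example, pulled from CloudWatch). The function name flag_anomalies, the 30-minute window, and the threshold of 3 are illustrative assumptions, not CloudWatch settings or Physea’s method.

```python
from statistics import mean, stdev


def flag_anomalies(counts: list[int], window: int = 30, threshold: float = 3.0) -> list[int]:
    """Return indices of minutes whose error count sits far above the trailing baseline.

    Hypothetical helper: `counts` holds per-minute error counts, oldest first.
    Each minute is compared only with the `window` minutes before it, so
    "unusual" means unusual for this service's own recent traffic.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a spread")
    flagged = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        # Floor the spread at 1 error/minute so a perfectly flat baseline
        # (stdev of 0) does not turn a single extra error into an alert.
        sigma = max(stdev(baseline), 1.0)
        if (counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Illustrative data: 30 minutes of routine noise, then one spike.
quiet = [2, 3, 1, 2, 4, 2, 3, 2, 1, 3] * 3
series = quiet + [3, 40, 2]
print(flag_anomalies(series))  # [31], only the spike minute
```

Raising the threshold trades false alarms for missed incidents. The right value depends on your own traffic, which is the reason to measure against a baseline at all.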
What actually decides the outcome
A few judgment calls separate a real time-saver from generated noise you have to clean up.
- Behavior vs. implementation. A generated unit test that asserts what the code does today, rather than what it should do, locks in any existing bug, and one that checks internal structure rather than outputs breaks on the next refactor. The deciding skill is choosing the edge cases that matter and asserting behavior, not implementation.
- Draft vs. committed. AI suggesting a refactor or writing a test is safe. AI merging that code, auto-remediating an alert, or sending a number to your board is not. Almost all the risk lives on the line between an opened pull request and a merge, between a draft test and a commit.
- Grounding in the real source. A pipeline forecast or a support-theme summary is only worth reading if it came from your actual Salesforce records and your actual Zendesk queue, not a plausible-sounding guess. The same goes for a log anomaly: it has to be measured against your real traffic baseline.
- Signal over volume. CloudWatch produces an ocean of log lines. The whole point is distinguishing the one critical failure from routine noise. An AI watcher that flags everything is as useless as one that flags nothing.
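A small, hypothetical example shows the difference between asserting behavior and asserting implementation. The Cart class is invented for illustration. The tests are plain functions with bare assert statements, so pytest can collect them, but they also run without it.

```python
class Cart:
    """Hypothetical cart that totals line items in integer cents."""

    def __init__(self) -> None:
        # Internal detail: a list of (sku, unit_cents, qty) tuples.
        self._items: list[tuple[str, int, int]] = []

    def add(self, sku: str, unit_cents: int, qty: int = 1) -> None:
        if qty < 1:
            raise ValueError("qty must be at least 1")
        self._items.append((sku, unit_cents, qty))

    def total_cents(self) -> int:
        return sum(unit * qty for _, unit, qty in self._items)


# Brittle: pins the internal storage format. It passes today but fails the
# moment _items becomes a dict keyed by SKU, even if every total is unchanged.
def test_add_stores_tuple():
    cart = Cart()
    cart.add("sku-1", 250, 2)
    assert cart._items == [("sku-1", 250, 2)]


# Behavioral: pins the contract callers rely on, including edge cases.
def test_total_counts_quantity_and_handles_empty_cart():
    assert Cart().total_cents() == 0  # empty cart
    cart = Cart()
    cart.add("sku-1", 250, 2)
    cart.add("sku-2", 100)
    assert cart.total_cents() == 600  # 250 * 2 + 100


def test_rejects_zero_quantity():
    try:
        Cart().add("sku-1", 250, 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for qty=0")
```

All three tests pass today. Only the behavioral ones survive a change in how the cart stores items, and they are the ones that would catch a real regression in totals.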
How to do it by hand
Take generating unit tests for a new Python function. By hand, you read the function, work out its contract, then enumerate cases: the happy path, empty and null inputs, boundary values, the weird type someone will eventually pass in. You write each test, mock the dependencies, run the suite, and check that a test failing actually means the behavior is wrong. For a non-trivial function that’s twenty minutes, and the boring cases are exactly the ones tired engineers skip.
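The by-hand checklist (happy path, empty and null inputs, boundary values, the wrong type) maps directly onto a test file. Here is a sketch using the standard library’s unittest. parse_quantity and its 1 to 999 contract are hypothetical, chosen only to give each kind of case something to check. Nothing is mocked because this function has no dependencies.

```python
import unittest


def parse_quantity(raw):
    """Hypothetical function under test: parse a form field into an int from 1 to 999."""
    if raw is None:
        raise ValueError("quantity is required")
    if not isinstance(raw, str):
        raise TypeError(f"expected str, got {type(raw).__name__}")
    text = raw.strip()
    # isascii() rules out Unicode digits such as "²" that isdigit() accepts but int() rejects.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a whole number: {raw!r}")
    value = int(text)
    if not 1 <= value <= 999:
        raise ValueError(f"out of range 1..999: {value}")
    return value


class ParseQuantityTests(unittest.TestCase):
    def test_happy_path(self):
        self.assertEqual(parse_quantity("3"), 3)
        self.assertEqual(parse_quantity(" 7 "), 7)  # surrounding whitespace is tolerated

    def test_boundaries(self):
        self.assertEqual(parse_quantity("1"), 1)
        self.assertEqual(parse_quantity("999"), 999)
        for bad in ("0", "1000"):
            with self.subTest(bad=bad), self.assertRaises(ValueError):
                parse_quantity(bad)

    def test_empty_and_missing(self):
        for bad in ("", "   ", None):
            with self.subTest(bad=bad), self.assertRaises(ValueError):
                parse_quantity(bad)

    def test_not_a_number(self):
        for bad in ("abc", "-1", "2.5"):
            with self.subTest(bad=bad), self.assertRaises(ValueError):
                parse_quantity(bad)

    def test_wrong_type(self):
        # The "weird type someone will eventually pass in".
        with self.assertRaises(TypeError):
            parse_quantity(3.5)
```

Run it with python -m unittest followed by the file name. Every assertion encodes the intended contract, not whatever the function happened to return on its first run.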
Or take summarizing recurring support themes. By hand, you export the week’s Zendesk tickets, skim a few hundred subjects and bodies, mentally bucket them (“password reset,” “billing confusion on annual plans,” “export feature broken”), tally the buckets, and write up the top five with example ticket IDs so product can act on them. It’s a couple of hours, it’s tedious, and the long tail gets missed because attention fades around ticket 150.
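The same bucket, tally, and cite steps can be sketched in Python. This version uses fixed keyword rules as a stand-in for the judgment a person or model would apply, so treat the THEMES table, the ticket field names (id, subject, body), and summarize_themes itself as hypothetical. It does show two habits worth keeping: it reads every ticket rather than a sample, and it keeps an unmatched bucket so the long tail stays visible.

```python
from collections import Counter, defaultdict

# Hypothetical theme rules. In practice a person (or a model) proposes these
# after reading the queue; matching here is simple lowercase substring search.
THEMES = {
    "password reset": ("password", "reset link", "locked out"),
    "billing confusion on annual plans": ("annual", "invoice", "charged twice"),
    "export feature broken": ("export", "csv"),
}


def summarize_themes(tickets, top_n=5, examples_per_theme=3):
    """Bucket every ticket by theme, tally, and return the top themes with example IDs.

    `tickets` is an iterable of dicts with "id", "subject", and "body" keys
    (assumed field names for an exported ticket). A ticket may match several
    themes; tickets that match none are counted under "unmatched".
    """
    counts = Counter()
    examples = defaultdict(list)
    for ticket in tickets:
        text = f"{ticket['subject']} {ticket['body']}".lower()
        matched = [name for name, words in THEMES.items() if any(w in text for w in words)]
        for name in matched or ["unmatched"]:
            counts[name] += 1
            if len(examples[name]) < examples_per_theme:
                examples[name].append(ticket["id"])
    # most_common orders by count; ties keep first-seen order.
    return [(name, n, examples[name]) for name, n in counts.most_common(top_n)]


tickets = [
    {"id": 101, "subject": "Can't log in", "body": "The reset link expired"},
    {"id": 102, "subject": "Export fails", "body": "CSV download is empty"},
    {"id": 103, "subject": "Password help", "body": "Locked out after the update"},
    {"id": 104, "subject": "Dark mode?", "body": "Feature request"},
]
for name, n, ids in summarize_themes(tickets):
    print(f"{name}: {n} ticket(s), e.g. {ids}")
```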
Both are legitimate, free to do yourself, and need no special software. They just need time you’d rather spend shipping.
Where it goes wrong
The failures are predictable. Generated tests pass against buggy behavior and give false confidence. A refactor looks clean in isolation but breaks a caller two files away because the model only read the one component. CloudWatch alert thresholds get set so low that the real outage hides in the noise, or so high that nothing fires when the gateway fails at 2am. Support themes get summarized from a small sample, so the issue quietly driving churn never surfaces. And the pipeline forecast gets treated as fact instead of a draft, so a number nobody pressure-tested goes into the board deck. Each one is a small lapse with an outsized bill: a production incident, a missed quarter, a churned account.
Doing it yourself vs. handing it to Physea
Doing it yourself, you keep full control and pay in hours. Using a single AI chat tool, you paste in one function and get one test file back, then do the connecting work yourself: opening the PR, pulling the Zendesk export, copying the forecast into your CRM. The chat does one step; you’re still the glue between GitHub, Zendesk, Salesforce, and your logs.
Physea’s Liminality runs the whole route end to end across the tools you already use. You connect GitHub, Zendesk, Salesforce, AWS, or your data store over MCP, and the system carries the job from start to finished result, grounded in your actual repo, tickets, and pipeline, and reusing routes that have worked before so it isn’t re-improvising each run. You get the draft PR with tests, the ranked theme report with ticket IDs, the forecast with its reasoning, not a half-step you then have to finish. The irreversible actions (merging code, remediating production, committing a forecast) stay behind your approval. The chore becomes a result you review.
For the underlying tasks, see data analysis and reporting (which covers the Salesforce pipeline and Zendesk theme work), customer and support emails, payment reminders and failed-card recovery, and marketing content.
Common questions
- What should a SaaS team automate with AI first?
- Start where the work is high-volume, repetitive, and easy to review before it does any damage. For most SaaS teams that's three things: drafting unit tests for new functions (you read them before they merge), summarizing recurring support themes from your ticket queue, and getting a first-pass read on which pipeline deals look likely to close. All three save real hours, and all three produce output a person can check before anyone acts on it. Avoid starting with anything that auto-merges, auto-remediates production, or auto-emails customers without a human in the loop. Physea can run any of these end to end across your connected tools once you've decided what good looks like.
- Can AI write unit tests I can actually trust?
- It writes a good first draft, but trust depends on what the tests assert. AI is strong at producing the scaffolding, the happy path, and a spread of obvious edge cases (empty input, nulls, boundaries). The trap is that it tends to assert against current behavior, so if the function has a bug, the generated test locks the bug in. It also over-tests implementation details that break on the next refactor. Read every generated test and ask: is this asserting the behavior I want, or just what the code happens to do today? Used that way it's a genuine time-saver. Physea can generate the tests against your repo and open them as a draft PR you review, never a direct commit.
- How do SaaS teams use AI on support tickets and pipeline data without leaking customer data?
- Keep the data inside the tools you already trust (Zendesk, Salesforce, your logs) and use AI that connects through your own authenticated access instead of pasting exports into a public chat window. Check whether your AI provider trains on your inputs and turn that off. For anything covered by a customer contract or a SOC 2 commitment, confirm where the data is processed and get it in writing. Physea runs over MCP against your own connected accounts, so the analysis happens against your data without it being copied into a chat box or used to train anything.