Booking.com sister company says its AI travel agent is “fully agentic.” Our test says: not so fast.

Priceline added Claude to Penny and calls the new version more agentic. We tested it across hotels, flights and rental cars. The result was useful, but not yet a full AI booking agent.

Priceline’s latest update to Penny is worth paying attention to because Priceline is not a small AI travel startup trying to win attention with a demo. It is part of Booking Holdings, the same group that owns Booking.com, Agoda, KAYAK, OpenTable and other travel brands. Booking Holdings reported $186.1 billion in gross travel bookings in 2025, so when one of its consumer brands changes how people search and compare travel, it is a useful signal for where the industry may be heading.

Priceline itself is the group’s more deal-driven online travel agency. It describes its mission as being the “world’s best travel dealmaker,” with hotels, flights, rental cars, cruises, vacation packages and its own Priceline Deals Engine across more than 100 countries. So Penny is not just another chatbot placed next to a booking site. It sits on top of a real travel marketplace, with inventory, prices, customer behavior, deal logic and checkout paths already in place.

That is what makes the announcement interesting. Priceline says the new version of Penny brings Anthropic’s Claude into its proprietary AI stack, with Google Cloud and OpenAI also supporting search and voice capabilities. The company describes Penny as a system that can understand complex trip requests, evaluate real-time pricing and availability, surface trade-offs, use an interactive map, learn from preferences, and recommend options through features such as Penny’s Pick and Penny’s Take. PhocusWire also reported that Penny is now described as a coordinated system of more than 10 specialized agents across hotels, flights, rental cars and customer service.

The easy version of this story would be: Priceline launched an AI travel agent.

But after testing it, that feels too generous — and also not specific enough.

Penny is better understood as an AI-powered travel search and decision layer. It can make the messy middle of travel planning feel more structured. It can compare options, challenge cheap-but-bad choices, and expose hidden friction. But when it comes time to select an option, the journey still moves into a regular Priceline web checkout page. The AI does not remain the full transaction interface. It helps you decide; it does not yet fully own the booking experience from conversation to completed purchase.

That distinction matters.

We tested Penny where travel search usually breaks

A clean travel prompt is not very useful for testing an AI assistant. “Find me a hotel in Miami” or “show me flights to Paris” mostly proves that the interface can run a search. The more interesting test is whether the assistant can handle the messy questions travelers actually ask before they feel safe paying.

So we pushed Penny across three verticals: hotels, flights and rental cars. The prompts were deliberately annoying. Not impossible, but realistic.

For hotels, we asked for a Miami Beach stay for two adults, an 11-year-old child and a small dog. The catch was pet policy. We did not want a generic “pet-friendly” filter. We asked for size limits, pet fees, cleaning charges, deposits, restricted room types, pool or restaurant access, and the safest property to book if the goal was to avoid a check-in surprise.

For flights, we asked for a flexible New York to Europe trip in September, open to Lisbon, Madrid, Rome or Paris. The catch was value. We did not want the cheapest fare by default. We wanted a recommendation that considered total trip cost, timing, nonstop options and painful layovers. Then we pushed harder: no basic economy, no hidden baggage fees, no overnight layovers, no airport changes.

For rental cars, we asked for a four-day Los Angeles booking with pickup at LAX, return in Santa Monica if possible, and arrival at 11:30pm. The catch was counter-level risk: opening hours, one-way drop fees, debit-card acceptance, insurance, fuel policy and deposit.

These are not exotic cases. They are the kinds of questions that make travelers hesitate before checkout.

And that is where Penny became interesting.

The hotel test: “pet-friendly” is not enough

The hotel test was the strongest result.

A normal hotel search page can show a pet-friendly filter. That filter is useful, but shallow. A hotel may allow pets and still charge a high fee, restrict certain room types, ban pets from public areas, apply a deposit, or create an awkward check-in moment because the policy was never clearly understood.

Penny handled this better than a basic search interface. It did not only list hotels. It broke the pet question into operational details: whether dogs were allowed, whether there was a weight limit, whether there was a fee, whether there was a cleaning charge or deposit, whether room types were restricted, and whether the dog could access public areas. In our test, Penny compared Kimpton Surfcomber, Loews Miami Beach and Nobu Hotel Miami Beach, then recommended Kimpton Surfcomber as the lowest-risk option because its policy appeared clearer, lower-cost and more transparent.

This was the first important signal: Penny did not simply pick the nicest hotel. It picked the option with the lowest risk of policy mismatch. It explained that Kimpton’s policy looked safer because it showed no weight limit, no pet fee, no confirmed cleaning fee, and no confirmed deposit, while the other two hotels had conflicting or incomplete information around fees, deposits and restrictions.

That is a real consumer benefit. Travel search is not only about ranking options. It is about reducing uncertainty before payment.

But there was also a clear limitation. Penny still had to rely on a mix of official and third-party sources. For some hotels, it openly flagged conflicting information. That is good behavior — it did not pretend everything was verified — but it also shows why travel AI is hard. The assistant can organize the decision, but it cannot create perfect supplier data when the underlying policy information is fragmented.

This is probably where AI assistants will matter most in hotel booking: not in giving inspiration, but in translating vague labels into practical risk. “Pet-friendly” becomes “your dog is likely accepted, the fee appears to be zero, the pool deck is restricted, and these three things should still be confirmed before arrival.”

That is much closer to useful travel advice than a filter.

The flight test: cheap is not always good value

The flight test showed a different strength.

Penny compared Lisbon, Madrid, Rome and Paris in one view instead of forcing a destination-first search. That is useful because many flexible travelers do not start with a fixed city. They start with a shape of trip: Europe, about a week, reasonable fare, tolerable schedule, no miserable connection.

Penny initially picked Madrid as the best value. It did not choose purely on the lowest fare. It looked at schedule quality, nonstop options, layover pain and price. That already felt more like travel-advisor behavior than a normal fare list.

The more revealing moment came when we stress-tested the recommendation. We told Penny to assume the traveler dislikes basic economy, hidden baggage fees, overnight layovers and airport changes. This changed the answer. Penny started separating fare price from fare quality. It flagged baggage uncertainty, fare-family risk, airport convenience and the difference between a cheap ticket and a clean ticket.

That is the right direction. In air travel, the lowest fare can be a trap. A fare that looks attractive can lose its advantage once bags, seats, timing and flexibility are considered. Penny’s useful role was not simply finding the cheapest flight. It was explaining when “cheap” becomes expensive.
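To make the "cheap becomes expensive" point concrete, here is a minimal sketch of the arithmetic. Every number and name in it is hypothetical: the fares, the fees and the `FareOption` class are illustrative assumptions, not figures from our Penny test or anything from Priceline's systems.

```python
from dataclasses import dataclass


@dataclass
class FareOption:
    """A fare with the add-ons a traveler may actually pay (all values hypothetical)."""
    label: str
    base_fare: float  # advertised round-trip price
    bag_fee: float    # assumed charge per checked bag for the whole trip
    seat_fee: float   # assumed seat-selection charge for the whole trip

    def total_cost(self, bags: int = 1) -> float:
        # Sticker price plus the extras that only show up later in the flow.
        return self.base_fare + self.bag_fee * bags + self.seat_fee


def best_value(options: list[FareOption], bags: int = 1) -> FareOption:
    """Pick the option with the lowest all-in cost, not the lowest sticker price."""
    return min(options, key=lambda o: o.total_cost(bags))


# Hypothetical fares, invented purely for illustration.
options = [
    FareOption("basic economy", base_fare=480.0, bag_fee=75.0, seat_fee=30.0),
    FareOption("standard economy", base_fare=540.0, bag_fee=0.0, seat_fee=0.0),
]

cheapest_sticker = min(options, key=lambda o: o.base_fare)
cheapest_total = best_value(options, bags=1)

print(cheapest_sticker.label, cheapest_sticker.total_cost(1))
print(cheapest_total.label, cheapest_total.total_cost(1))
```

With these made-up inputs, the lowest advertised fare stops being the cheapest option once one bag and a seat are added. The sketch only works if the fee fields are actually known, which is exactly the data the assistant said it did not always have.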

But the limitation was also obvious. Penny admitted it did not have complete live fare-rule and baggage data for every ticket. That honesty is important, but it means the assistant was still operating partly as an advisor, not a final authority. For a traveler, the answer is useful for narrowing the choice. Before payment, the fare rules still need to be checked at checkout.

That gap is commercially important. The next generation of AI travel interfaces will not be judged only by how well they summarize options. They will be judged by whether they can read the actual commercial terms attached to the product: fare family, baggage inclusion, cancellation rules, seat selection, connection risk and total payable cost.

Penny moved in that direction. It did not fully close the loop.

The rental-car test: the hardest problem is branch-level truth

Rental cars were the hardest test.

That is not surprising. Car rental is full of small operational details that can ruin the booking experience: branch hours, shuttle instructions, one-way drop rules, debit-card restrictions, deposits, local insurance rules, fuel policy and counter staffing. A car can look cheap online and become a problem at pickup.

Penny understood the risk. It compared Hertz, Alamo and Fox, then tried to evaluate total estimated cost, one-way drop-off uncertainty, counter hours, debit-card policy, insurance, fuel policy and deposit. The useful part was not the car list. The useful part was that Penny made the hidden questions visible.

But this was also where the limits became clearest. Several important details came back as “not verified.” Could the car definitely be returned in Santa Monica? Not fully confirmed. Would the counter be staffed at 11:30pm? Not fully confirmed. Would a debit card be accepted under the traveler’s conditions? Not fully confirmed. What exact deposit would be held? Not fully confirmed.

That is not a failure of conversation. It is a data problem.

A large language model can reason through the request, but rental-car confidence depends on branch-level supplier rules. Without live, structured, supplier-confirmed data, the assistant can warn the traveler but cannot fully remove the risk.

This distinction is important for travel companies. The AI layer may make the experience feel intelligent, but the quality ceiling is set by the data and systems underneath. If the supplier rules are incomplete, inconsistent or not connected at the right level of detail, the assistant becomes a polished risk detector rather than a trusted booking agent.
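One way to picture the gap between a risk detector and a trusted booking agent is to attach a verification status to each counter-level question. The sketch below is an assumption-laden illustration: the `Status` enum, the function names and the all-or-nothing rule are hypothetical and are not how Penny or Priceline model this. The four checks mirror the rental-car questions that came back unconfirmed in our test.

```python
from enum import Enum


class Status(Enum):
    VERIFIED = "verified"          # confirmed by live, supplier-level data
    CONFLICTING = "conflicting"    # sources disagree
    UNVERIFIED = "not verified"    # no confirmed answer available


def unresolved(checks: dict[str, Status]) -> list[str]:
    """Return the questions that still lack a supplier-confirmed answer."""
    return [name for name, status in checks.items() if status is not Status.VERIFIED]


def can_book_with_confidence(checks: dict[str, Status]) -> bool:
    """Hypothetical rule: act as a booking agent only when nothing is unresolved."""
    return not unresolved(checks)


# The four rental-car questions from our test, none of which was fully confirmed.
rental_checks = {
    "return in Santa Monica": Status.UNVERIFIED,
    "counter staffed at 11:30pm": Status.UNVERIFIED,
    "debit card accepted": Status.UNVERIFIED,
    "exact deposit amount": Status.UNVERIFIED,
}

open_questions = unresolved(rental_checks)
print("Confirm before booking:", open_questions)
print("Safe to book unattended:", can_book_with_confidence(rental_checks))
```

Under this framing, better conversation does not change the outcome. Only moving each status to verified does, and that depends on the supplier data underneath.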

The checkout moment matters

The most revealing part of the test came after the recommendation.

When we selected an option from the search results, Penny did not continue as a fully conversational checkout agent. The journey moved to the regular Priceline web checkout page. That makes the current product feel less like an end-to-end AI travel agent and more like an AI search, comparison and recommendation layer attached to a conventional online travel agency flow.

That is not a minor detail.

The industry often talks about AI agents as if the traveler will describe a trip, approve a plan and let the system complete the booking in one continuous flow. Penny is not quite that. It helps shape the decision, but the final purchase still happens in the familiar web commerce environment.

In the short term, that may be the right design. Travel bookings are high-trust transactions. Users need to see the price, terms, cancellation rules, room type, passenger details, payment page and confirmation flow. Moving too quickly into invisible agentic booking could create more anxiety, not less.

But strategically, it shows where the next battle will be. The first phase is AI-assisted search. The second phase is AI-assisted decision-making. The harder phase is AI-controlled transaction orchestration: the assistant not only recommends, but verifies terms, applies preferences, fills details, checks loyalty logic, handles payment consent and stays present through post-booking service.

Penny is moving toward that world, but our test suggests it is not fully there yet.

What comes from Claude, and what comes from Priceline?

For non-technical readers, this distinction is worth spelling out in simple terms.

Claude is part of the reasoning layer. It helps Penny understand messy language, keep track of preferences, compare trade-offs and respond conversationally. That matters when the traveler does not speak in filters. A traveler says: “I want something easy, not too expensive, with no nasty surprises.” The model helps translate that into a search and comparison process.

Priceline provides the travel layer. That means inventory, prices, deals, maps, booking history, customer behavior, search results, product cards, checkout and service context. Priceline’s own announcement says Penny combines preference learning, memory, deals technology, leading AI models and real-time travel inventory.

This is the main lesson for travel businesses watching from the outside. The model alone is not the product. The model becomes useful when it is connected to travel-specific data, commercial logic and fulfillment infrastructure.

A general chatbot can suggest “nice hotels in Miami.” A travel assistant connected to inventory can show bookable hotels. A stronger assistant can explain why one property is safer for a traveler with a dog, why one flight is better after baggage is included, or why one rental-car option carries counter risk.

That is the difference between AI content and AI commerce.

Why this matters commercially

Priceline’s CTO Sejal Amin told PhocusWire that what looks like hesitation is often an unanswered question: whether a traveler can bring a pet, whether there is a cot, or what the cancellation policy is. That is a very accurate way to describe the commercial value of AI in booking.

Most travelers do not abandon because they dislike search results. They abandon because they are not confident enough to pay. Something is unclear. A fee is uncertain. A policy is buried. A room type may not fit. A flight looks cheap but suspicious. A rental car has counter conditions the traveler does not understand.

That is where Penny points to the real opportunity.

AI travel assistants are often discussed as inspiration tools: plan my weekend, recommend a beach, build an itinerary. Those use cases are visible, but they may not be the most commercially valuable. The bigger value may sit much closer to conversion: resolving doubts at the point where the traveler is almost ready to book.

In our tests, Penny was strongest when the prompt was specific, messy and full of trade-offs. It was useful because it did the work a cautious traveler would normally do across multiple tabs: compare, question, verify, challenge and narrow down.

The assistant was weakest where the underlying data was not fully verified. Pet policies depended on mixed sources. Fare rules were not always complete. Rental-car rules were often branch-specific and uncertain. These are not cosmetic issues. They are the hard edge of travel commerce.

The takeaway

Priceline’s Penny is not yet the fully autonomous AI travel agent that some people imagine. The checkout still falls back to a regular web flow, and several high-friction travel details still require confirmation outside the assistant.

But that does not make it unimportant.

It may actually make the product more interesting. Penny shows the practical middle step between traditional online travel search and fully agentic booking. It does not need to replace the entire booking journey to create value. It can create value by making search more conversational, comparison more intelligent, and booking decisions less anxious.

The tests showed a clear pattern. In hotels, Penny turned “pet-friendly” into a real policy comparison. In flights, it challenged the idea that the cheapest fare is the best fare. In rental cars, it exposed the operational details that usually remain hidden until the counter.

That is a useful direction for travel AI.

The real question is not whether AI can suggest a trip. It can. The harder question is whether AI can verify the small commercial details that make a traveler comfortable enough to book.

Priceline has built a serious step toward that future. The next step is deeper data certainty — supplier policies, fare rules, branch-level conditions and checkout continuity — so the assistant can move from “smart search companion” to trusted booking agent.

For now, Penny is not replacing the travel booking funnel. It is moving closer to the moment where decisions are made.
