fix(shared): accept tool role and multimodal content in chat schemas

The v1 chat completion proxy rejected requests with `role: "tool"` or
array-typed `content` (multimodal image/video payloads) because the
shared zod schemas were too restrictive:

- `ChatRoleSchema` was `z.enum(['system','user','assistant'])` — now
  `z.string()` so any role the backend supports passes through. The
  router is a proxy and has no reason to constrain which roles are
  valid; the upstream provider decides that.

- `ChatMessageSchema.content` was `z.string()` and is now
  `z.union([z.string(), z.array(z.record(z.unknown())), z.null()]).optional()` to
  accept the three shapes the OpenAI spec defines: plain text, an
  array of content-part objects (images, video frames, etc.), or null
  (e.g. assistant messages that only carry tool_calls). `.passthrough()`
  on the message object ensures extra fields like `tool_call_id`,
  `name`, `tool_calls`, etc. are forwarded untouched.

- `ChatCompletionChoiceSchema.finish_reason` was `z.string()` — now
  `z.string().nullable().optional()` since some providers return null
  for streaming chunks or incomplete generations.
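
As a rough illustration of the relaxed rules (not the shared zod code
itself), the same checks can be mirrored by a dependency-free type guard.
The helper names `isPlainObject`, `isAcceptedContent`,
`isAcceptedFinishReason` and `isAcceptedMessage`, and the sample payloads,
are hypothetical and exist only for this sketch; real validation is done by
the zod schemas described above.

```typescript
// Hypothetical, dependency-free mirror of the relaxed rules for JSON payloads.
// This is NOT the project's zod code; names and samples are illustrative only.
type ContentPart = Record<string, unknown>;

interface Message {
  role: string; // any string: the upstream provider decides what is valid
  content?: string | ContentPart[] | null;
  [extra: string]: unknown; // tool_call_id, name, tool_calls, ...
}

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// content: plain string, null, absent, or an array of content-part objects.
function isAcceptedContent(v: unknown): boolean {
  if (v === undefined || v === null || typeof v === 'string') return true;
  return Array.isArray(v) && v.every((part) => isPlainObject(part));
}

// finish_reason: string, null, or absent.
function isAcceptedFinishReason(v: unknown): boolean {
  return v === undefined || v === null || typeof v === 'string';
}

// The guard never copies or strips fields, so extras pass through untouched.
function isAcceptedMessage(v: unknown): v is Message {
  return (
    isPlainObject(v) &&
    typeof v['role'] === 'string' &&
    isAcceptedContent(v['content'])
  );
}

// Sample payloads (made up for this sketch).
const toolResult = { role: 'tool', tool_call_id: 'call_1', content: '{"temp":21}' };
const multimodal = {
  role: 'user',
  content: [
    { type: 'text', text: 'Describe this image' },
    { type: 'image_url', image_url: { url: 'https://example.com/cat.png' } },
  ],
};
const toolCallOnly = { role: 'assistant', content: null, tool_calls: [] };
const badContent = { role: 'user', content: 42 };

console.log(isAcceptedMessage(toolResult));   // true
console.log(isAcceptedMessage(multimodal));   // true
console.log(isAcceptedMessage(toolCallOnly)); // true
console.log(isAcceptedMessage(badContent));   // false
```

The inner content parts are only checked to be objects, which matches the
intent of leaving their shape to the backend.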

Fixes #2, Fixes #3

Co-Authored-By: Claude <noreply@anthropic.com>
JellyBrick 2026-04-11 18:28:35 +09:00
commit db58054fdb

@@ -220,13 +220,32 @@ export type CreateAdminTokenInput = z.infer<typeof CreateAdminTokenInputSchema>;
  * OpenAI v1
  * */
-export const ChatRoleSchema = z.enum(['system', 'user', 'assistant']);
+/**
+ * The router is a proxy; it must not reject roles or content shapes that
+ * a backend legitimately supports. The OpenAI spec defines `system`,
+ * `user`, `assistant`, `tool`, and `function`; other providers may add
+ * more. Accept any string so messages pass through unaltered.
+ */
+export const ChatRoleSchema = z.string();
 export type ChatRole = z.infer<typeof ChatRoleSchema>;
-export const ChatMessageSchema = z.object({
-  role: ChatRoleSchema,
-  content: z.string(),
-});
+/**
+ * `content` may be:
+ * - a plain string (most common)
+ * - `null` (e.g. assistant messages that only carry tool_calls)
+ * - an array of content-part objects (multimodal: images, video, etc.)
+ *
+ * We validate the structural envelope but leave the inner content
+ * unconstrained so the backend decides what's valid.
+ */
+export const ChatMessageSchema = z
+  .object({
+    role: ChatRoleSchema,
+    content: z
+      .union([z.string(), z.array(z.record(z.unknown())), z.null()])
+      .optional(),
+  })
+  .passthrough();
 export type ChatMessage = z.infer<typeof ChatMessageSchema>;
 export const ChatCompletionRequestSchema = z
@@ -250,11 +269,13 @@ export const ChatCompletionUsageSchema = z.object({
 });
 export type ChatCompletionUsage = z.infer<typeof ChatCompletionUsageSchema>;
-export const ChatCompletionChoiceSchema = z.object({
-  index: z.number().int(),
-  message: ChatMessageSchema,
-  finish_reason: z.string(),
-});
+export const ChatCompletionChoiceSchema = z
+  .object({
+    index: z.number().int(),
+    message: ChatMessageSchema,
+    finish_reason: z.string().nullable().optional(),
+  })
+  .passthrough();
 export type ChatCompletionChoice = z.infer<typeof ChatCompletionChoiceSchema>;
 export const ChatCompletionResponseSchema = z