docker model bench

Description: Benchmark a model's performance at different concurrency levels
Usage: docker model bench [MODEL]

Description

Benchmark a model's performance by reporting the tokens per second it generates at different concurrency levels.

This command runs a series of benchmarks with 1, 2, 4, and 8 concurrent requests by default, measuring the tokens per second (TPS) that the model can generate.
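
To make the metric concrete, here is a minimal Python sketch of how an aggregate tokens-per-second figure could be derived for a single concurrency level. The function name `tokens_per_second` and the sample token counts are hypothetical, and this is not necessarily how `docker model bench` computes its numbers internally.

```python
def tokens_per_second(tokens_per_request, elapsed_seconds):
    """Aggregate throughput: all generated tokens divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return sum(tokens_per_request) / elapsed_seconds


# Hypothetical sample: 4 concurrent requests completed within a 30 s window.
sample_tokens = [150, 140, 160, 150]
tps = tokens_per_second(sample_tokens, 30.0)
print(f"{tps:.1f} tokens/s")
```

Under this definition, throughput at a higher concurrency level counts the tokens from all in-flight requests together, so it can rise even when each individual request gets slower.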

Options

| Option | Default | Description |
|---|---|---|
| --concurrency | [1,2,4,8] | Concurrency levels to test |
| --duration | 30s | Duration to run each concurrency test |
| --json | | Output results in JSON format |
| --prompt | Write a comprehensive 100 word summary on whales and their impact on society. | Prompt to use for benchmarking |
| --timeout | 5m0s | Timeout for each individual request |
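
As an illustration of how these options combine, the following Python sketch assembles a `docker model bench` command line from the flags listed above. The helper `build_bench_command` is hypothetical, the model name `ai/smollm2` is only a placeholder, and passing the concurrency levels as a comma-separated list is an assumption based on the list-style default.

```python
import shlex


def build_bench_command(model, concurrency=None, duration=None,
                        json_output=False, prompt=None, timeout=None):
    """Return the argv list for a `docker model bench` invocation."""
    cmd = ["docker", "model", "bench"]
    if concurrency:
        # Assumption: the list flag accepts comma-separated values.
        cmd += ["--concurrency", ",".join(str(c) for c in concurrency)]
    if duration:
        cmd += ["--duration", duration]
    if json_output:
        cmd.append("--json")
    if prompt:
        cmd += ["--prompt", prompt]
    if timeout:
        cmd += ["--timeout", timeout]
    cmd.append(model)
    return cmd


# Placeholder model name; substitute a model available on your system.
argv = build_bench_command("ai/smollm2", concurrency=[1, 4],
                           duration="10s", json_output=True)
print(shlex.join(argv))
```

The sketch only prints the command. To run it, the resulting list could be passed to `subprocess.run(argv, check=True)` on a machine where the `docker model` command is available.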