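I. Core Architecture Design for the Environment
When integrating Spring AI with a large language model (LLM), the system architecture has to balance flexibility with scalability. A typical layered architecture contains the following modules:
- Model service layer: encapsulates the core logic for talking to the LLM and supports switching between models at runtime
- Business adaptation layer: turns model capabilities into APIs the business can consume
- Monitoring and management layer: collects call logs and performance metrics and raises alerts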
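    // Example: abstract interface for a model service
    import java.util.Map;
    import java.util.stream.Stream;

    public interface ModelService {
        // Generates a complete response for the prompt
        String generateText(String prompt, Map<String, Object> params);

        // Generates the response incrementally
        Stream<String> streamGenerate(String prompt);

        // Checks the input before it is sent to the model
        boolean validateInput(String input);
    }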
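Use dependency injection to manage the different model implementations, for example:
    @Configuration
    public class ModelConfig {

        @Bean
        @Qualifier("llmService")
        public ModelService llmService() {
            // Choose the model implementation here
            return new LlamaAdapter(); // or new QianWenAdapter()
        }
    }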
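The selection step inside `llmService()` could be driven by a configured provider name rather than a hard-coded constructor. The following is a minimal sketch in plain Java, without Spring, of that idea. `EchoModelService` and `ModelServiceFactory` are hypothetical names introduced for illustration, and the echo implementation merely stands in for real adapters such as `LlamaAdapter`.

```java
import java.util.Map;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Same shape as the ModelService interface shown earlier.
interface ModelService {
    String generateText(String prompt, Map<String, Object> params);
    Stream<String> streamGenerate(String prompt);
    boolean validateInput(String input);
}

// Hypothetical stand-in for a real adapter; it only echoes the prompt.
class EchoModelService implements ModelService {
    private final String name;

    EchoModelService(String name) {
        this.name = name;
    }

    @Override
    public String generateText(String prompt, Map<String, Object> params) {
        return "[" + name + "] " + prompt;
    }

    @Override
    public Stream<String> streamGenerate(String prompt) {
        // Emit the prompt word by word to mimic token streaming.
        return Stream.of(prompt.split("\\s+"));
    }

    @Override
    public boolean validateInput(String input) {
        return input != null && !input.isBlank();
    }
}

// Chooses an implementation from a configured provider name.
class ModelServiceFactory {
    private static final Map<String, Supplier<ModelService>> PROVIDERS = Map.of(
            "llama", () -> new EchoModelService("llama"),
            "qianwen", () -> new EchoModelService("qianwen"));

    static ModelService create(String provider) {
        Supplier<ModelService> supplier = provider == null ? null : PROVIDERS.get(provider);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown model provider: " + provider);
        }
        return supplier.get();
    }
}
```

In a Spring setup, the provider name would typically come from a property, and the `@Bean` method would delegate to a factory like this one.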
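II. Core Dependencies and Version Management
A stable environment depends on strict control of dependency versions. A recommended combination (confirm the exact pairing against the Spring AI release notes for your version):
- Spring Boot 3.2.x + Spring AI 1.1.x
- HTTP client: WebClient (reactive) or RestTemplate
- Serialization: Jackson 2.15+
Key dependencies (Maven):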
    <!-- Verify the artifact IDs for your Spring AI version; module names have changed across releases. -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-core</artifactId>
        <version>1.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-http</artifactId>
        <version>1.1.0</version>
    </dependency>
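Resolving version conflicts:
- Run mvn dependency:tree to inspect the dependency tree
- Exclude conflicting transitive dependencies with <exclusions>
- Centralize dependency management in the parent POM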
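III. Implementing the API Call Layer
1. Basic request wrapper
    import org.springframework.web.reactive.function.client.WebClient;
    import reactor.core.publisher.Mono;

    // CompletionRequest and CompletionResponse are application-defined DTOs
    public class LlmApiClient {
        private final WebClient webClient;
        private final String apiKey;

        public LlmApiClient(String baseUrl, String apiKey) {
            // Every request carries the bearer token
            this.webClient = WebClient.builder()
                    .baseUrl(baseUrl)
                    .defaultHeader("Authorization", "Bearer " + apiKey)
                    .build();
            this.apiKey = apiKey;
        }

        public Mono<String> callCompletion(String prompt) {
            return webClient.post()
                    .uri("/v1/completions")
                    .bodyValue(new CompletionRequest(prompt))
                    .retrieve()
                    .bodyToMono(CompletionResponse.class)
                    .map(CompletionResponse::getContent);
        }
    }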
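To make the wire format of that call concrete, here is a small sketch that builds the same request with the JDK's `java.net.http` API instead of WebClient. The single-field `CompletionRequest` record and the `CompletionRequests` helper are assumptions for illustration only; real providers define their own request fields.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical request DTO; real providers define their own fields.
record CompletionRequest(String prompt) {
    String toJson() {
        return "{\"prompt\":\"" + escape(prompt) + "\"}";
    }

    // Escapes the characters that would break a JSON string literal.
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }
}

class CompletionRequests {
    // Builds (but does not send) a POST to {baseUrl}/v1/completions.
    static HttpRequest build(String baseUrl, String apiKey, CompletionRequest body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/completions"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body.toJson(), StandardCharsets.UTF_8))
                .build();
    }
}
```

In production code a JSON library such as Jackson would normally handle serialization; the hand-written escaping here only keeps the sketch free of dependencies.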
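2. Handling streaming responses
Long-form generation calls for parsing Server-Sent Events (SSE):
    private final ObjectMapper objectMapper = new ObjectMapper();

    public Flux<String> streamCompletion(String prompt) {
        return webClient.post()
                .uri("/v1/completions/stream")
                .bodyValue(new StreamRequest(prompt))
                .accept(MediaType.TEXT_EVENT_STREAM)
                .retrieve()
                .bodyToFlux(String.class)
                .map(this::parseStreamEvent);
    }

    // Extracts the "content" field from one event payload such as {"content":"..."}
    private String parseStreamEvent(String event) {
        // WebClient normally delivers only the data payload; strip a "data:" prefix if one remains
        String json = event.startsWith("data:") ? event.substring(5).trim() : event.trim();
        try {
            return objectMapper.readTree(json).path("content").asText("");
        } catch (JsonProcessingException e) {
            throw new IllegalStateException("Malformed stream event: " + event, e);
        }
    }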
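When the SSE stream has to be parsed by hand, for example when reading the raw response body line by line, the framing rules matter more than the JSON inside each event. The sketch below is a simplified, dependency-free parser written for illustration; `SseParser` is a hypothetical name, and it handles only `data:` fields, ignoring `event`, `id`, and `retry`.

```java
import java.util.ArrayList;
import java.util.List;

// Groups raw SSE lines into events: "data:" lines accumulate until a blank line.
class SseParser {
    static List<String> parse(String raw) {
        List<String> events = new ArrayList<>();
        StringBuilder data = new StringBuilder();
        boolean hasData = false;
        for (String line : raw.split("\r?\n", -1)) {
            if (line.isEmpty()) {
                // A blank line terminates the current event.
                if (hasData) {
                    events.add(data.toString());
                    data.setLength(0);
                    hasData = false;
                }
            } else if (line.startsWith("data:")) {
                String value = line.substring(5);
                if (value.startsWith(" ")) {
                    value = value.substring(1); // drop the single optional space
                }
                if (hasData) {
                    data.append('\n'); // multiple data lines are joined with newlines
                }
                data.append(value);
                hasData = true;
            }
            // Comment lines (":...") and other fields are ignored in this sketch.
        }
        // An event not yet terminated by a blank line is intentionally dropped.
        return events;
    }
}
```

For example, `SseParser.parse("data: a\n\ndata: b\ndata: c\n\n")` yields two events, `a` and `b` followed by a newline and `c`.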
IV. Adapting Multiple Models
1. Adapter pattern
    public abstract class ModelAdapter implements ModelService {
        protected final RestTemplate restTemplate;

        public ModelAdapter() {
            this.restTemplate = new RestTemplateBuilder()
                    .setConnectTimeout(Duration.ofSeconds(10))
                    .setReadTimeout(Duration.ofSeconds(30))
                    .build();
        }

        @Override
        public boolean validateInput(String input) {
            return input != null && input.length() <= getMaxInputLength();
        }

        // Each model declares its own input limit
        protected abstract int getMaxInputLength();
    }

    public class QianWenAdapter extends ModelAdapter {
        @Override
        public String generateText(String prompt, Map<String, Object> params) {
            // Model-specific call logic goes here
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_JSON);
            // ...build the request body
        }
    }
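One way to complete the adapter pattern is to let the base class enforce validation and leave only the provider call to subclasses (a template method). The sketch below assumes this variation; `TextModelAdapter`, `StubQianWenAdapter`, the 2000-character limit, and the `temperature` parameter are hypothetical, and a plain function stands in for the HTTP transport so the example runs without Spring.

```java
import java.util.Map;
import java.util.function.Function;

// Base class: shared validation plus a template method for the provider call.
abstract class TextModelAdapter {
    public final String generateText(String prompt, Map<String, Object> params) {
        if (!validateInput(prompt)) {
            throw new IllegalArgumentException(
                    "Prompt is null or longer than " + getMaxInputLength() + " characters");
        }
        return doGenerate(prompt, params);
    }

    public boolean validateInput(String input) {
        return input != null && input.length() <= getMaxInputLength();
    }

    protected abstract int getMaxInputLength();

    protected abstract String doGenerate(String prompt, Map<String, Object> params);
}

// Hypothetical adapter: the transport function stands in for the HTTP call.
class StubQianWenAdapter extends TextModelAdapter {
    private final Function<String, String> transport;

    StubQianWenAdapter(Function<String, String> transport) {
        this.transport = transport;
    }

    @Override
    protected int getMaxInputLength() {
        return 2000; // assumed limit, not a documented value
    }

    @Override
    protected String doGenerate(String prompt, Map<String, Object> params) {
        Object temperature = params.getOrDefault("temperature", 0.7);
        return transport.apply(prompt + " (temperature=" + temperature + ")");
    }
}
```

Injecting the transport also makes each adapter easy to unit test, since no network call is needed to exercise the validation and parameter handling.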
2. Dynamic routing strategy
Route to a model based on configuration:
    @Service
    public class ModelRouter {
        @Autowired
        private List<ModelService> modelServices;

        private final Map<String, ModelService> routeMap = new ConcurrentHashMap<>();

        @PostConstruct
        public void init() {
            // Load routing rules from configuration
            routeMap.put("default", modelServices.get(0));
            routeMap.put("high_quality", modelServices.stream()
                    .filter(s -> s instanceof PremiumModelService)
                    .findFirst()
                    .orElseThrow());
        }

        public ModelService getModel(String routeKey) {
            return Optional.ofNullable(routeMap.get(routeKey))
                    .orElseThrow(() -> new IllegalArgumentException("Invalid route key"));
        }
    }
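The router above rejects unknown keys. Some deployments may prefer to fall back to the default model instead, so the sketch below shows both lookups side by side. `RouteTable` is a hypothetical, generic helper written for illustration; in the article's setup the type parameter would be `ModelService`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Generic route table; T would be ModelService in the setup described here.
class RouteTable<T> {
    private final Map<String, T> routes = new ConcurrentHashMap<>();
    private final String defaultKey;

    RouteTable(String defaultKey, T defaultTarget) {
        this.defaultKey = defaultKey;
        routes.put(defaultKey, defaultTarget);
    }

    void register(String key, T target) {
        routes.put(key, target);
    }

    // Strict lookup: unknown keys are an error.
    T get(String key) {
        T target = key == null ? null : routes.get(key);
        if (target == null) {
            throw new IllegalArgumentException("Invalid route key: " + key);
        }
        return target;
    }

    // Lenient lookup: unknown keys fall back to the default route.
    T getOrDefault(String key) {
        T target = key == null ? null : routes.get(key);
        return target != null ? target : routes.get(defaultKey);
    }
}
```

The null checks are needed because `ConcurrentHashMap` throws a `NullPointerException` on a null key. Which lookup to expose is a design choice: strict lookups surface misconfiguration early, while lenient ones keep requests flowing.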
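V. Production Optimization Practices
1. Performance tuning essentials
- HTTP client connection and timeout configuration:
    // Reactor Netty HttpClient; pool sizing can be tuned by passing a ConnectionProvider to HttpClient.create(...)
    @Bean
    public HttpClient httpClient() {
        return HttpClient.create()
                .responseTimeout(Duration.ofSeconds(30))
                .doOnConnected(conn -> conn
                        .addHandlerLast(new ReadTimeoutHandler(30))    // seconds
                        .addHandlerLast(new WriteTimeoutHandler(10))); // seconds
    }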
- Asynchronous processing:
  - Use the @Async annotation for non-blocking calls
  - Configure a custom thread pool:
    @Configuration
    @EnableAsync
    public class AsyncConfig {

        @Bean(name = "modelExecutor")
        public Executor modelExecutor() {
            ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
            executor.setCorePoolSize(10);
            executor.setMaxPoolSize(20);
            executor.setQueueCapacity(100);
            return executor;
        }
    }
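To show what those pool settings mean in practice, the following sketch builds an equivalent executor with the JDK's `ThreadPoolExecutor` and runs a call on it with `CompletableFuture`. `ModelExecutors` is a hypothetical helper, the echo lambda is a placeholder for the real model call, and the `CallerRunsPolicy` rejection policy and 30-second timeout are assumptions rather than settings from the configuration above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ModelExecutors {
    // Same sizes as the Spring configuration: 10 core threads, 20 max, queue of 100.
    static ThreadPoolExecutor newModelExecutor() {
        return new ThreadPoolExecutor(
                10, 20, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(100),
                // Assumption: when the queue is full, run the task on the caller's
                // thread instead of rejecting it, which applies back-pressure.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Runs a (placeholder) blocking model call off the caller's thread, with a timeout.
    static CompletableFuture<String> callAsync(ThreadPoolExecutor executor, String prompt) {
        return CompletableFuture
                .supplyAsync(() -> "echo: " + prompt, executor)
                .orTimeout(30, TimeUnit.SECONDS);
    }
}
```

Note that threads beyond the core size are created only once the queue is full, so with these numbers the pool grows past 10 threads only after 100 tasks are already waiting.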
2. Exception handling
Handle exceptions by category:
    @ControllerAdvice
    public class ModelExceptionHandler {

        // An upstream timeout maps to 504 Gateway Timeout
        @ExceptionHandler(ModelTimeoutException.class)
        public ResponseEntity<ErrorResponse> handleTimeout(ModelTimeoutException ex) {
            return ResponseEntity.status(504)
                    .body(new ErrorResponse("MODEL_TIMEOUT", "Model response exceeded timeout"));
        }

        // Rate limiting maps to 429 Too Many Requests and tells the client when to retry
        @ExceptionHandler(ModelRateLimitException.class)
        public ResponseEntity<ErrorResponse> handleRateLimit(ModelRateLimitException ex) {
            return ResponseEntity.status(429)
                    .header("Retry-After", String.valueOf(ex.getRetrySeconds()))
                    .body(new ErrorResponse("RATE_LIMITED", ex.getMessage()));
        }
    }
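The exception types used by the handler are not defined in the text. As a minimal sketch, they could look like the classes below, together with a framework-free mapping function that is easy to unit test. The class shapes, the `ErrorMapping` record, and the choice of 504 for timeouts and 500 as a catch-all are assumptions made for illustration.

```java
// Hypothetical exception types matching the handler names used here.
class ModelTimeoutException extends RuntimeException {
    ModelTimeoutException(String message) {
        super(message);
    }
}

class ModelRateLimitException extends RuntimeException {
    private final long retrySeconds;

    ModelRateLimitException(String message, long retrySeconds) {
        super(message);
        this.retrySeconds = retrySeconds;
    }

    long getRetrySeconds() {
        return retrySeconds;
    }
}

// HTTP status, error code, and optional Retry-After value in seconds.
record ErrorMapping(int status, String code, Long retryAfterSeconds) {}

class ErrorMapper {
    static ErrorMapping map(RuntimeException ex) {
        if (ex instanceof ModelRateLimitException rateLimit) {
            return new ErrorMapping(429, "RATE_LIMITED", rateLimit.getRetrySeconds());
        }
        if (ex instanceof ModelTimeoutException) {
            return new ErrorMapping(504, "MODEL_TIMEOUT", null);
        }
        // Anything unrecognized is treated as an internal error.
        return new ErrorMapping(500, "MODEL_ERROR", null);
    }
}
```

Keeping the mapping in one function means the controller advice only has to translate an `ErrorMapping` into a `ResponseEntity`.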
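VI. Security and Compliance Practices
- Masking sensitive information:
  - Automatically mask request and response logs
  - Use AOP to intercept model call logs
- Integrating authentication:
    public class AuthInterceptor implements ClientHttpRequestInterceptor {
        @Override
        public ClientHttpResponse intercept(HttpRequest request, byte[] body,
                                            ClientHttpRequestExecution execution) throws IOException {
            // Add the authentication header on every outgoing request
            String token = TokenProvider.getToken();
            request.getHeaders().set("X-API-KEY", token);
            return execution.execute(request, body);
        }
    }
- Encrypting data in transit:
  - Enforce HTTPS
  - Encrypt sensitive parameters (for example with JWE)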
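Log masking can start as a pure function that an AOP advice or log filter then calls. The sketch below is one possible starting point; `LogMasker` is a hypothetical name, and the two patterns (bearer tokens and `sk-` prefixed keys) are assumptions about what the credentials look like, so they would need adjusting to the providers actually in use.

```java
import java.util.regex.Pattern;

class LogMasker {
    // Assumed credential shapes; adjust these patterns to your providers.
    private static final Pattern BEARER =
            Pattern.compile("(?i)(Bearer\\s+)[A-Za-z0-9._\\-]+");
    private static final Pattern API_KEY =
            Pattern.compile("\\bsk-[A-Za-z0-9]{8,}\\b");

    // Returns the text with any matching credentials replaced by asterisks.
    static String mask(String text) {
        if (text == null) {
            return null;
        }
        String masked = BEARER.matcher(text).replaceAll("$1****");
        return API_KEY.matcher(masked).replaceAll("sk-****");
    }
}
```

Pattern-based masking only catches the shapes it knows about, so it works best alongside a policy of not logging raw headers or full prompts in the first place.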
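VII. Monitoring and Operations
1. Collecting metrics
    // Tag every metric with the application name
    @Bean
    public MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
        return registry -> registry.config().commonTags("application", "llm-service");
    }

    // @Timed on an arbitrary bean method requires a TimedAspect bean to be registered
    @Timed(value = "model.call", description = "Time spent calling model API")
    public String callModel(String prompt) {
        // Model call logic
    }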
2. Log tracing
- Use MDC to track a request ID
- Structured log example:
    {
      "timestamp": "2023-11-15T10:30:45.123Z",
      "level": "INFO",
      "traceId": "abc123",
      "service": "llm-gateway",
      "message": "Model call completed",
      "model": "qianwen-v2",
      "durationMs": 452,
      "tokens": 128
    }
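MDC values live in a thread-local, so they are lost when a call hops onto another thread, for instance an `@Async` executor. The sketch below illustrates the usual remedy of capturing the value on the submitting thread and restoring it in the worker. `TraceContext` is a hypothetical, simplified stand-in for a logging MDC, written so that the example runs without a logging framework.

```java
import java.util.UUID;

// Minimal stand-in for a logging MDC: one trace ID per thread.
class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Starts a new trace on the current thread and returns its ID.
    static String start() {
        String id = UUID.randomUUID().toString();
        TRACE_ID.set(id);
        return id;
    }

    static String current() {
        return TRACE_ID.get();
    }

    static void clear() {
        TRACE_ID.remove();
    }

    // Captures the caller's trace ID now and restores it inside the worker thread.
    static Runnable wrap(Runnable task) {
        String captured = TRACE_ID.get();
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(captured);
            try {
                task.run();
            } finally {
                // Leave the pooled thread as it was found.
                if (previous == null) {
                    TRACE_ID.remove();
                } else {
                    TRACE_ID.set(previous);
                }
            }
        };
    }
}
```

A task submitted as `executor.execute(TraceContext.wrap(task))` sees the caller's trace ID, while an unwrapped task sees none. With Spring's `ThreadPoolTaskExecutor`, the same wrapping is commonly installed once through a task decorator.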
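With the architecture and implementation details above, developers can build a highly available, scalable environment for integrating Spring AI with large models. In practice, pay particular attention to: 1) compatibility testing of the model APIs, 2) context propagation across asynchronous calls, and 3) end-to-end load testing of the production environment. A blue-green deployment strategy is recommended for a gradual rollout, with chaos engineering used to verify the system's fault tolerance.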