Advanced Deep CTR Models: A Walkthrough of xDeepFM and FiBiNET Implementations

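In click-through rate (CTR) prediction, deep learning models improve accuracy through feature crossing and high-order nonlinear modeling. This article focuses on two representative deep CTR models, xDeepFM and FiBiNET, and walks through their architecture, code implementation, and optimization strategies to help developers understand the core mechanisms and put them into practice.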

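1. xDeepFM: Explicit High-Order Feature Interactions

1.1 Model Architecture

The core innovation of xDeepFM is the Compressed Interaction Network (CIN), which builds high-order feature interactions explicitly. This contrasts with the implicit crossing of a plain DNN, whose learned interactions are hard to interpret. CIN applies interactions at the vector level (between whole embedding vectors) rather than at the bit level as a DNN does. Each layer is computed recursively from the previous one, so the order of the interactions grows with depth.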

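Input layer: categorical features are embedded as low-dimensional dense vectors; continuous features are concatenated directly.
CIN layer:
- The interaction vectors of layer k are produced by crossing the output of layer k-1 with the original embedding vectors.
- A compression step (equivalent to a 1x1 convolution) reduces the dimensionality and keeps the parameter count from exploding.
Output layer: the CIN output is combined with the outputs of the linear part and the DNN part and passed through a Sigmoid activation.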
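
For reference, the layer-wise recursion described above can be written as a formula. The notation follows the xDeepFM paper and is introduced here as an assumption, since this article does not define it: X^0 is the m x D embedding matrix (m fields, embedding size D), X^k is the H_k x D output of CIN layer k, W^{k,h} is the weight matrix of the h-th feature map in layer k, and the circle denotes the Hadamard (element-wise) product. The second line is the sum pooling that turns each feature map into one scalar before the output layer.

```latex
\mathbf{X}^{k}_{h,*} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{m}
    \mathbf{W}^{k,h}_{ij} \left( \mathbf{X}^{k-1}_{i,*} \circ \mathbf{X}^{0}_{j,*} \right),
    \qquad 1 \le h \le H_k, \quad H_0 = m

p^{k}_{h} = \sum_{d=1}^{D} \mathbf{X}^{k}_{h,d}
```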

1.2 Key Implementation Points

1.2.1 CIN Module

import torch
import torch.nn as nn


class CIN(nn.Module):
    def __init__(self, field_size, embed_dim, hidden_layers=(64, 32)):
        super().__init__()
        self.field_size = field_size
        self.embed_dim = embed_dim
        self.hidden_layers = list(hidden_layers)
        # One 1x1 convolution per CIN layer. Its input channels are the
        # H_{k-1} * field_size pairwise interaction maps (H_0 = field_size).
        self.cin_layers = nn.ModuleList()
        prev_size = field_size
        for h in self.hidden_layers:
            self.cin_layers.append(
                nn.Conv1d(
                    in_channels=prev_size * field_size,
                    out_channels=h,
                    kernel_size=1
                )
            )
            prev_size = h

    def forward(self, x):
        # x: [batch_size, field_size, embed_dim]
        x0 = x          # original embeddings, reused at every layer
        hidden = x      # output of the previous CIN layer
        cin_outputs = []
        for layer in self.cin_layers:
            # Vector-wise interaction: Hadamard product of every row of the
            # previous layer with every row of x0, kept per embedding dimension
            interaction = torch.einsum('bhd,bmd->bhmd', hidden, x0)
            interaction = interaction.reshape(
                x.size(0), -1, self.embed_dim
            )  # [batch, H_prev * field, embed]
            # Compression: the 1x1 convolution mixes the interaction maps
            hidden = layer(interaction)  # [batch, h, embed]
            # Sum pooling over the embedding dimension
            cin_outputs.append(torch.sum(hidden, dim=2))  # [batch, h]
        return torch.cat(cin_outputs, dim=1)  # [batch, sum(hidden_layers)]
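
As a sanity check on the "outer product plus 1x1 convolution" idea, the sketch below computes a single CIN layer in three ways: a literal triple loop over the CIN definition, a vectorized einsum, and an nn.Conv1d with kernel size 1. All tensor sizes and variable names are illustrative assumptions, not values from the article.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical sizes: batch, fields, previous/next layer width, embedding dim
batch, m, h_prev, h_next, d = 2, 3, 4, 5, 6
x0 = torch.randn(batch, m, d)            # original embeddings
x_prev = torch.randn(batch, h_prev, d)   # output of the previous CIN layer
w = torch.randn(h_next, h_prev, m)       # one weight matrix per output feature map

# 1) Reference: weighted sum of Hadamard products, written as explicit loops
loop_out = torch.zeros(batch, h_next, d)
for h in range(h_next):
    for i in range(h_prev):
        for j in range(m):
            loop_out[:, h] += w[h, i, j] * x_prev[:, i] * x0[:, j]

# 2) Vectorized: build all pairwise interactions, then contract with the weights
z = torch.einsum('bid,bjd->bijd', x_prev, x0)    # [batch, h_prev, m, d]
vec_out = torch.einsum('hij,bijd->bhd', w, z)    # [batch, h_next, d]

# 3) The same contraction expressed as a 1x1 convolution over the d axis
conv = nn.Conv1d(h_prev * m, h_next, kernel_size=1, bias=False)
with torch.no_grad():
    conv.weight.copy_(w.reshape(h_next, h_prev * m, 1))
    conv_out = conv(z.reshape(batch, h_prev * m, d))  # [batch, h_next, d]

print(torch.allclose(loop_out, vec_out, atol=1e-4))
print(torch.allclose(vec_out, conv_out, atol=1e-4))
```

The convolution form is the one that scales: the interaction tensor is built once per layer and the weights are applied with a single batched operation.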

1.2.2 Full Model

class xDeepFM(nn.Module):
    def __init__(self, feature_sizes, embed_dim, hidden_layers=(64, 32)):
        super().__init__()
        num_fields = len(feature_sizes)
        # Linear (first-order) part: one scalar weight per feature value
        self.linear = nn.ModuleList([
            nn.Embedding(size, 1) for size in feature_sizes
        ])
        # Embedding layer
        self.embeddings = nn.ModuleList([
            nn.Embedding(size, embed_dim) for size in feature_sizes
        ])
        # DNN part
        self.dnn = nn.Sequential(
            nn.Linear(num_fields * embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU()
        )
        # CIN part
        self.cin = CIN(num_fields, embed_dim, hidden_layers)
        # Output unit over the concatenated DNN and CIN features
        self.output = nn.Linear(128 + sum(hidden_layers), 1)

    def forward(self, x):
        # x: [batch, field] LongTensor holding one category index per field
        x_embed = torch.stack(
            [emb(x[:, i]) for i, emb in enumerate(self.embeddings)], dim=1
        )  # [batch, field, embed]
        # Outputs of the three parts
        linear_out = sum(lin(x[:, i]) for i, lin in enumerate(self.linear))  # [batch, 1]
        dnn_out = self.dnn(x_embed.flatten(1))  # [batch, 128]
        cin_out = self.cin(x_embed)  # [batch, sum(hidden_layers)]
        # Final output: add the linear logit to the DNN + CIN logit
        logit = linear_out + self.output(torch.cat([dnn_out, cin_out], dim=1))
        return torch.sigmoid(logit).squeeze(1)  # [batch]

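1.3 Optimization Strategies

Parameter initialization: initialize the CIN layer weights with Xavier initialization to keep activation and gradient magnitudes stable and reduce the risk of vanishing gradients.
Regularization: apply Dropout (for example, p = 0.3) to the CIN output to reduce overfitting.
Order control: adjust the hidden_layers parameter (the number and width of CIN layers) to trade off model complexity against performance.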
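
The first two strategies can be sketched as follows. This is a minimal illustration, not the article's training setup: the field count, the layer widths, and the random stand-in for the CIN output are all assumptions, and the convolution layers are built inline so the snippet runs on its own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical configuration: 10 fields, two CIN layers of width 64 and 32
field_size, hidden_layers = 10, [64, 32]

# Stand-in for the 1x1 convolutions of a CIN module
convs = nn.ModuleList()
prev = field_size
for h in hidden_layers:
    convs.append(nn.Conv1d(prev * field_size, h, kernel_size=1))
    prev = h

# Xavier (Glorot) uniform initialization for the weights, zeros for the biases
for conv in convs:
    nn.init.xavier_uniform_(conv.weight)
    nn.init.zeros_(conv.bias)

# Dropout on the pooled CIN output; it is only active in training mode
cin_dropout = nn.Dropout(p=0.3)
cin_out = torch.randn(4, sum(hidden_layers))  # random stand-in for a CIN output

cin_dropout.train()
train_out = cin_dropout(cin_out)   # about 30% of entries zeroed, the rest scaled by 1/0.7
cin_dropout.eval()
eval_out = cin_dropout(cin_out)    # identity at inference time
```

Remember to call model.eval() before evaluation or serving so that Dropout is disabled.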

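2. FiBiNET: Dynamic Feature Weighting and Bilinear Interaction

2.1 Core Mechanisms

FiBiNET's contributions are a Squeeze-and-Excitation (SE) module and a bilinear feature interaction layer:

SE module: dynamically learns an importance weight for each feature field and rescales that field's embedding, strengthening the contribution of the key features.
Bilinear interaction: combines the inner product and the Hadamard product. One embedding is first multiplied by a learnable matrix W (the inner-product step), and the result is then multiplied element-wise with the other embedding (the Hadamard-product step), which captures finer-grained interactions than either product alone.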
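
To make the difference between these products concrete, the following minimal sketch compares them on two small vectors. The names v_i, v_j and W and the dimension 4 are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
d = 4                                  # hypothetical embedding size
v_i, v_j = torch.randn(d), torch.randn(d)

# Inner product: a single scalar, no parameters
inner = torch.dot(v_i, v_j)

# Hadamard product: a d-dimensional vector, no parameters
hadamard = v_i * v_j

# Bilinear interaction: project v_i with a learnable matrix, then multiply
# element-wise with v_j; the result is again a d-dimensional vector
W = torch.randn(d, d)
bilinear = (v_i @ W) * v_j

# With W fixed to the identity, the bilinear form reduces to the Hadamard product
bilinear_identity = (v_i @ torch.eye(d)) * v_j
```

Summing the Hadamard product over its entries gives back the inner product, so the bilinear form can be read as a learnable generalization of both.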

2.2 Key Implementation Points

2.2.1 SE Module

class SEBlock(nn.Module):
    def __init__(self, field_num, reduction_ratio=4):
        super().__init__()
        reduced = max(1, field_num // reduction_ratio)
        # Excitation: a two-layer bottleneck over the field dimension
        self.excitation = nn.Sequential(
            nn.Linear(field_num, reduced),
            nn.ReLU(),
            nn.Linear(reduced, field_num),
            nn.Sigmoid()
        )

    def forward(self, x):
        # x: [batch, field, embed]
        # Squeeze: mean-pool each field's embedding into one statistic
        z = x.mean(dim=2)  # [batch, field]
        # Excitation: one importance weight per field
        a = self.excitation(z)  # [batch, field]
        # Reweight every field embedding by its weight
        return x * a.unsqueeze(2)  # [batch, field, embed]

2.2.2 Bilinear Interaction

from itertools import combinations


class BiLinearInteraction(nn.Module):
    def __init__(self, field_num, embed_dim, method='field_all'):
        super().__init__()
        self.method = method
        # Index lists for all unordered field pairs (i, j) with i < j
        pairs = list(combinations(range(field_num), 2))
        self.left = [i for i, _ in pairs]
        self.right = [j for _, j in pairs]
        if method == 'field_all':
            # A single weight matrix shared by all fields
            self.weight = nn.Parameter(torch.randn(embed_dim, embed_dim) * 0.01)
        elif method == 'field_each':
            # A separate weight matrix for each field
            self.weight = nn.Parameter(
                torch.randn(field_num, embed_dim, embed_dim) * 0.01
            )
        else:
            raise ValueError(f'unknown method: {method}')

    def forward(self, x):
        # x: [batch, field, embed]
        if self.method == 'field_all':
            xw = torch.matmul(x, self.weight)  # [batch, field, embed]
        else:
            xw = torch.einsum('bfd,fde->bfe', x, self.weight)
        # p_ij = (v_i W) * v_j: projection followed by a Hadamard product
        return xw[:, self.left] * x[:, self.right]  # [batch, pairs, embed]

2.3 Full Model

class FiBiNET(nn.Module):
    def __init__(self, feature_sizes, embed_dim):
        super().__init__()
        field_num = len(feature_sizes)
        pair_num = field_num * (field_num - 1) // 2
        self.embeddings = nn.ModuleList([
            nn.Embedding(size, embed_dim) for size in feature_sizes
        ])
        self.se_block = SEBlock(field_num)
        self.bi_linear = BiLinearInteraction(field_num, embed_dim)
        self.dnn = nn.Sequential(
            nn.Linear((field_num + pair_num) * embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1)
        )

    def forward(self, x):
        # x: [batch, field] LongTensor holding one category index per field
        x_embed = torch.stack(
            [emb(x[:, i]) for i, emb in enumerate(self.embeddings)], dim=1
        )  # [batch, field, embed]
        # SE reweighting
        x_se = self.se_block(x_embed)
        # Bilinear interaction on the reweighted embeddings
        interaction = self.bi_linear(x_se)  # [batch, pairs, embed]
        # Concatenate raw embeddings and interaction features (a simplification:
        # the original paper also passes the raw embeddings through a bilinear layer)
        combined = torch.cat(
            [x_embed.flatten(1), interaction.flatten(1)], dim=1
        )  # [batch, (field + pairs) * embed]
        return torch.sigmoid(self.dnn(combined)).squeeze(1)  # [batch]

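3. Engineering Recommendations

3.1 Feature Processing

Embedding dimension: typically set between 16 and 64 and tuned according to data sparsity.
Field grouping: handle high-frequency and low-frequency fields separately to avoid a blow-up in dimensionality.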

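3.2 Training Tips

Learning rate scheduling: use cosine annealing or learning rate warmup to stabilize training.
Loss function: combine LogLoss with an AUC-oriented objective to improve ranking quality. AUC itself is not differentiable, so in practice this means adding a differentiable surrogate such as a pairwise ranking loss.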
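
One way to combine warmup with cosine annealing is a single LambdaLR schedule, sketched below. The step counts, the base learning rate, and the nn.Linear stand-in model are assumptions for illustration. The stand-in outputs raw logits, so the LogLoss is computed with BCEWithLogitsLoss; a model that already applies Sigmoid in forward would use nn.BCELoss instead.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 1)  # hypothetical stand-in for a CTR model (outputs logits)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

warmup_steps, total_steps = 100, 1000  # hypothetical schedule lengths

def warmup_cosine(step):
    # Multiplier applied to the base learning rate at a given step
    if step < warmup_steps:
        return (step + 1) / warmup_steps               # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_cosine)
criterion = nn.BCEWithLogitsLoss()  # LogLoss on logits

lrs = []
for step in range(total_steps):
    features = torch.randn(32, 8)                   # random stand-in batch
    labels = torch.randint(0, 2, (32, 1)).float()
    loss = criterion(model(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    lrs.append(optimizer.param_groups[0]['lr'])     # learning rate used this step
    scheduler.step()
```

The learning rate ramps up linearly during the first warmup_steps updates, peaks at the base rate, and then decays along a half cosine toward zero.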

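3.3 Deployment

Model compression: reduce model size through quantization (for example, INT8) and pruning.
Serving: use TensorRT or TorchScript to speed up inference and meet low-latency requirements.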
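
As one possible starting point for the TorchScript route, the sketch below traces a small model, serializes it to an in-memory buffer, and reloads it. TinyCTR is a hypothetical stand-in defined only for this example; quantization, pruning, and TensorRT conversion are not shown.

```python
import io
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyCTR(nn.Module):
    """Hypothetical stand-in model: per-field embeddings followed by a small MLP."""
    def __init__(self, feature_sizes, embed_dim=8):
        super().__init__()
        self.embeddings = nn.ModuleList(
            [nn.Embedding(size, embed_dim) for size in feature_sizes]
        )
        self.mlp = nn.Sequential(
            nn.Linear(len(feature_sizes) * embed_dim, 16),
            nn.ReLU(),
            nn.Linear(16, 1)
        )

    def forward(self, x):
        # x: [batch, field] LongTensor of category indices
        embedded = [emb(x[:, i]) for i, emb in enumerate(self.embeddings)]
        return torch.sigmoid(self.mlp(torch.cat(embedded, dim=1))).squeeze(1)

model = TinyCTR([100, 50, 20]).eval()      # eval mode before export
example = torch.randint(0, 20, (4, 3))     # example input used for tracing

with torch.no_grad():
    traced = torch.jit.trace(model, example)

    # Serialize to a buffer and load it back, as a serving process would
    buffer = io.BytesIO()
    torch.jit.save(traced, buffer)
    buffer.seek(0)
    loaded = torch.jit.load(buffer)

    new_batch = torch.randint(0, 20, (7, 3))   # a different batch size
    eager_out = model(new_batch)
    loaded_out = loaded(new_batch)
```

Tracing records the operations executed for the example input, so models with data-dependent control flow may need torch.jit.script instead.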

4. Summary and Comparison

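Model	Core innovation	Best suited for	Complexity
xDeepFM	Explicit high-order interactions (CIN)	Datasets with complex feature interactions	High
FiBiNET	Dynamic weighting + bilinear interaction	Scenarios where feature importance varies widely	Medium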

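Both models strengthen feature interaction through their own mechanisms; the choice between them should depend on the characteristics of the business data and the available compute. Possible future directions include combining the two or adding attention mechanisms to further refine the interaction patterns.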