MongoDB 建模示例

MongoDB 建模示例

关系型模型

嵌入式文档一对一关系模型

嵌入式文档一对一关系模型 - 嵌入式文档模式

1
2
3
4
5
6
7
8
9
10
11
12
13
14
// patron document
{
_id: "joe",
name: "Joe Bookreader"
}

// address document
{
patron_id: "joe", // reference to patron document
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: "12345"
}

合并为:

1
2
3
4
5
6
7
8
9
10
{
"_id": "joe",
"name": "Joe Bookreader",
"address": {
"street": "123 Fake Street",
"city": "Faketon",
"state": "MA",
"zip": "12345"
}
}

嵌入式文档一对一关系模型 - 子集模式

假设,有一个用于描述电影信息的 collection 定义:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
{
"_id": 1,
"title": "The Arrival of a Train",
"year": 1896,
"runtime": 1,
"released": ISODate("01-25-1896"),
"poster": "http://ia.media-imdb.com/images/M/MV5BMjEyNDk5MDYzOV5BMl5BanBnXkFtZTgwNjIxMTEwMzE@._V1_SX300.jpg",
"plot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, ...",
"fullplot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, the line dissolves. The doors of the railway-cars open, and people on the platform help passengers to get off.",
"lastupdated": ISODate("2015-08-15T10:06:53"),
"type": "movie",
"directors": ["Auguste Lumière", "Louis Lumière"],
"imdb": {
"rating": 7.3,
"votes": 5043,
"id": 12
},
"countries": ["France"],
"genres": ["Documentary", "Short"],
"tomatoes": {
"viewer": {
"rating": 3.7,
"numReviews": 59
},
"lastUpdated": ISODate("2020-01-09T00:02:53")
}
}

在应用中,有的场景只需要显示电影的简单浏览信息,不需要显示类似 fullplot、poster 这样的详细信息。因为,我们可以考虑将原结构一份为二,并通过 id 字段关联起来。

用于展示摘要信息的 movie collection

1
2
3
4
5
6
7
8
9
10
11
12
13
// movie collection

{
"_id": 1,
"title": "The Arrival of a Train",
"year": 1896,
"runtime": 1,
"released": ISODate("1896-01-25"),
"type": "movie",
"directors": ["Auguste Lumière", "Louis Lumière"],
"countries": ["France"],
"genres": ["Documentary", "Short"]
}

用于展示细节信息的 movie_details collection

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
// movie_details collection

{
"_id": 156,
"movie_id": 1, // reference to the movie collection
"poster": "http://ia.media-imdb.com/images/M/MV5BMjEyNDk5MDYzOV5BMl5BanBnXkFtZTgwNjIxMTEwMzE@._V1_SX300.jpg",
"plot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, ...",
"fullplot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, the line dissolves. The doors of the railway-cars open, and people on the platform help passengers to get off.",
"lastupdated": ISODate("2015-08-15T10:06:53"),
"imdb": {
"rating": 7.3,
"votes": 5043,
"id": 12
},
"tomatoes": {
"viewer": {
"rating": 3.7,
"numReviews": 59
},
"lastUpdated": ISODate("2020-01-29T00:02:53")
}
}

嵌入式文档一对多关系模型

嵌入式文档一对多关系模型 - 嵌入式文档模式

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
// patron document
{
_id: "joe",
name: "Joe Bookreader"
}

// address documents
{
patron_id: "joe", // reference to patron document
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: "12345"
}

{
patron_id: "joe",
street: "1 Some Other Street",
city: "Boston",
state: "MA",
zip: "12345"
}

合并为:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
{
"_id": "joe",
"name": "Joe Bookreader",
"addresses": [
{
"street": "123 Fake Street",
"city": "Faketon",
"state": "MA",
"zip": "12345"
},
{
"street": "1 Some Other Street",
"city": "Boston",
"state": "MA",
"zip": "12345"
}
]
}

嵌入式文档一对多关系模型 - 子集模式

考虑一个电商网站用于表示商品的 collection:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
{
"_id": 1,
"name": "Super Widget",
"description": "This is the most useful item in your toolbox.",
"price": { "value": NumberDecimal("119.99"), "currency": "USD" },
"reviews": [
{
"review_id": 786,
"review_author": "Kristina",
"review_text": "This is indeed an amazing widget.",
"published_date": ISODate("2019-02-18")
},
{
"review_id": 785,
"review_author": "Trina",
"review_text": "Nice product. Slow shipping.",
"published_date": ISODate("2019-02-17")
},
...{
"review_id": 1,
"review_author": "Hans",
"review_text": "Meh, it's okay.",
"published_date": ISODate("2017-12-06")
}
]
}

评论按时间倒序排列。 当用户访问产品页面时,应用程序将加载十条最近的评论。可以将集合分为两个集合,而不是与产品一起存储所有评论:

产品集合存储有关每个产品的信息,包括产品的十个最新评论:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
{
"_id": 1,
"name": "Super Widget",
"description": "This is the most useful item in your toolbox.",
"price": { "value": NumberDecimal("119.99"), "currency": "USD" },
"reviews": [
{
"review_id": 786,
"review_author": "Kristina",
"review_text": "This is indeed an amazing widget.",
"published_date": ISODate("2019-02-18")
}
...
{
"review_id": 776,
"review_author": "Pablo",
"review_text": "Amazing!",
"published_date": ISODate("2019-02-16")
}
]
}

review collection 存储所有的评论

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
{
"review_id": 786,
"product_id": 1,
"review_author": "Kristina",
"review_text": "This is indeed an amazing widget.",
"published_date": ISODate("2019-02-18")
}
{
"review_id": 785,
"product_id": 1,
"review_author": "Trina",
"review_text": "Nice product. Slow shipping.",
"published_date": ISODate("2019-02-17")
}
...
{
"review_id": 1,
"product_id": 1,
"review_author": "Hans",
"review_text": "Meh, it's okay.",
"published_date": ISODate("2017-12-06")
}

引用式文档一对多关系模型

考虑以下映射出版商和书籍关系的示例。

该示例说明了引用式文档的优点,以避免重复发布者信息。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
{
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher: {
name: "O'Reilly Media",
founded: 1980,
location: "CA"
}
}

{
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English",
publisher: {
name: "O'Reilly Media",
founded: 1980,
location: "CA"
}
}

为避免重复出版商数据,可以使用引用型文档,并将出版商信息与书本分开保存。 使用引用时,关系的增长决定了将引用存储在何处。 如果每个出版商的图书数量很少且增长有限,则有时将图书参考存储在出版商文档中可能会很有用。 否则,如果每个发布者的书籍数量不受限制,则此数据模型将导致可变的,不断增长的数组,如以下示例所示:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
{
name: "O'Reilly Media",
founded: 1980,
location: "CA",
books: [123456789, 234567890, ...]
}

{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}

{
_id: 234567890,
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English"
}

为了避免可变的,增长的数组,请将发行者参考存储在书籍文档中:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
{
_id: "oreilly",
name: "O'Reilly Media",
founded: 1980,
location: "CA"
}

{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly"
}

{
_id: 234567890,
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English",
publisher_id: "oreilly"
}

树形结构模型

img

具有父节点的树形结构模型

上图结构可以用父引用来表示:

1
2
3
4
5
6
7
8
db.categories.insertMany([
{ "_id": "MongoDB", "parent": "Databases" },
{ "_id": "dbm", "parent": "Databases" },
{ "_id": "Databases", "parent": "Programming" },
{ "_id": "Languages", "parent": "Programming" },
{ "_id": "Programming", "parent": "Books" },
{ "_id": "Books", "parent": null }
])
  • 检索节点的父节点:

    1
    db.categories.findOne( { _id: "MongoDB" } ).parent
  • 可以在父字段上创建索引以启用父节点的快速搜索:

    1
    db.categories.createIndex( { parent: 1 } )
  • 可以通过父字段查询找到其直接子节点:

    1
    db.categories.find( { parent: "Databases" } )
  • 检索子树,可以参考: $graphLookup.

具有子节点的树形结构模型

1
2
3
4
5
6
7
8
db.categories.insertMany([
{ "_id": "MongoDB", "children": [] },
{ "_id": "dbm", "children": [] },
{ "_id": "Databases", "children": ["MongoDB", "dbm"] },
{ "_id": "Languages", "children": [] },
{ "_id": "Programming", "children": ["Databases", "Languages"] },
{ "_id": "Books", "children": ["Programming"] }
])
  • 检索节点的 children:

    1
    db.categories.findOne( { _id: "Databases" } ).children
  • 可以在 children 字段上创建索引以启用子节点的快速搜索:

    1
    db.categories.createIndex( { children: 1 } )
  • 可以在 children 字段中查询节点,以找到其父节点及其兄弟节点:

    1
    db.categories.find( { children: "MongoDB" } )

具有祖先的树形结构模型

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
db.categories.insertMany([
{
"_id": "MongoDB",
"ancestors": ["Books", "Programming", "Databases"],
"parent": "Databases"
},
{
"_id": "dbm",
"ancestors": ["Books", "Programming", "Databases"],
"parent": "Databases"
},
{
"_id": "Databases",
"ancestors": ["Books", "Programming"],
"parent": "Programming"
},
{
"_id": "Languages",
"ancestors": ["Books", "Programming"],
"parent": "Programming"
},
{ "_id": "Programming", "ancestors": ["Books"], "parent": "Books" },
{ "_id": "Books", "ancestors": [], "parent": null }
])
  • 检索节点的祖先或路径的查询是快速而直接的:

    1
    db.categories.findOne({ "_id": "MongoDB" }).ancestors
  • 可以在 ancestors 字段上创建索引,以启用祖先节点的快速搜索:

    1
    db.categories.createIndex({ "ancestors": 1 })
  • 可以通过 ancestors 字段查询查找其所有后代:

    1
    db.categories.find({ "ancestors": "Programming" })

具有实体化路径的树形结构模型

1
2
3
4
5
6
7
8
db.categories.insertMany([
{ "_id": "Books", "path": null },
{ "_id": "Programming", "path": ",Books," },
{ "_id": "Databases", "path": ",Books,Programming," },
{ "_id": "Languages", "path": ",Books,Programming," },
{ "_id": "MongoDB", "path": ",Books,Programming,Databases," },
{ "_id": "dbm", "path": ",Books,Programming,Databases," }
])
  • 可以查询以检索整个树,并按字段路径排序:

    1
    db.categories.find().sort( { path: 1 } )
  • 可以在 path 字段上使用正则表达式来查找 Programming 的后代

    1
    db.categories.find( { path: /,Programming,/ } )
  • 可以检索 Books 的后代,其中 Books 也位于层次结构的最高级别:

    1
    db.categories.find( { path: /^,Books,/ } )
  • 要在 path 字段上创建索引,请使用以下调用:

    1
    db.categories.createIndex( { path: 1 } )

具有嵌套集的树形结构模型

img

1
2
3
4
5
6
7
8
db.categories.insertMany([
{ _id: 'Books', parent: 0, left: 1, right: 12 },
{ _id: 'Programming', parent: 'Books', left: 2, right: 11 },
{ _id: 'Languages', parent: 'Programming', left: 3, right: 4 },
{ _id: 'Databases', parent: 'Programming', left: 5, right: 10 },
{ _id: 'MongoDB', parent: 'Databases', left: 6, right: 7 },
{ _id: 'dbm', parent: 'Databases', left: 8, right: 9 }
])

可以查询以检索节点的后代:

1
2
3
4
5
var databaseCategory = db.categories.findOne({ _id: 'Databases' })
db.categories.find({
left: { $gt: databaseCategory.left },
right: { $lt: databaseCategory.right }
})

设计模式

大文档,很多列,很多索引

解决方案是:列转行

img

管理文档不同版本

MongoDB 文档格式非常灵活,势必会带来版本维护上的难度。

解决方案是:可以增加一个版本号字段

  • 快速过滤掉不需要升级的文档
  • 升级时,对不同版本的文档做不同处理

统计网页点击量

统计数据精确性要求并不是十分重要。

解决方案:用近似计算

每隔 10 次写一次:

1
{ "$inc": { "views": 1 } }

精确统计

解决方案:使用预聚合

参考资料