Structured3D Dataset
Created from Coohom Cloud

Overview


Structured3D is a large-scale photo-realistic dataset containing 3.5K house designs (a) created by professional designers, with a variety of ground-truth 3D structure annotations (b) and generated photo-realistic 2D images (c).

Data Preview

3D annotations

2D annotations

Data Distribution

[Figure: Room Type]

[Figure: Number of instances]

[Figure: Number of rooms]

Data Format

The dataset consists of rendered images and corresponding ground-truth annotations (e.g., semantic, albedo, depth, surface normal, layout) under different lighting and furniture configurations.
There is a separate subdirectory for every scene (i.e., house design), named by a unique ID. Within each scene directory, there are separate directories for the different types of data, as follows:

scene_<sceneID>
├── 2D_rendering
│   └── <roomID>
│       ├── panorama
│       │   ├── <empty/simple/full>
│       │   │   ├── rgb_<cold/raw/warm>light.png
│       │   │   ├── semantic.png
│       │   │   ├── instance.png
│       │   │   ├── albedo.png
│       │   │   ├── depth.png
│       │   │   └── normal.png
│       │   ├── layout.txt
│       │   └── camera_xyz.txt
│       └── perspective
│           └── <empty/full>
│               └── <positionID>
│                   ├── rgb_rawlight.png
│                   ├── semantic.png
│                   ├── instance.png
│                   ├── albedo.png
│                   ├── depth.png
│                   ├── normal.png
│                   ├── layout.json
│                   └── camera_pose.txt
├── bbox_3d.json
└── annotation_3d.json
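Given this layout, collecting image paths is a simple directory walk. A minimal sketch in Python, assuming `dataset_root` points at the extracted dataset (the helper name and defaults are illustrative):

```python
import os

def list_panorama_images(dataset_root, lighting="rawlight", config="full"):
    """Collect panorama RGB image paths following the directory layout above.

    `dataset_root` is a hypothetical path to the extracted dataset; `lighting`
    is one of cold/raw/warm and `config` one of empty/simple/full.
    """
    paths = []
    for scene in sorted(os.listdir(dataset_root)):
        if not scene.startswith("scene_"):
            continue
        rendering = os.path.join(dataset_root, scene, "2D_rendering")
        if not os.path.isdir(rendering):
            continue
        for room in sorted(os.listdir(rendering)):
            img = os.path.join(rendering, room, "panorama", config,
                               "rgb_%slight.png" % lighting.replace("light", ""))
            if os.path.exists(img):
                paths.append(img)
    return paths
```

The same walk extends to the `perspective` subtree by iterating over `<positionID>` directories instead of the single panorama directory.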

Data Annotation

We provide primitive- and relationship-based structure annotations for each scene, and an oriented bounding box for each object instance.
Structure annotation (annotation_3d.json): see all the room types here.

{
  // PRIMITIVES
  "junctions": [
    {
      "ID"              : int,
      "coordinate"      : List[float]       // 3D vector
    }
  ],
  "lines": [
    {
      "ID"              : int,
      "point"           : List[float],      // 3D vector
      "direction"       : List[float]       // 3D vector
    }
  ],
  "planes": [
    {
      "ID"              : int,
      "type"            : str,              // ceiling, floor, wall
      "normal"          : List[float],      // 3D vector; the normal points toward the empty space
      "offset"          : float
    }
  ],
  // RELATIONSHIPS
  "semantics": [
    {
      "ID"              : int,
      "type"            : str,              // room type, door, window
      "planeID"         : List[int]         // indices of the planes
    }
  ],
  "planeLineMatrix"     : Matrix[int],      // matrix W_1 where the ij-th entry is 1 iff l_i is on p_j
  "lineJunctionMatrix"  : Matrix[int],      // matrix W_2 where the mn-th entry is 1 iff x_m is on l_n
  // OTHERS
  "cuboids": [
    {
      "ID"              : int,
      "planeID"         : List[int]         // indices of the planes
    }
  ],
  "manhattan": [
    {
      "ID"              : int,
      "planeID"         : List[int]         // indices of the planes
    }
  ]
}
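A small sketch of querying this structure in Python, using a synthetic annotation dict that mirrors the schema (the incidence-matrix orientation below follows the comment above, rows indexed by line and columns by plane; verify this against real data before relying on it):

```python
import json

def load_annotation(path):
    """Load annotation_3d.json for one scene."""
    with open(path) as f:
        return json.load(f)

def lines_on_plane(annotation, plane_id):
    """Return the IDs of lines incident to plane `plane_id` via planeLineMatrix.

    Assumes the ij-th entry is 1 iff line l_i lies on plane p_j, as stated
    above; swap the indexing if the real matrices are transposed.
    """
    matrix = annotation["planeLineMatrix"]
    return [i for i, row in enumerate(matrix) if row[plane_id] == 1]

# Synthetic example mirroring the schema (not real data):
annotation = {"planeLineMatrix": [[1, 0], [1, 1], [0, 1]]}
```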

Bounding box (bbox_3d.json): the oriented bounding box annotations in world coordinates, same as in the SUN RGB-D dataset.

[
  {
    "ID"        : int,              // instance id
    "basis"     : Matrix[float],    // basis of the bounding box, one row per basis vector
    "coeffs"    : List[float],      // radii in each dimension
    "centroid"  : List[float]       // 3D centroid of the bounding box
  }
]
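The eight corners of a box can be recovered from these fields. A minimal sketch in plain Python, assuming one row of `basis` per box axis as described above:

```python
def bbox_corners(box):
    """Compute the 8 corners of an oriented bounding box annotation.

    Each corner is centroid + sum_i s_i * coeffs[i] * basis[i] with
    s_i in {-1, +1}, one row of `basis` per box axis.
    """
    basis = box["basis"]
    coeffs = box["coeffs"]
    corners = []
    for sx in (-1, 1):
        for sy in (-1, 1):
            for sz in (-1, 1):
                signs = (sx, sy, sz)
                corner = list(box["centroid"])
                for axis in range(3):
                    for dim in range(3):
                        corner[dim] += signs[axis] * coeffs[axis] * basis[axis][dim]
                corners.append(corner)
    return corners
```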

For each image, we provide semantic, instance, albedo, depth, normal, and layout annotations, plus the camera position. Please note that the layout and camera annotation formats differ between panoramic and perspective images.
Semantic annotation (semantic.png): unsigned 8-bit integers within a PNG. We use the NYUv2 40-label set; see all the label IDs here.
Instance annotation (instance.png): unsigned 16-bit integers within a PNG. We only provide instance annotations for the full configuration. The maximum value (65535) denotes the background.
Albedo data (albedo.png): unsigned 8-bit integers within a PNG.
Depth data (depth.png): unsigned 16-bit integers within a PNG. The units are millimeters; a value of 1000 is a meter. A zero value denotes no reading.
Normal data (normal.png): unsigned 8-bit integers within a PNG (x, y, z), where the integer values in the file are 128 * (1 + n), and n is a normal coordinate in the range [-1, 1].
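The depth and normal encodings above can be inverted per value. A minimal sketch on raw integers (in practice you would first read the PNGs with an image library and apply this elementwise):

```python
def decode_depth(depth_mm):
    """Convert raw uint16 depth values (millimeters) to meters.

    A zero value denotes no reading and is mapped to None here.
    """
    return [v / 1000.0 if v != 0 else None for v in depth_mm]

def decode_normal(channels):
    """Map stored 8-bit normal channel values back to [-1, 1].

    Inverts the stated encoding value = 128 * (1 + n), i.e. n = value / 128 - 1.
    """
    return [c / 128.0 - 1.0 for c in channels]
```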
Layout annotation for panorama (layout.txt): an ordered list of 2D positions of the junctions (same as LayoutNet and HorizonNet). The order of the junctions is shown in the figure below. In our dataset, the cameras of the panoramas are aligned with the gravity direction, so a pair of ceiling-wall and floor-wall junctions shares the same x-axis coordinate.

[Figure: ordering of the layout junctions]
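A sketch of parsing the panorama layout, assuming one whitespace-separated `x y` pair per line, in junction order (an assumption about the exact file format; verify against the actual files):

```python
def read_panorama_layout(text):
    """Parse layout.txt content into an ordered list of (x, y) junction positions.

    Assumes one whitespace-separated "x y" pair per line.
    """
    junctions = []
    for line in text.strip().splitlines():
        x, y = line.split()[:2]
        junctions.append((float(x), float(y)))
    return junctions
```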

Layout annotation for perspective (layout.json): we also include the junctions formed by line segments intersecting each other or the image boundary. We consider the visible and invisible parts caused by the room structure only, not by furniture.

{
  "junctions":[
    {
      "ID"            : int,              // corresponding 3D junction id, none corresponds to fake 3D junction
      "coordinate"    : List[int],        // 2D location in the camera coordinate
      "isvisible"     : bool              // this junction is whether occluded by the other walls
    }
  ],
  "planes": [
    {
      "ID"            : int,              // corresponding 3D plane id
      "visible_mask"  : List[List[int]],  // visible segmentation mask, list of junctions ids
      "amodal_mask"   : List[List[int]],  // amodal segmentation mask, list of junctions ids
      "normal"        : List[float],      // normal in the camera coordinate
      "offset"        : float,            // offset in the camera coordinate
      "type"          : str               // ceiling, floor, wall
    }
  ]
}
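Resolving a plane's `visible_mask` to 2D coordinates is a matter of indexing into `junctions`. A sketch, assuming the mask entries index the `junctions` list directly (an assumption; the stored IDs may instead refer to the 3D annotation, so verify against the actual files):

```python
def plane_polygons(layout):
    """Resolve each plane's visible_mask to lists of 2D junction coordinates.

    Assumes each visible_mask entry is an index into the top-level
    `junctions` list of the parsed layout.json dict.
    """
    junctions = layout["junctions"]
    return {
        plane["ID"]: [[junctions[j]["coordinate"] for j in ring]
                      for ring in plane["visible_mask"]]
        for plane in layout["planes"]
    }
```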

Camera location for panorama (camera_xyz.txt): for each panoramic image, we only store the camera location in global coordinates. The direction of the camera is always along the negative y-axis. The global coordinate system is arbitrary, but the z-axis generally points upward.
Camera location for perspective (camera_pose.txt): for each perspective image, we store the camera location and pose in global coordinates.

vx vy vz tx ty tz ux uy uz xfov yfov 1

where (vx, vy, vz) is the eye viewpoint of the camera, (tx, ty, tz) is the view direction, (ux, uy, uz) is the up direction, and xfov and yfov are the half-angles of the horizontal and vertical fields of view of the camera in radians (the angle from the central ray to the leftmost/bottommost ray in the field of view), same as the Matterport3D Dataset.
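Parsing one such line is straightforward; the field names below are illustrative, and `focal_length` shows how a half-angle FoV maps to a focal length in pixels:

```python
import math

def parse_camera_pose(line):
    """Parse one camera_pose.txt line into named fields.

    The field order follows the description above; the trailing value
    (always 1 here) is kept as `scale`.
    """
    values = [float(v) for v in line.split()]
    return {
        "eye": values[0:3],    # (vx, vy, vz) camera position
        "view": values[3:6],   # (tx, ty, tz) view direction
        "up": values[6:9],     # (ux, uy, uz) up direction
        "xfov": values[9],     # half-angle horizontal FoV, radians
        "yfov": values[10],    # half-angle vertical FoV, radians
        "scale": values[11],
    }

def focal_length(half_fov, image_size):
    """Focal length in pixels from a half-angle FoV and image width/height."""
    return (image_size / 2.0) / math.tan(half_fov)
```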

Instructions

Basic code for viewing the structure annotations of the Structured3D dataset.

Citation

@inproceedings{Structured3D,
  title     = {Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling},
  author    = {Jia Zheng and Junfei Zhang and Jing Li and Rui Tang and Shenghua Gao and Zihan Zhou},
  booktitle = {Proceedings of The European Conference on Computer Vision (ECCV)},
  year      = {2020}
}

License

Custom

🎉 Thanks to Coohom Cloud for the contribution
Dataset Information

Application scenarios: None
Annotation types: Semantic Segmentation 2D, Instance Segmentation 2D, Room Layout Estimation, Box3D, Depth
Task type: Indoor Scene Understanding
License: Custom
Updated: 2021-03-24 22:50:35

Data Summary

Data format: Image
Data count: 196k
Annotated count: 0
File size: 617MB
Copyright holder: Coohom
Annotator: Unknown