Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KYLIN-5371 fix segment prune #2049

Open
wants to merge 1 commit into
base: kylin4
Choose a base branch
from

Conversation

liuzhao-lz
Copy link

@liuzhao-lz liuzhao-lz commented Dec 19, 2022

Proposed changes

创建model时在partition部分指定了天分区和时分区,格式分别为:yyyy-MM-dd、HH,在根据天分区值查询时结果非预期。原因是这种场景下segment裁剪有bug。

--不加天分区过滤得到的结果(18号3条数据,19号5条数据)
1e9077f923e57f65e3712b75d639a31

--加天分区在修复前查询结果
--18号实际有3条数据
6bbf3bf08dea6b5a269d57c65b4a1b7
--19号实际有5条数据
e4046323d0a7a8521dae904c5030f26
--18号19号共有8条数据
6858ba031d6d44b050edc76a6e96251

--加天分区在修复后查询结果
7f372221624b83ff4ad29023c86d373
38abe82bacc6c48a8b2ee8d1b31a180
559a5f4b672c8d2521cb7f68fd88f72

Branch to commit

  • Branch kylin3 for v2.x to v3.x
  • Branch kylin4 for v4.x
  • Branch kylin5 for v5.x

Types of changes

What types of changes does your code introduce to Kylin?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have created an issue on Kylin's jira, and have described the bug/feature there in detail
  • Commit messages in my PR start with the related jira ID, like "KYLIN-0000 Make Kylin project open-source"
  • Compiling and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged

Further comments

If this is a relatively large or complex change, kick off the discussion at user@kylin.apache.org or dev@kylin.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

try {
if (StringUtils.isNotBlank(pattern)) {
SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.getDefault(Locale.Category.FORMAT));
sdf.setTimeZone(TimeZone.getTimeZone("GMT"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hard code for timezone, please reference to org.apache.kylin.common.KylinConfigBase#getTimeZone

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b834587586ad4ff0235117f3887cbb5

SegFilters(tsRange.startValue, tsRange.endValue, pattern)
// tsRange: 20221219000000_20221219010000、20221219010000_20221219020000, pattern: yyyy-MM-dd
val start = DateFormat.getFormatTimeStamp(tsRange.startValue, pattern)
SegFilters(start, tsRange.endValue, pattern)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about tsRange.endValue? is it same as start?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tsRange.endValue 不需要考虑,DateFormat.getFormatTimeStamp(tsRange.startValue, pattern) 这个的作用是将 startValue 格式化成天的时间戳值(忽略小时),原因是 SegFilters 的 foldFilter 中是按天格式化where 分区字段值的(记作ts),后续比较也是 “ts >= start && ts < end” end 如果是到时间其对应的时间戳一定是大于天的时间戳值因而可以不用处理。

2e8dbc69f57fa37e8734d713d2317f2
3b63ed8ab495cedb6de7628b9c27d0e
f2fa18fd622f6a54a4eeaa3f8501557

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for clearance, please comment on the place of tsRange.endValue then help others to understand.

@liuzhao-lz
Copy link
Author

image

@Mukvin
Copy link
Contributor

Mukvin commented Dec 28, 2022

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants