ORC 1.8.8 Released

The ORC team is excited to announce the release of ORC v1.8.8.

The bug fixes:

  • ORC-1696: Fix ClassCastException when reading avro decimal type in benchmark
  • ORC-1738: [C++] Wrong Int128 maximum value
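
For context on the Int128 entry above, here is a minimal sketch of what the maximum of a signed 128-bit integer should be when it is stored as a signed high word and an unsigned low word. The release note does not say what the incorrect value was, and the names below (combine, HIGH_MAX, LOW_MAX) are illustrative only; they are not taken from the ORC C++ sources.

```python
# Illustrative only: a signed 128-bit integer modelled as two 64-bit words.
HIGH_MAX = 0x7FFFFFFFFFFFFFFF  # high word: sign bit clear, all other bits set
LOW_MAX = 0xFFFFFFFFFFFFFFFF   # low word: all 64 bits set


def combine(high: int, low: int) -> int:
    """Combine a signed high word and an unsigned low word into one integer."""
    return (high << 64) | low


INT128_MAX = combine(HIGH_MAX, LOW_MAX)  # equals 2**127 - 1
INT128_MIN = combine(-(2 ** 63), 0)      # equals -(2**127)

if __name__ == "__main__":
    print(INT128_MAX)
    print(INT128_MIN)
```

The maximum is 2^127 - 1, i.e. every bit set except the sign bit of the high word.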

The tasks:

  • ORC-1540 Remove MacOS 11 from GitHub Action CI

ORC 1.7.11 Released

The ORC team is excited to announce the release of ORC v1.7.11.

The bug fixes:

  • ORC-1602 [C++] limit compression block size
  • ORC-1738 [C++] Fix wrong Int128 maximum value

The test changes:

  • ORC-1540 Remove MacOS 11 from GitHub Action CI and docs
  • ORC-1556 Add Rocky Linux 9 Docker Test
  • ORC-1557 Add GitHub Action CI for Docker Test
  • ORC-1561 Remove Java11 and clang variants from docker/os-list.txt in branch-1.7
  • ORC-1578 Fix SparkBenchmark on sales data according to SPARK-40918
  • ORC-1696 Fix ClassCastException when reading avro decimal type in benchmark

ORC 2.0.2 Released

The ORC team is excited to announce the release of ORC v2.0.2.

The improvements (tools):

  • ORC-1724 JsonFileDump utility should print user metadata
  • ORC-1740 Avoid the dump tool repeatedly parsing ColumnStatistics
  • ORC-1742 Support printing the id, name, and type of each column in dump tool

The bug fixes:

  • ORC-1732 [C++] Fix detecting Homebrew-installed Protobuf on MacOS
  • ORC-1733 [C++][CMake] Fix CMAKE_MODULE_PATH not to use PROJECT_SOURCE_DIR
  • ORC-1738 [C++] Fix wrong Int128 maximum value
  • ORC-1741 Respect decimal reader isRepeating flag
  • ORC-1749 Fix supportVectoredIO for hadoop version string with optional patch labels
  • ORC-1751 [C++] Fix syntax error in ThirdpartyToolchain
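
As a rough illustration of the kind of problem the supportVectoredIO entry above describes, the sketch below parses a version string that may carry a trailing label and compares it against a minimum version. It is not ORC's implementation: the helper names, the regular expression, and the minimum version used in the usage lines are all assumptions made for this example.

```python
import re
from typing import Tuple

# Hypothetical helper: read major.minor[.patch] from the start of the string
# and ignore anything after it, such as "-SNAPSHOT" or a vendor label.
_VERSION = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch); a missing patch component counts as 0."""
    match = _VERSION.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def at_least(version: str, minimum: Tuple[int, int, int]) -> bool:
    """True if the parsed version is greater than or equal to the minimum."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    # The minimum (3, 3, 5) is an assumed threshold for illustration only.
    for v in ["3.3.4", "3.3.6", "3.4.0-SNAPSHOT", "3.3.5.1-label"]:
        print(v, at_least(v, (3, 3, 5)))
```

Comparing integer tuples rather than raw strings avoids both lexicographic surprises (for example "3.10" sorting before "3.9") and parse failures on labelled versions.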

The test changes:

  • ORC-1694 Upgrade gson to 2.9.0 for Benchmarks Hive
  • ORC-1697 Fix IllegalArgumentException when reading json timestamp type in benchmark
  • ORC-1700 Write parquet decimal type data in Benchmark using FIXED_LEN_BYTE_ARRAY type
  • ORC-1743 Upgrade Spark to 4.0.0-preview1
  • ORC-1744 Add ubuntu-24.04 to GitHub Action
  • ORC-1746 Bump netty-all to 4.1.110.Final in bench module
  • ORC-1752 Fix NumberFormatException when reading json timestamp type in benchmark
  • ORC-1753 Use Avro 1.12.0 in bench module

ORC 1.9.4 Released

The ORC team is excited to announce the release of ORC v1.9.4.

The bug fixes:

  • ORC-1696 Fix ClassCastException when reading avro decimal type in benchmark
  • ORC-1721 Upgrade aircompressor to 0.27
  • ORC-1738 Wrong Int128 maximum value

The test changes:

  • ORC-1619 Add MacOS 14 to GitHub Action
  • ORC-1699 Fix SparkBenchmark in Parquet format according to SPARK-40918

The task changes:

  • ORC-1540 Remove MacOS 11 from GitHub Action CI

ORC 2.0.1 Released

The ORC team is excited to announce the release of ORC v2.0.1.

The improvements (tools):

  • ORC-1644 Add merge tool to merge multiple ORC files into a single ORC file
  • ORC-1647 Tips for supporting ORC in the convert command
  • ORC-1667 Add check tool to check the index of the specified column

The bug fixes:

  • ORC-1646 Close the reader when reading the schema with the convert command
  • ORC-1654 [C++] Count up EvaluatedRowGroupCount correctly
  • ORC-1684 [C++] Find tzdb without TZDIR when in conda-environments
  • ORC-1688 [C++] Do not access TZDB if there is no timestamp type
  • ORC-1696 Fix ClassCastException when reading avro decimal type in benchmark

The tasks:

  • ORC-1649 [C++][Conan] Add 2.0.0 to conan recipe and update release guide
  • ORC-1669 [C++] Deprecate HDFS support
  • ORC-1686 [C++] Avoid using std::filesystem

The test changes:

  • ORC-1648 Add test to convert ORC in the convert command
  • ORC-1663 [C++] Enable TestTimezone.testMissingTZDB on Windows
  • ORC-1672 Remove test packages o.a.o.tools.check
  • ORC-1673 Remove test packages o.a.o.tools.[count|merge|sizes]
  • ORC-1676 Use Hive 4.0.0 in benchmark
  • ORC-1681 Remove redundant import statement in tests to fix checkstyle failures
  • ORC-1699 Fix SparkBenchmark in Parquet format according to SPARK-40918
  • ORC-1704 Migrate SparkBenchmark to Scala 2.13 of Apache Spark 3.5.1
  • ORC-1707 Fix sun.util.calendar IllegalAccessException when SparkBenchmark runs on JDK17
  • ORC-1708 Support data/compress options in Hive benchmark

The documentation changes:

  • ORC-1668 Add merge command to Java tools documentation

ORC 1.8.7 Released

The ORC team is excited to announce the release of ORC v1.8.7.

The bug fixes:

  • ORC-1528: Fix readBytes potential overflow in RecordReaderUtils.ChunkReader#create
  • ORC-1602: [C++] limit compression block size

The test changes:

  • ORC-1556 Add Rocky Linux 9 Docker Test
  • ORC-1557 Add GitHub Action CI for Docker Test
  • ORC-1560 Remove Java11 and clang variants from docker/os-list.txt in branch-1.8
  • ORC-1562 Bump guava to 33.0.0-jre
  • ORC-1578 Fix SparkBenchmark on sales data according to SPARK-40918
  • ORC-1621 Switch to oraclelinux9 from rocky9

The documentation changes:

  • ORC-1536 Remove hive-storage-api link from maven-javadoc-plugin
  • ORC-1563 Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs

ORC 1.9.3 Released

The ORC team is excited to announce the release of ORC v1.9.3.

The bug fixes:

  • ORC-634 Fix the json output for double NaN and infinite
  • ORC-1553 Reading information from Row group, where there are 0 records of SArg column
  • ORC-1563 Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs
  • ORC-1578 Fix SparkBenchmark according to SPARK-40918
  • ORC-1586 Fix IllegalAccessError when SparkBenchmark runs on JDK17
  • ORC-1602 [C++] limit compression block size
  • ORC-1607 Fix testDoubleNaNAndInfinite to use TestFileDump.checkOutput
  • ORC-1609 Fix the compilation problem of TestJsonFileDump in branch 1.9

The test changes:

  • ORC-1556 Add Rocky Linux 9 Docker Test
  • ORC-1557 Add GitHub Action CI for Docker Test
  • ORC-1559 Remove Java11 and clang variants from docker/os-list.txt in branch-1.9

The tasks:

  • ORC-1532 Upgrade opencsv to 5.9
  • ORC-1536 Remove hive-storage-api link from maven-javadoc-plugin
  • ORC-1576 Upgrade spark.jackson.version to 2.15.2 in bench module
  • ORC-1591 Lower log level from INFO to DEBUG in *ReaderImpl/WriterImpl/PhysicalFsWriter
  • ORC-1592 Suppress KeyProvider missing log
  • ORC-1616 Upgrade aircompressor to 0.26
  • ORC-1618 Disable building tests for snappy

Documentation:

  • ORC-1535 Remove generated Java docs from source tree

ORC 2.0.0 Released

The ORC team is excited to announce the release of ORC v2.0.0.

New Feature and Notable Changes:

  • ORC-998: Refactor compression output buffer within OutStream for better portability
  • ORC-1088: Support ZSTD_JNI and column compress to set compression level
  • ORC-1100: Support vcpkg
  • ORC-1251: Use Hadoop Vectored IO
  • ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal
  • ORC-1440: Check for protobuf config based module
  • ORC-1463: Support brotli codec
  • ORC-1507: Use Zulu JDK distribution and switch from 21-ea to 21
  • ORC-1512: Drop Java 8/11 and make Java 17 by default
  • ORC-1531: Create orc-format module and repo
  • ORC-1545: Use orc-format 1.0.0-SNAPSHOT
  • ORC-1546: Use orc-format 1.0.0-alpha
  • ORC-1547: Spin-off ORC Format
  • ORC-1551: Use orc-format 1.0.0-beta
  • ORC-1572: Use Apache ORC Format 1.0.0
  • ORC-1585: [C++] Add orc-format_ep as a dependency of orc

Improvements:

  • ORC-1459: Mark DataBuffer::size() and DataBuffer::capacity() as const
  • ORC-1460: specification: Clarify how dictionary entries are sorted
  • ORC-1461: Mark Int128::getHighBits() and Int128::getLowBits() as const
  • ORC-1472: Replace deprecated method in TestMurmur3.java
  • ORC-1479: Enhance example usage message to use Uber jar
  • ORC-1481: [C++] Better error message when TZDB is unavailable
  • ORC-1504: Add lower bound check in get API for DynamicIntArray
  • ORC-1506: Replacing deprecated valueOf() with recommended forNumber()
  • ORC-1509: Auto grant contributor role to first-time contributors
  • ORC-1520: Remove JDK 8 settings from pom
  • ORC-1567: Add the -ignoreExtension configuration to the sizes and count commands of orc-tools
  • ORC-1570: Add supportVectoredIO API to HadoopShimsCurrent and use it
  • ORC-1571: Supports displaying raw data size in the meta command of orc-tools
  • ORC-1577: Use ZSTD as the default compression
  • ORC-1580: Change default DataBuffer constructor to use reserve instead of resize
  • ORC-1595: Add a short-cut to skip tiny inputs for ZstdCodec.compress
  • ORC-1596: Remove redundant Zstd.isError JNI usage
  • ORC-1597: Set bloom filter fpp to 1%
  • ORC-1600: Reduce getStaticMemoryManager sync block in OrcFile
  • ORC-1601: Reduce get HadoopShims sync block in HadoopShimsFactory
  • ORC-1610: Reduce the number of hash computation in CuckooSetBytes
  • ORC-1613: Zstd decompression supports direct buffer
  • ORC-1631: Supports summary output in sizes command
  • ORC-1637: [C++] Port conan recipe from upstream conan center
  • ORC-1638: Avoid System.exit(0) in count command
  • ORC-1639: [C++] Reduce unnecessary compiler flags in CMake
  • ORC-1641: Remove sourceFileExcludes from maven-javadoc-plugin
  • ORC-1642: Avoid System.exit(0) in scan command
  • ORC-1593: Set orc.compression.zstd.level to 3 by default
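
To give a sense of what a 1% bloom filter false-positive probability (ORC-1597 above) means for sizing, the sketch below applies the standard textbook bloom filter formulas. It is a general estimate, not ORC's sizing code, and the 0.05 comparison value is chosen here only for contrast.

```python
import math


def bloom_bits_per_entry(fpp: float) -> float:
    """Textbook optimum: bits per entry = -ln(p) / (ln 2)^2."""
    return -math.log(fpp) / (math.log(2) ** 2)


def optimal_hash_count(fpp: float) -> int:
    """Textbook optimum: number of hash functions = -log2(p), rounded."""
    return max(1, round(-math.log2(fpp)))


if __name__ == "__main__":
    for p in (0.05, 0.01):
        bits = bloom_bits_per_entry(p)
        print(f"fpp={p}: {bits:.2f} bits/entry, {optimal_hash_count(p)} hashes")
```

Under these formulas a lower false-positive probability costs more bits per entry, so a tighter setting trades a larger filter for fewer unnecessary reads.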

Bug Fixes:

  • ORC-634: Fix the json output for double NaN and infinite
  • ORC-1455: [C++] Fix build failure on non-x86 with unused macro in CpuInfoUtil.cc
  • ORC-1473: Zero-copy zeroCopyReadRanges and releaseBuffer bugs
  • ORC-1476: Maven build fail with unsupported platform: protoc-3.17.3-osx-aarch_64.exe
  • ORC-1480: [C++] Build failed when the BUILD_CPP_ENABLE_METRICS is ON
  • ORC-1500: [C++] The partition field does not support English special characters
  • ORC-1528: When using the orc.min.disk.seek.size configuration to read extremely large ORC files, a java.nio.BufferOverflowException may occur.
  • ORC-1553: Reading information from Row group, where there are 0 records of SArg column
  • ORC-1563: Fix orc.bloom.filter.fpp default value and orc.compress notes of Spark and Hive config docs
  • ORC-1568: Use readDiskRanges if orc.use.zerocopy is enabled
  • ORC-1575: Use ASF Archive URL instead of Download URL
  • ORC-1578: Fix SparkBenchmark according to SPARK-40918
  • ORC-1588: Fix incorrect Decimal assert in LeafFilterFactory
  • ORC-1602: [C++] limit compression block size

Tasks:

  • ORC-1422: Setting version to 2.0.0-SNAPSHOT
  • ORC-1434: Remove org.apache.hadoop from dependabot.yml
  • ORC-1484: Use JIRA_ACCESS_TOKEN in merge_orc_pr.py
  • ORC-1485: Enable checkstyle checks for test classes
  • ORC-1486: Fix checkstyle violations for tests in orc-core module
  • ORC-1492: Fix checkstyle violations for tests in mapreduce, tools, bench modules
  • ORC-1496: Use iterator to suggest backporting branches
  • ORC-1515: Skip publishing orc-example module
  • ORC-1516: Fix minor typo in comments in IOUtils
  • ORC-1518: Remove findbugs folders
  • ORC-1529: Fix minor typos in pom.xml
  • ORC-1530: Rename variables in RecordReaderUtils.ChunkReader#create
  • ORC-1535: Remove generated Java docs from source tree
  • ORC-1536: Remove hive-storage-api link from maven-javadoc-plugin
  • ORC-1540: Remove MacOS 11 from GitHub Action CI
  • ORC-1542: Use Pattern Matching for instanceof (JEP-394)
  • ORC-1549: Update libhdfspp.tar.gz by adding #include <cstdint>
  • ORC-1569: Remove HadoopShimsPre2_3, HadoopShimsPre2_6, HadoopShimsPre2_7 classes
  • ORC-1579: Add ASF Generative Tooling Guidance to PR template
  • ORC-1591: Lower log level from INFO to DEBUG in *ReaderImpl/WriterImpl/PhysicalFsWriter
  • ORC-1592: Suppress KeyProvider missing log
  • ORC-1598: Close reader in orc-examples
  • ORC-1604: Deprecate non-utf8 bloom filter for Java writer

Tests:

  • ORC-1003: Recover java-examples-test
  • ORC-1409: Add stream order description in ORC spec.
  • ORC-1432: Add MacOS 13 GitHub Action Job
  • ORC-1474: Replaced deprecated getMinimum/Maximum in TestColumnStatistics
  • ORC-1475: [C++] ConvertColumnReader.TestConvertNumericToStringVariant fails when compiled with unsigned char
  • ORC-1477: Remove unused imports from Test classes
  • ORC-1478: Add Unit Test for org.apache.orc.impl.DynamicIntArray
  • ORC-1510: Fix package for TestOrcUtils and add more test cases
  • ORC-1541: Add Ubuntu 24.04 LTS Docker Test
  • ORC-1555: Simplify fedora37 docker image
  • ORC-1556: Add Rocky Linux 9 Docker Test
  • ORC-1557: Add GitHub Action CI for Docker Test
  • ORC-1558: Remove ubuntu22_jdk=21 and ubuntu22_jdk=21_cc=clang test combinations from docker/os-list.txt
  • ORC-1574: Update GitHub Action YAML files in branch-2.0
  • ORC-1586: Fix IllegalAccessError when SparkBenchmark runs on JDK17
  • ORC-1607: Fix testDoubleNaNAndInfinite to use TestFileDump.checkOutput
  • ORC-1614: Set ByteBuffer limit in TestBrotli test
  • ORC-1618: Disable building tests for snappy
  • ORC-1619: Add MacOS 14 to GitHub Action
  • ORC-1620: Add Apple Silicon Test Coverage
  • ORC-1621: Switch to oraclelinux9 from rocky9
  • ORC-1623: Use directOut.put(out) instead of directOut.put(out.array()) in TestZstd test
  • ORC-1630: Test using VectoredIO of hadoop to read ORC
  • ORC-1632: Add test for count command
  • ORC-1633: Add test for sizes command
  • ORC-1643: Add test for scan command

Build and dependency changes:

  • ORC-870: Unpin and upgrade jmh to 1.37
  • ORC-1423: Bump build-helper-maven-plugin to 3.4.0
  • ORC-1424: Bump maven-assembly-plugin to 3.6.0
  • ORC-1425: Bump checkstyle to 10.11.0
  • ORC-1427: Use Hadoop 3.3.5 in tools module
  • ORC-1429: Upgrade Maven to 3.8.8
  • ORC-1430: Use Hadoop 3.3.5 shaded clients
  • ORC-1431: Use parquet 1.13.1 in bench module
  • ORC-1437: Bump checkstyle to 10.12.0
  • ORC-1438: Bump auto-service to 1.1.0
  • ORC-1439: Bump guava to 32.0.0-jre
  • ORC-1442: Update guava to 32.0.1
  • ORC-1445: Bump snappy-java to 1.1.10.1 in bench module
  • ORC-1448: Bump auto-service to 1.1.1
  • ORC-1456: Update Hadoop to 3.3.6
  • ORC-1466: Bump junit to 5.10.0
  • ORC-1467: Upgrade commons-lang3 to 3.13.0
  • ORC-1468: Bump opencsv to 5.8
  • ORC-1469: Update guava to 32.1.2
  • ORC-1470: Update maven-shade-plugin to 3.5.0
  • ORC-1493: Bump byte-buddy to 1.14.6
  • ORC-1502: Upgrade Maven to 3.9.4
  • ORC-1508: Upgrade slf4j to 2.0.9
  • ORC-1513: Upgrade snappy to 1.1.10.4
  • ORC-1514: Remove zookeeper runtime dependency
  • ORC-1517: Bump snappy-java to 1.1.10.5 in bench module
  • ORC-1521: Bump com.google.guava:guava to 32.1.3-jre
  • ORC-1522: Bump commons-cli:commons-cli to 1.6.0
  • ORC-1523: Bump maven-checkstyle-plugin to 3.3.1
  • ORC-1524: Bump maven-shade-plugin to 3.5.1
  • ORC-1526: Bump spotbugs-maven-plugin to 4.8.1.0
  • ORC-1527: Bump junit to 5.10.1
  • ORC-1533: Upgrade commons-lang3 to 3.14.0
  • ORC-1534: Upgrade build-helper-maven-plugin to 3.5.0
  • ORC-1537: Unpin and upgrade spotless to 2.41.0
  • ORC-1538: Unpin and upgrade maven-dependency-plugin to 3.6.1
  • ORC-1543: Bump spotless-maven-plugin to 2.41.1
  • ORC-1544: Unpin and upgrade protobuf-java to 3.25.1
  • ORC-1550: Upgrade Maven to 3.9.6
  • ORC-1562: Bump com.google.guava:guava to 33.0.0-jre
  • ORC-1565: Bump slf4j.version to 2.0.10
  • ORC-1566: Make Brotli dependency as optional
  • ORC-1576: Upgrade spark.jackson.version to 2.15.2 in bench module
  • ORC-1581: Bump slf4j.version to 2.0.11
  • ORC-1582: Bump protobuf-java to 3.25.2
  • ORC-1605: Upgrade brotli4j to 1.16.0
  • ORC-1616: Upgrade aircompressor to 0.26
  • ORC-1624: Upgrade Spark to 3.5.1
  • ORC-1626: Upgrade Mockito to 5.10 and byte-buddy to 1.14.11
  • ORC-1627: Unpin scala-library
  • ORC-1628: Bump protobuf-java to 3.25.3

Documentation:

  • ORC-994: Fix javadoc so that it doesn’t put files into the source tree
  • ORC-1471: Updated README.md to use maven 3.8.8
  • ORC-1491: Update Python documentation with PyArrow 13.0.0 and Dask 2023.8.1
  • ORC-1503: Update README.md to use maven 3.9.4
  • ORC-1552: Update README.md to use maven 3.9.6
  • ORC-1564: Add Java ORC configuration documentation
  • ORC-1584: Remove README about Proto subdirectory
  • ORC-1587: Fix usage command of SparkBenchmark document
  • ORC-1599: Add zstd compression level and windowlog in Java configuration documentation
  • ORC-1612: Document available encodings at orc.compress
  • ORC-1625: Switch to oraclelinux9 from rocky9 in README

Deshan Xiao added as committer

The ORC PMC is happy to add Deshan Xiao as an ORC committer for the work on ORC Java Brotli codec and vcpkg C++ library.

Thank you for your work on ORC, Deshan!

ORC 1.9.2 Released

The ORC team is excited to announce the release of ORC v1.9.2.

The bug fixes:

  • ORC-1475 [C++] Fix the failure of UT when char is unsigned
  • ORC-1480 [C++] Fix build break w/ BUILD_CPP_ENABLE_METRICS=ON
  • ORC-1482 Adaptation to read ORC files created by CUDF
  • ORC-1489 Assign a writer id to CUDF
  • ORC-1525 Fix bad read in RleDecoderV2::readByte

The test changes:

  • ORC-1431 Use parquet 1.13.1 in bench module
  • ORC-1454 Update Spark to 3.4.1
  • ORC-1487 Enable checkstyle on src/test with checkstyle-suppressions.xml
  • ORC-1498 Add Debian 12 Docker test
  • ORC-1502 Upgrade Maven to 3.9.4
  • ORC-1505 Upgrade Spark to 3.5.0
  • ORC-1511 Bump Avro to 1.11.3 in bench module
  • ORC-1513 Upgrade snappy-java to 1.1.10.4 in bench module
  • ORC-1517 Bump snappy-java to 1.1.10.5 in bench module

The tasks:

  • ORC-1497 Bump maven-enforcer-plugin to 3.4.0
  • ORC-1499 Add MacOS 13 and 14 to building.md
  • ORC-1507 Use Zulu JDK distribution and switch from 21-ea to 21
  • ORC-1518 Remove findbugs folders

Documentation:

  • ORC-1503 Updated README.md with Maven version 3.9.4

ORC 1.8.6 Released

The ORC team is excited to announce the release of ORC v1.8.6.

The bug fixes:

  • ORC-1525 Fix bad read in RleDecoderV2::readByte

The test changes:

  • ORC-1432 Add MacOS 13 GitHub Action Job

Documentation:

  • ORC-1499 Add MacOS 13 and 14 to building.md

ORC 1.7.10 Released

The ORC team is excited to announce the release of ORC v1.7.10.

The bug fixes:

  • ORC-1304 [C++] Fix seeking over empty PRESENT stream
  • ORC-1413 Fix for ORC row level filter issue with ACID table

The task changes:

  • ORC-1482 Adaptation to read ORC files created by CUDF
  • ORC-1489 Assign a writer id to CUDF

ORC 1.8.5 Released

The ORC team is excited to announce the release of ORC v1.8.5.

The bug fixes:

  • ORC-1315: [C++] Byte to integer conversions fail on platforms with unsigned char type
  • ORC-1482: RecordReaderImpl.evaluatePredicateProto assumes floating point stats are always present

ORC 1.9.1 Released

The ORC team is excited to announce the release of ORC v1.9.1.

The bug fixes:

  • ORC-1455 Fix build failure on non-x86 with unused macro in CpuInfoUtil.cc
  • ORC-1457 Fix ambiguous overload of Type::createRowBatch
  • ORC-1462 Bump aircompressor to 0.25 to fix JDK-8081450

ORC 1.9.0 Released

The ORC team is excited to announce the release of ORC v1.9.0.

New Feature and Notable Changes:

  • ORC-961: Expose metrics of the reader
  • ORC-1167: Support orc.row.batch.size configuration
  • ORC-1252: Expose io metrics for write operation
  • ORC-1301: Enforce C++ 17
  • ORC-1310: Allowlist support for plugin filter
  • ORC-1356: Use Intel AVX-512 instructions to accelerate the Rle-bit-packing decode
  • ORC-1385: Support schema evolution from numeric to numeric
  • ORC-1386: Support schema evolution from primitive to string group/decimal/timestamp

Improvements:

  • ORC-827: Utilize Array copyOf
  • ORC-1170: Optimize the RowReader::seekToRow function
  • ORC-1232 Disable metrics collector by default
  • ORC-1278 Update Readme.md cmake to 3.12
  • ORC-1279 Update cmake version
  • ORC-1286 Replace DataBuffer with BlockBuffer in the BufferedOutputStream
  • ORC-1298 Support dedicated ColumnVectorBatch of numeric types
  • ORC-1302 Upgrade Github workflow to build on Windows
  • ORC-1306 Fixed indented code style for Java modules
  • ORC-1307 Add coding style enforcement
  • ORC-1314 Remove macros defined before C++11
  • ORC-1347 Use make_unique and make_shared when creating unique_ptr and shared_ptr
  • ORC-1348 TimezoneImpl constructor should pass std::vector<> & instead of std::vector<>
  • ORC-1349 Remove useless bufStream definition
  • ORC-1352 Remove ORC_[NOEXCEPT|NULLPTR|OVERRIDE|UNIQUE_PTR] macro usages
  • ORC-1355 Writer::addUserMetadata change parameter to reference
  • ORC-1373 Add log when DynamicByteArray length overflow
  • ORC-1401 Allow writing an intermediate footer
  • ORC-1421 Use PyArrow 12.0.0 in document

The bug fixes:

  • ORC-1225 Bump maven-assembly-plugin to 3.4.2
  • ORC-1266 DecimalColumnVector resets the isRepeating flag in the nextVector method
  • ORC-1273 Bump opencsv to 5.7.0
  • ORC-1297 Bump opencsv to 5.7.1
  • ORC-1304 throw ParseError when using SearchArgument with nested struct
  • ORC-1315 Byte to integer conversions fail on platforms with unsigned char type
  • ORC-1320 Fix build break of C++ code on docker images
  • ORC-1363 Upgrade zookeeper to 3.8.1
  • ORC-1368 Bump commons-csv to 1.10.0
  • ORC-1398 Bump aircompressor to 0.24
  • ORC-1399 Fix boolean type with useTightNumericVector enabled
  • ORC-1433 Fix comment in the Vector.hh
  • ORC-1447 Fix a bug in CpuInfoUtil.cc to support ARM platform
  • ORC-1449 Add -Wno-unused-macros for Clang 14.0
  • ORC-1450 Stop enforcing override keyword
  • ORC-1453 Fix fall-through warning cases

The test changes:

  • ORC-1231 Update supported OS list in building.md
  • ORC-1233 Bump junit to 5.9.0
  • ORC-1234 Upgrade objenesis to 3.2 in Spark benchmark
  • ORC-1235 Bump avro to 1.11.1
  • ORC-1240 Update site README to use apache/orc-dev
  • ORC-1241 Use apache/orc-dev DockerHub repository in Docker tests
  • ORC-1250 Bump mockito to 4.7.0
  • ORC-1254 Add spotbugs check
  • ORC-1258 Bump byte-buddy to 1.12.14
  • ORC-1262 Bump maven-checkstyle-plugin to 3.2.0
  • ORC-1265 Upgrade spotbugs to 4.7.2
  • ORC-1267 Bump mockito to 4.8.0
  • ORC-1271 Bump spotbugs-maven-plugin to 4.7.2.0
  • ORC-1272 Bump byte-buddy to 1.12.16
  • ORC-1300 Update Spark to 3.3.1 and its dependencies
  • ORC-1303 Upgrade GoogleTest to 1.12.1
  • ORC-1318 Upgrade mockito.version to 4.9.0
  • ORC-1319 Upgrade byte-buddy to 1.12.19
  • ORC-1321 Bump checkstyle to 10.5.0
  • ORC-1322 Upgrade centos7 docker image to use gcc9
  • ORC-1324 Use Java 19 instead of 18 in GHA
  • ORC-1333 Bump mockito to 4.10.0
  • ORC-1341 Bump mockito to 4.11.0
  • ORC-1353 Bump byte-buddy to 1.12.21
  • ORC-1359 Bump byte-buddy to 1.12.22
  • ORC-1366 Bump checkstyle to 10.7.0
  • ORC-1367 Bump maven-enforcer-plugin to 3.2.1
  • ORC-1369 Bump byte-buddy to 1.12.23
  • ORC-1370 Bump snappy-java to 1.1.9.1
  • ORC-1374 Update Spark to 3.3.2
  • ORC-1379 Upgrade spotbugs to 4.7.3.2
  • ORC-1380 Upgrade checkstyle to 10.8.0
  • ORC-1394 Bump maven-assembly-plugin to 3.5.0
  • ORC-1397 Bump checkstyle to 10.9.2
  • ORC-1405 Bump spotbugs-maven-plugin to 4.7.3.4
  • ORC-1406 Bump maven-enforcer-plugin to 3.3.0
  • ORC-1408 Add testVectorBatchHasNull test case and comment
  • ORC-1415 Add Java 20 to GitHub Action CI
  • ORC-1417 Bump checkstyle to 10.10.0
  • ORC-1418 Bump junit to 5.9.3
  • ORC-1426 Use Java 21-ea instead of 20 in GitHub Action
  • ORC-1435 Bump maven-checkstyle-plugin to 3.3.0
  • ORC-1436 Bump snappy-java to 1.1.10.0
  • ORC-1452 Use the latest OS versions in variant tests

The tasks:

  • ORC-1164 Setting version to 1.9.0-SNAPSHOT
  • ORC-1218 Bump apache pom to 27
  • ORC-1219 Remove redundant toString
  • ORC-1237 Remove a wrong image link to article-footer.png
  • ORC-1239 Upgrade maven-shade-plugin to 3.3.0
  • ORC-1256 Publish test-jar to maven central
  • ORC-1259 Bump slf4j to 2.0.0
  • ORC-1269 Remove FindBugs
  • ORC-1270 Move opencsv dependency to the tools module.
  • ORC-1274 Add a checkstyle rule to ban starting LAND and LOR
  • ORC-1275 Bump maven-jar-plugin to 3.3.0
  • ORC-1276 Bump slf4j to 2.0.1
  • ORC-1277 Bump maven-shade-plugin to 3.4.0
  • ORC-1284 Add permissions to GitHub Action labeler
  • ORC-1296 Bump reproducible-build-maven-plugin to 0.16
  • ORC-1311 Bump maven-shade-plugin to 3.4.1
  • ORC-1316 Bump slf4j.version to 2.0.4
  • ORC-1334 Bump slf4j.version to 2.0.6
  • ORC-1335 Bump netty-all to 4.1.86.Final
  • ORC-1351 Update PR Labeler definition
  • ORC-1358 Use spotless to format pom files
  • ORC-1371 Remove unsupported SLF4J bindings from classpath
  • ORC-1372 Bump zstd to v1.5.4
  • ORC-1375 Cancel old running ci tasks when a pr has a new commit
  • ORC-1377 Enforce override keyword
  • ORC-1383 Upgrade aircompressor to 0.22
  • ORC-1395 Enforce license check
  • ORC-1396 Bump slf4j to 2.0.7
  • ORC-1410 Bump zstd to v1.5.5
  • ORC-1411 Remove Ubuntu18.04 from docker-based tests
  • ORC-1419 Bump protobuf-java to 3.22.3
  • ORC-1428 Setup GitHub Action CI on branch-1.9
  • ORC-1443 Enforce Java version
  • ORC-1444 Enforce JDK Bytecode version
  • ORC-1446 Publish snapshot from branch-1.9

ORC 1.8.4 Released

The ORC team is excited to announce the release of ORC v1.8.4.

The bug fixes:

  • ORC-1304: [C++] Fix seeking over empty PRESENT stream
  • ORC-1400: Use Hadoop 3.3.5 on Java 17+ and benchmark
  • ORC-1413: Fix for ORC row level filter issue with ACID table

The test changes:

  • ORC-1404 Bump parquet to 1.13.0
  • ORC-1414 Upgrade java bench module to spark3.4
  • ORC-1416 Upgrade Jackson dependency to 2.14.2 in bench module
  • ORC-1420 Pin net.bytebuddy package to 1.12.x

The tasks:

  • ORC-1395 Enforce license check via github action

ORC 1.7.9 Released

The ORC team is excited to announce the release of ORC v1.7.9.

The bug fixes:

  • ORC-1382 Fix secondary config names org.sarg.* to orc.sarg.*
  • ORC-1395 Enforce license check
  • ORC-1407 Upgrade cyclonedx-maven-plugin to 2.7.6

ORC 1.8.3 Released

The ORC team is excited to announce the release of ORC v1.8.3.

The bug fixes:

  • ORC-1357: Handle missing compression block size
  • ORC-1382: Fix secondary config names org.sarg.* to orc.sarg.*
  • ORC-1384: Fix ArrayIndexOutOfBoundsException when reading dictionary stream bigger than dictionary
  • ORC-1393: Add reset(DiskRangeList input, long length) to InStream impl class

The tasks:

  • ORC-1358 Use spotless to format pom files

ORC 1.7.8 Released

The ORC team is excited to announce the release of ORC v1.7.8.

The improvements:

  • ORC-1342 Publish SBOM artifacts
  • ORC-1344 Skip SBOM generation during CMake
  • ORC-1345 Use makeBom and skip snapshot check in GitHub Action publish_snapshot job

The bug fixes:

  • ORC-1332 Avoid NegativeArraySizeException when using searchArgument
  • ORC-1343 Ignore orc.create.index

The test changes:

  • ORC-1323 Make docker/reinit.sh support target OS arguments

ORC 1.8.2 Released

The ORC team is excited to announce the release of ORC v1.8.2.

The bug fixes:

  • ORC-1332 Avoid NegativeArraySizeException when using searchArgument
  • ORC-1343 Disable ENABLE_INDEXES

The improvements:

  • ORC-1327 Exclude the proto files from the nohive jar
  • ORC-1328 Exclude the proto files from the shaded protobuf jar
  • ORC-1329 Add OrcConf.getStringAsList method
  • ORC-1338 Set bloom filter fpp to 1%
  • ORC-1342 Publish SBOM artifacts
  • ORC-1344 Skip SBOM generation during CMake
  • ORC-1345 Use makeBom and skip snapshot check in GitHub Action publish_snapshot job

The test changes:

  • ORC-1323 Make docker/reinit.sh support target OS arguments
  • ORC-1330 Add TestOrcConf
  • ORC-1339 Remove orc.sarg.to.filter default value assumption in test cases
  • ORC-1350 Upgrade setup-java to v3

The tasks:

  • ORC-1331 Improve PyArrow page
  • ORC-1336 Protect .asf.yaml, api, ORC-Deep-Dive-2020.pptx files in website
  • ORC-1337 Make .htaccess up to date
  • MINOR: Add .swp to .gitignore
  • MINOR: Link to Apache ORC orc_proto instead of Hive one
  • MINOR: Update DOAP file

ORC 1.8.1 Released

The ORC team is excited to announce the release of ORC v1.8.1.

The bug fixes:

  • ORC-1283 ENABLE_INDEXES does not take effect
  • ORC-1288 Invalid memory freeing with ZLIB compression
  • ORC-1291 NullPointerException in TypeDescription

The improvements:

  • ORC-1268 Set CMP0135 policy for CMake 3.24+
  • ORC-1282 Add slf4j impl to avoid warning message
  • ORC-1294 Build error when skip tests build
  • ORC-1295 Improve ORC Spec example (Decoding RLE v2 direct)
  • ORC-1299 Benchmark fails when the data resource returns 403
  • ORC-1305 Add more orc java examples
  • ORC-1308 Avoid star import

The test changes:

  • ORC-1290 Bump spotbugs to 4.7.3
  • ORC-1300 Update Spark to 3.3.1 and its dependencies

The tasks:

  • ORC-1269 Remove FindBugs
  • ORC-1270 Move opencsv dependency to the tools module
  • ORC-1292 Add paragraph in java documentation

ORC 1.7.7 Released

The ORC team is excited to announce the release of ORC v1.7.7.

The bug fixes:

  • ORC-1283 ENABLE_INDEXES does not take effect

The tasks:

  • ORC-1256 Publish tests jar to maven central
  • ORC-1268 Set CMP0135 policy for CMake 3.24+

William Hyun elected as Chair

The Apache ORC Project Management Committee (PMC) elected William Hyun as the Chair on September 12th, and the Apache Software Foundation (ASF) Board approved it and appointed him as Vice President for Apache ORC on September 21st.

William has been leading many areas. He helped Apache ORC PMC add a new member, served as a release manager for 1.7.4/1.7.5/1.7.6/1.8.0, made an important contribution on inter-ASF project collaboration and ORC integration across several projects to help all ORC users, improved ORC infra like ASF ORC DockerHub Setup, docker tests, and GitHub Action, and revamped user experiences through updating websites and Homebrew.

ORC 1.8.0 Released

The ORC team is excited to announce the release of ORC v1.8.0.

New Feature and Notable Changes:

  • ORC-450 Support selecting list indices without materializing list items
  • ORC-824 Add column statistics for List and Map
  • ORC-1004 Java ORC writer supports the selection vector
  • ORC-1075 Support reading ORC files with no column statistics
  • ORC-1125 Support decoding decimals in RLE
  • ORC-1136 Optimize reads by combining multiple reads without significant separation into a single read
  • ORC-1138 Seek vs Read Optimization
  • ORC-1172 Add row count limit config for one stripe
  • ORC-1212 Upgrade protobuf-java to 3.17.3
  • ORC-1220 Set min.hadoop.version to 2.7.3
  • ORC-1248 Redefine Hadoop dependency for Apache ORC 1.8.0
  • ORC-1256 Publish test-jar to maven central
  • ORC-1260 Publish shaded-protobuf classifier artifacts

Improvements:

  • ORC-825 Use Empty Array For Collections toArray
  • ORC-826 Do Not Use Collection Contains/Get
  • ORC-828 Improve Fetch Data Set Process
  • ORC-829 Optimize Serialization percentileBits
  • ORC-831 Do Not Copy String When Flushing Dictionary
  • ORC-833 RunLengthIntegerReaderV2 Calculate Batch Size Once
  • ORC-834 Do Not Convert to String in DecimalFromTimestampTreeReader
  • ORC-835 Cache TRUE/FALSE Bytes in StringGroupFromBooleanTreeReader
  • ORC-836 StringGroupFromDoubleTreeReader Use Double toString
  • ORC-837 Reuse HiveDecimalWritable in ConvertTreeReaderFactory
  • ORC-838 Simplify compareTo/equals/putBuffer of ByteBufferAllocatorPool
  • ORC-840 Remove Superfluous Array Fill in RecordReaderImpl
  • ORC-841 Remove Superfluous Array Fill in StringHashTableDictionary
  • ORC-842 Remove newKey from StringHashTableDictionary
  • ORC-844 Improve hashCode Methods
  • ORC-847 Do Not Create Empty Array in StringGroupFromBinaryTreeReader
  • ORC-852 Allow DynamicByteArray to Return a ByteBuffer
  • ORC-853 Optimize writeDouble Implementation
  • ORC-855 Remove Unused isRepeating from RunLengthIntegerReaderV2
  • ORC-865 Bump opencsv from 3.9 to 5.5.1
  • ORC-883 Dependency Audit and QA
  • ORC-897 Optimize loop termination condition in readerIsCompatible method
  • ORC-935 Bump commons-csv from 1.8 to 1.9.0
  • ORC-937 Replace deprecated method
  • ORC-958 Convert command support overwrite option
  • ORC-969 Evaluate SearchArguments using file and stripe level stats
  • ORC-975 Avoid double counting closestFixedBits in percentileBits method
  • ORC-982 Extract checkstyle to a single file, help newcomers check code style
  • ORC-988 Bump opencsv from 5.5.1 to 5.5.2
  • ORC-992 Reached max repeat length, we can directly decide to use DELTA encoding
  • ORC-1005 Make the Java and C++ implementations of determineEncoding in RunLengthIntegerWriterV2 consistent
  • ORC-1007 Fix a warning from the shade plugin
  • ORC-1013 Renaming a parameter in constructors of TreeWriter’s derived classes
  • ORC-1014 Add details when we get IOExceptions from file system
  • ORC-1020 Improve orc::RleDecoderV2::nextDirect
  • ORC-1027 Filter processing to allow filter injections that cannot be represented via SArgs
  • ORC-1047 Handle quoted field names during string schema parsing
  • ORC-1077 Remove commons-codec dependency and use java.util.Base64
  • ORC-1099 Extend ReadIntent to support MAP and UNION type
  • ORC-1101 Improve malformed STRUCT handling
  • ORC-1122 Add buffer to decode the whole run in RleDecoderV2
  • ORC-1137 Improve float/double conversion in DoubleColumnReader::next()
  • ORC-1149 Bump slf4j.version to 1.7.36
  • ORC-1150 Improve RowReaderImpl::computeBatchSize()
  • ORC-1152 Support encoding short decimals in RLEv2
  • ORC-1156 Update opencsv to 5.6
  • ORC-1163 Bump zookeeper from 3.7.0 to 3.8.0
  • ORC-1169 Use Hadoop 3.3.2 on Java 17+
  • ORC-1178 Use hadoop 3.3.3 on Java 17+

ORC 1.7.6 Released

The ORC team is excited to announce the release of ORC v1.7.6.

The bug fixes:

  • ORC-1204 ORC MapReduce writer to flush when long arrays
  • ORC-1205 nextVector should invoke ensureSize when reusing vectors
  • ORC-1215 Remove a wrong NotNull annotation on value of setAttribute
  • ORC-1222 Upgrade tools.hadoop.version to 2.10.2
  • ORC-1227 Use Constructor.newInstance instead of Class.newInstance
  • ORC-1228 Fix setAttribute to handle null value
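
For context on ORC-1227: `Class.newInstance` has been deprecated since Java 9, and the usual replacement is `getDeclaredConstructor().newInstance()`, which wraps exceptions thrown by the constructor in `InvocationTargetException`. The following is a minimal sketch of that replacement pattern; the `ReflectiveFactory` class and its `create` helper are hypothetical and are not part of ORC.

```java
// Hypothetical helper showing the Constructor.newInstance pattern; not ORC code.
class ReflectiveFactory {
  // Instantiate a class through its no-argument constructor.
  // All checked reflection failures are subclasses of ReflectiveOperationException.
  static <T> T create(Class<T> type) throws ReflectiveOperationException {
    return type.getDeclaredConstructor().newInstance();
  }

  public static void main(String[] args) throws Exception {
    StringBuilder sb = create(StringBuilder.class);
    sb.append("created via Constructor.newInstance");
    System.out.println(sb);
  }
}
```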

The test changes:

  • ORC-932 Bump byte-buddy from 1.10.19 to 1.11.12 (#842)
  • ORC-1169 Use Hadoop 3.3.2 on Java 17+ (#1113)
  • ORC-1178 Use Hadoop 3.3.3 on Java 17+ (#1129)
  • ORC-1193 Bump parquet.version to 1.12.3
  • ORC-1207 Upgrade Spark to 3.3.0
  • ORC-1210 Upgrade maven to 3.8.6
  • ORC-1234 Upgrade objenesis to 3.2 in Spark benchmark
  • ORC-1235 Bump avro.version to 1.11.1
  • ORC-1240 Update site README to use apache/orc-dev DockerHub image
  • ORC-1241 Use apache/orc-dev DockerHub repository in Docker tests
  • ORC-1244 Upgrade byte-buddy to 1.12.13 in branch-1.7
  • ORC-1245 Use Hadoop 3.3.4 on Java 17+ and benchmark

The documentation changes:

  • MINOR: Update DOAP with new releases (#1127)
  • ORC-900 Update doap_orc.rdf for Apache Projects page (#806)
  • ORC-1231 Update supported OS list in building.md
  • ORC-1237 Remove a wrong image link to article-footer.png
  • ORC-1238 Update DOAP with 1.7.5

The tasks:

  • ORC-1185 Add merge_orc_pr.py
  • ORC-1187 Use main instead of master in merge_orc_pr.py
  • ORC-1213 Use https in ThirdpartyToolchain.cmake
  • ORC-1226 Add a deprecation warning for Hadoop 2.7.2 and below

ORC 1.7.5 Released

The ORC team is excited to announce the release of ORC v1.7.5.

The bug fixes:

  • ORC-1151 Fix ColumnWriter for non-UTC Timestamp columns
  • ORC-1160 Fix seekToRow can’t seek within selected row group
  • ORC-1133 Fix csv-import tool options
  • ORC-1183 Upgrade gson to 2.9.0
  • ORC-1186 Limit family in aarch64 profile
  • ORC-1188 Fix ORC_PREFER_STATIC_ZLIB

The improvements:

  • ORC-1198 Add a new PhysicalFsWriter constructor with FSDataOutputStream parameter
  • ORC-1199 Use Google mirror of Maven Central as the primary

The test changes:

  • ORC-1155 Add Ubuntu 22.04 to docker tests
  • ORC-1154 Bump hive.version from 3.1.2 to 3.1.3
  • ORC-1161 Add MacOS 12 and remove MacOS 10
  • ORC-1174 Add Ubuntu 22.04 to GitHub Action
  • ORC-1182 Use slf4j-simple instead of deprecated slf4j-log4j12
  • ORC-1184 Use Hadoop 3.3.3 in benchmark module
  • ORC-1189 Update README.md and help command message in benchmark module and .gitignore
  • ORC-1190 Fix ORCWriterBenchMark dumpDir initialization
  • ORC-1191 Updated TLC Taxi Benchmark Dataset
  • ORC-1192 Use orc.zstd instead of orc.none
  • ORC-1196 Add Spark benchmark integration tests to GHA
  • ORC-1201 Remove Debian 9 from Docker Tests

The documentation changes:

  • Add ASF verification instruction link

Pavan Lanka added as committer

The ORC PMC is happy to add Pavan Lanka as an ORC committer for the work on introducing LazyIO of non-filter columns and optimizing stripe index and data reads.

Thank you for your work on ORC, Pavan!

ORC adds Yiqun Zhang to PMC

The Apache ORC Project Management Committee (PMC) is happy to announce that Yiqun Zhang has joined us as a new member of the PMC. Yiqun has been showing consistent contributions as a committer, and participated in both major and maintenance releases by actively helping the release managers with testing the release candidates.

Please welcome Yiqun to the ORC PMC!

ORC 1.7.4 Released

The ORC team is excited to announce the release of ORC v1.7.4.

The bug fixes:

  • ORC-1120 Remove C++ library limitation about write version
  • ORC-1121 Fix column conversion check bug that prevents column filters from working
  • ORC-1127 Add missing version of UNSTABLE-PRE-2.0
  • ORC-1146 Float category missing check if the statistic sum is a finite value
  • ORC-1147 Use isNaN instead of isFinite to determine whether values contain NaN
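
The distinction behind ORC-1147 can be shown in isolation: a "not finite" test is also true for infinities, so it over-reports when the question is specifically whether NaN is present. The sketch below uses only `Double.isNaN` and `Double.isFinite` from the Java standard library; the `NaNCheckSketch` class and its method names are hypothetical and do not mirror the ORC sources.

```java
// Hypothetical illustration of isNaN versus isFinite; not ORC code.
class NaNCheckSketch {
  // True only when at least one value is NaN.
  static boolean containsNaN(double[] values) {
    for (double v : values) {
      if (Double.isNaN(v)) {
        return true;
      }
    }
    return false;
  }

  // True when any value is NaN or +/-Infinity, which is a broader condition.
  static boolean containsNonFinite(double[] values) {
    for (double v : values) {
      if (!Double.isFinite(v)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    double[] withInfinity = {1.0, Double.POSITIVE_INFINITY};
    // Infinity is not finite, but it is not NaN either.
    System.out.println(containsNonFinite(withInfinity));
    System.out.println(containsNaN(withInfinity));
  }
}
```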

The improvements:

  • ORC-236 Support UNION type in Java Convert tool
  • ORC-1116 Fix csv-import tool when exporting long bytes
  • ORC-1123 Add estimationMemory method for writer

The test changes:

  • ORC-1145 Add Java 18 to GitHub Action CI
  • ORC-1118 Support Java 17 and ARM64 docker tests

The documentation changes:

  • ORC-1117 Add Dask page at Using in Python section
  • ORC-1119 Remove timestamp from ORC API docs

ORC 1.6.14 Released

The ORC team is excited to announce the release of ORC v1.6.14.

The bug fixes:

  • ORC-1121 Fix column conversion check bug that prevents column filters from working
  • ORC-1146 Float category missing check if the statistic sum is a finite value
  • ORC-1147 Use isNaN instead of isFinite to determine whether values contain NaN

The ‘tests’ fixes:

  • ORC-1016 Use openssl@1.1 in GitHub Action MacOS CIs
  • ORC-1113 Remove CentOS 8 from docker-based tests

Quanlong Huang added as committer

The ORC PMC is happy to add Quanlong Huang as an ORC committer for the work on ORC C++ library and Apache Impala integration.

Thank you for your work on ORC, Quanlong!

ORC 1.7.3 Released

The ORC team is excited to announce the release of ORC v1.7.3.

The ‘bug’ fixes:

  • ORC-1060 Reduce memory usage when vectorized reading dictionary string encoding columns
  • ORC-1065 Fix IndexOutOfBoundsException in ReaderImpl.extractFileTail
  • ORC-1067 [C++] Upgrade ZSTD to 1.5.1
  • ORC-1078 Row group end offset doesn’t accommodate all the blocks
  • ORC-1081 Fix heap-use-after-free in SearchArgumentBuilderImpl::end()
  • ORC-1087 [C++] Handle unloaded seek positions when seeking in an uncompressed chunk
  • ORC-1092 [C++] Upgrade LZ4 to version 1.9.3
  • ORC-1102 [C++] Upgrade ZSTD to 1.5.2

The ‘tools’ improvements:

  • ORC-1055 [C++] Add the timezone option for the csv-import tool
  • ORC-1082 Improve FileDump and JsonFileDump to be robust on missing column statistics
  • ORC-1092 [C++] Support specifying type ids or column names in cpp tools

The ‘documentation’ patches:

  • ORC-1050 Update ORC site README.md and release process page
  • ORC-1069 Update building.md
  • ORC-1071 Update ‘adopters’ page
  • ORC-1091 Add ‘Tests’ section at ORC ‘develop’ page
  • ORC-1112 Add ‘Using with Python’ web page
  • ORC-1114 Update ‘Using with Python’ page with ‘PyArrow’ 7.0.0

The ‘task’ patches:

  • ORC-1070 Upgrade site docker image to use Ubuntu 20.04
  • ORC-1072 Add ‘Stale’ GitHub Action job
  • ORC-1094 Enable GitHub issues tab
  • ORC-1095 Deprecate ‘UnknownFormatException’

The ‘tests’ fixes:

  • ORC-875 Add GitHub Action job for Windows Server 2019
  • ORC-878 Bump auto-service from 1.0-rc7 to 1.0
  • ORC-881 Bump slf4j.version from 1.7.30 to 1.7.32
  • ORC-989 Bump checkstyle from 8.45.1 to 9.0
  • ORC-993 Bump junit.version from 5.7.2 to 5.8.0
  • ORC-1018 Bump checkstyle from 9.0 to 9.0.1
  • ORC-1033 Bump junit.version from 5.8.0 to 5.8.1
  • ORC-1044 Bump reproducible-build-maven-plugin to 0.14
  • ORC-1048 Bump checkstyle from 9.0.1 to 9.1
  • ORC-1052 Bump avro.version from 1.10.2 to 1.11.0
  • ORC-1057 Bump junit.version from 5.8.1 to 5.8.2
  • ORC-1061 Bump checkstyle from 9.1 to 9.2
  • ORC-1066 Bump guava from 30.1.1-jre to 31.0.1-jre
  • ORC-1068 [C++] Stabilize HAS_POST_2038 test
  • ORC-1073 Remove appveyor.yml
  • ORC-1076 Remove Travis PR Builder Link from README.md
  • ORC-1079 Add Linux Clang 11 GitHub Action test coverage
  • ORC-1080 Remove .travis.yml
  • ORC-1084 Bump checkstyle from 9.2 to 9.2.1
  • ORC-1086 Bump reproducible-build-maven-plugin from 0.14 to 0.15
  • ORC-1090 Disable Clang 13.0-specific compilation warnings
  • ORC-1093 Remove debian8 specific code in run-one.sh
  • ORC-1096 Bump slf4j.version to 1.7.33
  • ORC-1103 Use Maven 3.8.4
  • ORC-1104 Use Spark 3.2.1 in benchmark
  • ORC-1105 fetch-data.sh should use zsh instead of bash
  • ORC-1106 Use transitive commons-lang3 dependency in bench module
  • ORC-1107 Fix NPE at benchmark data schema loading
  • ORC-1108 Use RawLocalFileSystem to skip checksum files during benchmark data generation
  • ORC-1109 Use zstd instead of none in the default compress option
  • ORC-1111 Bump build-helper-maven-plugin from 3.2.0 to 3.3.0
  • ORC-1113 Remove CentOS 8 from docker-based tests
  • ORC-1115 Suppress Illegal reflective access warnings on Java9+ Tests

ORC 1.6.13 Released

The ORC team is excited to announce the release of ORC v1.6.13.

The bug fixes:

  • ORC-1065 Fix IndexOutOfBoundsException in ReaderImpl.extractFileTail
  • ORC-1078 Row group end offset doesn’t accommodate all the blocks

The ‘tests’ fixes:

  • ORC-875 Add GitHub Action job for Windows Server 2019
  • ORC-941 Move MacOS 10.15/11.5 test from Travis to GitHub Action
  • ORC-1079 Add Linux Clang 11 GitHub Action test coverage
  • ORC-1080 Remove .travis.yml

ORC 1.7.2 Released

The ORC team is excited to announce the release of ORC v1.7.2.

The bug fixes:

  • ORC-492 Avoid potential ArrayIndexOutOfBoundsException when getting WriterVersion
  • ORC-1053 Fix time zone offset precision in the convert tool so that LocalDateTime-to-Timestamp conversion is consistent with ORC's internal default precision
  • ORC-1041 Use memcpy during LZO decompression
  • ORC-1059 Align findColumns behaviour between 1.6 and 1.7 release

The ‘tools’ improvements:

  • ORC-1012 Support specifying columns in orc-scan
  • ORC-1017 Add sizes tool to determine and display the sizes of each column in a set of files
  • ORC-1023 Support writing bloom filters in ConvertTool

The ‘tests’ fixes:

  • ORC-915 Remove io.netty.netty from Spark benchmark
  • ORC-938 Bump netty-all from 4.1.42.Final to 4.1.66.Final
  • ORC-948 Add hive benchmark integration tests
  • ORC-957 Bump netty-all from 4.1.66.Final to 4.1.67.Final
  • ORC-1021 Add -fno-omit-frame-pointer in DEBUG and RELWITHDEBINFO builds
  • ORC-1051 Update benchmark dependencies

ORC 1.7.1 Released

The ORC team is excited to announce the release of ORC v1.7.1.

The bug fixes of ORC 1.7:

  • ORC-879 Flaky Test for TestJsonReader
  • ORC-1000 Use Java 17 in GitHub Action
  • ORC-1002 Add java17 profile for Java17 unit testing
  • ORC-1008 Overflow detection code is incorrect in IntegerColumnStatisticsImpl
  • ORC-1009 [C++] Missing string include causes build failure with MSVC++
  • ORC-1010 Bump tzdata from tzdata-2020e-1.tar.xz to tzdata-2021b-1.tar.xz
  • ORC-1011 Activate java17 profile automatically
  • ORC-1015 Update OrcFile.WriterOptions::memory javadoc
  • ORC-1016 Use openssl@1.1 in GitHub Action MacOS CIs
  • ORC-1024 BloomFilter hash computation is inconsistent between Java and C++ clients
  • ORC-1029 Could not load ‘org.apache.orc.DataMask.Provider’ when using orc encryption and spark executor with multi cores!
  • ORC-1030 Java Tools Recover File command does not accurately find OrcFile.MAGIC
  • ORC-1032 Bump parquet.version from 1.12.0 to 1.12.2
  • ORC-1034 The search byte array algorithm is incorrectly implemented in FileDump.java
  • ORC-1035 backupDataPath may be incorrect in recoverFile
  • ORC-1036 Due to tzdata upgrade, the fixed download links in CI are often not working
  • ORC-1037 Bump spark.version from 3.1.2 to 3.2.0
  • ORC-1039 Make FileDump.recoverFile handle side files only if they exist
  • ORC-1040 Add Debian 11 docker test
  • ORC-1042 Ignore unused-function C++ compile warning on CentOS 7
  • ORC-1043 Fix C++ conversion compilation error in CentOS 7

ORC 1.6.12 Released

The ORC team is excited to announce the release of ORC v1.6.12.

The bug fixes of ORC 1.6.12:

  • ORC-1008 Overflow detection code is incorrect in IntegerColumnStatisticsImpl
  • ORC-1010 Bump tzdata from tzdata-2020e-1.tar.xz to tzdata-2021b-1.tar.xz
  • ORC-1024 BloomFilter hash computation is inconsistent between Java and C++ clients
  • ORC-1029 Could not load ‘org.apache.orc.DataMask.Provider’ when using orc encryption and spark executor with multi cores!
  • ORC-1034 The search byte array algorithm is incorrectly implemented in FileDump.java
  • ORC-1035 backupDataPath may be incorrect in recoverFile
  • ORC-1036 Due to tzdata upgrade, the fixed download links in CI are often not working
  • ORC-1040 Add Debian 11 docker test
  • ORC-1042 Ignore unused-function C++ compile warning on CentOS 7
  • ORC-1043 Fix C++ conversion compilation error in CentOS 7

ORC adds William Hyun to PMC

On behalf of the Apache ORC Project Management Committee (PMC), it gives me great pleasure to announce that William Hyun has joined the PMC. William has led several areas including Java 17/Apple Silicon support, Java Tools improvement, Code quality improvement using static analysis, CI/Docker test coverage improvement, and Apache ORC 1.7 migration support at Apache Arrow/Druid/Iceberg.

Please join me in welcoming William to the ORC PMC!

ORC 1.7.0 Released

The ORC team is excited to announce the release of ORC v1.7.0.

The new features of ORC 1.7:

  • ORC-377 Support Snappy compression in C++ Writer
  • ORC-577 Support row-level filtering
  • ORC-716 Build and test on Java 17-EA
  • ORC-731 Improve Java Tools
  • ORC-742 LazyIO of non-filter columns
  • ORC-751 Implement Predicate Pushdown in C++ Reader
  • ORC-755 Introduce OrcFilterContext
  • ORC-757 Add Hashtable implementation for dictionary
  • ORC-780 Support LZ4 Compression in C++ Writer
  • ORC-797 Allow writers to get the stripe information
  • ORC-818 Build and test in Apple Silicon
  • ORC-861 Bump CMake minimum requirement to 2.8.12
  • ORC-867 Upgrade hive-storage-api to 2.8.1
  • ORC-984 Save the software version that wrote each ORC file

Known issues:

ORC 1.6.11 Released

The ORC team is excited to announce the release of ORC v1.6.11.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.5.13 Released

The ORC team is excited to announce the release of ORC v1.5.13.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

ORC 1.6.10 Released

The ORC team is excited to announce the release of ORC v1.6.10.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.6.9 Released

The ORC team is excited to announce the release of ORC v1.6.9.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.6.8 Released

The ORC team is excited to announce the release of ORC v1.6.8.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

William Hyun added as committer

The ORC PMC is happy to add William Hyun as an ORC committer for the work on improving ORC’s code quality and integration with Apache Spark and Apache Iceberg.

Thank you for your work on ORC, William!

ORC adds Panagiotis Garefalakis to PMC

On behalf of the Apache ORC Project Management Committee (PMC), it gives me great pleasure to announce that Panagiotis Garefalakis has joined the PMC. Panagiotis has radically improved the integration between Hive and ORC.

Please join me in welcoming Panagiotis to the ORC PMC!

ORC 1.6.7 Released

The ORC team is excited to announce the release of ORC v1.6.7.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.6.6 Released

The ORC team is excited to announce the release of ORC v1.6.6.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.6.5 Released

The ORC team is excited to announce the release of ORC v1.6.5.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.5.12 Released

The ORC team is excited to announce the release of ORC v1.5.12.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

ORC 1.6.4 Released

The ORC team is excited to announce the release of ORC v1.6.4.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

  • ORC-667 Positional mapping for nested struct types should not be applied by default

ORC 1.5.11 Released

The ORC team is excited to announce the release of ORC v1.5.11.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-667 Positional mapping for nested struct types should not be applied by default

ORC 1.5.10 Released

The ORC team is excited to announce the release of ORC v1.5.10.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

ORC 1.6.3 Released

The ORC team is excited to announce the release of ORC v1.6.3.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.5.9 Released

The ORC team is excited to announce the release of ORC v1.5.9.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

ORC adds Dongjoon Hyun to PMC

On behalf of the Apache ORC Project Management Committee (PMC), it gives me great pleasure to announce that Dongjoon Hyun has joined the PMC. Dongjoon has radically improved the integration between Spark and ORC.

Please join me in welcoming Dongjoon to the ORC PMC!

ORC 1.4.5 Released

The ORC team is excited to announce the release of ORC v1.4.5.

The new features of ORC 1.4:

  • ORC-72 Add benchmark code for file formats.
  • ORC-87 Fix timestamp statistics in C++.
  • ORC-150 Add tool to convert from JSON.
  • ORC-151 Reduce the size of tools.jar.
  • ORC-174 Create a nohive variant of the jars.

Known issues:

ORC 1.6.2 Released

The ORC team is excited to announce the release of ORC v1.6.2.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

ORC 1.5.8 Released

The ORC team is excited to announce the release of ORC v1.5.8.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

ORC 1.6.1 Released

The ORC team is excited to announce the release of ORC v1.6.1.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

  • ORC-571 ArrayIndexOutOfBoundsException in StripePlanner.readRowIndex

ORC 1.5.7 Released

The ORC team is excited to announce the release of ORC v1.5.7.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

ORC 1.6.0 Released

The ORC team is excited to announce the release of ORC v1.6.0.

The new features of ORC 1.6:

  • ORC-14 Add column encryption.
  • ORC-189 Add timestamp with local timezone
  • ORC-203 Trim minimum and maximum string values
  • ORC-363 Add zstd support in Java
  • ORC-397 Support selectively disabling dictionaries
  • ORC-522 Add type annotations

Known issues:

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-555 IllegalArgumentException when reading files with large footers

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

  • ORC-571 ArrayIndexOutOfBoundsException in StripePlanner.readRowIndex

ORC 1.5.6 Released

The ORC team is excited to announce the release of ORC v1.5.6.

Users are advised that as of ORC 1.5.6, ORCReaders that aren’t used to create RecordReaders should be closed.
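
One way to follow this advice, assuming the reader type in your ORC version implements `java.io.Closeable`, is try-with-resources. To keep the sketch compilable without the ORC dependency, the `DemoReader` class below is a hypothetical stand-in for a real ORC reader, and `readRowCountAndClose` is an illustrative helper, not an ORC API.

```java
import java.io.Closeable;

// Sketch of closing a reader that is used for metadata only; not ORC code.
class ReaderCloseSketch {
  // Hypothetical stand-in for an ORC reader; it only tracks whether close() ran.
  static final class DemoReader implements Closeable {
    private boolean closed = false;

    long getNumberOfRows() {
      return 42L; // placeholder value for the example
    }

    @Override
    public void close() {
      closed = true;
    }

    boolean isClosed() {
      return closed;
    }
  }

  // Uses the reader without creating a record reader, then closes it.
  static DemoReader readRowCountAndClose(long[] rowCountOut) {
    DemoReader reader = new DemoReader();
    try (DemoReader r = reader) {
      rowCountOut[0] = r.getNumberOfRows();
    } // close() is called here, even if the body throws
    return reader;
  }

  public static void main(String[] args) {
    long[] rows = new long[1];
    DemoReader reader = readRowCountAndClose(rows);
    System.out.println("rows=" + rows[0] + ", closed=" + reader.isClosed());
  }
}
```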

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-525 Users must close ORC Readers after use

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

Renat Vailiullin and Sandeep More added as committers

The ORC PMC is happy to add Renat Vailiullin and Sandeep More as ORC committers. Renat has done a lot of work to improve the Windows builds and Sandeep has been working on the data masking and statistics.

Thank you for your work on ORC, Renat and Sandeep!

ORC 1.5.5 Released

The ORC team is excited to announce the release of ORC v1.5.5.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

ORC adds Gang Wu to PMC

On behalf of the Apache ORC Project Management Committee (PMC), it gives me great pleasure to announce that Gang Wu has joined the PMC. Gang has been doing great work on the C++ code base.

Please join me in welcoming Gang to the ORC PMC!

Dongjoon Hyun added as committer

The ORC PMC is happy to add Dongjoon Hyun as an ORC committer for the work on improving ORC’s integration with Spark.

Thank you for your work on ORC, Dongjoon!

ORC 1.5.4 Released

The ORC team is excited to announce the release of ORC v1.5.4.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

ORC 1.5.3 Released

The ORC team is excited to announce the release of ORC v1.5.3.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

ORC 1.5.2 Released

The ORC team is excited to announce the release of ORC v1.5.2.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

ORC 1.5.1 Released

The ORC team is excited to announce the release of ORC v1.5.1.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

ORC 1.5.0 Released

The ORC team is excited to announce the release of ORC v1.5.0.

The new features of ORC 1.5:

  • ORC-179 Add ORC C++ Writer
  • ORC-91 Support for variable length blocks in HDFS.
  • ORC-199 Implement a CSV to ORC converter
  • ORC-344 Support for using Decimal64ColumnVector
  • ORC-345 Adding Decimal64StatisticsImpl
  • ORC-331 Support for building C++ under MSVC.
  • ORC-234 Support for older versions of Hadoop (>= 2.2.x)
  • ORC-305 Added statistics for size on disk

Known issues:

  • ORC-367 Boolean columns are read incorrectly when using seek.

  • ORC-414 ORC files with malformed protobuf objects can crash C++ reader

  • ORC-562 Don’t wrap the readerSchema with ACID fields, if it already is

  • ORC-569 The first index entry may have empty positions

ORC 1.4.4 Released

The ORC team is excited to announce the release of ORC v1.4.4.

The new features of ORC 1.4:

  • ORC-72 Add benchmark code for file formats.
  • ORC-87 Fix timestamp statistics in C++.
  • ORC-150 Add tool to convert from JSON.
  • ORC-151 Reduce the size of tools.jar.
  • ORC-174 Create a nohive variant of the jars.

Known issues:

ORC 1.4.3 Released

The ORC team is excited to announce the release of ORC v1.4.3.

The new features of ORC 1.4:

  • ORC-72 Add benchmark code for file formats.
  • ORC-87 Fix timestamp statistics in C++.
  • ORC-150 Add tool to convert from JSON.
  • ORC-151 Reduce the size of tools.jar.
  • ORC-174 Create a nohive variant of the jars.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

ORC 1.4.2 Released

The ORC team is excited to announce the release of ORC v1.4.2.

The new features of ORC 1.4:

  • ORC-72 Add benchmark code for file formats.
  • ORC-87 Fix timestamp statistics in C++.
  • ORC-150 Add tool to convert from JSON.
  • ORC-151 Reduce the size of tools.jar.
  • ORC-174 Create a nohive variant of the jars.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.4.1 Released

The ORC team is excited to announce the release of ORC v1.4.1.

The new features of ORC 1.4:

  • ORC-72 Add benchmark code for file formats.
  • ORC-87 Fix timestamp statistics in C++.
  • ORC-150 Add tool to convert from JSON.
  • ORC-151 Reduce the size of tools.jar.
  • ORC-174 Create a nohive variant of the jars.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.3.4 Released

The ORC team is excited to announce the release of ORC v1.3.4.

The new features of ORC 1.3:

  • ORC-58 Split C++ Reader into Reader and RowReader
  • ORC-120 Add backwards compatibility mode for schema evolution.
  • ORC-124 Fast decimal improvements
  • ORC-128 Add ability to get statistics from writer

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC adds Eugene and Deepak to PMC

On behalf of the Apache ORC Project Management Committee (PMC), it gives me great pleasure to announce that Eugene Koifman and Deepak Majeti have joined the PMC. Eugene has been critical to the work on ACID and Deepak has been doing great work on the C++ code base.

Please join me in welcoming Eugene and Deepak to the ORC PMC!

Deepak Majeti added as committer

The ORC PMC is happy to add Deepak Majeti as an ORC committer for his work on the C++ ORC reader, including both contributions and reviews of others’ patches. Thank you for your work on ORC, Deepak!

ORC 1.4.0 Released

The ORC team is excited to announce the release of ORC v1.4.0.

The new features of ORC 1.4:

  • ORC-72 Add benchmark code for file formats.
  • ORC-87 Fix timestamp statistics in C++.
  • ORC-150 Add tool to convert from JSON.
  • ORC-151 Reduce the size of tools.jar.
  • ORC-174 Create a nohive variant of the jars.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.3.3 Released

The ORC team is excited to announce the release of ORC v1.3.3.

The new features of ORC 1.3:

  • ORC-58 Split C++ Reader into Reader and RowReader
  • ORC-120 Add backwards compatibility mode for schema evolution.
  • ORC-124 Fast decimal improvements
  • ORC-128 Add ability to get statistics from writer

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.3.2 Released

The ORC team is excited to announce the release of ORC v1.3.2.

The new features of ORC 1.3:

  • ORC-58 Split C++ Reader into Reader and RowReader
  • ORC-120 Add backwards compatibility mode for schema evolution.
  • ORC-124 Fast decimal improvements
  • ORC-128 Add ability to get statistics from writer

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.3.1 Released

The ORC team is excited to announce the release of ORC v1.3.1.

The new features of ORC 1.3:

  • ORC-58 Split C++ Reader into Reader and RowReader
  • ORC-120 Add backwards compatibility mode for schema evolution.
  • ORC-124 Fast decimal improvements
  • ORC-128 Add ability to get statistics from writer

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.3.0 Released

The ORC team is excited to announce the release of ORC v1.3.0.

The new features of ORC 1.3:

  • ORC-58 Split C++ Reader into Reader and RowReader
  • ORC-120 Add backwards compatibility mode for schema evolution.
  • ORC-124 Fast decimal improvements
  • ORC-128 Add ability to get statistics from writer

Known issues:

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC adds Gopal Vijayaraghavan to PMC

On behalf of the Apache ORC Project Management Committee (PMC), it gives me great pleasure to announce that Gopal Vijayaraghavan has joined the PMC. Gopal has done an amazing job at speeding up ORC in many ways.

Please join me in welcoming Gopal to the ORC PMC!

Congratulations Gopal!

ORC adds new committers

As part of the removal of the ORC code base from Hive, the ORC PMC has offered to make any existing Hive committers into ORC committers. The new ORC committers coming from Hive are:

  • Aihua Xu
  • Ashutosh Chauhan
  • Carl Steinbach
  • Chaoyu Tang
  • Chinna Rao Lalam
  • Daniel Dai
  • Eugene Koifman
  • Ferdinand Xu
  • Jason Dere
  • Jesus Camacho Rodriguez
  • Jimmy Xiang
  • Lars Francke
  • Matthew McCline
  • Mithun Radhakrishnan
  • Naveen Gangam
  • Pengcheng Xiong
  • Rajesh Balamohan
  • Rui Li
  • Sergio Pena
  • Siddharth Seth
  • Vaibhav Gumashta
  • Wei Zheng
  • Yongzhi Chen

ORC 1.2.3 Released

The ORC team is excited to announce the release of ORC v1.2.3. This release fixes some bugs in the Java schema evolution code.

The new features of ORC 1.2:

  • ORC-54 Evolve schemas based on field name rather than index
  • ORC-84 Create a separate java tool module.
  • ORC-77 and ORC-81 Implement LZO and LZ4 compression codecs.
  • ORC-92 Add support for nested column id selection in C++
  • ORC-69 Add batch option support in orc-scan tools.

Important fixes:

  • HIVE-14214 ORC schema evolution and predicate push down do not work together.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.2.2 Released

The ORC team is excited to announce the release of ORC v1.2.2.

The new features of ORC 1.2:

  • ORC-54 Evolve schemas based on field name rather than index
  • ORC-84 Create a separate java tool module.
  • ORC-77 and ORC-81 Implement LZO and LZ4 compression codecs.
  • ORC-92 Add support for nested column id selection in C++
  • ORC-69 Add batch option support in orc-scan tools.

Important fixes:

  • HIVE-14214 ORC schema evolution and predicate push down do not work together.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.2.1 Released

The ORC team is excited to announce the release of ORC v1.2.1.

The new features of ORC 1.2:

  • ORC-54 Evolve schemas based on field name rather than index
  • ORC-84 Create a separate java tool module.
  • ORC-77 and ORC-81 Implement LZO and LZ4 compression codecs.
  • ORC-92 Add support for nested column id selection in C++
  • ORC-69 Add batch option support in orc-scan tools.

Important fixes:

  • HIVE-14214 ORC schema evolution and predicate push down do not work together.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.2.0 Released

The ORC team is excited to announce the release of ORC v1.2.0.

The new features of ORC 1.2:

  • ORC-54 Evolve schemas based on field name rather than index
  • ORC-84 Create a separate java tool module.
  • ORC-77 and ORC-81 Implement LZO and LZ4 compression codecs.
  • ORC-92 Add support for nested column id selection in C++
  • ORC-69 Add batch option support in orc-scan tools.

Important fixes:

  • HIVE-14214 ORC schema evolution and predicate push down do not work together.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-101 Bloom filters for string and decimal use inconsistent encoding

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.1.2 Released

The ORC team is excited to announce the release of ORC v1.1.2. This release contains the Java reader and writer and the native C++ ORC reader and tools.

The major new features in ORC 1.1 are:

  • ORC-1 Copy the Java ORC code from Hive.
  • ORC-10 Fix the C++ reader to correctly read timestamps from timezones with different daylight savings rules.
  • ORC-52 Add mapred and mapreduce connectors.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • HIVE-14214 Schema evolution and predicate pushdown don’t work together.

  • ORC-101 Bloom filters for string and decimal use inconsistent encoding

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

File format benchmark

I gave a talk at Hadoop Summit San Jose 2016 about a file format benchmark that I’ve contributed as ORC-72. The benchmark focuses on real data sets that are publicly available. The data sets represent a wide variety of use cases:

  • NYC Taxi Data - very dense data with mostly numeric types
  • GitHub Archives - very sparse data with a lot of complex structure
  • Sales - a real production schema from a sales table with a synthetic generator

The benchmarks look at a set of three very common use cases:

  • Full table scan - read all columns and rows
  • Column projection - read some columns, but all of the rows
  • Column projection and predicate push down - read some columns and some rows
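
As a rough illustration of the three use cases above, here is a toy sketch in Python. It is not the ORC-72 benchmark code and does not use any ORC API; the `RowGroup` class, the function names, and the sample data are all hypothetical. It only shows why a columnar layout with per-group min/max statistics lets a reader skip columns and rows it does not need.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical horizontal slice of a table, stored column by column."""
    columns: dict  # column name -> list of values

    def num_rows(self):
        return len(next(iter(self.columns.values())))

    def min_max(self, name):
        # Computed on the fly here; a real columnar format would store
        # these statistics so the data itself need not be read.
        values = self.columns[name]
        return min(values), max(values)


def full_scan(groups):
    """Full table scan: read all columns and all rows."""
    for g in groups:
        for i in range(g.num_rows()):
            yield {n: g.columns[n][i] for n in g.columns}


def project(groups, names):
    """Column projection: read some columns, but all of the rows."""
    for g in groups:
        for i in range(g.num_rows()):
            yield {n: g.columns[n][i] for n in names}


def project_with_pushdown(groups, names, column, low, high):
    """Projection plus predicate push down: some columns and some rows.

    Groups whose [min, max] range cannot overlap [low, high] are skipped
    using statistics alone; the remaining groups are filtered row by row.
    """
    for g in groups:
        g_min, g_max = g.min_max(column)
        if g_max < low or g_min > high:
            continue  # whole group pruned
        for i in range(g.num_rows()):
            if low <= g.columns[column][i] <= high:
                yield {n: g.columns[n][i] for n in names}


# Made-up sample data, loosely shaped like taxi trips.
groups = [
    RowGroup({"id": [1, 2, 3], "fare": [5.0, 7.5, 6.0], "city": ["a", "b", "c"]}),
    RowGroup({"id": [4, 5, 6], "fare": [20.0, 25.0, 22.5], "city": ["d", "e", "f"]}),
]

all_rows = list(full_scan(groups))
fares = list(project(groups, ["fare"]))
cheap = list(project_with_pushdown(groups, ["id"], "fare", 0.0, 10.0))
```

In this sketch the second group is never inspected row by row for the `cheap` query, because its minimum fare already exceeds the upper bound.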

You can see the slides here:

File Format Benchmarks: Avro, JSON, ORC, & Parquet

ORC 1.1.1 Released

The ORC team is excited to announce the release of ORC v1.1.1. This release contains the Java reader and writer and the native C++ ORC reader and tools.

The major new features in ORC 1.1 are:

  • ORC-1 Copy the Java ORC code from Hive.
  • ORC-10 Fix the C++ reader to correctly read timestamps from timezones with different daylight savings rules.
  • ORC-52 Add mapred and mapreduce connectors.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • HIVE-14214 Schema evolution and predicate pushdown don’t work together.

  • ORC-101 Bloom filters for string and decimal use inconsistent encoding

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.1.0 Released

The ORC team is excited to announce the release of ORC v1.1.0. This release contains the Java reader and writer and the native C++ ORC reader and tools.

The major new features in ORC 1.1 are:

  • ORC-1 Copy the Java ORC code from Hive.
  • ORC-10 Fix the C++ reader to correctly read timestamps from timezones with different daylight savings rules.
  • ORC-52 Add mapred and mapreduce connectors.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • HIVE-14214 Schema evolution and predicate pushdown don’t work together.

  • ORC-101 Bloom filters for string and decimal use inconsistent encoding

  • ORC-135 Predicate push down is incorrect on timestamps when moved between timezones

  • ORC-285 Empty vector batches of floats or doubles cause EOFException

ORC 1.0.0 Released

The ORC team is excited to announce the release of ORC v1.0.0. This release contains the native C++ ORC reader and some tools.

The major features:

  • Portable pure C++ ORC reader
  • The C++ reader is known to work on:
    • CentOS and RHEL 5, 6, and 7
    • Debian 6 and 7
    • Ubuntu 12 and 14
    • Mac OS 10.10 and 10.11
  • A file-contents command that prints the contents of the file as json records.
  • A file-metadata command that prints the metadata of the file.
  • Docker files for building and testing on various Linux distributions.
  • Memory estimation for the reader.

Known issues:

  • CVE-2018-8015 ORC files with malformed types cause stack overflow.

  • ORC-10 When moving ORC files between timezones, different daylight savings rules will cause timestamps to shift in the C++ reader.

ORC adds Aliaksei Sandryhaila to PMC

On behalf of the Apache ORC Project Management Committee (PMC), it gives me great pleasure to announce that Aliaksei Sandryhaila has joined the Apache ORC PMC. He has done a lot of good work on ORC and I’m looking forward to more.

Please join me in welcoming Aliaksei to the ORC PMC!

Congratulations Aliaksei!

ORC adopts new logo

The ORC project has adopted a new logo. We hope you like it.

[Image: ORC logo]

Other great options included a big white hand on a black shield. :)

ORC adds 7 committers

The ORC project management committee today added seven new committers for their work on ORC. Welcome all!

  • Gunther Hagleitner
  • Aliaksei Sandryhaila
  • Sergey Shelukhin
  • Gopal Vijayaraghavan
  • Stephen Walkauskas
  • Kevin Wilfong
  • Xuefu Zhang

ORC becomes an Apache Top Level Project

Today Apache ORC became a top level project at the Apache Software Foundation. This is a major step forward for the project and reflects its momentum.

Back in January 2013, we created ORC files as part of the initiative to massively speed up Apache Hive and improve the storage efficiency of data stored in Apache Hadoop. We added it as a feature of Hive for two reasons:

  1. To ensure that it would be well integrated with Hive
  2. To ensure that storing data in ORC format would be as simple as adding “stored as ORC” to your table definition.

In the last two years, many of the features that we’ve added to Hive, such as vectorization, ACID, predicate push down and LLAP, support ORC first, and follow up with other storage formats later.

The growing use and acceptance of ORC has encouraged additional Hadoop execution engines, such as Apache Pig, Map-Reduce, Cascading, and Apache Spark, to support reading and writing ORC. However, depending on the large Hive jar that contains ORC pulls in many of the other projects that Hive depends on, which is a concern for these users. To better support non-Hive users, we decided to split off from Hive and become a separate project. This will not only allow us to support Hive, but also provide a much more streamlined jar, documentation and help for users outside of Hive.

Although Hadoop and its ecosystem are largely written in Java, there are a lot of applications in other languages that would like to natively access ORC files in HDFS. Hortonworks, HP, and Microsoft are developing a pure C++ ORC reader and writer that enables C++ applications to read and write ORC files efficiently without Java. That code will also be moved into Apache ORC and released together with the Java implementation.