-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Managed streaming - better estimation for decision for streaming vs queuing #368
base: master
Are you sure you want to change the base?
Conversation
This is a very big PR, can you summarize the changes in more detail? Also, for disableRetries, maybe we want to fold this in to a more robust Policy object like @yogilad suggested and worked on. |
@@ -107,6 +107,7 @@ | |||
<ignoredUsedUndeclaredDependencies> | |||
<dependency>com.microsoft.azure:msal4j:jar</dependency> | |||
<dependency>io.projectreactor:reactor-core:jar</dependency> | |||
<dependency>com.fasterxml.jackson.core:jackson-core:jar</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? don't we have problems with jackson in general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just means we get it from Data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was sure we removed it as a dep because of blacklisting, maybe just the core part is ok? this is confusing me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not sure we can have anything good here
but anyway Spark today releases a shaded Jar - i hope this will obsolete those problems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue was with one of our connectors, which blacklisted jackson. But maybe it was only jackson json and not jackson core. We'll see I guess
...src/main/java/com/microsoft/azure/kusto/data/auth/endpoints/WellKnownKustoEndpointsData.java
Outdated
Show resolved
Hide resolved
data/src/main/java/com/microsoft/azure/kusto/data/exceptions/ExceptionsUtils.java
Show resolved
Hide resolved
data/src/main/java/com/microsoft/azure/kusto/data/http/HttpClientFactory.java
Outdated
Show resolved
Hide resolved
|
||
if (properties.isDisableRetries()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in the http level, which I think is completely separate from what we do in streaming ingest.
Did you check what it actually does and when it runs? I think it does save the request, which will work for the "normal" streaming client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well unless i missed it - it didnt - and thats why i use it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe i should try and make it throw again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you?
ingest/src/main/java/com/microsoft/azure/kusto/ingest/ManagedStreamingIngestClient.java
Outdated
Show resolved
Hide resolved
ingest/src/main/java/com/microsoft/azure/kusto/ingest/ManagedStreamingIngestClient.java
Outdated
Show resolved
Hide resolved
ingest/src/main/java/com/microsoft/azure/kusto/ingest/ManagedStreamingIngestClient.java
Outdated
Show resolved
Hide resolved
ingest/src/main/java/com/microsoft/azure/kusto/ingest/ManagedStreamingIngestClient.java
Show resolved
Hide resolved
ingest/src/main/java/com/microsoft/azure/kusto/ingest/ManagedStreamingQueuingPolicy.java
Show resolved
Hide resolved
@@ -107,6 +107,7 @@ | |||
<ignoredUsedUndeclaredDependencies> | |||
<dependency>com.microsoft.azure:msal4j:jar</dependency> | |||
<dependency>io.projectreactor:reactor-core:jar</dependency> | |||
<dependency>com.fasterxml.jackson.core:jackson-core:jar</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was sure we removed it as a dep because of blacklisting, maybe just the core part is ok? this is confusing me
data/src/main/java/com/microsoft/azure/kusto/data/exceptions/ExceptionsUtils.java
Show resolved
Hide resolved
ingest/src/main/java/com/microsoft/azure/kusto/ingest/source/AbstractSourceInfo.java
Show resolved
Hide resolved
this.rawSizeInBytes = rawSizeInBytes; | ||
} | ||
|
||
// An estimation of the raw (uncompressed, un-indexed) size of the data, for binary formatted files - use only if known |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not about v2 or not v2, it's about code organization - Both StreamSourceInfo and FileSourceInfo support Compression, but in Stream it's a part of the class and for file it's calculated outside of the class.
I think it should be part of the abstract class, and FileSourceInfo can define it to be the extension or whatever.
try { | ||
return ingestFromStreamImpl(streamSourceInfo, | ||
ingestionProperties); | ||
} catch (IOException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then why do you need it in the signature? Shouldn't it always be wrapped and then it's also not a breaking change
ingest/src/main/java/com/microsoft/azure/kusto/ingest/ManagedStreamingIngestClient.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically done, just go over the rest of the comments. Most of them are nothing
@@ -4,6 +4,18 @@ All notable changes to this project will be documented in this file. | |||
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), | |||
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). | |||
|
|||
## Unknown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Unreleased]
(look at the url for example)
@@ -107,6 +107,7 @@ | |||
<ignoredUsedUndeclaredDependencies> | |||
<dependency>com.microsoft.azure:msal4j:jar</dependency> | |||
<dependency>io.projectreactor:reactor-core:jar</dependency> | |||
<dependency>com.fasterxml.jackson.core:jackson-core:jar</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue was with one of our connectors, which blacklisted jackson. But maybe it was only jackson json and not jackson core. We'll see I guess
|
||
if (properties.isDisableRetries()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you?
@@ -73,7 +75,9 @@ public static StreamingClient createStreamingClient(ConnectionStringBuilder csb) | |||
* @throws URISyntaxException if the cluster URL is invalid | |||
*/ | |||
public static StreamingClient createStreamingClient(ConnectionStringBuilder csb, HttpClientProperties properties) throws URISyntaxException { | |||
return new ClientImpl(csb, properties); | |||
HttpClientProperties httpClientProperties = Optional.ofNullable(properties) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That answers 1 (though I don't like it both places, but whatever)
But unlike the one from HttpClientFactory, you do disableRetries here, which I think should be commented
ingest/src/main/java/com/microsoft/azure/kusto/ingest/source/AbstractSourceInfo.java
Show resolved
Hide resolved
try { | ||
return ingestFromStreamImpl(streamSourceInfo, | ||
ingestionProperties); | ||
} catch (IOException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember now - I think what I meant is that you can catch and throw our exception inside the impl.
EIther way it probably doesn't matter if the impl isn't public
public static final String CLASS_NAME = ManagedStreamingIngestClient.class.getSimpleName(); | ||
final QueuedIngestClientImpl queuedIngestClient; | ||
final StreamingIngestClient streamingIngestClient; | ||
private final ExponentialRetry exponentialRetryTemplate; | ||
private CloseableHttpClient httpClient = null; | ||
private ManagedStreamingQueuingPolicy queuingPolicy = ManagedStreamingQueuingPolicy.Default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean it's unlikely to grow if we're going to replace most of it for v2, but sure
* If the size of the stream is bigger than {@value MAX_STREAMING_SIZE_BYTES}, it will fall back to the queued streaming client. | ||
* By default the policy for choosing a queued ingestion on the first try is the checking of weather the size of the estimated | ||
* raw stream size (a conversion to compressed CSV) is bigger than 4MB, it will fall back to the queued streaming client. | ||
* Use SourceInfo.size to override size estimations, alternatively - use setQueuingPolicy to override the predicate heuristics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- the name is still wrong (it's SourceInfo.ge/settRawSizeInBytes)
- I think just putting it right next to the previous line with this wording is confusing, since it makes it seem that they are related.
Added
an estimation against the 4mb limit after dividing or multiplying by some factor described by the consts:
// Regardless of the format, we don't want to stream more than 10mb
int MAX_STREAMING_STREAM_SIZE_BYTES = 10 * 1024 * 1024;
// Used against the users input of raw data size
int MAX_STREAMING_RAW_SIZE_BYTES = 6 * 1024 * 1024;
double JSON_UNCOMPRESSED_FACTOR = 1.5d;
int NON_BINARY_FACTOR = 2;
double BINARY_COMPRESSED_FACTOR = 2d;
double BINARY_UNCOMPRESSED_FACTOR = 1.5d;
This will also allow users to stream bigger than 4mb non-compressed data